Wednesday, November 12, 2014

Convergence of a Series

Let us explore using simulation some of the concepts of basic asymptotic theory as presented in Wooldridge 2012, Chapter 3.

Definition: A sequence of nonrandom numbers {a_N:N=1,2,...} converges to a if for all epsilon>0 there exists N_epsilon such that N>N_epsilon, then
$$|a_N - a|<\epsilon$$

Paraphrase: A sequence converges on point a, if you can choose any positive number (any epsilon greater than zero) and you will find some point in the series after which the difference between the series and the point it converges to a will always be smaller than epsilon.

Let us try to make this more concrete.

Imagine the series:

Does this series converge on a value a?

Let us pick an epsilon, say 100.
## R ---------------------------
f = function(x) x^2
N = 1:30
plot(N, f(N), type='l', col='lightblue', lwd=4)
## -----------------------------
We can see that there is no a which could exist for which our series seems to converge within the positive bound epsilon=100.

This concergence might have been obvious. Let us try a different series.
## R ---------------------------
g = function(x) 10*x^(-99/100) * sin(x) + pi
N = 1:100
plot(N, g(N), type='l', lwd=2)
## -----------------------------

The series g demonstrates a common convergence pattern. That is you can alway pick a epsilon after which the difference between N and the convergence point is less then epsilon. In this case the series converges on pi. Let us play around with a few epsilons to see if our hypothesis that the series converges seems to hold true.

## R ---------------------------
e1 = 5
abline(h=pi+e1, lwd=2, col='darkred')
abline(h=pi-e1, lwd=2, col='darkred')
## -----------------------------
We can see that at N<10 there are some points which are not bounded by |N-pi|<e1. However by the time we get to N>20, all points are easily bound (at least those we have plotted).

How about a harder epsilon?
## R ---------------------------
e2 = .5
abline(h=pi+e2, lwd=2, col='darkblue')
abline(h=pi-e2, lwd=2, col='darkblue')
## -----------------------------

This epsilon is outside our bounds at 20 but by the time we have reached 40 the series does not reach outside our bounds.

## R ---------------------------
e3 = .1
abline(h=pi+e3, lwd=2, col=rgb(.3,.6,1,.5))
abline(h=pi-e3, lwd=2, col=rgb(.3,.6,1,.5))
## -----------------------------
We can see that with e3=.1 there is no point on the graph we have plotted which is bounded by our epsilon. However plotting the graph at a longer series time frame will give us a number N after which g is bounded.

## R ---------------------------
N = 100:200
plot(N, g(N), type='l', lwd=2)
e3 = .1
abline(h=pi+e3, lwd=2, col=rgb(.3,.6,1,.5))
abline(h=pi-e3, lwd=2, col=rgb(.3,.6,1,.5))
N= 1:30
plot(N, f(N), type='l', col='paste(readLines(paste0('',
              fdate, '/'), warn="F"), collapse='');lightblue', lwd=4)
## -----------------------------

We can see that the graph clearly shows that at our small value of epsilon (.1) we are bounded around pi as early as N=110. For all other even smaller epsilons there will always be an N for which
this is true (even if our computers do not have sufficient accuracy to show this to be the case).

However, since we are working with an infinite series, this also means that there is an infinite area of the graph for which we can never plot. We therefore cannot use graphs to prove that series converge. In theory a series can always break the bounds just beyond the area of inspection. The switch function
$$f(N) = 1[N>100]$$
is an extreme example in which the series seems to converge at 0 up to N<100 but at N=100 or greater the series now converges on 1.

Other series converge without oscillating:

Some more challenging series which also converge include:

Monday, November 3, 2014

Make your own hotly criticised circle graph!!!

Make your own hotly criticized circle graph! The recent critical post on R-bloggers has got me wanting to make my own bubble graphs despite the popular criticism of the graph. After some attempts I have produced something that looks similar but needs some more work.

I would like to say briefly that R-blogger post by while interesting and reasonable seemed to miss the point. The graphic was not supposed to convey massive amounts of information but simply to make obvious the scope and degree of how the current Ebola epidemic compares with other epidemics.

There are many good reasons to critique this graph from an epidemiological perspective (forthcoming), yet from a graphing perspective I think  this graph format is extremely efficient at conveying memorable information while the graphs assembled by PirateGrunt were more detailed they were also entirely forgettable unfortunately.

Here are a couple of examples. The first is with meaningless simulated data demonstrating how large differences in scale may appear. The second is actually using HIV prevalence data to give a graphical representation of how such rates have changed over the years. You can fine the code as well as the data on github.

Table 1: Demonstrates how scale can be communicated by size of circle

Table 2: Demonstrates the relative stability of AIDS in terms of people ill. Top number is  year while lower number is millions of cases.