* This might seem at first like a lot of mumbo jumbo but this result is easily demonstrable and extremely useful.

* To show the central limit theorem (CLT) in action we will use Stata's simulate command.

* First we need to program a "program" to simulate

cap program drop randdraw

program define randdraw

clear

* This tells stata to call the first argument in the command randdraw N and the second distribution

args N distribution

* The first arugment of the

set obs `N'

* Generate the variable

gen x = `distribution'

sum x

end

* It is not neccessary that the mean be equal to zero but it might be useful for comparison between different distributions

randdraw 1000 "rpoisson(10)-10"

* This command drew 1000 draws from the poisson distribution less 10. Which has a E(poisson(10)-10)=E(poisson(10))-10=10-10=0

simulate mean_x =r(mean), reps(1000): randdraw 1000 "rpoisson(10)-10"

hist mean_x, kden

* Looks pretty normal, but that is not so surprising because the poisson distribution begins to look pretty normal once k gets as large as 10.

* So how exactly does the central limit relate to all of this and why is it so cool?

* The argument is that as N the number of draws going into each mean calculation gets large, the distribution of the means begins to look normal.

* Lets first try something as far from normal as possible (not a technical definition)

simulate mean_x =r(mean), reps(1000): randdraw 10 "rbinomial(1,.05)-.05"

* This is the mean of a bernoulli distribution (weighted to .05) either 0 or 1 with each mean representing 10 draws.

hist mean_x

* We can see that this is clearly not normally distributed. However what happens when we increase the draws to 100 rather than 10?

simulate mean_x =r(mean), reps(1000): randdraw 100 "rbinomial(1,.05)-.05"

hist mean_x

* Already things are beginning to look a lot more like a normal distribution. Obviously it is not normal because the potential outcomes are discrete.

* Perhaps it will look better at 1000 draws.

simulate mean_x =r(mean), reps(1000): randdraw 1000 "rbinomial(1,.05)-.05"

hist mean_x

* This is what is meant by asymptotically converging on a normal distribution. As N gets large the distribution of the means begins to look more like a normal distribution (so long as the variance of x is finite).

* Crazy, huh?

* Perhaps it will look better at 10000 draws.

simulate mean_x =r(mean), reps(1000): randdraw 10000 "rbinomial(1,.05)-.05"

hist mean_x, kden

* Still not looking perfect.

simulate mean_x =r(mean), reps(1000): randdraw 100000 "rbinomial(1,.05)-.05"

hist mean_x, kden

* Now things are looking very close to normal. Of course there will be some randomness in the 1000 draws which have enough inherent variation to make even a normal variable look non-normal.

* This might take a little while but let's increase the draws one more time.

simulate mean_x =r(mean), reps(1000): randdraw 1000000 "rbinomial(1,.05)-.05"

hist mean_x, kden

* So using a x variable with are bernoulli(.05) leads to a convergence in means that looks normal not until around 100,000 draws. This might sound bad but it really is not.

* Because, usually we don't need the individually drawn means TO BE normal just to be approximated by a normal distribution as well as their standardized values as being approximated by a chi-squared distribution for null tests.

* In addition, a bernoulli(.05) is an extreme distribution with a large skew and discrete outcomes. Most distributions, even binary and count distributions are going to have properties which are more similar to a normal distribution causing more rapid convergence on normality.

* Even a .5 bernoulli begins to look pretty normal at 1000 draws.

simulate mean_x =r(mean), reps(1000): randdraw 1000 "rbinomial(1,.5)-.5"

hist mean_x, kden

* Let alone distributions such as uniform

simulate mean_x =r(mean), reps(1000): randdraw 1000 "runiform()-.5"

hist mean_x, kden

* Compare this to a normal distribution for which the mean by definition normally distributed (because Normal + Normal ~ Normal)

simulate mean_x =r(mean), reps(1000): randdraw 1000 "rnormal()"

hist mean_x, kden

Great program! Thank you!

ReplyDelete