* This might seem at first like a lot of mumbo jumbo but this result is easily demonstrable and extremely useful.
* To show the central limit theorem (CLT) in action we will use Stata's simulate command.
* First we need to program a "program" to simulate
cap program drop randdraw
program define randdraw
clear
* This tells stata to call the first argument in the command randdraw N and the second distribution
args N distribution
* The first arugment of the
set obs `N'
* Generate the variable
gen x = `distribution'
sum x
end
* It is not neccessary that the mean be equal to zero but it might be useful for comparison between different distributions
randdraw 1000 "rpoisson(10)-10"
* This command drew 1000 draws from the poisson distribution less 10. Which has a E(poisson(10)-10)=E(poisson(10))-10=10-10=0
simulate mean_x =r(mean), reps(1000): randdraw 1000 "rpoisson(10)-10"
hist mean_x, kden
* Looks pretty normal, but that is not so surprising because the poisson distribution begins to look pretty normal once k gets as large as 10.
* So how exactly does the central limit relate to all of this and why is it so cool?
* The argument is that as N the number of draws going into each mean calculation gets large, the distribution of the means begins to look normal.
* Lets first try something as far from normal as possible (not a technical definition)
simulate mean_x =r(mean), reps(1000): randdraw 10 "rbinomial(1,.05)-.05"
* This is the mean of a bernoulli distribution (weighted to .05) either 0 or 1 with each mean representing 10 draws.
hist mean_x
* We can see that this is clearly not normally distributed. However what happens when we increase the draws to 100 rather than 10?
simulate mean_x =r(mean), reps(1000): randdraw 100 "rbinomial(1,.05)-.05"
hist mean_x
* Already things are beginning to look a lot more like a normal distribution. Obviously it is not normal because the potential outcomes are discrete.
* Perhaps it will look better at 1000 draws.
simulate mean_x =r(mean), reps(1000): randdraw 1000 "rbinomial(1,.05)-.05"
hist mean_x
* This is what is meant by asymptotically converging on a normal distribution. As N gets large the distribution of the means begins to look more like a normal distribution (so long as the variance of x is finite).
* Crazy, huh?
* Perhaps it will look better at 10000 draws.
simulate mean_x =r(mean), reps(1000): randdraw 10000 "rbinomial(1,.05)-.05"
hist mean_x, kden
* Still not looking perfect.
simulate mean_x =r(mean), reps(1000): randdraw 100000 "rbinomial(1,.05)-.05"
hist mean_x, kden
* Now things are looking very close to normal. Of course there will be some randomness in the 1000 draws which have enough inherent variation to make even a normal variable look non-normal.
* This might take a little while but let's increase the draws one more time.
simulate mean_x =r(mean), reps(1000): randdraw 1000000 "rbinomial(1,.05)-.05"
hist mean_x, kden
* So using a x variable with are bernoulli(.05) leads to a convergence in means that looks normal not until around 100,000 draws. This might sound bad but it really is not.
* Because, usually we don't need the individually drawn means TO BE normal just to be approximated by a normal distribution as well as their standardized values as being approximated by a chi-squared distribution for null tests.
* In addition, a bernoulli(.05) is an extreme distribution with a large skew and discrete outcomes. Most distributions, even binary and count distributions are going to have properties which are more similar to a normal distribution causing more rapid convergence on normality.
* Even a .5 bernoulli begins to look pretty normal at 1000 draws.
simulate mean_x =r(mean), reps(1000): randdraw 1000 "rbinomial(1,.5)-.5"
hist mean_x, kden
* Let alone distributions such as uniform
simulate mean_x =r(mean), reps(1000): randdraw 1000 "runiform()-.5"
hist mean_x, kden
* Compare this to a normal distribution for which the mean by definition normally distributed (because Normal + Normal ~ Normal)
simulate mean_x =r(mean), reps(1000): randdraw 1000 "rnormal()"
hist mean_x, kden
Great program! Thank you!
ReplyDelete