Saturday, June 9, 2012
Generating 'random' variables drawn from any distribution
* This post responds to questions posted by a reader of this blog.
* The questions were:
* 1. How do I draw random variables that are not normally distributed?
* 2. How does set seed work? (See the previous post.)
* 3. How do I ensure that variables are truly random?
* To answer 1:
* Stata has a number of built-in functions for drawing random variables.
* The staples are the normal and uniform distributions.
* However, a number of other distributions are available.
* To see the full list, type the following:
*. help rnormal
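* To use them, first create a dataset with 1,000 observations and draw a
* uniform(0,1) variable, runif, which the Cauchy example below relies on:
clear
set obs 1000
gen runif = runiform()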
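* A sketch of drawing from several other built-in distributions (these
* functions exist in Stata 11.1 and later). It assumes observations are
* already in memory from set obs. The variable names and parameter values
* are illustrative choices, not from the original post.
gen n01    = rnormal(0, 1)      // normal, mean 0, sd 1
gen beta25 = rbeta(2, 5)        // beta, shape parameters 2 and 5
gen gam23  = rgamma(2, 3)       // gamma, shape 2, scale 3
gen chi4   = rchi2(4)           // chi-squared, 4 degrees of freedom
gen t5     = rt(5)              // Student's t, 5 degrees of freedom
gen bin10  = rbinomial(10, .3)  // binomial, 10 trials, success probability .3
gen pois4  = rpoisson(4)        // Poisson, mean 4
summarize n01-pois4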
* All of these functions make it easy to draw independent and identically distributed (i.i.d.) variables.
* However, as far as I know, the normal distribution is the only one for which Stata lets you easily draw correlated random variables.
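* For example, drawnorm draws multivariate normal variables with a chosen
* correlation matrix. The matrix name C, the correlation of .5, and the
* variable names cn1 and cn2 are illustrative. Without the n() option,
* drawnorm adds the new variables to the observations already in memory.
matrix C = (1, .5 \ .5, 1)
drawnorm cn1 cn2, corr(C)
correlate cn1 cn2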
* In the post tomorrow I will show how to use the correlation in the normal distribution to draw correlated random variables from any distribution.
* In the meantime, let us briefly discuss how a known property of CDFs lets us draw from essentially any distribution whose inverse CDF we can compute.
* For instance, Stata has no built-in function for drawing Cauchy random variables.
* Suppose u is uniformly distributed between 0 and 1, and Theta is the CDF of some continuous distribution.
* Then z = Theta^-1(u), where Theta^-1 is the inverse of that CDF, is distributed according to Theta.
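* A minimal sketch of this inverse-CDF method on a case with an easy inverse:
* the exponential distribution with rate lambda, whose CDF is
* Theta(x) = 1 - exp(-lambda*x), so Theta^-1(u) = -ln(1 - u)/lambda.
* The rate lambda = 2 and the variable name rexpo are illustrative assumptions.
* runiform() never returns 1, so 1 - u is never 0 and ln() is always defined.
local lambda = 2
gen rexpo = -ln(1 - runiform())/`lambda'
* The sample mean should be close to 1/lambda = .5
summarize rexpo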
* For the Cauchy distribution, CDF(x) = (1/pi)*arctan((x - x0)/gamma) + 1/2,
* where x0 is the location parameter (the median) and gamma > 0 is the scale parameter.
* Setting CDF(x) = u and solving for x gives the inverse: x = x0 + gamma*tan(pi*(u - 1/2)).
* Let us set gamma = 1 and the median x0 = 10.
local gamma = 1
local x0 = 10
* Drawing on the already established uniform variable runif:
gen rcauchy = `x0' + `gamma'*tan(_pi*(runif - .5))
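* A quick check of the algebra (the variable name u_check is hypothetical):
* applying the Cauchy CDF with gamma = 1 and x0 = 10 to rcauchy should
* recover runif, up to floating-point rounding.
gen u_check = atan(rcauchy - 10)/_pi + .5
assert reldif(u_check, runif) < 1e-5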
* The Cauchy distribution is generally not considered well behaved because its mean and variance are undefined.
sum rcauchy, detail
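* Of course, any finite sample drawn from the distribution has a computable sample mean and variance.
* It is interesting to see what happens to these as the sample size gets bigger.
* In a given run the mean may appear to behave well, but it never settles down: the average of n Cauchy draws is itself Cauchy with the same median and scale, so a single extreme draw can pull it far from 10.
* The median, by contrast, converges to 10, while the sample variance tends to keep growing.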
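* A sketch for exploring this (not from the original post): redraw Cauchy
* samples of increasing size, with gamma = 1 and median 10 as above, and
* display the sample mean, median, and variance. preserve/restore keep the
* data above intact; the variable name cauchy_n is illustrative. Results
* change from run to run unless a seed is set.
preserve
foreach n of numlist 100 1000 10000 100000 1000000 {
    clear
    quietly set obs `n'
    quietly gen cauchy_n = 10 + tan(_pi*(runiform() - .5))
    quietly summarize cauchy_n, detail
    display "N = " %8.0f `n' "   mean = " %10.3f r(mean) ///
        "   median = " %7.3f r(p50) "   variance = " %14.1f r(Var)
}
restore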
* To answer question 3, it helps to look at the previous post on the set seed command.
* To summarize briefly: Stata does not generate "truly random" numbers.
* Rather, Stata uses a pseudorandom number generator, a deterministic algorithm that produces a fixed sequence of numbers from a given starting point (the seed).
* Thus, the variables are not random in the sense of being drawn randomly "from the air."
* But in terms of randomness, they are close enough to random for almost anybody who needs random variables.
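* A minimal illustration of that determinism: resetting the same seed
* reproduces exactly the same sequence of draws. The seed value 12345 and
* the variable names u_a and u_b are illustrative.
set seed 12345
gen u_a = runiform()
set seed 12345
gen u_b = runiform()
assert u_a == u_b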