## Saturday, June 9, 2012

### Generating 'random' variables drawn from any distribution

* Generating 'random' variables drawn from any distribution

* This post is a response to a question posted by a reader of this blog.

* The questions were:
* 1. How do I draw random variables that are not normally distributed?
* 2. How does set seed work?  See previous post
* 3. How do I ensure that variables are truly random?

* Stata has a number of built-in functions for drawing random variables.

* The staples of these are the normal and uniform distributions.

* However, there are a number of other distributions available.

* To access them type the following:

*. help rnormal

* To use them do the following:
clear
set obs 1000

gen runif=runiform()
gen rbeta=rbeta(1,2)
gen rbinomial=rbinomial(10,.5)
gen rchi2=rchi2(43)
gen rgamma=rgamma(2,5)
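* rhypergeometric(N,K,n): population size N, K items of interest, sample size n (n must be less than N)
gen rhypergeometric=rhypergeometric(20,4,10)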
gen rnbinomial=rnbinomial(123,.2)
gen rnormal=rnormal()
gen rpoisson=rpoisson(20)
gen rt=rt(3)
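
* As a rough check on the draws above, the sample means can be compared with the
* theoretical means. This is only a sketch: with 1000 observations the sample means
* should be close to, but not exactly equal to, the following values.
* runiform(): .5, rbeta(1,2): 1/3, rbinomial(10,.5): 5, rchi2(43): 43,
* rgamma(2,5): 10, rnormal(): 0, rpoisson(20): 20, rt(3): 0
sum runif rbeta rbinomial rchi2 rgamma rnormal rpoisson rt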

* All of these functions make it easy to draw independent and identically distributed (i.i.d.) variables.

* However, as far as I know the normal distribution is the only distribution for which Stata makes it easy to draw correlated random variables.
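
* For reference, a minimal sketch of drawing correlated normal variables with the
* drawnorm command. The correlation of .5 and the names C, z1 and z2 are arbitrary
* choices made for illustration.
matrix C = (1, .5 \ .5, 1)
drawnorm z1 z2, corr(C)
* The sample correlation should be close to .5
corr z1 z2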

* In the post tomorrow I will show how to use the correlation in the normal distribution to draw correlated random variables from any distribution.

* In the meantime let us briefly talk about using known properties of CDFs to draw from almost any distribution.

* For instance, Stata does not (at least at the time of writing) have a built-in function for drawing Cauchy random variables.

* If u is a variable that is uniformly distributed between 0 and 1,

* then z = Theta^-1(u), where Theta^-1 is the inverse of a CDF Theta, is distributed according to the distribution that Theta describes.
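
* A minimal sketch of this idea, using the exponential distribution with rate 2 as an
* illustration (the example and the names u_demo and rexp_demo are hypothetical).
* Its CDF is 1 - exp(-2*x), so the inverse is x = -ln(1-u)/2.
gen u_demo = runiform()
gen rexp_demo = -ln(1-u_demo)/2
* The sample mean should be close to the theoretical mean of 1/2
sum rexp_demo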

* So for the Cauchy distribution the CDF = (1/pi)*arctan((x-x0)/gamma)+1/2
* where x0 is the median and gamma is the scale parameter of the distribution.

* The inverse is then:  x = x0 + gamma*tan(pi*(CDF - 1/2))

* Let us set gamma = 1

* And let us set the median at 10

* Drawing on the already established random variable runif
gen rcauchy = tan(_pi*(runif-.5)) + 10
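
* A quick sanity check (a sketch, not a formal test): the quartiles of a Cauchy
* distribution are x0-gamma, x0 and x0+gamma, so the 25th, 50th and 75th
* percentiles of the draw should be roughly 9, 10 and 11.
_pctile rcauchy, p(25 50 75)
display r(r1) "  " r(r2) "  " r(r3)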

* Generally a Cauchy distribution is not thought of as well behaved because it does not have a defined mean or variance.
sum rcauchy, detail

* However, of course any finite sample drawn from the distribution has a computable sample mean and variance.

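* It is interesting to see what happens to the sample mean and variance of rcauchy as the sample size gets bigger.

* In any single draw the sample mean may look well behaved, but it does not settle down as the sample grows (the mean of n Cauchy draws is itself Cauchy distributed), and the sample variance tends to keep getting larger.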
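
* One way to see this is to redraw the Cauchy variable at several sample sizes.
* This is only a sketch: the sample sizes and the variable name x are arbitrary,
* and preserve/restore is used so that the data generated above are not lost.
preserve
foreach n in 100 1000 10000 100000 {
    clear
    quietly set obs `n'
    * Same inverse-CDF draw as above: median 10, gamma 1
    gen x = tan(_pi*(runiform()-.5)) + 10
    quietly sum x
    display "n = `n'   mean = " r(mean) "   variance = " r(Var)
}
restore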

* To answer question 3, it helps to look at the previous post on the set seed command.

* However, to briefly summarize, we should know that the variables Stata draws are not "truly random".

* Rather, Stata uses a pseudo-random number generator: a deterministic sequence of numbers whose starting point is fixed by the seed.

* Thus, variables are not random in the sense that they are drawn randomly from the air.

* But in terms of randomness, they are close enough to random for almost anybody who needs random variables.
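
* A small sketch of what this means in practice (the seed value and the names
* seed_a and seed_b are arbitrary): setting the same seed reproduces the same draws.
set seed 12345
gen seed_a = runiform()
set seed 12345
gen seed_b = runiform()
* The assert passes silently because the two sets of draws are identical
assert seed_a == seed_b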