* This simulation looks at the problem that arises when the variable of interest is correlated with other explanatory variables.
* Omitting those variables may bias the estimate, but including them may absorb so much of the variation in the explanatory variable that the true coefficient of interest cannot be identified.
* Clear the old data from memory
clear
* Set the number of observations to generate to 1000
set obs 1000
set seed 10
* Generate a positive explanatory variable.
gen x = abs(rnormal())*3
* Imagine we are interested in the coefficient on x.
* Now create correlated explanatory variables
gen z1 = x^2 + rnormal()*10
gen z2 = x^1.75 + rnormal()*10
gen z3 = x^.5 + rnormal()*10 + z2/4
gen y = 4*x + .5*z1 + .8*z2 + z3 + rnormal()*100
reg y x
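* As a quick check (not part of the original script): after regress, _b[x] holds the estimated coefficient on x.
* The true direct effect of x in the data-generating process above is 4, but this short regression also picks up the effect of the omitted z's.
display "Estimated coefficient on x: " _b[x] " (true direct effect is 4)"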
* The problem with near-multicollinearity is that when you do not include other correlated explanatory variables, their omission can heavily bias the coefficient on the one that is included.
reg y x z1 z2 z3
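* A standard built-in diagnostic at this point (assuming Stata's regress postestimation tools) is estat vif,
* which reports the variance inflation factor for each regressor; values above 10 are commonly read as a sign of troublesome collinearity.
estat vif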
* But when you do include them, they can absorb so much of the variation that you have little hope of identifying the true effect of the variable of interest (x).
corr x z1 z2 z3
* Let's use the Farrar-Glauber multicollinearity tests, a user-written Stata command by Emad Abd Elmessih Shehata.
* The ado file can be found at (http://ideas.repec.org/c/boc/bocode/s457417.html).
* However, it can also be installed via the command: ssc install fgtest
fgtest y x z1 z2 z3
* All of the variables appear to be multicollinear (unsurprisingly).
* Thus we can see the Farrar-Glauber test is working well.