Tuesday, May 1, 2012

The Many Forms of Instrumental Variables

* The many forms of the Instrumental Variables estimator
* Stata Simulation and Estimation

* Imagine that you have an endogenous variable, years of 
* education, and you want to estimate the returns to education 
* in terms of earnings.  The problem is that diligence is 
* correlated with education and it is also correlated with 
* earnings. But diligence is unobserved in your data set and, 
* as a practical matter, perhaps unobservable altogether.

* Ideally you would like to have an experiment where you "give" 
* some people more education than others.  We have the next best 
* thing.  Let's imagine that we have data from an education 
* scholarship lottery which gives students one year of free 
* education upon completing that year, and which is awarded 
* completely RANDOMLY among all potential students.

* The randomness is key in this application.  Formally:
* Y=XB+U
* The problem: corr(X,U)!=0
* So, rather than solving the standard way:
* X'Y=X'XB+X'U
* B=(X'X)^-1 X'Y-(X'X)^-1 X'U  
* E(B)=B=E((X'X)^-1 X'Y) + 0  ---- because E(X'U)=0, OLS assumption

* We use:
* Z'Y=Z'XB+Z'U
* B=(Z'X)^-1 Z'Y-(Z'X)^-1 Z'U  
* E(B)=B=E((Z'X)^-1 Z'Y) + 0  ---- because E(Z'U)=0, IV assumption
* It is not obvious from this formulation, but IV is not an unbiased 
* estimator, just a consistent one.

* For more details see Wooldridge, Book 1 Chapter 15

* Scale of u (u is built below as the sum of three standard normals 
* times sdu, so its standard deviation is sqrt(3)*sdu)
gl sdu=5

* Effect of z on x (first stage): gamma_jk is the effect of z_k on x_j
gl gamma11=4
gl gamma12=2
gl gamma21=4.4
gl gamma22=0

* Effect of x on y (the parameters we want to recover)
gl beta1=1
gl beta2=1

* Standard deviation of z
gl zsd1=1
gl zsd2=1

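* rho12 and rho13 set how the first stage errors v1 and v2 weight the 
* random components they share with u.  This is what makes the 
* explanatory variables x1 and x2 correlated with the error.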
gl rho12=.5
gl rho13=.75

* Scale of the first stage errors v1 and v2
gl sdv1 = 1
gl sdv2 = 1

drop _all

set obs 10000

gen rv1=rnormal()
gen rv2=rnormal()
gen rv3=rnormal()

gen u =(rv1+rv2+rv3)*$sdu
gen v1 =$sdv1*(rv1*$rho12 + rv2*(1-$rho12)^.5)
gen v2 =$sdv2*(rv1*$rho13 + rv3*(1-$rho13)^.5)

gen z1 = rnormal()*$zsd1
gen z2 = rnormal()*$zsd2
gen x1=z1*$gamma11 + z2*$gamma12 + v1
gen x2=z1*$gamma21 + z2*$gamma22 + v2
gen y = x1*$beta1 + x2*$beta2 + u
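
* A quick sanity check, sketched under the assumption that we may 
* peek at u (possible only because the data are simulated): x1 and 
* x2 should be correlated with u, and OLS should miss the true 
* values beta1=1 and beta2=1.
corr x1 x2 u
reg y x1 x2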

* The most straightforward IV estimator is ivreg
ivreg y (x*=z*)
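
* A minimal sketch of the formula B=(Z'X)^-1 Z'Y from above, computed 
* directly in Mata.  Assumption: a column of ones is appended to both 
* X and Z, because ivreg includes a constant that acts as its own 
* instrument.  X, Z, Y and b_iv are names introduced here.
mata:
X = (st_data(., ("x1", "x2")), J(st_nobs(), 1, 1))
Z = (st_data(., ("z1", "z2")), J(st_nobs(), 1, 1))
Y = st_data(., "y")
// Solve (Z'X) b = Z'Y.  Z'X is not symmetric, so use an LU solver.
b_iv = lusolve(cross(Z, X), cross(Z, Y))
// Print the estimates as a row: x1, x2, constant
b_iv'
end
* The printed values should match the ivreg coefficients.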

* The same point estimates come from running the two stages of 2SLS 
* by hand
reg x1 z*
predict x1hat

reg x2 z*
predict x2hat

reg y x1hat x2hat
* The second stage standard errors need to be adjusted for the first 
* stage being estimated.
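
* A sketch of one way to make that adjustment by hand, assuming 
* homoskedastic errors: rebuild the residuals with the actual x1 and 
* x2 (not the fitted values) and rescale the second stage standard 
* errors.  uhat, uhat2, rmse_2nd and rmse_iv are names introduced here.
scalar rmse_2nd = e(rmse)
gen uhat = y - _b[_cons] - _b[x1hat]*x1 - _b[x2hat]*x2
gen uhat2 = uhat^2
quietly summarize uhat2
scalar rmse_iv = sqrt(r(sum)/e(df_r))
display "Adjusted se for x1: " _se[x1hat]*rmse_iv/rmse_2nd
display "Adjusted se for x2: " _se[x2hat]*rmse_iv/rmse_2nd
* These should agree with the standard errors from ivreg, up to its 
* degrees-of-freedom convention.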

* Another equivalent two stage estimator is the control function: 
* keep the original x's in the second stage and add the first stage 
* residuals as extra regressors.
reg x1 z*
predict v1hat, residual

reg x2 z*
predict v2hat, residual

reg y x1 x2 v1hat v2hat
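
* A sketch of one use of this form: the coefficients on x1 and x2 
* should match the ivreg point estimates, and a joint test of the 
* residual terms is a regression-based (Hausman-type) check of 
* whether x1 and x2 are endogenous.  In this simulation we would 
* expect the test to reject.
test v1hat v2hat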