## Tuesday, October 8, 2013

### Finite Sample Properties of IV - Weak Instrument Bias

```
* There is no proof that an instrumental variables (IV) estimator is unbiased.

* In fact we know that in small enough samples the bias can be large.

* Consider a simple setup in which the endogeneity arises from an omitted variable (w).

* Our instrument is valid, but the IV estimator is still biased because we are using a "small" sample and the instrument is weak.

clear
set obs 1000

gen z = rnormal()

gen w = rnormal()

gen x = z*.3 + rnormal() + w

gen u = rnormal()

gen y = x + w + u*5

reg y x

ivreg y (x=z)
* The ivreg confidence interval includes the true coefficient (1), though the interval is quite wide.

* This is largely the result of z being a weak instrument for x
reg x z
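
* A rough check of instrument strength (a sketch, not part of the original post):
* the F statistic on z in the first stage is commonly compared with the
* rule-of-thumb value of 10; values well below 10 are usually read as a sign
* of a weak instrument.
test z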

* The IV estimator is consistent, but it is known to be biased in finite samples.

* To examine this bias we will run a Monte Carlo
*  simulation to see how biased our estimates are at each sample size.

cap program drop weakreg
program weakreg, rclass
    * The first argument of the weakreg command is the number of
    *  observations to draw.
    clear
    set obs `1'
    gen z = rnormal()
    gen w = rnormal()
    gen x = z*.2 + rnormal() + w
    gen u = rnormal()
    gen y = x + w + u*5
    reg y x
    return scalar reg_x = _b[x]
    return scalar reg_se_x = _se[x]
    ivreg y (x=z)
    return scalar ivreg_x = _b[x]
    return scalar iv_se_x = _se[x]
end
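
* Quick check of the program before simulating (a sketch, not in the original post).
* The seed value is arbitrary and is an assumption; the post does not set one,
* so the numbers reported below will not be reproduced exactly.
set seed 20131008
weakreg 100
* List the four scalars the program returns in r()
return list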

* With only 100 observations
simulate reg_x=r(reg_x)     reg_se_x=r(reg_se_x)  ///
ivreg_x=r(ivreg_x) iv_se_x=r(iv_se_x)    ///
, rep(10000): weakreg 100
sum
/*
Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
reg_x |     10000    1.484123    .3619725   .1481085   2.739473
reg_se_x |     10000     .357267    .0359624   .2261475   .5157685
ivreg_x |     10000    14.45544    846.8431  -1318.675   78604.67
iv_se_x |     10000    208878.2    1.93e+07   .6884335   1.92e+09
*/
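
* The mean of ivreg_x is sensitive to a handful of extreme draws (see Min and
* Max above), so the median is a more robust summary. A sketch, not in the
* original post, run on the simulate results still in memory:
sum ivreg_x, detail
display "Median IV estimate: " r(p50)
* Monte Carlo standard error of the simulated mean
display "Monte Carlo s.e. of the mean: " r(sd)/sqrt(r(N))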

* We can see the mean standard error estimate is much
* larger than the standard deviation of the estimates.

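* In addition, the mean IV estimate (14.5) is far from the true value of 1,
* though it is driven by a few extreme draws (note the Min and Max).
* With a spread this large, OLS is the better estimator in this case
* despite its bias.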

simulate reg_x=r(reg_x)     reg_se_x=r(reg_se_x)  ///
ivreg_x=r(ivreg_x) iv_se_x=r(iv_se_x)    ///
, rep(10000): weakreg 300
sum

/*

Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
reg_x |     10000     1.48824    .2065498   .6367978   2.236391
reg_se_x |     10000     .204807    .0117558   .1625977   .2537656
ivreg_x |     10000     .883504    16.43395  -489.2355   1346.839
iv_se_x |     10000     103.928    7065.172   .5418385   696229.8

*/
* Increasing the sample size to 300 vastly improves the IV estimator,
* though its mean (0.88) now falls below the true value of 1.

simulate reg_x=r(reg_x)     reg_se_x=r(reg_se_x)  ///
ivreg_x=r(ivreg_x) iv_se_x=r(iv_se_x)    ///
, rep(10000): weakreg 500
sum

/*
Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
reg_x |     10000    1.490552     .159073   .9184232   2.088732
reg_se_x |     10000    .1584161    .0071414   .1326629   .1896385
ivreg_x |     10000     .841985    4.533252  -337.8417   41.04016
iv_se_x |     10000    10.02729    672.1545   .5350561   66082.69
*/
* Increasing the sample size to 500 does not seem to reduce the bias
* of the IV estimator, though the average standard error is
* getting closer to the standard deviation of the estimates.

simulate reg_x=r(reg_x)     reg_se_x=r(reg_se_x)  ///
ivreg_x=r(ivreg_x) iv_se_x=r(iv_se_x)    ///
, rep(10000): weakreg 750
sum

/*
Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
reg_x |     10000    1.491437    .1291091   1.012212   1.998231
reg_se_x |     10000    .1292925    .0047437   .1129745   .1498492
ivreg_x |     10000    .9714284    1.087322  -12.33198   7.718197
iv_se_x |     10000    1.080065    .8625042   .4623444   39.95131
*/

* Increasing the sample size to 750 dramatically improves the IV estimator.
* It is still slightly biased, but that is not a huge problem.
* The standard errors are now working well too: their mean (1.08) is close
* to the standard deviation of the estimates (1.09).
* The remaining problem is that the IV estimator still varies so much
* that both the OLS estimate and zero would fall inside
* most confidence intervals.

simulate reg_x=r(reg_x)     reg_se_x=r(reg_se_x)  ///
ivreg_x=r(ivreg_x) iv_se_x=r(iv_se_x)    ///
, rep(10000): weakreg 1000
sum

/*
Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
reg_x |     10000    1.488341    .1107187   1.064637   1.938462
reg_se_x |     10000     .111871    .0035312   .1001198   .1255203
ivreg_x |     10000    .9691499    .8924782  -5.659174    5.94314
iv_se_x |     10000    .8812981    .2897863     .44236   6.674775
*/

* We can see that the primary gain from more observations is a smaller
* standard error.

```
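
Two quantities implied by the simulation design can be worked out by hand. This is only a sketch under the data-generating process in `weakreg`, where every draw is an independent standard normal; the symbols $\pi$, $e$, $v$ and $\mu^2$ are notation introduced here, not taken from the code. Writing $e$ for the unnamed `rnormal()` term in `x`:

$$
x = \pi z + e + w, \qquad y = x + w + 5u, \qquad \pi = 0.2
$$

Because $w$ is omitted from the OLS regression, the OLS slope converges to

$$
\operatorname{plim}\hat\beta_{OLS} = 1 + \frac{\operatorname{Cov}(x, w)}{\operatorname{Var}(x)} = 1 + \frac{1}{0.2^2 + 1 + 1} = 1 + \frac{1}{2.04} \approx 1.49
$$

which agrees with the mean of `reg_x` at every sample size. For the first stage, the error is $v = e + w$ with $\operatorname{Var}(v) = 2$, so the concentration parameter is approximately

$$
\mu^2 \approx \frac{n\,\pi^2 \operatorname{Var}(z)}{\operatorname{Var}(v)} = \frac{0.04\,n}{2} = 0.02\,n
$$

With a single instrument the expected first-stage F statistic is roughly $1 + \mu^2$, that is, about 3, 7, 11, 16 and 21 for $n$ = 100, 300, 500, 750 and 1000. On this rough calculation the instrument only clears the common rule-of-thumb value of 10 at around 500 observations, which is consistent with the IV estimates settling down at 750 and 1000.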
