* There is no proof that an instrumental variables (IV) estimator is unbiased.
* In fact we know that in small enough samples the bias can be large.
* Let's look at a simple setup in which the endogeneity results from an omitted variable (w).
* Our instrument is valid, but the IV estimator is still biased because the sample is "small" and the instrument is weak.
clear
set obs 1000
gen z = rnormal()
gen w = rnormal()
gen x = z*.3 + rnormal() + w
gen u = rnormal()
gen y = x + w + u*5
reg y x
ivreg y (x=z)
* ivreg includes the true value (1) in its confidence interval, though the interval is quite wide.
* This is largely because z is a weak instrument for x.
reg x z
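* A minimal sketch of reading instrument strength off the first stage just run.
* With a single instrument, the model F statistic from -reg x z- is the first-stage F.
* The F < 10 threshold is a common rule of thumb and is an assumption here,
* not something this simulation establishes.
display "First-stage F = " %6.2f e(F)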
* The IV estimator is known to be biased in finite samples.
* To examine this bias we will run a Monte Carlo
* simulation to see how biased our estimates are at each sample size.
cap program drop weakreg
program weakreg, rclass
clear
set obs `1'
* The first argument of the weakreg command is the number of
* observations to draw.
gen z = rnormal()
gen w = rnormal()
gen x = z*.2 + rnormal() + w
gen u = rnormal()
gen y = x + w + u*5
reg y x
return scalar reg_x = _b[x]
return scalar reg_se_x = _se[x]
ivreg y (x=z)
return scalar ivreg_x = _b[x]
return scalar iv_se_x = _se[x]
end
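* Usage sketch: run the program once with 100 observations and list the
* scalars it returns, to confirm it works before handing it to -simulate-.
weakreg 100
return list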
* With only 100 observations
simulate reg_x=r(reg_x) reg_se_x=r(reg_se_x) ///
ivreg_x=r(ivreg_x) iv_se_x=r(iv_se_x) ///
, rep(10000): weakreg 100
sum
/*
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
reg_x | 10000 1.484123 .3619725 .1481085 2.739473
reg_se_x | 10000 .357267 .0359624 .2261475 .5157685
ivreg_x | 10000 14.45544 846.8431 -1318.675 78604.67
iv_se_x | 10000 208878.2 1.93e+07 .6884335 1.92e+09
*/
* We can see that the mean standard error estimate is much
* larger than the standard deviation of the estimates.
* In addition, the apparent bias of the IV estimator is huge: a mean of 14.5 against a true value of 1.
* Judged by bias and variance, OLS is the better estimator in this case.
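* A few extreme draws dominate the mean of the IV estimates, so the median and
* outer percentiles may give a more robust picture of where they are centered.
* This is a sketch using the simulated results currently in memory.
centile ivreg_x reg_x, centile(5 50 95)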
simulate reg_x=r(reg_x) reg_se_x=r(reg_se_x) ///
ivreg_x=r(ivreg_x) iv_se_x=r(iv_se_x) ///
, rep(10000): weakreg 300
sum
/*
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
reg_x | 10000 1.48824 .2065498 .6367978 2.236391
reg_se_x | 10000 .204807 .0117558 .1625977 .2537656
ivreg_x | 10000 .883504 16.43395 -489.2355 1346.839
iv_se_x | 10000 103.928 7065.172 .5418385 696229.8
*/
* Increasing the sample size to 300 vastly improves the IV estimator,
* though it now appears to be biased downward (mean .88 against a true value of 1).
simulate reg_x=r(reg_x) reg_se_x=r(reg_se_x) ///
ivreg_x=r(ivreg_x) iv_se_x=r(iv_se_x) ///
, rep(10000): weakreg 500
sum
/*
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
reg_x | 10000 1.490552 .159073 .9184232 2.088732
reg_se_x | 10000 .1584161 .0071414 .1326629 .1896385
ivreg_x | 10000 .841985 4.533252 -337.8417 41.04016
iv_se_x | 10000 10.02729 672.1545 .5350561 66082.69
*/
* Increasing the sample size to 500 does not seem to reduce the bias
* of the IV estimator, though the average standard errors seem to be
* getting closer to the standard deviation of the estimates.
simulate reg_x=r(reg_x) reg_se_x=r(reg_se_x) ///
ivreg_x=r(ivreg_x) iv_se_x=r(iv_se_x) ///
, rep(10000): weakreg 750
sum
/*
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
reg_x | 10000 1.491437 .1291091 1.012212 1.998231
reg_se_x | 10000 .1292925 .0047437 .1129745 .1498492
ivreg_x | 10000 .9714284 1.087322 -12.33198 7.718197
iv_se_x | 10000 1.080065 .8625042 .4623444 39.95131
*/
* Increasing the sample size to 750 dramatically improves the IV estimator.
* It is still slightly biased, but that is not a huge problem.
* The standard errors are now working well too.
* The remaining problem is that the IV estimator still varies so much
* that both the OLS estimate and a coefficient of 0 would fall inside
* most confidence intervals.
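* A sketch for checking the claim about confidence intervals on the results in memory:
* the share of nominal 95% IV intervals that contain the true value (1), zero,
* and the OLS estimate from the same draw. The normal critical value is an
* approximation, and the iv_cover* variable names are made up for illustration.
gen byte iv_cover1   = abs(ivreg_x - 1)     < invnormal(.975)*iv_se_x
gen byte iv_cover0   = abs(ivreg_x)         < invnormal(.975)*iv_se_x
gen byte iv_coverols = abs(ivreg_x - reg_x) < invnormal(.975)*iv_se_x
sum iv_cover1 iv_cover0 iv_coverols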
simulate reg_x=r(reg_x) reg_se_x=r(reg_se_x) ///
ivreg_x=r(ivreg_x) iv_se_x=r(iv_se_x) ///
, rep(10000): weakreg 1000
sum
/*
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
reg_x | 10000 1.488341 .1107187 1.064637 1.938462
reg_se_x | 10000 .111871 .0035312 .1001198 .1255203
ivreg_x | 10000 .9691499 .8924782 -5.659174 5.94314
iv_se_x | 10000 .8812981 .2897863 .44236 6.674775
*/
* We can see that the primary gain from more observations is a smaller
* standard error.
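* A compact sketch of the whole experiment: loop over the sample sizes used above
* and report the median IV estimate at each one. The 1000 repetitions are an
* assumption chosen to keep the run short, not the 10000 used for the tables.
foreach n of numlist 100 300 500 750 1000 {
    quietly simulate ivreg_x=r(ivreg_x), rep(1000) nodots: weakreg `n'
    quietly sum ivreg_x, detail
    display "N = `n'  median IV estimate = " %6.3f r(p50)
}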
Formatted By Econometrics by Simulation
Tuesday, October 8, 2013
Finite Sample Properties of IV - Weak Instrument Bias
Hi,
As somebody who regularly consumes cross-country empirical research based on IV regressions with samples of 50-100, I found this quite alarming. But then most of the papers I read will be panel, with T of, let's say, 50.
This question may reveal shocking ignorance, but if the number of observations in a panel (N*T) is, say, 100 * 50, does that translate into a (very) safe sample size?
Why do you use -ivreg- instead of -ivregress-?