Econometrics By Simulation: Measurement Error

Friday, November 16, 2012

Measurement Error

* It is well known that measurement error causes attenuation bias in regression analysis estimators.

* This fact appeared in the literature as early as Spearman (1904).

* Attenuation bias, also known as regression dilution, is the phenomenon where coefficient estimates are biased towards zero.

* We can understand this phenomenon intuitively by thinking about what measurement error means.

* It means we do not have a very good measure of what something is.

* Imagine we measure each person's weight and height just by watching them walk by.

* Assume we know the average weight of people and make sure our guesses average out to that weight.

* However, unless we are trained, our guesses will often miss the mark.

* Thus, if we want to use our guesses of weight and height as predictors of a person's athletic ability, our estimates may suffer from two problems as a result of our measurement method.

* The first is the one previously mentioned: attenuation bias caused by our measures not being exact.

* How are we going to identify the effect of 5 more pounds or 3 extra inches on athletic performance if we are incapable of accurately gauging a difference of 5 pounds or 3 inches?

* The second potential problem is that our errors in measurement might be correlated with our unconscious assessment of the subject's athletic ability.

* That is, we might guess that subjects who appear more athletic are taller or weigh less than they really do.

* This second issue is much more problematic than attenuation bias.

* It makes our observed explanatory variables correlated with the regression error, which causes bias of unknown sign and size.

* To understand why attenuation bias exists, recall that the true coefficient is Beta = cov(x,Y)/var(x), while OLS on the observed variable estimates BetaHat = cov(X,Y)/var(X).

* Here x is the true variable and the observable X = x + v, where v is the measurement error.

* If we assume the error term v is uncorrelated with the outcome variable Y, then cov(X,Y) = cov(x,Y).

* However, if v is also uncorrelated with x, var(X) = var(x) + var(v).

* Thus: BetaHat = cov(X,Y)/var(X) = cov(X,Y)/(var(x)+var(v)) = cov(x,Y)/(var(x)+var(v)) = Beta*var(x)/(var(x)+var(v))

* Therefore: |BetaHat| < |Beta| when var(v) > 0 (and Beta is not zero).

* Let's see this in action!

set seed 101
clear
set obs 100000

gen true_weight=165+rnormal()*30
gen measurement_error = 20*rnormal()

gen weight_observed = true_weight+measurement_error

gen u = rnormal()* 5

corr true_weight weight_observed
* Even with measurement error, our measure of weight remains highly correlated with the true weight (in theory 30/sqrt(30^2 + 20^2), or about 0.83).

gen athletic_performance = 10 - .05*true_weight + u

* We expect the estimated coefficient on weight to be attenuated by a factor alpha, where alpha is defined by:

* alpha*|beta| = |BetaHat|

* alpha*|-.05| =|cov(x,Y)/(var(x)+var(v))|=

qui corr true_weight athletic_performance, cov
di r(cov_12)/(30^2 + 20^2)

* = -.03469636

* Thus alpha = .03469636/.05, or roughly 69%.
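
* As a cross-check (a sketch added for illustration, not part of the original analysis), alpha can also be computed directly as the ratio var(x)/(var(x)+var(v)).
* The scalar names s_var_true and s_var_obs are hypothetical helpers; the variables are the ones generated above.
* In theory the ratio is 30^2/(30^2 + 20^2):
di 30^2/(30^2 + 20^2)
* In the sample it is the variance of the true weight over the variance of the observed weight:
qui sum true_weight
scalar s_var_true = r(Var)
qui sum weight_observed
scalar s_var_obs = r(Var)
di s_var_true/s_var_obs
* The two numbers should be close, since the simulated measurement error is independent of the true weight.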

reg athletic_performance weight_observed
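
* For comparison (an added sketch under the same simulated data), the regression on the true weight should recover a slope near -.05:
reg athletic_performance true_weight
* The slope on the observed weight should instead be near -.05*30^2/(30^2 + 20^2):
qui reg athletic_performance weight_observed
di _b[weight_observed]
* Dividing by the ratio 30^2/(30^2 + 20^2) should bring the slope back towards -.05.
* This correction is only possible here because the simulation tells us var(v); with real data that variance is usually unknown.
di _b[weight_observed]/(30^2/(30^2 + 20^2))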

di .05 * .7
* The coefficient on weight_observed should be close to this value in absolute terms, so the bias is very predictable under the assumption that the measurement error is uncorrelated with our outcome variable.
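
* The second problem described earlier can be sketched the same way (an illustration with made-up numbers, not part of the original analysis).
* Suppose we guess lower weights for subjects who look more athletic, so the measurement error is negatively correlated with u.
* The names correlated_error and weight_guess, and the factor of 2, are assumptions chosen for this example.
gen correlated_error = 20*rnormal() - 2*u
gen weight_guess = true_weight + correlated_error
reg athletic_performance weight_guess
* Here cov(X,Y) = -.05*30^2 - 2*5^2 = -95 and var(X) = 30^2 + 20^2 + 2^2*5^2 = 1400.
* The slope should therefore be near -95/1400, which is further from zero than the true -.05 rather than closer to it.
di -95/1400
* With real data we would not know the sign or size of this correlation, which is why the resulting bias is of unknown form.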
