## Friday, November 16, 2012

### Measurement Error

Original Code

* It is well known that measurement error causes attenuation bias in regression analysis estimators.

* This fact appeared in the literature as early as Spearman (1904).

* Attenuation bias, also known as regression dilution, is the phenomenon where coefficient estimates are biased towards zero.

* We can understand this phenomenon intuitively by thinking about what measurement error means.

* It means we do not have a very good measure of what something is.

* Imagine we measure people's weight and height just by watching them walk by.

* We will assume we know the average weight of people and we make sure our average guess is that weight.

* However, unless we are trained, our guesses will frequently miss the mark.

* Thus, if we want to use our guesses of weight and height as predictors of that person's athletic ability then our estimates will suffer from potentially two problems as a result of our measurement method.

* The first is the one previously mentioned: attenuation bias caused by our measures not being exact.

* How are we going to identify the effect of 5 more pounds or 3 extra inches on athletic performance if we are incapable of accurately gauging a difference of 5 pounds or 3 inches?

* The second potential source of problems is that our errors in measurement might be correlated with our unconscious assessment of the subject's athletic ability.

* That is, we might guess that subjects who appear more athletic are taller or weigh less.

* This second issue is much more problematic than attenuation bias.

* It makes the regression error correlated with our explanatory variables, which causes bias of unknown sign and size.

* To understand why attenuation bias exists, remember that the true coefficient is Beta = cov(x,Y)/var(x), while the OLS coefficient is BetaHat = cov(X,Y)/var(X).

* Here x is the true value, v is the measurement error, and the observable is X = x + v.

* If we assume the error term v is uncorrelated with the outcome variable Y, then cov(X,Y) = cov(x,Y) + cov(v,Y) = cov(x,Y).

* However, var(X) = var(x) + var(v), provided v is also uncorrelated with x.

* Thus: BetaHat = cov(X,Y)/var(X) = cov(X,Y)/(var(x)+var(v)) = cov(x,Y)/(var(x)+var(v)) = Beta*var(x)/(var(x)+var(v))

* Therefore: |BetaHat| < |Beta| when var(v) > 0 (and Beta is not zero).

* Let's see this in action!

set seed 101
clear
set obs 100000

gen true_weight=165+rnormal()*30
gen measurement_error = 20*rnormal()

gen weight_observed = true_weight+measurement_error

gen u = rnormal()* 5

corr true_weight weight_observed
* We can see that, even with measurement error, observed weight is about 83% correlated with true weight.
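
* As a rough cross-check, assuming only the standard deviations used above (30 for true weight, 20 for the error), the correlation implied by the design is sqrt(var(x)/(var(x)+var(v))).
di sqrt(30^2/(30^2 + 20^2))
* The sample correlation reported by corr should land close to this value.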

gen athletic_performance = 10 - .05*true_weight + u

* We expect our estimate of the coefficient on weight to be attenuated by a factor alpha, where alpha is defined by:

* alpha*|beta| = |BetaHat|

* alpha*|-.05| =|cov(x,Y)/(var(x)+var(v))|=

qui corr true_weight athletic_performance, cov
di r(cov_12)/(30^2 + 20^2)

* = -.03469636

* Thus alpha is roughly 70% (.0347/.05 is about .69).
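
* The same factor can be sketched directly from the simulated data, assuming true_weight and measurement_error are still in memory. The scalar names var_x and var_v are introduced here only for illustration.
qui sum true_weight
scalar var_x = r(Var)
qui sum measurement_error
scalar var_v = r(Var)
* Ratio var(x)/(var(x)+var(v)), which should be close to 900/1300
di var_x/(var_x + var_v)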

reg athletic_performance weight_observed

di .05 * .7
* Thus we can see the nature of our bias is very predictable under the assumption that the measurement error is uncorrelated with our outcome variable.
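
* To sketch what happens when that assumption fails, suppose (purely as an illustration) that we guess lower weights for people who perform better, so that the error depends on u. The names correlated_error and weight_observed2 and the coefficient 2 are hypothetical choices, not part of the original simulation.
gen correlated_error = 20*rnormal() - 2*u
gen weight_observed2 = true_weight + correlated_error

* Benchmark: with the true weight the slope should be close to -.05
reg athletic_performance true_weight

* With the correlated error the population slope is cov(X,Y)/var(X) = (-.05*30^2 - 2*5^2)/(30^2 + 20^2 + 4*5^2)
reg athletic_performance weight_observed2
di (-.05*30^2 - 2*5^2)/(30^2 + 20^2 + 4*5^2)
* In this configuration the estimate is pushed away from zero rather than towards it, so the direction of the bias depends on how the error relates to the outcome.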