Tuesday, September 4, 2012

Robust Hausman Test Fail?


* The Hausman test is commonly used to choose between fixed effects (FE) and random effects (RE) estimators in a panel data context.

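* In this post I will attempt to violate the underlying assumptions of the Hausman test to see how well the test performs under less-than-ideal conditions.

* To do this I will use the robust form of the test proposed by Arellano (1993) {http://ideas.repec.org/a/eee/econom/v59y1993i1-2p87-97.html}. It is implemented below by adding the individual means of the explanatory variables to a RE regression with standard errors clustered on id, then testing whether the means are jointly significant.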

clear
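* Set a seed so the random draws can be reproduced (the value is arbitrary and not from the original post).
set seed 20120904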
set obs 10000

gen id=_n

expand 5
* We have 5 years of data per id


bysort id: gen year = _n
 
* Explanatory variables are serially correlated across years (both trend upward with year)
gen x1 = abs(rnormal())+year
gen x2 = abs(rnormal())+year

gen u = rnormal()*5

* Let's create a set of variables that are the means of x1 and x2.
bysort id: egen x1_mean = mean(x1)
bysort id: egen x2_mean = mean(x2)

xtset id

gen y1 = x1 + x2 + u
xtreg y1 x1 x2 x1_mean x2_mean, cluster(id) re
test x1_mean x2_mean
* Independence of the explanatory variables is not a requirement, and the failure of independence across draws does not cause problems for the Hausman test.
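
* A quick check that the test also has power (a sketch, not part of the original exercise). y1c is a hypothetical outcome that adds an individual effect equal to 2*x1_mean, which is correlated with x1 by construction. Under this assumed design RE is inconsistent, so one would expect the test to reject.
gen y1c = x1 + x2 + 2*x1_mean + u
xtreg y1c x1 x2 x1_mean x2_mean, cluster(id) re
test x1_mean x2_mean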

* Let's see what happens when y is no longer a linear function of our explanatory variables

gen y2 = x1^.97 + x2^.98 + u
xtreg y2 x1 x2 x1_mean x2_mean, cluster(id) re
test x1_mean x2_mean

* The non-linearities do not seem to have an obvious or problematic effect on the Hausman test (though both FE and RE are now generally inconsistent).
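
* For comparison, a sketch of the conventional (non-robust) Hausman test on the same outcome. The names fe2 and re2 are arbitrary labels for the stored estimates. The sigmamore option is an assumption made here: there is no individual effect in the simulated data, so the default form of the statistic can be poorly behaved. Unlike the regression-based test above, this version does not use cluster-robust standard errors.
quietly xtreg y2 x1 x2, fe
estimates store fe2
quietly xtreg y2 x1 x2, re
estimates store re2
hausman fe2 re2, sigmamore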

* Perhaps if there is noise in the measurement of x1 and x2, the Hausman test will suffer.

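* x1_obs and x2_obs are noisy measurements of x1 and x2.
gen x1_obs = x1+rnormal()
gen x2_obs = x2+rnormal()

bysort id: egen x1o_mean = mean(x1_obs)
bysort id: egen x2o_mean = mean(x2_obs)

* y3 depends on the true values, but only the noisy measurements enter the regression.
gen y3 = x1 + x2 + u
xtreg y3 x1_obs x2_obs x1o_mean x2o_mean, cluster(id) re
test x1o_mean x2o_mean
* Interestingly the test fails very badly, even though there is no individual effect in the data-generating process to justify FE.  As far as I know, under measurement error in the explanatory variables, there is no reason to prefer the FE estimator over the RE estimator.

* Compare the RE and FE estimates directly.
xtreg y3 x1_obs x2_obs , cluster(id) re
xtreg y3 x1_obs x2_obs , cluster(id) fe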

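* Finally, we would like to know what happens to the test if the error (u) is correlated within individuals.

sort id
gen pctile = _n/(_N+1)
* normal() is the standard normal CDF, so u2 rises smoothly with the sort order and is nearly constant within each id.
gen u2 = normal(pctile)*5
sum u2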

gen y4 = x1 + x2 + u2
xtreg y4 x1 x2 x1_mean x2_mean, cluster(id) re
test x1_mean x2_mean
* We can see that the Hausman test once again seems to be working.

* So, the takeaway? The Hausman test works well even when the model is slightly misspecified or when errors are serially correlated, but it fails badly when there is measurement error in the explanatory variables.
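
* Each result above comes from a single draw. A minimal Monte Carlo sketch of the baseline case follows, to estimate how often the test rejects at the 5% level when the RE assumptions hold. The program name hsim, the 500 individuals, and the 200 replications are arbitrary choices made for this sketch. Note that simulate replaces the data in memory, so run this last.
capture program drop hsim
program define hsim, rclass
    * Rebuild the baseline panel with fresh draws
    clear
    set obs 500
    gen id = _n
    expand 5
    bysort id: gen year = _n
    gen x1 = abs(rnormal())+year
    gen x2 = abs(rnormal())+year
    gen u = rnormal()*5
    bysort id: egen x1_mean = mean(x1)
    bysort id: egen x2_mean = mean(x2)
    gen y1 = x1 + x2 + u
    xtset id
    * Same regression-based test as above, returning its p-value
    quietly xtreg y1 x1 x2 x1_mean x2_mean, cluster(id) re
    quietly test x1_mean x2_mean
    return scalar p = r(p)
end

simulate p=r(p), reps(200) nodots: hsim
* Share of replications in which the test rejects at the 5% level
gen reject = p < 0.05
sum reject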
