## Tuesday, September 4, 2012

### Robust Hausman Test Fail?

* The Hausman test is commonly used to choose between fixed effects (FE) and random effects (RE) estimators in a panel data context.

* In this post I will attempt to violate the underlying assumptions of the Hausman test to see how well the test performs in less-than-ideal situations.

* For this post I will use the robust form of the test proposed by Arellano (1993) {http://ideas.repec.org/a/eee/econom/v59y1993i1-2p87-97.html}.

clear
set obs 10000
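* Setting a seed makes the random draws below reproducible. The value 12345 is arbitrary and not from the original post.
set seed 12345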

gen id=_n

expand 5
* We have 5 years of data per id

bysort id: gen year = _n

* Explanatory variables are serially correlated across years
gen x1 = abs(rnormal())+year
gen x2 = abs(rnormal())+year

gen u = rnormal()*5

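* Let's create the individual-level (within-id) means of x1 and x2. The robust test adds these means to the RE regression and checks whether their coefficients are jointly zero.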
bysort id: egen x1_mean = mean(x1)
bysort id: egen x2_mean = mean(x2)

xtset id

gen y1 = x1 + x2 + u
xtreg y1 x1 x2 x1_mean x2_mean, cluster(id) re
test x1_mean x2_mean
* The explanatory variables are not required to be independent across observations, and the lack of independent draws does not cause problems for the Hausman test.
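
* For comparison, a sketch of the classical (non-robust) Hausman test using Stata's built-in hausman command. The stored-estimate names fe1 and re1 are arbitrary labels, not from the original post. hausman needs the default variance estimates, so cluster(id) is dropped here.
quietly xtreg y1 x1 x2, fe
estimates store fe1
quietly xtreg y1 x1 x2, re
estimates store re1
* Consistent estimator (FE) first, efficient estimator (RE) second
hausman fe1 re1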

* Let's see what happens when y is no longer a linear function of our explanatory variables

gen y2 = x1^.97 + x2^.98 + u
xtreg y2 x1 x2 x1_mean x2_mean, cluster(id) re
test x1_mean x2_mean

* Non-linearities do not seem to have an obviously problematic effect on the Hausman test (though both FE and RE are now generally inconsistent).

* Perhaps if there is noise in the measurement of x1 and x2, the Hausman test will suffer.

gen x1_true = x1+rnormal()
gen x2_true = x2+rnormal()

bysort id: egen x1t_mean = mean(x1_true)
bysort id: egen x2t_mean = mean(x2_true)

gen y3 = x1_true + x2_true + u
xtreg y3 x1_true x2_true x1t_mean x2t_mean, cluster(id) re
test x1t_mean x2t_mean
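* Interestingly, the test fails to reject. Note that y3 is generated from, and regressed on, the same noisy variables (x1_true and x2_true), so this specification is free of classical errors-in-variables bias. As far as I know, under measurement error in the explanatory variables, there is no reason to prefer a FE estimator over a RE estimator.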

xtreg y3 x1_true x2_true , cluster(id) re
xtreg y3 x1_true x2_true , cluster(id) fe
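
* A possible variant with classical measurement error, offered only as a sketch: the outcome (y1, created above) depends on the clean x1 and x2, but the regression only sees noisy copies. The names x1_obs, x2_obs, x1o_mean and x2o_mean are hypothetical and not from the original post.
gen x1_obs = x1 + rnormal()
gen x2_obs = x2 + rnormal()

bysort id: egen x1o_mean = mean(x1_obs)
bysort id: egen x2o_mean = mean(x2_obs)

* The within and between variation can be attenuated to different degrees here, so the test may reject even though there is no individual effect correlated with x1 or x2.
xtreg y1 x1_obs x2_obs x1o_mean x2o_mean, cluster(id) re
test x1o_mean x2o_mean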

* Finally, what happens to the test if the error (u) is correlated within individuals?

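sort id
* pctile rises with position in the id-sorted data, so observations from the same id get nearly identical values
gen pctile = _n/(_N+1)
* normal() is the standard normal CDF, so u2 increases smoothly with id and lies between 2.5 and about 4.2 (invnormal() would give normally distributed values instead)
gen u2 = normal(pctile)*5
sum u2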

gen y4 = x1 + x2 + u2
xtreg y4 x1 x2 x1_mean x2_mean, cluster(id) re
test x1_mean x2_mean
* We can see that the Hausman test once again seems to be working.

* So, the takeaway? The Hausman test works well even when the model is slightly misspecified, when errors are serially correlated, or when there is measurement error in the explanatory variables.
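
* Each test above is based on a single draw. A rough way to check the rejection rate is to repeat the baseline design many times, as sketched below. The program name hausman_sim, the smaller sample (500 ids), the 200 replications and the seed are all assumptions chosen for speed, not part of the original post. Note that simulate replaces the data in memory.
capture program drop hausman_sim
program define hausman_sim, rclass
    clear
    set obs 500
    gen id = _n
    expand 5
    bysort id: gen year = _n
    gen x1 = abs(rnormal()) + year
    gen x2 = abs(rnormal()) + year
    gen u = rnormal()*5
    bysort id: egen x1_mean = mean(x1)
    bysort id: egen x2_mean = mean(x2)
    gen y1 = x1 + x2 + u
    xtset id
    quietly xtreg y1 x1 x2 x1_mean x2_mean, cluster(id) re
    quietly test x1_mean x2_mean
    * Return the p-value of the joint test on the individual means
    return scalar p = r(p)
end

simulate p = r(p), reps(200) seed(12345): hausman_sim

* Share of replications rejecting at the 5% level; if the test is correctly sized this should be close to 0.05
gen reject = p < 0.05
sum reject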