## Thursday, May 17, 2012

### Unobserved fixed effects model

* We are often concerned that there are unobserved factors
* which are correlated with our explanatory variables x and
* which also affect the outcome, so they end up in our error term u.

* For example, we might be concerned that intelligence is
* correlated with years of schooling as well as future
* expected earnings.  Fortunately, intelligence is usually
* thought of as a time constant factor.

* Therefore, if we remove time constant factors we might
* be able to estimate the returns to education.
* (This assumes the returns to a year of education are
* constant.  If they are a function of intelligence then
* we are going to need to think about being more clever.)

* Stata code
clear

* Imagine we have 200 individuals that we track
set obs 200
set seed 101

gen c = rnormal()
label var c "Time Constant Heterogeneity (individual specific)"

gen id = _n
label var id "Individual specific ID"

* create 2 observations for each initial observation
expand 2

bysort id: gen year=_n
label var year "Year of observation"

tab year

gen x = rnormal()+c
label var x "explanatory variable X (with time constant and time varying components)"

gen u = 3*rnormal()+3*c
label var u "Error term (correlated with unobservables c)"

gen y = x + u
label var y "Outcome variable"

reg y x
* We can see that OLS is biased upward: the true coefficient
* on x is 1, but c raises both x and u.
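
* A rough check of the size of the bias, using only the simulation
* design above.  Write x = e1 + c and u = 3*e2 + 3*c, where e1 and
* e2 are names introduced here for the two rnormal() draws, and
* e1, e2, and c are independent standard normals.  Then
* cov(x,u) = 3*var(c) = 3 and var(x) = 2, so OLS should converge
* to roughly 1 + 3/2 = 2.5 rather than 1.
correlate x u, covariance
display "Approximate OLS limit: " 1 + 3/2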

xtset id year
* Tells Stata to use id as the panel (individual) identifier
* and year as the time variable

xtreg y x, fe
* However, the fixed effects estimator is unbiased because it
* removes the time constant component c, which is the source
* of the correlation between x and the error u.
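
* A minimal sketch of what the fe option does under the hood
* (the variable names y_bar, x_bar, y_within, and x_within are
* introduced here for illustration): subtract each individual's
* mean from y and x, then run OLS on the demeaned data.
bysort id: egen y_bar = mean(y)
bysort id: egen x_bar = mean(x)
gen y_within = y - y_bar
gen x_within = x - x_bar
reg y_within x_within
* The coefficient on x_within should match xtreg y x, fe.  The
* standard errors will not, because this regression does not
* account for the 200 individual means that were estimated.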

* Note: an equivalent command (same coefficient on x) is:
reg y x i.id

* Or:
areg y x, absorb(id)

* An alternative approach is the Chamberlain-Mundlak device.
* If we fear that the time constant part of x might be correlated
* with u then we can control for that by including the
* individual mean of x in the regression:
bysort id: egen x_mean = mean(x)

reg y x x_mean
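
* The coefficient on x here should equal the fixed effects
* estimate.  As a hedged extension (not part of the original
* example), clustering by id allows for correlation within an
* individual over time, and testing x_mean gives a rough check of
* whether the individual means of x are related to the error.
reg y x x_mean, vce(cluster id)
test x_mean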

* When there are only two time periods, first differencing gives
* the same estimate as fixed effects, but with more than two
* periods the two estimates generally differ.  Either way,
* differencing is also effective at removing time constant effects.
gen y_diff = y-l.y
gen x_diff = x-l.x

reg y_diff x_diff
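
* A sketch of the two-period equivalence, assuming the data are
* still xtset by id and year.  The regression above includes a
* constant, which plays the role of a year effect, so it should
* match fixed effects with a year dummy.  Dropping the constant
* should instead reproduce xtreg y x, fe.
reg d.y d.x
xtreg y x i.year, fe

reg d.y d.x, noconstant
xtreg y x, fe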