Wednesday, October 31, 2012
Clustering Standard Errors - State Panel Data Example
* Imagine that you are trying to evaluate corporate state labor taxes as a predictor of state employment.
* First let's generate our states
clear
set seed 1033
set obs 50
gen state=_n
* Let's generate some starting values for employment.
gen base_employment=runiform()*.3
* Let's imagine that there is an annual trend in employment for each state.
gen trend=rnormal()*.025
* A policy intended to boost employment is enacted in different states around year 10.
gen policy_start = rpoisson(10)
expand 20
bysort state: gen t=_n
gen policy=(t>policy_start)
gen employment = .01*policy + base_employment + trend*t + rnormal()*.06
* The naive approach would be to regress employment directly on the policy indicator.
reg employment policy
* However, we might be concerned that errors are correlated within states.
* To help account for this within-cluster correlation we can cluster the standard errors by state.
* We may also be interested in the intraclass correlation.
loneway employment state
* This happens to be large.
reg employment policy, cluster(state)
* This substantially increases the size of our standard errors and results in a failure to reject the null.
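* As an aside (a sketch; the stored-estimate names "ols" and "clustered" are my own),
* we can line the two sets of estimates up side by side with estimates store/table.
qui reg employment policy
estimates store ols
qui reg employment policy, cluster(state)
estimates store clustered
estimates table ols clustered, se
* Note that the coefficients are identical; only the standard errors change.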
* But in this case we know that the policy has an effect; should we still cluster our standard errors?
* The answer is yes, we need to cluster our standard errors.
* To show this I will simulate the data 100 times under the scenario in which the null is true (the policy has no effect).
cap program drop cluster_test
program define cluster_test, rclass
clear
set obs 50
gen state=_n
gen base_employment=runiform()*.3
gen trend=rnormal()*.025
gen policy_start = rpoisson(10)
expand 20
bysort state: gen t=_n
gen policy=(t>policy_start)
gen employment = .00*policy + base_employment + trend*t + rnormal()*.06
* NOTE: Now the policy has no effect.
reg employment policy
* Note: ttail() gives the upper-tail probability, so the two-sided p-value is 2*ttail.
local p1 = 2*ttail(e(df_r), abs(_b[policy]/_se[policy]))
return scalar sig1 = (`p1'<.05)
reg employment policy, cluster(state)
local p2 = 2*ttail(e(df_r), abs(_b[policy]/_se[policy]))
return scalar sig2 = (`p2'<.05)
end
simulate sig1=r(sig1) sig2=r(sig2), reps(100): cluster_test
sum
* sig1 is from the regression without clustered standard errors.
* sig2 is from the regression with clustered standard errors.
* We can see that both approaches reject the null too frequently (the target is 5%).
* However, the difference is dramatic: the unclustered regression falsely rejects the null 56% of the time while the clustered regression does so only 12% of the time.
* You can repeat the simulation above using 500 or 5000 states.
* The more states you use, the closer the type 1 error rate gets to 5%.
* However, increasing the number of years does not improve the estimates.
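* As a sketch of how to vary the number of states (the program name cluster_test_n
* and its nstates argument are my own additions, not part of the original program):
cap program drop cluster_test_n
program define cluster_test_n, rclass
args nstates
clear
set obs `nstates'
gen state=_n
gen base_employment=runiform()*.3
gen trend=rnormal()*.025
gen policy_start = rpoisson(10)
expand 20
bysort state: gen t=_n
gen policy=(t>policy_start)
gen employment = base_employment + trend*t + rnormal()*.06
reg employment policy, cluster(state)
local p = 2*ttail(e(df_r), abs(_b[policy]/_se[policy]))
return scalar sig = (`p'<.05)
end
simulate sig=r(sig), reps(100): cluster_test_n 500
sum
* With 500 states the clustered rejection rate should sit closer to the 5% target.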
* There is one more thing I would like to do with this data so let's generate it once more.
cluster_test
* We may be concerned that the policy was not exogenously assigned to each state but instead arose from an endogenous connection between employment and the policy.
* One way to test the exogeneity of the policy is to test whether the year before the policy was enacted has any predictive power for employment.
gen year_before=policy_start-1
gen policy_lead=(t==year_before)
reg employment policy_lead policy
* We may be tempted not to cluster the errors here, but clustering is just as important as before.
reg employment policy_lead policy, cluster(state)
* Unsurprisingly, there is no evidence of endogeneity, since treatment was not endogenous by construction in this case.