* This post picks up where the previous Value Added Methods (model) post left off.
* To execute this code, first run that previous simulation to create a data set that you can use.
* Now that you have that data set, you want to recover estimates for the various contractors and the various production stages.
* Let us first set up the panel data:
order prod_id prod_stage
* Tell Stata that product id is the panel id and product stage is the time dimension.
xtset prod_id prod_stage
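* As a quick sanity check (added here, not part of the original simulation), xtdescribe summarizes the panel structure declared above, so you can confirm how many stages are observed per product.
xtdescribe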
* To begin with, we want to estimate the productivity of each contractor, each producing company, each contracting umbrella company, and each production stage.
* If you can estimate this then you know where to invest your resources or which contractors to hire.
* So you estimate the following equation:
reg value i.comp_id i.cont_id i.cont_company_id i.prod_stage
* Immediately you notice several problems.
* 1. Each contractor always belongs to the same contracting company, so the contracting company id is perfectly multicollinear with the contractor ids.
* 2. There are only 5 production stages and they are mutually exclusive, so the estimated value added of each stage is not an absolute value but a value relative to the omitted stage.
* 3. These estimates are not easily compared with the original (true) values.
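* A minimal sketch for checking point 1 directly, assuming cont_id and cont_company_id are numeric as in the simulation.
* The variable multi_company is a hypothetical helper introduced here for illustration only.
* Within each contractor, sort by company: if the first and last company ids differ, that contractor appears under more than one company.
bysort cont_id (cont_company_id): gen byte multi_company = cont_company_id[1] != cont_company_id[_N]
tab multi_company
* If every observation shows 0, contractors are nested within companies and the company dummies are linear combinations of the contractor dummies.
drop multi_company
* bysort changed the sort order, so restore the panel order that the time-series operators expect.
sort prod_id prod_stage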
* There are some tricks you can do in order to compare estimates.
* First we need to make a lot of dummy variables:
tab comp_id, gen(comp_id_)
tab cont_id, gen(cont_id_)
tab cont_company_id, gen(cont_company_id_)
tab prod_stage, gen(prod_stage_)
* Now do the above regression but with the dummy variables:
reg value comp_id_* cont_id_* cont_company_id_* prod_stage_*
* Yes, this is not very pretty. However, we can more easily manipulate these coefficients.
gen comp_fe_hat = .
gen cont_fe_hat = .
gen cont_company_fe_hat = .
gen stage_fe_hat = .
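* Copy each estimated coefficient onto the observations it belongs to.
* tab, gen() numbers the dummies by rank, so this loop assumes the ids run 1, 2, 3, ... without gaps.
* cap (capture) skips any index for which no coefficient exists.
forv i=1/101 {
    cap replace comp_fe_hat = _b[comp_id_`i'] if comp_id==`i'
    cap replace cont_fe_hat = _b[cont_id_`i'] if cont_id==`i'
    cap replace cont_company_fe_hat = _b[cont_company_id_`i'] if cont_company_id==`i'
    cap replace stage_fe_hat = _b[prod_stage_`i'] if prod_stage==`i'
}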
sum *hat
* Now that we have stored a lot of estimates of the effects of various levels of inputs, let's see how well our estimates perform relative to the true values.
cor comp_fe*
cor cont_fe*
cor cont_company_fe*
cor stage_fe*
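* Correlations ignore level shifts, which is convenient here because the estimates are only identified relative to an omitted category.
* As a hedged complement, assuming the simulation stored the true company effect as comp_fe, regressing the estimate on the truth shows whether the scale is right as well:
* a slope near 1 means the estimates track the truth one for one, while the intercept absorbs the normalization.
reg comp_fe_hat comp_fe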
* Despite running a simple regression that is consistent with our knowledge of how the data is generated, our estimates are generally pretty bad.
* Let us try a series of less complex regressions.
reg value comp_id_*
forv i=1/101 {
cap replace comp_fe_hat = _b[comp_id_`i'] if comp_id==`i'
}
cor comp_fe*
* That is looking pretty good.
reg value cont_id_*
forv i=1/101 {
cap replace cont_fe_hat = _b[cont_id_`i'] if cont_id==`i'
}
cor cont_fe*
* We can see that this estimator is still performing quite poorly.
* One way to try to get a better estimate might be throwing the lag of the value into the regression.
gen value_l1 = l1.value
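* The lag is missing for the first observed stage of every product, so any regression that includes value_l1 drops those observations.
* A quick count (added here as a check, not part of the original post) shows how many are lost:
count if missing(value_l1)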
reg value value_l1 cont_id_*
forv i=1/101 {
cap replace cont_fe_hat = _b[cont_id_`i'] if cont_id==`i'
}
cor cont_fe*
* This does not help much.
scatter cont_fe*
* We can see that the estimates of the effectiveness of contractors do not line up with reality much at all.
* Why is that?
* I would like to say that I know why, but the truth is I don't.
* We know that there is a fixed product effect. Perhaps that is throwing off our estimates?
xtreg value value_l1 cont_id_* prod_stage_*, fe
forv i=1/101 {
cap replace cont_fe_hat = _b[cont_id_`i'] if cont_id==`i'
}
cor cont_fe*
* Nope. Well, I am stumped at this point, but fortunately wiser minds than mine have worked on this problem before me.
* The following paper addresses in detail many of the questions raised by this simulation and others not addressed in this simulation (though not all of them).
* http://vam.educ.msu.edu/wp-content/uploads/2010/11/20120517_Can-Value-Added-Measures-of-Teacher-Performance-be-Trusted-WP2.doc