Tuesday, June 5, 2012

Estimating VAMs: Value Added Methods (Models)

* This post picks up where the previous Value Added Methods (models) post left off.

* To run this code, first run the previous simulation to create the data set used here.

* Now that you have that data set, you want to recover estimates for the various contractors and the various production stages.

* Let us first set up the panel data:

order prod_id prod_stage

* Tell Stata that product id is the panel id and product stage is the time dimension.
xtset prod_id prod_stage

* To begin with, we want to estimate the productivity of each contractor, each producing company, each contracting umbrella company, and each production stage.

* If you can estimate these, then you know where to invest your resources or which contractors to hire.

* So you estimate the following equation:

reg value i.comp_id i.cont_id i.cont_company_id i.prod_stage

* Immediately you notice several problems.

* 1. Each contractor always belongs to the same contracting company, so the contracting company dummies are perfectly multicollinear with the contractor dummies.

* 2. There are only 5 production stages and they are mutually exclusive.  Therefore the estimated value added of each production stage is not an absolute value but a value relative to the one omitted stage.

* 3. These estimates are not easily compared with the true values from the simulation.
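
* A quick way to check point 1, sketched under the assumption that cont_id and cont_company_id are coded as in the previous post.
* multi_company is a hypothetical helper variable; it equals 1 for any contractor observed under more than one contracting company.
bysort cont_id (cont_company_id): gen byte multi_company = cont_company_id[1] != cont_company_id[_N]
tab multi_company
drop multi_company

* bysort changed the sort order, so restore the panel ordering that the lag operator relies on later.
sort prod_id prod_stage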

* There are some tricks you can do in order to compare estimates.

* First we need to make a lot of dummy variables:

tab comp_id, gen(comp_id_)
tab cont_id, gen(cont_id_)
tab cont_company_id, gen(cont_company_id_)
tab prod_stage, gen(prod_stage_)

* Now do the above regression but with the dummy variables:
reg value comp_id_* cont_id_* cont_company_id_* prod_stage_*

* Yes, this is not very pretty.  However, we can more easily manipulate these coefficients.
gen comp_fe_hat = .
gen cont_fe_hat = .
gen cont_company_fe_hat = .
gen stage_fe_hat = .

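* cap lets the loop run past the last dummy of the shorter lists without stopping.
* Note that tab, gen() numbers the dummies by rank, so this lookup is only correct if each id is coded 1, 2, 3, ...
forv i=1/101 {
cap replace comp_fe_hat = _b[comp_id_`i'] if comp_id==`i'
cap replace cont_fe_hat = _b[cont_id_`i'] if cont_id==`i'
cap replace cont_company_fe_hat = _b[cont_company_id_`i'] if cont_company_id==`i'
cap replace stage_fe_hat = _b[prod_stage_`i'] if prod_stage==`i'
}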

sum *hat

* Now that we have stored estimates of the effects of the various levels of inputs, let's see how well our estimates perform relative to the true values.

cor comp_fe*
cor cont_fe*
cor cont_company_fe*
cor stage_fe*
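
* cor measures linear agreement.  If the goal is to decide which contractors to hire, the ranking may matter more.
* A sketch using rank correlations, assuming the true effects from the previous simulation are named comp_fe, cont_fe, and stage_fe (as the wildcards above suggest):
spearman comp_fe comp_fe_hat
spearman cont_fe cont_fe_hat
spearman stage_fe stage_fe_hat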

* Despite running a simple regression that is consistent with our knowledge of how the data is generated, our estimates are generally pretty bad.

* Let us try a series of less complex regressions.
reg value comp_id_*
forv i=1/101 {
cap replace comp_fe_hat = _b[comp_id_`i'] if comp_id==`i'
}

cor comp_fe*
* That is looking pretty good.

reg value cont_id_*
forv i=1/101 {
cap replace cont_fe_hat = _b[cont_id_`i'] if cont_id==`i'
}

cor cont_fe*
* We can see that this estimator is still performing quite poorly.
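
* The correlation above uses one row per observation, so any contractor with more observations counts more.
* A sketch of a contractor-level comparison, assuming cont_fe is the true contractor effect and is constant within cont_id:
preserve
collapse (mean) cont_fe cont_fe_hat, by(cont_id)
cor cont_fe cont_fe_hat
restore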

* One way to try to get a better estimate might be throwing the lag of the value into the regression.

gen value_l1 = l1.value
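
* The lag is missing for the first observed stage of every product, so regressions that include value_l1 drop those rows.
* A quick count of how many observations are lost:
count if missing(value_l1)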

reg value value_l1 cont_id_*
forv i=1/101 {
cap replace cont_fe_hat = _b[cont_id_`i'] if cont_id==`i'
}

cor cont_fe*
* This does not help much.

scatter cont_fe*
* We can see that the estimates of the effectiveness of contractors do not line up with reality much at all.

* Why is that?

* I would like to say that I know why, but the truth is I don't.

* We know that there is a fixed product effect.  Perhaps that is throwing off our estimates?

xtreg value value_l1 cont_id_* prod_stage_*, fe
forv i=1/101 {
cap replace cont_fe_hat = _b[cont_id_`i'] if cont_id==`i'
}

cor cont_fe*

* Nope.  Well, I am stumped at this point, but fortunately wiser minds than mine have worked on this problem before me.

* The following paper addresses in detail many of the questions raised by this simulation and others not addressed in this simulation (though not all of them).