## Wednesday, June 6, 2012

### 2SQreg IVqreg Cfqreg - zombies

* The following simulation tests how well two-stage quantile regression works.  The example is a little unlikely, but the methods should still be sound.

* In the first stage we will assume the coefficient on the instrument is constant.

* In the second stage we will assume that the coefficient on the endogenous variable w is changing in y.

* That is, we want to estimate quantile(Y|w) = w*B + u, where med(u|z)=0
* w = gamma0 + z*gamma1 + v
* E(w|z) = gamma0 + z*gamma1

* w_hat = gamma0_hat + z*gamma1_hat

* quantile(Y|w_hat) = w_hat*B + u, where med(u|w_hat)=0 (or something like that),
* even though med(u|w)!=0

* The properties of quantile regression are difficult to derive analytically because its objective function (a weighted sum of absolute deviations) is not smooth.

* However, we can test the properties of 2 stage quantile regression through simulation!

* Imagine we would like to estimate how good weapons are at killing zombies.

* However, you are afraid that people who own weapons might also be more militaristic and in general more effective at killing zombies.

* Therefore, you would like to instrument for the likelihood of owning weapons.

* You would like to know three things.

* 1. For a zombie killer at the 25th percentile, how much does owning weapons improve their ability to kill zombies?

* 2. For the median person (typical person), how much does owning weapons improve zombie killing ability?

* 3. For a zombie killer at the 75th percentile, how much does owning weapons improve their ability to kill zombies?

* Let us first generate the data

set seed 101
clear
local num_obs = 10000
set obs `num_obs'

gen fitness = runiform()

gen militarism = runiform()

* Your instrument is that some people live in areas that are more weapon friendly prior to the zombie outbreak.

gen weapon_ease = runiform()
label var weapon_ease "The ease by which people can purchase weapons in the area"
* Assume the likelihood of people being militaristic is unrelated to the area they live in (unlikely).

gen weapons = weapon_ease + militarism
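
* A quick sanity check, added here as a sketch rather than part of the original analysis:
* by construction the instrument should be correlated with weapons
* but essentially uncorrelated with the unobserved confounder militarism.
corr weapon_ease militarism weapons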

gen error = 5*rnormal()

* Start from a constant weapon coefficient to generate an initial outcome that people can be ranked on
gen weapon_coef = 1
gen zombie_kills = fitness + militarism + weapon_coef*weapons + rnormal()

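* Repeatedly re-rank by zombie_kills and reassign the weapon coefficient by rank (0 to 4),
* so that the coefficient rises with the outcome.
forv i=1(1)10 {
    sort zombie_kills
    replace weapon_coef=4*(_n/`num_obs')
    replace zombie_kills = 5*fitness + 7* militarism + weapon_coef*weapons + error + 15
    * Note partial kills are possible because assists do not count as full kills.
}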

* We know the true coefficient on weapons is 1 at the 25th percentile, 2 at the 50th, and 3 at the 75th
scatter  weapon_coef zombie_kills, sort

sum weapon_coef if _n==`num_obs'*1/4
sum weapon_coef if _n==`num_obs'*2/4
sum weapon_coef if _n==`num_obs'*3/4

* Let us see how well we can recover the coefficients:

* First: militarism is unobservable

drop militarism

* Let us first try the naive regression
qreg zombie_kills fitness weapons, quantile(.25)
qreg zombie_kills fitness weapons, quantile(.50)
qreg zombie_kills fitness weapons, quantile(.75)

* We can see that at all levels weapons appear far more effective than they actually are.

* Let us try 2SQReg

reg weapons weapon_ease
* Looks like a pretty good estimate
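
* One way to back up "pretty good" (an added sketch, not in the original post):
* test the instrument in the first stage. An F statistic well above 10 is a
* common rule of thumb for a strong instrument.
* test does not alter the stored estimates, so the predict calls below still use this regression.
test weapon_ease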

predict weapons_hat
predict uhat, resid

qreg zombie_kills fitness weapons_hat, quantile(.25)
qreg zombie_kills fitness weapons_hat, quantile(.50)
qreg zombie_kills fitness weapons_hat, quantile(.75)
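
* The standard errors reported above treat weapons_hat as data, ignoring that it was estimated.
* A minimal sketch of one possible remedy, assuming a bootstrap over both stages is acceptable:
* the program name twostage_q and the 50 replications are illustrative choices, not from the original post.
cap program drop twostage_q
program define twostage_q, rclass
    tempvar w_hat
    * First stage: refit on each bootstrap sample
    reg weapons weapon_ease
    predict `w_hat', xb
    * Second stage: median regression on the fitted values
    qreg zombie_kills fitness `w_hat', quantile(.50)
    return scalar b_w = _b[`w_hat']
end

bootstrap b_w=r(b_w), reps(50) seed(101): twostage_q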

* It appears that 2SQreg, while not perfect, is much better than qreg.

* Let us try a control function formulation

qreg zombie_kills fitness weapons uhat, quantile(.25)
qreg zombie_kills fitness weapons uhat, quantile(.50)
qreg zombie_kills fitness weapons uhat, quantile(.75)

* We can see that while the estimates of 2SQreg are not identical to those of the control function, they both appear to be effective methods.

* Let us do a Monte Carlo simulation of the whole thing again:

cap program drop s2qreg
program define s2qreg, rclass

clear
local num_obs = 10000
set obs `num_obs'

gen fitness = runiform()

gen militarism = runiform()

* Your instrument is that some people live in areas that are more weapon friendly prior to the zombie outbreak.

gen weapon_ease = runiform()
label var weapon_ease "The ease by which people can purchase weapons in the area"
* Assume the likelihood of people being militaristic is unrelated to the area they live in (unlikely).

gen weapons = weapon_ease + militarism

gen error = 5*rnormal()

* Start from a constant weapon coefficient to generate an initial outcome that people can be ranked on
gen weapon_coef = 1
gen zombie_kills = fitness + militarism + weapon_coef*weapons + rnormal()

forv i=1(1)10 {
sort zombie_kills
replace weapon_coef=4*(_n/`num_obs')
replace zombie_kills = 5*fitness + 7* militarism + weapon_coef*weapons + error + 15
}

qreg zombie_kills fitness weapons, quantile(.25)
return scalar qreg25=_b[weapons]
qreg zombie_kills fitness weapons, quantile(.50)
return scalar qreg5=_b[weapons]
qreg zombie_kills fitness weapons, quantile(.75)
return scalar qreg75=_b[weapons]

reg weapons weapon_ease

predict weapons_hat
predict uhat, resid

* The second stage uses weapons_hat, so that is the coefficient to return
qreg zombie_kills fitness weapons_hat, quantile(.25)
return scalar s2qreg25=_b[weapons_hat]
qreg zombie_kills fitness weapons_hat, quantile(.50)
return scalar s2qreg5=_b[weapons_hat]
qreg zombie_kills fitness weapons_hat, quantile(.75)
return scalar s2qreg75=_b[weapons_hat]

qreg zombie_kills fitness weapons uhat, quantile(.25)
return scalar cfqreg25=_b[weapons]
qreg zombie_kills fitness weapons uhat, quantile(.50)
return scalar cfqreg5=_b[weapons]
qreg zombie_kills fitness weapons uhat, quantile(.75)
return scalar cfqreg75=_b[weapons]

end

s2qreg

return list

simulate qreg25=r(qreg25) s2qreg25=r(s2qreg25) cfqreg25=r(cfqreg25)  ///
qreg5=r(qreg5)   s2qreg5=r(s2qreg5)   cfqreg5=r(cfqreg5)    ///
qreg75=r(qreg75) s2qreg75=r(s2qreg75) cfqreg75=r(cfqreg75), ///
reps(500): s2qreg
sum
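
* A small follow-up sketch (added; the bias_ variable names are illustrative):
* subtract the true coefficients stated earlier (1, 2 and 3 at the 25th, 50th and 75th percentiles)
* so the mean of each variable is the average bias of that estimator across replications.
foreach m in qreg s2qreg cfqreg {
    gen bias_`m'25 = `m'25 - 1
    gen bias_`m'5  = `m'5  - 2
    gen bias_`m'75 = `m'75 - 3
}
sum bias_*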