## Tuesday, May 8, 2012

### Quantile Regression Fail

* Stata simulation to estimate performance of quantile regression at
* different quantiles

clear
set obs 10000
set seed 101

gen x1 = runiform()
* Single exogenous x

gen u = rnormal()
* Single uncorrelated error u

gen beta=1
label var beta "True Beta"
* At first beta is = 1

gen y0 = u

gen y = beta*x1 + y0
* Generate a starting point for y

forv i=1/20 {
  cap drop ytile
  * Removes old ytile from the data
  xtile ytile=y, nq(100)
  * Generates a variable that assigns each value of y
  * to one of 100 quantile groups
  qui replace beta = 4-((ytile+30)/20)^.5
  * beta goes from 4-(31/20)^.5 to 4-(130/20)^.5, which is
  * a range of about 2.76 (lowest ys) down to about 1.45 (highest ys).
  replace y = beta*x1 + u + 190

  di "`i'"
}
* This repetitive routine makes it so that beta is smoothly related to y
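
* A quick sanity check, sketched here as an optional addition: summarizing
* beta shows the range it actually takes after the loop, and the by-group
* means show whether beta falls as ytile rises.  The variable name
* ytile_grp is illustrative only.
sum beta
gen ytile_grp = ceil(ytile/10)
tabstat beta, by(ytile_grp) statistics(mean min max)
drop ytile_grp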

replace y = y-190
* Shift y back to a small range by removing the 190 offset
* (a constant shift does not change the slope on x1)

line beta y, sort
* We can see that the effect of x1 (beta) changes across the different
* quantiles of y.  If x1 is a policy variable, then its effect is
* greater for lower ys and smaller for higher ys, so the effect of the
* betas is to keep the variance in y constant or even decrease it.
* It is possible for the variance in y to still be larger if the
* variance of x1 is large, since in effect we are adding two random
* variables (beta*x1 and u) together.

sum y y0

gen beta_hat=.
label var beta_hat "Estimated Beta"
forv i =1/100 {
  cap qreg y x1, quantile(`=`i'/100')
  * This tells Stata to run a quantile regression at every percentile;
  * the 'cap' tells it not to stop if there is an error.

  if _rc==0 qui replace beta_hat=_b[x1] if ytile==`i'
  * If _rc==0, there was no error in the previous quantile regression

  di _continue " `i'"
}

two (line beta y, sort) (line beta_hat y, sort), title(Results of Qreg)

two (line beta ytile, sort) (line beta_hat ytile, sort), title(Results of Qreg)

* The quantile regression does a mediocre job
* of picking up some of the shape of beta across the quantiles.
* However, near the edges, which people are often interested in,
* it is failing.
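
* One rough way to put numbers on the visual impression from the graphs
* (a sketch, not part of the original run): compare the absolute gap
* between beta_hat and beta in the tails with the gap in the middle.
* The names abs_err and tail, and the 10-percentile cutoff for what
* counts as a tail, are assumptions made for illustration.
gen abs_err = abs(beta_hat - beta)
gen tail = (ytile<=10 | ytile>=91)
label var tail "1 if in bottom or top 10 quantile groups"
tabstat abs_err, by(tail) statistics(mean max n)
* Quantiles where qreg returned an error have beta_hat missing and
* simply drop out of these statistics.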

* One might say that this is due to the tails having too few observations.
* However, increasing the sample size to 100,000 gives almost identical results!

* Ah, but hope is not lost.  Stay tuned for future releases of semi-parametric
* estimators!