Saturday, June 23, 2012

Quantile Regression (qreg) is invariant to non-decreasing transformations

* That is, the median commutes with f: med(f(x)) = f(med(x)) so long as f' >= 0 (f is non-decreasing)

* LAD (least absolute deviations, i.e. median regression) inherits this property for non-decreasing transformations.
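
* A hand-checkable illustration of med(f(v)) = f(med(v)) on five made-up values
* (hypothetical data, separate from the simulation below). With an odd number of
* observations the median is a single data point, so the equality is exact here:
* the fv values are 491, 499, 504, 516, 549 and f(2) = 504.
* v, fv, med_v, f_med_v and med_fv are illustrative names introduced here.
clear
input v
-3
-1
2
4
7
end
gen fv = sign(v)*v^2 + 500
quietly summarize v, detail
scalar med_v = r(p50)
scalar f_med_v = sign(scalar(med_v))*scalar(med_v)^2 + 500
quietly summarize fv, detail
scalar med_fv = r(p50)
display "med(f(v)) = " scalar(med_fv) "    f(med(v)) = " scalar(f_med_v)
* Clear the toy data so the simulation below starts from an empty dataset
clear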
set seed 110

set obs 10000

gen x = rnormal()*8+6
* Because x is symmetric around 6, we know its median is 6

sum x, detail

gen fx = sign(x)*x^2+500
* fx is an increasing (hence non-decreasing) function of x; we can see this by plotting fx against x

line fx x, sort
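
* The plot can be backed up numerically (a sketch): after sorting on x, fx should
* never decrease. preserve/restore keeps the original row order, so the random
* draws made later in this file are unaffected. n_decreases is an illustrative name.
preserve
sort x
quietly count if _n > 1 & fx < fx[_n-1]
scalar n_decreases = r(N)
restore
display "number of decreases in fx along sorted x: " scalar(n_decreases)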

* Likewise, the median of fx is now easy to find, because med(f(x)) = f(med(x)):

* med(f(x)) = f(med(x)) = f(6) = sign(6)*6^2 + 500 = 536
* We can confirm this (up to sampling error):

sum fx, detail
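
* A sharper check (a sketch): apply f to the *sample* median of x rather than to
* the population value 6. With N = 10000 (even), summarize reports the average of
* the two middle values, so the two numbers agree only up to that averaging.
* med_x, f_med_x and med_fx are illustrative scalar names.
quietly summarize x, detail
scalar med_x = r(p50)
scalar f_med_x = sign(scalar(med_x))*scalar(med_x)^2 + 500
quietly summarize fx, detail
scalar med_fx = r(p50)
display "med(fx) = " scalar(med_fx) "    f(med(x)) = " scalar(f_med_x)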

* Also: med(g(x)) = g(med(x)) when g is non-increasing (g' <= 0), by the symmetry of the rank function around 50%
* med(g(x)) = g(med(x)) = g(6) = (-1)*(sign(6)*6^2 + 500) = -536
gen gx = (-1)*(sign(x)*x^2+500)
sum gx, detail

* Notice that the median of gx (-536) mirrors the median of fx (536) and still equals g(med(x)), even though g' < 0. However, the other quantiles now appear in reversed order: the .25 quantile of gx is minus the .75 quantile of fx.

* This is because gx = -fx, so the distribution of gx is the mirror image of the distribution of fx.
twoway (histogram fx, color(blue)) (histogram gx, color(red)), legend(label(1 "fx") label(2 "gx")) title("Mirror quantiles")
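
* Both claims can be checked with the stored results of summarize (a sketch).
* Because gx = -fx exactly, the median of gx is minus the median of fx, and the
* 25th percentile of gx is minus the 75th percentile of fx.
* med_fx, p75_fx, med_gx and p25_gx are illustrative scalar names.
quietly summarize fx, detail
scalar med_fx = r(p50)
scalar p75_fx = r(p75)
quietly summarize gx, detail
scalar med_gx = r(p50)
scalar p25_gx = r(p25)
display "med(gx) = " scalar(med_gx) "    -med(fx) = " (-scalar(med_fx))
display "p25(gx) = " scalar(p25_gx) "    -p75(fx) = " (-scalar(p75_fx))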

* But more interesting to us is how well LAD estimates the conditional median.

* First, let us specify a data-generating process:

gen u = rnormal()*20

gen y = x*10 + u*10

* Since med(u|x) = 0, the conditional median of y given x is x*10, so the true coefficient on x is 10

qreg y x
* And qreg does a good job here, estimating the coefficient on x close to 10.
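
* A formal version of "close to 10" (a sketch): a Wald test of H0: slope = 10
* using the qreg estimates just computed; test is Stata's standard
* post-estimation command and leaves the p-value in r(p).
test x = 10
display "qreg slope on x: " _b[x] "    p-value for H0 (slope = 10): " r(p)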

* Also, because E(u|x) = med(u|x) = 0, the conditional mean and the conditional median of y coincide, so OLS also identifies the conditional median.
* Thus the following also provides a good estimate:
reg y x
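
* To compare the two fits side by side (a sketch): store the OLS results just
* computed, refit qreg, and tabulate coefficients and standard errors.
* The stored-estimate names ols and lad are arbitrary labels chosen here.
estimates store ols
quietly qreg y x
estimates store lad
estimates table lad ols, b se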

* Now let us transform y so that it has heavier tails, using fy = f(y):

gen fy = sign(y)*y^2+500
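
* One rough way to see the heavier tails (a sketch): summarize, detail reports
* kurtosis, which is about 3 for a normal variable such as y; fy should show a
* larger value. kurt_y and kurt_fy are illustrative scalar names.
quietly summarize y, detail
scalar kurt_y = r(kurtosis)
quietly summarize fy, detail
scalar kurt_fy = r(kurtosis)
display "kurtosis of y: " scalar(kurt_y) "    kurtosis of fy: " scalar(kurt_fy)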

* Let's see how well LAD (least absolute deviations) works

qreg fy x
* But what does this mean?
* How well is the quantile regression working?

* Remember fy = sign(y)*y^2+500
* If y > 0: fy = y^2 + 500 = (x*10 + u*10)^2 + 500
* and the marginal effect of x on fy is

* dfy/dx = 2*10*(x*10 + u*10) = 20*y
* Since 20*y is increasing in y, med(20*y|x) = 20*med(y|x)
* = 20*(x*10 + med(u|x)*10)
* = 20*(x*10)        because med(u|x) = 0
* Evaluated at the median of x, 6: 20*(6*10) = 1200
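
* A numerical check of the derivation (a sketch, assuming fy = sign(y)*y^2+500 as
* stated above): the DGP implies med(fy|x) = f(x*10) = sign(x)*(x*10)^2 + 500,
* so roughly half of the observations should lie on or below that curve
* (fy <= f(x*10) exactly when u <= 0). The slope of this curve in x is 200*|x|,
* i.e. 1200 at x = 6; qreg's single linear slope is shown next to it.
* med_fy_true and share_below are illustrative names introduced here.
gen med_fy_true = sign(x)*(x*10)^2 + 500
quietly count if fy <= med_fy_true
scalar share_below = r(N)/_N
display "share of fy at or below f(x*10): " scalar(share_below)
display "slope of med(fy|x) at x = 6: " 200*abs(6) "    qreg slope (qreg fy x): " _b[x]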

* Alternatively:
reg fy x

graph twoway (lfitci fy x) ///
             (scatter fy x)

* This OLS regression does not work very well, even though it reports a higher R-squared.
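
* Part of the reason (a sketch): unlike the median, the mean does not pass
* through a nonlinear f, so E(f(y)) is generally not f(E(y)), and the same holds
* conditionally on x. OLS targets E(fy|x), not f(med(y|x)).
* mean_y, f_mean_y and mean_fy are illustrative scalar names.
quietly summarize y
scalar mean_y = r(mean)
scalar f_mean_y = sign(scalar(mean_y))*scalar(mean_y)^2 + 500
quietly summarize fy
scalar mean_fy = r(mean)
display "mean(fy) = " scalar(mean_fy) "    f(mean(y)) = " scalar(f_mean_y)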
