## Thursday, January 10, 2013

### Efficiency of LAD vs OLS

R Script

# The following post follows class notes from Panel Data Analysis II by Jeffrey Wooldridge (though all errors are mine)

# When choosing a method to estimate a coefficient, least absolute deviations (LAD) and ordinary least squares (OLS) are two commonly considered options.

# Often these estimators are, in theory, estimating the same coefficient.

# That is, in the linear model y = xB + u, with u symmetrically distributed about 0 and independent of x so that E(u|x)=0 and med(u|x)=0, both OLS and LAD are consistent estimators of B.

# However, the relative efficiency of the two estimators depends on the distribution of the errors.

# Assuming u is independent of x, the asymptotic variance of the LAD estimator, Avar(n^.5 * (B[LAD]-B)), is 1/(4*f(0)^2)*E(x'x)^-1

# with f(0) being the pdf of the error distribution evaluated at the quantile being estimated (the median, 0).

# The asymptotic variance of the OLS estimator, Avar(n^.5 * (B[OLS]-B)), is sigma2*E(x'x)^-1

# with sigma2 being the variance of the error distribution.
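
# As a small sketch (not from the class notes), the two formulas above can be compared through their ratio, since the E(x'x)^-1 factor is common to both and cancels. The helper name avar_ratio and the Laplace example are my own illustrative assumptions.

```r
# Ratio avar(LAD) / avar(OLS) when u is independent of x.
#   f0     : density of the error at its median (0)
#   sigma2 : variance of the error
# Values above 1 favour OLS, values below 1 favour LAD.
avar_ratio = function(f0, sigma2) {
  1 / (4 * f0^2 * sigma2)
}

# Illustration: Laplace (double exponential) errors with scale b,
# for which f(0) = 1/(2b) and the variance is 2*b^2.
b = 1
ratio_laplace = avar_ratio(f0 = 1 / (2 * b), sigma2 = 2 * b^2)
ratio_laplace
```

# For Laplace errors the ratio is one half whatever the scale b, so this is a case where LAD is the more efficient of the two.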

# Now we can evaluate the differences in asymptotic variances given different distributions of the errors.

# For instance, assume the underlying distribution is standard normal.

# Then sigma2=1

# and f(0)=dnorm(0), so the LAD factor 1/(4*f(0)^2) is

1/(4*dnorm(0)^2)

# Yields 1.57 (pi/2).  This indicates that the asymptotic variance of LAD is 57% larger than that of OLS when the errors are normally distributed, so the relative efficiency of LAD is 2/pi, or about 0.64.
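
# A quick Monte Carlo sketch can check this figure in the simplest case, a regression on a constant only, where OLS is the sample mean and LAD is the sample median. The sample size, number of replications and seed are arbitrary choices of mine.

```r
set.seed(1)
n    = 500    # sample size per replication
reps = 4000   # number of replications

# Each column is one sample of n standard normal errors
draws = matrix(rnorm(n * reps), nrow = n)

ols_hat = colMeans(draws)           # OLS on a constant = sample mean
lad_hat = apply(draws, 2, median)   # LAD on a constant = sample median

# n * variance approximates the asymptotic variance of n^.5 * (estimate - 0)
n * var(ols_hat)
n * var(lad_hat)

var_ratio = var(lad_hat) / var(ols_hat)
var_ratio
```

# The ratio should land near pi/2, up to simulation noise.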

# Now suppose instead that the errors are distributed as v = exp(|u|)*(2-|u|)*sign(u), with u standard normal.

N = 1000000

u = rnorm(N)

v = exp(abs(u))*(2-abs(u))*sign(u)

# We can estimate the variance of the errors by sampling.

hist(v)
summary(v)

var(v)
# The sample variance comes out at around 12.7 (the exact value varies with the draw).
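
# Because v has heavy tails, the sampled variance is somewhat noisy. As a cross-check, here is a sketch that computes var(v) by numerical integration instead. The function name v_sq_density is my own, and the integrand is written on the log scale only to avoid overflow for large u.

```r
# v is symmetric about 0, so E(v) = 0 and var(v) = E(v^2).
# E(v^2) = integral of exp(2|u|) * (2 - |u|)^2 * dnorm(u) du, which is
# symmetric in u, so integrate over u > 0 and double the result.
v_sq_density = function(u) {
  exp(2 * u + dnorm(u, log = TRUE)) * (2 - u)^2
}

var_v = 2 * integrate(v_sq_density, lower = 0, upper = Inf)$value
var_v
```

# This comes out close to 12.85, in line with the sampled value.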

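# To calculate the density of v at its median 0, we use the change-of-variables formula: sum dnorm(u)/|dv/du| over the values of u at which v = 0.

# These are u=2 and u=-2, where |dv/du| = exp(|u|)*abs(1-|u|) = exp(2).  Because the normal is symmetric:

f0 = 2*dnorm(-2)/exp(2)
f0

# So avar(lad) = E(x'x)^-1 *
1/(4*f0^2)
# This is roughly 1170, compared with a factor of var(v) for OLS.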
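
# As a sanity check on the density of v at 0, the sketch below compares the change-of-variables value, which divides each normal density term by |dv/du| = exp(2), with a simple simulation-based estimate. The window half-width h and the seed are arbitrary choices of mine.

```r
set.seed(1)
N = 1000000
u = rnorm(N)
v = exp(abs(u)) * (2 - abs(u)) * sign(u)

# Change of variables: v = 0 at u = 2 and u = -2, and |dv/du| = exp(2) there
f0_formula = 2 * dnorm(2) / exp(2)

# Simulation estimate: share of draws within h of 0, divided by the window width 2h
h = 0.05
f0_sim = mean(abs(v) < h) / (2 * h)

c(f0_formula, f0_sim)

# Scalar factor of avar(LAD), and its ratio to var(v), the scalar factor of avar(OLS)
avar_lad_scalar = 1 / (4 * f0_formula^2)
avar_lad_scalar / var(v)
```

# The two density values should agree to within a few percent, and a ratio well above 1 means OLS has the smaller asymptotic variance for this error distribution.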