Wednesday, June 20, 2012

Non-Linear Least Squares (M-estimation)

* Non-Linear Least Squares (M-estimation)

* Sometimes we would like to estimate values that cannot be written in closed form

* For instance imagine Y=exp(XB) and we want to estimate the Bs directly.

* Any data from memory

* Set the initial number of observations
set obs 10000

gen x=rnormal()

gen u = rnormal()*10

gen y=exp(10+4*x+u)

* We can directly estimate B0 and B1 with non-linear least squares
nl (y=exp({b0}+x*{b1}))

* Or we can transform y
gen lny=ln(y)

reg lny x

* Notice that the non-linear least squares estimate is pretty bad.

* This is because the E(u)!=0, that is because E(normal())>0 and that by random error being enclosed in the exponential it means that the error is multiplicative.

* Thus defining the true error r = y - y=exp(10+4*x)

gen r = y-exp(3+1*x)

* We can see the source of the biase in b0 in nl estimation is
sum r
* Which causes positive bias in b0

gen expx=exp(x)
corr r expx
* Which causes positive bias in b1

* To generate a simulation that rightly favors the nl estimator we might generate a different y

gen y2 = exp(3+1*x) + u*10

nl (y2=exp({b0}+x*{b1}))

gen lny2=ln(y2)

reg lny2 x

* The problem with a simulation like this is that we can estimate B without needing to use non-linear least squares.

* Let us try something a little trickier: y = 5*sin(B0+B1*x)+B2*x2 + u

* An equation that cannot be decomposed into closed form.

set seed 101

set obs 10000

gen x = runiform()*_pi

gen y =5*x*sin(1+2.95*x)+2*x+rnormal()*1

scatter y x

nl (y=5*x*sin({b0}+{b1}*x)+{b2}*x)

* These coefficients are not right (b0 or b1), but then again maybe they are because the sign function is symmetric and repetitive.

* Thus two different input vectors could predict the same output.

* Let us check:

predict y_hat

two (scatter y x) (line y_hat x, sort color(pink)), title (Non-linear least squares fits pretty well)

* Looking pretty darn good.

cor y_hat y

* Of course this example is also very contrived.

* In order to estimate B you must know pretty exactly the functional form of m (where y=m(x)+u)

* Let's test how well the estimate performs when there is more noise in the model.

* Try instead:

gen y2 =5*x*sin(1+2.95*x)+2*x+rnormal()*10

nl (y2=5*x*sin({b0}+{b1}*x)+{b2}*x)

predict y2_hat

two (scatter y2 x) (line y2_hat x, sort)

* Now let's see what happens if we do not know the functional form as well:

* Let's say we say we saw:
scatter y x

* and we want to fit a polynomial to it.

gen x2 = x^2
gen x3 = x^3
gen x4 = x^4
gen x5 = x^5
gen x6 = x^6

reg y x x2 x3 x4 x5 x6

predict y_hat2

two (scatter y x) (line y_hat2 x, sort color(orange)), title(OLS with lots of polynomials fits well too)

corr y_hat2 y

* Not too bad a fit at all!

* But does a good fit imply a good forecast?

* Let's imagine that the true distribution of x is truncated normal runiform()*_pi*1.1 and we only observed runiform()*_pi portion of it.

* If we are using x to predict y in both these scenarios we will come to two ver different conclusions in our predictions.

* Imagine that we have 500 more new observations:

set obs 10500

gen new_obs = 1 if x==.
  * This variable is newly observed.

replace x = runiform()*_pi*1.15 if new_obs == 1

replace y = 5*x*sin(1+2.95*x)+2*x+rnormal()*1 if new_obs == 1

two (scatter y x if new_obs==. ,  msymbol(smcircle)) (scatter y x if new_obs==1 ,  msymbol(smcircle))

* Now let's see how our two models do.
* First we will just resubmit the nl regression using the old data
nl (y=5*x*sin({b0}+{b1}*x)+{b2}*x) if new_obs==.

* Then predict the yhat values
predict y_hat1_new

* Next we will redo the polynomial regression using the old values.
reg y x x2 x3 x4 x5 x6 if new_obs==.

* We need to make sure all of the x2, x3, ... xn polynomials are defined for the new observations.
replace x2 = x^2
replace x3 = x^3
replace x4 = x^4
replace x5 = x^5
replace x6 = x^6

cap predict y_poly_new
* This did not generate y_hat for all of the values.

two (scatter y x if new_obs==. , msymbol(smcircle)) ///
    (scatter y x if new_obs==1 , msymbol(smcircle)) ///
(line y_hat1_new x, sort color(pink) lwidth(.7)) ///
(line y_poly_new x, sort color(black) lwidth(.4)) ///
, legend(label(1 "Original Observations") ///
label(2 "New Observations") ///
label(3 "Non-linear model (correct)") ///
label(4 "Polynomial model")) ///
title(Specification is Everything in Forcasting)

* Despite both models having very similar r2, the correctly specified one is much better at forcasting.

* Of course the difficulty of discovering the underlying structural model is another matter :)

* Thus every so often you hear about a group of financial wizards who are able to "predict" the stock market for a few time periods but then something changes and the models that looked so good to begin with end up generating lots of trash.

1 comment:

  1. Dear Francis,
    two remarks:
    "Closed form" - You mean linear in parameters?
    "This is because the E(u)!=0, that is because E(normal())>0 and that by random error being enclosed in the exponential it means that the error is multiplicative." - Well E[u] is definitely==0. You must argue via E[y].