* Sometimes we would like to estimate parameters for which no closed-form solution exists (that is, the model cannot be rewritten to be linear in its parameters).

* For instance, imagine y = exp(XB) and we want to estimate the Bs directly.

* Clear any data from memory

clear

* Set the initial number of observations

set obs 10000

gen x=rnormal()

gen u = rnormal()

gen y = exp(3+1*x+u)

* We can directly estimate B0 and B1 with non-linear least squares

nl (y=exp({b0}+x*{b1}))

* Or we can transform y

gen lny=ln(y)

reg lny x

* Notice that the non-linear least squares estimate is pretty bad.

* This is not because E(u)!=0 - it is zero - but because the error is enclosed in the exponential and is therefore multiplicative: y = exp(b0+b1*x)*exp(u). For u ~ N(0, s^2), E(exp(u)) = exp(s^2/2) > 1, so E(y|x) != exp(b0+b1*x) and the implied additive error has a positive mean.
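
As a quick sanity check on this Jensen's-inequality point, here is a small Python sketch (an assumed analogue of the Stata setup; sigma = 1 is an illustrative assumption) confirming that the multiplicative error has mean greater than one:

```python
import numpy as np

# If u ~ N(0, sigma^2), then exp(u) is lognormal with mean exp(sigma^2/2) > 1,
# so E[y | x] = exp(b0 + b1*x) * exp(sigma^2/2), not exp(b0 + b1*x).
# (sigma = 1 here is an illustrative assumption.)
rng = np.random.default_rng(101)
sigma = 1.0
u = rng.normal(0.0, sigma, size=1_000_000)

empirical_mean = np.exp(u).mean()        # simulated E[exp(u)]
theoretical_mean = np.exp(sigma ** 2 / 2)  # lognormal mean formula

print(empirical_mean, theoretical_mean)  # both close to 1.6487
```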

* Thus, defining the implied additive error r = y - exp(3+1*x):

gen r = y-exp(3+1*x)

* We can see the source of the bias in the nl estimation:

sum r

* The positive mean of r causes positive bias in b0.

gen expx=exp(x)

corr r expx

* The positive correlation between r and exp(x) causes positive bias in b1.

* To generate a simulation that instead favors the nl estimator, we might generate a different y with an additive error:

gen y2 = exp(3+1*x) + u*10

nl (y2=exp({b0}+x*{b1}))

gen lny2=ln(y2)

* Note: ln(y2) is missing wherever y2 <= 0, so the regression below silently drops those observations.

reg lny2 x

* The problem with a simulation like this is that we can estimate B without needing to use non-linear least squares.
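
To see why no non-linear estimator was ever needed for the multiplicative-error model, here is a short Python sketch (a hypothetical analogue, using b0 = 3 and b1 = 1) showing that OLS on the logged outcome recovers the parameters directly:

```python
import numpy as np

# Under a multiplicative error, taking logs makes the model linear in its
# parameters: ln(y) = b0 + b1*x + u, so ordinary least squares applies.
rng = np.random.default_rng(101)

x = rng.normal(size=10_000)
u = rng.normal(size=10_000)
y = np.exp(3 + 1 * x + u)

# OLS of ln(y) on a constant and x via the least-squares design matrix
X = np.column_stack([np.ones_like(x), x])
b0_hat, b1_hat = np.linalg.lstsq(X, np.log(y), rcond=None)[0]

print(b0_hat, b1_hat)  # close to 3 and 1
```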

**********************************************************************

* Let us try something a little trickier: y = 5*x*sin(B0+B1*x) + B2*x + u

* An equation that cannot be transformed into a form that is linear in its parameters.

set seed 101

clear

set obs 10000

gen x = runiform()*_pi

gen y =5*x*sin(1+2.95*x)+2*x+rnormal()*1

scatter y x

nl (y=5*x*sin({b0}+{b1}*x)+{b2}*x)

* These estimates of b0 and b1 do not match the values we set, but they may still be correct: the sine function is periodic and symmetric,

* so two different parameter vectors can produce exactly the same predictions.
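
The periodicity point can be verified directly. A small Python sketch (an assumed analogue, reusing the coefficients from the Stata code) exhibits two distinct parameter vectors that generate fitted values identical to the true ones:

```python
import numpy as np

# Because sin is periodic (sin(t) = sin(t + 2*pi)) and satisfies
# sin(t) = sin(pi - t), distinct (b0, b1) vectors generate exactly
# the same fitted values for y = 5*x*sin(b0 + b1*x) + 2*x.
x = np.linspace(0.0, np.pi, 1_000)

fit_true = 5 * x * np.sin(1.0 + 2.95 * x) + 2 * x
fit_shifted = 5 * x * np.sin((1.0 + 2 * np.pi) + 2.95 * x) + 2 * x  # b0 -> b0 + 2*pi
fit_reflected = 5 * x * np.sin((np.pi - 1.0) - 2.95 * x) + 2 * x    # (b0, b1) -> (pi - b0, -b1)

print(np.allclose(fit_true, fit_shifted), np.allclose(fit_true, fit_reflected))  # True True
```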

* Let us check:

predict y_hat

two (scatter y x) (line y_hat x, sort color(pink)), title(Non-linear least squares fits pretty well)

* Looking pretty darn good.

cor y_hat y

* Of course this example is also very contrived.

* In order to estimate B this way, you must know the functional form m() almost exactly (where y = m(x) + u).

* Let's test how well the estimate performs when there is more noise in the model.

* Try instead:

gen y2 =5*x*sin(1+2.95*x)+2*x+rnormal()*10

nl (y2=5*x*sin({b0}+{b1}*x)+{b2}*x)

predict y2_hat

two (scatter y2 x) (line y2_hat x, sort)

* Now let's see what happens if we do not know the functional form as well:

* Let's say we say we saw:

scatter y x

* and we want to fit a polynomial to it.

gen x2 = x^2

gen x3 = x^3

gen x4 = x^4

gen x5 = x^5

gen x6 = x^6

reg y x x2 x3 x4 x5 x6

predict y_hat2

two (scatter y x) (line y_hat2 x, sort color(orange)), title(OLS with lots of polynomials fits well too)

corr y_hat2 y

* Not too bad a fit at all!

* But does a good fit imply a good forecast?

* Let's imagine that the true distribution of x is uniform on [0, 1.15*_pi] (generated as runiform()*_pi*1.15), but so far we have only observed the [0, _pi] portion of it.

* If we are using x to predict y, these two scenarios will lead us to two very different conclusions in our predictions.
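
A minimal Python sketch of this fit-versus-forecast point (an assumed analogue of the Stata simulation, with hypothetical sample sizes): a degree-6 polynomial fit only on x in [0, pi] tracks the data well in sample but extrapolates poorly on the unobserved tail:

```python
import numpy as np

# Fit a degree-6 polynomial to y = 5*x*sin(1 + 2.95*x) + 2*x + noise on
# the observed range [0, pi], then evaluate it on the tail (pi, 1.15*pi].
rng = np.random.default_rng(101)

x_old = rng.uniform(0.0, np.pi, size=10_000)
y_old = 5 * x_old * np.sin(1 + 2.95 * x_old) + 2 * x_old + rng.normal(size=10_000)

coefs = np.polyfit(x_old, y_old, deg=6)  # polynomial fit on the observed range only

x_new = np.linspace(np.pi, 1.15 * np.pi, 500)  # the newly observed region
y_new_true = 5 * x_new * np.sin(1 + 2.95 * x_new) + 2 * x_new

rmse_in = np.sqrt(np.mean((np.polyval(coefs, x_old) - y_old) ** 2))
rmse_out = np.sqrt(np.mean((np.polyval(coefs, x_new) - y_new_true) ** 2))

print(rmse_in, rmse_out)  # the extrapolation error is far larger
```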

* Imagine that we have 500 more new observations:

set obs 10500

gen new_obs = 1 if x==.

* new_obs == 1 flags the newly added observations (their x is still missing at this point).

replace x = runiform()*_pi*1.15 if new_obs == 1

replace y = 5*x*sin(1+2.95*x)+2*x+rnormal()*1 if new_obs == 1

two (scatter y x if new_obs==. , msymbol(smcircle)) (scatter y x if new_obs==1 , msymbol(smcircle))

* Now let's see how our two models do.

* First we will just resubmit the nl regression using the old data

nl (y=5*x*sin({b0}+{b1}*x)+{b2}*x) if new_obs==.

* Then predict the yhat values

predict y_hat1_new

* Next we will redo the polynomial regression using the old values.

reg y x x2 x3 x4 x5 x6 if new_obs==.

* We need to make sure all of the x2, x3, ... xn polynomials are defined for the new observations.

replace x2 = x^2

replace x3 = x^3

replace x4 = x^4

replace x5 = x^5

replace x6 = x^6

predict y_poly_new

* Had we not replaced x2-x6 above, predict would have returned missing values of y_poly_new for the new observations.

two (scatter y x if new_obs==. , msymbol(smcircle)) ///
    (scatter y x if new_obs==1 , msymbol(smcircle)) ///
    (line y_hat1_new x, sort color(pink) lwidth(.7)) ///
    (line y_poly_new x, sort color(black) lwidth(.4)) ///
    , legend(label(1 "Original Observations") ///
             label(2 "New Observations") ///
             label(3 "Non-linear model (correct)") ///
             label(4 "Polynomial model")) ///
    title(Specification is Everything in Forecasting)

* Despite both models having very similar in-sample R-squared values, the correctly specified one is much better at forecasting.

* Of course the difficulty of discovering the underlying structural model is another matter :)

* Thus, every so often you hear about a group of "financial wizards" who are able to "predict" the stock market for a few time periods, but then something changes and the models that looked so good to begin with end up generating lots of trash.
