## Sunday, July 8, 2012

### Predict

* Stata's predict command is extremely useful for many post-estimation tasks.

* In this post we will go through how it works and manually program, in longhand, some of the things it does.

* Imagine the underlying population model Y = g(x1, x2)

* Now imagine an estimator Y = f(x1, x2)

* Most estimation commands take Y and the xs and estimate some variant of f.

* In the linear case Y = b0 + b1x1 + b2x2 + u

* Most estimation commands attempt to estimate b0, b1, and b2, which is great!

* But after estimating b0, b1, and b2 we may ask,

* "How does u look? Does it look normal, as the usual small-sample inference after OLS assumes?"

* We may also ask, "How does the estimated y look?"  This is often not particularly interesting in itself, since it is purely linear in the xs, but the predicted y ('yhat') is often used in post-estimation techniques.

* Let's see how this works.

* First, let's simulate some data:

set seed 10
clear
set obs 1000

gen x1 = rnormal()
gen x2 = rnormal()
gen u  = rnormal()
gen y  = 6*x1 + -4*x2 + 10*u

* Now let's estimate the OLS equation

reg y x1 x2

* If we want to get the fitted values we need only write the following
predict yhat1

* This is equivalent in the OLS case to:
predict yhat2, xb
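
* A quick way to confirm the equivalence is Stata's compare command, which reports how many observations differ between two variables.
* A minimal sketch, assuming yhat1 and yhat2 were created as above:
compare yhat1 yhat2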

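* We can also generate these values manually from the estimated coefficients.
* Typing in the numbers from the output (in the original run: .1430927, 5.767773, and -3.798869) only works if your run reproduces them exactly, so it is safer to use the stored coefficients in _b[]:
gen yhat3 = _b[_cons] + _b[x1]*x1 + _b[x2]*x2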

sum yhat?
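
* The fitted values are just the coefficient vector applied to the data.
* As a sketch of that idea (not necessarily how predict is implemented internally), we can copy e(b) into a matrix and let matrix score form the linear combination.
* The names b and yhat4 are made up for this example.
matrix b = e(b)
matrix list b
matrix score yhat4 = b
sum yhat1 yhat4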

* Likewise we may be interested in the error uhat

predict uhat1, residual

* We can do it manually:
gen uhat2=y-yhat1

sum uhat?
* There is the slightest difference between uhat1 and uhat2, but it is only rounding error: yhat1 is stored as a float by default, so subtracting it from y loses a little precision.
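
* One way to check the rounding explanation is to redo both calculations in double precision and see whether the gap shrinks.
* This is only a sketch, assuming the regression above is still the active estimation; the names yhatd, uhatd1, uhatd2, and absdiff are made up for this example.
predict double yhatd, xb
predict double uhatd1, residual
gen double uhatd2 = y - yhatd
gen double absdiff = abs(uhatd1 - uhatd2)
sum absdiff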

* Now that we have uhat, we can plot it to see if it looks like it is behaving well:

hist uhat1, kden
* Unsurprisingly (given that we drew u from a normal distribution), uhat1 looks normal as well.  Note that uhat1 estimates the full error term, 10*u, rather than u itself.
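
* The histogram is only an eyeball check.
* As a sketch of two complementary checks, qnorm plots the quantiles of uhat1 against normal quantiles, and sktest runs a skewness and kurtosis test for normality:
qnorm uhat1
sktest uhat1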