Thursday, October 31, 2013

Tobit fitted values not "fitting" data

* A reader recently asked why a tobit regression might produce a constant
* significantly below zero and many fitted values that are, unrealistically,
* below zero.

* Is that a problem?

* Let's do a simple simulation of what the Tobit might be doing:

clear
set obs 10000

gen x = rnormal()

gen y_true = -2 + x*2 + rnormal()*4

gen y_observed = y_true
replace y_observed = 0 if y_true < 0

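* Most of the mass piles up at zero because of the censoring.
hist y_observed

* Fit the Tobit with a lower censoring limit of zero.
tobit y_observed x, ll(0)

* By default, predict returns the linear prediction xb, that is, the fitted
* value of the latent y_true rather than of the censored y_observed.
predict y_hat

graph twoway  (scatter y_hat y_true) (lfitci y_hat y_true)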

* Not a tight fit, since the error term is large relative to x, but okay.
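
* The fitted values above are for the latent y_true. As a rough sketch (not part
* of the original simulation), predictions on the scale of y_observed can be
* obtained in two ways. The names y_hat_cens and y_star are made up here.

* Option 1: censor the latent fitted values at zero. This ignores the error
* term, so its mean will tend to fall short of the mean of y_observed.
gen y_hat_cens = max(y_hat, 0)

* Option 2: ask predict for the expected value of the censored outcome,
* E(y*|x) with y* = max(0, y), using the ystar() option.
predict y_star, ystar(0,.)

* Compare the means: y_star should track y_observed most closely.
summarize y_true y_observed y_hat y_hat_cens y_star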

* So the takeaway: it is entirely consistent with the Tobit model for the
* constant and many fitted values to be significantly below zero.

* In fact, that is typically what you see when a Tobit matters most: few observed
* values are positive, which indicates that the underlying latent variable takes
* many values below zero that have been censored.

* When the constant and most fitted values are above zero, you gain the least
* in terms of bias reduction from using Tobit estimation.
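
* A sketch of that last point, assuming a hypothetical second outcome whose
* constant is +10 instead of -2, so that very few observations are censored.
gen y_true2 = 10 + x*2 + rnormal()*4
gen y_observed2 = max(y_true2, 0)

* Heavy censoring: OLS on the censored outcome should pull the slope well
* below the true value of 2, while the Tobit above recovers it.
regress y_observed x

* Light censoring: OLS and Tobit should give nearly the same slope.
regress y_observed2 x
tobit y_observed2 x, ll(0)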


3 comments:

  1. The Tobit model estimates the parameters that generated the *latent* variable y, which was censored if less than zero. So the predicted values correspond to the underlying latent y, which should have values less than zero. To get back the observed values, censor the predicted values.

  2. Hi, I estimated a tobit model and generated fitted values using "predict yhat, ystar(0,.)", where 0 is the lower limit. Two observations turned out to have missing fitted values (yhat = .).
    Why does predict return missing values? I expected the within-sample predictions to contain zeros, not missing values. I would appreciate your help.

    Replies
    1. I cannot tell the cause without seeing your data, but I suspect that these two observations have missing values for one or more explanatory variables. Without a full set of explanatory values, predict cannot compute a fitted value for the dependent variable.
