Tuesday, May 22, 2012

Tobit Normality Assumption Fail - Tobit Still Works

* This simulation looks at what happens when the errors in the underlying data generating process are not normal (normality of the errors is a key assumption of the tobit model).



* This post is a follow-up to the previous post on Bottom Coding and Tobit on May 21st.


  * Let's first set up the simulation
  clear

  * Set the number of observations
  set obs 1000

  * Set the random seed
 set seed 101

  * Generate some explanatory variables
  gen man_num_sibs = rpoisson(3)
    label var man_num_sibs "The number of siblings that the man has"

  gen woman_num_sibs = rpoisson(3)
    label var woman_num_sibs "The number of siblings that the spouse has"

  gen income = abs(rnormal())*2
    label var income "Family income, 10k/year"
   
  * Generate the latent number of children each man would have, with the error
  * drawn from a Poisson distribution and given a random positive or negative sign.
  * The error is then rescaled to a standard deviation of 2.
  gen e1 = rpoisson(3)*(-1)^rbinomial(1,.5)
  sum e1
  replace e1=e1/r(sd)*2

gen Y1 = .8*man_num_sibs + .6*woman_num_sibs - 2*income + e1
    label var Y1 "The true underlying number of children some men would have"
 
  * Restrict the number of children to the non-negative range.
gen Nchildren1 = max(Y1,0)

tobit Nchildren1 man_num_sibs woman_num_sibs income, ll(0)
* Compare the estimates with the true coefficients (.8, .6 and -2): despite a very
* non-normal error, the tobit estimator still works quite well.

  * Generate the latent number of children each man would have, with the error
  * drawn from a log-normal distribution and given a random positive or negative sign.
  gen e2 = exp(rnormal())*(-1)^rbinomial(1,.5)
  sum e2
  replace e2=e2/r(sd)*2
gen Y2 = .8*man_num_sibs + .6*woman_num_sibs - 2*income + e2
    label var Y2 "The true underlying number of children some men would have"
 
  * Restrict the number of children to the non-negative range.
gen Nchildren2 = max(Y2,0)

  tobit Nchildren2 man_num_sibs woman_num_sibs income, ll(0)
* Despite a very non-normal error, the tobit estimator still works quite well.

  * Generate the latent number of children each man would have, with the error
  * drawn from a double log-normal distribution, exp(exp(z)) with z standard normal,
  * and given a random positive or negative sign.
  gen e3 = exp(exp(rnormal()))*(-1)^rbinomial(1,.5)
    sum e3
  replace e3=e3/r(sd)*2
gen Y3 = .8*man_num_sibs + .6*woman_num_sibs - 2*income + e3
    label var Y3 "The true underlying number of children some men would have"
 
  * Restrict the number of children to the non-negative range.
gen Nchildren3 = max(Y3,0)

  tobit Nchildren3 man_num_sibs woman_num_sibs income, ll(0)
* Despite a very non-normal error, the tobit estimator still works pretty well.



* Compare with OLS.  From a single sample it is hard to tell which estimator is preferred.
* A Monte Carlo simulation would be useful to discover whether the tobit estimator seems
* unbiased.  See the previous post on using simulations to understand bias.
reg Nchildren3 man_num_sibs woman_num_sibs income

  * Generate the latent number of children each man would have, with the error drawn
  * from a chi-squared distribution with 48 degrees of freedom (the sum of 48 squared
  * standard normals) and given a random positive or negative sign.
  gen e4 = (-1)^rbinomial(1,.5)*rchi2(48)
  label var e4 "Bimodal error - tobit still works"
 
    sum e4
  replace e4=e4/r(sd)*2
gen Y4 = .8*man_num_sibs + .6*woman_num_sibs - 2*income + e4
    label var Y4 "The true underlying number of children some men would have"
 
  * Restrict the number of children to the non-negative range.
gen Nchildren4 = max(Y4,0)

  tobit Nchildren4 man_num_sibs woman_num_sibs income, ll(0)
* Despite a very non-normal error, the tobit estimator still works quite well.

sum e?
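
* A quick formal check that the errors really are non-normal (a sketch, not part of
* the original post).  sktest is Stata's skewness and kurtosis test for normality;
* a small p-value rejects normality for that variable.
sktest e1 e2 e3 e4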

hist e4, kden
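
* A minimal Monte Carlo sketch of the check suggested above (not part of the original
* post): redraw the sample many times with the double log-normal error and compare the
* tobit and OLS coefficients on income with the true value of -2.
* Assumptions: the program name tobit_mc, the 200 replications and the seed are
* arbitrary illustrative choices.  Note that simulate replaces the data in memory,
* which is why this block sits at the end of the do-file.
capture program drop tobit_mc
program define tobit_mc, rclass
  * Draw a fresh sample using the same data generating process as above
  drop _all
  set obs 1000
  gen man_num_sibs = rpoisson(3)
  gen woman_num_sibs = rpoisson(3)
  gen income = abs(rnormal())*2
  gen e = exp(exp(rnormal()))*(-1)^rbinomial(1,.5)
  sum e
  replace e = e/r(sd)*2
  gen Y = .8*man_num_sibs + .6*woman_num_sibs - 2*income + e
  gen Nchildren = max(Y,0)

  * Store the income coefficient from each estimator
  tobit Nchildren man_num_sibs woman_num_sibs income, ll(0)
  return scalar b_tobit = _b[income]
  reg Nchildren man_num_sibs woman_num_sibs income
  return scalar b_ols = _b[income]
end

simulate b_tobit=r(b_tobit) b_ols=r(b_ols), reps(200) seed(101): tobit_mc

* The closer the mean of each estimate is to -2, the less biased that estimator
* appears under this error distribution.
sum b_tobit b_ols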
