## Tuesday, May 22, 2012

### Tobit Normality Assumption Fail - Tobit Still Works * This simulation looks at what happens when the underlying data generating process is not normal (key assumption with Tobit).

* This post is a follow up to the previous post on Bottom Coding and Tobit on May 21st.

set seed 11

* Let's first set up the simulation
clear

* Set the number of observations
set obs 1000

* Set the random seed
set seed 101

* Generate some explanatory variables
gen man_num_sibs = rpoisson(3)
label var man_num_sibs "The number of sibblings that the man has"

gen woman_num_sibs = rpoisson(3)
label var woman_num_sibs "The number of sibblings that the spouse has"

gen income = abs(rnormal())*2
label var income "Family income, 10k/year"

* Generate the number of children each man has with the error being
* drawn from a poisson distribution which has either positive or negative
* signs randomly
gen e1 = rpoisson(3)*(-1)^rbinomial(1,.5)
sum e1
replace e1=e1/r(sd)*2

gen Y1 = .8*man_num_sibs + .6*woman_num_sibs - 2*income + e1
label var Y1 "The true underlying amount of children some men would have"

* Retrict the number of children to the positive range.
gen Nchildren1 = max(Y1,0)

tobit Nchildren1 man_num_sibs woman_num_sibs income, ll(0)
* Despite a very non-normal error the tobit estimator still works quite well

* Generate the number of children each man has with the error being
* drawn from a log normal distribution with random positive or negative signs
gen e2 = exp(rnormal())*(-1)^rbinomial(1,.5)
sum e2
replace e2=e2/r(sd)*2
gen Y2 = .8*man_num_sibs + .6*woman_num_sibs - 2*income + e2
label var Y2 "The true underlying amount of children some men would have"

* Retrict the number of children to the positive range.
gen Nchildren2 = max(Y2,0)

tobit Nchildren2 man_num_sibs woman_num_sibs income, ll(0)
* Despite a very non-normal error the tobit estimator still works quite well

* Generate the number of children each man has with the error being
* drawn from a double log normal distribution with random positive or negative signs
gen e3 = exp(exp(rnormal()))*(-1)^rbinomial(1,.5)
sum e3
replace e3=e3/r(sd)*2
gen Y3 = .8*man_num_sibs + .6*woman_num_sibs - 2*income + e3
label var Y3 "The true underlying amount of children some men would have"

* Retrict the number of children to the positive range.
gen Nchildren3 = max(Y3,0)

tobit Nchildren3 man_num_sibs woman_num_sibs income, ll(0)
* Despite a very non-normal error the tobit estimator still works pretty good.

* Compare with OLS.  Hard to tell which is preferred from this.  It would be useful to use
* a monte carlo simulation to discover if the tobit seems unbiased.  See the previous post
* on using simulations to understand bias.
reg Nchildren3 man_num_sibs woman_num_sibs income

* Generate the number of children each man has with the error being
* drawn from a chi-squared distribution with random positive or negative signs
gen e4 = (-1)^rbinomial(1,.5)*(rnormal()^2+rnormal()^2+rnormal()^2+rnormal()^2+ ///
rnormal()^2+rnormal()^2+rnormal()^2+rnormal()^2+ ///
rnormal()^2+rnormal()^2+rnormal()^2+rnormal()^2+ ///
rnormal()^2+rnormal()^2+rnormal()^2+rnormal()^2+ ///
rnormal()^2+rnormal()^2+rnormal()^2+rnormal()^2+ ///
rnormal()^2+rnormal()^2+rnormal()^2+rnormal()^2+ ///
rnormal()^2+rnormal()^2+rnormal()^2+rnormal()^2+ ///
rnormal()^2+rnormal()^2+rnormal()^2+rnormal()^2+ ///
rnormal()^2+rnormal()^2+rnormal()^2+rnormal()^2+ ///
rnormal()^2+rnormal()^2+rnormal()^2+rnormal()^2+ ///
rnormal()^2+rnormal()^2+rnormal()^2+rnormal()^2+ ///
rnormal()^2+rnormal()^2+rnormal()^2+rnormal()^2)
label var e4 "Bimodal error - tobit still works"

sum e4
replace e4=e4/r(sd)*2
gen Y4 = .8*man_num_sibs + .6*woman_num_sibs - 2*income + e4
label var Y3 "The true underlying amount of children some men would have"

* Retrict the number of children to the positive range.
gen Nchildren4 = max(Y4,0)

tobit Nchildren4 man_num_sibs woman_num_sibs income, ll(0)
* Despite a very non-normal error the tobit estimator still works quite well

sum e?

hist e4, kden