* Dependent variable is censored:

* In a previous post I looked at what happens when we use the tobit maximum likelihood estimator even when the error term is not normally distributed.

* In general we see that, despite the failure of the normality assumption, the tobit is a good estimator in a wide variety of situations with different error structures.

* However, all of those distributions of errors were symmetric distributions.

* There is no reason to believe that in general the unobserved heterogeneity should be symmetric around the expected value.

* Let's see what happens as we relax this assumption.

cap program drop tobit_monte_carlo

program define tobit_monte_carlo, rclass

* Let's first set up the simulation

clear

* Set the number of observations

set obs 3000

* Let's imagine that we are trying to infer the damages caused by various things to homes in coastal cities.

* Generate some explanatory variables

gen weather = rpoisson(1)

label var weather "Number of times the home was hit by extreme weather"

gen crime = rbeta(2,6)

label var crime "Property crime rate in home's area"

gen occupants = rpoisson(4)

label var occupants "The number of people occupying the home"

gen age = (runiform()*40)+18

label var age "The age of the owner"

gen age2=age^2

gen credit = (runiform()*600)+200

label var credit "The credit worthiness of the owner"

* Now let's imagine that there is a lot of low-level unexplained damage.

* This will loop from 1 to 4 to 7 to 10.

foreach i in 1 4 7 10 {

* This generates an error distribution

gen e`i' = rbeta(2,`i')

sum e`i'

replace e`i'=(e`i'-r(mean))/r(sd)

if `i'==10 replace e`i'=e`i'*(-1)

* The name option saves the graph to memory with the name e`i'

if "`1'"=="graph" qui hist e`i', title(e~rbeta(2,`i')) name(e`i', replace)

* This creates a local list of all of the graphs in memory by adding on to the list every time this loops.

local graphnames `graphnames' e`i'

}

* Graphs the combined 4 graphs

if "`1'"=="graph" graph combine `graphnames'

foreach i in 1 4 7 10 {

* First let's generate the true thing we would like to understand. True amount of home damage.

gen home_damage`i' = -10000 + 100000*weather + 10000*crime + 5000*occupants - 500*age + 20*age2 + 100*credit + e`i'*100000

}

sum home_damage*

* We can reasonably interpret negative values of home_damage as repairs made to the home.

* However, we only have information on insurance payments, meaning each claim is subject to a deductible:

gen deductible = 5000

* Each home also has a maximum that the insurance policy will cover:

gen maximum = `2'

* Let us first impose our maximums and minimums

foreach i in 1 4 7 10 {

gen insurance_claims`i' = min(home_damage`i', maximum)

* This puts a cap on payouts but it is a little trickier figuring out minimums

* We know that if the claim is less than the deductible then it is not recorded.

replace insurance_claims`i' = 0 if insurance_claims`i' < deductible

}

sum insurance_claims*
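* As a quick check (a small addition for illustration, using only the variables defined above), we can also count how many claims hit each bound:

foreach v of varlist insurance_claims* {

qui count if `v'==0

di "`v': " r(N) " claims bottom-coded at zero"

qui count if `v'==maximum

di "`v': " r(N) " claims top-coded at the policy maximum"

}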

* We can see the different distributions of errors slightly affect payouts but not by much.

*****************************************************************

*** data generation end

* So we want to know, how much did the different factors affect home damages?

* We can observe the insurance claims, the deductibles, and the maximum payout but not any damages that are less or more than that.

* remember home_damage`i' = -10000 + 100000*weather + 10000*crime + 5000*occupants - 500*age + 20*age2 + 100*credit + e`i'*100000

* create a return list for the simulation command

gl return_list

* Let's see how well the OLS estimator does at recovering the coefficients

foreach i in 1 4 7 10 {

reg insurance_claims`i' weather crime occupants age age2 credit

foreach v in weather crime occupants age age2 credit {

return scalar OLS_`v'`i' = _b[`v']

gl return_list $return_list OLS_`v'`i'=r(OLS_`v'`i')

}

}

* Let's see how well the tobit estimator does at recovering the coefficients

foreach i in 1 4 7 10 {

tobit insurance_claims`i' weather crime occupants age age2 credit, ll(5000) ul(`2')

foreach v in weather crime occupants age age2 credit {

return scalar Tob_`v'`i' = _b[`v']

gl return_list $return_list Tob_`v'`i'=r(Tob_`v'`i')

}

}

* End program

end

tobit_monte_carlo graph 100000

di "simulate $return_list , reps(50): tobit_monte_carlo nograph 100000"

simulate $return_list , reps(50): tobit_monte_carlo nograph 100000

order *weather* *crime* *occup* *age? *age2? *credit*

sum
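* As a rough check (an addition for illustration), we can compare the mean of each simulated weather coefficient against its true value of 100,000:

foreach v of varlist *weather* {

qui sum `v'

di "`v': mean estimate = " %12.0f r(mean) " (true value 100,000)"

}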

* It seems that the OLS estimator generally outperforms the tobit estimator.

* There is no obvious reason this should be the case, except that the data suffer from both top and bottom coding.

* I suspect that if there were only bottom coding then the tobit estimator would outperform the OLS estimator.

* In order to test this we can try:

simulate $return_list , reps(50): tobit_monte_carlo nograph 10000000

order *weather* *crime* *occup* *age? *age2? *credit*

sum

* It seems that the OLS estimator still generally outperforms the tobit estimator.

* Though it is hard to say; perhaps this is due to the small sample size.

* A larger sample size should help the quasi-maximum likelihood estimator (QMLE) be more consistent.
