* Dependent variable is censored:
* In a previous post I looked at what happens when we use the tobit maximum likelihood estimator even when the error term is not normally distributed.
* In general we saw that, despite the failure of the normality assumption, the tobit is a good estimator in a wide variety of situations with different error structures.
* However, all of those distributions of errors were symmetric distributions.
* There is no reason to believe that in general the unobserved heterogeneity should be symmetric around the expected value.
* Let's see what happens as we relax this assumption.
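* Before diving in, a quick standalone illustration (not part of the program
* below) of why the beta family is handy here: rbeta(2,b) is right-skewed for
* large b, and standardizing the draw preserves that skewness.
clear
set obs 10000
gen e = rbeta(2,10)
qui sum e
replace e = (e-r(mean))/r(sd)
* summarize, detail stores the skewness in r(skewness)
qui sum e, detail
di "Skewness of the standardized beta(2,10) draw: " r(skewness)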
cap program drop tobit_monte_carlo
program define tobit_monte_carlo, rclass
* Let's first set up the simulation
clear
* Set the number of observations
set obs 3000
* Let's imagine that we are trying to infer the damages caused by various things to homes in coastal cities.
* Generate some explanatory variables
gen weather = rpoisson(1)
label var weather "Number of extreme weather events that hit the home"
gen crime = rbeta(2,6)
label var crime "Property crime rate in home's area"
gen occupants = rpoisson(4)
label var occupants "The number of people occupying the home"
gen age = (runiform()*40)+18
label var age "The age of the owner"
gen age2=age^2
gen credit = (runiform()*600)+200
label var credit "The credit worthiness of the owner"
* Now let's imagine that there is a lot of low-level unexplained damage.
* This will loop over the values 1, 4, 7, and 10.
foreach i in 1 4 7 10 {
* This generates an error distribution
gen e`i' = rbeta(2,`i')
sum e`i'
replace e`i'=(e`i'-r(mean))/r(sd)
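* Standardizing to mean 0 and sd 1 keeps the scale of the error (e`i'*100000
* below) comparable across the four distributions while preserving skewness.
* The next line flips the sign of the most skewed draw so that we also test a
* left-skewed error distribution.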
if `i'==10 replace e`i'=e`i'*(-1)
* The name option saves the graph to memory with the name e`i'
if "`0'"=="graph" qui hist e`i', title(e~rbeta(2,`i')) name(e`i', replace)
* This creates a local list of all of the graphs in memory by adding on to the list every time this loops.
local graphnames `graphnames' e`i'
}
* Graphs the combined 4 graphs
if "`0'"=="graph" graph combine `graphnames'
foreach i in 1 4 7 10 {
* First let's generate the true quantity we would like to understand: the true amount of home damage.
gen home_damage`i' = -10000 + 100000*weather + 10000*crime + 5000*occupants - 500*age + 20*age2 + 100*credit + e`i'*100000
}
sum home_damage*
* We can reasonably interpret negative values of home_damage as repairs made to the home.
* However, we only have information on insurance payments, meaning each claim is subject to a deductible:
gen deductible = 5000
* Each home also has a maximum that the insurance policy will cover:
gen maximum = `2'
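* Note: `2' is the program's second argument, so the policy maximum is set at
* call time (e.g., 100000 in the first call below).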
* Let us first impose our maximums and minimums
foreach i in 1 4 7 10 {
gen insurance_claims`i' = min(home_damage`i', maximum)
* This puts a cap on payouts, but figuring out the minimums is a little trickier.
* We know that if the claim is less than the deductible then it is not recorded.
replace insurance_claims`i' = 0 if insurance_claims`i' < deductible
}
sum insurance_claims*
* We can see that the different error distributions affect payouts, though not by much.
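* As an optional diagnostic (a sketch, not in the original program), we can
* count how many observations are bottom- and top-coded on the initial run:
if "`1'"=="graph" {
  foreach i in 1 4 7 10 {
    qui count if insurance_claims`i'==0
    local bot = r(N)
    qui count if insurance_claims`i'>=maximum
    di "e`i': `bot' bottom-coded, " r(N) " top-coded of " _N " observations"
  }
}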
*****************************************************************
*** simulation end
* So we want to know, how much did the different factors affect home damages?
* We can observe the insurance claims, the deductibles, and the maximum payout, but not damages below the deductible or above the maximum.
* remember home_damage`i' = -10000 + 100000*weather + 10000*crime + 5000*occupants - 500*age + 20*age2 + 100*credit + e`i'*100000
* create a return list for the simulation command
gl return_list
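* Each entry will have the form newvar=r(scalarname), the expression-list
* syntax that simulate expects; the list accumulates one entry per
* coefficient, estimator, and error distribution.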
* Let's see how well the OLS estimator does at recovering the coefficients
foreach i in 1 4 7 10 {
reg insurance_claims`i' weather crime occupants age age2 credit
foreach v in weather crime occupants age age2 credit {
return scalar OLS_`v'`i' = _b[`v']
gl return_list $return_list OLS_`v'`i'=r(OLS_`v'`i')
}
}
* Let's see how well the tobit estimator does at recovering the coefficients
foreach i in 1 4 7 10 {
tobit insurance_claims`i' weather crime occupants age age2 credit, ll(5000) ul(`2')
foreach v in weather crime occupants age age2 credit {
return scalar Tob_`v'`i' = _b[`v']
gl return_list $return_list Tob_`v'`i'=r(Tob_`v'`i')
}
}
* End program
end
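* Run the program once directly: this draws the combined histogram and, just as
* importantly, populates $return_list so the simulate calls below know which
* returned scalars to collect.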
tobit_monte_carlo graph 100000
di "simulate $return_list , reps(50): tobit_monte_carlo nograph 100000"
simulate $return_list , reps(50): tobit_monte_carlo nograph 100000
order *weather* *crime* *occup* *age* *credit*
sum
* It seems that the OLS estimator generally outperforms the tobit estimator.
* There is no reason that this should be the case except that the data suffers from both top and bottom coding.
* I suspect that if there were only bottom coding then the tobit estimator would outperform the OLS estimator.
* In order to test this we can try:
simulate $return_list , reps(50): tobit_monte_carlo nograph 10000000
order *weather* *crime* *occup* *age* *credit*
sum
* It seems that the OLS estimator still generally outperforms the tobit estimator.
* Though it is hard to say; perhaps this is due to the small sample size.
* A larger sample size should bring the QMLE closer to its asymptotic (consistent) behavior.
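* To quantify the comparison, here is a quick sketch (assuming the results of
* the simulate call above are still in memory) that compares the mean estimate
* of each coefficient against its true value from the data-generating process:
local true_weather   100000
local true_crime      10000
local true_occupants   5000
local true_age         -500
local true_age2          20
local true_credit       100
foreach v in weather crime occupants age age2 credit {
  foreach est in OLS Tob {
    qui sum `est'_`v'1
    di "`est'_`v'1: mean = " %12.1f r(mean) ", true = `true_`v''"
  }
}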