Econometrics By Simulation: Linear models with heteroskedastic errors

Thursday, August 16, 2012

Linear models with heteroskedastic errors

* Imagine we have a sample of hotels and prices of rooms in those hotels
clear
set obs 200
set seed 101
* This sets the random seed at 101. Thus every time this simulation is run it will product identical results.

gen star = ceil(runiform()*7)/2+.5
label var star "Number of stars of hotel"

gen id = _n

gen het = abs(rnormal())
label var het "Unobserved heterogeneity."

* Generate a the number of reservations observed per hotel.
gen num_reservations = rpoisson(150)
expand num_reservations

gen seasonality = abs(rnormal())
label var seasonality "Seasonal demand"

gen v = abs(rnormal())
label var v "Unobserved variance in the variance term"

gen u = rnormal()*(5+het*15+7.5*star+22.5*seasonality+v*10)
label var u "Error term"

gen p = 175 + 20*star + 15*seasonality + u + het

sum p
* There are some p values which are less than 0 but we can think of those as special deals, coupons, refunds, or other situations that might result in the effective price being less than 0 dollars.
* If we were to eliminate the less than 0 prices then we would in be enforcing left censoring which is a different problem. See "tobit". This blog has several posts touching on the use of the tobit.

* Estimate the price through direct OLS
reg p star seasonality

* Set the panel level observation
xtset id

* Though we have panel data we cannot effectively use fixed effect or random effects approaches to identify the price effect having one more star has on prices.
xtreg p star seasonality, fe
xtreg p star seasonality, re

* Now let's attempt a two step method to more efficient identify the error and reestimate the OLS.

reg p star seasonality
* The OLS regression looks pretty good. Let's see if we can improve on it. Note, the 95% confidence interval did not capture the true coefficient of 20 but that is not necessarily a problem.

reg p star seasonality, robust
reg p star seasonality, cluster(id)
* Using robust and cluster robust estimates of the standard error does not change the variance so much that the 95% confidence interval encloses the 20.

predict uhat, resid

gen uhat_abs = abs(uhat)
label var uhat_abs "Abs of OLS residual"

two (scatter uhat_abs seasonality) (scatter uhat_abs star), ///
legend(label(1 "Seasonality") label(2 "Stars"))

* Since E(u)==0 var(u) is equal to u squared Var(u)=(u-E(u))^2
* Likewise, u^2 is approximated by u^2
* Similarly sd(u) should be approximated by (u^2)^.5=abs(u)
reg uhat_abs star seasonality

predict uhat_abs_hat, xb
* I might be doing this wrong. I am trying aweights which say something about weighting by the inverse of the variance of the observation.)

gen uhat2 = 1/uhat_abs_hat^2

reg p star seasonality
* Let's see how the unweighted estimate performs

reg p star seasonality [aweight = uhat2]
* Using variance weights does not seem to improve the estimate

* Alternatively we can use the MLE estimator allowing the conditional standard deviation of the error as well as the conditional mean to vary linearly.

cap program drop mle_ols
program mle_ols
args log_like xb sigma
qui replace `log_like' = ln(normalden($ML_y1-`xb',0,`sigma'))
end

ml model lf mle_ols (price: p = star seasonality) (sigma: star seasonality)
ml maximize

* Using the MLE estimator we seem to have gained precision in the 3rd decimal place of the coefficients. The 95% CI still does not enclose the 20 but it is closer. Still, this is not indicative of a problem. Perhaps if we simulated this 1000 times and substantially more than 50 times the CI did not enclose the 20 then we might be worried.

Econometrics By Simulation

Thursday, August 16, 2012

Linear models with heteroskedastic errors

No comments:

Post a Comment

Blog Archive