* Imagine we have a sample of hotels and prices of rooms in those hotels
clear
set obs 200
set seed 101
* This sets the random seed at 101. Thus every time this simulation is run it will produce identical results.
gen star = ceil(runiform()*7)/2+.5
label var star "Number of stars of hotel"
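* star takes values 1, 1.5, ..., 4 in half-star increments; a quick tabulation confirms this support
tab star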
gen id = _n
gen het = abs(rnormal())
label var het "Unobserved heterogeneity."
* Generate the number of reservations observed per hotel.
gen num_reservations = rpoisson(150)
expand num_reservations
gen seasonality = abs(rnormal())
label var seasonality "Seasonal demand"
gen v = abs(rnormal())
label var v "Unobserved variation in the variance term"
gen u = rnormal()*(5+het*15+7.5*star+22.5*seasonality+v*10)
label var u "Error term"
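* Note that the standard deviation of u rises with star and seasonality (as well as the unobserved het and v), so the errors are heteroskedastic by construction.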
gen p = 175 + 20*star + 15*seasonality + u + het
sum p
* There are some p values which are less than 0, but we can think of those as special deals, coupons, refunds, or other situations that might result in the effective price being less than 0 dollars.
* If we were to eliminate the less-than-0 prices then we would be imposing left censoring, which is a different problem. See "tobit". This blog has several posts touching on the use of the tobit model.
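* For illustration only: if we did censor the prices at 0 ourselves, a tobit specification along these lines would be one way to handle it (p_cens is just a scratch variable for this sketch).
gen p_cens = max(p, 0)
tobit p_cens star seasonality, ll(0)
drop p_cens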
* Estimate the price through direct OLS
reg p star seasonality
* Declare the panel structure by setting the panel identifier
xtset id
* Though we have panel data, we cannot effectively use fixed effects or random effects approaches to identify the effect that having one more star has on price: star does not vary within hotel, so the fixed effects estimator drops it entirely (the xtsum check below confirms this).
xtreg p star seasonality, fe
xtreg p star seasonality, re
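* A quick check of the within variation shows why: star is constant within each hotel, so it has zero within variation for the fixed effects estimator to use.
xtsum star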
* Now let's attempt a two-step method: model the error variance, then use it to re-estimate the OLS more efficiently.
reg p star seasonality
* The OLS regression looks pretty good. Let's see if we can improve on it. Note that the 95% confidence interval did not capture the true coefficient of 20, but that is not necessarily a problem.
reg p star seasonality, robust
reg p star seasonality, cluster(id)
* Using robust and cluster-robust standard errors does not widen the 95% confidence interval enough for it to enclose 20.
predict uhat, resid
gen uhat_abs = abs(uhat)
label var uhat_abs "Abs of OLS residual"
two (scatter uhat_abs seasonality) (scatter uhat_abs star), ///
legend(label(1 "Seasonality") label(2 "Stars"))
* Since E(u)=0, Var(u) = E[(u-E(u))^2] = E[u^2]
* Likewise, u^2 is approximated by the squared OLS residual uhat^2
* Similarly, sd(u) should be approximated by (uhat^2)^.5 = abs(uhat), so we can model the conditional standard deviation by regressing abs(uhat) on the covariates
reg uhat_abs star seasonality
predict uhat_abs_hat, xb
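* Because the data are simulated, we can do a rough check of how well the fitted absolute residuals track the true conditional standard deviation from the data generating process (true_sd is constructed here just for this check).
gen true_sd = 5 + 15*het + 7.5*star + 22.5*seasonality + 10*v
corr uhat_abs_hat true_sd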
* I might be doing this wrong. I am trying aweights, which weight each observation by the inverse of its variance.
gen uhat2 = 1/uhat_abs_hat^2
* Re-run the unweighted estimate for comparison
reg p star seasonality
* Now let's see how the variance-weighted estimate performs
reg p star seasonality [aweight = uhat2]
* Using variance weights does not seem to improve the estimate
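* As an alternative sketch, Stata's vwls command fits variance-weighted least squares directly from an estimate of each observation's standard deviation (this assumes the fitted values in uhat_abs_hat are strictly positive).
vwls p star seasonality, sd(uhat_abs_hat)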
* Alternatively, we can use an MLE estimator that allows the conditional standard deviation of the error, as well as the conditional mean, to vary linearly with the covariates.
cap program drop mle_ols
program mle_ols
    * lf evaluator: arguments are the log-likelihood variable and the linear predictors for the mean and sigma
    args log_like xb sigma
    qui replace `log_like' = ln(normalden($ML_y1-`xb',0,`sigma'))
end
ml model lf mle_ols (price: p = star seasonality) (sigma: star seasonality)
ml maximize
* Using the MLE estimator we seem to have gained precision in the 3rd decimal place of the coefficients. The 95% CI still does not enclose the 20, but it is closer. Still, this is not indicative of a problem. Perhaps if we simulated this 1,000 times and the CI failed to enclose 20 substantially more than 50 times (5% of the replications), then we might be worried. A rough sketch of such a check follows.
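* A rough sketch of such a Monte Carlo check, wrapping the data generating process above in a program (mysim is just a name chosen for this sketch) and counting how often the OLS 95% CI misses the true coefficient of 20:
cap program drop mysim
program mysim, rclass
    clear
    set obs 200
    gen star = ceil(runiform()*7)/2+.5
    gen id = _n
    gen het = abs(rnormal())
    gen num_reservations = rpoisson(150)
    expand num_reservations
    gen seasonality = abs(rnormal())
    gen v = abs(rnormal())
    gen u = rnormal()*(5+het*15+7.5*star+22.5*seasonality+v*10)
    gen p = 175 + 20*star + 15*seasonality + u + het
    reg p star seasonality
    return scalar lb = _b[star] - invttail(e(df_r),.025)*_se[star]
    return scalar ub = _b[star] + invttail(e(df_r),.025)*_se[star]
end
simulate lb=r(lb) ub=r(ub), reps(1000): mysim
* Flag replications whose 95% CI fails to enclose the true coefficient of 20
gen miss = (lb > 20) | (ub < 20)
sum miss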