* Imagine we have a sample of hotels and prices of rooms in those hotels

clear

set obs 200

set seed 101

* This sets the random seed at 101. Thus every time this simulation is run it will product identical results.

gen star = ceil(runiform()*7)/2+.5

label var star "Number of stars of hotel"

gen id = _n

gen het = abs(rnormal())

label var het "Unobserved heterogeneity."

* Generate a the number of reservations observed per hotel.

gen num_reservations = rpoisson(150)

expand num_reservations

gen seasonality = abs(rnormal())

label var seasonality "Seasonal demand"

gen v = abs(rnormal())

label var v "Unobserved variance in the variance term"

gen u = rnormal()*(5+het*15+7.5*star+22.5*seasonality+v*10)

label var u "Error term"

gen p = 175 + 20*star + 15*seasonality + u + het

sum p

* There are some p values which are less than 0 but we can think of those as special deals, coupons, refunds, or other situations that might result in the effective price being less than 0 dollars.

* If we were to eliminate the less than 0 prices then we would in be enforcing left censoring which is a different problem. See "tobit". This blog has several posts touching on the use of the tobit.

* Estimate the price through direct OLS

reg p star seasonality

* Set the panel level observation

xtset id

* Though we have panel data we cannot effectively use fixed effect or random effects approaches to identify the price effect having one more star has on prices.

xtreg p star seasonality, fe

xtreg p star seasonality, re

* Now let's attempt a two step method to more efficient identify the error and reestimate the OLS.

reg p star seasonality

* The OLS regression looks pretty good. Let's see if we can improve on it. Note, the 95% confidence interval did not capture the true coefficient of 20 but that is not necessarily a problem.

reg p star seasonality, robust

reg p star seasonality, cluster(id)

* Using robust and cluster robust estimates of the standard error does not change the variance so much that the 95% confidence interval encloses the 20.

predict uhat, resid

gen uhat_abs = abs(uhat)

label var uhat_abs "Abs of OLS residual"

two (scatter uhat_abs seasonality) (scatter uhat_abs star), ///

legend(label(1 "Seasonality") label(2 "Stars"))

* Since E(u)==0 var(u) is equal to u squared Var(u)=(u-E(u))^2

* Likewise, u^2 is approximated by u^2

* Similarly sd(u) should be approximated by (u^2)^.5=abs(u)

reg uhat_abs star seasonality

predict uhat_abs_hat, xb

* I might be doing this wrong. I am trying aweights which say something about weighting by the inverse of the variance of the observation.)

gen uhat2 = 1/uhat_abs_hat^2

reg p star seasonality

* Let's see how the unweighted estimate performs

reg p star seasonality [aweight = uhat2]

* Using variance weights does not seem to improve the estimate

* Alternatively we can use the MLE estimator allowing the conditional standard deviation of the error as well as the conditional mean to vary linearly.

cap program drop mle_ols

program mle_ols

args log_like xb sigma

qui replace `log_like' = ln(normalden($ML_y1-`xb',0,`sigma'))

end

ml model lf mle_ols (price: p = star seasonality) (sigma: star seasonality)

ml maximize

* Using the MLE estimator we seem to have gained precision in the 3rd decimal place of the coefficients. The 95% CI still does not enclose the 20 but it is closer. Still, this is not indicative of a problem. Perhaps if we simulated this 1000 times and substantially more than 50 times the CI did not enclose the 20 then we might be worried.

## No comments:

## Post a Comment