Thursday, May 10, 2012
Reverse Engineering a Probit
* Stata simulation of a Probit
* Image that you are interested in estimating a binary response variable.
* For instance the likelihood of a farmer choosing to go to market with
* some portion of his or her yield rather than consume at the farmers
* personal plot.
* This is a binary response variable (i.e. one taking on a value of 0 or 1).
* There are a number of functional forms that help transform a linear model
* with no true minimum or maximum into a binary model for which the probability
* of an outcome approaches either 0 or 1.
* The most common choices are the probit and logit models though other options
* present themselves as well.
* From a maximum likelihood perspective think of ordinary least squares as
* the model that maximizes the probity of observing an outcome when the error
* is distributed with the probability density function of the normal distribution.
* max_choosing{beta} ln(pdf(u)) where u=Y-XB
* For a probit, what you are instead doing is maximizing the likelihood of
* observing a cumulative density function which takes 'yhat'. That is:
* max_choosing{beta} ln(CFD(XB)) where Yhat=XB
* So this guides us on how to simulate a probit model accurately.
* Set the random seed
set seed 101
* Clear the memory
clear
* Tell stata to prepare 1000 rows for data entry
set obs 10000
* Generate yield as a chi-squared distribution
gen yield = rnormal()^2
* Generate number of family members. Poisson has the same mean and variance.
* Add +1 to ensure that the minimum household size is 1.
gen household_size = rpoisson(5)+1
* Generate the error
gen u=rnormal()*5
* First we generate the probability of selling at the market.
* In order to do that we need to do find the CDF of the XB+u
* values. Stata first needs to know the variance and mean of
* XB+u.
**** Note the coefficient on yeild and household size
gen XB_u= 10*yeild - household_size + u
sum XB_u
* Now we normalize.
gen XB_u_norm = (XB_u-r(mean))/r(sd)
* Then we calculate the probabilities using normal CDF.
gen double Py=normal(XB_u_norm)
label var Py "True probability of market sale"
* Finally we need to change this from a probability vector to a response
* vector.
gen y=rbinomial(1,Py)
************************************************************************
* Simulation END
* Now lets see how well the probit does at recovering our initial estimates:
probit y yeild household_size
predict yhat
label var yhat "Estimated probability of market sale"
* Now, wait a second. These values look way too small!
* But not so. Take a look at the normalization we did above.
sum XB_u
di "This normalization shrunk the coefficients by a factor of " r(sd)
* So to recover the estimates multiply by r(sd)
di "The estimate for yield is " r(sd)*_b[yeild]
di "The estimate for household size is " r(sd)*_b[household_size]
* Not exactly on but pretty close.
two (line yhat Py, sort) if _n/10==int(_n/10), title(Probit Probability Fit compared with True)
* The if statement thins out the distribution, telling Stata to show only
* 1 out of 10 observations.
* Later we can compare how well the probit performs relative to a logit and a
* linear heteroskedastically robust OLS model.
Subscribe to:
Post Comments (Atom)
No comments:
Post a Comment