## Tuesday, July 31, 2012

### 3 ways to write a probit command

* Probit is a commonly used command to model binary outcome variables.

* The maximum likelihood sytax in Stata is specific
* I will first define three different equivalent ml programs.  Then we will solve them to see how they perform.

cap program drop myprobit1
program define myprobit1
* Tell Stata that this code is written in version 11
version 11
* Define the arguments input in the maximum likelihood function
* Stata automatically generates the first argument as a temporary variable name
* that is used to store the natural log of the likelihood value of the likelihood function.
args ln_likehood xb
* The probit tries to directly maximize the probability of seeing the outcome (either y==1 or y==0)
* If y==1 then we maximize the probability of seeing a positive outcome
qui replace ln_likehood' = ln(  normal(xb')) if $ML_y==1 * If y==0 then we want to maximize the probability of seeing a negative outcome qui replace ln_likehood' = ln(1-normal(xb')) if$ML_y==0
end

cap program drop myprobit2
program define myprobit2
version 11
args ln_likehood xb
* Because of the symetry of the normal CDF the following set of expressions is equivalent.
qui replace ln_likehood' = ln(normal( xb' )) if $ML_y==1 qui replace ln_likehood' = ln(normal(-xb' )) if$ML_y==0
end

cap program drop myprobit3
program define myprobit3
version 11
args ln_likehood xb
* This is a somewhat opeque way of writing the same thing as above.
* I do not prefer it to the first two forms except that it is likely to run slightly faster
* since it is a single command.
qui replace ln_likehood' = ln(normal((-1)^(1-$ML_y)*xb')) end clear set obs 1000 gen x1 = rnormal()*.5 gen x2 = rnormal()*.5 gen u = rnormal()*(.5^.5) gen p=normal(x1+x2+u) gen y=rbinomial(1,p) probit y x1 x2 * This is the benchmark ml model lf myprobit1 (y=x1 x2) ml maximize * Basically works the same except there is no pseudo r2 reported. timer on 1 ml model lf myprobit2 (y=x1 x2) ml maximize timer off 1 timer on 2 ml model lf myprobit3 (y=x1 x2) ml maximize timer off 2 timer list di "myprobit2 is " r(t1)/r(t2) " times slower than myprobit3" * The following is a brief exploration attempting to run two probits at once. * It shows how ml can optimize two equations jointly cap program drop mytwoprobit1 program define mytwoprobit1 version 11 args ln_likehood xb1 xb2 qui replace ln_likehood' = ln(normal((-1)^(1-$ML_y1)*xb1')) ///
+ ln(normal((-1)^(1-$ML_y2)*xb2')) end * This is equivalent to the previous except that it uses the binormal distribution rather than two seperate normals. I tried specifying the correlation as a third equation but that did not work. I will need to play around a bit more to figure out how to program a biprobit. cap program drop mytwoprobit2 program define mytwoprobit2 version 11 args ln_likehood xb1 xb2 qui replace ln_likehood' = ln(binormal((-1)^(1-$ML_y1)*xb1', ///
(-1)^(1-\$ML_y2)*xb2', ///
0))

end

clear
set obs 1000

gen x1 = rnormal()*.5
gen x2 = rnormal()*.5

gen p1=normal(x1-x2)
gen p2=normal(-x1+x2)

gen y1=rbinomial(1,p1)
gen y2=rbinomial(1,p2)

ml model lf mytwoprobit1 (y1:y1=x1 x2) (y2:y2=x1 x2)
ml maximize

ml model lf mytwoprobit2 (y1:y1=x1 x2) (y2:y2=x1 x2)
ml maximize