## Tuesday, July 31, 2012

### 3 ways to write a probit command

* Probit is a commonly used command to model binary outcome variables.

* The maximum likelihood sytax in Stata is specific
* I will first define three different equivalent ml programs.  Then we will solve them to see how they perform.

cap program drop myprobit1
program define myprobit1
* Tell Stata that this code is written in version 11
version 11
* Define the arguments input in the maximum likelihood function
* Stata automatically generates the first argument as a temporary variable name
* that is used to store the natural log of the likelihood value of the likelihood function.
args ln_likehood xb
* The probit tries to directly maximize the probability of seeing the outcome (either y==1 or y==0)
* If y==1 then we maximize the probability of seeing a positive outcome
qui replace `ln_likehood' = ln(  normal(`xb')) if \$ML_y==1
* If y==0 then we want to maximize the probability of seeing a negative outcome
qui replace `ln_likehood' = ln(1-normal(`xb')) if \$ML_y==0
end

cap program drop myprobit2
program define myprobit2
version 11
args ln_likehood xb
* Because of the symetry of the normal CDF the following set of expressions is equivalent.
qui replace `ln_likehood' = ln(normal( `xb' )) if \$ML_y==1
qui replace `ln_likehood' = ln(normal(-`xb' )) if \$ML_y==0
end

cap program drop myprobit3
program define myprobit3
version 11
args ln_likehood xb
* This is a somewhat opeque way of writing the same thing as above.
* I do not prefer it to the first two forms except that it is likely to run slightly faster
* since it is a single command.
qui replace `ln_likehood' = ln(normal((-1)^(1-\$ML_y)*`xb'))
end

clear
set obs 1000

gen x1 = rnormal()*.5
gen x2 = rnormal()*.5
gen u = rnormal()*(.5^.5)

gen p=normal(x1+x2+u)

gen y=rbinomial(1,p)

probit y x1 x2
* This is the benchmark

ml model lf myprobit1 (y=x1 x2)
ml maximize
* Basically works the same except there is no pseudo r2 reported.

timer on 1
ml model lf myprobit2 (y=x1 x2)
ml maximize
timer off 1

timer on 2
ml model lf myprobit3 (y=x1 x2)
ml maximize
timer off 2

timer list

di "myprobit2 is " r(t1)/r(t2) " times slower than myprobit3"

* The following is a brief exploration attempting to run two probits at once.
* It shows how ml can optimize two equations jointly
cap program drop mytwoprobit1
program define mytwoprobit1
version 11
args ln_likehood xb1 xb2
qui replace `ln_likehood' = ln(normal((-1)^(1-\$ML_y1)*`xb1')) ///
+ ln(normal((-1)^(1-\$ML_y2)*`xb2'))
end

* This is equivalent to the previous except that it uses the binormal distribution rather than two seperate normals.  I tried specifying the correlation as a third equation but that did not work.  I will need to play around a bit more to figure out how to program a biprobit.
cap program drop mytwoprobit2
program define mytwoprobit2
version 11
args ln_likehood xb1 xb2
qui replace `ln_likehood' = ln(binormal((-1)^(1-\$ML_y1)*`xb1', ///
(-1)^(1-\$ML_y2)*`xb2', ///
0))

end

clear
set obs 1000

gen x1 = rnormal()*.5
gen x2 = rnormal()*.5

gen p1=normal(x1-x2)
gen p2=normal(-x1+x2)

gen y1=rbinomial(1,p1)
gen y2=rbinomial(1,p2)

ml model lf mytwoprobit1 (y1:y1=x1 x2) (y2:y2=x1 x2)
ml maximize

ml model lf mytwoprobit2 (y1:y1=x1 x2) (y2:y2=x1 x2)
ml maximize