Tuesday, July 31, 2012

3 ways to write a probit command


* Probit is a commonly used command to model binary outcome variables.

* The maximum likelihood sytax in Stata is specific
* I will first define three different equivalent ml programs.  Then we will solve them to see how they perform.

cap program drop myprobit1
program define myprobit1
  * Tell Stata that this code is written in version 11
  version 11
  * Define the arguments input in the maximum likelihood function
  * Stata automatically generates the first argument as a temporary variable name
  * that is used to store the natural log of the likelihood value of the likelihood function.
  args ln_likehood xb
  * The probit tries to directly maximize the probability of seeing the outcome (either y==1 or y==0)
  * If y==1 then we maximize the probability of seeing a positive outcome
  qui replace `ln_likehood' = ln(  normal(`xb')) if $ML_y==1
  * If y==0 then we want to maximize the probability of seeing a negative outcome
  qui replace `ln_likehood' = ln(1-normal(`xb')) if $ML_y==0
end

cap program drop myprobit2
program define myprobit2
  version 11
  args ln_likehood xb
  * Because of the symetry of the normal CDF the following set of expressions is equivalent.
  qui replace `ln_likehood' = ln(normal( `xb' )) if $ML_y==1
  qui replace `ln_likehood' = ln(normal(-`xb' )) if $ML_y==0
end

cap program drop myprobit3
program define myprobit3
  version 11
  args ln_likehood xb
  * This is a somewhat opeque way of writing the same thing as above.
  * I do not prefer it to the first two forms except that it is likely to run slightly faster
  * since it is a single command.
  qui replace `ln_likehood' = ln(normal((-1)^(1-$ML_y)*`xb'))
end


clear
set obs 1000

gen x1 = rnormal()*.5
gen x2 = rnormal()*.5
gen u = rnormal()*(.5^.5)

gen p=normal(x1+x2+u)

gen y=rbinomial(1,p)

probit y x1 x2
* This is the benchmark

ml model lf myprobit1 (y=x1 x2)
ml maximize
* Basically works the same except there is no pseudo r2 reported.

timer on 1
ml model lf myprobit2 (y=x1 x2)
ml maximize
timer off 1

timer on 2
ml model lf myprobit3 (y=x1 x2)
ml maximize
timer off 2

timer list

di "myprobit2 is " r(t1)/r(t2) " times slower than myprobit3"

* The following is a brief exploration attempting to run two probits at once.
* It shows how ml can optimize two equations jointly
cap program drop mytwoprobit1
program define mytwoprobit1
  version 11
  args ln_likehood xb1 xb2
  qui replace `ln_likehood' = ln(normal((-1)^(1-$ML_y1)*`xb1')) ///
                            + ln(normal((-1)^(1-$ML_y2)*`xb2'))
end

* This is equivalent to the previous except that it uses the binormal distribution rather than two seperate normals.  I tried specifying the correlation as a third equation but that did not work.  I will need to play around a bit more to figure out how to program a biprobit.
cap program drop mytwoprobit2
program define mytwoprobit2
  version 11
  args ln_likehood xb1 xb2
  qui replace `ln_likehood' = ln(binormal((-1)^(1-$ML_y1)*`xb1', ///
                                          (-1)^(1-$ML_y2)*`xb2', ///
 0))

end

clear
set obs 1000

gen x1 = rnormal()*.5
gen x2 = rnormal()*.5

gen p1=normal(x1-x2)
gen p2=normal(-x1+x2)

gen y1=rbinomial(1,p1)
gen y2=rbinomial(1,p2)

ml model lf mytwoprobit1 (y1:y1=x1 x2) (y2:y2=x1 x2)
ml maximize

ml model lf mytwoprobit2 (y1:y1=x1 x2) (y2:y2=x1 x2)
ml maximize

No comments:

Post a Comment