## Tuesday, September 25, 2012

### Maximum Likelihood Using R

# The short tutorial by Professor Marco Steenbergen is very helpful in learning to program maximum likelihood in R.

# http://www.unc.edu/~monogan/computing/r/MLE_in_R.pdf

# I will attempt my first R MLE with a probit.

# Let's first generate our data

# Declare the number of observations to create.
nobs=1000

x1 = rnorm(nobs)*.15^.5
x2 = rnorm(nobs)*.35^.5
u = rnorm(nobs)*(.5)^.5

# I have generated the variables so that the inside of the normal CDF has a standard deviation of 1.  This should help the coefficients be more easily recovered.

inside = 2*x1+2*x2+u

# Using basic
# Var(inside) = (.15*2)^2*var(N(0,1)) + (.35*2)^2*var(N(0,1)) + (.5^.5)^2*var(N(0,1)) = .15+.35+.5 =1

# The probability of have a success is equal the cumulative density function at the inside value.
p = pnorm(inside)

y = rbinom(nobs,1,p)

mydata = data.frame(y=y,x1=x1,x2=x2)

# The estimates we would like to duplicate are:

# First we must specify the function to maximize
# All we need to do is to specify a function that takes choice values and maximizes over

myprobit <- function(theta) {
beta0 <- theta
beta1 <- theta
beta2 <- theta

# Calculate the log likelihood values for a set of betas
log.lik <- sum(log(pnorm((-1)^(1-mydata\$y)*(beta0 + beta1*mydata\$x1 + beta2*mydata\$x2))))
# This might look quite confusing.  However, it basically says that if y=1 then count beta0+beta1*x1+beta2*x2 as positively contributing to the probability.  However, if y=0 then we need to count not the contribution but in effect the negative contribution of the inside term.  That is when y is positive count the inside term as predicting a positive outcome.  When it is negative count the inside term as falsely predicting a positive outcome.  I am not sure how much sense this all makes.  I am still trying to wrap my head around it all.

# Return that log likelihood value.
return(log.lik)
}

# Let's test out our likelihood function using a generic set of parameters
myprobit(c(1,1,1))

# Let's see what the probit log likelihoods look like as we vary B1 from -1 to 2
b.range <- seq(-1,2.5,.1)
b0.loglik <- b1.loglik <- b2.loglik <- 0*b.range
for (i in 1:length(b.range)) {
b0.loglik[i] <- myprobit(c(b.range[i],1,1))
b1.loglik[i] <- myprobit(c(1,b.range[i],1))
b2.loglik[i] <- myprobit(c(1,1,b.range[i]))
}

# I will define a quick min max function to help setting up the range of plots
minmax <- function(x) c(min(x),max(x))

plot(minmax(b.range), minmax(c(b0.loglik, b1.loglik, b2.loglik)),
type="n", main="Log-Likelihood Around the point (1,1,1)",
ylab = "log likelihoods", xlab= ~ beta )

lines(c(1,1),minmax(c(b0.loglik, b1.loglik, b2.loglik)), col="green", lwd=2)

lines(b.range, b0.loglik, col="black", lwd=2)
lines(b.range, b1.loglik, col="blue", lwd=2)
lines(b.range, b2.loglik, col="purple", lwd=2)

# The graph displays how the log likelyhood changes with respect to different coefficients around the point (1,1,1).  If we were using the Newton-Ralphsony algorithm then B0 (green) would step left, while B1 and B2 would step right.  This is because the alorithm use the determinate at the point to approximate the next place to step.

# Looking good.  By specifying the control fnscale as =-1 we turn the optimization problem from the default minimize to maximize.
optim(c(1,1,1), myprobit, control=list(fnscale = -1))
# Great! Though, it is worth noting that the coefficients are not identical.  This is not neccessarily a problem since we would expect different optimization algorithms to produce different results.

# The vector c(1,1,1) is the starting values of for the coefficients to be estimated.

# Notice that this has worked fine for this specific model prob(p)=pnorm(B0+B1*x1+B2*x2).
# Clearly, a more sophisticated level of programming would be neccessary to duplicate the flexibility of the probit command.  That is, a model that allows for more variables to be specified.