## Sunday, September 8, 2013

### Maximum Likelihood Estimation and the Origin of Life

# Maximum likelihood Estimation (MLE) is a powerful tool in econometrics and statistics which allows for the consistent and asymptotically efficient estimation of parameters given a correct identification (in terms of distribution) of the random variable.

# It is also a proceedure which seems superficially quite complex and intractible, but in reality is very intuitive and easy to use.

# To introduct MLE, lets think about observing a repeated coin flip.

# Let's say we know it is not a fair coin and we would like to figure out what the likehood of heads parameter popping up is.

# Let's define out likelihood function this way:

lk <- function(responses, theta) {
# This likelihood function returns the likelihood of (a priori) observing a
# particular response pattern given the likelihood of seeing a head on the
# coin is theta.

returner <- 0*theta
# Define a vector to hold likelihoods in case theta is a vector

for (i in 1:length(theta)) returner[i] <-
prod(theta[i]^responses) * # The likelihood of seeing the head's pattern
prod((1-theta[i])^(1-responses)) # The likelihood of seeing the tails's pattern

returner
}

# Let's see how our likelihood function is working.
# Let's see we flip the coin once and we observe a heads
# and our initial guess is that the coin is fair.
lk(c(1), .5)
# [1] 0.5
# Our likelihood of this being true is .5 (we know this is right)

# What is we get a tail?
lk(c(0), .5)
# [1] 0.5
# .5 as well.  This seems right.

# Now let's try something more interesting.
# Let's say we get two heads:
lk(c(1,1), .5)
# [1] 0.25
# .5^2=.25

# This is standard probability. But, let's now ask:
# What if the coin was not fair (as we originally guessed)?

# Is there a better parameter set that would lead to our observed outcome?
# Let's say, there is a 75% chance of getting heads.
# What is the likelihood of seeing our 2 heads.
lk(c(1,1), .75)
# [1] 0.5625
# So if the coin was unfair at 75% then our likelihood of observing the
# outcome we did would increase from 25% to 56%.

# How about is the coin were 95% in favor of heads.
lk(c(1,1), .95)
# [1] 0.9025
# Now we are up to 90%.

# You probably see where this is going.
# If we assume we know nothing about the underlying distribution of the coin
# parameter, then the parameter which fits best is 1 (100%).
# Thus if we see only positive outcomes then the most likely guess at the
# underlying distribution of outcomes is that every time you flip the coin it

# We can see this mapped out.
theta.vect <- seq(0,1,.05)

plot(theta.vect, lk(c(1,1), theta.vect), main="HH",
ylab="Probability", xlab="Theta")

# It is worth noting that though there are different probabilities # for each unfairness parameter. We can only definitively rule out # one option after flipping the coin twice and getting heads. # That is that it is impossible for the likelihood of getting a head=0.   # However, all other options are available. How do we choose from these options?   # Well... naturally, we choose the option which maximizes the probability # of seeing the observed outcome (as the name implies).   # Now let's make things a little more interesting. Let's say the third # time we flip the coin it ends up tails.   # If we assume it is a fair coin then: lk(c(1,1,0), .5) # [1] 0.125   # How about our guess of .75 heads? lk(c(1,1,0), .75) # [1] 0.140625 # More likely than that of a fair coin but not as dramatic an improvement from # when we saw only two heads.   # Let's see how the graph looks now. plot(theta.vect, lk(c(1,1,0), theta.vect), main="HHT", ylab="Probability", xlab="Theta")
# We can see that our graph is finally beginning to look a bit more interesting. # We can see that the most likely outcome is around 65%. For those of us # a little ahead of the game the most likely probability is the success rate # or 2/3 (66.6%).   # But the importance of the exercise to think about why 66.6% is parameter # we select as the most likely. lk(c(1,1,0), 2/3) # [1] 0.1481481 # Not because it is overwhelmingly the best choice.   # It is only 2.3% (0.148-0.125) more likely to occur than if it were a fair coin. # So we really are not very confident with our parameter choice at this point.   # However, imagine instead for one moment, if we observed the same ratio but with # 300 coins. cpattern <- c(rep(1,200), rep(0,100))   lk(cpattern, 2/3) # [1] 1.173877e-83   lk(cpattern, 1/2) # [1] 4.909093e-91   # Now, in terms of percentages the differences are extremely small. # So small that it is hard to compare. The plot can be useful:   plot(theta.vect, lk(cpattern, theta.vect), main="(HHT)^100", ylab="Probability", xlab="Theta")
# Let's see what happens if we increase our number of coins to # 3000 plot(theta.vect, lk(rep(cpattern,10), theta.vect), main="(HHT)^1000", ylab="Probability", xlab="Theta")
# In this graph we can see the first major computational problem # when dealing with likelihoods. They get so small, they are hard # to manage in raw probabilities. In this case the digits get # rounded into 0 so that all R sees is 0.   # Fortunately, the maximum of a function is the same maximum # (in terms of parameter choices) as a monotonic transformation # of a function. Thus we can rescale our probabilities # before multiplication using logs creating the log likelihood # function which produces parameters which vary in scale much less # dramatically.   # I won't say anything more about this right now except that this is why # MLE functions always maximizes and reports the "log likelihood" # value rather than the "likelihood".   # However, in this discussion it is worth noting that there is # a somewhat useful statistic that we can produce to compare # the likelihoods of the fair coin hypothesis with that # of the 2/3 biased hypothesis.   # That is the odds ratio of the two outcomes. How much more # likely (multiplicatively) is our outcome to be observed # if the coin is unfair towards 2/3 heads rather than fair?   lk(cpattern, 2/3)/lk(cpattern, 1/2) # [1] 23912304 # That is to say, the outcome in which 2/3 rds of the time # we get heads for a coin flip of 300 coins is 23 million # times more likely to occur if our coin # is unfair (66.6%) over that of being fair (50%).   # This is a pretty big number and thus very unlikely to occur # relative to that of a fair coin. Comparing accross all possible # outcomes, we would find that while this ratio is not always as # large, for example if we are comparing 2/3s to .6 lk(cpattern, 2/3)/lk(cpattern, .75) # [1] 183.3873 # But it can still be quite large. In this case, we are 183 times # more likely to see the outcome we saw if we chose 2/3s as our parameter # choice compared with 3/4ths.   # Looking at the raw probability we see lk(cpattern, 2/3) # [1] 1.173877e-83 # or 1 out of 8*10^82 outcomes.   # Thus the likelihood of a particular event ever occurring is very small, even # given the most likely hypothesis (theta=2/3). # However, compared nearly all other hypothesis such as (theta=1/2 or 3/4) # the event is much more likely to have occurred.   # And THAT is why creationists are right to say it very unlikely # in absolute terms that evolution brought about the origin of # life on earth yet are also completely wrong because compared # with all other available hypotheses that is the only one # which is remotely likely (at least from the series of # outcomes that I have observed) to have occurred makes its odds
# ratio very high in my mind.
Created by Pretty R at inside-R.org