## Monday, November 4, 2013

### The Motivation for the Poisson Distribution

# The Poisson distribution has the interesting property that it
# models the number of events that occur when each event happens
# independently of the others and at a constant average rate. The
# distribution takes only one parameter, mu, which equals both the
# mean (the expected number of events) and the variance.
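# A quick numerical check of the mean = variance = mu property, as a
# sketch using base R's dpois() (which calls the parameter lambda
# rather than mu). The value mu = 9 and the cutoff k = 100 are my own
# illustrative choices; the probability beyond k = 100 is negligible.

```r
mu <- 9
k <- 0:100
p <- dpois(k, lambda = mu)                 # P(X = k) for k = 0, ..., 100
pois_mean <- sum(k * p)                    # E[X]
pois_var  <- sum((k - pois_mean)^2 * p)    # Var[X]
c(mean = pois_mean, var = pois_var)        # both should be (numerically) 9
```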

# This distribution, as with all distributions, is somewhat
# fascinating because it represents an approximation of a
# real-world phenomenon.

# Imagine you are trying to model mail delivery on Wednesdays.

# On average you receive 9 pieces of mail. If the mail delivery
# system is well modeled by a Poisson distribution, then the
# variance is also 9, so the standard deviation of the number of
# pieces delivered should be 3. That means on most days you should
# receive between 3 and 15 pieces of mail (within two standard
# deviations of the mean).
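# A sketch of the arithmetic behind the mail example, assuming the
# Poisson model with mu = 9. The variable names are my own; ppois()
# is base R's Poisson cumulative distribution function.

```r
mu <- 9
pois_sd <- sqrt(mu)    # standard deviation = sqrt(variance) = 3
# P(3 <= X <= 15) = P(X <= 15) - P(X <= 2): within two SDs of the mean
p_3_to_15 <- ppois(15, lambda = mu) - ppois(2, lambda = mu)
c(sd = pois_sd, p_3_to_15 = p_3_to_15)
```

# The second number makes "most days" concrete: it is the share of
# Wednesdays the model expects to fall in the 3 to 15 range.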

# What underlying physical phenomenon must exist for this to be
# possible?

# To aid this discussion, we will think of the Poisson
# distribution as the limiting distribution of the sum of
# outcomes from a number of independent binary draws:

# Sum of N independent 0/1 draws, each equal to 1 with probability mu/N
DrawsApprox <- function(mu, N) sum(rbinom(N, 1, mu / N))

# The idea is that if we specify an expected number of outcomes mu
# and a number of draws N (with N >= mu, so that mu/N is a valid
# probability), then we can approximate a single draw from a
# Poisson by summing the outcomes of the N binary draws.
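# Because DrawsApprox(mu, N) adds N independent 0/1 draws with success
# probability mu/N, its output follows a Binomial(N, mu/N) distribution.
# This sketch compares that binomial probability mass function with the
# Poisson(mu) one as N grows. max_pmf_gap() is a hypothetical helper
# written for this illustration; the N values and the range k = 0:40
# are my own choices.

```r
mu <- 9
k <- 0:40
# Hypothetical helper: largest absolute difference between the
# Binomial(N, mu/N) and Poisson(mu) probabilities over k = 0, ..., 40
max_pmf_gap <- function(N) {
  max(abs(dbinom(k, size = N, prob = mu / N) - dpois(k, lambda = mu)))
}
Ns <- c(18, 36, 72, 144, 288, 1000)
gaps <- sapply(Ns, max_pmf_gap)
data.frame(N = Ns, max_pmf_gap = gaps)   # gap shrinks as N increases
```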

DrawsApprox(9,9)
# In this case, of course, the sum is always 9 and the variance
# is 0: there are 9 letters, and every one of them is sent out
# every Wednesday.

# More interestingly:
DrawsApprox(9,18)
# In this case there are 18 letters that may be sent out,
# and each one is sent with probability 0.5.
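# A small sketch of why the N = 18 case is "tighter" than a Poisson:
# the sum is Binomial(18, 0.5), which puts more probability exactly at
# 9 than Poisson(9) does. The variable names are my own.

```r
p_binom_9 <- dbinom(9, size = 18, prob = 0.5)   # equals choose(18, 9) / 2^18
p_pois_9  <- dpois(9, lambda = 9)
c(binomial = p_binom_9, poisson = p_pois_9)
```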

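# We want to know what the mean and variance are.
# Let us write a simple function to estimate them by simulation.
evar <- function(fun, draw = 100, ...) {
  f <- match.fun(fun)  # accepts a function or its name as a string
  # Call f(...) `draw` times, collecting the results in a numeric vector
  outc <- vapply(seq_len(draw), function(i) as.numeric(f(...)), numeric(1))
  list(outc = outc, mean = mean(outc), var = var(outc))
}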

evar("DrawsApprox", draw=10000, N=18, mu=9)
# I get a mean very close to 9, as we should hope, but
# interestingly a variance of less than five: the exact variance
# of the sum is N * p * (1 - p) = 18 * 0.5 * 0.5 = 4.5.
# This is less than the Poisson variance of 9.

# Let's see what happens if we double the number of
# potential letters going out, which halves the
# probability of any particular letter being sent.
evar("DrawsApprox", draw=10000, N=36, mu=9)
# Now the variance is about 6.7

evar("DrawsApprox", draw=10000, N=72, mu=9)
# Now 7.7

evar("DrawsApprox", draw=10000, N=144, mu=9)
# 8.6

evar("DrawsApprox", draw=10000, N=288, mu=9)
# 8.65
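# The simulated variances above can be checked against the exact value.
# For a Binomial(N, p) count with p = mu/N the variance is
# N * p * (1 - p) = mu * (1 - mu/N). This sketch tabulates that formula
# for the N values used above; the simulated numbers should scatter
# around these values because of simulation noise.

```r
mu <- 9
Ns <- c(18, 36, 72, 144, 288)
theory_var <- mu * (1 - mu / Ns)   # exact variance of DrawsApprox(mu, N)
data.frame(N = Ns, theoretical_variance = theory_var)
```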

# We can see that as the number of letters gets very large,
# the variance of the number of letters sent approaches the
# mean, 9. However, I will never be able to choose a large
# enough number of letters that the variance exactly equals
# the mean, because the variance mu * (1 - mu/N) stays below
# mu for every finite N.
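# The algebra behind this, writing X_N for the output of
# DrawsApprox(mu, N), which is a Binomial(N, mu/N) count:

$$
\operatorname{Var}(X_N) = N \cdot \frac{\mu}{N}\left(1 - \frac{\mu}{N}\right) = \mu - \frac{\mu^2}{N} < \mu \quad \text{for every finite } N,
$$

$$
\lim_{N \to \infty} \operatorname{Var}(X_N) = \mu, \qquad \lim_{N \to \infty} P(X_N = k) = \frac{e^{-\mu}\mu^k}{k!}.
$$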

# However, the didactic point of how the distribution is
# structured, and when it may be appropriate to use it, should be
# clear. The Poisson is a good fit when each of a large number of
# possible events occurs independently with the same small
# probability (in principle I could receive 100 pieces of mail
# in a single day, though it would be very unlikely).
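# To put a number on "very unlikely", this sketch evaluates the chance
# of exactly 100 pieces of mail under Poisson(9) and under the
# N = 1000 binomial approximation. Both are tiny but strictly positive;
# the variable names are my own.

```r
p100_pois  <- dpois(100, lambda = 9)
p100_binom <- dbinom(100, size = 1000, prob = 9 / 1000)
c(poisson = p100_pois, binomial_N1000 = p100_binom)
```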

bigdraw <- evar("DrawsApprox", draw=10000, N=1000, mu=9)
summary(bigdraw$outc)
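# A sketch comparing a large simulation with the Poisson probabilities.
# It uses rbinom(10000, size = N, prob = mu/N) as a shortcut: each value
# has the same distribution as one call to DrawsApprox(mu, N), without
# the loop. The seed and the range k = 0:25 are arbitrary choices of
# mine, made only for reproducibility and readability.

```r
set.seed(2013)                                   # arbitrary seed
mu <- 9
N <- 1000
sims <- rbinom(10000, size = N, prob = mu / N)   # 10000 approximate Poisson draws
k <- 0:25
# Share of simulated days with exactly k pieces of mail
empirical <- as.numeric(table(factor(sims, levels = k))) / length(sims)
comparison <- data.frame(k = k,
                         simulated = round(empirical, 4),
                         poisson = round(dpois(k, lambda = mu), 4))
comparison
c(mean = mean(sims), var = var(sims))
```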
