Monday, October 15, 2012

Maximum Likelihoods and Item Response Theory


# Maximum likelihood is a common method for solving item response theory (IRT) problems.  Outside of IRT, maximum likelihood is widely used to solve a range of other estimation problems.

# The following post follows a lecture I recently attended by Mark Reckase, which illustrated graphically how maximum likelihood solves these problems.  I found the lecture extremely useful because the arguments, especially with regard to IRT, are almost exactly analogous to those for binary response models in econometrics.

# Let us start with the single-parameter IRT model, also known as the Rasch model.

# Imagine that we have two test items, each with a single response by a student.  We know that the difficulties of those items are b=-1 and b=2.  We would like to find out the skill level of the student.

# The student gets the first question right and the second one wrong.  Based on only this information, what is our best guess at the student's ability?

# Well, we know the probabilities of the student getting the different questions right or wrong are a function of the student's ability.  Under the Rasch model the probability of a correct response takes the logistic form.

# Let's see what that means.  First let's restrict the range of the student's ability to be between -4 and 4.

ability <- seq(-4,4,.1)

# Let's define the Rasch model
rasch <- function(theta,b) exp(theta-b)/(1+exp(theta-b))
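
# As a quick sanity check (my own addition, not from the lecture), the probability of a correct response is 0.5 when ability equals difficulty and rises toward 1 as ability exceeds difficulty.
rasch(0, b=0)            # 0.5
rasch(c(-2, 0, 2), b=0)  # roughly 0.12, 0.50, 0.88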

# First let's plot the probability of getting the first problem right (b=-1) given different ability levels.
plot(ability, rasch(ability,b=-1), type="l", xlab = ~theta, ylab="Probability", main="Joint Probabilities", col="red", lwd=2)

# Now let's plot the probability of getting the second question wrong.  This is the complement of the probability of getting it right, that is 1-rasch(theta,b=2).

lines(ability, 1-rasch(ability,b=2), col="blue", lwd=2)

# We can see that looking at either item's curve on its own makes it impossible to identify the parameter of interest, since each curve is monotonic and has no interior maximum.  However, since we know one item was answered correctly and the other incorrectly, we can use that information to calculate the best approximation of the student's ability.

# This is done via maximum likelihood methods.  By assuming that, conditional on the student's ability level, the responses to the items are independent, we can calculate the joint probability of observing both outcomes.

# This is done by the standard formula for conditional independence: P(I1=1, I2=0 | theta) = P(I1=1 | theta) * P(I2=0 | theta)
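
# For example (my own numbers, just to illustrate the formula), at theta=0 the two probabilities and their product are:
rasch(0, b=-1)                        # P(I1=1 | theta=0), roughly 0.73
1 - rasch(0, b=2)                     # P(I2=0 | theta=0), roughly 0.88
rasch(0, b=-1) * (1 - rasch(0, b=2))  # joint probability, roughly 0.64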

# Thus we can graph the joint probability by multiplying the two curves together.

lines(ability, rasch(ability, b=-1)*(1-rasch(ability,b=2)), col="purple", lwd=2)

# It turns out that in the Rasch model each item has equal discriminatory power, so each question is equally weighted in the likelihood.  Thus the maximum likelihood point happens to be at the midpoint between -1 and 2, that is 0.5.
abline(v=.5, col="orange", lwd=2)
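
# As a check on the graphical argument (my own addition; lik.rw is just a helper name I chose), we can maximize this joint probability numerically.
lik.rw <- function(theta) rasch(theta, b=-1) * (1 - rasch(theta, b=2))
optimize(lik.rw, interval=c(-4, 4), maximum=TRUE)  # maximum near 0.5, objective roughly 0.67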



# Imagine that instead of a correct response to the first item and an incorrect response to the second, we got an incorrect response to the first and a correct response to the second.  We can add these curves to the same plot.

lines(ability, 1-rasch(ability,b=-1), type="l", col="red", lwd=2)
lines(ability, rasch(ability,b=2), col="blue", lwd=2)
lines(ability, rasch(ability, b=2)*(1-rasch(ability,b=-1)), col="purple", lwd=2)


# Interestingly, even though the responses to the two questions are reversed, the maximum likelihood estimate still falls at 1/2.  The joint probability at that maximum, however, is much lower, since missing the easy item while getting the hard item right is a far less likely response pattern.
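
# Checking this numerically as well (again my own addition): the reversed pattern also peaks near 0.5, but the joint probability at the peak is far smaller.
lik.wr <- function(theta) (1 - rasch(theta, b=-1)) * rasch(theta, b=2)
optimize(lik.wr, interval=c(-4, 4), maximum=TRUE)  # maximum near 0.5, objective roughly 0.03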

# Notice that if there were no variation in the responses (all items right or all items wrong), the joint likelihood would have no interior maximum and would be no more identifiable than the individual curves.  Interestingly, however, the difficulties of the questions need not vary in order to identify a maximum.

plot(ability, rasch(ability,b=1), type="l", xlab = ~theta, ylab="Probability", main="Joint Probabilities with four items", col="red", lwd=2,ylim=c(0,1))
lines(ability, 1-rasch(ability,b=1), col="blue", lwd=2)
lines(ability, rasch(ability, b=1)*(1-rasch(ability,b=1)), col="purple", lwd=2)
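
# A quick numerical check (my own addition): with one right and one wrong response to items of the same difficulty b=1, the joint likelihood peaks at theta = 1 with a value of 0.25.
lik.same <- function(theta) rasch(theta, b=1) * (1 - rasch(theta, b=1))
optimize(lik.same, interval=c(-4, 4), maximum=TRUE)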



# Okay, let's imagine that we introduce information about the previous item responses into the current information set.  Imagine the student got b=-1 correct

lines(ability, rasch(ability,b=-1), col="pink", lwd=2)
lines(ability, rasch(ability, b=1)*(1-rasch(ability,b=1))*(rasch(ability,b=-1)), col="yellow", lwd=2)

# and b=2 right this time.
lines(ability, rasch(ability,b=2), col="green", lwd=2)
lines(ability, (rasch(ability,b=2))*rasch(ability, b=1)*(1-rasch(ability,b=1))*(rasch(ability,b=-1)), col="yellow", lwd=2)
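
# As a rough check (my own addition), the four-item joint likelihood should now peak a little above 2.
lik.four <- function(theta)
  rasch(theta, b=2) * rasch(theta, b=1) * (1 - rasch(theta, b=1)) * rasch(theta, b=-1)
optimize(lik.four, interval=c(-4, 4), maximum=TRUE)$maximum  # roughly 2.1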



# With each additional piece of information the MLE appears to become more precise.  That is, the tails get smaller and the likelihood becomes a steeper hill with a more clearly defined peak.  I do not believe this has to be the case; if the student were to keep getting easy problems wrong and hard problems right, I am not sure the estimates would improve.

# In order to see how the joint probabilities are changing we can look at just the joint probabilities rescaled so that their maximum is 1.

# I would also like to redefine the ability range to be between -4 and 8 since the expected student ability is creeping to the right.

ability <- seq(-4,8,.1)

# First let's compute the two-item joint probability (b=-1 wrong, b=2 right) and plot it rescaled:
two.joint <-  rasch(ability, b=2)*(1-rasch(ability,b=-1))

plot(ability,two.joint/max(two.joint), col="blue", lwd=2, xlab=~theta, type="l", main="Joint probability distributions")

three.joint <- two.joint*rasch(ability,b=-1)
lines(ability,three.joint/max(three.joint), col="red", lwd=2)

four.joint <- three.joint*rasch(ability,b=2)
lines(ability,four.joint/max(four.joint), col="purple", lwd=2)



# Now let's add a fifth item with a difficulty of 3 that the student got wrong.
five.joint <- four.joint*(1-rasch(ability,b=3))
lines(ability,five.joint/max(five.joint), col="green", lwd=2)

# We can see that the result is a clear tightening of the distribution.

# It looks like the most likely ability level of the student is a little below 2.
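
# Checking on the grid (my own addition): the peak of the five-item joint likelihood falls near 1.8, consistent with the plot.
ability[which.max(five.joint)]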

# Now let's imagine that the student happened to get a really easy problem (b=-3) wrong, say by filling in the wrong bubble on a multiple-choice question.

six.joint <- five.joint*(1-rasch(ability,b=-3))
lines(ability,six.joint/max(six.joint), col="pink", lwd=2)
# We can see that missing a single easy question has substantially decreased our estimate of the student's ability.
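
# Comparing peak locations on the grid (my own addition): the sixth response pulls the estimate well below the five-item peak of roughly 1.8, to somewhere around 0.4 to 0.5.
ability[which.max(six.joint)]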

# In order to see whether the distribution has also widened, I will shift the five- and six-item joint distributions so that their maxima are centered at 0.

plot(ability-ability[five.joint==max(five.joint)],five.joint/max(five.joint), col="green", lwd=3, type="l", xlim=c(-3,3))
lines(ability-ability[six.joint==max(six.joint)],six.joint/max(six.joint), col="purple", lwd=1, type="l")
# Adding the sixth item does appear to have widened the joint distribution ever so slightly, indicating that getting the easy problem wrong not only substantially decreased the estimate of the student's ability but also made that estimate slightly less precise.
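
# As a rough numerical summary (my own addition), we can count grid points where each rescaled curve exceeds one half, a crude width at half maximum; if the sixth item widened the distribution, this count should be slightly larger for six items than for five.
sum(five.joint/max(five.joint) > 0.5) * 0.1
sum(six.joint/max(six.joint) > 0.5) * 0.1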


