Econometrics By Simulation: Estimating Person Characteristics from IRT Data

Wednesday, November 21, 2012

Estimating Person Characteristics from IRT Data - 3PL Model

Original Code

# One of the basic tasks in item response theory is estimating the ability of a test taker from the responses to a series of items (questions).

# Let's draw the same pool of items that we have used on several previous posts:

# First let's imagine we have a pool of 100 3PL items.
set.seed(101)

npool = 500

pool = as.data.frame(cbind(item=1:npool, a=abs(rnorm(npool)*.25+1.1), b=rnorm(npool), c=abs(rnorm(npool)/7.5+.1)))
summary(pool)

# Drawing on code from a previous post we can calculate several useful functions:

# (http://www.econometricsbysimulation.com/2012/11/matching-item-information.html)

# Each item has a item characteristic curve (ICC) of:
PL3 = function(theta,a, b, c) c+(1-c)*exp(a*(theta-b))/(1+exp(a*(theta-b)))

# Let's imagine that we have a single test taker and that test taker has taken the first 15 items from the pool.

items.count = 15
items.taken = pool[1:items.count,]

# And that the person has a latent theta ability of 1
theta = 1.3

# Let's calculate the cut points for each of the items.
items.cut = PL3(theta, items.taken$a, items.taken$b, items.taken$c)

# We can see how the cut point works by graphing
plot(0,0, type="n", xlim=c(-3,3),ylim=c(0,1), xlab=~theta,
ylab="Probability of Correct Response", yaxs = "i", xaxs = "i" , main="Item Characteristics Curves and Ability Level")

for(i in 1:items.count) {
lines(seq(-3,3,.1),PL3(seq(-3,3,.1), items.taken$a[i], items.taken$b[i], items.taken$c[i]), lwd=2)
abline(h=items.cut[i], col="blue")
}
abline(v=theta,col="red", lwd=3)

# Now let's draw a uniform draw that we will use to calculate whether each item as passed.

rdraw = runif(items.count)

# Finally, we will calculate item responses
item.responses = 0
item.responses = items.cut > rdraw

###############################################
# Done with Simulation - Time for Estimation

# We want to use the information we know about the items (the item parameters and the responses) in order to estimate a best guess at the true ability of the test taker.

# First we must check if the person got all of the items either correct or incorrect.

sum(item.responses)
# If this is either a 0 or a number equal to the number of items then we cannot esimate an interior maximum without additional assumptions.

# We will attempt to recover our theta value using the r command optim

# First we need to specify the function to optimize over.

MLE = function(theta) sum(log((item.responses==T)*PL3(theta, items.taken$a, items.taken$b, items.taken$c) +
(item.responses==F)*(1-PL3(theta, items.taken$a, items.taken$b, items.taken$c))))
# The optimization function takes as its argument the choice variables to be optimized (theta).
# The way the above optimization works is that you specify the probility of each response piecewise.
# If the response is correct, then you count the CDF of theta up to that point as contributing to the probability of observing a correct outcome. If the response is negative, then you count it as contributing the the probability of a incorrect outcome. You then choose the theta that produces the greatest total probability.

MLEval = 0
theta.range = seq(-3,3,.1)
for(i in 1:length(theta.range)) MLEval[i] =MLE(theta.range[i])

plot(theta.range, MLEval, type="l", main="Maximim Likelihood function", xlab= ~theta, ylab="Sum of Log Likelihood")
abline(v=theta, col="blue")
# We can visually see that the maximum of the slope will not be at the true value though it will be close.

optim(0,MLE, method="Brent", lower=-6, upper=6, control=list(fnscale = -1))
abline(v=optim(0,MLE, method="Brent", lower=-6, upper=6, control=list(fnscale = -1))$par, col="red")

# We can see that we can estimate theta reasonably well with just 15 items from a paper test (red line estimate, blue line true). However, looking at the graph of the ICCs, we can see that for most of the items, the steepest point (where they have the most discriminating power) is at an ability set lower than the test taker's ability. Thus, this test provides the most information about a person who has a lower ability than the person with a theta=1.2.

# We can use R's optim function to find the ideal theta that would maximize the information from this test.
# Item information is:
PL3.info = function(theta, a, b, c) a^2 *(PL3(theta,a,b,c)-c)^2/(1-c)^2 * (1-PL3(theta,a,b,c))/PL3(theta,a,b,c)

# Notice, this is not the best way of defining the test information function since the items are not arguments.
test.info = function(theta) sum(PL3.info(theta, items.taken$a, items.taken$b, items.taken$c))

# Construct a vector to hold the test information
info = 0
for(i in 1:length(theta.range)) info[i]=test.info(theta.range[i])

plot(theta.range, info, type="l", main="Information Peaks Slightly Above 0", xlab= ~theta, ylab="Information")
abline(v=theta, col="blue")
# But we want to know about the test taker at theta

optim(0,test.info, method="Brent", lower=-6, upper=6, control=list(fnscale = -1))
# The person this test would be best suited to evaluate would have an ability rating of .19
abline(v=optim(0,test.info, method="Brent", lower=-6, upper=6, control=list(fnscale = -1))$par, col="red")

Econometrics By Simulation

Wednesday, November 21, 2012

Estimating Person Characteristics from IRT Data - 3PL Model

No comments:

Post a Comment

Blog Archive