Sunday, November 18, 2012

Selecting your First Item on a Computer Adaptive Test


Original Code

# Computer adaptive tests "adapt" to the test taker by repeatedly assessing the test taker's ability and selecting questions meant to maximize the information the test provides.

# We often start by assuming that the individual's ability is at the population average.

# Once that assumption is made, the adaptive test selects the item that maximizes the information at that point.


# Let's imagine that the true ability of the student is -1 and we would like to select items that move our estimate from 0 toward -1.


# Let's see how this works.

#######################################################
#  Method 1 - Item Information, Fisher Information Criteria

# First let's imagine we have a pool of 100 3PL items.
set.seed(101)

npool = 100

pool = data.frame(item = 1:npool,
                  a = abs(rnorm(npool)*.25 + 1.1),
                  b = rnorm(npool),
                  c = abs(rnorm(npool)/7.5 + .1))
summary(pool)

# Drawing on code from a previous post we can calculate several useful functions:

# (http://www.econometricsbysimulation.com/2012/11/matching-item-information.html)

# Each item has an item characteristic curve (ICC) of:
PL3 = function(theta,a, b, c) c+(1-c)*exp(a*(theta-b))/(1+exp(a*(theta-b)))

# and information function defined as:
PL3.info = function(theta, a, b, c) a^2 *(PL3(theta,a,b,c)-c)^2/(1-c)^2 * (1-PL3(theta,a,b,c))/PL3(theta,a,b,c)
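
# As a quick visual check (a small aside, reusing the functions above along
# with the 1.7 scaling constant applied later in this post), we can sketch
# the information curves of the first five pool items over a theta grid:
theta.grid = seq(-3, 3, .1)
info.curves = sapply(1:5, function(i)
  PL3.info(theta.grid, pool[i,2]*1.7, pool[i,3], pool[i,4]))
matplot(theta.grid, info.curves, type="l", xlab="theta", ylab="information")
# Each curve peaks near that item's difficulty b (shifted slightly by c).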

# We can use the previous post's approach to find the information at any number of theta values.
# For now we are only interested in the information at our initial "guess" of student ability:
theta.estimate=0


# First I want to calculate the information at each theta for each item.

# In this first case theta only equals 0.
nthetas = length(theta.estimate)

# This matrix will contain the item pool information values
pool.info = matrix(NA, nrow=npool, ncol=nthetas)

colnames(pool.info) = paste("IT=",theta.estimate,sep="")

# These must be calculated for each item but can be for all thetas simultaneously.
for(i in 1:npool) pool.info[i,] = PL3.info(theta.estimate, pool[i,2]*1.7, pool[i,3], pool[i,4])

head(pool.info)

# Everything appears to be working well.  Let's find the max (note this code only works well for one theta).
cbind(pool,pool.info)[pool.info==max(pool.info),]

# Item 33, with information 1.2695 and a=1.35, b=-.14, c=.007, is the best choice for the first item under the Fisher information criterion.
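
# As a quick check on this result, the selected item's information curve
# should peak close to our estimate of 0 (which.max works here because
# pool.info has only one column):
best = pool[which.max(pool.info),]
theta.grid = seq(-3, 3, .1)
plot(theta.grid, PL3.info(theta.grid, best$a*1.7, best$b, best$c),
     type="l", xlab="theta", ylab="information")
abline(v=theta.estimate, lty=2)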

#######################################################
#  Method 2 - Kullback–Leibler information divergence method

# Referencing Eggen of Cito (1999), the KL information divergence for a single item can be expressed as
# KL = p(true.theta)*log(p(true.theta)/p(estimated.theta)) + q(true.theta)*log(q(true.theta)/q(estimated.theta))
# where p is the probability of getting that item correct and q is the probability of getting that item wrong.

# KL can be thought of as an item selection criterion: it favors the item best able to distinguish between the current estimate and the true theta (if the true theta were known).

# In order to use information divergence we find the expected value of the divergence for each item.
# Computationally this is approximated by plugging in possible values of true.theta along with the estimate and averaging.


# First let's define the KL function (drawing on PL3 defined previously).

KL = function(theta.true,theta.estimate, a, b, c) {
  # For the true value
  p.true = PL3(theta.true,a,b,c)
  q.true = 1-p.true
 
  # For the estimate
  p.estimate = PL3(theta.estimate,a,b,c)
  q.estimate = 1-p.estimate

  # The following line is the value returned by the KL function:
  p.true*log(p.true/p.estimate) + q.true*log(q.true/q.estimate)
  }
# The function is written to take only a single theta.estimate, while multiple true thetas are not a problem.
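
# A quick sanity check on a hypothetical item (a=1.7, b=0, c=.1): the
# divergence is exactly zero when the true theta equals the estimate and
# grows as the two separate (vectorized over theta.true).
KL(0, 0, 1.7, 0, .1)
KL(c(-2,-1,0,1,2), 0, 1.7, 0, .1)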

# Now let's specify a simplified discrete set of values that the true theta can take.

theta.true = c(-1,0,1)

nthetas = length(theta.true)

# This matrix will contain the item pool KL values
pool.KL = matrix(NA, nrow=npool, ncol=nthetas)

colnames(pool.KL) = paste("Theta=",theta.true,sep="")

# These must be calculated for each item but can be for all thetas simultaneously.
for(i in 1:npool) pool.KL[i,] = KL(theta.true, theta.estimate, pool[i,2]*1.7, pool[i,3], pool[i,4])
head(pool.KL)
# Note that for theta.true = 0 the entire column is equal to zero.
# This is because when the true theta equals the estimate, no item can help distinguish between them: there is no difference to detect.
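
# We can confirm this directly: the Theta=0 column is exactly zero.
max(abs(pool.KL[,"Theta=0"]))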

# In order to select the best item for a true theta range of -1,0,1 we average across the three values (or sum them).

pool.avKL = apply(pool.KL, 1, mean)
head(pool.avKL)

cbind(pool,pool.info,pool.avKL)[pool.avKL==max(pool.avKL),]
# Interestingly, item 33 is once again the item which is selected.

# Now let's imagine that rather than picking just three true thetas to search over, we instead want to search across the standard normal distribution of abilities.  This is very easy to do with the code we already have.  All we need do is evaluate theta.true on a grid of normal quantiles.

theta.true = qnorm(seq(.01,.99,.01))

hist(theta.true)



nthetas = length(theta.true)

# This matrix will contain the item pool KL values
pool.KL = matrix(NA, nrow=npool, ncol=nthetas)

colnames(pool.KL) = paste("Theta=",theta.true,sep="")

# These must be calculated for each item but can be for all thetas simultaneously.
for(i in 1:npool) pool.KL[i,] = KL(theta.true, theta.estimate, pool[i,2]*1.7, pool[i,3], pool[i,4])
pool.avKL = apply(pool.KL, 1, mean)
head(pool.avKL)

cbind(pool,pool.info,pool.avKL)[pool.avKL==max(pool.avKL),]
# Surprisingly, once again item 33 is selected.  I am not sure why there is no difference between the methods yet.  I suspect it is due to the symmetric nature of the normal distribution.
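
# One exploratory check (not a formal result): compare how the two criteria
# rank the entire pool.  A rank correlation near 1 would explain why both
# keep picking the same top item.
cor(pool.info[,1], pool.avKL, method="spearman")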

# We can certainly play with the models at least.  I reran the code starting with theta.estimate=2 and the Fisher information criterion did select a different item than KL, though KL selected the same item under both choices of theta.  This is probably because the item pool is so small.  Given a larger item pool, I suspect that distributional choices become more important.

# Changing the item pool to 10,000 items and setting theta.estimate=1, I found no difference in item choice between using only the three values -1,0,1 and the full normal grid of values.
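
# To make this kind of experimentation easier, here is a small wrapper of the
# code above (my own convenience function, not from any package) that returns
# the best item under each criterion for a given starting estimate:
select.first.item = function(pool, theta.estimate,
                             theta.true = qnorm(seq(.01, .99, .01))) {
  npool = nrow(pool)
  info = avKL = numeric(npool)
  for(i in 1:npool) {
    info[i] = PL3.info(theta.estimate, pool[i,2]*1.7, pool[i,3], pool[i,4])
    avKL[i] = mean(KL(theta.true, theta.estimate,
                      pool[i,2]*1.7, pool[i,3], pool[i,4]))
  }
  list(fisher = pool[which.max(info),], KL = pool[which.max(avKL),])
}

# For example, the theta.estimate=2 experiment above becomes one call:
select.first.item(pool, theta.estimate = 2)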

# Let's do one more thing.  Imagine that we think we know the true theta and would like to use KL to select the item that best separates the estimate from that true value.

theta.true = -1
nthetas = length(theta.true)

pool.KL = matrix(NA, nrow=npool, ncol=nthetas)

colnames(pool.KL) = paste("Theta=",theta.true,sep="")

# These must be calculated for each item but can be for all thetas simultaneously.
for(i in 1:npool) pool.KL[i,] = KL(theta.true, theta.estimate, pool[i,2]*1.7, pool[i,3], pool[i,4])
pool.avKL = apply(pool.KL, 1, mean)
head(pool.avKL)

cbind(pool,pool.info,pool.avKL)[pool.avKL==max(pool.avKL),]

# In that case item 72 is selected with parameters a=1.5, b=-.43, c=.097.

# The KL divergence criterion seems to be working well, since it selected an item whose difficulty lies between the true theta of -1 and the estimate of 0.
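
# As a final visual check, we can plot the ICC of the selected item 72; its
# steepest region (around b=-.43) falls between the true theta of -1 and the
# estimate of 0, which is exactly where it is most useful for telling them apart.
sel = pool[72,]
theta.grid = seq(-3, 3, .1)
plot(theta.grid, PL3(theta.grid, sel$a*1.7, sel$b, sel$c), type="l",
     xlab="theta", ylab="P(correct)")
abline(v=c(theta.true, theta.estimate), lty=2)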
