Econometrics By Simulation: More Explorations with catR

Sunday, December 1, 2013

More Explorations with catR

# For the purposes of simulating computerized adaptive tests
# the R package catR is unparallelled.
 
# catR is an excellent tool for students who are curious about
# how a computerized adaptive test might work. It is also useful
# for testing companies that are interested in seeing how
# their choices of number of items, or model, stopping rule,
# or quite a few of the other options which are available
# when designing a specific computerized adaptive test.
 
# In this post I will explore some of the features of the 
# function randomCAT, an extremely powerful function 
# that simulates an entire response pattern for an individual.
 
# In a previous post I explore  some of the other function

# in catR in order to step by step demonstrate how to use

# the package to simulate a test.
 
library("catR")
 
# First let's generate an item bank. 
# Items specifies how many items to generate
 
# Model specifies which model to use in generating the items
# a,b,c Priors are specifying distributions to draw
# the parameters from for each item.
 
# The final set of arguments is for specifying
# what range of theta values the bank will initially
# draw item parameters for.  Theta values are the typical
# latent traits for which item response theory is concerned
# with estimating.
Bank <- createItemBank(items = 500, model = "3PL", 
                       aPrior=c("norm",1,0.2), 
                       bPrior=c("norm",0,1), 
                       cPrior=c("unif",0,0.25),
                       thMin = -4, thMax = 4,
                       step = 0.05)
 
# We may want to examine the object we have created called "Bank"
attributes(Bank)
 
# Within the Bank object of class "itBank" there is three named
# attributes.
 
# itemPar lists the item parameters for those items which have been
# generated.  We could see a histogram of difficulty parameters (b) by
# targeting within the Bank object:
 
hist(Bank$itemPar[,2], breaks=30, 
     main="Distribution of Item Difficulties",
     xlab="b parameter")


# We can also see how much information a particular item would add
# accross a range of ability levels.  This information is already
# available within the Bank object under the names infoTab and
# theta.
 
# Plot the first item's information
plot(rep(Bank$theta,1),Bank$infoTab[,1], 
     type="l", main="Item 1's information", 
     xlab="Ability (theta)", ylab="Information")
 
# Plot the first 3 items
# By specifying type = "n" this plot is left empty
nitems = 3
plot(rep(Bank$theta,nitems),Bank$infoTab[,1:nitems], type="n", 
     main=paste0("First ",nitems," items' information"), 
     xlab="Ability (theta)", ylab="Information")
# Now we plot the 
for (i in 1:nitems) lines(Bank$theta,Bank$infoTab[,i], 
                          col=grey(.8*i/nitems))


# We can see how different items can have information that 
# spans different ability estimates as well as some items
# which just have more information than other items.
 
 
# Plotting all 500 items (same code as previously but now
# we specify the number of items as 500)
nitems = 500
plot(rep(Bank$theta,nitems),Bank$infoTab[,1:nitems], type="n", 
    main=paste0("First ",nitems," items' information"), 
    xlab="Ability (theta)", ylab="Information")
for (i in 1:nitems) lines(Bank$theta,Bank$infoTab[,i], 
                          col=grey(.8*i/nitems)) 
# This plot may look nonsensical at first.  Be it actually
# provides some useful information.  From it you can see the
# maximum amount of information available for any one
# item at different levels of ability.  In the places where
# there is only one very tall item standing out we may be 
# concerned about item exposure since subjects which seem to
# be in the area of that item are disproportionately more likely
# to get the same high info item than other other subjects
# in which the next highest item is very close in information
# to the max item.
 
# To see the max information for each ability we can add a line.
lines(Bank$theta,apply(Bank$infoTab, 1, max), col="blue", lwd=2)
 
# We might also be interested in seeing how much information
# on average a random item chosen from the bank would provide
# or in other words what is the expected information from a
# random item drawn from the bank at different ability levels.
lines(Bank$theta,apply(Bank$infoTab, 1, mean), col="red", lwd=2)
 
# Or perhaps we might want to see what the maximum average information
# for a 20 item test might be. So we calculate the average information
# for the top 20 items at different ability levels.
maxmean <- function(x, length=20) mean(sort(x, decreasing=T)[1:length])
maxmean(1:100) # Returns 90.5, seems to be working properly
 
lines(Bank$theta,apply(Bank$infoTab, 1, maxmean), col="orange", lwd=3)
 


# Now this last line is very interesting because it reflects
# per item the maximum amount of information this bank can provide
# given a fixed length of 20. Multiply this curve by 20 and it will give
# us the maximum information this bank can provide given a 20 item test
# and a subject's ability.
 
# This can really be thought of as a theoretical maximum for which
# any particular CAT test might attempt to meet but on average will
# always fall short.
 
# We can add a lengend
legend(-4.2, .55, c("max item info", "mean(info)", 
                    "mean(top items)"),
       lty = 1, col = c("blue","red","orange"),  adj = c(0, 0.6))
 
 
 
library("reshape")
library("ggplot2")
 
# Let's seperate info tab
infoTab <- Bank$infoTab
 
# Let's add three columns to info tab for max, mean, and mean(top 20)
infoTab <- cbind(infoTab, 
                 apply(Bank$infoTab, 1, max),
                 apply(Bank$infoTab, 1, mean),
                 apply(Bank$infoTab, 1, maxmean))
 
 
# Melt will turn the item information array into a long object
items.long <- melt(infoTab)
 
# Let's assign values to the first column which are thetas
items.long[,1] <- Bank$theta
 
# Now we are ready to name the different columns created by melt
names(items.long) <- c("theta", "item", "info")
 
itemtype <- factor("Item", c("Item","Max", "Mean", "Mean(Max)"))
items.long <- cbind(items.long, type=itemtype)
items.long[items.long$item==501,4] <- "Max"
items.long[items.long$item==502,4] <- "Mean"
items.long[items.long$item==503,4] <- "Mean(Max)"
 
# Now we are ready to start plotting
# Assign the data to a ggplot object
a <- ggplot(items.long, aes(x=theta, y=info, group=item))
 
# Plot a particular instance of the object
a + geom_line(colour = gray(.2)) +
    geom_line(aes(colour = type), size=2 , 
            subset = .(type %in% c("Max", "Mean", "Mean(Max)")))
 


# Now let's look at how the randomCAT function works.
# There are a number of arguments that the randomCAt function
# can take.  They can be defined as lists which are fed
# into the function.
 
# I will specify only that the stoping rule is 20 items.
# By specifying true Theta that is telling random CAT what the
# true ability level we are estimating.
res <- randomCAT(trueTheta = 3, itemBank = Bank, 
                 test=list(method = "ML"),
                 stop = list(rule = "length", thr = 20))
# I specify test (theta estimator) as using ML because the
# default which is Bayesian model is strongly centrally
# biased in this case.
 
# Let's examine what elements are contained with the object "res"
attributes(res)
 
# We can see our example response pattern.
thetaEst <- c(0, res$thetaProv)
 
plot(1:21, thetaEst, type="n",
     xlab="Item Number",
     ylab="Ability Estimate", 
     main="Sample Random Response Pattern")
# Add true ability line
  abline(h=3, col="red", lwd=2, lty=2)
# Add a line connecting responses
  lines(1:21, thetaEst, type="l", col=grey(.8))
# Add the response pattern to 
  text(1:21, thetaEst, c(res$pattern, "X"))
# Add the legend
  legend(15,1,"True Ability", col="red", lty=2, lwd=2)
 


# Plot the sample item information from the set of items selected.
plot(rep(Bank$theta,20),Bank$infoTab[,res$testItems], type="n", 
     main="High information items are often selected", 
     xlab="Ability (theta)", ylab="Information")
for (i in 1:500) lines(Bank$theta,Bank$infoTab[,i], col=grey(.75)) 
# Now we plot the 
for (i in res$testItems) lines(Bank$theta,Bank$infoTab[,i], 
                               lwd=2, col=grey(.2))
 


# Now let's see how randomCat performs with a random draw
# of 150 people with different ability estimates.
 
npers <- 150   # Specify number of people to simulate
 
theta <- rnorm(npers)
# Draw a theta ability level vector
 
thetaest <- numeric(npers)
# Creates an empty vector of zeros to hold future estimates
# of theta
 
# Create an empty item object
items.used <- NULL
 
# Create an empty object to hold b values for items used
b.values <- NULL
 
 
for (i in 1:npers) {
  # Input the particular theta[i] ability for a particular run.
  res <- randomCAT(trueTheta = theta[i], 
                   itemBank = Bank, 
                   test=list(method = "ML"),
                   stop = list(rule = "length", thr = 20))
  # Save theta final estimates
  thetaest[i] <- res$thFinal
  # Save a list of items selected in each row of items.used
  items.used <- rbind(items.used, res$testItems)
  # Save a list of b values of items selected in each row
  b.values <- rbind(b.values, res$itemPar[,2])
 
}
 
# Let's see how our estimated theta's compare with our true
plot(theta, thetaest, 
     main="Ability plotted against ability estimates",
     ylab="theta estimate")

# To get a sense of how much exposure our items get 
itemTab <- table(items.used)
 
length(itemTab)
# We can see we only have 92 items used for all 150 subjects
# taking the cat exam.
 
mean(itemTab)
# On average each item used is exposed 32 times which means
mean(itemTab)/150
# over a 20% exposure rate on average in addition to some items
# have much higher exposure rates.

Created by Pretty R at inside-R.org

Econometrics By Simulation

Sunday, December 1, 2013

More Explorations with catR

No comments:

Post a Comment

Blog Archive