Thursday, August 30, 2012

Cronbach's alpha


* Cronbach's alpha coefficient is a widely used measure of internal consistency or reliability of a psychometric test score for a sample of examinees.

* First let us imagine that we have a test of 100 items to be administered to 1000 people.

* Let's imagine the test is only attempting to measure a single ability (math competency).

clear
set obs 1000
gen stu_id = _n
label var stu_id "Student ID"

* Let's generate our data using IRT specifications:

* Each of the 1000 test takers has a different ability
gen theta = rnormal() + .5      
  label var theta "Individual ability"
 

* Each item (test question) has three parameters:
* a - discrimination (how well this item distinguishes between people of different ability levels)
* b - the difficulty of this item (a higher b means a harder item)
* c - the guessing probability (the probability that someone who knows nothing will guess the correct answer)
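
* A quick illustration of how the three parameters work together. This is a sketch with
* made-up values (a_demo, b_demo, c_demo are hypothetical, not the parameters drawn below).
* It uses the three-parameter logistic form that generates the item responses later on:
* P(correct) = c + (1-c)/(1+exp(-a*(theta-b)))
local a_demo = 1
local b_demo = 0
local c_demo = .25
foreach t in -2 0 2 {
  * Probability of a correct answer for a person with ability t
  di "theta = `t': P(correct) = " %5.3f `c_demo'+(1-`c_demo')/(1+exp(-`a_demo'*(`t'-`b_demo')))
}
* At theta = b the probability is c + (1-c)/2, which is .625 for these values.
* Raising b shifts the curve to the right, so the same person is less likely to answer correctly.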

* Let us generate now the 100 items, for all 1000 test takers

* Each item will have a different a,b, and c.

* Let's also generate a total score for each person
gen total_score = 0

forv i = 1/100 {

  local a = .2 + runiform()/4
  local b = .3 + runiform()/2
  local c = .2 + runiform()/3
  * I am not sure what the best way of parameterizing these items is, though I know it matters.
 
  local a`i' = `a'
  local b`i' = `b'
  local c`i' = `c'
 
  * We will generate item responses as pi(theta) = ci + (1-ci)/(1 + exp(-ai*(theta-bi)))
  * In words: the probability that a person gets item i right is a function of the item parameters ai, bi, ci as well as that person's ability theta.
  gen item`i' = rbinomial(1, `c'+(1-`c')/(1+exp(-`a'*(theta-`b'))))

  * Add the result of this item to the total score
  replace total_score = total_score + item`i'

}
sum
forv i = 1/100 {
  di "Item `i': a = `a`i'' , b = `b`i'' , c = `c`i''"
}
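* In this model a higher b means a harder item, so all else equal fewer people get it right. With the parameter ranges used here, though, the guessing parameter c varies more and has the larger effect on the item means.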

* We now have a collection of items.

* Now to calculate the alpha coefficient we need to do the following
* alpha = K/(K-1) * (1-sum(var(items))/var(total_score))
* K is the number of components (so K is 100)

local K = 100
* variance of total score is easy
sum total_score
local var_total_score = r(sd)^2

* Calculating the sum of the item variances takes a bit more work
local sum_var_items = 0
forv i = 1/100 {
  qui sum item`i'
  local sum_var_items = `sum_var_items' + r(sd)^2
}

di "Sum of Item Variances is `sum_var_items'"

* Now we are ready to calculate the alpha:
di "alpha = " `K'/(`K'-1) * (1-`sum_var_items'/`var_total_score')
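
* As a cross-check, here is a sketch using Stata's built-in alpha command, which should
* reproduce the hand calculation. The asis option is used on the assumption that we want
* every item taken as is: without it, alpha may reverse the sign of items that correlate
* negatively with the scale, and the two numbers could then differ.
alpha item1-item100, asis
di "Built-in alpha = " r(alpha)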

* I am learning this as I am doing it.  Please correct me if I have made any mistakes.

* In my run the alpha for these items is only about .55, which is pretty poor. The exact value changes from run to run because no random seed is set.

* Try increasing the parameters a, b, and c (both their constants and their spread) to see how alpha responds.
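
* One way to experiment is to wrap the simulation in a small program. This is a minimal
* sketch: sim_alpha and its argument a_min are hypothetical names introduced here, and
* only the lower bound of the discrimination parameter is varied. Note that it clears the
* data in memory each time it is called.
capture program drop sim_alpha
program define sim_alpha
  args a_min
  clear
  set obs 1000
  gen theta = rnormal() + .5
  forv i = 1/100 {
    * Same item parameters as above, except that a starts at a_min instead of .2
    local a = `a_min' + runiform()/4
    local b = .3 + runiform()/2
    local c = .2 + runiform()/3
    gen item`i' = rbinomial(1, `c'+(1-`c')/(1+exp(-`a'*(theta-`b'))))
  }
  * Report alpha with the built-in command, taking all items as is
  alpha item1-item100, asis
end

* Baseline discrimination, then a much more discriminating set of items
sim_alpha .2
sim_alpha 1
* More discriminating items generally separate test takers better, so the second alpha
* should typically come out higher than the first.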
