Thursday, August 30, 2012

Cronbach's alpha

* Cronbach's alpha coefficient is a widely used measure of internal consistency or reliability of a psychometric test score for a sample of examinees.

* First let us imagine that we have a test of 100 items to be administered to 1000 people.

* Let's imagine the test is only attempting to measure a single ability (math competency).

clear
set obs 1000
gen stu_id = _n
label var stu_id "Student ID"

* Let's generate our data using IRT specifications:

* Each of the 1000 test takers has a different ability
gen theta = rnormal() + .5
label var theta "Individual ability"

* Each item (testing question) has three parameters:
* a - discrimination (how well this item distinguishes between people of different ability levels)
* b - the difficulty of this item (higher values are harder: a test taker needs more ability to answer correctly)
* c - the guessing probability (the probability that someone who knows nothing will guess the correct answer)

* Let us generate now the 100 items, for all 1000 test takers

* Each item will have a different a,b, and c.

* Let's also generate a total score for each person
gen total_score = 0

forv i = 1/100 {

local a = .2 + runiform()/4
local b = .3 + runiform()/2
local c = .2 + runiform()/3
* I am not sure what the best way of parameterizing these items is, though I know the choice matters.

local a`i' = `a'
local b`i' = `b'
local c`i' = `c'

* We will generate item responses as pi(theta) = ci + (1-ci)/(1 + exp(-ai*(theta-bi)))
* This says that, for a given person, the probability of getting item i right is a function of the item parameters ai, bi, ci as well as personal ability theta.
gen item`i' = rbinomial(1, `c'+(1-`c')/(1+exp(-`a'*(theta-`b'))))

* Add the result of this item to the total score
replace total_score = total_score + item`i'

}
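
* As a rough sanity check on the response function used in the loop, we can evaluate it by hand at a few ability levels.
* The values a = .3, b = .5, c = .25 are hypothetical, picked only because they fall inside the ranges drawn above.
foreach t in -2 0 2 {
di "theta = `t': P(correct) = " %5.3f (.25 + (1-.25)/(1+exp(-.3*(`t'-.5))))
}
* Comparing the three probabilities gives a feel for how strongly (or weakly) ability drives a correct answer at this level of discrimination.
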
sum
forv i = 1/100 {
di "Item `i': a = `a`i'' , b = `b`i'' , c = `c`i''"
}
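* In this model a higher "difficulty" b lowers the probability of a correct answer, so harder items should have fewer people getting them right (the guessing parameter c also shifts each item's mean, so the pattern is noisy).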
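
* One way to look at how difficulty relates to the share of correct answers is to print each item's b next to its mean.
* This is only a sketch: it assumes the locals b1-b100 saved in the loop above are still in memory (i.e. everything is run as one do-file).
forv i = 1/100 {
qui sum item`i'
di "Item `i': b = `b`i'' , proportion correct = " %5.3f r(mean)
}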

* We now have a collection of items.

* Now to calculate the alpha coefficient we need to do the following
* alpha = K/(K-1) * (1-sum(var(items))/var(total_score))
* K is the number of components (so K is 100)

local K = 100
* variance of total score is easy
sum total_score
local var_total_score = r(sd)^2

* To calculate the sum of the variance of items is a bit more complicated
local sum_var_items = 0
forv i = 1/100 {
qui sum item`i'
local sum_var_items = `sum_var_items' + r(sd)^2
}

di "Sum of Item Variances is `sum_var_items'"

* Now we are ready to calculate the alpha:
di "alpha = " `K'/(`K'-1) * (1-`sum_var_items'/`var_total_score')
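
* As a cross-check, Stata's built-in alpha command should report the same value as the hand calculation.
* The asis option keeps the sign of every item as entered, which is what the formula above assumes.
* This assumes the items were created in order, so that item1-item100 picks up all 100 of them.
alpha item1-item100, asis
di "alpha from the built-in command = " r(alpha)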

* I am learning this as I am doing it.  Please correct me if I have made any mistakes.

* Right now the alpha for the current items is only .55, which is really pretty poor.

* Try changing the parameters a, b, and c in the model (both their constants and their ranges of variation) to see how alpha responds.
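
* A minimal sketch of one such experiment: draw a second set of 100 items with higher discrimination and compare alpha.
* The range 1 to 2 for a and the hi_ variable prefix are arbitrary, hypothetical choices made only for illustration; b and c are drawn as before.
forv i = 1/100 {
local a = 1 + runiform()
local b = .3 + runiform()/2
local c = .2 + runiform()/3
gen hi_item`i' = rbinomial(1, `c'+(1-`c')/(1+exp(-`a'*(theta-`b'))))
}
alpha hi_item1-hi_item100, asis
* Items that discriminate more sharply should correlate more with one another, so we would expect this alpha to come out higher than the one above.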