Sunday, September 16, 2012
The True Test Score World
* We simulate test results for 1000 respondents, each with an observed score subject to symmetric measurement error with E(e)=0.
* Start from an empty dataset so that set obs does not fail if data are already in memory
clear
set obs 1000
* The true score of each respondent is true_score (normally distributed, mean 0, standard deviation 2)
gen true_score = rnormal()*2
label var true_score "True Score - Intelligence test"
gen age_group = mod(_n-1,5)
label var age_group "Age grouping"
* This creates five equally sized age groups, coded 0 through 4.
* Measurement error
gen e = rnormal()
* The observed score is the sum of the true score, a difference in ability due to age, and some measurement error.
gen obs_score = true_score+age_group+e*(1+age_group/2.5)
* The performance of students is harder to measure as they get older (hence the (1+age_group/2.5) multiplier on e).
corr obs_score true_score
* We can see that our observed score is not a very good measure of true intelligence.
* This is of course primarily because our students get uniformly better at the designated task the older they are.
* If we look within each age group, we should see that the observed score tracks the true score more closely.
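* A quick check of the age effect described above (a sketch, not part of the original analysis):
* tabulating means and standard deviations by age group should show the mean of obs_score
* rising by about one point per group while the mean of true_score stays near zero.
tabstat obs_score true_score, by(age_group) statistics(mean sd)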
bysort age_group: corr obs_score true_score
* Now we can see our correlations are looking pretty good.
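* For reference, a sketch of the within-group correlations implied by the simulation design.
* Within age group g, Var(true_score)=4 and the error standard deviation is s=1+g/2.5,
* so corr(obs_score, true_score) = 4/(2*sqrt(4+s^2)) = 2/sqrt(4+s^2).
* The sample correlations above should be close to these values, not equal to them.
forvalues g = 0/4 {
    display "age_group `g': " %5.3f 2/sqrt(4 + (1 + `g'/2.5)^2)
}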
* However, we want to figure out which of our students are the brightest among all age groups and put those students in special programs to encourage their development.
* So we want a measure of intelligence that is independent of age.
* One method would be to subtract the mean of the observed score for each age group.
bysort age_group: egen meaned_score = mean(obs_score)
gen score_demean = obs_score-meaned_score
corr score_demean true_score
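* An equivalent way to remove the age-group means, shown only as a sketch: regress the observed
* score on age-group indicators and keep the residuals. The name score_resid is introduced here
* for illustration and is not in the original; factor-variable notation (i.) assumes Stata 11 or later.
regress obs_score i.age_group
predict score_resid, residuals
* The residuals should match the demeaned score up to rounding, so this correlation should be 1.
corr score_resid score_demean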
* We know that intelligence has the same distribution in every age group. Might we use this knowledge to rescale our observed scores to get a better estimate?
bysort age_group: egen sd_score = sd(obs_score)
gen standardized_score = score_demean/sd_score
corr standardized_score true_score
* Interestingly, we can see that standardizing the score by age group gives us the best of the three estimates of intelligence across all age groups. This exploits the knowledge (normally an assumption) that the underlying distribution of intelligence is the same for all age groups.
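* To compare the three measures side by side (a sketch using only the variables created above),
* read down the first column of the correlation matrix.
corr true_score obs_score score_demean standardized_score
* The spread of the demeaned score should grow with age group, while the standardized score
* has a standard deviation of 1 in every group by construction.
tabstat score_demean standardized_score, by(age_group) statistics(sd)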