Wednesday, August 29, 2012

Calculating Mutually Exclusive Fixed Effects


* Suppose we want to estimate how effective each of 90 teachers is individually.  The teachers are spread across 3 grades (30 per grade) and each teaches 20 students.  Every student is assigned to exactly one teacher, and teachers are assigned to students randomly (an important assumption that is often violated).

clear
set obs 90
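
* Every variable below is built from random draws, so results change from run
* to run.  To make a run reproducible one could fix the seed first.  This is
* an addition to the original post, and the value 12345 is arbitrary.
set seed 12345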

gen teffect = rnormal()+1.5
  label var teffect "True Teacher Effect"

gen tid = _n

* This creates 3 grades, with 30 teachers assigned to each
gen grade = ceil(_n/30)

* We will expand to 20 students per teacher
expand 20

* This is the base student effect level.
gen student_effect = rnormal()

* This is the starting level of the students
gen start_level = rnormal()

* This is random, normally distributed variation in achievement gain over the year
gen u = rnormal()

gen current_level = teffect + student_effect + .75*start_level + 2*grade + u*5

* Multiplying start_level by .75 implies that students retain 75% of the ability that they had going into the school year.
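
* A quick sanity check on the simulated design (a sketch, not part of the
* original post): 90 teachers x 20 students should give 1,800 rows, with
* 30 teachers in each grade.  tag_t is a hypothetical helper variable that
* flags one row per teacher and is dropped afterwards.
assert _N == 1800
bysort tid: assert _N == 20
egen byte tag_t = tag(tid)
tab grade if tag_t
drop tag_t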

* Now we want to see how well we can infer teacher ability

tab tid, gen(tid_)

* We might want to start with a straightforward regression of current achievement on teacher id
reg current_level tid_2-tid_90 start_level

* Since tid=1 is omitted due to multicollinearity, we set its effect equal to 0 as the reference point.
gen reg_res = 0 if tid==1

* Note, you may want to identify the true magnitude of the teacher effect.  This, however, is not possible because all students have received teachers.  Therefore, at best we can only hope to estimate how good teachers are relative to each other.

forv i = 2/90 {
  cap replace reg_res = _b[tid_`i'] if tid == `i'
}

* The problem with this is now we have to figure out how to compare teachers.

* One way would be to correlate the estimated teacher effect with the true teacher effect (which we know).
corr reg_res teffect

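* This correlation tends to be weak, primarily because each teacher teaches in only one grade and grades have different learning effects.  The teacher dummies for grades 2 and 3 therefore absorb the grade effect (the 2*grade term in current_level) along with the teacher effect.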

spearman reg_res teffect
* The Spearman rank correlation fares even worse than the Pearson correlation
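
* To separate the between-grade problem from the within-grade signal, one
* could run the same correlation one grade at a time (a sketch, not part of
* the original analysis).  Within a grade the grade effect is constant, so
* it should no longer distort the comparison.
bysort grade: corr reg_res teffect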

two (scatter reg_res teffect if grade==1)  ///
    (scatter reg_res teffect if grade==2)  ///
    (scatter reg_res teffect if grade==3), ///
legend( label(1 "Grade 1") label(2 "Grade 2") label(3 "Grade 3"))

* We can see that there is generally a positive relationship between the true teacher effect and the estimated teacher effect across all grades.  However, within grades the relationship is even clearer.


* One may attempt to correct this problem by including grade dummies
reg current_level tid_2-tid_90 i.grade start_level

* However, the teacher dummies within a grade sum to that grade's dummy, so the model suffers from multicollinearity and Stata sometimes drops the grade dummies.
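
* One way to see where the collinearity comes from (a sketch, not part of
* the original post): the 30 teacher dummies for grade 2 add up exactly to
* the grade 2 indicator.  check_g2 is a hypothetical helper variable used
* only for this check.
egen check_g2 = rowtotal(tid_31-tid_60)
assert check_g2 == (grade == 2)
drop check_g2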

* To handle this, we drop the dummy for the first teacher in each grade and include all three grade dummies without a constant.

tab grade, gen(grade_)

reg current_level tid_2-tid_30 tid_32-tid_60 tid_62-tid_90 grade_1-grade_3, nocon

* This regression is still a little fishy, however.

* Within each grade, the estimated teacher effects are relative to the omitted teacher.

* Thus, if the omitted teacher is strong in grade 1 and weak in grade 2, the correlations will be thrown off.

gen reg_GD = 0 if tid==1 | tid==31 | tid == 61
forv i = 2/90 {
  cap replace reg_GD = _b[tid_`i'] if tid == `i'
}

corr reg_GD teffect
spearman reg_GD teffect
* Including the grade dummies greatly improves the teacher estimates.

* An alternative method would be to demean current achievement by grade

bysort grade: egen mean_current_level = mean(current_level)

gen dm_current_level = current_level-mean_current_level

reg dm_current_level tid_2-tid_90 start_level

gen dm_results = 0 if tid==1
forv i = 2/90 {
  cap replace dm_results = _b[tid_`i'] if tid == `i'
}

* As before, compare the estimates with the true teacher effects.
corr dm_results teffect
spearman dm_results teffect

* Finally, an alternative approach is to run the original regression but demean the teacher estimates by grade after estimation.

bysort grade: egen mean_reg_res = mean(reg_res)
gen dm_reg_res = reg_res - mean_reg_res

* Again, compare the demeaned estimates with the true teacher effects.
corr dm_reg_res teffect
spearman dm_reg_res teffect
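
* For a side-by-side view, one could put the true effect and all four sets
* of estimates in a single correlation table (a sketch; it only reuses
* variables created above).  The first column shows how each approach
* tracks teffect.
corr teffect reg_res reg_GD dm_results dm_reg_res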

two (scatter dm_reg_res teffect if grade==1)  ///
    (scatter dm_reg_res teffect if grade==2)  ///
    (scatter dm_reg_res teffect if grade==3), ///
legend( label(1 "Grade 1") label(2 "Grade 2") ///
label(3 "Grade 3")) title(Teacher estimates demeaned by grade)
