Wednesday, October 24, 2012

Hierarchical linear modelling


* Explorations in hierarchical linear modeling

* Hierarchical modeling is a type of model that assigns different portions of the unexplained error term to different levels.

* What does this mean?

* Imagine that you are interested in predicting student success.

* However, you believe that there is an individual effect for each student, each teacher, each school, and each district.

* You would like to know what portion of the unexplained variance can be attributed to each level.

* Let's see how this works.
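* In equation form, the data-generating process we are about to build is
* (the tutoring effect and the sd at each level are set below):

*   perform = 2*tutoring + dist_fe + school_fe + teacher_fe + student_fe + u

* where dist_fe ~ N(0,2^2), school_fe ~ N(0,3^2), teacher_fe ~ N(0,4^2), and
* student_fe ~ N(0,5^2) are each drawn once per unit at that level, while the
* shock u ~ N(0,6^2) is drawn once per student-semester observation.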

clear

* Imagine that you have data on 5 districts.
set obs 5

gen dist_id=_n

* Each district has an effect size sd=2
gen dist_fe = rnormal()*2

* Each district has 5 schools
expand 5

gen school_id=_n

* Each school has an effect size sd=3
gen school_fe = rnormal()*3

* Within each school you have 8 teachers
expand 8

gen teach_id = _n

* Each teacher has an individual effect size sd = 4
gen teacher_fe = rnormal()*4

* Each teacher has twenty students.

expand 20

gen student_id = _n

* Each of these students has an individual effect with sd = 5

gen student_fe = rnormal()*5

* Let's imagine for a second that each student stays with the same teacher for two semesters, and that we have information about student progress at the end of each semester.

expand 2

sort student_id

gen semester = mod(_n-1,2) + 1
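
* A quick optional sanity check: every student should now appear exactly twice,
* once per semester.
bysort student_id: assert _N == 2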

* There is some transient error (shocks) that affects student performance during each semester, sd = 6

gen u = rnormal()*6

* Finally let's imagine that there is some treatment such as extra tutoring that is randomly assigned to 20% of our students on a semester basis.

gen tutoring = rbinomial(1,.2)
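
* A quick optional check: tutoring should be assigned to roughly 20% of
* student-semester observations.
tab tutoring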

* Let's now generate our test results

gen perform = 2*tutoring + dist_fe + school_fe + teacher_fe + student_fe + u
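
* Optional check: since the components are independent, the sd of perform should
* be near sqrt(2^2+3^2+4^2+5^2+6^2) = sqrt(90), or about 9.5, plus a small
* contribution from the tutoring term.
summarize perform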

* For a first cut at analyzing these data, let's compare the mixed-effects HLM with a standard fixed-effects model and a random-effects model.

* The standard fixed-effects model estimates an effect for every individual.  In this case let us focus on students.

xtset student_id

xtreg perform tutoring, fe
* We can see that our estimate of the effect of tutoring is close to the true value of 2 and that the estimated residual standard deviation (sigma_e) is very close to 6, the sd of our transient shocks.

* We might also try to estimate tutoring with a random effects model at the student level.
xtreg perform tutoring, re

* We expect the re estimator to outperform the fe estimator in estimating the returns to tutoring because in this case we know that assignment to tutoring is truly orthogonal to the student fixed effects.
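
* One way to check this formally is a Hausman test (a quick sketch, rerunning
* the two models quietly in order to store their results).  With truly random
* assignment we expect the test not to reject random effects.
quietly xtreg perform tutoring, fe
estimates store fe
quietly xtreg perform tutoring, re
estimates store re
hausman fe re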

* Now let's compare our previous estimates to a single-level mixed-effects model.
xtmixed perform tutoring || student_id:

* We can see that the xtmixed command produces an estimate on tutoring very nearly identical to that of the random effects model.

* As with the previous two estimates, we are generally satisfied with the estimated standard deviations: the student-level sd(_cons) absorbs the combined student, teacher, school, and district effects (sqrt(2^2+3^2+4^2+5^2), about 7.3), and sd(Residual) is close to 6.
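
* To line the tutoring estimates up side by side (a sketch reusing the fe and re
* results stored for the Hausman test above):
quietly xtmixed perform tutoring || student_id:
estimates store mixed
estimates table fe re mixed, b se keep(tutoring)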

* Now let's start adding layers.

xtmixed perform tutoring || student_id: || teach_id: || school_id: || dist_id:

* Our estimate of tutoring has not improved.

* Likewise, the model does not seem able to distinguish the variance at each level: the district (sd of 2), the school (sd of 3), the teacher (sd of 4), and the student (sd of 5).

* Perhaps this is a result of the sample size being too small.

* At this point I will return to the beginning of the simulation and increase the number of districts to 50, which brings the number of observations to 80,000.  However, the xtmixed command is very slow and computationally intensive, so I will paste in the results directly.

/* The first estimation
Performing EM optimization:

Performing gradient-based optimization:

Iteration 0:   log restricted-likelihood =  -283974.7
Iteration 1:   log restricted-likelihood =  -283974.7

Computing standard errors:

Mixed-effects REML regression                   Number of obs      =     80000
Group variable: student_id                      Number of groups   =     40000

                                                Obs per group: min =         2
                                                               avg =       2.0
                                                               max =         2


                                                Wald chi2(1)       =    873.44
Log restricted-likelihood =  -283974.7          Prob > chi2        =    0.0000

------------------------------------------------------------------------------
     perform |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    tutoring |   1.975427   .0668413    29.55   0.000     1.844421    2.106434
       _cons |  -.4480866   .0439606   -10.19   0.000    -.5342478   -.3619255
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
student_id: Identity         |
                   sd(_cons) |   7.231256   .0354216      7.162163    7.301015
-----------------------------+------------------------------------------------
                sd(Residual) |   5.984532   .0211588      5.943205    6.026147
------------------------------------------------------------------------------
LR test vs. linear regression: chibar2(01) = 17369.05 Prob >= chibar2 = 0.0000

* The second estimation (this took a very long time)

Performing EM optimization:

Performing gradient-based optimization:

Iteration 0:   log restricted-likelihood = -284364.73
Iteration 1:   log restricted-likelihood = -283987.22  (not concave)
Iteration 2:   log restricted-likelihood = -283981.69  (backed up)
Iteration 3:   log restricted-likelihood =  -283974.7  (not concave)
Iteration 4:   log restricted-likelihood =  -283974.7  (backed up)
Iteration 5:   log restricted-likelihood =  -283974.7  (not concave)
Iteration 6:   log restricted-likelihood =  -283974.7  (backed up)

Computing standard errors:

Mixed-effects REML regression                   Number of obs      =     80000

-----------------------------------------------------------
                |   No. of       Observations per Group
 Group Variable |   Groups    Minimum    Average    Maximum
----------------+------------------------------------------
     student_id |    40000          2        2.0          2
       teach_id |    40000          2        2.0          2
      school_id |    40000          2        2.0          2
        dist_id |    40000          2        2.0          2
-----------------------------------------------------------

                                                Wald chi2(1)       =    873.44
Log restricted-likelihood =  -283974.7          Prob > chi2        =    0.0000

------------------------------------------------------------------------------
     perform |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    tutoring |   1.975427   .0668413    29.55   0.000     1.844421    2.106434
       _cons |  -.4480866   .0439606   -10.19   0.000    -.5342478   -.3619253
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
student_id: Identity         |
                   sd(_cons) |   3.621455   11.10185      .0089014     1473.35
-----------------------------+------------------------------------------------
teach_id: Identity           |
                   sd(_cons) |   3.608619   9.417657      .0216722    600.8684
-----------------------------+------------------------------------------------
school_id: Identity          |
                   sd(_cons) |   3.681086   8.431311       .041338    327.7954
-----------------------------+------------------------------------------------
dist_id: Identity            |
                   sd(_cons) |   3.550187   8.323714       .035854    351.5322
-----------------------------+------------------------------------------------
                sd(Residual) |   5.984525   .0211586      5.943198    6.026139
------------------------------------------------------------------------------
LR test vs. linear regression:       chi2(4) = 17369.05   Prob > chi2 = 0.0000

Note: LR test is conservative and provided only for reference.              */

* So?  We can see that our estimates do not improve substantially with more data.  They may even be worse.

4 comments:

  1. You have your nesting backwards. It should be:

    xtmixed perform tutoring || dist_id: || school_id: || teach_id: || student_id:

    Scott

  2. You're comparing the mean structure of models estimated using REML. Use ML instead.

    Reply: I am afraid I don't know what you are talking about with REML