Tuesday, March 26, 2013

A simple simulation of Hierarchical Linear Modelling (HLM) using two different R packages - intercept only RE

# HLM is a common tool used to analysize hierarchically structured data.

# Let's see it in action using a couple of the tools programmed up by HLM researchers.

# First let's generate a data set.

# Imagine we have a two level model, students nested within classrooms.

# Let's say we have 20 classrooms

nclass = 20

# And thirty students per classroom

nstud = 30

# Let's imagine that we have a classroom effect that varies randomly and is uncorrelated with the student level effect.

class.effect = rnorm(nclass)*2

# Imagine also we have a unique outcome per student.

# We have nclass*nstud number of students in total.

student.effect = rnorm(nstud*nclass)*3

# Now we have all the data we need in order to generate our analysis.
data.set = data.frame(class.id = rep(1:nclass, each=nstud),
class.effect = rep(class.effect, each=nstud),
student.id = 1:(nstud*nclass),
student.effect = student.effect)

data.set$outcomes = data.set$class.effect + data.set\$student.effect

head(data.set)
# Looking good.  Now let's load our HLM package.

require(nlme)

# For quick reference I used:
# https://txcdk.unt.edu/iralab/sites/default/files/HLM_R_Handout.pdf

lme(outcomes ~ 1, random = ~ 1 | class.id, data=data.set)

# We can see that lme is pretty good at estimating a standard deviation of the level 2 effect at around 2 and a level 3 effect around 3.

# Alternatively we can attempt to use the software lme4.
require(lme4)

# For quick reference I went to:
http://lme4.r-forge.r-project.org/book/Ch1.pdf

lmer(outcomes ~ 1 + (1|class.id), data=data.set)
# Reassuringly we can see that the estimates are indentical though lme4 seems to provide more better organized feedback.

2 comments:

1. As you show, it's relatively straightforward to simulate independent data. But what about a situation where student and class are not independent of each other? Taking it to an even more complex level, what about simulating student-level and classroom-level covariates that are perhaps intercorrelated with other predictors and with the outcome in a regression model? It would be interesting to learn how to simulate data with dependencies such as these. Does R provide an easy way to create such simulations?

1. This is a good point. Independence is a large assumption that should not be lightly made. I will post more on this is the future.