Saturday, April 13, 2013

Meta-Analysis Simulation - Monte Carlo

# Meta-Analysis is a method that allows researchers to merge the scientific results from a number of studies into a single analysis.  This is an especially useful method because it allows for a collective estimate of an effect using a pool of data much large than could be achieved with any single study.
n.studies = 50

# This stores the simulation results from all runs varying the true effect.
total.sim.results=matrix(NA, nrow=0, ncol=5)

# This is true we want to detect.
for (true.eff in c(0,.05,.10,.15,.20)) {

ind.sd = 1

s.scalar = 100
# Study scale is drawn from a chi-squared distribution times the scalar + 5

# unit scale. Normal distribution.
mean.scale = 1

# sd scale. Uniform distribution.
sd.scale = 20

# In this analysis I will structure this as a Monte Carlo simulation to test how this method works.  Please note, I am just working from class notes on this so I might be formulating the meta-analysis proceedure incorrectly.
n.simulations = 100
# We will save the results of the Monte Carlo in a matrix.
sim.results = matrix(NA, nrow=n.simulations, ncol=5)

print(paste("True Effect:",true.eff))

for (s in 1:n.simulations) {


# First let's generate our study parameters:

s.size = ceiling(rnorm(n.studies)^2 * s.scalar)+15
s.sd = runif(n.studies)*sd.scale
s.mean = rnorm(n.studies)*mean.scale

# For didactic purposes I will simulate the individual studies though this is not neccessary.

ind.treat = ind.data = matrix(NA,nrow=max(s.size), ncol=n.studies)

est.corr = est.y.sd = est.x.sd = est.effect = rep(NA, n.studies)

for (i in 1:n.studies) {
  ind.treat[1:s.size[i],i] = rnorm(s.size[i])>0
  ind.data[1:s.size[i],i] = (true.eff*ind.treat[1:s.size[i],i] + rnorm(s.size[i])*ind.sd)*s.sd[i] + s.mean[i]

  # Estimate effect for each
  est.effect[i] = lm(ind.data[1:s.size[i],i]~ind.treat[1:s.size[i],i])$coef[2]

  # Estimate the standard deviation for each study
  est.y.sd[i] = sd(ind.data[1:s.size[i],i], na.rm=T)
  est.x.sd[i] = sd(ind.treat[1:s.size[i],i], na.rm=T)

  # Estimate correlation between treatment and outcome
  est.corr[i] = cor(ind.data[1:s.size[i],i], ind.treat[1:s.size[i],i])
}
# Because everybody is using a different scale, the estimated affect is unique to that study's scale.
est.effect

# We therefore divide the estimated effect by the
z.est.effect = est.effect/est.y.sd

z.est.effect

# We can convert our standardized estimated effects to correlations:
z.est.effect*est.x.sd

est.corr

# In order to aid analysis, tranform the data to fisher z
z.corr = .5*log((1+est.corr)/(1-est.corr))

mean(z.corr)

# The variance of the inidividual Fisher zs can be approximated with:
z.var = 1/(s.size-3)

# Calculate the between study variance
between.var = var(z.corr)

# In order to construct the test we need to specify weights which are equal to the inverse of the variance the study plus the variance between studies.
weight = 1/(between.var+z.var)

# Now we can estimate to see if our effect is statistically signficant.
  (sim.results[s,]=c(true.eff,summary(lm(z.corr~1, weights=weight))$coef))
# Name the columns of the results matrix.
  if (s==1) colnames(sim.results)=c("true.eff" ,colnames(summary(lm(z.corr~1, weights=weight))$coef))
# Provide feedback to the user.
  print(paste("Number of repetitions completed:",s))
}
total.sim.results = rbind(total.sim.results,sim.results)
}

# We can use the Monte Carlos results to test several features:

# 1. Test if our estimator is overpowered.
# To do this we set the true effect equal to zero and see how frequently we reject at the 10% and 5% level (true.eff=0)
mean(total.sim.results[total.sim.results[,1]==0,5]<.1)
# If this number is substantially higher than 10% or 5% then we should worry about our estimator giving us results when there actually are no results.

# 2. We can test the power of our analysis to reject the null when there is an effect of a particular size (for example: true.eff=.1)
mean(total.sim.results[total.sim.results[,1]==.1,5]<.1)
# true.eff=.2
mean(total.sim.results[total.sim.results[,1]==.2,5]<.1)

boxplot(total.sim.results[,5]~total.sim.results[,1], main="P values", ylab="P value", xlab="True Effect")
abline(h=.1, col=grey(.75))


2 comments:

  1. Your blog is hard to read due to the difficulty in separating text from r code. Why not separate the two instead of comments throughout the code?

    ReplyDelete
    Replies
    1. Please see next post addressing this issue.

      Delete