Saturday, October 27, 2012

Power Analysis

# Power analysis is a frequently used tool to calculate the minimum sample size neccessary in order to show that a hypothesized result exists given some specified rate of type I and type II error.  A typical level of error for type I is alpha=5%.  Type II error for the purpose of power analysis is often specified at a beta=20% level.

# Let's first look at how selecting a sample corresponds to rejection rates and type I errors. We can do this based on the normality assumption of the underlying distribution of errors.  If we assume that we know that the underlying errors are normally distributed then we can say with exact precision how frequently a difference in means will reject the null for a particular sample size.

# A critical feature of power analysis is the specification of the expected effect size.  The larger the specified effect size the smaller the statistical sample we need in order to reject the null.  Symmetrically, the larger the effect size you assume the more likely you will fail to reject the null in practice as your analysis fails to have sufficient power.  Thus, choosing expected effect size is a non-trivial task.  Sometimes, discipline specific requirements will inform your decision as to the a-priori effect size.  For instance in educational interventions, a new technology that is expensive may not be considered economically feasible unless it increases student performance by .1 on overall GPA.
d = .1

# It is also important to either have information about the population's sample variance or to make some assumption about the populations variance.

# I assume that the sample's standard deviation of GPAs is 1.
s = 1

# I would like to first look at the number of students that may be part of the study.  In this case anywhere between 2 and 500.
n = 2:500

# The standard errror is:
se = s/(n^.5)

# The resulting t score is:
t = d/se

# Probability of rejecting the null is drawn from the Student t-distribution because we are assuming a two sided rejection rate.
p <- pt(t,n)

plot(n, p, ylim=c(.5,1), ylab="Probability", col="blue",
          xlab="Sample Size", type="l", yaxs = "i", xaxs = "i" ,
          lwd=2, main="Probability of Rejecting the Null")
# I choose .975 because 5% significance level on a two sided test leaves only 2.5% in each tail.
lines(y=c(0.975,0.975), x=c(2,500), col="red", lwd=2)

# We can see that for an effect size that is only 10% of the standard deviation of the error we would need nearly 400 people before we can be confident of rejecting the null 95% of the time.  I believe there is a symetric argument for type I errors as well.  I will attempt to address this more later.

# At this point one may wonder if we really need to hassle with the t-statistic.  For the most part you would be right.  The difference between the t and the z is extremely small.

z = d/se
pz <- pnorm(z)
plot(n, p, ylim=c(.5,1), ylab="Probability", col="blue",
          xlab="Sample Size", type="l", yaxs = "i", xaxs = "i" ,
          lwd=8, main=c("The difference between t and z distributions is", "not driving results: t=blue, z=green"))
lines(n, pz, col="green", lwd=3)
lines(y=c(0.975,0.975), x=c(2,500), col="red", lwd=2)

# Thus, for simplicity, we will use the z distribution.

# Let's look briefly at what happens when we vary effect size.

# Start with a blank graph with a range of n from 2 to 100
n<- 2:100

# we have to redefine the standard error vector
se = s/(n^.5)

plot(n, n*0, ylim=c(.5,1), ylab="Probability", col="blue",
          xlab="Sample Size", type="n", yaxs = "i", xaxs = "i" ,
          lwd=2, main="Probability of Rejecting the Null")

# Now let's loop through the difference in power as a function of effect size:
for (d in seq(.1,1,.1)) {
  z = d/se
  pz <- pnorm(z)
  lines(n, pz, col=gray(d), lwd=2)
lines(y=c(0.975,0.975), x=c(2,200), col="orange", lwd=2)

# I will now strengthen the axes since they seem to have been somewhat overwritten by the other lines on the graph.
lines(y=c(1,1), x=c(2,200), lwd=2)      
lines(y=c(0,1), x=c(2,2), lwd=2)
# Thus we can see that if the effect is close to one standard deviation, then a very small sample size is all that is needed to identify the effect.

# Overall the construction of power analysis is found by rearranging the common form of the t-test.

# t = (M0 - M1)/(SE)
# SE = s/n^.5
# t=n^.5 * (M0-M1)/s
# (t*s/(M0-M1))^2 = n
# d=M0-M1
# n = t^2 (var/d^2)

# In order to find t you find what t value is required in order to reject at the desired level.

# T~1.96 at 5% rejection rate.

# Let's imagine our previous example

# Let's see how our required sample size increases as our desired rejection rate also increases.
alpha = seq(.5,.975,.025)

target.z <- qnorm(alpha)

nreq = (target.z)^2*s^2/d^2

plot(nreq, alpha, type="l", main="As the required rejection rate increases, patient count increases")

# Thus on a target difference in means of .1 standard deviation, there need be at least:

# Around 384 observations.  (This is of course because the effect size is very small.)

# Statisticians often do not stop there.  They often also assume that the probabiliy of failing to reject the null when there actually is an effect (Type II error) is independent of Type I errors and thus need to be taken into account.  This is done by saying that we are willing to accept a 20% possibility that there is an effect but we are not detecting it.  We can adjust the requirement simply by adding an additional z score:

# This nearly doubles the required sample.

# At this point I would like to make the caveat that most of my posts I simply pull out of my head.  Thus, if there is something wrong I welcome corrections and apoligize.  I say this now because I am generally unfamiliar with power analysis and am just piecing together this simulation as a learning device.

1 comment:

  1. Great post. Shouldn't the line:
    p <- pt(t,n)
    Actually be:
    p <- pt(t,n-1)

    Since there are n-1 degrees of freedom for one sample, on statistic test?