Sample Size Calculation Using R

Preample

“Sample Size Calculation Using R” – the title of this blog post – sounds very comprehensive. However, this post only deals with the sample size collection for one particular test (two sample t-test). A collection of functions for basic sample size calculation can be found in the pwr package.

Introduction

A couple of months ago, I attended some statistics course about sample size calculation and sample size adjustment. Amongst others, we spoke about a randomized placebo-controlled trial with patients suffering of depression. The trial's primary end point was the difference between placebo and treatment group in the HAM-D total score (some depression score) between baseline and the end of therapy. During the course, the professor asked us to write a “two-liner” returning the sample size per group for a two-sided, two sample t-test with the following parameters:

• alpha = 0.05,
• sigma = 8 ,
• delta = 4 and
• power ≤ 0.90

Here is my solution:

In a first step, I wrote a loop returning the power rising from 0.1 to 0.9 and the corresponding sample sizes respectively. The loop may either be a repeat or a while loop.

While Loop

A while loop runs and repeats while a specified condition returns TRUE. The loop terminates only when the condition evaluates to FALSE. In our case, the loop stops evaluating as soon as p > 0.9.

p <- 0.05
while(p <= 0.9){
p <- p + 0.05
t <- power.t.test(sd = 8, delta = 4, sig.level=0.05, power = p, type="two.sample", alternative = "two.sided")
print(paste(t\$power, ceiling(t\$n), sep = ' - '))
}

Repeat Loop

In a repeat loop it is required to define a so-called break declaration to stop repeating the code. Just like the while loop, our repeat loop stops evaluating as soon as p > 0.9.

Both, the while and the repeat loop return the same result.

p <- 0.05
repeat{
p <- p + 0.05
t <- power.t.test(sd = 8, delta = 4, sig.level=0.05, power = p, type="two.sample", alternative = "two.sided")
print(paste(t\$power, ceiling(t\$n), sep = ' - '))
if(p > 0.90){
break
}
}
##  "0.1 - 5"
##  "0.15 - 8"
##  "0.2 - 12"
##  "0.25 - 15"
##  "0.3 - 18"
##  "0.35 - 21"
##  "0.4 - 25"
##  "0.45 - 28"
##  "0.5 - 32"
##  "0.55 - 36"
##  "0.6 - 41"
##  "0.65 - 45"
##  "0.7 - 51"
##  "0.75 - 57"
##  "0.8 - 64"
##  "0.85 - 73"
##  "0.9 - 86"

For a very good introduction to loops, see Davies (2016): The Book of R. No Starch Press: San Francisco.

Storing the Results

In case we want to use the results for whatever purpose, we need to store them in a vector. The following example heavily borrows from StackOverflow. The loop returns a data frame containing two variables: pwr (power) and num (sample size).

p <- 0.05
pwr <- c()
num <- c()
while(p <= 0.9){
p <- p + 0.05
t <- power.t.test(sd = 8, delta = 4, sig.level=0.05, power = p, type="two.sample", alternative = "two.sided")
pwr <- c(pwr, t\$power)
num <- c(num, ceiling(t\$n))
df <- data.frame(pwr = pwr, n = num)
}
rm(list = c(setdiff(ls(), c("df"))))
dplyr::glimpse(df)
## Observations: 17
## Variables: 2
## \$ pwr <dbl> 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50, 0.55...
## \$ n   <dbl> 5, 8, 12, 15, 18, 21, 25, 28, 32, 36, 41, 45, 51, 57, 64, ...

Plotting Power and Sample Size

Finally, we plot the relation between power and sample size using the ggplot2 package. The theme I use, stems from the ggpubr package.

ggplot(df, aes(pwr, n)) +
geom_point() +
scale_x_continuous(limits = c(0, 1), breaks = seq(0, 1, 0.1)) +
scale_y_continuous(breaks = seq(0, max(df\$n)+10, 10)) +
geom_text(aes(label = n), nudge_y = 4) +
ggpubr::theme_pubr() +
labs(x = 'Power', y = 'Sample Size',
title = 'Power and Sample Size',
subtitle = expression('Two-sided, two sample t-test with' ~ sigma ~ '= 8,' ~ delta ~ '= 4,' ~ alpha ~ '= 0.05')) As we see, a power of 0.90 requires a sample size of 86 patients per treatment group for a two-sided, two sample t-test with sd = 8, delta = 4, and alpha = 0.05. 