Preample
“Sample Size Calculation Using R” – the title of this blog post – sounds very comprehensive. However, this post only deals with the sample size collection for one particular test (two sample t-test). A collection of functions for basic sample size calculation can be found in the pwr
package.
Introduction
A couple of months ago, I attended some statistics course about sample size calculation and sample size adjustment. Amongst others, we spoke about a randomized placebo-controlled trial with patients suffering of depression. The trial's primary end point was the difference between placebo and treatment group in the HAM-D total score (some depression score) between baseline and the end of therapy. During the course, the professor asked us to write a “two-liner” returning the sample size per group for a two-sided, two sample t-test with the following parameters:
- alpha = 0.05,
- sigma = 8 ,
- delta = 4 and
- power ≤ 0.90
Here is my solution:
In a first step, I wrote a loop returning the power rising from 0.1 to 0.9 and the corresponding sample sizes respectively. The loop may either be a repeat or a while loop.
While Loop
A while
loop runs and repeats while a specified condition returns TRUE
. The loop terminates only when the condition evaluates to FALSE
. In our case, the loop stops evaluating as soon as p > 0.9.
p <- 0.05 while(p <= 0.9){ p <- p + 0.05 t <- power.t.test(sd = 8, delta = 4, sig.level=0.05, power = p, type="two.sample", alternative = "two.sided") print(paste(t$power, ceiling(t$n), sep = ' - ')) }
Repeat Loop
In a repeat
loop it is required to define a so-called break
declaration to stop repeating the code. Just like the while
loop, our repeat
loop stops evaluating as soon as p > 0.9.
Both, the while
and the repeat
loop return the same result.
p <- 0.05 repeat{ p <- p + 0.05 t <- power.t.test(sd = 8, delta = 4, sig.level=0.05, power = p, type="two.sample", alternative = "two.sided") print(paste(t$power, ceiling(t$n), sep = ' - ')) if(p > 0.90){ break } }
## [1] "0.1 - 5" ## [1] "0.15 - 8" ## [1] "0.2 - 12" ## [1] "0.25 - 15" ## [1] "0.3 - 18" ## [1] "0.35 - 21" ## [1] "0.4 - 25" ## [1] "0.45 - 28" ## [1] "0.5 - 32" ## [1] "0.55 - 36" ## [1] "0.6 - 41" ## [1] "0.65 - 45" ## [1] "0.7 - 51" ## [1] "0.75 - 57" ## [1] "0.8 - 64" ## [1] "0.85 - 73" ## [1] "0.9 - 86"
For a very good introduction to loops, see Davies (2016): The Book of R. No Starch Press: San Francisco.
Storing the Results
In case we want to use the results for whatever purpose, we need to store them in a vector. The following example heavily borrows from StackOverflow. The loop returns a data frame containing two variables: pwr
(power) and num
(sample size).
p <- 0.05 pwr <- c() num <- c() while(p <= 0.9){ p <- p + 0.05 t <- power.t.test(sd = 8, delta = 4, sig.level=0.05, power = p, type="two.sample", alternative = "two.sided") pwr <- c(pwr, t$power) num <- c(num, ceiling(t$n)) df <- data.frame(pwr = pwr, n = num) } rm(list = c(setdiff(ls(), c("df")))) dplyr::glimpse(df)
## Observations: 17 ## Variables: 2 ## $ pwr <dbl> 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50, 0.55... ## $ n <dbl> 5, 8, 12, 15, 18, 21, 25, 28, 32, 36, 41, 45, 51, 57, 64, ...
Plotting Power and Sample Size
Finally, we plot the relation between power and sample size using the ggplot2
package. The theme I use, stems from the ggpubr
package.
ggplot(df, aes(pwr, n)) + geom_point() + scale_x_continuous(limits = c(0, 1), breaks = seq(0, 1, 0.1)) + scale_y_continuous(breaks = seq(0, max(df$n)+10, 10)) + geom_text(aes(label = n), nudge_y = 4) + ggpubr::theme_pubr() + labs(x = 'Power', y = 'Sample Size', title = 'Power and Sample Size', subtitle = expression('Two-sided, two sample t-test with' ~ sigma ~ '= 8,' ~ delta ~ '= 4,' ~ alpha ~ '= 0.05'))
As we see, a power of 0.90 requires a sample size of 86 patients per treatment group for a two-sided, two sample t-test with sd = 8, delta = 4, and alpha = 0.05.