### Preample

“Sample Size Calculation Using R” – the title of this blog post – sounds very comprehensive. However, this post only deals with the sample size collection for one particular test (two sample t-test). A collection of functions for basic sample size calculation can be found in the `pwr`

package.

### Introduction

A couple of months ago, I attended some statistics course about sample size calculation and sample size adjustment. Amongst others, we spoke about a randomized placebo-controlled trial with patients suffering of depression. The trial's primary end point was the difference between placebo and treatment group in the HAM-D total score (some depression score) between baseline and the end of therapy. During the course, the professor asked us to write a “two-liner” returning the sample size per group for a two-sided, two sample t-test with the following parameters:

- alpha = 0.05,
- sigma = 8 ,
- delta = 4 and
- power ≤ 0.90

Here is my solution:

In a first step, I wrote a loop returning the power rising from 0.1 to 0.9 and the corresponding sample sizes respectively. The loop may either be a repeat or a while loop.

### While Loop

A `while`

loop runs and repeats while a specified condition returns `TRUE`

. The loop terminates only when the condition evaluates to `FALSE`

. In our case, the loop stops evaluating as soon as p > 0.9.

p <- 0.05 while(p <= 0.9){ p <- p + 0.05 t <- power.t.test(sd = 8, delta = 4, sig.level=0.05, power = p, type="two.sample", alternative = "two.sided") print(paste(t$power, ceiling(t$n), sep = ' - ')) }

### Repeat Loop

In a `repeat`

loop it is required to define a so-called `break`

declaration to stop repeating the code. Just like the `while`

loop, our `repeat`

loop stops evaluating as soon as p > 0.9.

Both, the `while`

and the `repeat`

loop return the same result.

p <- 0.05 repeat{ p <- p + 0.05 t <- power.t.test(sd = 8, delta = 4, sig.level=0.05, power = p, type="two.sample", alternative = "two.sided") print(paste(t$power, ceiling(t$n), sep = ' - ')) if(p > 0.90){ break } }

## [1] "0.1 - 5" ## [1] "0.15 - 8" ## [1] "0.2 - 12" ## [1] "0.25 - 15" ## [1] "0.3 - 18" ## [1] "0.35 - 21" ## [1] "0.4 - 25" ## [1] "0.45 - 28" ## [1] "0.5 - 32" ## [1] "0.55 - 36" ## [1] "0.6 - 41" ## [1] "0.65 - 45" ## [1] "0.7 - 51" ## [1] "0.75 - 57" ## [1] "0.8 - 64" ## [1] "0.85 - 73" ## [1] "0.9 - 86"

For a very good introduction to loops, see Davies (2016): The Book of R. No Starch Press: San Francisco.

### Storing the Results

In case we want to use the results for whatever purpose, we need to store them in a vector. The following example heavily borrows from StackOverflow. The loop returns a data frame containing two variables: `pwr`

(power) and `num`

(sample size).

p <- 0.05 pwr <- c() num <- c() while(p <= 0.9){ p <- p + 0.05 t <- power.t.test(sd = 8, delta = 4, sig.level=0.05, power = p, type="two.sample", alternative = "two.sided") pwr <- c(pwr, t$power) num <- c(num, ceiling(t$n)) df <- data.frame(pwr = pwr, n = num) } rm(list = c(setdiff(ls(), c("df")))) dplyr::glimpse(df)

## Observations: 17 ## Variables: 2 ## $ pwr <dbl> 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50, 0.55... ## $ n <dbl> 5, 8, 12, 15, 18, 21, 25, 28, 32, 36, 41, 45, 51, 57, 64, ...

### Plotting Power and Sample Size

Finally, we plot the relation between power and sample size using the `ggplot2`

package. The theme I use, stems from the `ggpubr`

package.

ggplot(df, aes(pwr, n)) + geom_point() + scale_x_continuous(limits = c(0, 1), breaks = seq(0, 1, 0.1)) + scale_y_continuous(breaks = seq(0, max(df$n)+10, 10)) + geom_text(aes(label = n), nudge_y = 4) + ggpubr::theme_pubr() + labs(x = 'Power', y = 'Sample Size', title = 'Power and Sample Size', subtitle = expression('Two-sided, two sample t-test with' ~ sigma ~ '= 8,' ~ delta ~ '= 4,' ~ alpha ~ '= 0.05'))

As we see, a power of 0.90 requires a sample size of 86 patients per treatment group for a two-sided, two sample t-test with sd = 8, delta = 4, and alpha = 0.05.