## Intro

In June 2017 I started working at the Clinical Trial Centre Leipzig at Leipzig University. Since my knowledge of statistics is rather poor, my employer offered to send me to some seminars in Medical Biometry at the University of Heidelberg. The first seminar I attended was called “Basics of Epidemiology”. On the first day, we learned how to calculate so-called odds ratios in case-control studies using a simple pocket calculator.

In this blog post, I will show how to calculate a simple odds ratio with a 95% confidence interval (CI) using R.

## Data simulation

The data I'm using in this blog post were simulated using the `wakefield` package. The following code returns a data frame with 2 binary variables (`Exposition` and `Disease`) and 1,000 observations.

    library(wakefield)

    # No seed is set, so the counts will differ from run to run
    mydata <- data.frame(
      Exposition = group(n = 1000, x = c('yes', 'no'), prob = c(0.75, 0.25)),
      Disease    = group(n = 1000, x = c('yes', 'no'), prob = c(0.75, 0.25))
    )
    dim(mydata)

    ## [1] 1000    2

    head(mydata)

    ##   Exposition Disease
    ## 1        yes     yes
    ## 2         no      no
    ## 3         no     yes
    ## 4        yes     yes
    ## 5        yes     yes
    ## 6        yes     yes

Based on this data frame, we calculate a table showing how many patients with and without the exposure did or did not develop the disease. Rows are `Exposition`, columns are `Disease`.

    tab <- table(mydata$Exposition, mydata$Disease)
    tab

    ##
    ##       yes  no
    ##   yes 569 210
    ##   no  163  58

## Odds ratio calculation

To find out whether the odds of developing the disease are significantly higher in exposed patients, we need to calculate the odds ratio and its 95% CI.
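For reference, these are the usual textbook formulas for a 2×2 table. The cell labels are my own notation and do not appear in the code: $a$ = exposed and diseased, $b$ = exposed and not diseased, $c$ = unexposed and diseased, $d$ = unexposed and not diseased.

$$
\mathrm{OR} = \frac{a \cdot d}{b \cdot c},
\qquad
\mathrm{CI}_{95\%} = \exp\!\left( \ln(\mathrm{OR}) \pm 1.96 \cdot \sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}} \right)
$$

The square root is the standard error of $\ln(\mathrm{OR})$, and 1.96 is the (rounded) 97.5% quantile of the standard normal distribution. With the counts from the table above, $\mathrm{OR} = (569 \cdot 58) / (210 \cdot 163)$.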

The following function will return a data frame containing these values.

    # return odds ratio with 95% CI
    f <- function(x) {
      # x is a 2x2 table; x[1] to x[4] address its cells column by column
      or <- round((x[1] * x[4]) / (x[2] * x[3]), 2)
      # standard error of log(OR)
      se <- sqrt(1/x[1] + 1/x[2] + 1/x[3] + 1/x[4])
      # note: the limits are computed from the already rounded OR
      cil <- round(exp(log(or) - 1.96 * se), 2)
      ciu <- round(exp(log(or) + 1.96 * se), 2)
      # one-row data frame: lower limit, OR, upper limit
      data.frame(CI_95_lower = cil, OR = or, CI_95_upper = ciu)
    }
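The function addresses the table cells with single numbers (`x[1]` to `x[4]`), which relies on R storing tables column by column. The following is a minimal sketch of that mapping. It uses a plain matrix with the counts shown earlier as a stand-in for `tab`, and the names `cells`, `odds_exposed`, `odds_unexposed` and `odds_ratio` are only illustrative.

```r
# Stand-in for `tab`: the counts shown earlier in this post
# (rows = Exposition, columns = Disease)
tab <- matrix(c(569, 163, 210, 58), nrow = 2,
              dimnames = list(Exposition = c("yes", "no"),
                              Disease    = c("yes", "no")))

# Single-number indexing runs down the first column, then the second
cells <- c(
  exposed_diseased   = tab["yes", "yes"],  # same cell as tab[1]
  unexposed_diseased = tab["no",  "yes"],  # same cell as tab[2]
  exposed_healthy    = tab["yes", "no"],   # same cell as tab[3]
  unexposed_healthy  = tab["no",  "no"]    # same cell as tab[4]
)
stopifnot(all(unname(cells) == tab[1:4]))

# Odds of disease within each exposure group, and their ratio
odds_exposed   <- cells[["exposed_diseased"]]   / cells[["exposed_healthy"]]
odds_unexposed <- cells[["unexposed_diseased"]] / cells[["unexposed_healthy"]]
odds_ratio     <- odds_exposed / odds_unexposed

# Same value as the cross-product form (x[1] * x[4]) / (x[2] * x[3])
stopifnot(isTRUE(all.equal(odds_ratio, (tab[1] * tab[4]) / (tab[2] * tab[3]))))
```

Addressing the cells by name is more verbose, but it makes it harder to swap rows and columns by accident, for example when the table is built with the arguments in the opposite order.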

Now we can apply the function to our table `tab`.

    df.or <- f(tab)
    knitr::kable(df.or, align = 'c')

| CI_95_lower | OR | CI_95_upper |
|:---:|:---:|:---:|
| 0.68 | 0.96 | 1.35 |

As the results indicate, the 95% CI (0.68 to 1.35) includes 1, so exposed patients do not have significantly higher odds of developing the disease than unexposed patients.
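One possible cross-check, which is not part of the seminar material, is to fit a logistic regression to the same counts with base R's `glm()`. The exponentiated coefficient of the exposure is the odds ratio, and `confint.default()` gives a Wald interval on the log scale. The weighted long-format setup and the names `cells`, `fit`, `or_glm` and `ci_glm` are my own assumptions for this sketch.

```r
# Counts copied from the 2x2 table in this post
# (rows = Exposition, columns = Disease)
tab <- matrix(c(569, 163, 210, 58), nrow = 2,
              dimnames = list(Exposition = c("yes", "no"),
                              Disease    = c("yes", "no")))

# Long format: one row per cell, with the cell count in column Freq
cells <- as.data.frame(as.table(tab))
cells$diseased <- as.integer(cells$Disease == "yes")
cells$exposed  <- as.integer(cells$Exposition == "yes")

# Logistic regression, weighting each cell by its count
fit <- glm(diseased ~ exposed, family = binomial,
           weights = Freq, data = cells)

# Odds ratio and Wald 95% CI on the odds scale
or_glm <- exp(coef(fit)["exposed"])
ci_glm <- exp(confint.default(fit)["exposed", ])

round(c(ci_glm[1], OR = unname(or_glm), ci_glm[2]), 2)
```

This version works with the unrounded odds ratio and the exact normal quantile instead of 1.96, so its limits can differ from the table above in the second decimal place.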
