How to calculate Odds Ratios in Case-Control Studies using R


In June 2017 I started working at the Clinical Trial Centre Leipzig at Leipzig University. Since my knowledge of statistics is rather poor, my employer offered me the chance to attend some seminars in Medical Biometry at the University of Heidelberg. The first seminar I attended was called “Basics of Epidemiology”. On the first day, we learned how to calculate so-called odds ratios in case-control studies using a simple pocket calculator.

In this blog post, I will show how to calculate a simple odds ratio with its 95% confidence interval (CI) using R.

Data simulation

The data I'm using in this blog post were simulated using the wakefield package. The following code returns a data frame with 2 binary variables (Exposition and Disease) and 1,000 rows.


library(wakefield)

mydata <- data.frame(Exposition = group(n = 1000, x = c('yes', 'no'), 
                             prob = c(0.75, 0.25)),
                     Disease = group(n = 1000, x = c('yes', 'no'), 
                             prob = c(0.75, 0.25)))
dim(mydata)
## [1] 1000    2
head(mydata)
##   Exposition Disease
## 1        yes     yes
## 2         no      no
## 3         no     yes
## 4        yes     yes
## 5        yes     yes
## 6        yes     yes
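
If the wakefield package is not available, a similar data frame can be simulated with base R alone. The following is only a sketch under assumptions: the object name sim and the seed are arbitrary choices of mine, and the random draws will not reproduce the counts shown in this post.

```r
# Base R sketch of the same simulation: two binary factors, each "yes"
# with probability 0.75, drawn independently of each other.
set.seed(2017)  # arbitrary seed, used only to make the sketch reproducible
n <- 1000

sim <- data.frame(
  Exposition = factor(sample(c("yes", "no"), n, replace = TRUE,
                             prob = c(0.75, 0.25)),
                      levels = c("yes", "no")),
  Disease    = factor(sample(c("yes", "no"), n, replace = TRUE,
                             prob = c(0.75, 0.25)),
                      levels = c("yes", "no"))
)

dim(sim)
head(sim)
```

Fixing the factor levels to c("yes", "no") keeps "yes" in the first row and first column of any table built from these variables, which is the layout the rest of the post relies on.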

Based on this data frame, we build a 2x2 table of exposure (rows) against disease status (columns). It shows how many exposed and unexposed patients did or did not develop the disease.

tab <- table(mydata$Exposition, mydata$Disease)
tab
##       yes  no
##   yes 569 210
##   no  163  58
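
To make the pocket-calculator step explicit, the odds ratio can be worked out by hand from the four cells of this table. The sketch below rebuilds the table from the printed counts so that it runs on its own; the names odds_cases, odds_controls and or_hand are mine.

```r
# Rebuild the 2x2 table from the counts printed above
# (rows: Exposition, columns: Disease).
tab <- as.table(matrix(c(569, 163, 210, 58), nrow = 2,
                       dimnames = list(Exposition = c("yes", "no"),
                                       Disease    = c("yes", "no"))))

# Odds of exposure among the diseased (cases) and the non-diseased (controls)
odds_cases    <- tab["yes", "yes"] / tab["no", "yes"]  # 569 / 163
odds_controls <- tab["yes", "no"]  / tab["no", "no"]   # 210 / 58

# Their ratio is the cross-product (569 * 58) / (163 * 210)
or_hand <- odds_cases / odds_controls
round(or_hand, 2)
## [1] 0.96
```

Indexing the cells by name rather than by position makes it harder to mix up the cells when the table is laid out differently.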

Odds Ratio Calculation

To find out whether the odds of disease are significantly higher in exposed than in unexposed patients, we need to calculate the odds ratio and its 95% CI.
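
For reference, the usual textbook formulas can be written as follows. The cell labels are my own notation, not taken from the seminar: a = exposed and diseased, b = exposed and not diseased, c = unexposed and diseased, d = unexposed and not diseased. The interval is the standard Wald interval on the log scale.

```latex
\[
\mathrm{OR} = \frac{a\,d}{b\,c},
\qquad
\mathrm{CI}_{95\%} = \exp\!\left( \ln \mathrm{OR} \pm 1.96
  \sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}} \right)
\]
```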

The following function will return a data frame containing these values.

# return odds ratio with 95% CI (Wald interval on the log scale)
f <- function(x) {
  x <- as.numeric(x)  # cell counts in column-major order
  or <- (x[1] * x[4]) / (x[2] * x[3])
  se <- sqrt(1/x[1] + 1/x[2] + 1/x[3] + 1/x[4])  # standard error of log(OR)
  cil <- exp(log(or) - 1.96 * se)
  ciu <- exp(log(or) + 1.96 * se)
  # round only at the end, so the interval is based on the unrounded OR
  data.frame(CI_95_lower = round(cil, 2), OR = round(or, 2), CI_95_upper = round(ciu, 2))
}

Now we can apply the function to our table tab.

df.or <- f(tab)
knitr::kable(df.or, align = 'c')
CI_95_lower   OR   CI_95_upper
   0.69      0.96     1.35

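As the results indicate, exposed patients do not have significantly higher odds of disease than unexposed patients: the 95% CI includes 1. This is what we would expect, because Exposition and Disease were simulated independently of each other.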
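
As a cross-check, the same kind of Wald estimate can be obtained from a logistic regression in base R. This is a sketch under assumptions, not part of the seminar material: the counts data frame and the object names are mine, and the cell counts are copied from the table above. confint.default() uses the exact normal quantile instead of 1.96, so the result should agree with the hand calculation up to rounding.

```r
# One row per cell of the 2x2 table, with the cell count as a frequency weight.
counts <- data.frame(
  Exposition = factor(c("yes", "no", "yes", "no"), levels = c("no", "yes")),
  Disease    = factor(c("yes", "yes", "no", "no"), levels = c("no", "yes")),
  n          = c(569, 163, 210, 58)
)

# Logistic regression of disease status on exposure ("no" is the reference level)
fit <- glm(Disease ~ Exposition, family = binomial, weights = n, data = counts)

# Exponentiating the coefficient and its Wald interval gives OR and 95% CI
or_glm <- exp(coef(fit))["Expositionyes"]
ci_glm <- exp(confint.default(fit))["Expositionyes", ]

round(c(CI_95_lower = unname(ci_glm[1]),
        OR          = unname(or_glm),
        CI_95_upper = unname(ci_glm[2])), 2)
```

The regression route is more than is needed for a single 2x2 table, but it extends naturally once further covariates have to be adjusted for.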


About norbert

Biometrician at Clinical Trial Centre, Leipzig University (GER), with degrees in sociology (MA) and public health (MPH).