Intro
In June 2017 I started working at the Clinical Trial Centre Leipzig at Leipzig University. Since my knowledge of statistics is rather poor, my employer offered me the chance to attend some seminars in Medical Biometry at the University of Heidelberg. The first seminar I attended was called “Basics of Epidemiology”. On the first day, we learned how to calculate so-called odds ratios in case-control studies using a simple pocket calculator.
In this blog post, I will show how to calculate a simple odds ratio with its 95% confidence interval (CI) using R.
Data simulation
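The data used in this blog post were simulated with the wakefield package. The following code returns a data frame with 2 binary variables (Exposition and Disease) and 1,000 rows. Because the two variables are drawn independently of each other, the true odds ratio in the simulated population is 1.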
library(wakefield)

mydata <- data.frame(
  Exposition = group(n = 1000, x = c('yes', 'no'), prob = c(0.75, 0.25)),
  Disease = group(n = 1000, x = c('yes', 'no'), prob = c(0.75, 0.25))
)
dim(mydata)
## [1] 1000 2
head(mydata)
##   Exposition Disease
## 1        yes     yes
## 2         no      no
## 3         no     yes
## 4        yes     yes
## 5        yes     yes
## 6        yes     yes
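The simulation above does not fix a random seed, so every run produces slightly different counts. As a minimal sketch, the same kind of data can be drawn reproducibly with base R instead of wakefield. The seed value, the helper `draw_binary()` and the name `mydata_base` are assumptions made for this illustration only, and the resulting counts will not match the ones shown in this post.

```r
# Sketch only: a base R stand-in for the wakefield call, with a fixed seed.
set.seed(42)  # arbitrary seed (assumption), used only for reproducibility

n <- 1000

# Hypothetical helper: draw n independent yes/no values with P(yes) = p_yes
draw_binary <- function(n, p_yes) {
  factor(
    sample(c("yes", "no"), size = n, replace = TRUE,
           prob = c(p_yes, 1 - p_yes)),
    levels = c("yes", "no")
  )
}

mydata_base <- data.frame(
  Exposition = draw_binary(n, 0.75),
  Disease    = draw_binary(n, 0.75)
)

dim(mydata_base)
table(mydata_base$Exposition, mydata_base$Disease)
```

Fixing the factor levels to `c("yes", "no")` keeps the row and column order of the resulting table the same as in the wakefield version.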
Based on this data frame, we calculate a 2x2 table showing how many patients with and without the exposure (rows) did or did not develop the disease (columns).
tab <- table(mydata$Exposition, mydata$Disease)
tab
##      
##       yes  no
##   yes 569 210
##   no  163  58
Odds Ratio Calculation
To find out whether the odds of developing the disease are significantly higher in patients with a certain exposure, we need to calculate the odds ratio and its 95% CI.
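For reference, the odds ratio and its Wald-type 95% CI for a 2x2 table can be written as below. The cell labels are introduced here only for illustration: a is exposed with disease, b is exposed without disease, c is unexposed with disease, and d is unexposed without disease. The factor 1.96 is the approximate 97.5% quantile of the standard normal distribution.

```latex
\[
\mathrm{OR} = \frac{a \cdot d}{b \cdot c}, \qquad
\mathrm{SE}\bigl(\ln \mathrm{OR}\bigr) = \sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}, \qquad
\text{95\% CI} = \exp\!\bigl(\ln \mathrm{OR} \pm 1.96 \cdot \mathrm{SE}(\ln \mathrm{OR})\bigr)
\]
```

With the counts from the table above (a = 569, b = 210, c = 163, d = 58), this gives OR = 33002 / 34230 ≈ 0.96 and a standard error of about 0.173 on the log scale.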
The following function will return a data frame containing these values.
# Return the odds ratio of a 2x2 table together with its 95% Wald CI.
# x is indexed column-wise, so x[1] and x[4] are the diagonal cells.
f <- function(x) {
  or <- (x[1] * x[4]) / (x[2] * x[3])
  se <- sqrt(1/x[1] + 1/x[2] + 1/x[3] + 1/x[4])  # standard error of log(OR)
  cil <- exp(log(or) - 1.96 * se)
  ciu <- exp(log(or) + 1.96 * se)
  # round only at the end, so the CI is based on the unrounded odds ratio
  data.frame(CI_95_lower = round(cil, 2), OR = round(or, 2), CI_95_upper = round(ciu, 2))
}
Now we can apply the function to our table tab.
df.or <- f(tab)
knitr::kable(df.or, align = 'c')
CI_95_lower | OR | CI_95_upper |
---|---|---|
0.69 | 0.96 | 1.35 |
As the results indicate, the odds ratio is close to 1 and its 95% CI includes 1, so patients with the exposure have no significantly higher odds of developing the disease than patients without it.
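As a cross-check, the same odds ratio can be obtained from a logistic regression, which is a different route than the hand calculation used in this post. The sketch below rebuilds the 2x2 table from the counts printed above and fits the model with base R's `glm()`. The object names (`counts`, `fit`, `or_glm`, `ci_glm`) are chosen for this illustration only. `confint.default()` returns Wald intervals based on the exact normal quantile rather than 1.96, so the bounds should agree with the function above up to rounding.

```r
# Rebuild the 2x2 table from the counts shown in this post
# (rows: Exposition, columns: Disease; matrix() fills column by column)
tab <- matrix(c(569, 163, 210, 58), nrow = 2,
              dimnames = list(Exposition = c("yes", "no"),
                              Disease    = c("yes", "no")))

# Long format: one row per cell, with the cell count in Freq
counts <- as.data.frame(as.table(tab))

# Outcome: 1 = disease, 0 = no disease; reference group: unexposed
counts$diseased   <- as.integer(counts$Disease == "yes")
counts$Exposition <- relevel(counts$Exposition, ref = "no")

# Logistic regression on the grouped data, weighted by cell count
fit <- glm(diseased ~ Exposition, family = binomial,
           data = counts, weights = Freq)

# Odds ratio and Wald 95% CI for exposed vs. unexposed patients
or_glm <- exp(coef(fit)[["Expositionyes"]])
ci_glm <- exp(confint.default(fit)["Expositionyes", ])

round(c(CI_95_lower = ci_glm[[1]], OR = or_glm, CI_95_upper = ci_glm[[2]]), 2)
```

For a single binary exposure, the model is saturated, so the exponentiated coefficient equals the cross-product ratio of the table.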