Scoring the PHQ-9 Questionnaire Using R

Intro

The PHQ-9 is the nine-item depression module of the Patient Health Questionnaire. Each of the items is scored on a 4-point Likert scale ranging from 0 (not at all) to 3 (nearly every day). The items are summed to obtain a total score ranging from 0 to 27 with higher scores indicating greater severity of depression. Based on the total score, different levels of severity can be evaluated with 0–4, 5–9, 10–14, 15–19 and 20–27 points indicating “minimal”, “mild”, “moderate”, “moderately severe” and “severe” depression.

The PHQ-9 questionnaire may be found under the following link.

In this blog post, I show how to calculate the PHQ-9 score and the PHQ-9 severety levels.

Packages and data

The dataset we are going to use was published in Plos One. The file has got a Digital Object Identifier (doi) and may be imported into R using the read_delim() function of the readr package.

library(readr)
library(dplyr)
library(ggplot2)

df.phq9 <- readr::read_delim("https://doi.org/10.1371/journal.pone.0156167.s001", 
                             delim = ";", 
                             escape_double = FALSE, 
                             trim_ws = TRUE) %>%
            select(starts_with('phq9'))

glimpse(df.phq9)
## Observations: 1,337
## Variables: 9
## $ phq9_1 <int> 1, 3, 2, 0, 0, 0, 1, 0, 0, 2, 1, 1, 0, 3, 0, 0, 0, 2, 0...
## $ phq9_2 <int> 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0...
## $ phq9_3 <int> 3, 2, 2, 2, 1, 0, 1, 3, 1, 0, 1, 1, 0, 3, 1, 0, 0, 0, 0...
## $ phq9_4 <int> 1, 1, 1, 1, 1, 1, 1, 1, 0, 2, 1, 3, 0, 1, 0, 0, 0, 1, 0...
## $ phq9_5 <int> 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 2, 0, 1, 0, 0, 0, 0, 0...
## $ phq9_6 <int> 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0...
## $ phq9_7 <int> 0, 1, 1, 1, 0, 1, 0, 0, 0, 3, 1, 1, 0, 1, 0, 0, 0, 0, 0...
## $ phq9_8 <int> 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0...
## $ phq9_9 <int> 0, 0, 0, 1, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...

The Scoring Function

The scoring_phq9 function requires a data frame containing the PHQ-9 items (data) and a vector containing the items' names (items.phq9) as input parameters.

scoring_phq9 <- function(data, items.phq9) {
  data %>%
    mutate(nvalid.phq9 = rowSums(!is.na(select(., items.phq9))),
           nvalid.phq9 = as.integer(nvalid.phq9),
           mean.temp = rowSums(select(., items.phq9), na.rm = TRUE)/nvalid.phq9,
           phq.01.temp = as.integer(unlist(data[items.phq9[1]])),
           phq.02.temp = as.integer(unlist(data[items.phq9[2]])),
           phq.03.temp = as.integer(unlist(data[items.phq9[3]])),
           phq.04.temp = as.integer(unlist(data[items.phq9[4]])),
           phq.05.temp = as.integer(unlist(data[items.phq9[5]])),
           phq.06.temp = as.integer(unlist(data[items.phq9[6]])),
           phq.07.temp = as.integer(unlist(data[items.phq9[7]])),
           phq.08.temp = as.integer(unlist(data[items.phq9[8]])),
           phq.09.temp = as.integer(unlist(data[items.phq9[9]]))) %>%
    mutate_at(vars(phq.01.temp:phq.09.temp),
              funs(ifelse(is.na(.), round(mean.temp), .))) %>%
    mutate(score.temp = rowSums(select(., phq.01.temp:phq.09.temp), na.rm = TRUE),
           score.phq9 = ifelse(nvalid.phq9 >= 7, as.integer(round(score.temp)), NA),
           cutoff.phq9 = case_when(
             score.phq9 >= 20 ~ 'severe',
             score.phq9 >= 15 ~ 'moderately severe',
             score.phq9 >= 10 ~ 'moderate',
             score.phq9 >= 5 ~ 'mild',
             score.phq9 < 5 ~ 'minimal'),
             cutoff.phq9 = factor(cutoff.phq9, levels = c('minimal', 'mild',
                                                          'moderate', 'moderately severe',
                                                          'severe'))) %>%
    select(-ends_with("temp"))

}

Example

The function adds three variables to the original data frame:

  • nvalid.phq9: Number of variables with valid values,
  • score.phq9: PHQ-9 score (0 – 27),
  • cutoff.phq9: PHQ-9 severety levels (minimal, mild, moderate, moderately severe, severe)
items.phq9 <- paste0('phq9_', seq(1, 9, 1))
df.phq9 <- df.phq9 %>%
  scoring_phq9(., items.phq9)
glimpse(df.phq9)
## Observations: 1,337
## Variables: 12
## $ phq9_1      <int> 1, 3, 2, 0, 0, 0, 1, 0, 0, 2, 1, 1, 0, 3, 0, 0, 0,...
## $ phq9_2      <int> 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0,...
## $ phq9_3      <int> 3, 2, 2, 2, 1, 0, 1, 3, 1, 0, 1, 1, 0, 3, 1, 0, 0,...
## $ phq9_4      <int> 1, 1, 1, 1, 1, 1, 1, 1, 0, 2, 1, 3, 0, 1, 0, 0, 0,...
## $ phq9_5      <int> 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 2, 0, 1, 0, 0, 0,...
## $ phq9_6      <int> 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,...
## $ phq9_7      <int> 0, 1, 1, 1, 0, 1, 0, 0, 0, 3, 1, 1, 0, 1, 0, 0, 0,...
## $ phq9_8      <int> 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0,...
## $ phq9_9      <int> 0, 0, 0, 1, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ nvalid.phq9 <int> 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9,...
## $ score.phq9  <int> 7, 10, 7, 9, 3, 2, 3, 4, 5, 7, 7, 8, 0, 11, 1, 0, ...
## $ cutoff.phq9 <fct> mild, moderate, mild, mild, minimal, minimal, mini...

Visualization

PHQ-9 Score

ggplot(df.phq9, aes(score.phq9)) +
  geom_density(fill = 'blue', alpha = 0.2) +
  scale_x_continuous(limits = c(0, 27), breaks = c(0,5,10,15,20,27)) +
  labs(x = 'PHQ-9 Score', y = 'Density') +
  theme_bw()

plot of chunk unnamed-chunk-4

PHQ-9 Severety Levels

ggplot(df.phq9, aes(x = cutoff.phq9, fill = cutoff.phq9)) +
  geom_bar(colour = 'black') +
  scale_fill_brewer(type = 'seq') +
  labs(x = NULL, y = NULL, fill = NULL) +
  theme_bw()

plot of chunk unnamed-chunk-5