Scoring Questionnaires: QSCORER version 0.0.10 has been released

A new release of the qscorer package is now available on GitHub.

Scoring procedures

The package (version 0.0.10) provides procedures for scoring the following health-related questionnaires:

  • Atrial Fibrillation Effect on QualiTy-of-Life Questionnaire (AFEQT)
  • Beck Depression Inventory (BDI, BDI-II)
  • Behavior Rating Inventory of Executuve Function, adult version (BRIEF-A)
  • Dutch Eating Behavior Questionnaire, German version (DEBQ)
  • Eating Disorder Examination Questionnaire, short form (EDE-Q8)
  • EORTC QLQ-C30 Quality of Life Questionnaire
  • Epworth Sleepiness Scale (ESS)
  • European Quality of Life Five Dimension Three Level Scale Questionnaire (EQ-5D-3L)
  • General Self-Efficacy Scale (GSES)
  • Hospital Anxiety and Depression Scale (HADS)
  • Impact of Weight on Quality of Life-Lite Questionnaire (IWQOL-Lite)
  • Internaltional Index of Erectile Function, short form (IIEF-5)
  • International Physical Activity Questionnaire, short form (IPAQ)
  • Patient Health Questionnaire-9 (PHQ-9)
  • Patient Health Questionnaire-15 (PHQ-15)
  • Rosenberg Self-Esteem Scale (RSES)
  • Severe Respiratory Questionnaire (SRI)
  • Skala zur Selbstregulation (REG) (German)
  • Social Support Questionnaire, short form (F-SozU-K-7) (German)
  • Weight Bias Internalization Scale (WBIS)
  • Weight Self-Stigma Questionnaire (WSSQ)
  • Yale Food Addiction Scale, Version 2.0 (YFAS V2.0)

The scoring functions usually have the following arguments:

  • data: A data frame containing the items of the questionnaire in a pre-specified order. The data.frame may contain further variables.
  • items: A character vector with the item names in a pre-specified order, or a numeric vector indicating the column numbers of the items in data.
  • keep: Logical, whether to keep the single items and whether to return variables containing the number of non-missing items on each scale for each respondent. The default is TRUE.
  • nvalid: A numeric value indicating the number of non-missing items required for score calculations. A default is pre-specified for each questionnaire.
  • digits: Integer of length one: value to round to. No rounding by default.

If there are items that need to be scored reversely:

  • reverse: Items to be scored reversely. These items can be specified either by name or by index. Default is pre-specified.

Data

Furthermore, the package contains real-life data of the following questionnaires:

  • Beck Depression Inventory II (df.test.bdi)
  • Dutch Eating Behavior Questionnaire, German version (df.test.debq)
  • EORTC QLQ-C30 Quality of Life Questionnaire (df.test.eortcc30)
  • European Quality of Life Five Dimension Three Level Scale Questionnaire (df.test.eq5d3l)
  • Hospital Anxiety and Depression Scale (df.test.hads)
  • Patient Health Questionnaire-9 (df.test.phq9)

These real-life data are publicly available; sources can be found in the package documentation. The following output shows the df.test.phq9 data set (data of the PHQ-9 (a.k.a. PHQ-D) questionnaire).

dplyr::glimpse(qscorer::df.test.phq9)
## Observations: 1,337
## Variables: 11
## $ age    <dbl> 79, 62, 71, 65, 63, 68, 52, 88, 71, 77, 69, 55, 71, 65, 69, 65, 71, 53, 86, 58, 69, 70, 69, 69, 74, 74, 60…
## $ gender <fct> f, f, m, f, f, f, m, m, f, f, f, m, f, f, f, f, f, m, f, f, f, m, f, m, m, m, f, f, m, f, m, m, f, f, f, f…
## $ phq9_1 <dbl> 1, 3, 2, 0, 0, 0, 1, 0, 0, 2, 1, 1, 0, 3, 0, 0, 0, 2, 0, 0, 2, 3, 0, 0, 0, 3, 1, 0, 1, 1, 1, 1, 0, 1, 1, 2…
## $ phq9_2 <dbl> 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 2, 0, 1, 0, 1…
## $ phq9_3 <dbl> 3, 2, 2, 2, 1, 0, 1, 3, 1, 0, 1, 1, 0, 3, 1, 0, 0, 0, 0, 0, 1, 3, 1, 0, 0, 3, 0, 0, 1, 1, 1, 1, 3, 3, 1, 1…
## $ phq9_4 <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 0, 2, 1, 3, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 3, 0, 0, 1, 1, 1, 2, 2, 2, 1, 2…
## $ phq9_5 <dbl> 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 2, 0, 1, 0, 0, 0, 0, 0, 0, 1, 3, 0, 1, 0, 2, 0, 0, 1, 0, 0, 1, 3, 1, 1, 0…
## $ phq9_6 <dbl> 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 3, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0…
## $ phq9_7 <dbl> 0, 1, 1, 1, 0, 1, 0, 0, 0, 3, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 2, 0, 0, 0, 3, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1…
## $ phq9_8 <dbl> 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1…
## $ phq9_9 <dbl> 0, 0, 0, 1, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…

Questionnaires without real-life data may be simulated using the simulate_items() function. The following code simulates data for the PHQ-15 questionnaire:

df.phq15 <- qscorer::simulate_items(num_cols = 15, item_name = 'PHQ15', item_range = 0:2)

Two more arguments have default values and don't need to be explicitly specified:

  • num_rows (Number of rows; default = 100)
  • prop_mis (Proportion of missing values to be put into each item; default = 0.05)

The function returns a data frame including the items of the questionnaire and an id variable.

dplyr::glimpse(df.phq15)
## Observations: 100
## Variables: 16
## $ id       <chr> "001", "002", "003", "004", "005", "006", "007", "008", "009", "010", "011", "012", "013", "014", "015",…
## $ PHQ15_1  <int> 2, 2, 2, 1, 2, 1, 1, 1, 2, 0, 1, 1, 0, 1, 2, 0, 2, 2, 0, 0, 0, 0, 2, NA, 2, 1, NA, 1, 2, 1, 0, 2, 2, 0, …
## $ PHQ15_2  <int> 2, 1, 1, 2, 0, 0, 2, NA, 1, 1, 1, NA, 1, 2, 1, 0, 1, 1, 1, 2, 1, 2, 1, 2, 0, 2, 2, 2, 0, 2, 2, 1, 1, 1, …
## $ PHQ15_3  <int> 2, 1, 2, 1, NA, 0, 0, 0, 0, 2, 1, 0, 1, 0, 2, 2, 0, 1, 2, 0, 2, 1, 0, 0, 0, 0, 0, 2, 1, 1, 1, 2, 0, 1, 1…
## $ PHQ15_4  <int> 1, 0, 1, 1, 1, 1, 2, 2, NA, 2, 0, 2, 2, 2, 1, 2, 2, 2, 0, NA, 2, 1, 2, 1, 0, NA, 1, 1, NA, 0, 0, 2, 2, 0…
## $ PHQ15_5  <int> 0, 0, 2, 0, 1, 1, 2, 1, 0, 2, 0, 0, 2, 1, 0, 0, 1, 1, 1, 1, 0, 2, 2, 1, 2, 2, 2, 1, 2, 2, 1, 1, 2, 1, 2,…
## $ PHQ15_6  <int> 1, 1, 2, 1, 2, 2, 1, 2, 2, 0, NA, 0, 1, 0, 2, 2, 1, 1, 0, 1, 2, 2, 2, 1, 0, 1, 2, 1, 2, 2, 0, 1, 1, 1, 1…
## $ PHQ15_7  <int> 1, 2, 0, 0, 2, 0, 1, 1, 2, 0, 2, 0, 2, NA, 0, 2, 2, 2, 2, 2, 1, 2, 1, 2, 1, 0, 0, 2, 0, 1, 0, 1, 1, 0, 0…
## $ PHQ15_8  <int> 2, 1, 1, 1, 2, 2, 1, 0, 0, NA, 1, 0, 1, 1, 2, 2, 2, 2, 1, 1, 1, 2, 2, 2, 0, 2, 0, 2, 1, 0, 0, 0, 2, 0, 1…
## $ PHQ15_9  <int> 0, 2, 2, 1, 1, 1, 0, 2, 0, 1, 2, 0, 0, 1, 2, 0, 1, 0, 0, 2, 2, NA, 2, 2, 1, 2, 1, 2, 1, 2, 1, 1, 2, 1, 0…
## $ PHQ15_10 <int> 0, 2, 2, 0, 2, 0, 0, 1, 0, 0, NA, NA, 0, 0, 2, 2, 0, NA, 2, 1, 1, 2, 2, 1, 2, 2, 0, 1, 1, 1, 1, 0, 0, 1,…
## $ PHQ15_11 <int> 1, 2, NA, 0, 2, 0, 2, 2, 1, 2, 2, 1, 2, 0, 1, 1, 0, 0, 2, 2, 2, 1, 2, 2, 1, 2, 0, 2, 2, 2, 1, 1, 0, 0, 2…
## $ PHQ15_12 <int> 2, 1, 2, 2, 0, 0, 0, 0, 1, 1, 2, 2, 1, 0, 0, 2, 1, 0, 0, 1, 2, 0, 1, 0, 1, 2, 1, 0, 2, NA, 0, 1, 0, 0, 0…
## $ PHQ15_13 <int> 2, 0, 0, 1, 1, 0, 0, 2, 2, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 2, 1, 0, 1, 2, 0, 1, 2, NA, 1, 2, 0, 1, 0, 2…
## $ PHQ15_14 <int> 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 2, 1, 0, 0, 0, 1, 0, 0, 2, 1, 2, 2, 2, 1, 0, 2, 1, 0, 2, 1, 2, 2, 2,…
## $ PHQ15_15 <int> NA, 1, 0, 1, 2, 0, 1, 1, 2, 0, 2, 0, 0, 1, 2, 2, 0, 1, 1, 2, 1, 1, 0, 0, 0, 0, 0, 0, NA, 0, 2, NA, 0, NA…

Now, the corresponding scoring function (scoring_phq15()) can be apllied to this data:

library(dplyr)
df.phq15 <- df.phq15 %>% 
  qscorer::scoring_phq15(.,items = 2:16, keep = FALSE)

The function returns a data frame including the PHQ-15 score and the corresponding cut-off values (severe, moderate, mild, …):

dplyr::glimpse(df.phq15)
## Observations: 100
## Variables: 3
## $ id           <chr> "001", "002", "003", "004", "005", "006", "007", "008", "009", "010", "011", "012", "013", "014", "0…
## $ score.phq15  <dbl> 17.142857, 17.000000, 19.285714, 13.000000, 20.357143, 8.000000, 14.000000, 16.071429, 15.000000, 11…
## $ cutoff.phq15 <fct> Severe, Severe, Severe, Moderate, Severe, Mild, Moderate, Severe, Severe, Moderate, Severe, Mild, Se…

Documentation

The package documentation is hosted on GitHub Pages.

Scoring questionnaires using the ‘qscorer’ package

Motivation

Health-related questionnaires are used in many psychological studies and clinical trials. I know from my own experience that information about scoring procedures are often scattered among numerous sources and, thus, hard to find. Since I frequently work with questionnaires, I have decided to write qscorer, an R package with scoring procedures for health-related questionnaires.

plot of chunk unnamed-chunk-1

Scoring procedures

The qscorer package (version 0.3.0) provides procedures for scoring the following health-related questionnaires:

  • Atrial Fibrillation Effect on QualiTy-of-Life Questionnaire (AFEQT)
  • Beck Depression Inventory (BDI, BDI-II)
  • Behavior Rating Inventory of Executuve Function, adult version (BRIEF-A)
  • Dutch Eating Behavior Questionnaire, German version (DEBQ)
  • Eating Disorder Examination Questionnaire, short form (EDE-Q8)
  • Epworth Sleepiness Scale (ESS)
  • European Quality of Life Five Dimension Three Level Scale Questionnaire (EQ-5D-3L)
  • General Self-Efficacy Scale (GSES)
  • Hospital Anxiety and Depression Scale (HADS)
  • Impact of Weight on Quality of Life-Lite Questionnaire (IWQOL-Lite)
  • International Physical Activity Questionnaire, short form (IPAQ)
  • Patient Health Questionnaire-9 (PHQ-9)
  • Patient Health Questionnaire-15 (PHQ-15)
  • Rosenberg Self-Esteem Scale (RSES)
  • Severe Respiratory Questionnaire (SRI)
  • Skala zur Selbstregulation (REG) (German)
  • Social Support Questionnaire, short form (F-SozU-K-7) (German)
  • Weight Bias Internalization Scale (WBIS)
  • Weight Self-Stigma Questionnaire (WSSQ)
  • Yale Food Addiction Scale, Version 2.0 (YFAS V2.0)

Documentation

The package documentation is hosted by GitHub Pages.

Installation

You can install the development version of qscorer from GitHub with:

devtools::install_github('nrkoehler/qscorer')

To Do

In the near future, I will add control structures to the scoring functions in order to prevent items with out-of-range values to be scored. qscorer is a growing package. Thus, further scoring procedures will be added.

Scoring the PHQ-9 Questionnaire Using R

Intro

The PHQ-9 is the nine-item depression module of the Patient Health Questionnaire. Each of the items is scored on a 4-point Likert scale ranging from 0 (not at all) to 3 (nearly every day). The items are summed to obtain a total score ranging from 0 to 27 with higher scores indicating greater severity of depression. Based on the total score, different levels of severity can be evaluated with 0–4, 5–9, 10–14, 15–19 and 20–27 points indicating “minimal”, “mild”, “moderate”, “moderately severe” and “severe” depression.

The PHQ-9 questionnaire may be found under the following link.

In this blog post, I show how to calculate the PHQ-9 score and the PHQ-9 severety levels.

Packages and data

The dataset we are going to use was published in Plos One. The file has got a Digital Object Identifier (doi) and may be imported into R using the read_delim() function of the readr package.

library(readr)
library(dplyr)
library(ggplot2)

df.phq9 <- readr::read_delim("https://doi.org/10.1371/journal.pone.0156167.s001", 
                             delim = ";", 
                             escape_double = FALSE, 
                             trim_ws = TRUE) %>%
            select(starts_with('phq9'))

glimpse(df.phq9)
## Observations: 1,337
## Variables: 9
## $ phq9_1 <int> 1, 3, 2, 0, 0, 0, 1, 0, 0, 2, 1, 1, 0, 3, 0, 0, 0, 2, 0...
## $ phq9_2 <int> 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0...
## $ phq9_3 <int> 3, 2, 2, 2, 1, 0, 1, 3, 1, 0, 1, 1, 0, 3, 1, 0, 0, 0, 0...
## $ phq9_4 <int> 1, 1, 1, 1, 1, 1, 1, 1, 0, 2, 1, 3, 0, 1, 0, 0, 0, 1, 0...
## $ phq9_5 <int> 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 2, 0, 1, 0, 0, 0, 0, 0...
## $ phq9_6 <int> 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0...
## $ phq9_7 <int> 0, 1, 1, 1, 0, 1, 0, 0, 0, 3, 1, 1, 0, 1, 0, 0, 0, 0, 0...
## $ phq9_8 <int> 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0...
## $ phq9_9 <int> 0, 0, 0, 1, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...

The Scoring Function

The scoring_phq9 function requires a data frame containing the PHQ-9 items (data) and a vector containing the items' names (items.phq9) as input parameters.

scoring_phq9 <- function(data, items.phq9) {
  data %>%
    mutate(nvalid.phq9 = rowSums(!is.na(select(., items.phq9))),
           nvalid.phq9 = as.integer(nvalid.phq9),
           mean.temp = rowSums(select(., items.phq9), na.rm = TRUE)/nvalid.phq9,
           phq.01.temp = as.integer(unlist(data[items.phq9[1]])),
           phq.02.temp = as.integer(unlist(data[items.phq9[2]])),
           phq.03.temp = as.integer(unlist(data[items.phq9[3]])),
           phq.04.temp = as.integer(unlist(data[items.phq9[4]])),
           phq.05.temp = as.integer(unlist(data[items.phq9[5]])),
           phq.06.temp = as.integer(unlist(data[items.phq9[6]])),
           phq.07.temp = as.integer(unlist(data[items.phq9[7]])),
           phq.08.temp = as.integer(unlist(data[items.phq9[8]])),
           phq.09.temp = as.integer(unlist(data[items.phq9[9]]))) %>%
    mutate_at(vars(phq.01.temp:phq.09.temp),
              funs(ifelse(is.na(.), round(mean.temp), .))) %>%
    mutate(score.temp = rowSums(select(., phq.01.temp:phq.09.temp), na.rm = TRUE),
           score.phq9 = ifelse(nvalid.phq9 >= 7, as.integer(round(score.temp)), NA),
           cutoff.phq9 = case_when(
             score.phq9 >= 20 ~ 'severe',
             score.phq9 >= 15 ~ 'moderately severe',
             score.phq9 >= 10 ~ 'moderate',
             score.phq9 >= 5 ~ 'mild',
             score.phq9 < 5 ~ 'minimal'),
             cutoff.phq9 = factor(cutoff.phq9, levels = c('minimal', 'mild',
                                                          'moderate', 'moderately severe',
                                                          'severe'))) %>%
    select(-ends_with("temp"))

}

Example

The function adds three variables to the original data frame:

  • nvalid.phq9: Number of variables with valid values,
  • score.phq9: PHQ-9 score (0 – 27),
  • cutoff.phq9: PHQ-9 severety levels (minimal, mild, moderate, moderately severe, severe)
items.phq9 <- paste0('phq9_', seq(1, 9, 1))
df.phq9 <- df.phq9 %>%
  scoring_phq9(., items.phq9)
glimpse(df.phq9)
## Observations: 1,337
## Variables: 12
## $ phq9_1      <int> 1, 3, 2, 0, 0, 0, 1, 0, 0, 2, 1, 1, 0, 3, 0, 0, 0,...
## $ phq9_2      <int> 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0,...
## $ phq9_3      <int> 3, 2, 2, 2, 1, 0, 1, 3, 1, 0, 1, 1, 0, 3, 1, 0, 0,...
## $ phq9_4      <int> 1, 1, 1, 1, 1, 1, 1, 1, 0, 2, 1, 3, 0, 1, 0, 0, 0,...
## $ phq9_5      <int> 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 2, 0, 1, 0, 0, 0,...
## $ phq9_6      <int> 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,...
## $ phq9_7      <int> 0, 1, 1, 1, 0, 1, 0, 0, 0, 3, 1, 1, 0, 1, 0, 0, 0,...
## $ phq9_8      <int> 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0,...
## $ phq9_9      <int> 0, 0, 0, 1, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ nvalid.phq9 <int> 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9,...
## $ score.phq9  <int> 7, 10, 7, 9, 3, 2, 3, 4, 5, 7, 7, 8, 0, 11, 1, 0, ...
## $ cutoff.phq9 <fct> mild, moderate, mild, mild, minimal, minimal, mini...

Visualization

PHQ-9 Score

ggplot(df.phq9, aes(score.phq9)) +
  geom_density(fill = 'blue', alpha = 0.2) +
  scale_x_continuous(limits = c(0, 27), breaks = c(0,5,10,15,20,27)) +
  labs(x = 'PHQ-9 Score', y = 'Density') +
  theme_bw()

plot of chunk unnamed-chunk-4

PHQ-9 Severety Levels

ggplot(df.phq9, aes(x = cutoff.phq9, fill = cutoff.phq9)) +
  geom_bar(colour = 'black') +
  scale_fill_brewer(type = 'seq') +
  labs(x = NULL, y = NULL, fill = NULL) +
  theme_bw()

plot of chunk unnamed-chunk-5

Scoring the St George’s Respiratory Questionnaire (SGRQ) using R

Background

plot of chunk sgrq

The St George's Respiratory Questionnaire (SGRQ) is an instrument for the measuring of Health-Related Quality-of-Life in patients with diseases of airways obstruction. The SGRQ contains 50 items covering three domains:

  • Symptoms (8 items),
  • Activity (16 items), and
  • Impacts (26 items).

In addition, a total summary scale may be computed (1, 2).

All scales have a score range between 0 and 100 with higher scores indicating a worse quality of life [2]. The items are either scored on 3-point-, 4-point-, and 5-point Likert scales, or they are binary-choice items that must be answered with either “yes” or “no”. Each item has an empirically derived regression weight.

Scoring the SGRQ

Based on the SGRQ Scoring Manual, I have written the R-package sgrqr for calculating the SGRQ scores.

Installation

The package is hosted on GitHub and may be installed using the following code:

devtools::install_github("nrkoehler/sgrqr")
library(sgrqr)

Functions and data

The core of sgrqr is the function scoring_sgrq(). It must be applied to a data frame with the 50 SGRQ items and one id variable. Moreover, the package contains two data frames with simulated values. Unlike sgrq.full, sgrq.na has some missing values.

names(sgrq.full)
##  [1] "id"       "sgrq.1"   "sgrq.2"   "sgrq.3"   "sgrq.4"   "sgrq.5"  
##  [7] "sgrq.6"   "sgrq.7"   "sgrq.8"   "sgrq.11a" "sgrq.11b" "sgrq.11c"
## [13] "sgrq.11d" "sgrq.11e" "sgrq.11f" "sgrq.11g" "sgrq.15a" "sgrq.15b"
## [19] "sgrq.15c" "sgrq.15d" "sgrq.15e" "sgrq.15f" "sgrq.15g" "sgrq.15h"
## [25] "sgrq.15i" "sgrq.9"   "sgrq.10"  "sgrq.12a" "sgrq.12b" "sgrq.12c"
## [31] "sgrq.12d" "sgrq.12e" "sgrq.12f" "sgrq.13a" "sgrq.13b" "sgrq.13c"
## [37] "sgrq.13d" "sgrq.13e" "sgrq.13f" "sgrq.13g" "sgrq.13h" "sgrq.14a"
## [43] "sgrq.14b" "sgrq.14c" "sgrq.14d" "sgrq.16a" "sgrq.16b" "sgrq.16c"
## [49] "sgrq.16d" "sgrq.16e" "sgrq.17"
head(sgrq.na[1:6])
##   id sgrq.1 sgrq.2 sgrq.3 sgrq.4 sgrq.5
## 1  1      1      1      1      1      1
## 2  2      2      2      2      5      4
## 3  3      2     NA      4      3      5
## 4  4      1      1      1      5      4
## 5  5      3      2      2      4      1
## 6  6      4      2      2      5      4

Usage

When applied to a data frame, the function returns a data frame containing the SGRQ score values and an id variable.

df <- scoring_sgrq(sgrq.full, id = 'id')
head(df)
##   id sgrq.ss sgrq.as sgrq.is sgrq.ts
## 1  1   100.0    63.4    38.9    56.5
## 2  2    36.1    62.9    37.2    44.8
## 3  3    56.0    44.8    45.7    47.1
## 4  4    52.3    35.8    51.1    46.7
## 5  5    65.8    50.0    63.0    59.5
## 6  6    53.4    51.3    37.1    44.1

If no id variable is specified, a data frame containing the score values only is returned.

df <- scoring_sgrq(sgrq.na)
head(df)
##   sgrq.ss sgrq.as sgrq.is sgrq.ts
## 1   100.0    63.4    38.9    56.5
## 2    36.1    68.8    45.1    50.8
## 3    58.5    44.8    49.6    49.6
## 4    52.3    35.8    51.1    46.7
## 5    65.8    56.9    65.3    62.8
## 6    53.4    58.1    40.8    48.2

Difficulties in handling missing values

In the SGRQ scoring manual it says:

The Symptoms component will tolerate a maximum of 2 missed items. The weight for the missed item is subtracted from the total possible weight for the Symptoms component (662.5) and from the Total weight (3989.4).

Since item weights depend on the actual answers given, it remains unclear (at least for me) how to determine the weight of a missing item. The weight of the item “If you have a wheeze, is it worse in the morning?”, for example, is “0.0” vs. “62.0” depending on the answer “no” vs. “yes”. The algorithm implemented in scoring_sgrq() ascribes the missing item the highest weight possible (so 62.0 rather than 0.0). In order to be able to substract the weight of the missing item “from the total possible weight for the Symptoms component and from the Total weight”, it needs to be checked whether no more than 2 items are missing, and if so, which items are missing. Since this is very extensive to implement, I decided to program the algorithm the quick and dirty way.

First, I check whether no more than 2 items are missing:

  # return position of first item 
  a <- which(names(X)=="sgrq.1")   
  # return position of last item
  z <- which(names(X)=="sgrq.8")
  # calculate number of missing items
  Y$NMISS.ss <- rowSums(is.na(X[, c(a:z)]))

Second, I replace all missing values with the corresponding highest item weight:

  # replace missing values with highest weight
  for (i in a:z) {
    for (j in 1:nrow(X)){
      X[j, i] <- ifelse(is.na(X[j, i] == TRUE), repl.val[i-1], X[j, i])
    }}

Third, I calculate the score:

 # calculate score
  Y$sgrq.ss <- rowSums(X[, vars]) / 662.5 * 100

And finally, I replace the score value by NA if more than 2 items of the Symptom score are missing:

 Y$sgrq.ss <- ifelse(Y$NMISS.ss > 2, NA, Y$sgrq.ss)

Rather than substracting the weight of the missing item “form the total possible weight”, I “add” the highest possible item weight to the missing item, but only if no more than 2 items are missing.

I'm looking forward to getting some feedback to this post. I'm sure there is a better solution.

References

  1. Jones, P. W., F. H. Quirk, and C. M. Baveystock. 1991. The St George Respiratory Questionnaire. Respiratory Medicine 85 (September): 25-31. doi:10.1016/S0954-6111(06)80166-6.

  2. Jones, Paul W, Frances H Quirk, Chlo M Baveystock, and Peter Littlejohns. 1992. A Self-Complete Measure of Health Status for Chronic Airflow Limitation. Am Rev Respir Dis 145 (6): 1321-7.

Scoring the Severe Respiratory Insufficiency Questionnaire (SRI) using R

Background

The SRI is a multidimensional general health questionnaire “to assess HRQL in patients with chronic respiratory failure due to various underlying diseases” [1]. Based on 49 items, seven sub scales addressing the following domains are calculated:

plot of chunk sri
  • Respiratory Complaints (8 items);
  • Physical Function (6 items);
  • Attendant Symptoms and Sleep (7 items);
  • Social Relationships (6 items);
  • Anxiety (5 items);
  • Psychological Well-Being (9 items);
  • Social Functioning (8 items).

Based on the sub scales, a total summary scale is calculated. All scales have a score range between 0 and 100 with higher scores indicating a better quality of life [2]. All items are scored on a 5-point Likert scale ranging from 1 (completely untrue) to 5 (always true). The majority of items need to be recoded (recoded value = 6 – raw value).

Scoring the SRI

Based on the SRI Scoring Manual, I have written the R-package srir for calculating the SRI scores.

Installation

The package is hosted on GitHub and may be installed using the following code:

devtools::install_github("nrkoehler/srir")
library(srir)

Functions and data

The core of srir is the function scoring_sri(). It must be applied to a data frame with the 49 SRI items and one id variable. Moreover, the package contains two data frames with simulated values. Unlike df.full, df.na has some missing values.

names(df.full)
##  [1] "id"     "sri.1"  "sri.2"  "sri.3"  "sri.4"  "sri.5"  "sri.6" 
##  [8] "sri.7"  "sri.8"  "sri.9"  "sri.10" "sri.11" "sri.12" "sri.13"
## [15] "sri.14" "sri.15" "sri.16" "sri.17" "sri.18" "sri.19" "sri.20"
## [22] "sri.21" "sri.22" "sri.23" "sri.24" "sri.25" "sri.26" "sri.27"
## [29] "sri.28" "sri.29" "sri.30" "sri.31" "sri.32" "sri.33" "sri.34"
## [36] "sri.35" "sri.36" "sri.37" "sri.38" "sri.39" "sri.40" "sri.41"
## [43] "sri.42" "sri.43" "sri.44" "sri.45" "sri.46" "sri.47" "sri.48"
## [50] "sri.49"
head(df.na[1:6])
##   id sri.1 sri.2 sri.3 sri.4 sri.5
## 1  1     1     2     1     5     4
## 2  2     5     3     5     3     2
## 3  3     2     2     4    NA     5
## 4  4    NA     3     2     4     4
## 5  5     3     4     3     5     2
## 6  6     2     2     2     3     4

Usage

When applied to a data frame, the function returns a data frame containing the SRI score values and an id variable.

df <- scoring_sri(df.full, id = 'id')
head(df)
##   id sri.rc sri.pf sri.as sri.sr sri.ax sri.wb sri.sf sri.ss
## 1  1   37.5   50.0   46.4   95.8     35   52.8   28.1   49.4
## 2  2   56.2   45.8   60.7   41.7     45   66.7   59.4   53.6
## 3  3   50.0   50.0   35.7   45.8     35   58.3   53.1   46.9
## 4  4   46.9   54.2   39.3   75.0     55   38.9   43.8   50.4
## 5  5   71.9   50.0   28.6   58.3     50   33.3   40.6   47.5
## 6  6   53.1   66.7   28.6   45.8     30   47.2   37.5   44.1

If no id variable is specified, a data frame containing the score values only is returned.

df <- scoring_sri(df.na)
head(df)
##   sri.rc sri.pf sri.as sri.sr sri.ax sri.wb sri.sf sri.ss
## 1   37.5   50.0   46.4   95.8   35.0   52.8   28.1   49.4
## 2   56.2   45.8   70.8   25.0   31.2   66.7   67.9   52.0
## 3   50.0   50.0   33.3   50.0   31.2   53.1   46.4   44.9
## 4   46.4     NA   37.5   75.0   68.8   38.9   42.9     NA
## 5   65.0   50.0   12.5   55.0   50.0   33.3   40.6   43.8
## 6   60.7   70.0   25.0   37.5   33.3   40.6   37.5   43.5

References

  1. Struik, Fransien M., Huib A.M. Kerstjens, Gerrie Bladder, Roy Sprooten, Marianne Zijnen, Jerryll Asin, Thys van der Molen, and Peter J. Wijkstra. 2013. “The Severe Respiratory Insufficiency Questionnaire Scored Best in the Assessment of Health-Related Quality of Life in Chronic Obstructive Pulmonary Disease.” Journal of Clinical Epidemiology 66 (10): 1166–74.

  2. Windisch, Wolfram, Klaus Freidel, Bernd Schucher, Hansjrg Baumann, Matthias Wiebel, Heinrich Matthys, and Franz Petermann. 2003. “The Severe Respiratory Insufficiency (SRI) Questionnaire a Specific Measure of Health-Related Quality of Life in Patients Receiving Home Mechanical Ventilation.” Journal of Clinical Epidemiology 56 (8): 752–59.