## Intro

I'm currently involved in a research project called EFFECT. EFFECT is a multicentre, cluster-randomised, placebo-controlled cross-over trial evaluating antiseptic body wash of patients on intensive care units (ICU). The trial is to test whether daily antiseptic body wash reduces the risk of intensive care unit (ICU)-acquired primary bacteraemia and ICU-acquired multidrug-resistant organisms. EFFECT requires two types of data: (1) The patients' individual ward-movement history and
(2) microbiological test results (see Meissner 2017).

According to the study protocol, positive blood tests do count as infection unless there is a negative blood test within 48 hours after the positive blood test.

In this blog post, I show how to solve this problem on a computational level.

## The Problem

The following code chunk provides an hypothetical example of the microbiological data I have to deal with. The data frame `df.mibi` contains 4 variables:

• ID: Patient id (only 1 patient in this example);
• ORGANISM: name of skin commensal organism found in some blood sample,
• RESULT: laboratory test result (POS vs. NEG);
• DATE: date of laboratory test
```library(tidyverse)
library(lubridate)

df.mibi <- tibble(
ID = paste0("ID_", rep(1, 11)),
ORGANISM = c(rep('Propionibacterium acnes', 2),
rep('Staphylococcus epidermidis', 2),
rep('Staphylococcus capitis', 2),
rep('', 5)),
RESULT = c(rep('POS', 6), rep('NEG', 5)),
DATE = ymd(c(
"2018-02-07", "2018-02-12", "2018-02-13", "2018-02-20",
"2018-02-21", "2018-03-18", "2018-02-01", "2018-02-06",
"2018-02-10", "2018-02-21", "2018-04-05")
)
)
```

## My Idea

In a first step, I separated `df.mibi` into two data frames:

• `df.POS`: containing positive blood tests only
• `df.NEG`: containing negative blood tests only
```df.POS <- df.mibi %>%
filter(RESULT == 'POS')
df.NEG <- df.mibi %>%
filter(RESULT == 'NEG')
```

In a second step, I removed two variables from `df.NEG` (RESULT, ORGANISM), grouped the data frame by `ID`, and put all dates belonging to one `ID` into the list column `data` using the `nest()` function of the `tidyr` package

```df.NEG <- df.NEG %>%
select(ID, DATE) %>%
group_by(ID) %>%
nest()
```

This is how both data frames look like:

```df.POS
```
```## # A tibble: 6 x 4
##   ID    ORGANISM                   RESULT DATE
##   <chr> <chr>                      <chr>  <date>
## 1 ID_1  Propionibacterium acnes    POS    2018-02-07
## 2 ID_1  Propionibacterium acnes    POS    2018-02-12
## 3 ID_1  Staphylococcus epidermidis POS    2018-02-13
## 4 ID_1  Staphylococcus epidermidis POS    2018-02-20
## 5 ID_1  Staphylococcus capitis     POS    2018-02-21
## 6 ID_1  Staphylococcus capitis     POS    2018-03-18
```
```df.NEG
```
```## # A tibble: 1 x 2
##   ID    data
##   <chr> <list>
## 1 ID_1  <tibble [5 x 1]>
```

In a third step, I tried to check whether one of the negative test (stored in the list variable `data`) lies within the time interval positive test + 48 hours (`TIME`).
I did the mapping using the `map2()` function of the `purrr` package:

```# merging and mapping
df.TOTAL <- df.POS %>%
left_join(df.NEG, by = 'ID') %>%
mutate(TIME = interval(DATE, DATE + days(2)),
RESULT = map2(data, "DATE", TIME, ~ .x %within% .y))
```

Unfortunaltely, my code did not work. The RESULT variable should be logical and return `TRUE` in case of a negative test result up to 2 days after the positive test. Instead it is a list and returns `NULL`.

```df.TOTAL
```
```## # A tibble: 6 x 6
##   ID    ORGANISM   RESULT DATE       data   TIME
##   <chr> <chr>      <list> <date>     <list> <S4: Interval>
## 1 ID_1  Propionib~ <NULL> 2018-02-07 <tibb~ 2018-02-07 UTC--2018-02-09 UTC
## 2 ID_1  Propionib~ <NULL> 2018-02-12 <tibb~ 2018-02-12 UTC--2018-02-14 UTC
## 3 ID_1  Staphyloc~ <NULL> 2018-02-13 <tibb~ 2018-02-13 UTC--2018-02-15 UTC
## 4 ID_1  Staphyloc~ <NULL> 2018-02-20 <tibb~ 2018-02-20 UTC--2018-02-22 UTC
## 5 ID_1  Staphyloc~ <NULL> 2018-02-21 <tibb~ 2018-02-21 UTC--2018-02-23 UTC
## 6 ID_1  Staphyloc~ <NULL> 2018-03-18 <tibb~ 2018-03-18 UTC--2018-03-20 UTC
```

## The Solution

Not even one hour after I posted my question to StackOverflow, a user who calles himself “utubun” found the following solution:

```df.TOTAL <- df.POS %>%
left_join(df.NEG, by = 'ID') %>%
mutate(TIME = interval(DATE, DATE + days(2)),
RESULT = map2_lgl(data, TIME, ~ any(.x\$DATE %within% .y)))
df.TOTAL
```
```## # A tibble: 6 x 6
##   ID    ORGANISM   RESULT DATE       data   TIME
##   <chr> <chr>      <lgl>  <date>     <list> <S4: Interval>
## 1 ID_1  Propionib~ FALSE  2018-02-07 <tibb~ 2018-02-07 UTC--2018-02-09 UTC
## 2 ID_1  Propionib~ FALSE  2018-02-12 <tibb~ 2018-02-12 UTC--2018-02-14 UTC
## 3 ID_1  Staphyloc~ FALSE  2018-02-13 <tibb~ 2018-02-13 UTC--2018-02-15 UTC
## 4 ID_1  Staphyloc~ TRUE   2018-02-20 <tibb~ 2018-02-20 UTC--2018-02-22 UTC
## 5 ID_1  Staphyloc~ TRUE   2018-02-21 <tibb~ 2018-02-21 UTC--2018-02-23 UTC
## 6 ID_1  Staphyloc~ FALSE  2018-03-18 <tibb~ 2018-03-18 UTC--2018-03-20 UTC
```

It works!!! Thank you very much! 🙂