How to draw a CONSORT flow diagram using R and GraphViz

Intro

The acronym CONSORT stands for “Consolidated Standards of Reporting Trials”. The CONSORT statement is published by the CONSORT group and gives recommendations aiming to improve the quality of reports of randomized clinical trials. Besides a 25 item checklist, the current CONSORT statement (2010) includes the CONSORT flow diagram which is to visualize the progress through the phases of a parallel randomised clinical trial of two groups (see Figure 1).

In this blog post, I will do both show how to “draw” the CONSORT flow diagram using R and discuss the pros and cons of this approach.

Figure 1: CONSORT 2010 Flow Diagram

R Packages

The following R packages are required to create, plot and save flow diagrams: DiagrammeR supports the Graphviz software which consists of a graph description language called DOT. DiagrammeRsvg and rsvg are required to save diagrams to a hard drive.

library(DiagrammeR)
library(DiagrammeRsvg)
library(rsvg)

In case we want to include the flow diagram into a .Rmd document and render it as PDF, we need to install the phantomjs JavaScript library. This can be done using the install_phantomjs() function of the webshot package.

webshot::install_phantomjs()

GraphViz is very flexible regarding the definition of line colors, arrow shapes, node shapes, and many other layout features. However, since the actual purpose of GraphViz is the visualization of graphs (networks), it is difficult to control the position of the nodes.

Unlike in a Cartesian plot, the position of the nodes is not defined by x and y coordinates. Instead, the nodes' position can only be determined by edges (relationships between nodes) and their attributes.

Creating a grid

When I looked at the CONSORT flow diagram (see Figure 1), I had the impression that the nodes may be placed using a grid-based system.

Figure 2: Placement of nodes using a grid
plot of chunk structure

The CONSORT flow diagram conceptualized as grid consists of 21 nodes. That is:

  • 4 nodes with plain text (blue, T1-T4) describing the phase of trial (Enrollment, Allocation, Follow-Up, Analysis) and
  • 9 boxed nodes (black, B1-B9) containing text (“Randomized”) and numerical data (“n=200”).

In order to populate each cell of the grid, we need to add:

  • 4 nodes rendered as tiny and, thus, invisible points (green, P1-P4). We use the points to mimic edges brancing-off from other edges.
  • 4 completely invisible nodes (grey, I1_I4).

The numbers in brackets denote the nodes' ID numbers.

Eventually, the grid structure is created using both vertical and horizontal edges.

Creating the CONSORT Flow Diagram

When creating the actual CONSORT diagram we gonna make use of the grid structure. All we have to do now is to change some attributes of the nodes and edges.

Preparation

To make the flow diagram dynamic, we need to save constant elements (text labels) and dynamic elements (numeric values) into separate vectors (text, values). The paste1()-function is a simple wrapper of the base R paste0()-function making it more convenient to concatinate text labels and numeric values.

# Values ------------------------------------------------------------------
values <- c(210, 10, 200, 100, 100, 10, 10, 90, 90)

# Defining Text Labels ----------------------------------------------------
text <- c('Assessment for\neligibility',
          'Excluded',
          'Randomized',
          'Allocated to\nintervention',
          'Allocated to\nintervention',
          'Lost to follow-up',
          'Lost to follow-up',
          'Analysed',
          'Analysed')

# Defining Function -------------------------------------------------------
paste1 <- function(x, y){
  paste0(x, ' (n=', y, ')')
}

# Concatenating Values and Text Labels ------------------------------------
LABS <- paste1(text, values)
LABS
## [1] "Assessment for\neligibility (n=210)"
## [2] "Excluded (n=10)"                    
## [3] "Randomized (n=200)"                 
## [4] "Allocated to\nintervention (n=100)" 
## [5] "Allocated to\nintervention (n=100)" 
## [6] "Lost to follow-up (n=10)"           
## [7] "Lost to follow-up (n=10)"           
## [8] "Analysed (n=90)"                    
## [9] "Analysed (n=90)"

With the next step, we create a node data frame (ndf) using the create_node_df()-function of the DiagrammR package. The attributes of this function are pretty much self-explanatory.

Node Data Frame

ndf <-
  create_node_df(
    n = 21,
    label = c('Enrollment', 'Allocation', 'Follow-Up', 'Analysis',
              LABS, rep("", 8)),
    style = c(rep("solid", 13), rep('invis', 8)),
    shape = c(rep("plaintext", 4), 
              rep("box", 9),
              rep("point", 8)),
    width = c(rep(2, 4), rep(2.5, 9), rep(0.001, 8)),
    hight = c(rep(0.5, 13), rep(0.001, 8)),
    fontsize = c(rep(14, 4), rep(10, 17)),
    fontname = c(rep('Arial Rounded MT Bold', 4), rep('Courier New', 17)),
    penwidth = 2.0,
    fixedsize = "true")

Edge Data Frame

Furthermore, we create an edge data frame (edf) using the create_edge_df()-function. Edges we don't want to be visible are colored white (with alpha = 0) and don't get an arrowhead. To make edges horizontal, the constraint attribute needs to be set to false. from and to are vectors containing node IDs from which edges are outbound and incoming. They determine the relationships between the nodes. To arrange the code more clearly, I've ordered outbound and incoming edges by columns and rows according to Figure 2.

edf <-
  create_edge_df(
    arrowhead = c(rep('none', 3), rep("vee", 3), rep('none', 2), "vee", rep('none', 6),
                  rep("vee", 3), rep("none", 3), "vee", rep("none", 10)),
    color = c(rep('#00000000', 3), rep('black', 6), rep('#00000000', 6),
              rep('black', 3), rep('#00000000', 3), rep('black', 1),
              rep('#00000000', 2), rep('black', 2), 
              rep('#00000000', 6)),
    constraint = c(rep("true", 18), rep('false', 14)),
    from = c(1, 19, 20, 16, 8, 10, # column 1
             5, 14, 7, 15, 2, 3, # column 2
             18, 6, 21, 17, 9, 11, # column 3
             1, 5, # row 1
             19, 14, # row 2
             20, 7, # row 3
             16, 15, # row 4
             8, 2, # row 5
             10, 3, # row 6
             12, 4), # row 7
    to = c(19, 20, 16, 8, 10, 12, # column 1
           14, 7, 15, 2, 3, 4, # column 2
           6, 21, 17, 9, 11, 13, # column 3
           5, 18, # row 1
           14, 6, # row 2
           7, 21, # row 3
           15, 17, # row 4
           2, 9, # row 5
           3, 11, # row 6
           4, 13)) # row 7

Plotting

Before we are going to plot the flow diagram, we need to create a graph using the create_graph()-function. The graph can be plotted using the render_graph()-function.

# Create Graph ------------------------------------------------------------
g <- create_graph(ndf, 
                  edf,
                  attr_theme = NULL)

# Plotting ----------------------------------------------------------------
render_graph(g)

Figure 3: Simplified CONSORT Flow Diagram
plot of chunk unnamed-chunk-1

Unfortunately, it didn't find a way to left align text within the text boxes. Thus, I didn't manage to draw a fully featured CONSORT flow diagram using R.

Export

Saving the flow diagram to a hard drive is quite straight forward:

export_graph(g, file_name = "CONSORT.png")

File types to be exported are PNG, PDF, SVG, PS and GEXF (graph file format).

Discussion

As demonstrated in this blog post, “drawing” a CONSORT flow diagram using R and GraphViz is rather cumbersome. Because of some limitations regarding the alignment of text within boxes, only a simplyfied form of the CONSORT flow diagram can be drawn. However, the use of R seems to be the only way to make flow diagrams dynamic.

Advertisements

About norbert

Biometrician at Clinical Trial Centre, Leipzig University (GER), with degrees in sociology (MA) and public health (MPH).
This entry was posted in Visualizing Data and tagged . Bookmark the permalink.

3 Responses to How to draw a CONSORT flow diagram using R and GraphViz

  1. Kim Mathiasen says:

    Dear Norbert,

    Thanks so much for sharing this. I do have one problem with the code, though. It produces the plot on a 45 degree angle, and I cannot figure out where the problem is.

    Hope you can help!

    Best regards,
    Kim Mathiasen,
    University of Southern Denmark

    • norbert says:

      Hi Kim, when you create the graph, you need to set attr_theme = NULL. Does that solve your problem? Best Regards, Norbert

      g <- create_graph(ndf, edf, attr_theme = NULL)

      • Kim Mathiasen says:

        Hi Norbert,
        Thanks so much for responing. I have allready tried both with and without setting attr_theme = NULL, and it doesn’t solve the problem. I have been running it on Rstudio on both my mac and my linux machine. Same result… It is really strange.

        Mac (Mac OS 10.13.6 ): RStudio ver. 1.1.383, r ver. 3.5.0
        Linux (Kubuntu 18.04): RStudio ver. 1.1.453, r ver. 3.4.4

        Best regards,
        Kim

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.