R makes it easy to combine different kinds of plots into one overall graph. This can be useful for visualizing both basic summary statistics (median, quartiles, etc.) *and* the distribution of a variable. Moreover, so-called *cut-off* values can be added to the graph.

In this blog post, I show how to combine box and jitter plots using the `ggplot2` package.

First of all, we need to install and load the R packages required for the following steps. Since we want to do the installation and loading with the `pacman` package, we first check whether it has been installed already: `require()` tries to load it and, if that fails, `install.packages()` installs it (line 1). Furthermore, we need the R packages `ggplot2` and `Hmisc`. This time, the `p_load()` function checks whether these packages have been installed already and either installs and loads them or just loads them (line 2).

    if (!require("pacman")) install.packages("pacman")
    pacman::p_load(ggplot2, Hmisc)
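For readers who prefer not to depend on `pacman`, roughly the same install-if-missing, then load logic can be sketched in base R. The helper name `ensure_loaded` is hypothetical and not part of any package; this is a minimal sketch, not a replacement for `p_load()`.

```r
# Hypothetical helper (not part of pacman): install a package only if it is
# missing, then attach it. Returns the package names invisibly.
ensure_loaded <- function(pkgs) {
  for (pkg in pkgs) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
      install.packages(pkg)
    }
    library(pkg, character.only = TRUE)
  }
  invisible(pkgs)
}

# Demonstrated with packages that ship with R, so nothing is downloaded here;
# for this post the call would be ensure_loaded(c("ggplot2", "Hmisc")).
ensure_loaded(c("stats", "utils"))
```

Unlike `p_load()`, this sketch takes package names as a character vector rather than as bare names.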

In a second step, we create three random variables (*var.scale*, *var.group*, *var.cutoff*) with n=300.

*var.scale* is a numeric variable with a mean value of about 50 and a standard deviation of about 17. *var.group* is a factor variable comprising the groups *male* and *female*. *var.cutoff* was calculated based on *var.scale* using predefined cut-off values (0–40 = *low*, 41–60 = *medium*, >60 = *high*).

    var.scale <- round(rnorm(300, 50, 17))
    var.group <- rbinom(300, 1, .5)
    var.group <- factor(var.group, levels = c(0:1), labels = c("male", "female"))
    var.cutoff <- ifelse(var.scale <= 40, 1, ifelse(var.scale > 40 & var.scale <= 60, 2, 3))
    var.cutoff <- factor(var.cutoff, levels = c(3:1), labels = c("high", "medium", "low"))
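Because the variables are drawn at random, the numbers change on every run. The following is a minimal sketch, assuming a fixed seed is acceptable (the value 42 is arbitrary). It also shows `cut()` as one possible alternative to the nested `ifelse()` calls; the names `cutoff.ifelse` and `cutoff.cut` are made up for this comparison.

```r
set.seed(42)  # assumption: any fixed seed makes the random draws repeatable

var.scale <- round(rnorm(300, 50, 17))

# Nested ifelse() version, following the logic used in the post
cutoff.ifelse <- ifelse(var.scale <= 40, 1,
                        ifelse(var.scale <= 60, 2, 3))
cutoff.ifelse <- factor(cutoff.ifelse, levels = 3:1,
                        labels = c("high", "medium", "low"))

# cut() version: the intervals are (-Inf, 40], (40, 60] and (60, Inf)
cutoff.cut <- cut(var.scale, breaks = c(-Inf, 40, 60, Inf),
                  labels = c("low", "medium", "high"))
# Reorder the levels so that "high" comes first, matching the post
cutoff.cut <- factor(cutoff.cut, levels = c("high", "medium", "low"))

# Both approaches assign every observation to the same category
stopifnot(identical(as.character(cutoff.ifelse), as.character(cutoff.cut)))
```

With `cut()`, the cut-off values sit in a single `breaks` vector, which may be easier to change than a chain of conditions.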

The `describe()` function of the `Hmisc` package returns basic summary statistics for each variable.

    Hmisc::describe(var.scale)

    ## var.scale
    ##       n missing  unique    Info    Mean     .05     .10     .25     .50
    ##     300       0      71       1   51.25   24.00   30.90   41.00   50.00
    ##     .75     .90     .95
    ##   63.25   70.00   76.00
    ##
    ## lowest :   8  10  14  16  17, highest:  85  97 100 102 104

    Hmisc::describe(var.group)

    ## var.group
    ##       n missing  unique
    ##     300       0       2
    ##
    ## male (141, 47%), female (159, 53%)

    Hmisc::describe(var.cutoff)

    ## var.cutoff
    ##       n missing  unique
    ##     300       0       3
    ##
    ## high (87, 29%), medium (141, 47%), low (72, 24%)
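If `Hmisc` is not available, similar figures can be obtained with base R. This is only a sketch: it regenerates the variables with an arbitrary fixed seed, so the numbers will differ from the output shown in this post, and the names `scale.mean`, `scale.quantiles`, `group.counts` and `group.shares` are chosen for illustration.

```r
set.seed(42)  # assumption: fixed seed, so the numbers differ from the post
var.scale <- round(rnorm(300, 50, 17))
var.group <- factor(rbinom(300, 1, .5), levels = 0:1,
                    labels = c("male", "female"))

# Mean and the same quantiles that describe() reports
scale.mean <- mean(var.scale)
scale.quantiles <- quantile(var.scale,
                            probs = c(.05, .10, .25, .50, .75, .90, .95))

# Absolute and relative (percent) group frequencies
group.counts <- table(var.group)
group.shares <- round(100 * prop.table(group.counts))

print(scale.mean)
print(scale.quantiles)
print(group.counts)
print(group.shares)
```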

Since the `ggplot2` package requires the variables to be in a data frame, we have to create a new data frame *df* comprising our predefined variables using the `data.frame()` function.

    df <- data.frame(var.scale, var.cutoff, var.group)

Using the functions `xlab()`, `ylab()` and `ggtitle()`, axis labels and plot title will be defined.

Box plots will be created using the `geom_boxplot()` function, with `width` specifying the boxes' width :-).

Jitter plots will be created using the `geom_jitter()` function. In addition, specifications have been made for the `colour`, `position` and `size` of the dots.

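    ggplot(df) +
      xlab("Group") +
      ylab("Scale") +
      ggtitle("Combination of Box and Jitter Plot") +
      # Box plots per group; width sets the width of the boxes
      geom_boxplot(aes(var.group, var.scale), width = 0.5) +
      # Jittered points coloured by cut-off category; height = 0 means no
      # vertical jitter, so the plotted values stay exact
      geom_jitter(aes(var.group, var.scale, colour = var.cutoff),
                  position = position_jitter(width = .15, height = 0),
                  size = 2) +
      # Limits cover the full observed range (maximum 104), so no points are dropped
      scale_y_continuous(limits = c(0, 110), breaks = seq(0, 110, 10)) +
      scale_color_manual(name = "Legend", values = c("red", "blue3", "green3"))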
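The jitter layer places the dots at random horizontal offsets, so the picture changes slightly each time the plot is drawn. A minimal sketch of how this could be made repeatable, assuming a ggplot2 version (3.0.0 or later) in which `position_jitter()` accepts a `seed` argument; the data frame `demo` and the names `jitter.pos`, `drawn.first` and `drawn.second` are made up for illustration.

```r
library(ggplot2)

# Small made-up data set, only for illustration
demo <- data.frame(group = factor(rep(c("a", "b"), each = 5)),
                   value = c(1:5, 11:15))

# height = 0 leaves the y values untouched; seed fixes the random x offsets.
# The seed argument assumes ggplot2 >= 3.0.0.
jitter.pos <- position_jitter(width = .15, height = 0, seed = 1)

p <- ggplot(demo, aes(group, value)) +
  geom_jitter(position = jitter.pos, size = 2)

# layer_data() returns the coordinates that ggplot2 will actually draw
drawn.first  <- layer_data(p, 1)
drawn.second <- layer_data(p, 1)
```

Building the plot twice yields the same x positions, and with `height = 0` the y values are those of the data.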

Finally, the last two layers, `scale_y_continuous()` and `scale_color_manual()`, format the y-axis and the legend.
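One detail about `limits` in `scale_y_continuous()` is worth illustrating: values outside the limits are removed before the box plot statistics are calculated, whereas `coord_cartesian()` only zooms the view. The sketch below uses a small made-up data frame (`demo`) and illustrative object names to show the difference.

```r
library(ggplot2)

# Made-up data: ten small values and one large value
demo <- data.frame(group = "a", value = c(1:10, 100))

base.plot <- ggplot(demo, aes(group, value)) + geom_boxplot()

# Scale limits: 100 lies outside c(0, 50) and is removed before the
# box plot statistics are calculated
p.limits <- base.plot + scale_y_continuous(limits = c(0, 50))

# Coordinate limits: the view is zoomed, but all values enter the statistics
p.zoom <- base.plot + coord_cartesian(ylim = c(0, 50))

# "middle" is the median that the box plot layer will draw
median.limits <- suppressWarnings(layer_data(p.limits, 1)$middle)  # median of 1:10
median.zoom   <- layer_data(p.zoom, 1)$middle                      # median of all 11 values
```

Here `median.limits` is 5.5 and `median.zoom` is 6, so the choice between the two approaches changes the box plot whenever data fall outside the limits.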

Hi Norbert,

great post! I like this type of diagram because it is really informative (good “ink to information”-relation). Used it as well for a client, and came across the following issue:

I noticed some data points were plotted twice. It took me quite some time to figure out why. In one subgroup there were just two outliers, but four outliers appeared in the plot. geom_jitter plots all the points, regardless of whether they are outliers or not. geom_boxplot plots outliers, regardless of what geom_jitter does.

The solution was to turn off outliers in the geom_boxplot call:

    geom_boxplot(outlier.color = NA)

That’s a good hint. Thanks very much!
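Applied to a plot like the one in this post, the suggestion from the comment could look roughly as follows. This sketch regenerates the data with an arbitrary fixed seed and uses `outlier.shape = NA`, which is an alternative to setting the outlier colour to `NA`; both are arguments of `geom_boxplot()` itself, not aesthetics.

```r
library(ggplot2)

set.seed(42)  # assumption: fixed seed for a repeatable example
df <- data.frame(
  var.scale = round(rnorm(300, 50, 17)),
  var.group = factor(rbinom(300, 1, .5), levels = 0:1,
                     labels = c("male", "female"))
)

p <- ggplot(df, aes(var.group, var.scale)) +
  # outlier.shape = NA stops the box plot layer from drawing outliers ...
  geom_boxplot(width = 0.5, outlier.shape = NA) +
  # ... so each observation appears exactly once, in the jitter layer
  geom_jitter(position = position_jitter(width = .15, height = 0), size = 2)

print(p)
```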