Intro
Venn diagrams – named after the English logician and philosopher John Venn – “illustrate the logical relationships between two or more sets of items” with overlapping circles.
In this tutorial, I'll show how to plot a three-set Venn diagram using R and the ggplot2 package.
Packages and Data
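For the R code to run, we need to install and load three R packages. Unlike tidyverse and ggforce, which are available from CRAN, the limma package must be installed from Bioconductor. We then create a random data frame with the rbinom() function: each of the 100 rows is an item, and each of the columns A, B, and C records whether that item belongs to the corresponding set.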
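# tidyverse and ggforce are on CRAN; limma is on Bioconductor.
# BiocManager replaces the deprecated biocLite.R installer.
install.packages(c("tidyverse", "ggforce", "BiocManager"))
BiocManager::install("limma")
library(limma)
library(tidyverse)
library(ggforce)
# 100 random items; TRUE means the item belongs to that set
set.seed(123)
mydata <- data.frame(A = rbinom(100, 1, 0.8),
                     B = rbinom(100, 1, 0.7),
                     C = rbinom(100, 1, 0.6)) %>%
  mutate(across(everything(), as.logical))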
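The random data above already come as a membership table. With real data you often start from vectors of item IDs instead. Here is a minimal sketch, assuming each set is passed as a named vector, of turning such vectors into the same kind of logical data frame; sets_to_membership() is a hypothetical helper, not part of any of the packages loaded above.

```r
# Hypothetical helper (not from limma or the tidyverse): turn named vectors of
# item IDs into a logical membership data frame with one row per distinct item.
sets_to_membership <- function(...) {
  sets <- list(...)
  items <- sort(unique(unlist(sets)))   # every item that appears in any set
  # For each set, flag which of the items it contains
  as.data.frame(lapply(sets, function(s) items %in% s))
}

membership <- sets_to_membership(A = c(1, 2, 3), B = c(3, 4, 5), C = c(1, 2, 3, 4, 5))
membership
```

Because every row comes from at least one set, the "in no set" count is always zero for data built this way.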
Drawing the Circles
Next, we create a data frame with the x and y coordinates of the centers of the three circles and a variable holding their labels. To draw the circles, the basic structure of our Venn diagram, we use the geom_circle() function of the ggforce package; its r aesthetic (here 1.5) sets the radius of the circles. coord_fixed() keeps the aspect ratio at 1 so the circles are not stretched into ellipses, and theme_void() removes the axes and the grid.
df.venn <- data.frame(x = c(0, 0.866, -0.866),
                      y = c(1, -0.5, -0.5),
                      labels = c('A', 'B', 'C'))
ggplot(df.venn, aes(x0 = x, y0 = y, r = 1.5, fill = labels)) +
  # recent ggplot2/ggforce versions use linewidth instead of size for outlines
  geom_circle(alpha = .3, linewidth = 1, colour = 'grey') +
  coord_fixed() +
  theme_void()
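The hard-coded centers are not arbitrary: 0.866 is √3/2, and the three points sit 120 degrees apart on a circle of radius 1 around the origin. A small sketch that computes them instead of typing them in; venn_centers() and its d argument are hypothetical names used only for illustration.

```r
# Hypothetical helper: place n circle centers evenly on a circle of radius d,
# starting at the top (90 degrees) and moving clockwise.
venn_centers <- function(labels, d = 1) {
  n <- length(labels)
  angle <- pi / 2 - 2 * pi * (seq_len(n) - 1) / n   # in radians
  data.frame(x = d * cos(angle), y = d * sin(angle), labels = labels)
}

centers <- venn_centers(c("A", "B", "C"))
round(centers[, c("x", "y")], 3)
```

Because the radius r = 1.5 is larger than the distance d = 1 from each center to the origin, the origin lies inside all three circles, so the region shared by A, B, and C is guaranteed to exist.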
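Furthermore, we need a data frame with the values we want to plot and the coordinates for plotting them. The values can be obtained with the vennCounts() function of the limma package, which counts how many items fall into each of the 2^3 = 8 combinations of membership in A, B, and C. Since ggplot2 works with data frames, we first strip the VennCounts class from the vdc object to get a plain matrix and then convert it into a data frame. We drop the first row, which counts the items that belong to none of the sets, and add x and y coordinates for the remaining seven values. These coordinates must follow the row order of the vennCounts() output: C only, B only, B and C, A only, A and C, A and B, and all three.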
vdc <- vennCounts(mydata)
df.vdc <- as.data.frame(unclass(vdc))[-1, ] %>%
  # one label position per row, in vennCounts() order: C, B, BC, A, AC, AB, ABC
  mutate(x = c(-1.2, 1.2, 0, 0, -0.8, 0.8, 0),
         y = c(-0.6, -0.6, -1, 1.2, 0.5, 0.5, 0))
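If installing limma is inconvenient, the same table can be computed in base R. The sketch below assumes a data frame of logical or 0/1 columns such as mydata and mimics the vennCounts() row order by reading each row's memberships as a binary number, with the first column as the most significant bit. venn_counts_base() is a hypothetical helper, not a limma function, and it returns a plain data frame rather than a VennCounts object.

```r
# Hypothetical helper (not part of limma): reproduce the table that
# vennCounts() returns, for a data frame of logical or 0/1 columns.
venn_counts_base <- function(df) {
  m <- as.matrix(df) * 1                     # logical -> numeric 0/1
  k <- ncol(m)
  # Encode each row's membership pattern as a binary number,
  # with the first column as the most significant bit
  code <- as.vector(m %*% 2^((k - 1):0))
  counts <- tabulate(code + 1, nbins = 2^k)  # how many rows have each pattern
  # Membership columns for patterns 0 .. 2^k - 1, in the same order
  outcomes <- sapply(seq_len(k), function(j) (0:(2^k - 1) %/% 2^(k - j)) %% 2)
  colnames(outcomes) <- colnames(df)
  data.frame(outcomes, Counts = counts)
}

toy <- data.frame(A = c(TRUE, TRUE, FALSE, FALSE),
                  B = c(TRUE, FALSE, TRUE, FALSE),
                  C = c(TRUE, FALSE, FALSE, FALSE))
venn_counts_base(toy)
```

Under these assumptions, venn_counts_base(mydata) should give the same Counts column as vennCounts(mydata), so the rest of the pipeline can stay unchanged.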
The Final Plot
Finally, we'll customize the look of our Venn diagram and add the counts as text labels.
ggplot(df.venn) +
  geom_circle(aes(x0 = x, y0 = y, r = 1.5, fill = labels),
              alpha = .3, linewidth = 1, colour = 'grey') +
  coord_fixed() +
  theme_void() +
  theme(legend.position = 'bottom') +
  scale_fill_manual(values = c('cornflowerblue', 'firebrick', 'gold')) +
  labs(fill = NULL) +
  # write each region's count at its position from df.vdc
  annotate("text", x = df.vdc$x, y = df.vdc$y, label = df.vdc$Counts, size = 5)
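The label positions in df.vdc are typed in by hand, which makes it easy to attach a count to the wrong region. One alternative, sketched here under the assumption that all circles share the same radius, is to place each label at the centroid of the grid points that fall in its region. region_label_positions() is a hypothetical helper, and its step argument trades precision for speed.

```r
# Hypothetical helper: estimate a label position for every region of a
# Venn diagram as the centroid of the grid points that fall in that region.
region_label_positions <- function(centers, r = 1.5, step = 0.02) {
  k <- nrow(centers)
  # Regular grid covering all circles
  grid <- expand.grid(
    x = seq(min(centers$x) - r, max(centers$x) + r, by = step),
    y = seq(min(centers$y) - r, max(centers$y) + r, by = step)
  )
  # 1 where a grid point lies inside circle i (one column per circle)
  inside <- 1 * sapply(seq_len(k), function(i)
    (grid$x - centers$x[i])^2 + (grid$y - centers$y[i])^2 <= r^2)
  # Same binary coding as vennCounts(): first circle = most significant bit
  code <- as.vector(inside %*% 2^((k - 1):0))
  keep <- code > 0                       # ignore points outside every circle
  # Mean x and y per region; rows come out sorted by code (1 .. 2^k - 1)
  aggregate(grid[keep, c("x", "y")], by = list(code = code[keep]), FUN = mean)
}

df.venn <- data.frame(x = c(0, 0.866, -0.866), y = c(1, -0.5, -0.5),
                      labels = c('A', 'B', 'C'))
pos <- region_label_positions(df.venn)
pos
```

The rows come out in the same order as the vennCounts() table without its first row (C, B, BC, A, AC, AB, ABC), so pos$x and pos$y could replace the hand-typed coordinates. For this symmetric layout each centroid lies inside its region; for other layouts the centroid of a crescent-shaped region can fall outside it.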
Thank you so much for the code.
Great example.
To make the example more broadly applicable, I would remove limma from it. The library really takes a long time to install, and in the end vennCounts() returns nothing more than a simple binary data frame.
For three sets of arbitrary length, I would suggest a function like this:
get_venn_data <- function(u1, u2, u3) {
  # distinct elements of each set
  u1 <- unique(u1); u2 <- unique(u2); u3 <- unique(u3)
  # exclusive region sizes, so that every element is counted exactly once
  abc    <- length(intersect(intersect(u1, u2), u3))
  ab     <- length(intersect(u1, u2)) - abc
  ac     <- length(intersect(u1, u3)) - abc
  bc     <- length(intersect(u2, u3)) - abc
  a_only <- length(u1) - ab - ac - abc
  b_only <- length(u2) - ab - bc - abc
  c_only <- length(u3) - ac - bc - abc
  # same row order as limma's vennCounts(); the first row ("in no set") is 0
  data.frame(A = c(0, 0, 0, 0, 1, 1, 1, 1),
             B = c(0, 0, 1, 1, 0, 0, 1, 1),
             C = c(0, 1, 0, 1, 0, 1, 0, 1),
             Counts = c(0, c_only, b_only, bc, a_only, ac, ab, abc))
}
vdc <- get_venn_data(c(1, 2, 3), c(3, 4, 5), c(1, 2, 3, 4, 5))
Thanks a lot! That seems to be a brilliant idea!