Telegram is a cloud-based, cross-platform instant messaging service. Unlike WhatsApp, Telegram offers clients not only for mobile devices but also for desktop operating systems. In addition, there is a web-based client.
In this blog post I show how to import Telegram chats into R and how to plot a chat using the qdap package.
The following code will load and, if necessary, install the R packages required for this blog post.
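# Install pacman if it is missing, then load (and, if needed, install) the packages.
# stringr is included because the code further down calls stringr:: functions.
if (!require("pacman")) install.packages("pacman")
pacman::p_load(readr, stringr, qdap, lubridate)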
Scraping the data
- Visit https://web.telegram.org.
- Select a peer you want to get the chat history from.
- Load the messages.
- Select all text (Ctrl+A) and copy it (Ctrl+C).
- Paste the text into a text editor (Ctrl+V) and save to a local file.
I have saved the chat to a local CSV file. Since the first two lines contain non-tabular information, they need to be removed manually. Undesired line breaks must be removed manually as well.
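The manual clean-up could probably be scripted as well. The following is only a minimal sketch in base R: the sample lines, the helper name clean_chat_lines() and the assumed line layout ("dd.mm.yyyy hh:mm:ss," at the start of every message) are my own assumptions for illustration, so the pattern has to be adapted to what the copied text really looks like.

```r
# Hypothetical raw export: two header lines, then one message per line,
# except that a long message may spill over onto the following line(s).
raw <- c(
  "Telegram Web",
  "Chat with Other",
  "20.01.2017 21:10:14,Me: Hello",
  "20.01.2017 21:11:42,Other: Huhu",
  "how are you?"
)

clean_chat_lines <- function(lines, n_header = 2) {
  # Drop the non-tabular header lines
  if (n_header > 0) lines <- lines[-seq_len(n_header)]
  # A line starts a new message if it begins with the assumed timestamp layout
  is_start <- grepl("^[0-9]{2}\\.[0-9]{2}\\.[0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2},", lines)
  # cumsum() gives every continuation line the id of the message above it
  id <- cumsum(is_start)
  # Glue the lines of each message back together with a single space
  unname(vapply(split(lines, id), paste, character(1), collapse = " "))
}

cleaned <- clean_chat_lines(raw)
# writeLines(cleaned, "telegram.csv")  # write the cleaned lines back to disk
```

The idea is simply that any line without a leading timestamp belongs to the message before it.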
Importing the data
The following line of code shows how to import the CSV file into R using the read_csv() function of the readr package.
mydata <- readr::read_csv("telegram.csv", col_names=FALSE)
After importing the file, our data frame consists of two string variables:
- X1, containing the day and time of each message, and
- X2, containing the names of the persons involved in the chat as well as the chat text.
With the following lines of code we create four new variables:
- day, containing the dates of the chats,
- time, containing the times of day of the chats,
- person, containing the names of the persons involved in the chat,
- txt, containing the actual chat text.
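# Characters 1-10 of X1 hold the date, characters 12-19 the time
mydata$day <- stringr::str_sub(mydata$X1, 1, 10)
mydata$day <- lubridate::dmy(mydata$day)
mydata$time <- stringr::str_sub(mydata$X1, 12, 19)
mydata$time <- lubridate::hms(mydata$time)
# Everything before the first colon in X2 is the sender's name
mydata$person <- stringr::str_extract(mydata$X2, "[^:]*")
# Relabel the two senders in order of first appearance
mydata$person <- factor(mydata$person, levels = unique(mydata$person), labels = c('Me', 'Other'))
# Strip the name and the first colon only, so colons inside the message survive
mydata$txt <- sub("^[^:]*:\\s*", "", mydata$X2)
# Keep only the four new variables
mydata <- mydata[, c("day", "time", "person", "txt")]
head(mydata, 2)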
## # A tibble: 2 × 4
##          day         time person   txt
##       <date> <S4: Period> <fctr> <chr>
## 1 2017-01-20  21H 10M 14S     Me Hello
## 2 2017-01-20  21H 11M 42S  Other  Huhu
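One detail worth checking when separating the name from the text is a message that itself contains a colon, for example a time of day. The following small sketch in base R uses a made-up row (not taken from my chat) to compare a greedy pattern with one that is anchored at the first colon:

```r
x2 <- "Other: see you at 10:30"   # hypothetical content of one X2 cell

# Name of the sender: everything before the first colon
person <- sub(":.*$", "", x2)

# Greedy pattern: ".*" runs up to the LAST colon, so part of the message is lost
txt_greedy <- sub(".*:\\s*", "", x2)

# Anchored pattern: "[^:]*" stops at the FIRST colon, so the message stays intact
txt_anchored <- sub("^[^:]*:\\s*", "", x2)

print(c(person = person, greedy = txt_greedy, anchored = txt_anchored))
```

For chats that never contain a colon in the message text both patterns give the same result, but the anchored one is the safer choice.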
Gradient word cloud
Since the chat involves only two persons, I decided to plot it as a gradient word cloud, a visualization technique developed by Tyler Rinker. The function gradient_cloud() I use in the next code snippet is part of his qdap package. Gradient word clouds “color words with a gradient based on degree of usage between two individuals”.
gradient_cloud(mydata$txt, mydata$person, title = "Gradient word cloud of Telegram chat")
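To get a feeling for what "degree of usage between two individuals" could mean, here is a rough sketch in base R. It is not how qdap computes the cloud internally; the toy data, the tokenisation and the colour mapping are all assumptions of mine, chosen only to illustrate the idea of a per-word usage share.

```r
# Toy chat, made up for illustration
chat <- data.frame(
  person = c("Me", "Other", "Me", "Other"),
  txt    = c("hello there", "hello hello", "fine thanks", "thanks"),
  stringsAsFactors = FALSE
)

# Split each message into lower-case words and remember who wrote them
words  <- strsplit(tolower(chat$txt), "[^a-z]+")
tokens <- data.frame(
  person = rep(chat$person, lengths(words)),
  word   = unlist(words),
  stringsAsFactors = FALSE
)
tokens <- tokens[tokens$word != "", ]   # drop empty strings left by the split

# Word-by-person frequency table
counts <- table(tokens$word, tokens$person)

# Share of each word's uses that come from "Me": 1 = only Me, 0 = only Other
share_me <- counts[, "Me"] / rowSums(counts)

# Map the share onto a blue (Other) to red (Me) gradient
ramp <- grDevices::colorRamp(c("blue", "red"))
cols <- grDevices::rgb(ramp(share_me), maxColorValue = 255)
names(cols) <- names(share_me)

print(round(share_me, 2))
```

Words used by only one person end up at one end of the gradient, while words both persons use about equally often land in the middle.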
The chat I have plotted is very short and, thus, not very telling. I wonder what it will look like in a couple of months.