Intro
Telegram is a cloud-based, cross-platform instant messaging service. Unlike WhatsApp, Telegram clients exist not only for mobile devices but also for desktop operating systems. There is also a web-based client.
In this blog post I show how to import Telegram chats into R and how to plot a chat using the qdap package.
R packages
The following code will load and, if necessary, install the R packages required for this blog post.
if (!require("pacman")) install.packages("pacman")
pacman::p_load(readr, stringr, qdap, lubridate)
Scraping the data
A very straightforward way to save Telegram chats is to use the Chrome extension Save Telegram Chat History. On Quora, Stacy Wu explains how to use it:
- Visit https://web.telegram.org.
- Select a peer you want to get the chat history from.
- Load the messages.
- Select all text (Ctrl+A) and copy it (Ctrl+C).
- Paste the text into a text editor (Ctrl+V) and save to a local file.
I have saved the chat to a local CSV file. Since the first two lines contain non-tabular information, they need to be removed manually. Undesired line breaks must be removed manually as well.
Importing the data
The following line of code shows how to import the CSV file into R using the read_csv() function of the readr package.
mydata <- readr::read_csv("telegram.csv", col_names=FALSE)
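As an alternative to deleting the first two lines by hand, read_csv() can skip them on import via its skip argument. The following is only a minimal sketch: it assumes the export starts with exactly two non-tabular lines, and the file contents are made up for illustration. Unwanted line breaks inside messages would still have to be cleaned up separately.

```r
library(readr)

# Hypothetical export: two non-tabular lines followed by the chat rows
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "Chat history",
  "Exported from Telegram",
  "20.01.2017 21:10:14,Me: Hello",
  "20.01.2017 21:11:42,Other: Huhu"
), tmp)

# skip = 2 drops the first two lines before parsing;
# col_types = "cc" reads both columns as character
demo <- read_csv(tmp, col_names = FALSE, skip = 2, col_types = "cc")
```

The resulting data frame has the same two columns, X1 and X2, as the manually cleaned file.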
Data wrangling
After importing the file, our data frame consists of two string variables: X1, containing the day and time of each message, and X2, containing the name of the person who wrote the message as well as the chat text. With the following lines of code we create four new variables:
- day, containing the dates of the chats,
- time, containing the times of day of the chats,
- person, containing the names of the persons involved in the chat,
- txt, containing the actual chat text.
mydata$day <- stringr::str_sub(mydata$X1, 1, 10)
mydata$day <- lubridate::dmy(mydata$day)
mydata$time <- stringr::str_sub(mydata$X1, 12, 19)
mydata$time <- lubridate::hms(mydata$time)
# Name of the sender: everything before the first colon
mydata$person <- stringr::str_extract(mydata$X2, "[^:]*")
mydata$person <- factor(mydata$person, levels = unique(mydata$person), labels = c('Me', 'Other'))
# Message text: everything after the first colon (colons inside the message are kept)
mydata$txt <- sub("^[^:]*:\\s*", "", mydata$X2)
mydata <- mydata[, c(3:6)]
head(mydata, 2)
## # A tibble: 2 × 4
##          day         time person   txt
##       <date> <S4: Period> <fctr> <chr>
## 1 2017-01-20  21H 10M 14S     Me Hello
## 2 2017-01-20  21H 11M 42S  Other  Huhu
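One case worth checking when splitting X2 into name and text is a message that itself contains a colon, such as a time or a URL. The sketch below uses a made-up message to compare a greedy pattern, which cuts at the last colon, with a pattern anchored at the first colon. It relies on base R only.

```r
# Hypothetical chat line whose text contains a colon
x <- "Me: See you at 10:30"

# Greedy: ".*:" matches up to the LAST colon, so most of the text is lost
greedy <- gsub(".*:\\s*", "", x)

# Anchored: "^[^:]*:" matches only up to the FIRST colon
first_colon <- sub("^[^:]*:\\s*", "", x)

print(greedy)       # "30"
print(first_colon)  # "See you at 10:30"
```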
Gradient word cloud
Since the chat involves only two persons, I decided to plot it as a gradient word cloud, a visualization technique developed by Tyler Rinker. The function gradient_cloud() I use in the next code snippet is part of his qdap package. Gradient word clouds “color words with a gradient based on degree of usage between two individuals”.
gradient_cloud(mydata$txt, mydata$person, title = "Gradient word cloud of Telegram chat")
The chat I have plotted is very short and, thus, not very telling. I wonder how it will look in a couple of months.
Dear Prof. Norbert,
Thank you for this topic.
I can’t follow the steps and get the results. I saved the Telegram chat history in (c/users/me/my documents) and named the file (telegram.csv).
When I copy this: mydata <- readr::read_csv("telegram.csv", col_names=FALSE)
and paste it into the R console
and press Enter,
it shows this message:
Error: ‘telegram.csv’ does not exist in current working directory (‘C:/Users/john/Documents’).
What can I do?
Is there a solution?
I am looking forward to your reply.
Best wishes,
I think you have to make sure that “telegram.csv” is saved in your current working directory (C:/Users/john/Documents).
You can check:
setwd("C:/Users/john/Documents")
"telegram.csv" %in% list.files(pattern = "\\.csv$")
Good luck,
Norbert