LEGIDA rallies: A heat map of day times using ggplot2

LEGIDA is a Leipzig-based offshoot of the right-wing, xenophobic PEGIDA movement.

Since January 2015, LEGIDA has held at least one rally per month against what they call the “Islamisation of the Western world”.

Usually, the Leipzig Police Department publishes online reports describing what happened at these rallies.

In the following blog posts I will show what kind of information can be derived from these reports and how this information can be visualized. The posts will be technical rather than political in character.

Today, I'm going to show how to find out at what time of day the rallies took place and how to visualize these times using a pie chart.

In the first code chunk (not shown here), we simply load the concatenated police reports as a character vector named txt. The text of the reports is rather unstructured. However, times are consistently given in the same format: two digits, a colon, and two more digits, e.g. '19:00' for 7 p.m. Thus, all times can be extracted with a simple regular expression. Since we are not particularly interested in to-the-minute precision, we keep only the hours and save them as a numeric vector.
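The loading step could look roughly like the following. This is only a minimal sketch, assuming the reports were saved as one plain-text file. The temporary file and its two invented sentences are stand-ins for the real data, and report.file is a hypothetical name; only txt comes from this post.

```r
# Stand-in for the real report file (invented content, for illustration only)
report.file <- tempfile(fileext = ".txt")
writeLines(c("The rally started at 19:00.",
             "It ended at about 21:30."), report.file)

# Read all lines and collapse them into one character string
txt <- paste(readLines(report.file, encoding = "UTF-8"), collapse = " ")
txt
```

Collapsing everything into a single string is a convenience: gregexpr() then returns one set of matches instead of one per line.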

time.str <- unlist(regmatches(txt, gregexpr("\\d{2}\\:\\d{2}", txt)))
time.str

[1] "19:00" "17:00" "21:00" "15:00" "18:00" "17:00" "19:00" "18:45"
[9] "18:30" "20:20" "17:44" "21:45" "18:00" "17:30" "19:00" "20:20"
[17] "21:30" "19:00" "20:00" "21:15" "21:30" "22:30" "19:15" "19:45"
[25] "21:00" "18:15" "16:15" "19:30" "20:15" "21:30" "22:00" "16:45"
[33] "19:00" "17:20" "18:00" "19:00" "20:15" "19:00" "21:15" "22:45"
[41] "19:10" "19:40" "21:00" "22:00" "19:30" "21:30" "23:00" "21:45"
[49] "17:30" "19:00" "21:00" "19:00" "20:00" "21:15" "21:45" "17:15"
[57] "17:00" "18:00" "19:10" "21:00" "17:00" "19:15" "20:40" "20:15"
[65] "21:45" "18:00" "22:00" "18:30" "18:00" "18:30" "18:00" "18:45"
[73] "19:50" "20:00" "20:20" "21:15" "21:00" "18:45" "19:00" "20:40"
[81] "20:00" "20:45" "21:15" "19:50" "20:40" "21:00" "20:00" "20:45"
[89] "21:00" "19:00" "20:00" "20:50" "21:45" "21:00" "21:30" "19:00"
[97] "20:00" "20:45" "21:20" "21:30" "19:00" "19:30" "20:20" "21:00"
[105] "21:35" "18:00" "19:00" "18:00" "20:50" "19:00" "20:00" "20:50"
[113] "20:50" "19:00" "19:50" "20:35" "20:55" "17:35" "18:35" "19:00"
[121] "20:45" "21:00" "18:00" "18:45" "19:00" "19:35" "20:20" "20:30"
[129] "20:40" "22:00" "18:00" "19:20" "02:30" "19:05" "21:10" "17:40"
[137] "18:45" "20:00" "21:20" "18:30" "19:20" "20:00" "21:45" "19:00"
[145] "19:30" "20:45" "21:45" "23:00"
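Note that the pattern accepts any two digits on either side of the colon, so a string such as '99:75' would match as well. If that matters, a stricter pattern can restrict hours to 00-23 and minutes to 00-59. Here is a sketch on an invented sentence; the sample text and the names sample.txt, loose, strict.pattern and strict are mine and do not come from the reports.

```r
sample.txt <- "Start at 19:00, end around 21:30 (file no. 99:75)."

# Pattern as used in this post: two digits, a colon, two digits
loose <- unlist(regmatches(sample.txt,
                           gregexpr("\\d{2}:\\d{2}", sample.txt)))

# Stricter variant: hours 00-23, minutes 00-59, delimited by word boundaries
strict.pattern <- "\\b([01]\\d|2[0-3]):[0-5]\\d\\b"
strict <- unlist(regmatches(sample.txt,
                            gregexpr(strict.pattern, sample.txt, perl = TRUE)))

loose   # "19:00" "21:30" "99:75"
strict  # "19:00" "21:30"
```

For the police reports the simple pattern seems to be sufficient, since every match in the output above is a plausible time.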

time.str <- as.numeric(stringr::str_sub(time.str, 1, 2))
time.str

[1] 19 17 21 15 18 17 19 18 18 20 17 21 18 17 19 20 21 19 20 21 21 22 19
[24] 19 21 18 16 19 20 21 22 16 19 17 18 19 20 19 21 22 19 19 21 22 19 21
[47] 23 21 17 19 21 19 20 21 21 17 17 18 19 21 17 19 20 20 21 18 22 18 18
[70] 18 18 18 19 20 20 21 21 18 19 20 20 20 21 19 20 21 20 20 21 19 20 20
[93] 21 21 21 19 20 20 21 21 19 19 20 21 21 18 19 18 20 19 20 20 20 19 19
[116] 20 20 17 18 19 20 21 18 18 19 19 20 20 20 22 18 19 2 19 21 17 18 20
[139] 21 18 19 20 21 19 19 20 21 23
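If you prefer not to depend on stringr for this step, base R's substr() should do the same job. A small sketch using three of the values from the output above (x and hours are illustrative names):

```r
x <- c("19:00", "02:30", "18:45")   # three values taken from the output above

# Keep characters 1 to 2 (the hour) and convert them to numbers
hours <- as.numeric(substr(x, 1, 2))
hours  # 19 2 18
```

The leading zero of "02" disappears in the conversion, which is why the value 2 shows up in the numeric vector.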

Since we are only interested in the time span between noon and midnight, we convert the numeric vector time.str into a factor whose levels are the hours 13 to 24. Values outside these levels, such as the 2 that stems from '02:30', become NA and are dropped when the factor is tabulated. Afterwards, we save the counts per hour as a table.

time.str <- factor(time.str, levels = c(13:24))
time.tab <- table(time.str)
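To see what fixing the levels does, here is a small sketch with invented hours (h, f and tab are illustrative names). Anything that is not one of the levels turns into NA, and table() leaves NA values out by default while still reporting a cell for every level.

```r
h <- c(19, 2, 21, 19, 12)        # invented hours; 2 and 12 are not in 13:24
f <- factor(h, levels = 13:24)

sum(is.na(f))                    # 2: the out-of-range values became NA

tab <- table(f)
length(tab)                      # 12: one cell per level, empty ones included
as.integer(tab["19"])            # 2
sum(tab)                         # 3: the NA values are not counted
```

Keeping the empty levels is what later guarantees exactly twelve values for the twelve segments of the clock.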

In the next step, we compute the proportion of times falling into each hour and save the resulting percentages in a new vector named time.vecp.

time.tabp <- round(prop.table(time.tab), 2)
time.vecp <- as.numeric(time.tabp) * 100
time.vecp

[1] 0 0 1 1 7 15 24 23 22 4 1 0
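A quick sanity check on these numbers can be useful before plotting. The sketch below copies the percentages from the output above and labels them with their hours; the labelling via names() is my addition.

```r
# Percentages copied from the output above, labelled with the hours 13 to 24
time.vecp <- c(0, 0, 1, 1, 7, 15, 24, 23, 22, 4, 1, 0)
names(time.vecp) <- 13:24

sum(time.vecp)                             # 98: rounded shares need not add up to 100
names(which.max(time.vecp))                # "19": the hour starting at 7 p.m. is most frequent
sum(time.vecp[c("18", "19", "20", "21")])  # 84
```

The total of 98 instead of 100 is a side effect of rounding each proportion to two decimals before converting it to a percentage.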

Finally, we want to visualize the results. Since a clock face can be reproduced well with a pie chart, we first create a data frame with twelve segments of equal size (time). To this data frame, we add two more variables: the vector of percentages (value) and the labels for the clock face (labs).

df <- data.frame(time = rep(1, 12),
                 value = time.vecp,
                 labs = 1:12)
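One detail is worth keeping in mind when building such a data frame: columns are named with =, not with <-. Writing <- inside data.frame() creates a variable in the calling environment and leaves the column with a name derived from the whole expression. A small sketch with made-up values (good, bad and labs.tmp are illustrative names):

```r
# Correct: `=` names the column
good <- data.frame(time = rep(1, 3), labs = 1:3)
names(good)                   # "time" "labs"

# Pitfall: `<-` assigns a variable labs.tmp in the current environment,
# and the column gets a name built from the whole expression instead
bad <- data.frame(time = rep(1, 3), labs.tmp <- 1:3)
names(bad)
"labs.tmp" %in% names(bad)    # FALSE
exists("labs.tmp")            # TRUE
```

A call such as bad$labs.tmp may still appear to work, because $ on a data frame matches column names partially, but it is safer not to rely on that.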

The pie chart is plotted using ggplot2. The result is a kind of heat map showing the times of day at which the LEGIDA rallies usually take place.

library(ggplot2)

  ggplot(df, aes(x = "", y = time, fill = value)) +
    geom_bar(width = 1, stat = "identity", colour = "grey") +
    scale_y_continuous('', limits=c(0, 12), breaks = seq(1,12,1),
                       labels=df$labs) +
    scale_x_discrete('') +
    scale_fill_distiller('Percent', palette = 'Oranges', space = "Lab", direction = 1) +
    coord_polar(theta = "y", start = 0) +
    labs(title = "LEGIDA clock") +
    theme_minimal() +
    theme(axis.text = element_text(size = 18)) 

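[Figure: "LEGIDA clock", a pie chart in the form of a clock face, with each hour segment shaded by its percentage of reported times]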

Evidently, the rallies usually take place in the evening: the hours starting at 6, 7, 8 and 9 p.m. together account for about 84% of all reported times.


Last update: 2016-08-30, after the 35th LEGIDA rally.

About norbert

I am a postdoc at the Department of Medical Psychology and Sociology, Leipzig University (GER), with degrees in sociology (MA) and public health (MPH).