My first R package: streetsofle


A couple of days ago I published my first R package on GitHub. I’ve named the package streetsofle standing for streets of Leipzig because it includes a data set containing the street directory of the German city of Leipzig. The street directory was published by the Statistical Bureau of Leipzig and may be downloaded as PDF file from the following website.

Leipzig is divided into 10 greater adminsitrative districts (“Stadtbezirke”) and 63 smaller local districts (“Ortsteile”). The names of both smaller and greater districts can be found under the following link.

Figure 1: Map of Leipzig

plot of chunk map

Furthermore, the Leipzig area is covered by 34 postal codes (“Postleitzahlen”). That is:

##  [1] "04103" "04105" "04107" "04109" "04129" "04155" "04157" "04158"
##  [9] "04159" "04177" "04178" "04179" "04205" "04207" "04209" "04229"
## [17] "04249" "04275" "04277" "04279" "04288" "04289" "04299" "04315"
## [25] "04316" "04317" "04318" "04319" "04328" "04329" "04347" "04349"
## [33] "04356" "04357"


if (!require("devtools")) install.packages("devtools")

The Dataset

The data frame streetsofle contains 3391 observations and 8 variables. That is:

  • plz: postal code,
  • street_key: street identification number,
  • street_name: street name,
  • street_num: a list of street numbers,
  • bz_key: identification number for greater districts (‘Stadtbezirke’) ,
  • bz_name: names of greater districts (‘Bezirke’),
  • ot_key: identification number for smaller districts (‘Ortsteile’) ,
  • ot_name: names of smaller districts (‘Ortsteile’).

Since street sections without addresses are usually not covered by postal codes, the variable plz contains some missing values (n=169).

Next steps

Writing functions

Within the next couple of weeks I will write some functions to analyse the street directory data. The next code snipped, for example, shows how to calculate the number of smaller districts (‘Ortsteile’) traversed by the streets.

f <- function(x){length(unique(x))}
df <- data.frame(ot_num = tapply(streetsofle$ot_name, streetsofle$street_name, f))
##                      ot_num
## Aachener Straße           1
## Abrahamstraße             1
## Abtnaundorfer Straße      1
## Achatstraße               1
## ...                     ...
## Zwergmispelstraße         1
## Zwetschgenweg             1
## Zwickauer Straße          3
## Zwiebelweg                1

The following table shows the number of smaller districts traversed by the streets of Leipzig.

no. of districts no. of streets
1 2721
2 207
3 49
4 11
5 4
6 2
7 2
10 1

While 91% of the streets don’t traverse any district border, one street traverses the borders of 10 districts.

Shiny Web-App

Based on these functions I’m planning to write a Shiny Web-App providing several search functions.

How to plot animated maps with gganimate


Leipzig is the largest city in the federal state of Saxony, Germany. Since about 2009, Leipzig is one of the fastest growing German cities. Between 2006 and 2016, the population has grown from about 506.000 to 580.000 inhabitants.The largest part of the population growth can be attributed to inward migration.

In this blog post, I show how to create an animated map of Leipzig visualizing the migration balances for each of Leipzig's 63 districts (“Ortsteile”).

The data

The data I will visualize in this blog post are provided by the local Statistical Bureau. They can be downloaded as .csv file from the following website. The shapefile I use, is not publicly available and can be purchased for about 25 Euro from the Statistical Bureau.

R packages

The following code will install load and / or install the R packages required for this blog post.

if (!require("pacman")) install.packages("pacman")
pacman::p_load(readr, rgdal, ggplot2, reshape2)

While the packages readr and rgdal are required for importing the .csv and the shapefile, ggplot2 and gganimate are needed to plot and animate the data. The reshape2 package is required for transforming data frames from wide to long format.

After importing the csv file containing the migration data, we add a key variable (id) to identify each of Leipzig's 63 districts. Using the melt-function of the reshape2 package, we reformat the data frame from wide into long form.

# import csv file
df.migrat <- read_csv2("./Data_LE/Bevölkerungsbewegung_Wanderungen_Wanderungssaldo.csv",
                      col_types = cols(`2015` = col_number()),
                      n_max = 63,
                      locale = locale(encoding = "latin1"),
                      trim_ws = TRUE)
# add id variable
df.migrat$id <- c(1:63)
# drop variables not needed
df.migrat <- df.migrat[,c(1,18,2:17)]
# change name of first variable
colnames(df.migrat)[1] <- 'district'
# convert from wide to long
df.migrat <- reshape2::melt(df.migrat, id.vars=c("district","id"))
# change name of third variable
colnames(df.migrat)[3] <- 'year'

With the next code snippet, we import the shapefile using the readOGR-function of the rgdal package. The object we receive (sh.file) is of class SpatialPolygonsDataFrame. After adding a key variable, we transform this object into an ordinary data frame using the fortify-function of the ggplot2 package. Furthermore, we join both data frames by the key variable id. Before we plot the data, we create a new variable (co) categorizing the migration balance variable (value).

# import shapefile
sh.file <- readOGR("ot.shp", layer="ot")
# add id variable
sh.file$id <- c(1:63)
# Create a data frame
sh.file <- fortify(sh.file, region = 'id')
# Merge data frames by id variable
mydata <- merge(sh.file, df.migrat, by='id')
# create a categorized variable
mydata$co <- ifelse(mydata$value >= 1000, 4,
                   ifelse(mydata$value >= 0, 3,
                                ifelse(mydata$value >= -1000, 2, 1)))
# define levels and labels
mydata$co  <- factor(mydata$co, levels = c(4:1),
                     labels = c('> 1000', '0-1000', '-1000-0', '< -1000'))

Moreover, we define a theme suitable for plotting maps with ggpot2. The following theme is a customized version of a theme I found on Stack Overflow.

theme_opts <- list(theme(panel.grid.minor = element_blank(),
                         panel.grid.major = element_blank(),
                         panel.background = element_blank(),
                         plot.background = element_blank(),
                         panel.border = element_blank(),
                         axis.line = element_blank(),
                         axis.text.x = element_blank(),
                         axis.text.y = element_blank(),
                         axis.ticks = element_blank(),
                         axis.title.x = element_blank(),
                         axis.title.y = element_blank(),
                         plot.title = element_text(size=16)))

Plotting the data

Finally, we can plot the data. The first plot (static) visualizes the migration data for the whole period of time (2000–2015).

ggplot(mydata, aes(x = long, y = lat, group = group, fill=value)) +
  theme_opts + 
  coord_equal() +
  geom_polygon(color = 'grey') +
  labs(title = "Migration balance between 2000 and 2015",
       subtitle = "Static Map",
       caption = 'Data source: Stadt Leipzig, Amt für Statistik und Wahlen, 2015') +
  scale_fill_distiller(name='Migration\nbalance', direction = 1, palette='RdYlGn')

plot of chunk map

For creating an animated plot, we add the frame argument stemming from the gganimate package. With interval = 5 we define the time (in seconds) between the “slides”.

p <- ggplot(mydata, aes(x = long, y = lat, group = group, fill=value, frame=year)) +
  theme_opts + 
  coord_equal() +
  geom_polygon(color = 'grey') +
  labs(title = "Migration balance in year: ",
       subtitle = "Animated Map",
       caption = 'Data source: Stadt Leipzig, Amt für Statistik und Wahlen, 2015') +
  scale_fill_distiller(name='Migration\nbalance', direction = 1, palette='RdYlGn')

gg_animate(p, interval = 5)