The European Social Survey (ESS) is a great source of data on public opinion and attitudes that spans 9 waves (from 2002 till 2018) and covers a large number of European countries. In this tutorial I will explain how you can create effective and appealing visualizations of ESS data with (base) R.
There are two ways to read ESS data in R. First, you can download the datafile from the ESS website yourself and then read it into R using the haven package. Alternatively, you can use the essurvey package that will import the data directly. In both cases you will need to register an email in order to access the data.
Let’s use the essurvey package to get the ESS Wave 7 (2014) dataset.
library(essurvey) # install from CRAN first
# set_email("myaccount@email.com") # set your registered email
dat <- import_rounds(7, format = 'spss')
Once we have read the data into R, we can search for variables using the handy look_for() function from the labelled package. When we find a variable we are interested in, we can quickly inspect it with the attributes() function.
The ESS provides weights: variables that help us reconstruct statistics (such as averages) for the country populations from which the survey samples have been drawn. We have to use these weights to compute valid statistics for the populations we are interested in. ESS offers two types of weights, design weights and post-stratification weights, which use different methods to reconstruct the population-level statistics from the sample data. We can use either of the two, but we have to use one.
library(labelled) # install from CRAN first
labelled::look_for(dat, 'poor')
## variable label
## 93 eimpcnt Allow many/few immigrants from poorer countries in Europe
## 94 impcntr Allow many/few immigrants from poorer countries outside Europe
## 201 alpfpe Allow professionals from [poor European country providing largest number of migrants]
## 202 alpfpne Allow professionals from [poor non-European country providing largest number of migrants]
## 203 allbpe Allow unskilled labourers from [poor European country providing largest number of migrants]
## 204 allbpne Allow unskilled labourers from [poor non-European country providing largest number of migrants]
attributes(dat$eimpcnt)
## $label
## [1] "Allow many/few immigrants from poorer countries in Europe"
##
## $format.spss
## [1] "F1.0"
##
## $display_width
## [1] 9
##
## $class
## [1] "haven_labelled"
##
## $labels
## Allow many to come and live here Allow some Allow a few Allow none Refusal Don't know No answer
## 1 2 3 4 7 8 9
table(dat$eimpcnt)
##
## 1 2 3 4
## 5164 14974 11550 5280
summary(dat$dweight) # design weights
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.002272 0.943731 1.000000 1.000000 1.000067 4.000000
summary(dat$pspwght) # post-stratification weights
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.002338 0.721642 0.929546 1.000000 1.188729 4.044253
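To see what the weights do in practice, here is a minimal sketch with made-up numbers (not ESS values). It compares an unweighted and a weighted mean using weighted.mean() from base R, and computes a weighted share by hand.

```r
# Hypothetical toy data (not ESS values): five respondents, answers coded 1-4
answers <- c(1, 2, 2, 4, 4)
w <- c(0.5, 1.0, 1.0, 2.0, 0.5)  # made-up survey weights

# Every respondent counts equally
unweighted_mean <- mean(answers)

# Respondents count in proportion to their weight
weighted_mean <- weighted.mean(answers, w = w)

# Weighted share of respondents who gave answer 4
share_4 <- sum(w[answers == 4]) / sum(w)

print(c(unweighted = unweighted_mean, weighted = weighted_mean, share_4 = share_4))
```

The same logic applies to the real data: respondents with a large weight stand in for more people in the population, so they pull the statistic further.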
In this tutorial we will focus on the variable eimpcnt, which collects responses to the survey question ‘Allow many/few immigrants from poorer countries in Europe’ (here is a screenshot from the actual questionnaire with the exact question formulation). The original answer categories can be checked with attributes()$labels. We can store this variable under a more memorable name and specify it as an ordered factor.
attributes(dat$eimpcnt)$labels
## Allow many to come and live here Allow some Allow a few Allow none Refusal Don't know No answer
## 1 2 3 4 7 8 9
dat$allow.f<-to_factor(dat$eimpcnt, drop_unused_labels=TRUE, ordered=TRUE)
table(dat$allow.f, dat$cntry) # distribution of responses per country (raw)
##
## AT BE CH CZ DE DK EE ES FI FR GB HU IE IL LT NL NO PL PT SE SI
## Allow many to come and live here 188 214 201 0 698 184 166 386 235 244 202 50 280 166 204 238 288 244 126 690 160
## Allow some 642 870 764 0 1418 655 732 586 655 899 910 229 881 577 751 877 773 749 590 857 559
## Allow a few 648 459 467 0 713 537 787 590 982 502 730 742 806 719 689 585 323 451 363 177 280
## Allow none 273 216 68 0 176 101 325 221 185 239 390 602 369 973 428 203 41 110 165 28 167
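For readers who want to see what the conversion amounts to, here is a rough base R sketch of the same idea on a handful of made-up codes. It is not how to_factor() is implemented; the codes and the shortened labels are assumptions for illustration only.

```r
# Hypothetical response codes; 8 stands in for "Don't know"
codes <- c(1, 3, 2, 4, 2, 8)

# Treat the non-substantive codes (7, 8, 9) as missing
codes[codes > 4] <- NA

# Build an ordered factor with (shortened, illustrative) labels
allow <- factor(codes,
                levels = 1:4,
                labels = c("Allow many", "Allow some", "Allow a few", "Allow none"),
                ordered = TRUE)

print(table(allow))

# Ordered factors support comparisons between categories
print(allow[1] < allow[2])
```

Because the factor is ordered, tables and plots keep the categories in their substantive order rather than sorting them alphabetically.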
We want to plot the distribution of the responses to this question by country. To do this, we first need to compute the weighted number of responses in each category, for each country. One way to do that is to use the wtd.table() function from the questionr package. First, we compute the weighted number of responses per country. Then, using cprop() we compute the relative proportions of each response category. Finally, we make a data frame from the (transposed) output of these two actions. We add two new variables to this data frame for convenience later, and we order the data frame by the values of one of the columns (the variable with the ‘Allow none’ responses).
## One way to get a weighted count per country
library(questionr)
temp.table<-questionr::wtd.table(dat$allow.f, dat$cntry, weights=dat$pspwght, digits = 0, na.show=FALSE)
## Now let's get the relative percentages
temp.cprop.table<-questionr::cprop(temp.table, digits=0, total=FALSE, n=FALSE, percent=TRUE)
temp.cprop.table
## AT BE CH DE DK EE ES FI FR GB HU IE IL LT NL NO PL PT SE SI
## Allow many to come and live here 13% 12% 14% 23% 12% 9% 22% 11% 12% 11% 3% 13% 7% 10% 12% 20% 16% 10% 39% 14%
## Allow some 38% 49% 51% 47% 43% 38% 33% 32% 47% 41% 14% 39% 24% 37% 46% 54% 49% 49% 49% 49%
## Allow a few 35% 27% 31% 24% 37% 38% 33% 48% 26% 32% 44% 34% 29% 34% 32% 23% 28% 28% 10% 23%
## Allow none 14% 13% 5% 5% 8% 15% 12% 9% 14% 16% 38% 13% 40% 19% 10% 3% 7% 13% 2% 14%
## Transpose so that countries are rows, and convert the result to a data frame (the transposed table is not one yet)
## Note that we need to use the special as.data.frame.matrix() function and not just data.frame()
pt<-as.data.frame.matrix(t(temp.cprop.table))
pt<-pt[rownames(pt)!='All',] # remove the row with the totals if it's included and not needed
## Create new columns for convenience later
pt$afewplus <- pt[ , 'Allow a few'] + pt[, 'Allow many to come and live here']
pt$someplus <- pt[, 'Allow a few'] + pt[ , 'Allow some'] + pt[, 'Allow many to come and live here']
## Order the dataset by the specified column
pt <- pt[order(pt$`Allow none`, decreasing = TRUE),]
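If you prefer to avoid the extra dependency, the same steps can be sketched in base R with xtabs() and prop.table(). The toy data frame below is entirely made up, and this is an alternative route to the same kind of table, not the code used for the figures in this tutorial.

```r
# Hypothetical toy data: answers, countries and weights are invented
toy <- data.frame(
  answer  = factor(c("many", "none", "some", "none", "many", "some"),
                   levels = c("many", "some", "none")),
  country = c("AA", "AA", "AA", "BB", "BB", "BB"),
  w       = c(1, 1, 2, 0.5, 1.5, 2)
)

# Weighted counts: the left-hand side of the formula is summed within each cell
counts <- xtabs(w ~ answer + country, data = toy)

# Column percentages: shares of each answer within each country
shares <- 100 * prop.table(counts, margin = 2)

# Countries as rows, answers as columns
shares_df <- as.data.frame.matrix(t(shares))

# Order by the share of 'none' answers, largest first
shares_df <- shares_df[order(shares_df$none, decreasing = TRUE), ]
print(shares_df)
```

Each row of the result sums to 100, which is a quick sanity check worth running on the real table as well.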
Now that we have the data ready, we are ready to plot! The variable we want to visualize is an ordered categorical one, so some kind of bar graph could be appropriate. However, the response categories imply not only an order (from ‘none’ to ‘a few’ to ‘some’ to ‘many’) but also a threshold between ‘none’ and all else. Moreover, those who answer ‘some’ would presumably also accept ‘a few’, and those who answer ‘many’ would accept ‘a few’ and ‘some’ immigrants. It would be nice if our visualization indicated both the ordered nature of the categories and the threshold, as well as the fact that some categories are subsumed in others.
We can use a variation of the stacked bar graph to visualize our variable. The positioning of the bars and their colors will help communicate the ordered nature of the variable and the threshold. Instead of aligning all bars to the bottom (or to the left), we move the zero line inside the graph, and we let the ‘Allow none’ responses go down and all others go up from the zero line. We give the ‘Allow none’ and all other responses contrasting colors. And we vary the brightness (but keep the same base color) of the three ‘positive’ responses, with ‘Allow many’ being the lightest and ‘Allow some’ being the darkest. We can use red and green colors for the bars to utilize people’s intuitive associations of red with prohibition and green with access. (Note that this color combination might be challenging for people with color blindness.)
This sort of diverging stacked bar chart is common for Likert-scale responses to survey data. There are specialized packages that will produce this type of graph directly (see the likert function from the HH package or the likert package or a hack with ggplot2). However, we are going to use base R plotting, because it allows us the most flexibility in tweaking all the features of the graph.
Before we get plotting, you should know that people have argued against the use of diverging stacked bar charts, but not everyone is convinced (see here). I personally find this type of graph both visually appealing and informative as it communicates categorical differences between groups of responses as well as subtler differences within groups.
The plot below is our first, quick-and-dirty attempt to make a diverging stacked bar chart. We start with an empty plot that only specifies the range of the x- and y-axes. Then we iterate using a loop over the rows of the data to plot a rectangle for the share of each category. Finally, we add indicators for the countries by reference to the names of the rows of the data.
plot(NULL, # start with an empty plot
xlim=c(1, dim(pt)[1]), # the x axis should extend from 1 to the number of rows in the data
ylim=c(-max(pt[, 'Allow none']), max(pt$someplus))) # the y axis should extend from the negative of the maximum of the 'Allow none' category to the maximum of the sum of the other three categories
for (i in 1:dim(pt)[1]){ # for each row in the data
rect(xleft = i - 0.25, xright = i + 0.25, ybottom = 0 - pt[i , 'Allow none'], ytop = 0, col='red') # plot a red rectangle going down from zero for the 'Allow none' category
rect(xleft = i - 0.25, xright = i + 0.25, ybottom = 0, ytop = 0 + pt$someplus[i], col='lightgreen') # plot green rectangles going up for the rest of the categories
rect(xleft = i - 0.25, xright = i + 0.25, ybottom = 0, ytop = 0 + pt[i , 'Allow a few'] + pt[i , 'Allow some'], col='green')
rect(xleft = i - 0.25, xright = i + 0.25, ybottom = 0, ytop = 0 + pt[i , 'Allow some'], col='darkgreen')
text (rownames(pt)[i], x = i, y = 0 + pt$someplus[i] + 4) # add the names of the countries above each set of bars
}
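The loop above is easy to read, but rect() also accepts vectors, so the same geometry can be drawn without a loop. Below is a small sketch on made-up shares for two fictional countries; the data frame and its column names are assumptions for illustration. Unlike the loop, which overplots full-height bars, this version draws each segment from the top of the previous one.

```r
# Hypothetical shares (in percent) for two fictional countries
shares <- data.frame(none = c(40, 10), some = c(20, 50),
                     few = c(30, 25), many = c(10, 15),
                     row.names = c("AA", "BB"))

x <- seq_len(nrow(shares))

# Cumulative tops of the three 'positive' segments above the zero line
top_some <- shares$some
top_few  <- top_some + shares$few
top_many <- top_few + shares$many

plot(NULL, xlim = c(0.5, nrow(shares) + 0.5),
     ylim = c(-max(shares$none), max(top_many)), xlab = "", ylab = "")

# rect(xleft, ybottom, xright, ytop): one call draws a segment for all countries
rect(x - 0.25, -shares$none, x + 0.25, 0,        col = "red")
rect(x - 0.25, 0,            x + 0.25, top_some, col = "darkgreen")
rect(x - 0.25, top_some,     x + 0.25, top_few,  col = "green")
rect(x - 0.25, top_few,      x + 0.25, top_many, col = "lightgreen")

# Country labels above the bars (xpd = NA so they are not clipped)
text(x, top_many + 3, labels = rownames(shares), xpd = NA)
```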
There is a lot that can be improved about this plot, but overall it shows the data in the way we intended.
To improve on this draft plot, we can (1) specify a better color palette, (2) add proper annotation, including an informative title, axis labels, country labels, gridlines and a legend, (3) customize the fonts, and (4) add a statement about the data source and the author.
In the draft plot we used the named reds and greens that are available in base R. But we can specify our own colors, either by relying on one of the many R packages for color palettes or by specifying the colors directly with their rgb values. To pick a good combination of colors we can use one of the many color picker tools available online, for example Color Code. Let’s pick the eye-pleasing Pantone Red matched with a Pantone Green. Then we can define lighter and darker versions of the Pantone Green.
While the default background of the plots in base R is pure white and the color for annotations is pure black, designers recommend against using pure whites and blacks. Therefore, we define a custom background color that is almost white with a slight tint of magnolia and a custom dark color that is a touch lighter than solid black.
## color settings
background.color = rgb(248, 244, 255, max=255) # color for the background: magnolia
dark.color = rgb(24, 24, 38, max=255) # dark color: almost black
red.1 = rgb(237, 41, 57, max=255) # default red (Pantone)
green.1 = rgb(0, 173, 67, max=255) # default green (Pantone)
green.dark = rgb(0, 66, 26, max=255) # dark green
green.light = rgb(15, 255, 108, max=255) # light green
blue.twitter = rgb (29, 161, 242, max=255) # twitter blue
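The lighter and darker greens above were picked by hand. As a rough alternative, one could derive such variants programmatically by mixing the base color with black or white. The mix() helper below is hypothetical (it is not part of base R or of this tutorial's code), and the colors it produces will differ from the hand-picked ones.

```r
# Hypothetical helper: blend a color with another one;
# amount = 0 returns the original color, amount = 1 returns the second color
mix <- function(col, with, amount) {
  a <- col2rgb(col)[, 1]
  b <- col2rgb(with)[, 1]
  m <- unname(round((1 - amount) * a + amount * b))
  rgb(m[1], m[2], m[3], maxColorValue = 255)
}

green_base  <- rgb(0, 173, 67, maxColorValue = 255)  # the Pantone green used above
green_dark  <- mix(green_base, "black", 0.6)         # darker variant
green_light <- mix(green_base, "white", 0.6)         # lighter variant

print(c(base = green_base, dark = green_dark, light = green_light))
```

Mixing in RGB space is crude, so it is worth checking the results by eye before committing to them.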
Proper annotation that is informative but not obtrusive is key for effective data visualization. So we have to think carefully about the axes, titles, gridlines, legends and other elements that help the user decode the plot.
Let’s start with the axes. The horizontal axis is not really necessary: we can add the country indicators directly below or above the bars. The vertical axis is definitely necessary. To specify its range, we can pick nice round numbers that are just below and just above the extreme values in our data. (Of course we can rely on the plotting defaults, but these would not always work well, so it is good to know how to specify our own values). The round_any() function from the plyr package rounds to a multiple of any number by specifying accuracy and f (function: floor, ceiling or round). We use it to get the nearest multiple of 10 that is greater than the maximum values observed in the respective variables in our data.
## axes range settings
library(plyr)
y.min <- plyr::round_any(max(pt[, 'Allow none']), accuracy = 10, f = ceiling)
y.max <- plyr::round_any(max(pt$someplus), accuracy = 10, f = ceiling)
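If you would rather not load plyr for a single function, the same rounding can be sketched in one line of base R. The round_to() name is made up for this example, and the input values are illustrative.

```r
# Hypothetical base R stand-in for rounding to a multiple of 'accuracy'
round_to <- function(x, accuracy, f = round) f(x / accuracy) * accuracy

up_38   <- round_to(38, 10, ceiling)  # next multiple of 10 above 38
up_97   <- round_to(97, 10, ceiling)  # next multiple of 10 above 97
down_97 <- round_to(97, 10, floor)    # previous multiple of 10 below 97

print(c(up_38, up_97, down_97))
```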
To add the custom vertical axis, we first turn off the automatic inclusion of axes and other annotation, both in the par() settings (ann=FALSE) and when we call plot() (with yaxt = "n" and xaxt = "n"). While we are at it, we should also turn off the automatic drawing of a box around the plot by setting the bty='n' argument of par(). Then we can describe the axis directly with the axis() function with our own labels, colors and other custom settings.
## specify vertical axis
axis (2, # this indicates which axis to draw: 1 is bottom and it goes clockwise from there, so 2 is left
line = 0, # position in terms of the plotting region
lwd = 1, # width of axis lines
tck = -0.01, # length of axis tick marks
col = dark.color, # color of the actual axis (line)
col.axis = dark.color, # colors of the actual labels
cex.axis = 1, # font size of the axis labels
font=2, # font type (bold)
at=seq(-y.min, y.max, 10), # where to put the labels
labels= paste0(c(rev(seq(0, y.min, 10)), seq(10, y.max,10)), "%"), # text of labels
las=1 # orientation of the labels
)
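The labels argument above is built in two pieces so that the values below zero are shown as positive percentages. A shorter way to get the same labels is to take the absolute value of the tick positions. The sketch below checks that the two constructions agree, using assumed toy limits of 40 and 100.

```r
# Assumed toy limits for illustration
y_min_toy <- 40
y_max_toy <- 100

# Tick positions, from -40 to 100 in steps of 10
at <- seq(-y_min_toy, y_max_toy, 10)

# Construction used in the axis() call: counts down to 0, then up again
labels_two_part <- paste0(c(rev(seq(0, y_min_toy, 10)), seq(10, y_max_toy, 10)), "%")

# Shorter alternative: absolute values of the tick positions
labels_abs <- paste0(abs(at), "%")

print(labels_abs)
print(identical(labels_two_part, labels_abs))
```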
To include custom gridlines, we just add horizontal and vertical lines where we want the gridlines to be. Since our background is not pure white anymore, we can actually define the gridlines themselves to be white, which makes them just visible but unobtrusive. To put emphasis on the line at zero, we can make it slightly wider than the rest. There are two tricks to adding the gridlines: first, we have to think about when to add them, so that they are not drawn on top of labels and other annotation but are drawn on top of the rectangles themselves; second, we have to set the xpd setting in par() to FALSE, so that the gridlines are clipped to the plot region and do not extend into the margins.
abline(h=seq(-50,100,10), col='white', lwd=1)
abline(h=0, col='white', lwd=3)
I prefer to add titles, subtitles, axis names and legends with mtext(). This gives more control over where exactly to position these annotation elements, including in the margins area that is outside the figure region (this is a good explanation of the different plotting and margins areas of figures in base R). Using mtext() in combination with expression() also makes it possible to have text with different colors and formatting on the same line.
# title
offset = 0.01 # distance from the corners of the plotting area
mtext.title = 2 # scaling factor for the font size
mtext(expression(bold('Allow many/few immigrants from poorer countries in Europe to come and live here')), # the text
side = 3, # on which side of the plot to include; 3 is top
line = 2, # position of the text in terms of the figure region; 2 is two lines above the top of the figure
adj = 0, # horizontal adjustment of the text; 0 is left
padj = 1, # vertical adjustment of the text; 1 is top
outer = TRUE, # whether to place the text in the outer margins (outside the figure region)
at = offset, # offset position from the corners of the plotting area
font=1, # font type
col=dark.color, # font color
cex = mtext.title # font size
)
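The trick for mixing colors on one line deserves a small illustration. The idea is to draw the same line of text several times, each time hiding a different part with phantom(), which reserves the space of its argument without drawing it. Below is a minimal sketch with toy text; the same pattern is used for the footer later in this tutorial.

```r
# Two versions of the same line: each hides the part the other one shows
part_label <- expression("Data: " * phantom("toy example"))
part_value <- expression(phantom("Data: ") * "toy example")

plot(NULL, xlim = c(0, 1), ylim = c(0, 1), xlab = "", ylab = "")

# Same position for both calls; only the color and the visible part differ
mtext(part_label, side = 3, line = 1, adj = 0, col = "black")
mtext(part_value, side = 3, line = 1, adj = 0, col = "red")
```

Because both calls use identical positioning arguments, the visible pieces line up as if they were a single string.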
We have to make sure to leave enough space in the margins of the figure for titles and subtitles by specifying the oma and mar settings of par() before we call the plot.
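As a minimal sketch of how these settings interact, the code below reserves three lines of outer margin at the top and then writes a title into that space with outer = TRUE. The title text is a placeholder; the margin values are the ones used in the full plot in this tutorial.

```r
# Reserve outer margins (bottom, left, top, right) and inner margins, in lines of text
old_par <- par(oma = c(1, 0, 3, 0), mar = c(1, 4, 1, 1))

plot(NULL, xlim = c(0, 1), ylim = c(0, 1), xlab = "", ylab = "")

# With outer = TRUE the text goes into the outer margin reserved above
mtext("Placeholder title in the outer margin",
      side = 3, line = 1, outer = TRUE, adj = 0, at = 0.01)

# Record the setting that was active while plotting, then restore the old one
current_oma <- par("oma")
par(old_par)
```

If the title gets cut off, the usual fix is to increase the third element of oma.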
To include a custom legend on top of the figure, we can also use mtext(). First we only plot the text of the legend labels. Then we add the different symbols (rectangles) with their respective colors. Admittedly, this requires quite a bit of tweaking to get the rectangles to fall exactly where needed. But it is still better than having no legend at all or putting the legend within the figure region where it often obscures the actual data.
mtext.subtitle = 1.5 # scaling factor for the size of the font
mtext(expression(italic("Share of people who answer: 'None' 'Some' 'A few' 'Many'")),
side = 3, line = 0, adj = 0, padj = 1, outer = TRUE, at = offset,
font=1, col=dark.color, cex = mtext.subtitle)
par(xpd = TRUE) # allow drawing outside the plot region (clipping is relaxed to the figure region)
points(x = 5.4, y = 118, pch = 15, cex = 10, col=red.1) # add small rectangles with the respective color
points(x = 5.4 + 3.7, y = 118, pch = 15, cex = 10, col=green.dark)
points(x = 5.4 + 2*3.7, y = 118, pch = 15, cex = 10, col=green.1)
points(x = 5.4 + 3*3.7, y = 118, pch = 15, cex = 10, col=green.light)
Let’s see what we have so far:
y.min <- plyr::round_any(max(pt[, 'Allow none']), accuracy = 10, f = ceiling)
y.max <- plyr::round_any(max(pt$someplus), accuracy = 10, f = ceiling)
offset = 0.01
mtext.title = 1.8
mtext.subtitle = 1.5
par(mfrow=c(1,1), # number and distribution of plots
oma=c(1,0,3,0), # size of the outer margins in lines of text (can be specified in inches as well with `omi`)
mar=c(1,4,1,1), # number of lines of margin to be specified on the four sides of the plot (can be specified in inches as well with `mai`)
bty='n', # no box
cex = 1.25, # magnification of text and symbols
xpd = FALSE, # clip all plotting to the plot region
ann = FALSE, # switch off titles,
#yaxt = 'n', # switch off y axis, do not do this here, it cannot be overridden with axis()
#xaxt = 'n', # switch off x axis
bg=background.color # background color
)
plot(NULL, xlim=c(1, dim(pt)[1]), ylim=c(-y.min, y.max), yaxt = 'n', xaxt = 'n') # the empty plot
axis (2,
line = 0, # position
tck = -0.01,
col = dark.color, # the actual axis (line)
col.axis = dark.color, # colors of the actual labels
cex.axis = 1,
font=2, # font type (bold)
at=seq(-y.min, y.max, 10), # where to put labels
labels= paste0(c(rev(seq(0, y.min, 10)), seq(10, y.max,10)), "%"), # text of labels
las=1 # orientation of the labels
)
for (i in 1:dim(pt)[1]){
rect(xleft = i - 0.25, xright = i + 0.25, ybottom = 0 - pt[i , 'Allow none'], ytop = 0, col=red.1, border=red.1)
rect(xleft = i - 0.25, xright = i + 0.25, ybottom = 0, ytop = 0 + pt$someplus[i], col=green.light, border=green.light)
rect(xleft = i - 0.25, xright = i + 0.25, ybottom = 0, ytop = 0 + pt[i , 'Allow a few'] + pt[i , 'Allow some'], col=green.1, border=green.1)
rect(xleft = i - 0.25, xright = i + 0.25, ybottom = 0, ytop = 0 + pt[i , 'Allow some'], col=green.dark, border=green.dark)
}
abline(h=seq(-50,100,10), col='white', lwd=1)
abline(h=0, col='white', lwd=3)
# do that now so it is not crossed by the gridlines
for (i in 1:dim(pt)[1]){
text (rownames(pt)[i], x = i, y = 0 + pt$someplus[i] + 4, font=2)
}
#title
mtext(expression(bold('Allow many/few immigrants from poorer countries in Europe to come and live here')),
side = 3, line = 2, adj = 0, padj = 1, outer = TRUE, at = offset,
font=1, col=dark.color, cex = mtext.title)
mtext(expression(italic("Share of people who answer: 'None' 'Some' 'A few' 'Many'")),
side = 3, line = 0, adj = 0, padj = 1, outer = TRUE, at = offset,
font=1, col=dark.color, cex = mtext.subtitle)
par(xpd = TRUE)
points(x = 5.4, y = 118, pch = 15, cex = 10, col=red.1)
points(x = 5.4 + 3.7, y = 118, pch = 15, cex = 10, col=green.dark)
points(x = 5.4 + 2*3.7, y = 118, pch = 15, cex = 10, col=green.1)
points(x = 5.4 + 3*3.7, y = 118, pch = 15, cex = 10, col=green.light)
One way to improve the styling of your plots is to ditch the default fonts in R and use a custom set of fonts. The way I do this relies on the extrafont, sysfonts and showtext packages. This allows us to select a set of fonts from Google Fonts, for example, download and register them with font_add_google() and then use them by calling showtext_auto() before the code for the plot and specifying the name of the font we want in par() and/or directly in mtext().
library(extrafont) # to embed extra fonts
library(sysfonts) # to check available fonts and download fonts from google
library(showtext) # to use the extra fonts
font_add_google('Quattrocento') #get the fonts
font_add_google('Quattrocento Sans')
font_families() #check that the fonts are installed and available
showtext_auto() #this is to turn on the custom fonts availability
showtext_opts(dpi = 96) #set the resolution: 96 is default
Taking styling one step further, we can also incorporate emojis and icons in our annotations with the emojifont package. For example, we can include the twitter logo in the footer of our plot with the following code:
library(emojifont)
mtext.sign.emo = 1.5
mtext(text= fontawesome('fa-twitter'), # the icon or emoji we want
side=1, line=-1, outer=T, col=blue.twitter, cex=mtext.sign.emo, at = 1 - 0.23, adj=1, padj=0.8,
font=1, family='fontawesome-webfont') # the Font Awesome font
We can also add the ‘Creative Commons’ sign to indicate the conditions of use of our visualization. Unfortunately, the showtext package does not work with RStudio and RMarkdown, so the custom fonts and emojis will not be visible when you plot directly from RStudio or when you compile a .Rmd document. But the custom fonts and emojis will be visible when you save your plots directly as png or pdf files (see below).
While our plot already features the country codes above the bars, not everyone will be very familiar with these. So we can help the reader by including further information about the countries. Adding the full country names takes too much space, but we can include country flags as additional reference symbols. To do so, we first download the free library of country flags from FlatIcon. The licence provides for free use if we give the following attribution: “Icon made by Freepik from www.flaticon.com”. Once the flags are downloaded, we can read them as png files and place them in the plots.
library(png)
temp.flag <- png::readPNG('./flags/197373-countrys-flags/png/albania.png')
dim(temp.flag)
## [1] 512 512 4
We can place a png image on a plot by drawing it as a raster image, e.g. with rasterImage(temp.flag, 1,1,10,71.5). However, it is much more convenient to use the custom addImg() function written by ‘Marc in the box’, which only requires that we point to the x- and y-coordinates of the center of the image rather than to its four corners.
## Function to place images by center points on the two axes rather than corners by Stack Overflow user 'Marc in the box', retrieved from: https://stackoverflow.com/questions/27800307/adding-a-picture-to-plot-in-r
addImg <- function(
obj, # an image file imported as an array (e.g. png::readPNG, jpeg::readJPEG)
x = NULL, # mid x coordinate for image
y = NULL, # mid y coordinate for image
width = NULL, # width of image (in x coordinate units)
interpolate = TRUE # (passed to graphics::rasterImage) A logical vector (or scalar) indicating whether to apply linear interpolation to the image when drawing.
){
if(is.null(x) | is.null(y) | is.null(width)){stop("Must provide args 'x', 'y', and 'width'")}
USR <- par()$usr # A vector of the form c(x1, x2, y1, y2) giving the extremes of the user coordinates of the plotting region
PIN <- par()$pin # The current plot dimensions, (width, height), in inches
DIM <- dim(obj) # number of x-y pixels for the image
ARp <- DIM[1]/DIM[2] # pixel aspect ratio (y/x)
WIDi <- width/(USR[2]-USR[1])*PIN[1] # convert width units to inches
HEIi <- WIDi * ARp # height in inches
HEIu <- HEIi/PIN[2]*(USR[4]-USR[3]) # height in units
rasterImage(image = obj,
xleft = x-(width/2), xright = x+(width/2),
ybottom = y-(HEIu/2), ytop = y+(HEIu/2),
interpolate = interpolate)
}
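The unit conversions inside addImg() are easier to follow with numbers. The sketch below repeats the function's arithmetic by hand for an assumed plot: all the values (user coordinates, plot size in inches, image size) are made up for illustration.

```r
# Hypothetical plot: x runs from 0 to 10, y from 0 to 100 in user coordinates
usr <- c(0, 10, 0, 100)
# Hypothetical plot size in inches (width, height)
pin <- c(5, 4)
# A square image: 512 rows (height) by 512 columns (width)
img_dim <- c(512, 512)
# Requested image width, in x-axis units
width_units <- 1

aspect       <- img_dim[1] / img_dim[2]                        # height / width of the image
width_in     <- width_units / (usr[2] - usr[1]) * pin[1]       # width converted to inches
height_in    <- width_in * aspect                              # height in inches
height_units <- height_in / pin[2] * (usr[4] - usr[3])         # height converted to y-axis units

print(c(width_in = width_in, height_in = height_in, height_units = height_units))
```

In this toy setup one x-unit is half an inch wide, so a square image must span 12.5 y-units to stay square on the page. That is why the function needs only a width: the height follows from the aspect ratio and the current plot dimensions.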
How do we call each individual png file with the country flag that we need? In the downloaded library, the names of the files correspond to the full names of the countries. In our ESS data, the names of the countries are two-character ISO codes. To switch between full names and ISO codes, the countrycode package comes to the rescue. So in the code below we reconstruct the paths and names of the flag files from the ISO codes contained in the ESS datafile:
addImg( # add the image
readPNG( # read the image as png
paste0('./flags/197373-countrys-flags/png/', # point to the address where the image of the flag is: first, address of the folder
gsub(" ","-", # replace spaces with hyphens
tolower( # make everything lower case
countrycode(rownames(pt)[1], # convert the country indicator
origin = 'iso2c', # from 2-character ISO code
destination = 'country.name'))), # to full name
'.png')), # complete the file name
x = 1, y = 0 - pt[1 , 'Allow none'] - 7, # point to where the image should be placed
width = 0.6 # scale the size of the image
)
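Since the nested call is hard to read, here is the string manipulation on its own. To keep the sketch runnable without extra packages, a small hand-made lookup vector stands in for countrycode(), and the folder name is a placeholder; both are assumptions for illustration.

```r
# Hypothetical lookup table standing in for countrycode()
iso_to_name <- c(GB = "United Kingdom", NL = "Netherlands", CH = "Switzerland")

# Hypothetical helper: build the path to a flag file from a 2-character ISO code
flag_path <- function(iso2, folder = "./flags/png/") {
  country_name <- iso_to_name[[iso2]]               # full country name
  file_stem <- gsub(" ", "-", tolower(country_name)) # lower case, spaces to hyphens
  paste0(folder, file_stem, ".png")
}

gb_path <- flag_path("GB")
nl_path <- flag_path("NL")
print(c(gb_path, nl_path))

# It is worth checking that the file exists before trying to read it
print(file.exists(gb_path))
```

A check with file.exists() is useful in practice, because the country names returned by a lookup do not always match the file names in the downloaded flag library.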
Let’s check how this worked:
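library(countrycode)
mtext.sign = 1.2 # scaling factor for the footer text (used in the data statement and signature below)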
par(mfrow=c(1,1), # number and distribution of plots
oma=c(1,0,3,0), # size of the outer margins in lines of text
mar=c(1,4,1,1), # number of lines of margin to be specified on the four sides of the plot
bty='n', # no box
cex = 1.25, # magnification of text and symbols
xpd = FALSE, # clip all plotting to the plot region
ann = FALSE, # switch off titles,
bg=background.color, # background color
family='Quattrocento' # font family
)
plot(NULL, xlim=c(1, dim(pt)[1]), ylim=c(-y.min, y.max), yaxt = 'n', xaxt = 'n')
axis (2,
line = 0, # position
tck = -0.01,
col = dark.color, # the actual axis (line)
col.axis = dark.color, # colors of the actual labels
cex.axis = 1,
font=2, # font type (bold)
at=seq(-y.min, y.max, 10), # where to put labels
labels= paste0(c(rev(seq(0, y.min, 10)), seq(10, y.max,10)), "%"), # text of labels
las=1 # orientation of the labels
)
for (i in 1:dim(pt)[1]){
rect(xleft = i - 0.25, xright = i + 0.25, ybottom = 0 - pt[i , 'Allow none'], ytop = 0, col=red.1, border=red.1)
rect(xleft = i - 0.25, xright = i + 0.25, ybottom = 0, ytop = 0 + pt$someplus[i], col=green.light, border=green.light)
rect(xleft = i - 0.25, xright = i + 0.25, ybottom = 0, ytop = 0 + pt[i , 'Allow a few'] + pt[i , 'Allow some'], col=green.1, border=green.1)
rect(xleft = i - 0.25, xright = i + 0.25, ybottom = 0, ytop = 0 + pt[i , 'Allow some'], col=green.dark, border=green.dark)
}
abline(h=seq(-50,100,10), col='white', lwd=1)
abline(h=0, col='white', lwd=3)
# do that now so it is not crossed by the gridlines
for (i in 1:dim(pt)[1]){
text (rownames(pt)[i], x = i, y = 0 + pt$someplus[i] + 4, font=2)
addImg(readPNG(paste0('./flags/197373-countrys-flags/png/',
gsub(" ","-", tolower(countrycode(rownames(pt)[i], origin = 'iso2c', destination = 'country.name'))),
'.png')),
x = i, y = 0 - pt[i , 'Allow none'] - 7, width = 0.6)
}
#title
mtext(expression(bold('Allow many/few immigrants from poorer countries in Europe to come and live here')),
side = 3, line = 2, adj = 0, padj = 1, outer = TRUE, at = offset, font=1, col=dark.color, cex = mtext.title)
#legend
mtext(expression(italic("Share of people who answer: 'None' 'Some' 'A few' 'Many'")),
side = 3, line = 0, adj = 0, padj = 1, outer = TRUE, at = offset, font=1, col=dark.color, cex = mtext.subtitle)
par(xpd = TRUE)
points(x = 5.4, y = 118, pch = 15, cex = 10, col=red.1)
points(x = 5.4 + 3.7, y = 118, pch = 15, cex = 10, col=green.dark)
points(x = 5.4 + 2*3.7, y = 118, pch = 15, cex = 10, col=green.1)
points(x = 5.4 + 3*3.7, y = 118, pch = 15, cex = 10, col=green.light)
#data statement
mtext(text = fontawesome('fa-table'), side=1, line=-1, outer=T, adj=0, padj=0.8,
col=red.1, cex=mtext.sign.emo, at = offset, font=1, family='fontawesome-webfont')
mtext(text=expression("Data: " * phantom("European Social Survey, Wave 7 [2014]")),
side=1, line=-1, outer=T, at = offset + 0.03,
col=dark.color, cex=mtext.sign, font=1, family='Quattrocento Sans', adj=0, padj=1)
mtext(text=expression(phantom("Data: ") * "European Social Survey, Wave 7 [2014]"),
side=1, line=-1, outer=T, at = offset + 0.03, col=red.1, cex=mtext.sign, font=1, family='Quattrocento Sans', adj=0, padj=1)
#signature
mtext(text=expression(phantom("@DToshkov ") * " http://dimiter" * phantom(".eu")),
side=1, line=-1, outer=T, at = 1 - offset - 0.02, col=dark.color, cex=mtext.sign, font=1, family='Quattrocento Sans', adj=1, padj=1)
mtext(text=expression(phantom("@DToshkov http://dimiter") * ".eu"),
side=1, line=-1, outer=T, at = 1 - offset - 0.02, col=red.1, cex=mtext.sign,font=1, family='Quattrocento Sans', adj=1, padj=1)
mtext(text=expression("@DToshkov " * phantom(" http://dimiter.eu")),
side=1, line=-1, outer=T, at = 1 - offset - 0.02, col=blue.twitter, cex=mtext.sign,font=1, family='Quattrocento Sans', adj=1, padj=1)
mtext(text= fontawesome('fa-twitter'),
side=1, line=-1, outer=T, adj=1, padj=0.8, col=blue.twitter, cex=mtext.sign.emo, at = 1 - 0.23, font=1, family='fontawesome-webfont')
mtext(text= fontawesome('fa-creative-commons'),
side=1, line=-1, outer=T, col=dark.color, cex=mtext.sign.emo, at = 1 - 0.33, font=1, family='fontawesome-webfont', adj=1, padj=0.8)
mtext(text= fontawesome('fa-rss'),
side=1, line=-1, outer=T, adj=1, padj=0.8, col=red.1, cex=mtext.sign.emo, at = 1 - offset, font=1, family='fontawesome-webfont')
Pretty, pretty good, as Larry David would put it.
Now that we have an informative, properly annotated, visually appealing data visualization, we would want to save it. We can save in multiple formats, but usually png or pdf (or both) are sufficient. In principle, saving a plot as a png file is as easy as calling png('filename.png') before we run the code for the plot and dev.off() after. But we have to make sure that the plot looks sharp.
When we save as a png, we can specify the width and height of the output file, as well as the desired resolution. If we prepare the file for a desktop computer or a laptop, a width of 1280 pixels is a reasonable default. The height can be set to 906 pixels (to approximate the dimensions of a standard A4 page) and the resolution to the standard for web images, 96 ppi. (The plot above was produced with these specifications).
png ('./figures/F1_small.png', width=1280, height=906, res=96)
### code for plot comes here
dev.off()
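It helps to keep the relationship between pixels, resolution and nominal size in mind. The short sketch below works through the arithmetic for the dimensions above; the scaling factor of 3 is just an example value.

```r
width_px  <- 1280
height_px <- 906
res       <- 96   # pixels per inch

# Nominal size of the image in inches at this resolution
width_in  <- width_px / res
height_in <- height_px / res

# The aspect ratio is close to that of an A-series page (square root of 2)
aspect <- width_px / height_px

# Multiplying the pixel dimensions by a factor increases the number of pixels in the file
s <- 3
scaled_px <- c(width_px, height_px) * s

print(c(width_in = width_in, height_in = height_in, aspect = aspect))
print(scaled_px)
```

The number of pixels in the file is set by width and height alone; the res argument only tells R how many of those pixels make up an inch when it sizes text and lines.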
However, the resulting plot does not look that sharp even on my relatively small laptop. Changing the resolution (say to 300) while keeping the width and height the same would achieve nothing (try it!). We can improve the sharpness of the image by enlarging the dimensions of the png while keeping the same resolution. But we will have to increase the size of most elements of the plot as well, so they will still look big enough. As you can see from the code below, we increase the width and height of the plot, as well as the font sizes and line thickness, three times. The result looks much sharper (here is a link to the full-size image; look at it at 100% magnification), which comes at the price of a larger file size (which, however, at 285 KB should not be a point of concern). Note that some small tweaks might need to be made to legends and other annotations to ensure that everything still falls in its right place.
s = 3 # scaling factor
mtext.title = 2*s
mtext.subtitle = 1.5*s
mtext.sign = 1.2*s
mtext.sign.emo = 1.5*s
showtext_opts(dpi = 96) #set the resolution: 96 is default
png ('./figures/F1_big.png', width=1280*s, height=905.5*s, res=96)
par(mfrow=c(1,1), # number and distribution of plots
oma=c(1,0,3,0), # size of the outer margins in lines of text
mar=c(1,4,1,1), # number of lines of margin to be specified on the four sides of the plot
bty='n', # no box
cex = 1.25*s, # magnification of text and symbols
xpd = FALSE, # clip all plotting to the plot region
ann = FALSE, # switch off titles,
bg=background.color, # background color
family='Quattrocento' # font family
)
plot(NULL, xlim=c(1, dim(pt)[1]), ylim=c(-y.min, y.max), yaxt = 'n', xaxt = 'n')
axis (2,
line = 0, # position
tck = -0.01,
lwd = 1*s,
col = dark.color, # the actual axis (line)
col.axis = dark.color, # colors of the actual labels
cex.axis = 1,
font=2, # font type (bold)
at=seq(-y.min, y.max, 10), # where to put labels
labels= paste0(c(rev(seq(0, y.min, 10)), seq(10, y.max,10)), "%"), # text of labels
las=1 # orientation of the labels
)
for (i in 1:dim(pt)[1]){
rect(xleft = i - 0.25, xright = i + 0.25, ybottom = 0 - pt[i , 'Allow none'], ytop = 0, col=red.1, border=red.1)
rect(xleft = i - 0.25, xright = i + 0.25, ybottom = 0, ytop = 0 + pt$someplus[i], col=green.light, border=green.light)
rect(xleft = i - 0.25, xright = i + 0.25, ybottom = 0, ytop = 0 + pt[i , 'Allow a few'] + pt[i , 'Allow some'], col=green.1, border=green.1)
rect(xleft = i - 0.25, xright = i + 0.25, ybottom = 0, ytop = 0 + pt[i , 'Allow some'], col=green.dark, border=green.dark)
}
abline(h=seq(-50,100,10), col='white', lwd=1*s)
abline(h=0, col='white', lwd=3*s)
for (i in 1:dim(pt)[1]){
text (rownames(pt)[i], x = i, y = 0 + pt$someplus[i] + 4, font=2)
addImg(readPNG(paste0('./flags/197373-countrys-flags/png/', gsub(" ","-", tolower(countrycode(rownames(pt)[i], origin = 'iso2c', destination = 'country.name'))), '.png')),
x = i, y = 0 - pt[i , 'Allow none'] - 7, width = 0.6)
}
#title
mtext(expression(bold('Allow many/few immigrants from poorer countries in Europe to come and live here')),
side = 3, line = 2, adj = 0, padj = 1, outer = TRUE, at = offset, font=1, col=dark.color, cex = mtext.title)
#legend
mtext(expression(italic("Share of people who answer: 'None' 'Some' 'A few' 'Many'")),
side = 3, line = 0, adj = 0, padj = 1, outer = TRUE, at = offset, font=1, col=dark.color, cex = mtext.subtitle)
par(xpd = TRUE)
points(x = 5.4, y = 118, pch = 15, cex = 10, col=red.1)
points(x = 5.4 + 3.7, y = 118, pch = 15, cex = 10, col=green.dark)
points(x = 5.4 + 2*3.7, y = 118, pch = 15, cex = 10, col=green.1)
points(x = 5.4 + 3*3.7, y = 118, pch = 15, cex = 10, col=green.light)
#data statement
mtext(text = fontawesome('fa-table'),
side=1, line=-1, outer=T, col=red.1, cex=mtext.sign.emo, at = offset, font=1, family='fontawesome-webfont', adj=0, padj=0.8)
mtext(text=expression("Data: " * phantom("European Social Survey, Wave 7 [2014]")),
side=1, line=-1, outer=T, at = offset + 0.03, col=dark.color, cex=mtext.sign, font=1, family='Quattrocento Sans', adj=0, padj=1)
mtext(text=expression(phantom("Data: ") * "European Social Survey, Wave 7 [2014]"),
side=1, line=-1, outer=T, at = offset + 0.03, col=red.1, cex=mtext.sign, font=1, family='Quattrocento Sans', adj=0, padj=1)
#signature
mtext(text=expression(phantom("@DToshkov ") * " http://dimiter" * phantom(".eu")),
side=1, line=-1, outer=T, at = 1 - offset - 0.02, col=dark.color, cex=mtext.sign, font=1, family='Quattrocento Sans', adj=1, padj=1)
mtext(text=expression(phantom("@DToshkov http://dimiter") * ".eu"),
side=1, line=-1, outer=T, at = 1 - offset - 0.02, col=red.1, cex=mtext.sign, font=1, family='Quattrocento Sans', adj=1, padj=1)
mtext(text=expression("@DToshkov " * phantom(" http://dimiter.eu")),
side=1, line=-1, outer=T, at = 1 - offset - 0.02, col=blue.twitter, cex=mtext.sign, font=1, family='Quattrocento Sans', adj=1, padj=1)
mtext(text= fontawesome('fa-creative-commons'),
side=1, line=-1, outer=T, col=dark.color, cex=mtext.sign.emo, at = 1 - 0.26, font=1, family='fontawesome-webfont', adj=1, padj=0.8)
mtext(text= fontawesome('fa-twitter'),
side=1, line=-1, outer=T, col=blue.twitter, cex=mtext.sign.emo, at = 1 - 0.15,font=1, family='fontawesome-webfont', adj=1, padj=0.8)
mtext(text= fontawesome('fa-rss'),
side=1, line=-1, outer=T, col=red.1, cex=mtext.sign.emo, at = 1 - offset, font=1, family='fontawesome-webfont', adj=1, padj=0.8)
dev.off()
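The scaling logic used above can be collected in one place. The helper below is a hypothetical convenience function (scaled_sizes() is not part of the tutorial code or of any package); it is only a sketch of how the pixel dimensions, the text magnification and the line widths all follow from the same factor s.

```r
# Hypothetical helper: everything that has to grow together with the scaling factor s
scaled_sizes <- function(s, width = 1280, height = 906) {
  list(
    width  = round(width * s),   # pixel width of the output file
    height = round(height * s),  # pixel height of the output file
    cex    = 1.25 * s,           # magnification of text and symbols, as passed to par()
    lwd    = 1 * s               # line width for axes and grid lines
  )
}

sz <- scaled_sizes(3)

# Changing only `res` leaves the pixel count at 1280; what changes is the
# nominal physical width in inches that those pixels are assumed to cover
implied_width_in <- 1280 / c(96, 300)

# Usage (not run here):
# png('./figures/F1_big.png', width = sz$width, height = sz$height, res = 96)
```

Keeping the sizes in one list makes it easier to try a different factor (say s = 2) without hunting for every hard-coded number in the plotting code.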
If we want to save a pdf, we have to call the pdf() function instead of png() (and specify a .pdf extension for the output file). When we save a pdf, we specify the size of the output in inches; for example, to make an A4-sized pdf in landscape orientation, set the width to 11.69 and the height to 8.27. There is no need to specify a resolution (and there is no need to scale up the font sizes and line widths). Here is a link to the pdf version of the graph.
pdf ('./figures/F1_A4.pdf', width=11.69, height=8.27)
### plot comes here
dev.off()
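The inch values for pdf() are just the A4 dimensions converted from millimetres. A minimal sketch of the conversion, assuming the standard 297 x 210 mm page (the variable names are illustrative):

```r
mm_per_inch <- 25.4

# A4 in landscape orientation: long side is the width
a4_landscape_mm <- c(width = 297, height = 210)
a4_landscape_in <- round(a4_landscape_mm / mm_per_inch, 2)

a4_landscape_in   # the width and height to pass to pdf()

# For a portrait page, simply swap the two values
a4_portrait_in <- rev(unname(a4_landscape_in))
```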
Now we have nice, sharp versions of our data visualization in pdf and png formats. We set up the plot so that the bars are oriented vertically, while the plot itself is in landscape (rather than portrait) orientation. The landscape orientation makes better use of the space (for example, there is more room for the text of titles, subtitles and legends), and it is the default orientation for laptops and desktop computers. Also, we typically plot the values of the response variable on the y-axis and the values of the ‘explanatory’ variable on the x-axis, so if we want to highlight differences in survey responses between countries, the way we have set up the plot so far should work well. However, divergent bar charts are often presented with horizontal rather than vertical bars, and portrait orientation might be better for mobile devices (but this dataviz is too busy for a smartphone anyway). If we want to change the orientation of our graph from landscape to portrait and flip the bars to be horizontal rather than vertical, there is not too much that we have to do. Namely, we have to swap the width and height dimensions in the png() or pdf() functions, switch the definitions of the x- and y-axes, slightly change the way the rectangles showing the data are defined, and make some minor tweaks to the annotation so it fits the new dimensions. One advantage of the portrait orientation is that we now have enough space to print out the full country names along the vertical axis. The code and the result are shown below. You can decide for yourself which version of the graph works better, but I personally prefer the one with the vertical bars.
s = 3
offset = 0.01
mtext.title = 1.8*s
mtext.subtitle = 1.5*s
mtext.sign = 1.2*s
mtext.sign.emo = 1.5*s
# png ('./figures/F1_big_vertical_2.png', height=1280*s, width=905.5*s, res=96) # to save it
par(mfrow=c(1,1), # number and distribution of plots
oma=c(1,0,4,0), # size of the outer margins in lines of text (can be specified in inches as well with `omi`)
mar=c(4,1,1,8), # number of lines of margin to be specified on the four sides of the plot (can be specified in inches as well with `mai`)
bty='n', # no box
cex = 1.25*s, # magnification of text and symbols
xpd = FALSE, # clipping of plotting to the figure region
ann = FALSE, # switch off titles,
bg=background.color, # background color
family='Quattrocento' # font family
)
plot(NULL, ylim=c(1, dim(pt)[1]), xlim=c(-x.min, x.max), yaxt = 'n', xaxt = 'n')
axis (1,
line = 0, # position
tck = -0.01,
lwd = 1*s,
col = dark.color, # the actual axis (line)
col.axis = dark.color, # colors of the actual labels
cex.axis = 1,
font=2, # font type (bold)
at=seq(-x.min, x.max, 10), # where to put labels
labels= paste0(c(rev(seq(0, x.min, 10)), seq(10, x.max,10)), "%"), # text of labels
las=1 # orientation of the label
)
axis (4,
line = 0, # position
tck = -0.01,
lwd = 1*s,
col = 'white', # the actual axis (line)
col.axis = dark.color, # colors of the actual labels
cex.axis = 1,
font=2, # font type (bold)
at=seq(1, dim(pt)[1]), # where to put labels
adj=0,
labels=countrycode(rownames(pt), origin = 'iso2c', destination = 'country.name'), # text of labels
las=1 # orientation of the labels
)
for (i in 1:dim(pt)[1]){
rect(ybottom = i - 0.25, ytop = i + 0.25, xleft = 0 - pt[i , 'Allow none'], xright = 0, col=red.1, border=red.1)
rect(ybottom = i - 0.25, ytop = i + 0.25, xleft = 0, xright = 0 + pt$someplus[i], col=green.light, border=green.light)
rect(ybottom = i - 0.25, ytop = i + 0.25, xleft = 0, xright = 0 + pt[i , 'Allow a few'] + pt[i , 'Allow some'], col=green.1, border=green.1)
rect(ybottom = i - 0.25, ytop = i + 0.25, xleft = 0, xright = 0 + pt[i , 'Allow some'], col=green.dark, border=green.dark)
}
abline(v=seq(-50,100,10), col='white', lwd=1*s)
abline(v=0, col='white', lwd=3*s)
abline(h=seq_len(dim(pt)[1]), col='white', lwd=1*s)
for (i in 1:dim(pt)[1]){
addImg(readPNG(paste0('./flags/197373-countrys-flags/png/', gsub(" ","-", tolower(countrycode(rownames(pt)[i], origin = 'iso2c', destination = 'country.name'))), '.png')),
y = i, x = 0 - pt[i , 'Allow none'] - 7, width = 6)
}
#title
mtext(expression(bold('Allow many/few immigrants from poorer countries in Europe \nto come and live here')),
side = 3, line = 2, adj = 0, padj = 1, outer = TRUE, at = offset, font=1, col=dark.color, cex = mtext.title)
#legend
mtext(expression(italic("Share of people who answer: 'None' 'Some' 'A few' 'Many'")),
side = 3, line = 0, adj = 0, padj = 1, outer = TRUE, at = offset, font=1, col=dark.color, cex = mtext.subtitle)
par(xpd = TRUE)
points(x = 24, y = 21.3, pch = 15, cex = 4, col=red.1)
points(x = 24 + 32, y = 21.3, pch = 15, cex = 4, col=green.dark)
points(x = 24 + 32*2, y = 21.3, pch = 15, cex = 4, col=green.1)
points(x = 24 + 32*3, y = 21.3, pch = 15, cex = 4, col=green.light)
#data statement
mtext(text = fontawesome('fa-table'),
side=1, line=-1, outer=T,
col=red.1, cex=mtext.sign.emo, at = offset,
font=1, family='fontawesome-webfont',
adj=0, padj=0.8)
mtext(text=expression("Data: " * phantom("European Social Survey, Wave 7 [2014]")),
side=1, line=-1, outer=T, at = offset + 0.03,
col=dark.color, cex=mtext.sign,
font=1, family='Quattrocento Sans',
adj=0, padj=1)
mtext(text=expression(phantom("Data: ") * "European Social Survey, Wave 7 [2014]"),
side=1, line=-1, outer=T, at = offset + 0.03,
col=red.1, cex=mtext.sign,
font=1, family='Quattrocento Sans',
adj=0, padj=1)
#signature
mtext(text=expression(phantom("@DToshkov ") * " http://dimiter" * phantom(".eu")),
side=1, line=-1, outer=T, at = 1 - offset - 0.03,
col=dark.color, cex=mtext.sign,
font=1, family='Quattrocento Sans',
adj=1, padj=1)
mtext(text=expression(phantom("@DToshkov http://dimiter") * ".eu"),
side=1, line=-1, outer=T, at = 1 - offset - 0.03,
col=red.1, cex=mtext.sign,
font=1, family='Quattrocento Sans',
adj=1, padj=1)
mtext(text=expression("@DToshkov " * phantom(" http://dimiter.eu")),
side=1, line=-1, outer=T, at = 1 - offset - 0.03,
col=blue.twitter, cex=mtext.sign,
font=1, family='Quattrocento Sans',
adj=1, padj=1)
mtext(text= fontawesome('fa-creative-commons'),
side=1, line=-1, outer=T,
col=dark.color, cex=mtext.sign.emo, at = 1 - 0.36,
font=1, family='fontawesome-webfont',
adj=1, padj=0.8)
mtext(text= fontawesome('fa-twitter'),
side=1, line=-1, outer=T,
col=blue.twitter, cex=mtext.sign.emo, at = 1 - 0.21,
font=1, family='fontawesome-webfont',
adj=1, padj=0.8)
mtext(text= fontawesome('fa-rss'),
side=1, line=-1, outer=T,
col=red.1, cex=mtext.sign.emo, at = 1 - offset,
font=1, family='fontawesome-webfont',
adj=1, padj=0.8)
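Stripped of fonts, flags and annotations, the flip from vertical to horizontal bars comes down to swapping which pair of rect() arguments carries the country position and which carries the percentages. The sketch below illustrates this with made-up numbers (the toy data frame is not ESS data, and bar_coords() and draw_bars() are hypothetical helpers, not functions used elsewhere in this tutorial).

```r
# Toy data, invented for illustration: share answering 'none' vs. everything else
toy <- data.frame(none = c(10, 25, 40), plus = c(90, 75, 60),
                  row.names = c('AA', 'BB', 'CC'))
n <- nrow(toy)

# Arguments for rect() for bar number i running from lo to hi, in either orientation
bar_coords <- function(i, lo, hi, horizontal = FALSE, half = 0.25) {
  if (horizontal) {
    list(xleft = lo, xright = hi, ybottom = i - half, ytop = i + half)
  } else {
    list(xleft = i - half, xright = i + half, ybottom = lo, ytop = hi)
  }
}

draw_bars <- function(horizontal = FALSE) {
  pos_range <- c(0.5, n + 0.5)   # axis that holds the countries
  val_range <- c(-50, 100)       # axis that holds the percentages
  if (horizontal) {
    plot(NULL, xlim = val_range, ylim = pos_range, xlab = '', ylab = '', yaxt = 'n')
    axis(4, at = seq_len(n), labels = rownames(toy), las = 1, tick = FALSE)
  } else {
    plot(NULL, xlim = pos_range, ylim = val_range, xlab = '', ylab = '', xaxt = 'n')
    axis(1, at = seq_len(n), labels = rownames(toy), tick = FALSE)
  }
  for (i in seq_len(n)) {
    # negative side: 'none'; positive side: all other answers
    do.call(rect, c(bar_coords(i, -toy$none[i], 0, horizontal),
                    col = 'firebrick', border = NA))
    do.call(rect, c(bar_coords(i, 0, toy$plus[i], horizontal),
                    col = 'darkgreen', border = NA))
  }
}

# Usage (not run here):
# draw_bars(horizontal = FALSE)  # vertical bars, as in the landscape version
# draw_bars(horizontal = TRUE)   # horizontal bars, as in the portrait version
```

In the full code above the same swap is written out by hand, which keeps every rect() call explicit at the cost of some repetition.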
Following this tutorial, you learned how to read ESS data in R, create tables of the distribution of weighted responses per country, make divergent stacked bar charts, customize fonts, colors, axes, legends, titles and other annotations, include country flags, emojis and special icons in the plots, and save crisp png and pdf versions of your work. You can adapt the code to set your own dataviz style and signature. You have the power to tweak essentially all elements of the graphs directly, without resorting to Photoshop or some other graphics editor. And you can do all that in base R rather than ggplot2. You are ready to deploy your dataviz skills to explore the hundreds of public opinions and attitudes that the ESS has measured. Have fun and plot responsibly!
Respect to all the people behind the awesome R packages and functions used in this tutorial, all the researchers responsible for the design, collection and distribution of the ESS surveys, as well as the designers of the Font Awesome and Flat Icon emojis, icons and country flag images. This tutorial has been produced in Rmarkdown from RStudio with knitr. The code for the tutorial is available on Github.
To get in touch and follow my work: