class: top, left, inverse, title-slide .title[ # R for Data Analysis: A Short Tutorial ] .subtitle[ ## Session 4: Visualizing data ] .author[ ### Dimiter Toshkov ] .institute[ ### Institute of Public Administration, Leiden University ] .date[ ### last updated: 2025-03-31 ] --- # Last session... <style type="text/css"> .title-slide { background-image: url(https://cran.r-project.org/Rlogo.svg); background-position: 50% 0%; ## just start changing this background-size: 150px; background-color: #fff; padding-left: 100px; /* delete this for 4:3 aspect ratio */ } .remark-slide-content { font-size: 28px; padding: 1em 1em 1em 1em; } .remark-slide-content > h1 { font-size: 32px; margin-top: -85px; } </style> -- <svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M505 174.8l-39.6-39.6c-9.4-9.4-24.6-9.4-33.9 0L192 374.7 80.6 263.2c-9.4-9.4-24.6-9.4-33.9 0L7 302.9c-9.4 9.4-9.4 24.6 0 34L175 505c9.4 9.4 24.6 9.4 33.9 0l296-296.2c9.4-9.5 9.4-24.7.1-34zm-324.3 106c6.2 6.3 16.4 6.3 22.6 0l208-208.2c6.2-6.3 6.2-16.4 0-22.6L366.1 4.7c-6.2-6.3-16.4-6.3-22.6 0L192 156.2l-55.4-55.5c-6.2-6.3-16.4-6.3-22.6 0L68.7 146c-6.2 6.3-6.2 16.4 0 22.6l112 112.2z"></path></svg> We learned how to explore univariate data. -- <svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M505 174.8l-39.6-39.6c-9.4-9.4-24.6-9.4-33.9 0L192 374.7 80.6 263.2c-9.4-9.4-24.6-9.4-33.9 0L7 302.9c-9.4 9.4-9.4 24.6 0 34L175 505c9.4 9.4 24.6 9.4 33.9 0l296-296.2c9.4-9.5 9.4-24.7.1-34zm-324.3 106c6.2 6.3 16.4 6.3 22.6 0l208-208.2c6.2-6.3 6.2-16.4 0-22.6L366.1 4.7c-6.2-6.3-16.4-6.3-22.6 0L192 156.2l-55.4-55.5c-6.2-6.3-16.4-6.3-22.6 0L68.7 146c-6.2 6.3-6.2 16.4 0 22.6l112 112.2z"></path></svg> We learned how to test and examine bivariate relationships. 
-- <svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M505 174.8l-39.6-39.6c-9.4-9.4-24.6-9.4-33.9 0L192 374.7 80.6 263.2c-9.4-9.4-24.6-9.4-33.9 0L7 302.9c-9.4 9.4-9.4 24.6 0 34L175 505c9.4 9.4 24.6 9.4 33.9 0l296-296.2c9.4-9.5 9.4-24.7.1-34zm-324.3 106c6.2 6.3 16.4 6.3 22.6 0l208-208.2c6.2-6.3 6.2-16.4 0-22.6L366.1 4.7c-6.2-6.3-16.4-6.3-22.6 0L192 156.2l-55.4-55.5c-6.2-6.3-16.4-6.3-22.6 0L68.7 146c-6.2 6.3-6.2 16.4 0 22.6l112 112.2z"></path></svg> We learned how to run a variety of statistical models and export the output. -- <svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M500.5 231.4l-192-160C287.9 54.3 256 68.6 256 96v320c0 27.4 31.9 41.8 52.5 24.6l192-160c15.3-12.8 15.3-36.4 0-49.2zm-256 0l-192-160C31.9 54.3 0 68.6 0 96v320c0 27.4 31.9 41.8 52.5 24.6l192-160c15.3-12.8 15.3-36.4 0-49.2z"></path></svg> Today we focus on data visualization. 
---
class: inverse, top
background-image: url("data:image/png;base64,#figs/static_winner.png")
background-size: contain

# Here is a dataviz done entirely in (base) R

---

# Data menu for today (1)

<svg viewBox="0 0 496 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M248 8C111 8 0 119 0 256s111 248 248 248 248-111 248-248S385 8 248 8zm200 248c0 22.5-3.9 44.2-10.8 64.4h-20.3c-4.3 0-8.4-1.7-11.4-4.8l-32-32.6c-4.5-4.6-4.5-12.1.1-16.7l12.5-12.5v-8.7c0-3-1.2-5.9-3.3-8l-9.4-9.4c-2.1-2.1-5-3.3-8-3.3h-16c-6.2 0-11.3-5.1-11.3-11.3 0-3 1.2-5.9 3.3-8l9.4-9.4c2.1-2.1 5-3.3 8-3.3h32c6.2 0 11.3-5.1 11.3-11.3v-9.4c0-6.2-5.1-11.3-11.3-11.3h-36.7c-8.8 0-16 7.2-16 16v4.5c0 6.9-4.4 13-10.9 15.2l-31.6 10.5c-3.3 1.1-5.5 4.1-5.5 7.6v2.2c0 4.4-3.6 8-8 8h-16c-4.4 0-8-3.6-8-8s-3.6-8-8-8H247c-3 0-5.8 1.7-7.2 4.4l-9.4 18.7c-2.7 5.4-8.2 8.8-14.3 8.8H194c-8.8 0-16-7.2-16-16V199c0-4.2 1.7-8.3 4.7-11.3l20.1-20.1c4.6-4.6 7.2-10.9 7.2-17.5 0-3.4 2.2-6.5 5.5-7.6l40-13.3c1.7-.6 3.2-1.5 4.4-2.7l26.8-26.8c2.1-2.1 3.3-5 3.3-8 0-6.2-5.1-11.3-11.3-11.3H258l-16 16v8c0 4.4-3.6 8-8 8h-16c-4.4 0-8-3.6-8-8v-20c0-2.5 1.2-4.9 3.2-6.4l28.9-21.7c1.9-.1 3.8-.3 5.7-.3C358.3 56 448 145.7 448 256zM130.1 149.1c0-3 1.2-5.9 3.3-8l25.4-25.4c2.1-2.1 5-3.3 8-3.3 6.2 0 11.3 5.1 11.3 11.3v16c0 3-1.2 5.9-3.3 8l-9.4 9.4c-2.1 2.1-5 3.3-8 3.3h-16c-6.2 0-11.3-5.1-11.3-11.3zm128 306.4v-7.1c0-8.8-7.2-16-16-16h-20.2c-10.8 0-26.7-5.3-35.4-11.8l-22.2-16.7c-11.5-8.6-18.2-22.1-18.2-36.4v-23.9c0-16 8.4-30.8 22.1-39l42.9-25.7c7.1-4.2 15.2-6.5 23.4-6.5h31.2c10.9 0 21.4 3.9 29.6 10.9l43.2 37.1h18.3c8.5 0 16.6 3.4 22.6 9.4l17.3 17.3c3.4 3.4 8.1 5.3 12.9 5.3H423c-32.4 58.9-93.8 99.5-164.9 103.1z"></path></svg> Today we are going to work with data from the World Bank.

First, let's get a list of the ISO-3 country codes of all European countries:

``` r
library(tidyverse)
library(countrycode)

countries.iso3 <- countrycode::codelist %>%
  filter (continent == 'Europe') %>%
  select (country.name.en) %>%
  mutate (country.code = countrycode(country.name.en, origin='country.name', destination='iso3c'))
```

---

# Data menu for today (2)

<svg viewBox="0 0 496 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M336.5 160C322 70.7 287.8 8 248 8s-74 62.7-88.5 152h177zM152 256c0 22.2 1.2 43.5 3.3 64h185.3c2.1-20.5 3.3-41.8 3.3-64s-1.2-43.5-3.3-64H155.3c-2.1 20.5-3.3 41.8-3.3 64zm324.7-96c-28.6-67.9-86.5-120.4-158-141.6 24.4 33.8 41.2 84.7 50 141.6h108zM177.2 18.4C105.8 39.6 47.8 92.1 19.3 160h108c8.7-56.9 25.5-107.8 49.9-141.6zM487.4 192H372.7c2.1 21 3.3 42.5 3.3 64s-1.2 43-3.3 64h114.6c5.5-20.5 8.6-41.8 8.6-64s-3.1-43.5-8.5-64zM120 256c0-21.5 1.2-43 3.3-64H8.6C3.2 212.5 0 233.8 0 256s3.2 43.5 8.6 64h114.6c-2-21-3.2-42.5-3.2-64zm39.5 96c14.5 89.3 48.7 152 88.5 152s74-62.7 88.5-152h-177zm159.3 141.6c71.4-21.2 129.4-73.7 158-141.6h-108c-8.8 56.9-25.6 107.8-50 141.6zM19.3 352c28.6 67.9 86.5 120.4 158 141.6-24.4-33.8-41.2-84.7-50-141.6h-108z"></path></svg> Now we are ready to extract the data:

``` r
library(wbstats)

d <- wb(indicator = c("SI.POV.GINI", "SP.POP.TOTL","NY.GDP.PCAP.PP.KD", "SL.UEM.TOTL.ZS", "CC.EST", "GE.EST"),
        country = countries.iso3$country.code,
        startdate = 2000, enddate = 2024, return_wide = TRUE) %>%
  rename (year = date,
          control.corruption = CC.EST,
          government.effectiveness = GE.EST,
          gdp.per.capita = NY.GDP.PCAP.PP.KD,
          gini = SI.POV.GINI,
          unemployment = SL.UEM.TOTL.ZS,
          population = SP.POP.TOTL) %>%
  mutate (index.dy = paste0(iso3c, ".", year),
          gdp.per.capita = gdp.per.capita/1000, # rescale in thousands
          population = population/1e6 # rescale in millions
  )

d.2023 <- d %>% filter (year == '2023') # subset with the 2023 data
```

---

# Data visualization systems in R

We
have two main systems for producing graphs in R:

- The `plot` commands in base R;
- `ggplot2`, a package that is part of the `tidyverse`.

Most of my colleagues consider the `ggplot2` system far superior. Personally, I am not so sure.

Plotting in base R provides more direct control over every aspect of the graphs. But there are advantages to working in a system integrated in the `tidyverse` and supported by an ever-expanding number of extensions.

In any case, you have to have some familiarity with both.

---

# Plotting in base R

In base R, the main plotting function is ... *surprise, surprise* ... `plot()`, with variations for specific types of plots, such as `boxplot()` or `hist()`.

You can use these for quick-and-dirty data exploration.

But for production-level graphs (online or in print), my approach is to start with an empty canvas and add every element that I need, one by one.

---

# Start with an empty plot

This code does ... not much more than open a sheet for us to use:

.pull-left[

``` r
plot (NULL, type='n', # start with an empty plot
      axes = FALSE, # no axes
      ann = FALSE, # no annotation
      xlim = c(0, 150), # but give limits to the x-
      ylim = c(-1.5, 2.5) # and y-axes
)
```
]

.pull-right[
<img src="data:image/png;base64,#R-Tutorial-2025-Session-4_files/figure-html/unnamed-chunk-4-1.png" width="504" />
]

---

# Now let's add data points

.pull-left[

``` r
# add to the previous block
library (scales)

# point size proportional to population, rescaled to a sensible range
pop.scaled <- rescale(d.2023$population, to=c(0.5, 2))

points (x = d.2023$gdp.per.capita,
        y = d.2023$government.effectiveness,
        col = 'coral1', pch = 16, cex = pop.scaled)
```
]

.pull-right[
<img src="data:image/png;base64,#R-Tutorial-2025-Session-4_files/figure-html/unnamed-chunk-6-1.png" width="504" />
]

---

# Now let's add our custom-made axes

.pull-left[

``` r
# add to the previous block
axis (side = 1, font = 1, tck = -0.01, line=0,
      col = 'darkgrey', col.axis = 'darkgrey',
      at = seq(0, 150, by = 25),
      labels = c(0, paste0(seq(25, 150, by = 25), ",000$")),
      cex.axis = 0.75)

axis (2, las=1, font = 1, tck = -0.01, line=0,
      col = 'darkgrey', col.axis = 'darkgrey',
      at = seq(-2.5, 2.5, by = 0.5),
      labels = format(round(seq(-2.5, 2.5, by = 0.5), 2), nsmall = 2),
      cex.axis = 0.75)
```
]

.pull-right[
<img src="data:image/png;base64,#R-Tutorial-2025-Session-4_files/figure-html/unnamed-chunk-8-1.png" width="504" />
]

---

# We can add annotation

.pull-left[

``` r
# add to the previous block
title (
  main = 'Government effectiveness as a function of wealth',
  xlab = 'GDP per capita (constant 2002 international dollars), 2023',
  ylab = 'Government effectiveness score, 2023',
  col.lab = 'darkgrey'
)
```
]

.pull-right[
<img src="data:image/png;base64,#R-Tutorial-2025-Session-4_files/figure-html/unnamed-chunk-10-1.png" width="504" />
]

---

# Time to add lines

.pull-left[

``` r
# add to the previous block
abline(v = seq( 25, 150, by = 25), col='grey80') # add the vertical grid
abline(h = seq(-2, 3, by = 1), col='grey80') # add the horizontal grid

abline (lm(government.effectiveness ~ gdp.per.capita, data=d.2023),
        col = 'coral3') # regression fit
```
]

.pull-right[
<img src="data:image/png;base64,#R-Tutorial-2025-Session-4_files/figure-html/unnamed-chunk-12-1.png" width="504" />
]

---

# We can add text as well

.pull-left[

``` r
# add to the previous block

# correlation in the plotted (2023) data
text (cex = 0.75, col = 'darkgrey', x = 125, y = -1.25,
      paste0("Correlation = ",
             round(cor (d.2023$government.effectiveness, d.2023$gdp.per.capita,
                        use='complete.obs'), 2)))

# label the richest countries
text (cex = 0.75, col = 'coral1',
      d.2023$iso2c[d.2023$gdp.per.capita>100],
      x = d.2023$gdp.per.capita[d.2023$gdp.per.capita>100],
      y = d.2023$government.effectiveness[d.2023$gdp.per.capita>100] - 0.1)

# label the countries with the lowest government effectiveness
text (cex = 0.75, col = 'coral1',
      d.2023$iso2c[d.2023$government.effectiveness < -0.55],
      x = d.2023$gdp.per.capita[d.2023$government.effectiveness < -0.55],
      y = d.2023$government.effectiveness[d.2023$government.effectiveness < -0.55] - 0.1)
```
]

.pull-right[
<img src="data:image/png;base64,#R-Tutorial-2025-Session-4_files/figure-html/unnamed-chunk-14-1.png" width="504" />
]

---

# We are not done yet!

There are additional adjustments we can make. Before we even start the (empty) plot, we call the `par()` function, which specifies some basic parameters of the plot, such as the size of the margins on the four sides of the plot, the background color, the font, etc. For all the options, see `?par`.

You can set global options with `par()` for size and color, but it is often better not to do that here, but in the settings for the respective elements of the plot (e.g. the axes).

This is an example of how to set some basics:

``` r
par(mar = c (2, 2, 4, 0), # margins on the four sides in lines (b,l,t,r)
    bg = rgb (249, 249, 249, maxColorValue = 255), # background color (off-white is nice sometimes)
    bty = 'n', # type of box around the plot (no, thanks)
    family = 'Montserrat' # custom font (you have to get it first)
)
```

---

# Beyond scatterplots

We can use the same approach - start with an empty plot and add elements - to produce any kind of plot. For example, we can add a set of rectangles with `rect()` to produce a barplot or a set of polygons with `polygon()` to produce an area plot. We already saw how we can combine lines and dots, which comes in handy for plotting data over time.

I have written [a detailed guide](https://dimiter.eu/Visualizations_files/ESS/Visualizing_ESS_data.htm) on making a barplot from scratch. The result is on the next slide.

An area plot taken from [here](https://www.dimiter.eu/Visualizations_files/csnl.html) also follows.

The example of a line plot is from [here](https://dimiter.eu/Visualizations_files/CEE.html).

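As a minimal sketch of this approach, the block below draws a small barplot from scratch with `rect()`. The values in `shares` are made up for illustration (they do not come from the World Bank or the ESS), and the colors and bar width are arbitrary choices.

``` r
# hypothetical data, for illustration only
shares <- c(A = 12, B = 30, C = 21, D = 8)
n.bars <- length(shares)
bar.width <- 0.8

# empty canvas with one unit of horizontal space per bar
plot (NULL, type = 'n', axes = FALSE, ann = FALSE,
      xlim = c(0.5, n.bars + 0.5),
      ylim = c(0, max(shares) * 1.1))

# left and right edges of the bars, centered on 1, 2, 3, ...
x.left  <- seq_len(n.bars) - bar.width / 2
x.right <- seq_len(n.bars) + bar.width / 2

# one rectangle per bar, from zero up to the value
rect (xleft = x.left, ybottom = 0, xright = x.right, ytop = shares,
      col = 'coral1', border = NA)

# category labels below the bars, value axis on the left
axis (side = 1, at = seq_len(n.bars), labels = names(shares),
      tick = FALSE, col.axis = 'darkgrey')
axis (side = 2, las = 1, tck = -0.01, col = 'darkgrey', col.axis = 'darkgrey')

# print the values just above the bars
text (x = seq_len(n.bars), y = shares, labels = shares,
      pos = 3, cex = 0.75, col = 'darkgrey')
```

Replacing `rect()` with `polygon()`, and the bar edges with the x- and y-coordinates of an outline, would give an area plot along the same lines.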
---
class: inverse, top
background-image: url("data:image/png;base64,#figs/ess_barplot.png")
background-size: contain

# A barplot done entirely in (base) R

---
class: inverse, top
background-image: url("data:image/png;base64,#figs/cs_f15.png")
background-size: contain

# An area density plot done entirely in (base) R

---
class: inverse, top
background-image: url("data:image/png;base64,#figs/gdppc_region.png")
background-size: contain

# A time series line plot done entirely in (base) R

---

# Taking your plots to the next level (1)

Some tips to further customize and improve your plots:

- Use `mtext()` to produce the titles, subtitles and axis labels. Text produced with `mtext()` can be placed anywhere in the margins of the plot, it can be positioned very precisely, and - most importantly - it can combine text with different colors in the same expression. This allows you to embed the legend of your plots directly in the titles. We do that by `phantom`-ing some parts of the text:

``` r
# add to previous block

# first pass: draw "Data: " in grey and reserve invisible space for the rest
mtext(text=expression("Data: " * phantom("World Bank [2023]")),
      side=1, line=-0.5, outer=TRUE, at = 1,
      col="darkgrey", cex=0.8, font=1, adj=1, padj=1)

# second pass: reserve invisible space for "Data: " and draw the rest in color
mtext(text=expression(phantom("Data: ") * "World Bank [2023]"),
      side=1, line=-0.5, outer=TRUE, at = 1,
      col="coral3", cex=0.8, font=1, adj=1, padj=1)
```

---

# Taking your plots to the next level (2)

- To use custom fonts, install and load the packages `systemfonts` and `extrafont`. The `sysfonts` package lets you check and download free fonts from Google. The `showtext` package allows you to use the extra fonts. The function to add fonts from Google is `font_add_google()`, and `font_families()` checks that the fonts you want are installed and available. You turn on the custom fonts every session with `showtext_auto()` and - optionally - you set the resolution with `showtext_opts(dpi = 96)` (if you want it at 96 dpi, which also happens to be the default).

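A hedged sketch of this font workflow follows. It assumes that the `sysfonts` and `showtext` packages are installed and that an internet connection is available for the download. 'Montserrat' serves only as an example; the fallback to the default sans font and the variable `font.family` are illustrative additions, not part of the packages.

``` r
font.family <- 'sans' # fallback if the custom font cannot be loaded

if (requireNamespace('sysfonts', quietly = TRUE) &&
    requireNamespace('showtext', quietly = TRUE)) {

  # try to download the font from Google (needs an internet connection)
  downloaded <- tryCatch({
    sysfonts::font_add_google('Montserrat', family = 'Montserrat')
    TRUE
  }, error = function(e) FALSE, warning = function(w) FALSE)

  # check that the font is really available before switching it on
  if (downloaded && 'Montserrat' %in% sysfonts::font_families()) {
    showtext::showtext_auto()          # turn on custom fonts for this session
    showtext::showtext_opts(dpi = 96)  # optional: set the resolution
    font.family <- 'Montserrat'
  }
}

# use the font (custom or fallback) in a plot
par (family = font.family)
plot (1:10, pch = 16, col = 'coral1', main = 'A title in the chosen font')
```

If you export to a high-resolution file, the `dpi` value passed to `showtext_opts()` would typically need to match the `res` of the device.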
---

# One plot is not enough (1)

Often, we want to combine several plots (or panels) into the same data visualization.

For simple layouts, we can specify the `mfrow` argument in `par()`. For example, to create four equally-sized panels in two columns, we can write `par(mfrow = c(2, 2))`.

For more complex layouts, we have to call `layout()` before running `par()` and the individual plots. The function `layout()` accepts a matrix of rows and columns with numbers in the cells that correspond to the plots.

---

# One plot is not enough (2)

For example, the layout below will fit a total of 5 plots, with the first big one taking four cells on the left side of the plot space, and plots 2-5 taking one cell each on the right side of the plot space.

The order in which you produce the plots sends them to their respective slot. So, if you want a plot to take the bottom-right corner, run it last (if you work with the layout below).

``` r
layout(matrix(c(1, 1, 2, 3,
                1, 1, 4, 5), nrow=2, byrow=TRUE) )
```

---

# Saving and exporting your plots

We managed to produce some pretty nice graphs already, but for now they are only available in the Plots Viewer. Don't use the export menu from Plots to save your work. We want to do that work programmatically.

You can save the plots in different formats. The routine is the same:

- first, we open the device to which to save (`png`, `pdf`, `tiff`, etc.);
- when we do that, we specify the size and resolution;
- then, we run the plot syntax (which would not print in our Plots viewer!);
- then, we close writing to the device with the function `dev.off()`;
- finally, we are ready to find the file with our plot, open it to shine in all its glory, and admire it.

---

# Getting publication-ready files (1)

It is quite a bother to get the right size, resolution and file type so that your plots look crisp, legible and ready for publication online or in print.

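Before turning to specific file formats, here is a minimal sketch that combines the layout from the previous slides with the saving routine. It writes to a temporary `pdf` file so that it can run anywhere; the placeholder panels, the device size and the names `out.file` and `n.panels` are made up for illustration.

``` r
out.file <- tempfile(fileext = '.pdf') # hypothetical output location

# open the device and specify the size (in inches for pdf)
pdf (out.file, width = 10, height = 6)

# one big panel on the left, four small ones on the right;
# layout() returns the number of panels it defined
n.panels <- layout (matrix(c(1, 1, 2, 3,
                             1, 1, 4, 5), nrow = 2, byrow = TRUE))

par (mar = c(2, 2, 2, 1)) # margins shared by all panels

# the panels are filled in the order in which the plots are run
for (i in 1:n.panels) {
  plot (NULL, type = 'n', xlim = c(0, 1), ylim = c(0, 1), ann = FALSE)
  text (x = 0.5, y = 0.5, labels = paste('Plot', i), col = 'coral3')
}

# close the device: only now is the file complete
dev.off()

file.exists(out.file) # check that the file was written
```

Nothing appears in the Plots viewer while the device is open, which is why the last line checks the file on disk instead.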
I have figured out parameters that work for me, but perhaps you can discover better or more efficient ways to produce publication-ready graphs.

Many journals require `tiff` files for graphics. You can export your plots in `tiff` format with:

``` r
tiff("./figures/filename.tiff", width = 8.5, height = 6, units = "in", res = 300)

### plot comes here

dev.off()
```

You might need to adjust the size of text and points and the width of lines to be legible and clearly visible. Note that file size can be substantial.

---

# Getting publication-ready files (2)

You can adjust sizes by specifying a scaling factor, e.g. `scaling.factor = 3`, before plotting and use it to change all features of the graph until you get the desired result.

For example, I find these settings for a `png` acceptable for screens:

``` r
scaling.factor = 2

png("./figures/F1_big.png", width = 1280 * scaling.factor, height = 905.5 * scaling.factor, res = 96)

### plot comes here

dev.off()
```

Note that the resolution is lower, but the size in pixels is bigger. Accordingly, text, points and line width need to be scaled up (multiplied by 2 or so) as well. File size remains small.

---
class: inverse, top
background-image: url("data:image/png;base64,#figs/F1_big.png")
background-size: contain

# Here is our plot exported as a 'png'

---

# Getting publication-ready files (3)

Finally, you can save a `pdf`, which looks great at any resolution. Just make sure that the text is large enough to be readable, but not so large that it overflows the page.

These are some standard dimensions:

``` r
pdf("./figures/F1_A4.pdf", width = 11.69, height = 8.27)

### plot comes here

dev.off()
```

---

# Summing up plotting with base R

To sum up the procedure for making graphs with base R:

1. start a printing device, e.g. `png()`;
2. set up the layout with `layout()`;
3. specify the main parameters with `par()` (before each plot, if they differ);
4. run the plots one by one, in the order in which they should fill their respective slots in the layout matrix;
5. close with `dev.off()` when you are done.

---

# Why leave base R for ggplot2?

Clearly, we can achieve pretty much anything we want in base R. You can see why I have little enthusiasm for alternative systems, such as `ggplot2`. However...

- The default settings of `ggplot2` are more sensible than the default settings of base R. So if you don't want to tinker with the details, working with `ggplot2` is faster.

- There are a few extensions for `ggplot2` that make specific tasks much easier, e.g. including non-parametric line fits (see below) or dealing with [overlapping axis labels](https://www.andrewheiss.com/blog/2022/06/23/long-labels-ggplot/).

- There are higher-order libraries that, for example, illustrate statistical model results, which are built on top of `ggplot2`, so we have to know how to customize those.

---

# A line plot in `ggplot2` (1)

The code below produces a line plot showing a flexible, non-parametric fit.

.pull-left[

``` r
library(ggplot2)

dt <- haven::read_sav("./data/ESS11.sav")

ggplot(dt, aes(x = lrscale, y = stfdem)) +
  geom_smooth(method='gam', color = 'darkblue') +
  labs(x = 'Left-Right Self-Placement',
       y = "Satisfaction with Democracy",
       title = 'Democracy and ideology',
       subtitle = '' ,
       caption = 'Data: ESS Wave 11 (2023)') +
  theme_minimal(base_size = 16) +
  scale_y_continuous() +
  scale_x_continuous(breaks = seq(0, 10, 1))
```
]

.pull-right[
<img src="data:image/png;base64,#R-Tutorial-2025-Session-4_files/figure-html/unnamed-chunk-22-1.png" width="504" />
]

---

# A line plot in `ggplot2` (2)

We can make some improvements.

.pull-left[

``` r
# add with + to the previous block
theme(plot.margin = unit(c(2,1,1.5,1.5), "lines"),
      # `linewidth` replaces `size` for lines in recent versions of ggplot2
      panel.grid.major = element_line(colour="lightgrey", linewidth=0.2),
      panel.border = element_blank(),
      panel.background = element_blank(),
      panel.grid.minor = element_blank(),
      axis.ticks = element_blank(),
      axis.text.x = element_text(vjust = 6),
      axis.text.y = element_text(hjust = 2),
      plot.caption = element_text(vjust= -2),
      plot.title = element_text(vjust=1, size = 20, face='bold'))
```
]

.pull-right[
<img src="data:image/png;base64,#R-Tutorial-2025-Session-4_files/figure-html/unnamed-chunk-24-1.png" width="504" />
]

---

# Small multiples

One plot type for which `ggplot2` is very convenient is **small multiples** (several small panels of the same plot, one per category). Here is an example:

``` r
plot1 <- ggplot(dt, aes(lrscale, stfdem)) +
  geom_smooth(span = 0.4) +
  labs(title = 'Democracy and Ideology per Country (2023)',
       x = 'Left-Right Self-Placement',
       y = 'Satisfaction with Democracy') +
  coord_cartesian(ylim=c(2,8)) +
  scale_x_continuous(breaks = seq(0,10,2)) +
  scale_y_continuous(breaks = c(0,3,5,7,10)) +
  theme_minimal(base_size = 16*3) +
  theme(panel.grid.major = element_line(colour="lightgrey", linewidth=0.2),
        panel.grid.minor = element_line(linewidth = 0.1),
        plot.title = element_text(vjust=1, size = 20*3, face='bold')) +
  facet_wrap(vars(cntry), nrow=4) # one panel per country

ggsave('./figures/facets.png', plot = plot1, width = 12, height = 9, bg='white')
```

---
class: inverse, top
background-image: url("data:image/png;base64,#figs/facets.png")
background-size: contain

# Here is our `ggplot2` graph exported as `png`

---

# Making boxplots with ggplot2

This will be one fancy-looking boxplot.

``` r
dt$sex <- ifelse (dt$gndr==1, 'male', 'female')

p1 <- ggplot(dt[1:300,], aes(x=sex, y=stfdem, fill=sex)) +
  geom_boxplot(alpha=0.6, width=0.55) +
  theme_bw() +
  theme(axis.text=element_text(size=12,face="bold"),
        axis.title=element_text(size=14,face="bold")) +
  geom_jitter(alpha=0.2, width = 0.15, height=0.15) +
  # `fun` replaces the deprecated `fun.y` argument
  stat_summary(fun=mean, geom="point", shape=20, size=5, color="red", fill="black") +
  theme(legend.position="none") +
  ylab("Satisfaction with democracy") + xlab("") +
  scale_fill_brewer(palette="Set3") +
  scale_y_continuous(breaks=seq(0,10,1), labels=seq(0,10,1)) +
  coord_flip()

ggsave("./figures/f1a.png", plot = p1, width = 7, height = 4)
```

---
class: inverse, center, middle
background-image: url("data:image/png;base64,#figures/f1a.png")
background-size: contain

# Fancy boxplot with ggplot2

---

# Illustrating statistical models

With the tools that we have learned, we can produce plots of marginal effects from scratch. These are just dot-and-whiskers plots, with the dots corresponding to the coefficients and the whiskers to the confidence intervals.

This is easy in the case of linear models, but it becomes more complex in the presence of interactions and for non-linear models. Thankfully, there are packages that produce marginal effects plots directly.

``` r
dt$age.cat <- factor(cut(dt$agea, breaks=c(-Inf, 25, 40, 60, Inf),
                         labels = c('18-25','26-40','41-60','60+')))

library(fixest)
m1a <- feols (stfdem ~ age.cat / etfruit, data=dt, cluster=~cntry)

# Prepare a dictionary for variable names
dict1 <- c('age.cat18-25:etfruit' = 'Fruit Consumption (18-25 olds)',
           'age.cat26-40:etfruit' = 'Fruit Consumption (26-40 olds)',
           'age.cat41-60:etfruit' = 'Fruit Consumption (41-60 olds)',
           'age.cat60+:etfruit' = 'Fruit Consumption (60+ olds)')
```

---

# Showing (off) effects (1)

Marginal effects with `modelsummary`. The plot thickens: eating fruit decreases satisfaction with democracy only for the middle-aged!

.pull-left[

``` r
library(modelsummary)

p1a <- modelplot(m1a, coef_map=dict1, size=3) +
  geom_vline(xintercept = 0, col = "red", linewidth=2) +
  labs(x = "Marginal effects of Fruit Consumption on\n Satisfaction with Democracy across age groups") +
  theme_bw(base_size = 16*2)

png ('./figures/marginal_effects_1.png', width=1280*2, height=905.5*2, res=96)
print(p1a) # explicit print() also works when the code is sourced or run in a loop
dev.off()
```
]

.pull-right[
<img src="data:image/png;base64,#R-Tutorial-2025-Session-4_files/figure-html/unnamed-chunk-29-1.png" width="504" />
]

---

# Showing (off) effects (2)

Another great package for illustrating results from statistical models is `sjPlot`. See [here](https://strengejacke.github.io/sjPlot/articles/plot_interactions.html) for details on how to use it.

I have also used `coefplot::multiplot` to illustrate marginal effects of different variables across different models.

The `marginaleffects` package (which has a dedicated online [book](https://marginaleffects.com/bonus/get_started.html)) makes it easy to obtain marginal effects, but also to calculate predictions and contrasts. It can also produce plots.

---

# Resources on dataviz (with R and ggplot2)

This is a great [guide](https://pkg.garrickadenbuie.com/gentle-ggplot2/) to using `ggplot2` for dataviz that starts from the basics and goes quite far.

Good books on dataviz with R:

[Kieran Healy, Data Visualization: A Practical Introduction](https://amzn.to/2VaL1Ys)

[Hadley Wickham, ggplot2](https://amzn.to/32gSlTy)

[R Graphics Cookbook](https://r-graphics.org/)

[Carson Sievert, Interactive Graphs](https://amzn.to/2HErpDU)

---

# More resources on dataviz (with R and ggplot2)

Free books and online resources on dataviz:

[Claus Wilke, Fundamentals of Data Visualization](https://clauswilke.com/dataviz/)

[Styling Graphs with ggplot2](https://simplystatistics.org/posts/2019-08-28-you-can-replicate-almost-any-plot-with-ggplot2/)

[The BBC Dataviz Style Guide](https://bbc.github.io/rcookbook/#how_to_create_bbc_style_graphics)

[Guide on Network Visualization](https://kateto.net/network-visualization)

---

# Beyond static dataviz: Shiny

Sometimes we have too much data to visualize, too many comparisons to show, or too many statistical results to put in a single table.

Occasionally, we also want to let readers find their own insights in the data we provide.

To overcome the constraints of static data visualizations and academic articles, we can build interactive dashboards.

We can use our `R` skills to do that, with the help of `Shiny`!

---

# Beyond static dataviz: Shiny

`Shiny` is a special package 'that makes it easy to build interactive web apps straight from R'.

Not only can you build a dashboard using `R`, but you can even host the app for free on a server provided by `RStudio` (there are restrictions on the amount of usage the app receives under the free option, but for small-scale projects this is not a problem; if it is, there are paid options without these limitations).

For example, [this is a dashboard](https://anonyms.shinyapps.io/EUattitudes/) of EU attitudes and political ideology. This is [another dashboard](https://dimiter.shinyapps.io/polarization/) of political polarization in South Holland. And [this one](https://anonyms.shinyapps.io/asylum/) presents asylum migration statistics (with maps and tables).

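To give a flavor of what a `Shiny` app looks like, here is a minimal sketch with simulated data. It is not the code behind the dashboards above; it assumes the `shiny` package is installed, and the input and output names (`n`, `scatter`) are arbitrary.

``` r
library(shiny)

# user interface: a slider and a placeholder for the plot
ui <- fluidPage(
  titlePanel('A minimal dashboard'),
  sliderInput('n', 'Number of points', min = 10, max = 200, value = 50),
  plotOutput('scatter')
)

# server logic: redraw the plot whenever the slider changes
server <- function(input, output) {
  output$scatter <- renderPlot({
    set.seed(1) # simulated data, for illustration only
    x <- rnorm(input$n)
    y <- x + rnorm(input$n)
    plot (x, y, pch = 16, col = 'coral1')
  })
}

# combine the two parts into an app object
app <- shinyApp(ui = ui, server = server)

# in an interactive session, start the app with:
# runApp(app)
```

The base R plotting code inside `renderPlot()` is the same kind of code we used before; `Shiny` only takes care of re-running it when an input changes.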
The code to generate the dashboards is surprisingly simple: check out examples at my repository on [GitHub](https://github.com/demetriodor/covid-19_mobility)!

---

# Beyond stats and dataviz: RMarkdown

Wouldn't it be nice if we could mix text, images and `R` code to produce articles, presentations and even books...

Wouldn't it be even better if the embedded `R` code produced all graphs, tables and other results on the fly, every time we generate the document...

That would bring the idea of open, reproducible science to a new level, no?

It would be great, indeed, and the good news is, we can actually do it! Enter `RMarkdown`!

---

# Beyond stats and dataviz: RMarkdown

To use `RMarkdown`, we install the `rmarkdown` package (we will also need `knitr`). Then from `RStudio` we start a special type of file (`.Rmd`) that allows us to mix text and code.

The syntax for the `R` code is the same as before, but the code chunks are embedded in the document, placed between special symbols that set them apart from the text.

Once we are ready, we `knit` the document into one of the many supported formats, including `html`, `pdf`, `MS Word`, etc.

To get started, follow the steps [here](https://rmarkdown.rstudio.com/lesson-1.html).

<svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M496 448H16c-8.84 0-16 7.16-16 16v32c0 8.84 7.16 16 16 16h480c8.84 0 16-7.16 16-16v-32c0-8.84-7.16-16-16-16zm-304-64l-64-32 64-32 32-64 32 64 64 32-64 32-16 32h208l-86.41-201.63a63.955 63.955 0 0 1-1.89-45.45L416 0 228.42 107.19a127.989 127.989 0 0 0-53.46 59.15L64 416h144l-16-32zm64-224l16-32 16 32 32 16-32 16-16 32-16-32-32-16 32-16z"></path></svg> **Protip:** For presentations with `RMarkdown`, use the package `xaringan`.

<svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M496 448H16c-8.84 0-16 7.16-16 16v32c0 8.84 7.16 16 16 16h480c8.84 0 16-7.16 16-16v-32c0-8.84-7.16-16-16-16zm-304-64l-64-32 64-32 32-64 32 64 64 32-64 32-16 32h208l-86.41-201.63a63.955 63.955 0 0 1-1.89-45.45L416 0 228.42 107.19a127.989 127.989 0 0 0-53.46 59.15L64 416h144l-16-32zm64-224l16-32 16 32 32 16-32 16-16 32-16-32-32-16 32-16z"></path></svg> **Protip:** [`Quarto`](https://quarto.org/) generalizes the idea behind `RMarkdown` and lets you integrate even different programming languages! --- # Don't worry if things don't always woRk .center[] --- # How to get in touch? <svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M464 64H48C21.49 64 0 85.49 0 112v288c0 26.51 21.49 48 48 48h416c26.51 0 48-21.49 48-48V112c0-26.51-21.49-48-48-48zm0 48v40.805c-22.422 18.259-58.168 46.651-134.587 106.49-16.841 13.247-50.201 45.072-73.413 44.701-23.208.375-56.579-31.459-73.413-44.701C106.18 199.465 70.425 171.067 48 152.805V112h416zM48 400V214.398c22.914 18.251 55.409 43.862 104.938 82.646 21.857 17.205 60.134 55.186 103.062 54.955 42.717.231 80.509-37.199 103.053-54.947 49.528-38.783 82.032-64.401 104.947-82.653V400H48z"></path></svg> demetriodor@gmail.com <svg viewBox="0 0 496 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M131.5 217.5L55.1 100.1c47.6-59.2 119-91.8 192-92.1 42.3-.3 85.5 10.5 124.8 33.2 43.4 25.2 76.4 61.4 97.4 103L264 133.4c-58.1-3.4-113.4 29.3-132.5 84.1zm32.9 38.5c0 46.2 37.4 83.6 83.6 83.6s83.6-37.4 83.6-83.6-37.4-83.6-83.6-83.6-83.6 37.3-83.6 83.6zm314.9-89.2L339.6 174c37.9 44.3 38.5 108.2 6.6 157.2L234.1 503.6c46.5 2.5 94.4-7.7 137.8-32.9 107.4-62 150.9-192 107.4-303.9zM133.7 303.6L40.4 120.1C14.9 159.1 0 205.9 0 256c0 124 90.8 226.7 209.5
244.9l63.7-124.8c-57.6 10.8-113.2-20.8-139.5-72.5z"></path></svg> [http://dimiter.eu](http://dimiter.eu) <svg viewBox="0 0 484 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <g groupmode="layer" id="layer6" label="icon"> <path id="Shape_1_" class="st1" d="M 324.19873,96 H 215.27696 C 130.71036,96 61.027479,165.68288 61.027479,250.24948 v 6.08879 c 0,30.44398 9.47146,58.18182 25.70825,83.21353 L 5.5518005,416 123.94503,378.7907 c 25.70825,19.61945 58.18182,30.44397 92.685,30.44397 h 107.5687 c 85.91965,0 154.24947,-69.68287 154.24947,-152.8964 v -6.08879 C 478.4482,165.68288 408.76534,96 324.19873,96 Z M 406,276 c 0,46.68076 -35.23395,75.66979 -81.23818,75.66979 H 213.13392 C 166.45316,351.66979 132,322.68076 132,276 v -40 c 0,-46.68077 34.45321,-81.20125 81.13397,-81.20125 h 111.6279 C 371.44264,154.79875 406,189.31924 406,236 Z" style="stroke-width:1" nodetypes="sssscccssssscsssssscc"></path> </g></svg> @dtoshkov.bsky.social <svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M459.37 151.716c.325 4.548.325 9.097.325 13.645 0 138.72-105.583 298.558-298.558 298.558-59.452 0-114.68-17.219-161.137-47.106 8.447.974 16.568 1.299 25.34 1.299 49.055 0 94.213-16.568 130.274-44.832-46.132-.975-84.792-31.188-98.112-72.772 6.498.974 12.995 1.624 19.818 1.624 9.421 0 18.843-1.3 27.614-3.573-48.081-9.747-84.143-51.98-84.143-102.985v-1.299c13.969 7.797 30.214 12.67 47.431 13.319-28.264-18.843-46.781-51.005-46.781-87.391 0-19.492 5.197-37.36 14.294-52.954 51.655 63.675 129.3 105.258 216.365 109.807-1.624-7.797-2.599-15.918-2.599-24.04 0-57.828 46.782-104.934 104.934-104.934 30.213 0 57.502 12.67 76.67 33.137 23.715-4.548 46.456-13.32 66.599-25.34-7.798 24.366-24.366 44.833-46.132 57.827 21.117-2.273 41.584-8.122 60.426-16.243-14.292 20.791-32.161 39.308-52.628 54.253z"></path></svg> @DToshkov <svg viewBox="0 0 496 512" 
style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M165.9 397.4c0 2-2.3 3.6-5.2 3.6-3.3.3-5.6-1.3-5.6-3.6 0-2 2.3-3.6 5.2-3.6 3-.3 5.6 1.3 5.6 3.6zm-31.1-4.5c-.7 2 1.3 4.3 4.3 4.9 2.6 1 5.6 0 6.2-2s-1.3-4.3-4.3-5.2c-2.6-.7-5.5.3-6.2 2.3zm44.2-1.7c-2.9.7-4.9 2.6-4.6 4.9.3 2 2.9 3.3 5.9 2.6 2.9-.7 4.9-2.6 4.6-4.6-.3-1.9-3-3.2-5.9-2.9zM244.8 8C106.1 8 0 113.3 0 252c0 110.9 69.8 205.8 169.5 239.2 12.8 2.3 17.3-5.6 17.3-12.1 0-6.2-.3-40.4-.3-61.4 0 0-70 15-84.7-29.8 0 0-11.4-29.1-27.8-36.6 0 0-22.9-15.7 1.6-15.4 0 0 24.9 2 38.6 25.8 21.9 38.6 58.6 27.5 72.9 20.9 2.3-16 8.8-27.1 16-33.7-55.9-6.2-112.3-14.3-112.3-110.5 0-27.5 7.6-41.3 23.6-58.9-2.6-6.5-11.1-33.3 2.6-67.9 20.9-6.5 69 27 69 27 20-5.6 41.5-8.5 62.8-8.5s42.8 2.9 62.8 8.5c0 0 48.1-33.6 69-27 13.7 34.7 5.2 61.4 2.6 67.9 16 17.7 25.8 31.5 25.8 58.9 0 96.5-58.9 104.2-114.8 110.5 9.2 7.9 17 22.9 17 46.4 0 33.7-.3 75.4-.3 83.6 0 6.5 4.6 14.4 17.3 12.1C428.2 457.8 496 362.9 496 252 496 113.3 383.5 8 244.8 8zM97.2 352.9c-1.3 1-1 3.3.7 5.2 1.6 1.6 3.9 2.3 5.2 1 1.3-1 1-3.3-.7-5.2-1.6-1.6-3.9-2.3-5.2-1zm-10.8-8.1c-.7 1.3.3 2.9 2.3 3.9 1.6 1 3.6.7 4.3-.7.7-1.3-.3-2.9-2.3-3.9-2-.6-3.6-.3-4.3.7zm32.4 35.6c-1.6 1.3-1 4.3 1.3 6.2 2.3 2.3 5.2 2.6 6.5 1 1.3-1.3.7-4.3-1.3-6.2-2.2-2.3-5.2-2.6-6.5-1zm-11.4-14.7c-1.6 1-1.6 3.6 0 5.9 1.6 2.3 4.3 3.3 5.6 2.3 1.6-1.3 1.6-3.9 0-6.2-1.4-2.3-4-3.3-5.6-2z"></path></svg> [github.com/demetriodor](https://github.com/demetriodor/) <svg viewBox="0 0 448 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M416 32H31.9C14.3 32 0 46.5 0 64.3v383.4C0 465.5 14.3 480 31.9 480H416c17.6 0 32-14.5 32-32.3V64.3c0-17.8-14.4-32.3-32-32.3zM135.4 416H69V202.2h66.5V416zm-33.2-243c-21.3 0-38.5-17.3-38.5-38.5S80.9 96 102.2 96c21.2 0 38.5 17.3 38.5 38.5 0 21.3-17.2 38.5-38.5 38.5zm282.1 243h-66.4V312c0-24.8-.5-56.7-34.5-56.7-34.6 0-39.9 27-39.9 
54.9V416h-66.4V202.2h63.7v29.2h.9c8.9-16.8 30.6-34.5 62.9-34.5 67.2 0 79.7 44.3 79.7 101.9V416z"></path></svg> Dimiter Toshkov