Understand the logic used by ggplot2 to structure graphs
Get familiar with its basic functions and layer types
An R package specifically designed to produce graphics
Unlike other packages, ggplot2 has its own grammar
The grammar is based on “The Grammar of Graphics” (Wilkinson 2005)
Independent modules that can be combined in many forms
This grammar provides high flexibility
The main idea is to start with a base layer of raw data and then add more layers of annotations and statistical summaries. The package lets us build graphics with the same line of reasoning we follow when designing an analysis, which shortens the distance between the graph we picture in our head and the final product.
Learning the grammar is crucial not only for producing a particular graph of interest, but also for thinking about other, more complex graphs. The advantage of this grammar is that new graphs can be created from new combinations of elements.
All ggplot2 graphs contain the following components:
These components are put together using “+”.
The most common syntax includes the data within the “ggplot” call and a “geom_” layer.
First install/load the package:
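A minimal sketch of this step, assuming the package is installed from CRAN (the installation is skipped when ggplot2 is already available):

```r
# install ggplot2 only if it is missing (assumes a CRAN mirror is reachable)
if (!requireNamespace("ggplot2", quietly = TRUE)) {
    install.packages("ggplot2")
}

# load the package
library(ggplot2)
```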
Let’s use the “iris” data set to create scatter plots:
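One way to write such a plot, shown as a sketch that maps sepal length to the x axis and petal length to the y axis (the object name `p_scatter` is only illustrative):

```r
library(ggplot2)

# data = iris, aesthetics = x and y variables, layer = points
p_scatter <- ggplot(data = iris, mapping = aes(x = Sepal.Length, y = Petal.Length)) +
    geom_point()

# print the plot
p_scatter
```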
This plot is defined by 3 components:
1. “data” - iris
2. “aes” - Sepal.Length vs Petal.Length
3. “layer” - points (geom)
We can also add other aesthetic attributes like color, shape and size. These attributes can be included within aes():
# color by species
ggplot(data = iris, mapping = aes(x = Sepal.Length, y = Petal.Length,
color = Species)) + geom_point()
# color and shape by species
ggplot(data = iris, mapping = aes(x = Sepal.Length, y = Petal.Length,
color = Species, shape = Species)) + geom_point()
Note that the aesthetic arguments can also be included in the “geom” layer:
ggplot(data = iris, mapping = aes(x = Sepal.Length, y = Petal.Length)) +
geom_point(aes(color = Species, shape = Species))
We can also include a fixed value:
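For instance, a sketch in which all points share one color and one size. Fixed values go outside `aes()`, as plain arguments of the geom. The specific values `"red2"` and `3` are arbitrary choices made for this illustration:

```r
library(ggplot2)

# color and size are constants here, not mapped to any variable
p_fixed <- ggplot(data = iris, mapping = aes(x = Sepal.Length, y = Petal.Length)) +
    geom_point(color = "red2", size = 3)

p_fixed
```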
Some attributes work better with some data types:
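As a hedged illustration of this point: color and shape tend to suit categorical variables such as `Species`, while size tends to suit continuous variables such as `Petal.Width`. The choice of variables below is an assumption made for the example:

```r
library(ggplot2)

# color mapped to a categorical variable, size mapped to a continuous one
p_types <- ggplot(data = iris, mapping = aes(x = Sepal.Length, y = Petal.Length,
    color = Species, size = Petal.Width)) + geom_point()

p_types
```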
Exercise 1
Using the “hylaeformis” data set (it can also be downloaded here):
# read from website
hylaeformis_data <- read.csv("https://raw.githubusercontent.com/maRce10/OTS_Tropical_Biology_2023/master/data/hylaeformis_data.csv",
stringsAsFactors = FALSE)
# if downloaded manually, read it from the local file
# hylaeformis_data <- read.csv('hylaeformis_data.csv',
# stringsAsFactors = FALSE)
head(hylaeformis_data, 20)
1.1 Create a scatter plot of “duration” vs “meanfreq” (mean frequency)
1.2 Add an aesthetic attribute to show a different color for each locality
1.3 Add another aesthetic attribute to show “dfrange” (dominant frequency range) as the point size
The scale can be fixed or free for the x and y axes, and the number of columns and rows can be modified:
# free x
ggplot(iris, aes(Sepal.Length, Petal.Length)) + geom_point() + facet_wrap(~Species,
scales = "free_x")
# free y and 3 rows
ggplot(iris, aes(Sepal.Length, Petal.Length)) + geom_point() + facet_wrap(~Species,
scales = "free_y", nrow = 3)
# both free and 2 rows
ggplot(iris, aes(Sepal.Length, Petal.Length)) + geom_point() + facet_wrap(~Species,
scales = "free", nrow = 2)
Note that we can also save the basic component as an R object and add other components later in the code:
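A short sketch of that workflow. The object name `p` and the choice of `facet_wrap()` as the component added later are assumptions made for the example:

```r
library(ggplot2)

# save the base plot (data + aesthetics + points) as an object
p <- ggplot(iris, aes(Sepal.Length, Petal.Length)) + geom_point()

# add a facet component later in the code
p + facet_wrap(~Species)
```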
geom_smooth() - adds best fit lines (including CI)
geom_boxplot()
geom_histogram() & geom_freqpoly() - frequency distributions
geom_bar() - frequency distribution of categorical variables
geom_path() & geom_line() - add lines to scatter plots
Best fit regression lines can be added with geom_smooth():
# smoother and CI
ggplot(iris, aes(Sepal.Length, Petal.Length)) + geom_point() + geom_smooth(method = "lm") +
facet_wrap(~Species, scales = "free", nrow = 3)
# without CI
ggplot(iris, aes(Sepal.Length, Petal.Length)) + geom_point() + geom_smooth(method = "lm",
se = FALSE) + facet_wrap(~Species, scales = "free", nrow = 3)
Exercise 2
Using the “msleep” example data set:
2.1 Create a scatter plot of “bodywt”(body weight) vs “brainwt” (brain weight)
2.2 Add “order” as a color aesthetic
2.3 Add a “facet” component to split plots by order using free scales
2.4 Remove the orders with fewer than 4 species in the data set and make a plot similar to 2.3
2.5 Add a smooth line to each plot in the panel
Again, it only takes a new “geom” component to create a boxplot:
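A sketch of such a boxplot, assuming petal length is compared among species (the variables are chosen only for illustration):

```r
library(ggplot2)

# one box per species
p_box <- ggplot(iris, aes(Species, Petal.Length)) + geom_boxplot()

p_box
```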
Violin plots are an interesting alternative:
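A sketch using the same illustrative variables as a boxplot would, with only the geom swapped for `geom_violin()`:

```r
library(ggplot2)

# one violin per species, showing the shape of each distribution
p_violin <- ggplot(iris, aes(Species, Petal.Length)) + geom_violin()

p_violin
```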
Same thing with histograms and frequency plots:
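A sketch of both geoms on a single continuous variable (petal length is an arbitrary choice). With default settings ggplot2 picks 30 bins and prints a message suggesting a better value:

```r
library(ggplot2)

# histogram of petal length (default number of bins)
p_hist <- ggplot(iris, aes(Petal.Length)) + geom_histogram()
p_hist

# frequency polygon of the same variable
p_freq <- ggplot(iris, aes(Petal.Length)) + geom_freqpoly()
p_freq
```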
We can control the width of the bars:
ggplot(iris, aes(Petal.Length)) + geom_histogram(binwidth = 1, fill = adjustcolor("red2",
alpha.f = 0.3))
## Warning: Duplicated aesthetics after name standardisation: fill
And compare the distribution of different groups within the same histogram:
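One possible way to do this, sketched under the assumption that the groups are the three iris species. Mapping `fill` to `Species` inside `aes()` gives each group its own color, and `position = "identity"` with some transparency lets the bars overlap instead of stacking. The bin width and alpha value are arbitrary:

```r
library(ggplot2)

# overlapping, semi-transparent histograms, one per species
p_groups <- ggplot(iris, aes(Petal.Length, fill = Species)) +
    geom_histogram(binwidth = 0.4, alpha = 0.5, position = "identity")

p_groups
```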
Besides the basic functions (i.e. components) described above, ggplot2 has many other tools (both arguments and additional functions) to further customize plots. Pretty much everything can be modified. Here we see some of the most common tools.
ggplot2 comes with some default themes that can be easily applied to modify the look of our plots:
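A sketch applying three of the built-in themes to the same base plot. The themes shown and the object name `p_theme` are choices made for the example:

```r
library(ggplot2)

# base plot to which the themes are added
p_theme <- ggplot(iris, aes(Sepal.Length, Petal.Length, col = Species)) + geom_point()

# a theme is added like any other component
p_theme + theme_classic()
p_theme + theme_bw()
p_theme + theme_minimal()
```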
Most themes differ in the use of grids, border lines and axis labeling patterns.
Axis limits can be modified as follows:
ggplot(iris, aes(Sepal.Length, Petal.Length, col = Species)) + geom_point() +
xlim(c(0, 10)) + ylim(c(0, 9))
Axes can also be transformed:
ggplot(iris, aes(Sepal.Length, Petal.Length, col = Species)) + geom_point() +
scale_x_continuous(trans = "log") + scale_y_continuous(trans = "log2")
or reversed:
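A sketch that reverses the y axis with `scale_y_reverse()` (`scale_x_reverse()` does the same for the x axis). Reversing y rather than x is an arbitrary choice here:

```r
library(ggplot2)

# same scatter plot, with the y axis running from high to low values
p_rev <- ggplot(iris, aes(Sepal.Length, Petal.Length, col = Species)) + geom_point() +
    scale_y_reverse()

p_rev
```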
ggplots can be exported as image files using the ggsave function:
# 'tab' is assumed to be a table with the number of species per order
tab <- table(msleep$order)
ggplot(data = msleep[msleep$order %in% names(tab)[tab > 5], ], mapping = aes(x = bodywt,
y = brainwt)) + geom_point() + facet_wrap(~order, scales = "free")
## Warning: Removed 21 rows containing missing values (`geom_point()`).
The image file type will be identified by the extension in the file name.
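A sketch of a `ggsave()` call. The file name, the dimensions and the use of a temporary folder are assumptions made for the example:

```r
library(ggplot2)

# plot to export
p_export <- ggplot(iris, aes(Sepal.Length, Petal.Length, col = Species)) + geom_point()

# hypothetical file name; a temporary folder is used so the working directory stays clean
out_file <- file.path(tempdir(), "iris_scatter.png")

# width and height are in inches by default
ggsave(filename = out_file, plot = p_export, width = 6, height = 4, dpi = 300)
```

Changing the extension in `out_file` (for instance to ".pdf" or ".tiff") is enough to change the output format.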
Additional axis customizing:
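# Log2 scaling of the y axis (with visually-equal spacing)
# 'p' is assumed to be a previously saved ggplot object, e.g.
# p <- ggplot(iris, aes(Sepal.Length, Petal.Length)) + geom_point()
library(scales)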
p + scale_y_continuous(trans = log2_trans())
# show exponents
p + scale_y_continuous(trans = log2_trans(), breaks = trans_breaks("log2",
function(x) 2^x), labels = trans_format("log2", math_format(2^.x)))
### Add 'tick marks' ###
# Load libraries
library(MASS)
data(Animals)
# x and y axis are transformed and formatted
p2 <- ggplot(Animals, aes(x = body, y = brain)) + geom_point(size = 4) +
scale_x_log10(breaks = trans_breaks("log10", function(x) 10^x),
labels = trans_format("log10", math_format(10^.x))) + scale_y_log10(breaks = trans_breaks("log10",
function(x) 10^x), labels = trans_format("log10", math_format(10^.x))) +
theme_bw()
# log-log plot without log tick marks
p2
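Log tick marks can then be added with `annotation_logticks()`, which draws them on the bottom and left axes by default. The sketch below rebuilds a simplified version of the plot so that it runs on its own. The object name `p2_simple` is hypothetical and the axis label formatting is left out:

```r
library(ggplot2)
library(MASS)  # provides the Animals data set

# simplified log-log plot of brain vs body weight
p2_simple <- ggplot(Animals, aes(x = body, y = brain)) + geom_point(size = 4) +
    scale_x_log10() + scale_y_log10() + theme_bw()

# log-log plot with log tick marks
p2_simple + annotation_logticks()
```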
Many other types of graphs can be generated. Here I show a single example of cool contour and “heatmap” graphs:
eruptions | waiting |
---|---|
3.60 | 79 |
1.80 | 54 |
3.33 | 74 |
2.28 | 62 |
4.53 | 85 |
2.88 | 55 |
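The rows above match the first rows of the built-in `faithful` data set. A sketch of contour and “heatmap” graphs for these two variables, assuming a 2D density estimate is what is being displayed (the color scale is an arbitrary choice):

```r
library(ggplot2)

# contour lines of the 2D density estimate drawn over the raw points
p_contour <- ggplot(faithful, aes(x = eruptions, y = waiting)) +
    geom_point(alpha = 0.4) + geom_density_2d()
p_contour

# "heatmap" of the same density estimate
p_heat <- ggplot(faithful, aes(x = eruptions, y = waiting)) +
    stat_density_2d(aes(fill = after_stat(density)), geom = "raster", contour = FALSE) +
    scale_fill_viridis_c()
p_heat
```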
Check the CRAN Graphics Task View for a more comprehensive list of graphical tools in R.
Session information
## R version 4.2.2 Patched (2022-11-10 r83330)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.5 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
##
## locale:
## [1] LC_CTYPE=es_ES.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=es_CR.UTF-8 LC_COLLATE=es_ES.UTF-8
## [5] LC_MONETARY=es_CR.UTF-8 LC_MESSAGES=es_ES.UTF-8
## [7] LC_PAPER=es_CR.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=es_CR.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods
## [7] base
##
## other attached packages:
## [1] scales_1.2.1 MASS_7.3-58.2 viridis_0.6.3
## [4] viridisLite_0.4.2 dplyr_1.1.0 tidyr_1.3.0
## [7] readxl_1.4.1 kableExtra_1.3.4 knitr_1.42
## [10] ggplot2_3.4.2 RColorBrewer_1.1-3
##
## loaded via a namespace (and not attached):
## [1] minqa_1.2.5 colorspace_2.1-0 ellipsis_0.3.2
## [4] rsconnect_0.8.29 sjlabelled_1.2.0 estimability_1.4.1
## [7] fs_1.6.2 rstudioapi_0.14 farver_2.1.1
## [10] remotes_2.4.2 fansi_1.0.4 mvtnorm_1.1-3
## [13] xml2_1.3.4 splines_4.2.2 cachem_1.0.8
## [16] sjmisc_2.8.9 pkgload_1.3.2 jsonlite_1.8.4
## [19] nloptr_2.0.3 ggeffects_1.2.2 broom_1.0.4
## [22] shiny_1.7.3 compiler_4.2.2 httr_1.4.6
## [25] sjstats_0.18.2 emmeans_1.8.6 backports_1.4.1
## [28] assertthat_0.2.1 Matrix_1.5-1 fastmap_1.1.1
## [31] cli_3.6.1 later_1.3.1 formatR_1.12
## [34] htmltools_0.5.5 prettyunits_1.1.1 tools_4.2.2
## [37] lmerTest_3.1-3 coda_0.19-4 gtable_0.3.3
## [40] glue_1.6.2 Rcpp_1.0.10 carData_3.0-5
## [43] cellranger_1.1.0 jquerylib_0.1.4 vctrs_0.6.2
## [46] sjPlot_2.8.14 svglite_2.1.0 nlme_3.1-162
## [49] insight_0.19.2 xfun_0.39 stringr_1.5.0
## [52] ps_1.7.5 lme4_1.1-33 rvest_1.0.3
## [55] mime_0.12 miniUI_0.1.1.1 lifecycle_1.0.3
## [58] devtools_2.4.5 klippy_0.0.0.9500 ragg_1.2.4
## [61] promises_1.2.0.1 yaml_2.3.7 memoise_2.0.1
## [64] gridExtra_2.3 sass_0.4.6 stringi_1.7.12
## [67] highr_0.10 bayestestR_0.13.1 boot_1.3-28
## [70] pkgbuild_1.3.1 rlang_1.1.1 pkgconfig_2.0.3
## [73] systemfonts_1.0.4 evaluate_0.21 lattice_0.20-45
## [76] purrr_1.0.1 htmlwidgets_1.5.4 labeling_0.4.2
## [79] tidyselect_1.2.0 processx_3.8.1 magrittr_2.0.3
## [82] R6_2.5.1 generics_0.1.3 profvis_0.3.7
## [85] pillar_1.9.0 withr_2.5.0 mgcv_1.8-41
## [88] abind_1.4-5 tibble_3.2.1 performance_0.10.3
## [91] modelr_0.1.11 crayon_1.5.2 car_3.1-2
## [94] utf8_1.2.3 rmarkdown_2.21 urlchecker_1.0.1
## [97] usethis_2.1.6 grid_4.2.2 isoband_0.2.7
## [100] callr_3.7.3 digest_0.6.31 webshot_0.5.4
## [103] xtable_1.8-4 httpuv_1.6.6 numDeriv_2016.8-1.1
## [106] textshaping_0.3.6 munsell_0.5.0 bslib_0.4.2
## [109] sessioninfo_1.2.2