Homogeneous | Heterogeneous | |
---|---|---|
1d | Atomic vector | List |
2d | Matrix | Data frame |
nd | Array |
R basics
Objectives
Get familiar with the basic blocks used in R programming
Know the main sources of standardized documentation in R
1 What is R?
- A free Domain Specific Language (DSL) for statistics and data analysis
- A collection of over 18695 packages (as of Sep-21-2022)
- A large and active community in both industry and academia
- A way to “speak directly” to your computer
Historically:
- Based on the S programming language
- Around 20 years old (the lineage dates back to 1975 - almost 40 years ago)
2 Rstudio
Integrated Development Environment (IDE) for R. It includes:
- A console
- Syntax-highlighting editor that supports direct code execution
- Tools for plotting, history, debugging, and workspace management
3 Elements of the R language
- Vectors
- Lists
- Matrices
- Data frames
- Functions (including operators)
- Tables
- Attributes
ArraysEnvironments
4 Basic structure of data representation
The basic data structure in R is the vector. There are two basic types of vectors: atomic vectors and lists.
They have three common properties:
- Type,
typeof()
(class/mode ~) - Length,
length()
(number of elements) - Attributes,
attributes()
(metadata)
They differ in the types of their elements: all elements of an atomic vector must be of the same type, whereas elements of a list can have different types
Individual numbers or strings are actually vectors of length one.
4.1 Atomic vectors
Types of atomic vectors:
- Logical (boolean)
- Integer
- Numeric (double)
- Characters
- Factors
Vectors are constructed using the c()
function:
Code
<- 1
x <- c(1)
x1
all.equal(x, x1)
## [1] TRUE
class(x)
## [1] "numeric"
<- "something"
y
class(y)
## [1] "character"
<- TRUE
z
class(z)
## [1] "logical"
<- factor(1)
q
class(q)
## [1] "factor"
Vectors can only contain elements of the same type. Different types of elements will be coerced to the most flexible type:
Code
<- c(10, 11, 12, 13)
v
class(v)
## [1] "numeric"
typeof(v)
## [1] "double"
<- c("a", "b")
y
class(y)
## [1] "character"
<- c(1,2,3, "a")
x
x## [1] "1" "2" "3" "a"
class(x)
## [1] "character"
Missing values are specified with NA, which is a logical vector of length 1. NA will always be coerced to the correct type if used within c()
:
Code
<- c(10, 11, 12, 13, NA)
v
class(v)
## [1] "numeric"
<- c("a", "b", NA)
v
class(v)
## [1] "character"
4.2 Lists
Can contain objects of different classes and sizes. Lists are constructed with list()
:
Code
<- list("a", 1, FALSE)
l
l
[[1]]
[1] "a"
[[2]]
[1] 1
[[3]]
[1] FALSE
Code
class(l)
[1] "list"
Code
str(l)
List of 3
$ : chr "a"
$ : num 1
$ : logi FALSE
They can actually be seen as bins where any other type of object can be put:
Code
<- list(c("a", "b"), c(1, 2, 3, 4), c(FALSE, TRUE, FALSE))
l
str(l)
List of 3
$ : chr [1:2] "a" "b"
$ : num [1:4] 1 2 3 4
$ : logi [1:3] FALSE TRUE FALSE
Code
<- list(l, l)
l2
str(l2)
List of 2
$ :List of 3
..$ : chr [1:2] "a" "b"
..$ : num [1:4] 1 2 3 4
..$ : logi [1:3] FALSE TRUE FALSE
$ :List of 3
..$ : chr [1:2] "a" "b"
..$ : num [1:4] 1 2 3 4
..$ : logi [1:3] FALSE TRUE FALSE
4.3 Naming elements
Vectors can be named in three ways:
- When creating it:
x <- c(a = 1, b = 2, c = 3)
. - When modifying an existing vector in place:
x <- 1:3
;names(x) <- c("a", "b", "c")
Or:x <- 1:3
;names(x)[[1]] <- c("a")
- Creating a modified copy of a vector:
x <- setNames(1:3, c("a", "b", "c"))
Code
<- c(a = 1, 2, 3)
y
names(y)
[1] "a" "" ""
Code
<- c(1, 2, 3)
v
names(v) <- c('a')
names(v)
[1] "a" NA NA
Code
<- setNames(1:3, c("a", "b", "c"))
z
names(z)
[1] "a" "b" "c"
4.4 Factors
Attributes are used to define factors. A factor is a vector that can only contain predefined values and is used to store categorical data.
Factors are constructed on integer vectors using two attributes:
- class “factor”: makes them behave differently from normal character vectors
- levels: define the set of allowed values
Code
<- factor(c("a", "b", "b", "a"))
x x
[1] a b b a
Levels: a b
Code
levels(x)
[1] "a" "b"
Code
str(x)
Factor w/ 2 levels "a","b": 1 2 2 1
Factors look like character vectors, but they are actually integers:
Code
<- factor(c("a", "b", "b", "a"))
x
c(x)
[1] a b b a
Levels: a b
4.5 Matrices
All elements are of the same type:
Code
<- matrix(c(1, 2, 3, 11, 12, 13), nrow = 2)
m
dim(m)
[1] 2 3
Code
m
[,1] [,2] [,3]
[1,] 1 3 12
[2,] 2 11 13
Code
class(m)
[1] "matrix" "array"
Code
<- matrix(c(1, 2, 3, 11, 12, "13"), nrow = 2)
m m
[,1] [,2] [,3]
[1,] "1" "3" "12"
[2,] "2" "11" "13"
4.6 Data frames
Special case of lists. Can contain elements of different types:
Code
<-
m data.frame(
ID = c("a", "b", "c", "d", "e"),
size = c(1, 2, 3, 4, 5),
observed = c(FALSE, TRUE, FALSE, FALSE, FALSE)
)
dim(m)
[1] 5 3
Code
m
ID | size | observed |
---|---|---|
a | 1 | FALSE |
b | 2 | TRUE |
c | 3 | FALSE |
d | 4 | FALSE |
e | 5 | FALSE |
Code
class(m)
[1] "data.frame"
Code
is.data.frame(m)
[1] TRUE
Code
is.list(m)
[1] TRUE
Code
str(m)
'data.frame': 5 obs. of 3 variables:
$ ID : chr "a" "b" "c" "d" ...
$ size : num 1 2 3 4 5
$ observed: logi FALSE TRUE FALSE FALSE FALSE
But vectors must have the same length:
Code
<-
m data.frame(
ID = c("a", "b", "c", "d", "e"),
size = c(1
2, 3, 4, 5, 6),
, observed = c(FALSE, TRUE, FALSE, FALSE, FALSE)
)
Error in data.frame(ID = c("a", "b", "c", "d", "e"), size = c(1, 2, 3, : arguments imply differing number of rows: 5, 6
5 Exercise 1
Create a numeric vector with 8 elements containing positive and negative numbers
Create a character vector with the names of the stations that will be visited during the course
Add an NA to the above point vector
Create a numeric matrix with 3 columns and 3 rows
Create a character matrix with 4 columns and 3 rows
What type of object is ‘iris’ and what are its dimensions?
Create a data frame with a numeric column, a character column, and a column with factors
6 Extracting subsets using indexing
Elements within objects can be called by indexing. To subset a vector simply call the object’s position using square brackets:
Code
<- c(1, 3, 4, 10, 15, 20, 50, 1, 6)
x
1] x[
[1] 1
Code
2] x[
[1] 3
Code
2:3] x[
[1] 3 4
Code
c(1,3)] x[
[1] 1 4
Elements can be removed in the same way:
Code
-1] x[
[1] 3 4 10 15 20 50 1 6
Code
-c(1,3)] x[
[1] 3 10 15 20 50 1 6
Matrices and data frames require 2 indices [row, column]
:
Code
<- matrix(c(1, 2, 3, 11, 12, 13), nrow = 2)
m
1, ] m[
[1] 1 3 12
Code
1] m[,
[1] 1 2
Code
1, 1] m[
[1] 1
Code
-1, ] m[
[1] 2 11 13
Code
-1] m[,
[,1] [,2]
[1,] 3 12
[2,] 11 13
Code
-1, -1] m[
[1] 11 13
Code
<- data.frame(
df family = c("Psittacidae", "Trochilidae", "Psittacidae"),
genus = c("Amazona", "Phaethornis", "Ara"),
species = c("aestiva", "philippii", "ararauna")
)
df
family | genus | species |
---|---|---|
Psittacidae | Amazona | aestiva |
Trochilidae | Phaethornis | philippii |
Psittacidae | Ara | ararauna |
Code
1, ] df[
family | genus | species |
---|---|---|
Psittacidae | Amazona | aestiva |
Code
1] df[,
[1] "Psittacidae" "Trochilidae" "Psittacidae"
Code
1, 1] df[
[1] "Psittacidae"
Code
-1, ] df[
family | genus | species | |
---|---|---|---|
2 | Trochilidae | Phaethornis | philippii |
3 | Psittacidae | Ara | ararauna |
Code
-1] df[,
genus | species |
---|---|
Amazona | aestiva |
Phaethornis | philippii |
Ara | ararauna |
Code
-1, -1] df[
genus | species | |
---|---|---|
2 | Phaethornis | philippii |
3 | Ara | ararauna |
Code
"family"] df[,
[1] "Psittacidae" "Trochilidae" "Psittacidae"
Code
c("family", "genus")] df[,
family | genus |
---|---|
Psittacidae | Amazona |
Trochilidae | Phaethornis |
Psittacidae | Ara |
Lists require 1 index between double brackets [[index]]
:
Code
<- list(c("a", "b"),
l c(1, 2, 3),
c(FALSE, TRUE, FALSE, FALSE))
1]] l[[
[1] "a" "b"
Code
3]] l[[
[1] FALSE TRUE FALSE FALSE
Elements within lists can also be subset in the same line of code:
Code
1]][1:2] l[[
[1] "a" "b"
Code
3]][2] l[[
[1] TRUE
7 Exploring objects
Code
str(df)
'data.frame': 3 obs. of 3 variables:
$ family : chr "Psittacidae" "Trochilidae" "Psittacidae"
$ genus : chr "Amazona" "Phaethornis" "Ara"
$ species: chr "aestiva" "philippii" "ararauna"
Code
names(df)
[1] "family" "genus" "species"
Code
dim(df)
[1] 3 3
Code
nrow(df)
[1] 3
Code
ncol(df)
[1] 3
Code
head(df)
family | genus | species |
---|---|---|
Psittacidae | Amazona | aestiva |
Trochilidae | Phaethornis | philippii |
Psittacidae | Ara | ararauna |
Code
tail(df)
family | genus | species |
---|---|---|
Psittacidae | Amazona | aestiva |
Trochilidae | Phaethornis | philippii |
Psittacidae | Ara | ararauna |
Code
table(df$genus)
Amazona Ara Phaethornis
1 1 1
Code
class(df)
[1] "data.frame"
Code
View(df)
8 Functions
All functions are created with the function()
function and follow the same structure:
* Modified from Grolemund 2014
R comes with many functions that you can use to perform sophisticated tasks:
Code
# built in functions
<- builtins()
bi
length(bi)
[1] 1388
Code
set.seed(22)
sample(bi, 10)
[1] "print.warnings" ".colMeans"
[3] "row" ".encode_numeric_version"
[5] "gzcon" "delayedAssign"
[7] "rep.int" "class"
[9] ".mergeExportMethods" "charmatch"
Operators are functions:
Code
1 + 1
[1] 2
Code
'+'(1, 1)
[1] 2
Code
2 * 3
[1] 6
Code
'*'(2, 3)
[1] 6
8.1 Most used operators
Arithmetic operators:
Operator | Description |
---|---|
+ | addition |
- | subtraction |
* | multiplication |
/ | division |
^ or ** | exponentiation |
Code
1 - 2
[1] -1
Code
1 + 2
[1] 3
Code
2 ^ 2
[1] 4
Code
2 ** 2
[1] 4
Code
2:3 %in% 2:4
[1] TRUE TRUE
Logical operators:
Operator | Description |
---|---|
< | less than |
<= | less than or equal to |
> | greater than |
>= | greater than or equal to |
== | exactly equal to |
!= | not equal to |
!x | not x |
x | y | x OR y |
x & y | x AND y |
x %in% y | match |
Code
1 < 2
[1] TRUE
Code
1 > 2
[1] FALSE
Code
1 <= 2
[1] TRUE
Code
1 == 2
[1] FALSE
Code
1 != 2
[1] TRUE
Code
1 > 2
[1] FALSE
Code
5 %in% 1:6
[1] TRUE
Code
5 %in% 1:4
[1] FALSE
9 Exercise 2
Use the sample data
iris
to create a subset of data with only observations of the speciessetosa
Now create a subset of data containing observations of both “setosa” and “versicolor”
Also with
iris
create a subset of data with observations for whichiris$Sepal.length
is greater than 6How many observations have a sepal length greater than 6?
Most functions are vectorized:
Code
1:6 * 1:6
* Modified from Grolemund & Wickham 2017
[1] 1 4 9 16 25 36
Code
1:6 - 1:6
[1] 0 0 0 0 0 0
R recycles vectors of unequal length:
Code
1:6 * 1:5
* Modified from Grolemund & Wickham 2017
```{r,
echo=F}
1:6 * 1:5
::: {.cell}
```{.r .cell-code}
1:6 + 1:5
Warning in 1:6 + 1:5: longitud de objeto mayor no es múltiplo de la longitud de
uno menor
[1] 2 4 6 8 10 7
:::
10 Style Matters
Based on google’s R Style Guide
10.1 File Names
File names should end in .R and, of course, be self-explanatory:
- Good: plot_probability_posterior.R
- Bad: plot.R
10.2 Object Names
Variables and functions:
- Lowercase
- Use an underscore
- Generally, names for variables and verbs for functions
- Make names concise and meaningful (not always easy)
- Avoid using names of existing functions or variables
Code
- Good: day_one: day_1, average_weight(),
- Bad: dayone, day1, first.day_of_month, mean <- function(x) sum(x), c <- 10
10.3 Syntax
10.3.1 Spaces
- Use spaces around operators and for arguments within a function
- Always put a space after a comma, and never before (as in normal English)
- Place a space before the left parenthesis, except in a function call
Code
- Good:
<- rnorm(n = 10, sd = 10, mean = 1)
a <- table(df[df$dias < 0, "campaign.id"])
tab.prior <- sum(x[, 1])
total <- sum(x[1, ])
total if (debug)
mean(1:10)
- Bad:
<-rnorm(n=10,sd=10,mean=1)
a<- table(df[df$days.from.opt<0, "campaign.id"]) # needs space around '<'
tab.prior <- table(df[df$days.from.opt < 0,"campaign.id"]) # Needs space after comma
tab.prior <- table(df[df$days.from.opt < 0, "campaign.id"]) # Needs space before <-
tab.prior<-table(df[df$days.from.opt < 0, "campaign.id"]) # Needs space around <-
tab.prior<- sum(x[,1]) # Needs space before comma
total if(debug) # Needs space before parenthesis
mean (1:10) # Extra space after function name
10.3.2 Brackets
- Opening brace should never go on its own line
- Closing brace should always go on its own line
- You may omit braces when a block consists of a single statement
Code
- Good:
if (is.null(ylim)) {
<- c(0, 0.06)
ylim
}
if (is.null(ylim))
<- c(0, 0.06)
ylim
- Bad:
if (is.null(ylim)) ylim <- c(0, 0.06)
if (is.null(ylim)) {ylim <- c(0, 0.06)}
if (is.null(ylim)) {
<- c(0, 0.06)
ylim }
10.4 Creating Objects
- Use <-, not =
Code
- GOOD:
<- 5
x
- BAD:
= 5 x
10.5 Commenting
- Comment your code
- Fully commented lines should start with # and a space
- Short comments can be placed after the code preceded by two spaces, #, and then a space
Code
# Create histogram of frequency of campaigns by pct budget spent.
hist(df$pct.spent,
breaks = "scott", # method for choosing number of buckets
main = "Histogram: individuals per unit of time",
xlab = "Number of individuals",
ylab = "Frequency")
11 R Documentation
Most R resources are well-documented. So the first source of help you should turn to when writing R code is R’s own documentation. All packages are documented in the same standard way. Getting familiar with the format can simplify things a lot.
11.1 Package Documentation
Reference Manuals
Reference manuals are collections of documentation for all functions of a package (only 1 per package):
11.2 Function Documentation
All functions (default or from loaded packages) should have documentation following a standard format:
Code
?mean
help("mean")
This documentation can also be displayed in RStudio by pressing F1
when the cursor is on the function name.
If you don’t remember the function name, try apropos()
:
Code
apropos("mean")
[1] ".colMeans" ".rowMeans" "colMeans" "kmeans"
[5] "mean" "mean.Date" "mean.default" "mean.difftime"
[9] "mean.POSIXct" "mean.POSIXlt" "rowMeans" "weighted.mean"
11.3 Vignettes
Vignettes are illustrative documents or case studies detailing the usage of a package (optional, there can be several per package).
Vignettes can be called directly from R:
Code
<- browseVignettes() vgn
Code
vignette()
They should also appear on the package page on CRAN.
11.4 Demonstrations
Packages can also include extended code demonstrations (“demos”). To list the demos of a package, run demo("package name")
:
Code
demo(package="stats")
# call demo directly
demo("nlm")
12 Exercise 3
What does the function
cut()
do?What is the purpose of the
breaks
argument incut()
?Execute the first 4 lines of code from the examples provided in the documentation of
cut()
.How many vignettes does the package warbleR have?
References
- Advanced R, H Wickham
- Google’s R Style Guide
- Hands-On Programming with R (Grolemund, 2014)
Session Information
R version 4.3.2 (2023-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.2 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: America/Costa_Rica
tzcode source: system (glibc)
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] vctrs_0.6.5 httr_1.4.7 svglite_2.1.3 cli_3.6.2
[5] knitr_1.46 rlang_1.1.3 xfun_0.43 highr_0.10
[9] stringi_1.8.3 jsonlite_1.8.8 glue_1.7.0 colorspace_2.1-0
[13] htmltools_0.5.8.1 scales_1.3.0 rmarkdown_2.26 evaluate_0.23
[17] munsell_0.5.0 kableExtra_1.3.4 fastmap_1.1.1 yaml_2.3.8
[21] lifecycle_1.0.4 stringr_1.5.1 compiler_4.3.2 rvest_1.0.3
[25] htmlwidgets_1.6.4 rstudioapi_0.15.0 systemfonts_1.0.5 digest_0.6.35
[29] viridisLite_0.4.2 R6_2.5.1 magrittr_2.0.3 webshot_0.5.5
[33] tools_4.3.2 xml2_1.3.6