Historically:
Integrated development environment (IDE) for R. Includes:
Data structure
The basic data structure in R is the vector. There are two basic types of vectors: atomic vectors and lists.
They have three common properties:
- typeof() (~ class/mode)
- length() (number of elements)
- attributes() (metadata)

They differ in the types of their elements: all elements of an atomic vector must be the same type, whereas the elements of a list can have different types.
 | Homogeneous | Heterogeneous |
---|---|---|
1d | Atomic vector | List |
2d | Matrix | Data frame |
nd | Array | |
R has no 0-dimensional elements (scalars). Individual numbers or strings are actually vectors of length one.
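As a quick check of the three properties, and of the fact that single values are vectors of length one, the following sketch inspects an atomic vector and a list. The object names `v` and `lst` are made up for this illustration:

```r
# An atomic vector and a list (names chosen only for this example)
v <- c(1.5, 2, 3)
lst <- list(1.5, "a", TRUE)

typeof(v)        # "double"
typeof(lst)      # "list"
length(v)        # 3
attributes(v)    # NULL: a plain vector carries no metadata

# A single number is just a vector of length one
length(42)       # 1
is.vector(42)    # TRUE
```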
Atomic vectors
Types of atomic vectors:
Vectors are built using c():
x <- 1
x1 <- c(1)
all.equal(x, x1)
## [1] TRUE
class(x)
## [1] "numeric"
y <- "something"
class(y)
## [1] "character"
w <- 1L
class(w)
## [1] "integer"
z <- TRUE
class(z)
## [1] "logical"
q <- factor(1)
class(q)
## [1] "factor"
Vectors can only contain entries of the same type. Different types will be coerced to the most flexible type:
v <- c(10, 11, 12, 13)
class(v)
## [1] "numeric"
typeof(v)
## [1] "double"
is.integer(v)
## [1] FALSE
y <- c("Amazona", "Ara", "Eupsittula", "Myiopsitta")
class(y)
## [1] "character"
is.integer(y)
## [1] FALSE
x <- c(1,2,3, "Myiopsitta")
x
## [1] "1" "2" "3" "Myiopsitta"
class(x)
## [1] "character"
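From least to most flexible, the coercion order is logical, integer, double, character. A minimal sketch that checks this with typeof() (the object name `mixed` is only illustrative):

```r
typeof(c(TRUE, 2L))      # "integer": logical becomes integer
typeof(c(1L, 2.5))       # "double": integer becomes double
typeof(c(1.5, "a"))      # "character": everything becomes a string

# Logical values turn into 1/0 when combined with numbers
mixed <- c(TRUE, FALSE, 3)
mixed                    # 1 0 3
```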
Missing values are specified with NA, which is a logical vector of length 1. NA will always be coerced to the correct type if used inside c():
v <- c(10, 11, 12, 13, NA)
class(v)
## [1] "numeric"
v <- c(letters[1:3], NA)
class(v)
## [1] "character"
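The same behaviour can be checked with typeof(). The sketch below also shows is.na() and the na.rm argument of mean(), which are common ways of handling missing values; treat it as an illustration rather than a complete guide:

```r
typeof(NA)               # "logical"
typeof(c(1L, NA))        # "integer"
typeof(c("a", NA))       # "character"

# is.na() finds missing values regardless of the vector type
v <- c(10, NA, 12)
is.na(v)                 # FALSE TRUE FALSE
mean(v)                  # NA: missing values propagate
mean(v, na.rm = TRUE)    # 11
```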
Lists
Can contain objects of different classes and sizes. Lists are built using list():
l <- list(ID = c("a", "b", "c", "d", "e"), size = c(1, 2, 3, 4, 5), observed = c(FALSE, TRUE, FALSE, FALSE, FALSE))
l
## $ID
## [1] "a" "b" "c" "d" "e"
##
## $size
## [1] 1 2 3 4 5
##
## $observed
## [1] FALSE TRUE FALSE FALSE FALSE
class(l)
## [1] "list"
str(l)
## List of 3
## $ ID : chr [1:5] "a" "b" "c" "d" ...
## $ size : num [1:5] 1 2 3 4 5
## $ observed: logi [1:5] FALSE TRUE FALSE FALSE FALSE
List elements can also differ in length, and a list can contain other lists:
l <- list(ID = c("a", "b", "c", "d", "e"), size = c(1, 2, 3, 4, 5, 6), observed = c(FALSE, TRUE, FALSE, FALSE, FALSE), l)
str(l)
## List of 4
## $ ID : chr [1:5] "a" "b" "c" "d" ...
## $ size : num [1:6] 1 2 3 4 5 6
## $ observed: logi [1:5] FALSE TRUE FALSE FALSE FALSE
## $ :List of 3
## ..$ ID : chr [1:5] "a" "b" "c" "d" ...
## ..$ size : num [1:5] 1 2 3 4 5
## ..$ observed: logi [1:5] FALSE TRUE FALSE FALSE FALSE
l2 <- list(l, l)
str(l2)
## List of 2
## $ :List of 4
## ..$ ID : chr [1:5] "a" "b" "c" "d" ...
## ..$ size : num [1:6] 1 2 3 4 5 6
## ..$ observed: logi [1:5] FALSE TRUE FALSE FALSE FALSE
## ..$ :List of 3
## .. ..$ ID : chr [1:5] "a" "b" "c" "d" ...
## .. ..$ size : num [1:5] 1 2 3 4 5
## .. ..$ observed: logi [1:5] FALSE TRUE FALSE FALSE FALSE
## $ :List of 4
## ..$ ID : chr [1:5] "a" "b" "c" "d" ...
## ..$ size : num [1:6] 1 2 3 4 5 6
## ..$ observed: logi [1:5] FALSE TRUE FALSE FALSE FALSE
## ..$ :List of 3
## .. ..$ ID : chr [1:5] "a" "b" "c" "d" ...
## .. ..$ size : num [1:5] 1 2 3 4 5
## .. ..$ observed: logi [1:5] FALSE TRUE FALSE FALSE FALSE
Names
Vectors can be named in three ways:
- x <- c(a = 1, b = 2, c = 3)
- x <- 1:3; names(x) <- c("a", "b", "c")
  Or: x <- 1:3; names(x)[[1]] <- c("a")
- x <- setNames(1:3, c("a", "b", "c"))
y <- c(a = 1, 2, 3)
names(y)
## [1] "a" "" ""
v <- c(1, 2, 3)
names(v) <- c('a')
names(v)
## [1] "a" NA NA
z <- setNames(1:3, c("a", "b", "c"))
names(z)
## [1] "a" "b" "c"
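Once a vector has names, they can be used to pick elements. A short sketch:

```r
x <- c(a = 1, b = 2, c = 3)
x["b"]              # the element named "b"
x[c("a", "c")]      # several elements by name
unname(x)           # the same values without names
```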
Factors
Attributes are used to define factors. A factor is a vector that can contain only predefined values, and is used to store categorical data.
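Factors are built on top of integer vectors using two attributes: class ("factor"), which makes them behave differently from regular integer vectors, and levels, which defines the set of allowed values: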
x <- factor(c("a", "b", "b", "a"))
x
## [1] a b b a
## Levels: a b
levels(x)
## [1] "a" "b"
str(x)
## Factor w/ 2 levels "a","b": 1 2 2 1
Factors look like character vectors but they are actually integers:
x <- factor(c("a", "b", "b", "a"))
c(x)
## [1] a b b a
## Levels: a b
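In the R version used here (4.1.1, see the session information), c(x) keeps the factor class, so the integer codes are not visible in that output. A sketch that exposes them more directly:

```r
x <- factor(c("a", "b", "b", "a"))

typeof(x)        # "integer"
as.integer(x)    # 1 2 2 1: the underlying codes
attributes(x)    # $levels "a" "b" and $class "factor"

# as.character() recovers the labels
as.character(x)  # "a" "b" "b" "a"
```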
Matrices
All entries are of the same type:
m <- matrix(c(1, 2, 3, 11, 12, 13), nrow = 2)
dim(m)
## [1] 2 3
m
## [,1] [,2] [,3]
## [1,] 1 3 12
## [2,] 2 11 13
class(m)
## [1] "matrix" "array"
m <- matrix(c(1, 2, 3, 11, 12,"13"), nrow = 2)
m
## [,1] [,2] [,3]
## [1,] "1" "3" "12"
## [2,] "2" "11" "13"
Can be created by modifying the dimension attribute:
c <- 1:6
is.matrix(c)
## [1] FALSE
attributes(c)
## NULL
dim(c) <- c(3, 2)
c
## [,1] [,2]
## [1,] 1 4
## [2,] 2 5
## [3,] 3 6
is.matrix(c)
## [1] TRUE
attributes(c)
## $dim
## [1] 3 2
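The same dimension attribute with more than two entries gives an array, the "nd" case in the table of data structures. A minimal sketch (the object name `a` is arbitrary):

```r
a <- array(1:12, dim = c(2, 3, 2))
dim(a)          # 2 3 2
a[1, 2, 2]      # 9: row 1, column 2 of the second "layer"
class(a)        # "array"
is.matrix(a)    # FALSE: a matrix has exactly two dimensions
```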
Data frames
Special case of lists. Can contain entries of different types:
m <- data.frame(ID = c("a", "b", "c", "d", "e"), size = c(1, 2, 3, 4, 5), observed = c(FALSE, TRUE, FALSE, FALSE, FALSE))
dim(m)
## [1] 5 3
m
## ID size observed
## 1 a 1 FALSE
## 2 b 2 TRUE
## 3 c 3 FALSE
## 4 d 4 FALSE
## 5 e 5 FALSE
class(m)
## [1] "data.frame"
is.data.frame(m)
## [1] TRUE
is.list(m)
## [1] TRUE
str(m)
## 'data.frame': 5 obs. of 3 variables:
## $ ID : chr "a" "b" "c" "d" ...
## $ size : num 1 2 3 4 5
## $ observed: logi FALSE TRUE FALSE FALSE FALSE
But all vectors must have the same length:
m <- data.frame(ID = c("a", "b", "c", "d", "e"), size = c(1, 2, 3, 4, 5, 6), observed = c(FALSE, TRUE, FALSE, FALSE, FALSE))
## Error in data.frame(ID = c("a", "b", "c", "d", "e"), size = c(1, 2, 3, : arguments imply differing number of rows: 5, 6
Indexing
Elements within objects can be accessed by indexing. To subset a vector, give the position of the elements in square brackets:
x <- c(1, 3, 4, 10, 15, 20, 50, 1, 6)
x[1]
## [1] 1
x[2]
## [1] 3
x[2:3]
## [1] 3 4
x[c(1,3)]
## [1] 1 4
Elements can be removed in the same way:
x[-1]
## [1] 3 4 10 15 20 50 1 6
x[-c(1,3)]
## [1] 3 10 15 20 50 1 6
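Besides positions, a logical vector can be placed inside the brackets, which keeps the elements where it is TRUE. A short sketch using the same x:

```r
x <- c(1, 3, 4, 10, 15, 20, 50, 1, 6)
x > 10            # FALSE FALSE FALSE FALSE TRUE TRUE TRUE FALSE FALSE
x[x > 10]         # 15 20 50
which(x > 10)     # 5 6 7: positions of the TRUE values
sum(x > 10)       # 3: TRUE counts as 1
```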
Matrices and data frames require two indices, [row, column]:
m <- matrix(c(1, 2, 3, 11, 12, 13), nrow = 2)
m[1, ]
## [1] 1 3 12
m[, 1]
## [1] 1 2
m[1, 1]
## [1] 1
m[-1, ]
## [1] 2 11 13
m[, -1]
## [,1] [,2]
## [1,] 3 12
## [2,] 11 13
m[-1, -1]
## [1] 11 13
df <- data.frame(family = c("Psittacidae", "Trochilidae",
"Psittacidae"),
genus = c("Amazona", "Phaethornis", "Ara"),
species = c("aestiva", "philippii", "ararauna"))
df
## family genus species
## 1 Psittacidae Amazona aestiva
## 2 Trochilidae Phaethornis philippii
## 3 Psittacidae Ara ararauna
df[1, ]
## family genus species
## 1 Psittacidae Amazona aestiva
df[, 1]
## [1] "Psittacidae" "Trochilidae" "Psittacidae"
df[1, 1]
## [1] "Psittacidae"
df[-1, ]
## family genus species
## 2 Trochilidae Phaethornis philippii
## 3 Psittacidae Ara ararauna
df[, -1]
## genus species
## 1 Amazona aestiva
## 2 Phaethornis philippii
## 3 Ara ararauna
df[-1, -1]
## genus species
## 2 Phaethornis philippii
## 3 Ara ararauna
df[,"family"]
## [1] "Psittacidae" "Trochilidae" "Psittacidae"
df[,c("family", "genus")]
## family genus
## 1 Psittacidae Amazona
## 2 Trochilidae Phaethornis
## 3 Psittacidae Ara
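Columns can also be extracted with $, and rows can be selected with a logical condition in the row index. A sketch on the same df (the drop = FALSE argument is shown as one way to keep a single column as a data frame):

```r
df <- data.frame(family = c("Psittacidae", "Trochilidae", "Psittacidae"),
                 genus = c("Amazona", "Phaethornis", "Ara"),
                 species = c("aestiva", "philippii", "ararauna"))

df$genus                               # one column as a vector
df[df$family == "Psittacidae", ]       # rows 1 and 3
df[, "family", drop = FALSE]           # still a data frame, not a vector
```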
Lists require one index within double square brackets, [[index]]:
l <- list(ID = c("a", "b", "c", "d", "e"), size = c(1, 2, 3, 4, 5), observed = c(FALSE, TRUE, FALSE, FALSE, FALSE))
l[[1]]
## [1] "a" "b" "c" "d" "e"
l[[3]]
## [1] FALSE TRUE FALSE FALSE FALSE
Elements within lists can also be subset in the same line of code:
l[[1]][1:2]
## [1] "a" "b"
l[[3]][2]
## [1] TRUE
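List elements can also be reached by name, and single brackets behave differently from double brackets. A minimal sketch with a shortened version of l:

```r
l <- list(ID = c("a", "b", "c", "d", "e"), size = c(1, 2, 3, 4, 5))

class(l["ID"])     # "list": single brackets return a shorter list
class(l[["ID"]])   # "character": double brackets extract the element
l$size             # same as l[["size"]]
l$size[2]          # 2
```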
Exploring objects
str(df)
## 'data.frame': 3 obs. of 3 variables:
## $ family : chr "Psittacidae" "Trochilidae" "Psittacidae"
## $ genus : chr "Amazona" "Phaethornis" "Ara"
## $ species: chr "aestiva" "philippii" "ararauna"
names(df)
## [1] "family" "genus" "species"
dim(df)
## [1] 3 3
nrow(df)
## [1] 3
ncol(df)
## [1] 3
head(df)
## family genus species
## 1 Psittacidae Amazona aestiva
## 2 Trochilidae Phaethornis philippii
## 3 Psittacidae Ara ararauna
tail(df)
## family genus species
## 1 Psittacidae Amazona aestiva
## 2 Trochilidae Phaethornis philippii
## 3 Psittacidae Ara ararauna
table(df$genus)
##
## Amazona Ara Phaethornis
## 1 1 1
typeof(df)
## [1] "list"
View(df)
Exercise
- Use the example data set iris to create a subset with only the observations of the species ‘setosa’
- Now create a subset containing the observations of both ‘setosa’ and ‘versicolor’
- Also with iris, create a subset with the observations for which iris$Sepal.Length is higher than 6
- How many observations have a sepal length higher than 6?
All functions are created by the function function() and follow the same structure:
* Modified from Grolemund 2014
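A minimal sketch of that structure (name, arguments, body), using a made-up function range_width() that is not part of R:

```r
# name <- function(arguments) { body }
range_width <- function(x, na.rm = FALSE) {
  # body: the value of the last expression is returned
  max(x, na.rm = na.rm) - min(x, na.rm = na.rm)
}

range_width(c(3, 10, 7))                 # 7
range_width(c(3, NA, 7), na.rm = TRUE)   # 4

# The parts of a function can be inspected directly
formals(range_width)   # arguments and their defaults
body(range_width)      # the code inside the braces
```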
R comes with many functions that you can use to do sophisticated tasks:
# built in functions
bi <- builtins()
length(bi)
## [1] 1370
sample(bi, 10)
## [1] "unlist" "mat.or.vec" "close"
## [4] "interaction" ".S3_methods_table" "vapply"
## [7] "dimnames" "digamma" "source"
## [10] ".BaseNamespaceEnv"
Operators are functions:
1 + 1
## [1] 2
'+'(1, 1)
## [1] 2
2 * 3
## [1] 6
'*'(2, 3)
## [1] 6
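Because operators are ordinary functions, they can be passed to other functions, and new infix operators can be defined. A short sketch; backticks are the usual way to refer to an operator by name, and the operator %+% is invented here for illustration:

```r
`+`(1, 1)                  # 2
sapply(1:3, `*`, 10)       # 10 20 30

# A user-defined infix operator (the name %+% is made up)
`%+%` <- function(a, b) paste(a, b)
"Ara" %+% "ararauna"       # "Ara ararauna"
```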
Most commonly used R operators
Arithmetic operators:
Operator | Description |
---|---|
+ | addition |
- | subtraction |
* | multiplication |
/ | division |
^ or ** | exponent |
x %% y | modulus (x mod y) |
x %/% y | integer division |
1 - 2
## [1] -1
1 + 2
## [1] 3
2 ^ 2
## [1] 4
2 ** 2
## [1] 4
5 %% 2
## [1] 1
5 %/% 2
## [1] 2
Logical operators:
Operator | Description |
---|---|
< | less than |
<= | less than or equal to |
> | greater than |
>= | greater than or equal to |
== | exactly equal to |
!= | not equal to |
!x | Not x |
x \| y | x OR y |
x & y | x AND y |
x %in% y | match |
1 < 2
## [1] TRUE
1 > 2
## [1] FALSE
1 <= 2
## [1] TRUE
1 == 2
## [1] FALSE
1 != 2
## [1] TRUE
5 %in% 1:6
## [1] TRUE
5 %in% 1:4
## [1] FALSE
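The operators !, & and | from the table are not covered by the examples above. A brief sketch of how they combine comparisons element by element:

```r
x <- c(1, 5, 10)
!(x > 3)             # TRUE FALSE FALSE
x > 3 & x < 10       # FALSE TRUE FALSE
x < 3 | x > 8        # TRUE FALSE TRUE
x %in% c(5, 10)      # FALSE TRUE TRUE
```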
Most functions are vectorized:
1:6 * 1:6
## [1] 1 4 9 16 25 36
* Modified from Grolemund & Wickham 2017
1:6 - 1:6
## [1] 0 0 0 0 0 0
R recycles vectors of unequal length:
1:6 * 1:5
## Warning in 1:6 * 1:5: longer object length is not a multiple of shorter object
## length
## [1] 1 4 9 16 25 6
* Modified from Grolemund & Wickham 2017
1:6 + 1:5
## Warning in 1:6 + 1:5: longer object length is not a multiple of shorter object
## length
## [1] 2 4 6 8 10 7
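When the longer length is a multiple of the shorter one, recycling happens silently, with no warning. A small sketch (the object names are only illustrative):

```r
prod_recycled <- 1:6 * 1:2
prod_recycled      # 1 4 3 8 5 12  (1:2 is used three times)

sum_recycled <- 1:6 + 10
sum_recycled       # 11 12 13 14 15 16  (a length-one vector is recycled too)
```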
Based on Google’s R Style Guide
File names
File names should end in .R and, of course, be meaningful:
Object names
Variables and functions:
- GOOD: day_one, day_1, mean.day()
- BAD: dayOne, day1, firstDay_of.month, mean <- function(x) sum(x), c <- 10
Syntax
Spacing:
- GOOD:
a <- rnorm(n = 10, sd = 10, mean = 1)
tab.prior <- table(df[df$days.from.opt < 0, "campaign.id"])
total <- sum(x[, 1])
total <- sum(x[1, ])
if (debug)
mean(1:10)
- BAD:
a<-rnorm(n=10,sd=10,mean=1)
tab.prior <- table(df[df$days.from.opt<0, "campaign.id"]) # Needs spaces around '<'
tab.prior <- table(df[df$days.from.opt < 0,"campaign.id"]) # Needs a space after the comma
tab.prior<- table(df[df$days.from.opt < 0, "campaign.id"]) # Needs a space before <-
tab.prior<-table(df[df$days.from.opt < 0, "campaign.id"]) # Needs spaces around <-
total <- sum(x[,1]) # Needs a space after the comma
total <- sum(x[ ,1]) # Needs a space after the comma, not before
if(debug) # Needs a space before parenthesis
mean (1:10) # Extra space before parenthesis
Curly braces:
- GOOD:
if (is.null(ylim)) {
ylim <- c(0, 0.06)
}
if (is.null(ylim))
ylim <- c(0, 0.06)
- BAD:
if (is.null(ylim)) ylim <- c(0, 0.06)
if (is.null(ylim)) {ylim <- c(0, 0.06)}
if (is.null(ylim)) {
ylim <- c(0, 0.06)
}
Assignments:
- GOOD:
x <- 5
- BAD:
x = 5
Commenting guidelines:
# Create histogram of frequency of campaigns by pct budget spent.
hist(df$pct.spent,
breaks = "scott", # method for choosing number of buckets
main = "Histogram: fraction budget spent by campaignid",
xlab = "Fraction of budget spent",
ylab = "Frequency (count of campaignids)")
General Layout and Ordering (Google style):
Most R resources are extremely well documented, so the first place to look for help when writing R code is the R documentation itself. All packages are documented in the same standard way. Getting familiar with the format can simplify things a lot.
Package documentation
Reference manuals
Reference manuals are collections of the documentation for all functions in a package (only 1 per package):
Function documentation
All functions (default or from loaded packages) must have documentation that follows a standard format:
?mean
help("mean")
This documentation can also be shown in RStudio by pressing F1 when the cursor is on the function name.
If you don’t recall the function name try apropos():
apropos("mean")
## [1] ".colMeans" ".rowMeans" "colMeans" "kmeans"
## [5] "mean" "mean.Date" "mean.default" "mean.difftime"
## [9] "mean.POSIXct" "mean.POSIXlt" "rowMeans" "weighted.mean"
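apropos() accepts a regular expression, which helps when the plain search returns too many names. A hedged sketch (the object names are illustrative, and the exact results depend on which packages are loaded):

```r
# Only names that start with "mean"
starts_with_mean <- apropos("^mean")
starts_with_mean

# Restrict the matches to functions
mean_functions <- apropos("mean", mode = "function")

# Search the help pages by topic when no part of the name is known
# (opens a results viewer, so it is only run in interactive sessions)
if (interactive()) help.search("arithmetic mean")
```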
Vignettes
Vignettes are illustrative documents or case studies detailing the use of a package (optional, there can be several per package).
Vignettes can be called directly from R:
vgn <- browseVignettes()
vignette()
They should also be listed on the package’s CRAN page.
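To look at the vignettes of a single package, vignette() takes a package argument. The sketch below uses grid only because it is distributed with R; the vignette name "grid" is assumed to be available in the local installation:

```r
# List the vignettes of one package
v <- vignette(package = "grid")
v$results[, c("Item", "Title")]

# Open a specific vignette by name (interactive sessions only)
if (interactive()) vignette("grid", package = "grid")
```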
Demonstrations
Packages may also include extended code demonstrations (‘demos’). To list the demos in a package run demo(package = "package name"):
demo(package="stats")
# call demo directly
demo("nlm")
Exercise
- What does the function cut() do?
- What is the breaks argument in cut() used for?
- Run the first 4 lines of code in the examples supplied in the cut() documentation
- How many vignettes does the package warbleR have?
Session information
## R version 4.1.1 (2021-08-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.2 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
##
## locale:
## [1] LC_CTYPE=es_ES.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=es_CR.UTF-8 LC_COLLATE=es_ES.UTF-8
## [5] LC_MONETARY=es_CR.UTF-8 LC_MESSAGES=es_ES.UTF-8
## [7] LC_PAPER=es_CR.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=es_CR.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] kableExtra_1.3.4 knitr_1.37
##
## loaded via a namespace (and not attached):
## [1] rstudioapi_0.13 xml2_1.3.2 magrittr_2.0.2 rvest_1.0.1
## [5] munsell_0.5.0 viridisLite_0.4.0 colorspace_2.0-3 R6_2.5.1
## [9] rlang_1.0.2 fastmap_1.1.0 stringr_1.4.0 httr_1.4.2
## [13] tools_4.1.1 webshot_0.5.2 xfun_0.30 cli_3.2.0
## [17] jquerylib_0.1.4 systemfonts_1.0.2 htmltools_0.5.2 yaml_2.3.5
## [21] digest_0.6.29 lifecycle_1.0.1 sass_0.4.0 glue_1.6.2
## [25] evaluate_0.15 rmarkdown_2.10 stringi_1.7.6 compiler_4.1.1
## [29] bslib_0.2.5.1 scales_1.1.1 svglite_2.0.0 jsonlite_1.7.2