Historically:
Integrated development environment (IDE) for R. Includes:
Data structure
The basic data structure in R is the vector. There are two basic types of vectors: atomic vectors and lists.
They have three common properties:
- typeof() (~ class/mode)
- length() (number of elements)
- attributes() (metadata)
They differ in the types of their elements: all elements of an atomic vector must be the same type, whereas the elements of a list can have different types.
Dimensions | Homogeneous | Heterogeneous |
---|---|---|
1d | Atomic vector | List |
2d | Matrix | Data frame |
nd | Array | |
R has no 0-dimensional elements (scalars). Individual numbers or strings are actually vectors of length one.
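The point about scalars can be checked directly. A minimal sketch in base R, where the object name `s` is made up for illustration: a single number behaves like any other vector and answers to the same three properties.

```r
# A "scalar" is just a vector of length one
s <- 42

is.vector(s)    # TRUE
length(s)       # 1
typeof(s)       # "double"
attributes(s)   # NULL (no metadata attached)
```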
Atomic vectors
Types of atomic vectors:
Vectors are built using c():
x <- 1
x1 <- c(1)
all.equal(x, x1)
## [1] TRUE
class(x)
## [1] "numeric"
y <- "something"
class(y)
## [1] "character"
w <- 1L
class(w)
## [1] "integer"
z <- TRUE
class(z)
## [1] "logical"
q <- factor(1)
class(q)
## [1] "factor"
Vectors can only contain entries of the same type. Different types will be coerced to the most flexible type:
v <- c(10, 11, 12, 13)
class(v)
## [1] "numeric"
typeof(v)
## [1] "double"
is.integer(v)
## [1] FALSE
y <- c("Amazona", "Ara", "Eupsittula", "Myiopsitta")
class(y)
## [1] "character"
is.integer(y)
## [1] FALSE
x <- c(1, 2, 3, "Myiopsitta")
x
## [1] "1" "2" "3" "Myiopsitta"
class(x)
## [1] "character"
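"Most flexible type" follows a fixed order in base R: logical, then integer, then double, then character. The sketch below illustrates that order; the object names are made up for this example.

```r
# Coercion order: logical < integer < double < character
lgl_int <- c(TRUE, 1L)
int_dbl <- c(1L, 2.5)
dbl_chr <- c(2.5, "a")

typeof(lgl_int)   # "integer"
typeof(int_dbl)   # "double"
typeof(dbl_chr)   # "character"

# Logical values become 1 (TRUE) and 0 (FALSE) when coerced to numbers,
# which is why sum() can be used to count TRUE values
observed <- c(FALSE, TRUE, FALSE, TRUE)
sum(observed)     # 2
```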
Missing values are specified with NA, which is a logical vector of length 1. NA will always be coerced to the correct type if used inside c():
v <- c(10, 11, 12, 13, NA)
class(v)
## [1] "numeric"
v <- c(letters[1:3], NA)
class(v)
## [1] "character"
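The coercion of NA can be inspected element by element. A small sketch, with illustrative object names; the typed constants NA_integer_ and NA_character_ are part of base R.

```r
typeof(NA)              # "logical"

num_na <- c(10, 11, NA)
chr_na <- c("a", "b", NA)

# The NA inside each vector takes the type of that vector
typeof(num_na[3])       # "double"
typeof(chr_na[3])       # "character"

# Typed missing values also exist as constants
typeof(NA_integer_)     # "integer"
typeof(NA_character_)   # "character"

# is.na() finds missing values regardless of type
is.na(num_na)           # FALSE FALSE TRUE
```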
Lists
Lists can contain objects of different classes and sizes. Lists are built using list():
l <- list(ID = letters[1:5], size = rnorm(6), observed = c(FALSE, TRUE, FALSE, FALSE, FALSE))
l
## $ID
## [1] "a" "b" "c" "d" "e"
##
## $size
## [1] -1.91076 1.48046 2.38150 1.56967 0.60655 -0.33370
##
## $observed
## [1] FALSE TRUE FALSE FALSE FALSE
class(l)
## [1] "list"
str(l)
## List of 3
## $ ID : chr [1:5] "a" "b" "c" "d" ...
## $ size : num [1:6] -1.911 1.48 2.382 1.57 0.607 ...
## $ observed: logi [1:5] FALSE TRUE FALSE FALSE FALSE
… and of different dimensions, since a list can contain other lists:
l <- list(ID = letters[1:5], size = rnorm(6), observed = c(FALSE, TRUE, FALSE, FALSE, FALSE), l)
str(l)
## List of 4
## $ ID : chr [1:5] "a" "b" "c" "d" ...
## $ size : num [1:6] 1.28 -0.437 1.273 -1.447 -1.306 ...
## $ observed: logi [1:5] FALSE TRUE FALSE FALSE FALSE
## $ :List of 3
## ..$ ID : chr [1:5] "a" "b" "c" "d" ...
## ..$ size : num [1:6] -1.911 1.48 2.382 1.57 0.607 ...
## ..$ observed: logi [1:5] FALSE TRUE FALSE FALSE FALSE
l2 <- list(l, l)
str(l2)
## List of 2
## $ :List of 4
## ..$ ID : chr [1:5] "a" "b" "c" "d" ...
## ..$ size : num [1:6] 1.28 -0.437 1.273 -1.447 -1.306 ...
## ..$ observed: logi [1:5] FALSE TRUE FALSE FALSE FALSE
## ..$ :List of 3
## .. ..$ ID : chr [1:5] "a" "b" "c" "d" ...
## .. ..$ size : num [1:6] -1.911 1.48 2.382 1.57 0.607 ...
## .. ..$ observed: logi [1:5] FALSE TRUE FALSE FALSE FALSE
## $ :List of 4
## ..$ ID : chr [1:5] "a" "b" "c" "d" ...
## ..$ size : num [1:6] 1.28 -0.437 1.273 -1.447 -1.306 ...
## ..$ observed: logi [1:5] FALSE TRUE FALSE FALSE FALSE
## ..$ :List of 3
## .. ..$ ID : chr [1:5] "a" "b" "c" "d" ...
## .. ..$ size : num [1:6] -1.911 1.48 2.382 1.57 0.607 ...
## .. ..$ observed: logi [1:5] FALSE TRUE FALSE FALSE FALSE
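A smaller version of the nesting shown above may be easier to inspect. In this sketch (the names inner_list and outer_list are made up), the nested list counts as a single element of the outer list:

```r
inner_list <- list(ID = letters[1:3], observed = c(TRUE, FALSE, TRUE))
outer_list <- list(size = c(1.2, 3.4), nested = inner_list)

# The inner list is one element of the outer list
length(outer_list)                      # 2

# Class of each element of the outer list
element_classes <- sapply(outer_list, class)
element_classes                         # size: "numeric", nested: "list"

is.list(outer_list$nested)              # TRUE
```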
Attributes
Objects can have attributes, which store metadata about the object. Attributes can be thought of as a named list. They can be accessed individually with attr() or all at once (as a list) with attributes():
y <- 1:10
mean(y)
## [1] 5.5
attr(y, "my_attribute") <- "This is an attribute"
attr(y, "my_attribute")
## [1] "This is an attribute"
str(y)
## int [1:10] 1 2 3 4 5 6 7 8 9 10
## - attr(*, "my_attribute")= chr "This is an attribute"
structure() returns a new object with modified attributes:
y <- structure(1:10, my_attribute = "This is an attribute")
attributes(y)
## $my_attribute
## [1] "This is an attribute"
Most attributes are lost when modifying a vector:
attributes(y[1])
## NULL
The only attributes not lost are the three most important:
w <- structure(c(a = 1, b = 2), my_attribute = "This is not an apple")
attributes(w)
## $names
## [1] "a" "b"
##
## $my_attribute
## [1] "This is not an apple"
attributes(w[1])
## $names
## [1] "a"
class(w[1])
## [1] "numeric"
Names
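Vectors can be named in three ways:
- When creating the vector: x <- c(a = 1, b = 2, c = 3)
- By modifying an existing vector in place: x <- 1:3; names(x) <- c("a", "b", "c"). Or: x <- 1:3; names(x)[[1]] <- c("a")
- By creating a modified copy of a vector: x <- setNames(1:3, c("a", "b", "c"))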
y <- c(a = 1, 2, 3)
names(y)
## [1] "a" "" ""
v <- c(1, 2, 3)
names(v) <- c('a')
names(v)
## [1] "a" NA NA
z <- setNames(1:3, c("a", "b", "c"))
names(z)
## [1] "a" "b" "c"
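Names are mostly useful for pulling elements out by label rather than by position. A minimal sketch with made-up object names (b_value, plain):

```r
x <- c(a = 1, b = 2, c = 3)

# Select elements by name; the names are kept in the result
b_value <- x["b"]
b_value
x[c("a", "c")]

# unname() returns a copy without names
plain <- unname(x)
names(plain)      # NULL
```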
Factors
Attributes are used to define factors. A factor is a vector that can contain only predefined values, and is used to store categorical data.
Factors are built on top of integer vectors using two attributes:
x <- factor(c("a", "b", "b", "a"))
x
## [1] a b b a
## Levels: a b
levels(x)
## [1] "a" "b"
str(x)
## Factor w/ 2 levels "a","b": 1 2 2 1
Factors look like character vectors but they are actually integers:
x <- factor(c("a", "b", "b", "a"))
c(x)
## [1] a b b a
## Levels: a b
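In recent R versions c() applied to a factor returns a factor again, so the integer storage is easier to see with typeof() and as.integer(). The sketch below also shows what "only predefined values" means in practice; the object name codes is made up.

```r
x <- factor(c("a", "b", "b", "a"))

typeof(x)                  # "integer"
codes <- as.integer(x)     # the underlying integer codes
codes                      # 1 2 2 1
names(attributes(x))       # "levels" "class"
as.character(x)            # "a" "b" "b" "a"

# A value that is not among the levels cannot be stored:
# R issues a warning and inserts NA instead
suppressWarnings(x[2] <- "c")
is.na(x[2])                # TRUE
```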
Matrices
All entries are of the same type:
m <- matrix(c(1, 2, 3, 11, 12, 13), nrow = 2)
dim(m)
## [1] 2 3
m
## [,1] [,2] [,3]
## [1,] 1 3 12
## [2,] 2 11 13
class(m)
## [1] "matrix" "array"
m <- matrix(c(1, 2, 3, 11, 12, "13"), nrow = 2)
m
## [,1] [,2] [,3]
## [1,] "1" "3" "12"
## [2,] "2" "11" "13"
Matrices can also be created by modifying the dimension attribute of a vector:
v <- 1:6
is.matrix(v)
## [1] FALSE
attributes(v)
## NULL
dim(v) <- c(3, 2)
v
## [,1] [,2]
## [1,] 1 4
## [2,] 2 5
## [3,] 3 6
is.matrix(v)
## [1] TRUE
attributes(v)
## $dim
## [1] 3 2
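The same idea extends to more than two dimensions, which gives the arrays listed in the table at the top of this section. A minimal sketch with illustrative object names (a, b):

```r
# Setting three dimensions turns a vector into an array
a <- 1:24
dim(a) <- c(2, 3, 4)    # 2 rows, 3 columns, 4 layers

class(a)                # "array"
is.matrix(a)            # FALSE: matrices have exactly 2 dimensions
a[2, 3, 4]              # 24, the last element

# array() does the same in a single step
b <- array(1:24, dim = c(2, 3, 4))
dim(b)                  # 2 3 4
```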
Data frames
Data frames are a special case of lists. They can contain entries of different types:
m <- data.frame(ID = letters[1:5], size = rnorm(5), observed = c(FALSE, TRUE, FALSE, FALSE, FALSE))
dim(m)
## [1] 5 3
m
## ID size observed
## 1 a 0.69341 FALSE
## 2 b 1.41333 TRUE
## 3 c 0.76004 FALSE
## 4 d 0.75104 FALSE
## 5 e 1.92036 FALSE
class(m)
## [1] "data.frame"
is.data.frame(m)
## [1] TRUE
is.list(m)
## [1] TRUE
str(m)
## 'data.frame': 5 obs. of 3 variables:
## $ ID : chr "a" "b" "c" "d" ...
## $ size : num 0.693 1.413 0.76 0.751 1.92
## $ observed: logi FALSE TRUE FALSE FALSE FALSE
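Because a data frame is a list, each column is one list element and list tools work on it. A small sketch with fixed values instead of rnorm() so the results are reproducible; stringsAsFactors = FALSE is set explicitly so the block behaves the same across R versions.

```r
m <- data.frame(ID = letters[1:3],
                size = c(0.7, 1.4, 0.8),
                observed = c(FALSE, TRUE, FALSE),
                stringsAsFactors = FALSE)

# One list element per column
length(m)                          # 3

# Each column is a vector with its own type
col_classes <- sapply(m, class)
col_classes                        # "character" "numeric" "logical"

# Columns can be extracted like list elements
m[["size"]]                        # 0.7 1.4 0.8
m$observed                         # FALSE TRUE FALSE
```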
But all columns must have the same length:
m <- data.frame(ID = letters[1:5], size = rnorm(6), observed = c(FALSE, TRUE, FALSE, FALSE, FALSE))
## Error in data.frame(ID = letters[1:5], size = rnorm(6), observed = c(FALSE, : arguments imply differing number of rows: 5, 6
Note: before R 4.0.0, data.frame() turned strings into factors by default. Use stringsAsFactors = FALSE to suppress this behavior (it is the default since R 4.0.0):
m <- data.frame(ID = letters[1:5], size = rnorm(5), observed = c(FALSE, TRUE, FALSE, FALSE, FALSE), stringsAsFactors = FALSE)
str(m)
## 'data.frame': 5 obs. of 3 variables:
## $ ID : chr "a" "b" "c" "d" ...
## $ size : num 0.693 1.413 0.76 0.751 1.92
## $ observed: logi FALSE TRUE FALSE FALSE FALSE
Complex elements can be added to a data frame using I() to treat the object as one unit:
m <- data.frame(ID = letters[1:5], size = I(matrix(1:10, nrow = 5)), observed = c(FALSE, TRUE, FALSE, FALSE, FALSE))
m
## ID size.1 size.2 observed
## 1 a 1 6 FALSE
## 2 b 2 7 TRUE
## 3 c 3 8 FALSE
## 4 d 4 9 FALSE
## 5 e 5 10 FALSE
str(m)
## 'data.frame': 5 obs. of 3 variables:
## $ ID : chr "a" "b" "c" "d" ...
## $ size : 'AsIs' int [1:5, 1:2] 1 2 3 4 5 6 7 8 9 10
## $ observed: logi FALSE TRUE FALSE FALSE FALSE
Indexing
Elements within objects can be called by indexing. To subset a vector simply call the object position using square brackets:
x <- c(1, 3, 4, 10, 15, 20, 50, 1, 6)
x[1]
## [1] 1
x[2]
## [1] 3
x[2:3]
## [1] 3 4
x[c(1, 3)]
## [1] 1 4
Elements can be removed in the same way:
x[-1]
## [1] 3 4 10 15 20 50 1 6
x[-c(1, 3)]
## [1] 3 10 15 20 50 1 6
Matrices and data frames require 2 indices [row, column]:
m <- matrix(c(1, 2, 3, 11, 12, 13), nrow = 2)
m[1, ]
## [1] 1 3 12
m[, 1]
## [1] 1 2
m[1, 1]
## [1] 1
m[-1, ]
## [1] 2 11 13
m[, -1]
## [,1] [,2]
## [1,] 3 12
## [2,] 11 13
m[-1, -1]
## [1] 11 13
df <- data.frame(family = c("Psittacidae", "Trochilidae", "Psittacidae"),
  genus = c("Amazona", "Phaethornis", "Ara"),
  species = c("aestiva", "philippii", "ararauna"))
df
## family genus species
## 1 Psittacidae Amazona aestiva
## 2 Trochilidae Phaethornis philippii
## 3 Psittacidae Ara ararauna
df[1, ]
## family genus species
## 1 Psittacidae Amazona aestiva
df[, 1]
## [1] "Psittacidae" "Trochilidae" "Psittacidae"
df[1, 1]
## [1] "Psittacidae"
df[-1, ]
## family genus species
## 2 Trochilidae Phaethornis philippii
## 3 Psittacidae Ara ararauna
df[, -1]
## genus species
## 1 Amazona aestiva
## 2 Phaethornis philippii
## 3 Ara ararauna
df[-1, -1]
## genus species
## 2 Phaethornis philippii
## 3 Ara ararauna
df[, "family"]
## [1] "Psittacidae" "Trochilidae" "Psittacidae"
df[, c("family", "genus")]
## family genus
## 1 Psittacidae Amazona
## 2 Trochilidae Phaethornis
## 3 Psittacidae Ara
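Rows can also be selected with a logical condition instead of a position. A minimal sketch that rebuilds the df example so the block runs on its own; the object names is_parrot, parrots and some_species are made up for illustration.

```r
df <- data.frame(family = c("Psittacidae", "Trochilidae", "Psittacidae"),
                 genus = c("Amazona", "Phaethornis", "Ara"),
                 species = c("aestiva", "philippii", "ararauna"),
                 stringsAsFactors = FALSE)

# A comparison returns one TRUE/FALSE per row
is_parrot <- df$family == "Psittacidae"
is_parrot                      # TRUE FALSE TRUE

# Used in the row position, it keeps only the TRUE rows
parrots <- df[is_parrot, ]
nrow(parrots)                  # 2

# %in% tests membership in a set of values
some_species <- df[df$genus %in% c("Ara", "Phaethornis"), "species"]
some_species                   # "philippii" "ararauna"
```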
Lists require 1 index within double square brackets [[index]]:
l <- list(ID = letters[1:5], size = rnorm(6), observed = c(FALSE, TRUE, FALSE, FALSE, FALSE))
l[[1]]
## [1] "a" "b" "c" "d" "e"
l[[3]]
## [1] FALSE TRUE FALSE FALSE FALSE
Elements within lists can also be subset in the same line of code:
l[[1]][1:2]
## [1] "a" "b"
l[[3]][2]
## [1] TRUE
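The difference between single and double brackets on a list is worth a quick check. In this sketch (a shortened copy of the list above), single brackets return a smaller list, while double brackets and $ return the element itself:

```r
l <- list(ID = letters[1:5], observed = c(FALSE, TRUE, FALSE, FALSE, FALSE))

class(l[1])                   # "list": still a list, with one element
class(l[[1]])                 # "character": the element itself

# Named elements can also be reached with $ or with a name inside [[ ]]
l$ID
identical(l$ID, l[["ID"]])    # TRUE
```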
Exploring objects
str(df)
## 'data.frame': 3 obs. of 3 variables:
## $ family : chr "Psittacidae" "Trochilidae" "Psittacidae"
## $ genus : chr "Amazona" "Phaethornis" "Ara"
## $ species: chr "aestiva" "philippii" "ararauna"
names(df)
## [1] "family" "genus" "species"
dim(df)
## [1] 3 3
nrow(df)
## [1] 3
ncol(df)
## [1] 3
head(df)
## family genus species
## 1 Psittacidae Amazona aestiva
## 2 Trochilidae Phaethornis philippii
## 3 Psittacidae Ara ararauna
tail(df)
## family genus species
## 1 Psittacidae Amazona aestiva
## 2 Trochilidae Phaethornis philippii
## 3 Psittacidae Ara ararauna
table(df$genus)
##
## Amazona Ara Phaethornis
## 1 1 1
typeof(df)
## [1] "list"
View(df)
Exercise
- Use the example data iris to create a data subset with only the observations of the species ‘setosa’
- Now create a data subset containing the observations of both ‘setosa’ and ‘versicolor’
- Also with iris, create a data subset with the observations for which iris$Sepal.Length is higher than 6
- How many observations have a sepal length higher than 6?
All functions are created by the function function() and follow the same structure:
* Modified from Grolemund 2014
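As a sketch of that structure (name, arguments, body, and the value of the last expression as the return value), here is a hypothetical function std_error(); it is made up for illustration and is not part of base R.

```r
# name <- function(arguments) { body }
std_error <- function(x, na.rm = FALSE) {
  # Optionally drop missing values before computing
  if (na.rm) {
    x <- x[!is.na(x)]
  }
  # The value of the last expression is returned
  sd(x) / sqrt(length(x))
}

se <- std_error(c(2, 4, 4, 6))
se_no_na <- std_error(c(2, 4, 4, 6, NA), na.rm = TRUE)

# The parts of a function can be inspected
names(formals(std_error))   # "x" "na.rm"
class(body(std_error))      # "{"
```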
R comes with many functions that you can use to do sophisticated tasks:
# built in functions
bi <- builtins()
length(bi)
## [1] 1370
sample(bi, 10)
## [1] "print.NativeRoutineList" "restartDescription"
## [3] ".F_dqrcf" "substring<-"
## [5] "mat.or.vec" "sys.on.exit"
## [7] "as.function" "backsolve"
## [9] "=" "close"
Operators are functions:
1 + 1
## [1] 2
'+'(1, 1)
## [1] 2
2 * 3
## [1] 6
'*'(2, 3)
## [1] 6
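Since operators are ordinary functions, they can be passed to other functions, and new infix operators can be defined. A minimal sketch; the operator name %+% and the object names are made up for this example.

```r
# Pass an operator to another function (backticks refer to it by name)
doubled <- sapply(1:3, `*`, 2)
doubled                         # 2 4 6

total <- Reduce(`+`, 1:4)
total                           # 10

# Define a new infix operator: any function named %something%
`%+%` <- function(a, b) paste(a, b)
binomial_name <- "Ara" %+% "ararauna"
binomial_name                   # "Ara ararauna"
```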
Most commonly used R operators
Arithmetic operators:
Operator | Description |
---|---|
+ | addition |
- | subtraction |
* | multiplication |
/ | division |
^ or ** | exponent |
x %% y | modulus (x mod y) |
x %/% y | integer division |
1 - 2
## [1] -1
1 + 2
## [1] 3
2 ^ 2
## [1] 4
2 ** 2
## [1] 4
5 %% 2
## [1] 1
5 %/% 2
## [1] 2
Logical operators:
Operator | Description |
---|---|
< | less than |
<= | less than or equal to |
> | greater than |
>= | greater than or equal to |
== | exactly equal to |
!= | not equal to |
!x | Not x |
x \| y | x OR y |
x & y | x AND y |
x %in% y | match |
1 < 2
## [1] TRUE
1 > 2
## [1] FALSE
1 <= 2
## [1] TRUE
1 == 2
## [1] FALSE
1 != 2
## [1] TRUE
1 >= 2
## [1] FALSE
5 %in% 1:6
## [1] TRUE
5 %in% 1:4
## [1] FALSE
Most functions are vectorized:
1:6 * 1:6
## [1] 1 4 9 16 25 36
* Modified from Grolemund & Wickham 2017
1:6 - 1:6
## [1] 0 0 0 0 0 0
R recycles vectors of unequal length:
1:6 * 1:5
## Warning in 1:6 * 1:5: longer object length is not a multiple of shorter
## object length
## [1] 1 4 9 16 25 6
* Modified from Grolemund & Wickham 2017
1:6 + 1:5
## Warning in 1:6 + 1:5: longer object length is not a multiple of shorter
## object length
## [1] 2 4 6 8 10 7
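The warning only appears when the longer length is not a multiple of the shorter one. A short sketch of recycling without a warning, using made-up object names:

```r
# 6 is a multiple of 2, so 1:2 is silently repeated three times
even_len <- 1:6 * 1:2
even_len          # 1 4 3 8 5 12

# A single value is recycled to the full length
scaled <- 1:6 * 2
scaled            # 2 4 6 8 10 12
```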
Based on Google’s R Style Guide
File names
File names should end in .R and, of course, be meaningful:
Object names
Variables and functions:
- GOOD: day_one, day_1, mean.day()
- BAD: dayOne, day1, firstDay_of.month, mean <- function(x) sum(x), c <- 10
Syntax
Spacing:
- GOOD:
a <- rnorm(n = 10, sd = 10, mean = 1)
tab.prior <- table(df[df$days.from.opt < 0, "campaign.id"])
total <- sum(x[, 1])
total <- sum(x[1, ])
if (debug)
mean(1:10)
- BAD:
a<-rnorm(n=10,sd=10,mean=1)
tab.prior <- table(df[df$days.from.opt<0, "campaign.id"]) # Needs spaces around '<'
tab.prior <- table(df[df$days.from.opt < 0,"campaign.id"]) # Needs a space after the comma
tab.prior<- table(df[df$days.from.opt < 0, "campaign.id"]) # Needs a space before <-
tab.prior<-table(df[df$days.from.opt < 0, "campaign.id"]) # Needs spaces around <-
total <- sum(x[,1]) # Needs a space after the comma
total <- sum(x[ ,1]) # Needs a space after the comma, not before
if(debug) # Needs a space before parenthesis
mean (1:10) # Extra space before parenthesis
Curly braces:
- GOOD:
if (is.null(ylim)) {
  ylim <- c(0, 0.06)
}
if (is.null(ylim))
  ylim <- c(0, 0.06)
- BAD:
if (is.null(ylim)) ylim <- c(0, 0.06)
if (is.null(ylim)) {ylim <- c(0, 0.06)}
if (is.null(ylim)) {
  ylim <- c(0, 0.06) }
Assignments:
- GOOD:
x <- 5
- BAD:
x = 5
Commenting guidelines:
# Create histogram of frequency of campaigns by pct budget spent.
hist(df$pct.spent,
breaks = "scott", # method for choosing number of buckets
main = "Histogram: fraction budget spent by campaignid",
xlab = "Fraction of budget spent",
ylab = "Frequency (count of campaignids)")
General Layout and Ordering (google style):
Package documentation
Reference manuals
Reference manuals are collections of the documentation for all functions in a package (only 1 per package):
Function documentation
All functions (default or from loaded packages) must have documentation that follows a standard format:
?mean
help("mean")
This documentation can also be shown in RStudio by pressing F1 when the cursor is on the function name.
If you don’t recall the function name try apropos():
apropos("mean")
## [1] ".colMeans" ".rowMeans" "colMeans" "kmeans"
## [5] "mean" "mean.Date" "mean.default" "mean.difftime"
## [9] "mean.POSIXct" "mean.POSIXlt" "rowMeans" "weighted.mean"
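Two related base R helpers may be handy here. apropos() accepts a regular expression, and args() or formals() show the arguments of a function without opening its help page. A small sketch, with made-up object names:

```r
# Only names that start with "mean"
found <- apropos("^mean")
found

# Quick look at the arguments of a function
args(sd)
arg_names <- names(formals(sd))
arg_names         # "x" "na.rm"
```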
Vignettes
Vignettes are illustrative documents or study cases detailing the use of a package (optional, can be several per package).
Vignettes can be called directly from R:
vgn <- browseVignettes()
vignette()
They should also be listed on the package's CRAN page.
Demonstrations
Packages may also include extended code demonstrations (‘demos’). To list the demos in a package run demo(package = "package name"):
demo(package="stats")
# call demo directly
demo("nlm")
Exercise
- What does the function cut() do?
- What is the breaks argument in cut() used for?
- Run the first 4 lines of code in the examples supplied in the cut() documentation
- How many vignettes does the package warbleR have?