Understand how coding can help make research reproducible
Learn programming practices that can improve reproducibliity
0.1 Free Software and Code
Free and open-source programs allow users to inspect, modify, and enhance their design by providing access to their source code.
Open-source code is ideal for reproducible research because scripts can contain all the steps of the analysis (self-documentation).
Code, in general, allows colleagues to see what we have done and rerun or even modify our analyses.
Free tools can be used by anyone unlike commercial tools.
Open-source code enables a detailed understanding of the analyses
0.1.1 Why R?
www.traininginbangalore.com
0.2 Tools for Reproducible Programming
0.2.1 Literate Programming
Involves documenting in detail what the problem consists of, how it is solved, how and why a certain flow of analysis was adopted, how it was optimized (if it was optimized), and how it was implemented in the programming language.
Dynamic reports in R facilitate the use of literate programming to document data handling and statistical analysis (this file you are reading right now is a dynamic report created in R).
The main way R facilitates reproducible research is by allowing users to create a document that is a combination of content and data analysis code.
0.2.2 Reproducible Environments
Reproducibility is also about ensuring that someone else can reuse your code to get the same results.
For this, you need to provide more than just the code and the data.
Documenting and managing your project’s dependencies correctly can be complicated. However, even simple documentation that helps others understand the setup you used can have a significant impact.
Ideally, you should document the exact versions of all packages and software you used and the operating system.
0.2.3 Session Information
The simplest way to document the environment (R + packages and their versions) in which an analysis was done is by using the sessionInfo() function:
However, this documentation does not necessarily make the analyses replicable since package versions often get updated and even some packages may not be available after a while.
0.2.4 Packrat: Reproducible Package Management in R
R packages (and their specific versions) used in an analysis can be difficult to replicate:
Have you ever had to use trial and error to figure out which R packages you need to install to make someone else’s code work?
Have you ever updated a package to make your project’s code work, only to find out that the updated package causes another project’s code to stop working?
With the packrat package, projects have several useful features in terms of reproducibility:
Isolation: Installing a new or updated package for a project will not affect your other projects and vice versa. That’s because packrat gives each project its own private package library.
Portable: Easily move your projects from one computer to another, even on different platforms. packrat makes it easy to install the packages your project depends on.
Reproducible: packrat records the exact versions of the package it depends on and ensures that those exact versions are installed wherever you go.
Packrat is a package management system for R that helps you manage dependencies for your R projects. It ensures that your projects use the same package versions, making your code more reproducible.
0.2.4.1 Using Packrat
Of course, first, we need to install the packrat package in R:
Code
# install packageinstall.packages("packrat")
Now, let’s create a new R project (in a new directory).
After creating a project (or moving to an existing one) we can start monitoring and managing packages with packrat like this:
Code
# start packrat in projectpackrat::init(path ="/project/directory")
If the working directory is set as the project directory, it is not necessary to define the ‘path’:
Code
# start packrat in projectpackrat::init()
After this, the use of packages in this project will be managed by packrat (you will see some differences in what the R console prints when installing packages). So, we are already using packrat. A packrat project contains some additional files and directories. The init() function creates these files and directories if they do not already exist:
packrat/packrat.lock: lists the precise versions of the package that were used to satisfy the dependencies, including dependencies of dependencies (should never be edited manually!).
packrat/packrat.opts: Project-specific packrat options. These can be consulted and configured with get_opts and set_opts; see “packrat-options” for more information.
packrat/lib/: Private package library for this project.
packrat/src/: Source packages of all dependencies that have been reported to packrat.
.Rprofile: Tells R to use the private package library when started from the project directory.
The only difference with other projects is that projects using packrat have their own package library. This is located in /project/directory/packrat/lib. For example, let’s install a couple of new packages, they can be some you are familiar with or these ones we have here as an example:
Code
install.packages("fun")
Every time we install one or more packages, it is necessary to update the tracking status of packrat. We do this as follows:
Code
# check current statuspackrat::status()# update packrat in projectpackrat::snapshot()
With this package, we can play in R:
Code
# example of an irrelevant game Xlibrary(fun)if (.Platform$OS.type =="windows") x11() elsex11(type ="Xlib")mine_sweeper()
Or take an Alzheimer’s test:
Code
# another slightly less irrelevant gamex =alzheimer_test()
If we remove a package that we used in the project, we can reinstall it using restore():
Code
# removeremove.packages("fun")# check current statuspackrat::status()# restorepackrat::restore()
New packages can be installed:
Code
# installinstall.packages("cowsay")# loadlibrary(cowsay)# diagramsay("Hello world!")# random echosay("rms")
If you or someone else wants to reproduce your project, they can use the packrat::restore() function to install the exact versions of the packages listed in the packrat.lock file.
Code
packrat::restore()
This will ensure that the correct package versions are installed, maintaining consistency across different environments.
---title: Coding and Reproducibility---```{r, echo = FALSE}# devtools::install_github("hadley/emo")library("emo")library("knitr")# options to customize chunk outputsknitr::opts_chunk$set( tidy.opts = list(width.cutoff = 65), tidy = TRUE, message = FALSE )# this is a customized printing style data frames # screws up tibble functiontibble <- function(x, ...) { x <- kbl(x, digits=4, align= 'c', row.names = FALSE) x <- kable_styling(x, position ="center", full_width = FALSE, bootstrap_options = c("striped", "hover", "condensed", "responsive")) asis_output(x)}registerS3method("knit_print", "data.frame", tibble)``````{r setting functions and parameters, echo=FALSE, message=FALSE}# remove all objectsrm(list = ls())# unload all non-based packagesout <- sapply(paste('package:', names(sessionInfo()$otherPkgs), sep = ""), function(x) try(detach(x, unload = FALSE, character.only = TRUE), silent = T))options("digits"=5)options("digits.secs"=3)# library(knitr)# library(kableExtra)# # options(knitr.table.format = "html")# # x <- c("RColorBrewer", "ggplot2")# # aa <- lapply(x, function(y) {# if(!y %in% installed.packages()[,"Package"]) {if(y != "warbleR") install.packages(y) else devtools::install_github("maRce10/warbleR")# }# try(require(y, character.only = T), silent = T)# })# # # theme_set(theme_classic(base_size = 50))# # cols <- brewer.pal(10,"Spectral")```::: {.alert .alert-info}# Objectives {.unnumbered .unlisted}- Understand how coding can help make research reproducible- Learn programming practices that can improve reproducibliity:::## Free Software and Code- Free and open-source programs allow users to **inspect, modify, and enhance their design** by providing access to their source code.- Open-source code is ideal for reproducible research because **scripts can contain all the steps of the analysis** (self-documentation).- Code, in general, **allows colleagues to see what we have done** and rerun or even modify our analyses.- **Free tools can be used by anyone** unlike commercial tools.- Open-source code enables a **detailed understanding of the analyses**### Why R?<center><img src="./images/whylearnr.jpeg" alt="Why R" height="500" width="750"/></center>*www.traininginbangalore.com*## Tools for Reproducible Programming### Literate Programming- Involves **documenting in detail** what the problem consists of, how it is solved, how and why a certain flow of analysis was adopted, how it was optimized (if it was optimized), and how it was implemented in the programming language.- **Dynamic reports in R facilitate the use of literate programming** to document data handling and statistical analysis (this file you are reading right now is a dynamic report created in R).- The main way R facilitates reproducible research is by allowing users to **create a document that is a combination of content and data analysis code**.### Reproducible Environments- Reproducibility is also about ensuring that someone else can reuse your code to get the same results.- For this, you need to provide more than just the code and the data.- Documenting and managing your project's dependencies correctly can be complicated. However, even simple documentation that helps others understand the setup you used can have a significant impact.- Ideally, you should document the exact versions of all packages and software you used and the operating system.### Session InformationThe simplest way to document the environment (R + packages and their versions) in which an analysis was done is by using the `sessionInfo()` function:```{r session info example, echo=TRUE}sessionInfo()```However, this documentation does not necessarily make the analyses replicable since package versions often get updated and even some packages may not be available after a while.### Packrat: Reproducible Package Management in RR packages (and their specific versions) used in an analysis can be difficult to replicate:- Have you ever had to use trial and error to figure out which R packages you need to install to make someone else's code work?- Have you ever updated a package to make your project's code work, only to find out that the updated package causes another project's code to stop working?With the `packrat` package, projects have several useful features in terms of reproducibility:- Isolation: Installing a new or updated package for a project will not affect your other projects and vice versa. That's because `packrat` gives each project its own private package library.- Portable: Easily move your projects from one computer to another, even on different platforms. `packrat` makes it easy to install the packages your project depends on.- Reproducible: `packrat` records the exact versions of the package it depends on and ensures that those exact versions are installed wherever you go.Packrat is a package management system for R that helps you manage dependencies for your R projects. It ensures that your projects use the same package versions, making your code more reproducible.```{r, eval = F, echo = FALSE} However, defaults in some functions change and new functions are introduced regularly. If you wrote your code in a recent version of R and gave it to someone who hasn't updated recently, they may not be able to run your code. Code written for one version of a package may produce very different results with a newer version. <div class="alert alert-info">### Exercise 1- XXXXX</div> ```#### Using Packrat1. Of course, first, we need to install the `packrat` package in R:```{r, eval = FALSE}# install packageinstall.packages("packrat")```2. Now, let's create a new R project (in a new directory).3. After creating a project (or moving to an existing one) we can start monitoring and managing packages with `packrat` like this:```{r, eval = FALSE}# start packrat in projectpackrat::init(path = "/project/directory")```If the working directory is set as the project directory, it is not necessary to define the 'path':```{r, eval = FALSE}# start packrat in projectpackrat::init()```After this, the use of packages in this project will be managed by `packrat` (you will see some differences in what the R console prints when installing packages). So, we are already using `packrat`. A `packrat` project contains some additional files and directories. The `init()` function creates these files and directories if they do not already exist:- **packrat/packrat.lock**: lists the precise versions of the package that were used to satisfy the dependencies, including dependencies of dependencies (should never be edited manually!).- **packrat/packrat.opts**: Project-specific `packrat` options. These can be consulted and configured with `get_opts` and `set_opts`; see "packrat-options" for more information.- **packrat/lib/**: Private package library for this project.- **packrat/src/**: Source packages of all dependencies that have been reported to packrat.- **.Rprofile**: Tells R to use the private package library when started from the project directory.The only difference with other projects is that projects using `packrat` have their own package library. This is located in `/project/directory/packrat/lib`. For example, let's install a couple of new packages, they can be some you are familiar with or these ones we have here as an example:```{r, eval = FALSE}install.packages("fun")```Every time we install one or more packages, it is necessary to update the tracking status of `packrat`. We do this as follows:```{r, eval = FALSE}# check current statuspackrat::status()# update packrat in projectpackrat::snapshot()```With this package, we can play in R:```{r, eval = FALSE}# example of an irrelevant game Xlibrary(fun)if (.Platform$OS.type == "windows") x11() else x11(type = "Xlib")mine_sweeper()```Or take an Alzheimer's test:```{r, eval = FALSE}# another slightly less irrelevant gamex = alzheimer_test()```If we remove a package that we used in the project, we can reinstall it using `restore()`:```{r, eval = FALSE}# removeremove.packages("fun")# check current statuspackrat::status()# restorepackrat::restore()```New packages can be installed:```{r, eval = FALSE}# installinstall.packages("cowsay")# loadlibrary(cowsay)# diagramsay("Hello world!")# random echosay("rms")``````{r, echo = FALSE}# install# install.packages("cowsay")# loadlibrary(cowsay)``````{r, echo = FALSE, eval = TRUE}# random echosay("Hello world!")``````{r, echo = TRUE, eval = FALSE}# random echosay("rms")``````{r, echo = FALSE, eval = TRUE}# random echosay("rms")```...and they should be "referenced" in the same way:```{r, eval = FALSE}# check current statuspackrat::status()# update packrat in projectpackrat::snapshot()```In this [GitHub repository](https://github.com/maRce10/ejemplo_packrat_repo), there is an R project with `packrat`. We can clone it just to see how it works without needing to install the packages:```{r eval = FALSE}git clone https://github.com/maRce10/ejemplo_packrat_repo.git```If you or someone else wants to reproduce your project, they can use the `packrat::restore()` function to install the exact versions of the packages listed in the `packrat.lock` file.```{r, eval = FALSE} packrat::restore()```This will ensure that the correct package versions are installed, maintaining consistency across different environments.------------------------------------------------------------------------<font size="5">Session Information</font>```{r session info, echo=F}sessionInfo()```