Creating Docker images from scratch can be time- and labor-consuming. Fortunately, many pre-built and regularly updated Docker images for the R community are ready to use, especially for creating your own containerized R Markdown documents with liftr.
Sources of such pre-built images include the rocker project and the Bioconductor Docker containers. In this article, we will use the tidyverse image provided by rocker. This image includes the essential tidyverse packages and the devtools environment loved by many data scientists (Wickham 2014). We will demonstrate how to containerize and render your tidyverse-heavy R Markdown document using Docker in only a few minutes.
If Docker has not been installed on your system, please use install_docker() and follow the guidelines to install it. After that, check_docker_install() and check_docker_running() will help you make sure that Docker has been installed and is running properly.
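If you prefer to script this check, for example at the top of a build script, a minimal base-R sketch follows. The helpers docker_on_path() and docker_daemon_up() are hypothetical names introduced here and are not part of liftr; the sketch assumes only that the docker command-line client behaves as usual, where `docker info` exits with a non-zero status when it cannot reach the Docker daemon.

```r
# Hypothetical helpers (not part of liftr), using only base R.
# docker_on_path(): TRUE if a `docker` executable is found on the PATH.
docker_on_path = function() {
  nzchar(unname(Sys.which("docker")))
}

# docker_daemon_up(): TRUE if `docker info` exits with status 0, i.e. the
# client could reach a running Docker daemon. Output is discarded.
docker_daemon_up = function() {
  if (!docker_on_path()) return(FALSE)
  status = suppressWarnings(
    system2("docker", "info", stdout = FALSE, stderr = FALSE)
  )
  identical(as.integer(status), 0L)
}

if (!docker_on_path()) {
  message("docker not found on PATH; try liftr::install_docker()")
} else if (!docker_daemon_up()) {
  message("docker is installed but cannot reach the Docker daemon")
} else {
  message("Docker looks ready")
}
```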
Let’s first create a new folder and copy the example R Markdown document into it:
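library(liftr)
# Create a working folder (no warning if it already exists)
path = "~/liftr-tidyverse/"
dir.create(path, showWarnings = FALSE)
# Copy the example document shipped with liftr into the folder
file.copy(system.file("examples/liftr-tidyverse.Rmd", package = "liftr"), path)
input = paste0(path, "liftr-tidyverse.Rmd")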
If we open the R Markdown file, we will see that its header includes a liftr section, which defines the Docker system environment required to render this document. In our case, it is straightforward:
---
title: "Explore tidyverse with liftr"
author: "Nan Xiao <<me@nanx.me>>"
date: "2019-06-18"
output:
  rmarkdown::pdf_document:
    toc: true
    number_sections: true
liftr:
  from: "rocker/tidyverse:latest"
  maintainer: "Nan Xiao"
  email: "me@nanx.me"
  pandoc: false
  texlive: true
  cran:
    - nycflights13
---
Most of the fields are self-explanatory:
- We use the rocker/tidyverse image as our base image, which saves us a lot of time compared with creating a custom base image with all the tidyverse dependencies.
- pandoc installation is not included because the tidyverse image already includes pandoc.
- The CRAN package nycflights13 will be installed.
Let’s containerize this document by generating a Dockerfile for it, using liftr::lift():
lift(input)
A file named Dockerfile will be generated in the same directory as the input Rmd file. It contains the commands needed to build the Docker image used for rendering the document.
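To confirm that the generated Dockerfile picked up the base image from the from field, you can read it back in R. The sketch below uses a hypothetical helper, dockerfile_base_image(), which is not part of liftr; it is a deliberately simple parser that looks only at the first FROM line and does not handle flags such as --platform.

```r
# Hypothetical helper (not part of liftr): return the image reference given
# in the first FROM instruction of a Dockerfile, or NA if there is none.
# Simplistic on purpose: flags such as --platform are not handled.
dockerfile_base_image = function(dockerfile) {
  lines = readLines(dockerfile, warn = FALSE)
  from = grep("^\\s*FROM\\s+", lines, value = TRUE,
              ignore.case = TRUE, perl = TRUE)
  if (length(from) == 0L) return(NA_character_)
  # Keep only the image reference, dropping e.g. a trailing "AS <name>"
  sub("^\\s*FROM\\s+(\\S+).*$", "\\1", from[1],
      ignore.case = TRUE, perl = TRUE)
}

# Usage after lift(input) has been run:
# dockerfile_base_image(paste0(path, "Dockerfile"))
```

If lift() copied the from field verbatim, this should report the rocker/tidyverse:latest image declared in the header.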
We can use render_docker() to start the Docker container and render the document inside it:
render_docker(input)
Let’s view the rendered document:
browseURL(paste0(path, "liftr-tidyverse.pdf"))
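If rendering fails, no PDF is produced and browseURL() has nothing to open. A small guard like the following makes that case explicit. open_if_rendered() is a hypothetical helper, not part of liftr, and its open argument exists only so the function can be exercised without launching a viewer.

```r
# Hypothetical helper (not part of liftr): open a rendered file only if it
# exists; otherwise warn and return FALSE invisibly.
# `open` defaults to utils::browseURL but can be replaced, e.g. in tests.
open_if_rendered = function(file, open = utils::browseURL) {
  if (!file.exists(file)) {
    warning("Rendered file not found: ", file)
    return(invisible(FALSE))
  }
  open(file)
  invisible(TRUE)
}

# Usage with the document rendered above:
# open_if_rendered(paste0(path, "liftr-tidyverse.pdf"))
```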
In the last section of the rendered PDF, we will see that the session information is probably different from that of your current system. This is because the document was generated entirely inside a newly built, isolated Linux system environment, using Docker.
In this way, the R Markdown document gains a higher, system-level reproducibility, making it easy to replicate for other users whose system and R package environment may not be identical to yours. This is a good thing for team collaboration and large-scale document orchestration. The best part is that all you need to share is still the document itself, with only a few extra metadata fields.
The Docker images stored on your system can take up several gigabytes, and the total grows as you build more images. Let’s remove the generated Docker image to save some disk space:
prune_image(paste0(path, "liftr-tidyverse.docker.yml"))
If we do this, the Docker image will be rebuilt the next time you use render_docker(). If not, the image stays cached on the system and is reused when you compile the document later, which saves time.
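In a cleanup script that may run before the document has ever been rendered, the image information file may not exist yet. Below is a minimal sketch of a guard around liftr::prune_image(), assuming render_docker() has written the liftr-tidyverse.docker.yml file used above; prune_if_present() is a hypothetical name and is not part of liftr.

```r
# Hypothetical wrapper (not part of liftr): call liftr::prune_image() only
# when the image information file written by render_docker() is present.
prune_if_present = function(info_yml) {
  if (!file.exists(info_yml)) {
    message("No image information file at ", info_yml, "; nothing to prune")
    return(invisible(FALSE))
  }
  liftr::prune_image(info_yml)
  invisible(TRUE)
}

# prune_if_present(paste0(path, "liftr-tidyverse.docker.yml"))
```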
Wickham, Hadley. 2014. “Tidy Data.” Journal of Statistical Software 59 (10): 1–23.