Capture the spirit of your ggplot2 calls.
ggplot2::ggplot() stores the information needed to build the graph as a grob, but that’s what the computer needs to know about in order to build the graph. As humans, we’re more interested in what commands were issued in order to build the graph. For good reproducibility, the calls need to be applied to the relevant data. While this is somewhat available by deconstructing the grob, it’s not the simplest approach.
Here is one option that solves that problem.
ggghost stores the data used in a ggplot() call, and collects ggplot2 commands (usually separated by +) as they are applied, in effect lazily collecting the calls. Once the object is requested, the print method combines the individual calls back into the total plotting command and executes it. This is where the call would usually be discarded. Instead, a “ghost” of the commands lingers in the object for further investigation, subsetting, adding to, or subtracting from.
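As a rough illustration of the collect-then-combine idea described above, the following base R sketch stores unevaluated calls in a list and folds them into a single expression joined by +. It is not ggghost’s actual implementation: the helper name combine_calls is hypothetical, and plain arithmetic calls stand in for ggplot2 layers so that the example runs without any packages.

```r
# A minimal sketch, not the package's code: `combine_calls` is a made-up helper.
# quote() captures each call without evaluating it.
calls <- list(quote(c(1, 2, 3)), quote(10), quote(sqrt(16)))

# Fold the list into one nested `+` call: (a + b) + c
combine_calls <- function(calls) {
  Reduce(function(lhs, rhs) call("+", lhs, rhs), calls)
}

combined <- combine_calls(calls)
deparse(combined)  # the total command as text
eval(combined)     # only at this point is anything evaluated
```

Because the individual calls stay in the list, they remain available for inspection or editing after the combined expression has been evaluated.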
You can install ggghost from CRAN with:
install.packages("ggghost")
or the development version from GitHub with:
# install.packages("devtools")
devtools::install_github("jonocarroll/ggghost")
Use %g<% to initiate storage of the ggplot2 calls, then add to the call with each logical call on a new line (@hrbrmstr style)
tmpdata <- data.frame(x = 1:100, y = rnorm(100))
head(tmpdata)
#> x y
#> 1 1 0.8930142
#> 2 2 -2.2215165
#> 3 3 -0.5178338
#> 4 4 0.4729639
#> 5 5 -0.1718715
#> 6 6 0.1927056
library(ggplot2)
library(ggghost)
z %g<% ggplot(tmpdata, aes(x, y))
z <- z + geom_point(col = "steelblue")
z <- z + theme_bw()
z <- z + labs(title = "My cool ggplot")
z <- z + labs(x = "x axis", y = "y axis")
z <- z + geom_smooth()
z
This invisibly stores the ggplot2 calls in a list, which can be reviewed either as the list of calls
summary(z)
#> [[1]]
#> ggplot(tmpdata, aes(x, y))
#>
#> [[2]]
#> geom_point(col = "steelblue")
#>
#> [[3]]
#> theme_bw()
#>
#> [[4]]
#> labs(title = "My cool ggplot")
#>
#> [[5]]
#> labs(x = "x axis", y = "y axis")
#>
#> [[6]]
#> geom_smooth()
or the concatenated call
summary(z, combine = TRUE)
#> [1] "ggplot(tmpdata, aes(x, y)) + geom_point(col = \"steelblue\") + theme_bw() + labs(title = \"My cool ggplot\") + labs(x = \"x axis\", y = \"y axis\") + geom_smooth()"
The plot can be generated using the print method
z
which re-evaluates the list of calls and applies them to the saved data, meaning that the plot remains reproducible even if the data source is changed/destroyed.
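To see why keeping a copy of the data makes the calls replayable, here is a small base R sketch. It assumes an attribute layout like the one printed by str(z) further down (a "data" attribute holding data_name and data); the replay function is hypothetical and a simple mean() call stands in for a plotting call, so this is an illustration of the idea rather than the package’s print method.

```r
# A minimal sketch under stated assumptions; `replay` is a hypothetical helper.
tmpdata <- data.frame(x = 1:5, y = c(2, 4, 6, 8, 10))

# Store the unevaluated call together with a copy of the data it refers to.
ghost <- structure(
  list(quote(mean(tmpdata$y))),
  data = list(data_name = "tmpdata", data = tmpdata)
)

# Destroy the original data source.
rm(tmpdata)

replay <- function(ghost) {
  saved <- attr(ghost, "data")
  # Rebuild the data under its original name in a fresh environment,
  # then evaluate the stored call there.
  env <- new.env(parent = baseenv())
  assign(saved$data_name, saved$data, envir = env)
  eval(ghost[[1]], envir = env)
}

replay(ghost)
```

The stored call still evaluates after rm(tmpdata) because it is resolved against the saved copy rather than the original object.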
The call list can be subset, removing parts of the call
subset(z, c(1,2,6))
Plot features can be removed by name, a task that would otherwise have involved re-generating the entire plot
z2 <- z + geom_line(col = "coral")
z2 - geom_point()
z2
Calls are removed based on matching to the regex \\(.*$ (from the first bracket to the end of the call), so arguments are irrelevant.
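The matching rule can be sketched in base R as follows. The regex is the one quoted above; the helpers call_name and remove_call are hypothetical names introduced for illustration and are not functions exported by ggghost. Since quote() never evaluates the calls, the example runs without ggplot2 loaded.

```r
# A minimal sketch of name-based removal; helper names are made up.
calls <- list(
  quote(ggplot(tmpdata, aes(x, y))),
  quote(geom_point(col = "steelblue")),
  quote(geom_smooth())
)

# Strip everything from the first bracket to the end, leaving the function name.
call_name <- function(cl) {
  sub("\\(.*$", "", paste(deparse(cl), collapse = ""))
}

# Drop every stored call whose name matches the name of `target`.
remove_call <- function(calls, target) {
  keep <- vapply(calls, call_name, character(1)) != call_name(target)
  calls[keep]
}

remaining <- remove_call(calls, quote(geom_point()))
vapply(remaining, call_name, character(1))
```

Here geom_point() with no arguments removes geom_point(col = "steelblue"), because only the text before the first bracket is compared.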
The object still generates all the grob info; it’s just stored as calls rather than a completed image.
str(print(z))
#> List of 9
#> $ data :'data.frame': 100 obs. of 2 variables:
#> ..$ x: int [1:100] 1 2 3 4 5 6 7 8 9 10 ...
#> ..$ y: num [1:100] 0.893 -2.222 -0.518 0.473 -0.172 ...
#> $ layers :List of 2
#> [... truncated ...]
Since the grob info is still produced, normal ggplot2 operators can be applied after the print statement, such as replacing the data
xvals <- seq(0,2*pi,0.1)
tmpdata_new <- data.frame(x = xvals, y = sin(xvals))
print(z - geom_smooth()) %+% tmpdata_new
ggplot2 calls still work as normal if you want to avoid storing the calls.
ggplot(tmpdata) + geom_point(aes(x,y), col = "red")
Since the object is a list, we can stepwise show the process of building up the plot as a (re-)animation
lazarus(z, "mycoolplot.gif")
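How lazarus() produces its frames is not described here. One plausible way to get the intermediate stages from a list of calls, shown purely as an assumption, is a cumulative fold in base R: each stage is the previous stage plus one more call. The calls are quoted, so nothing is plotted and no packages are needed.

```r
# A sketch of stepwise build-up; this is an assumed approach, not lazarus() itself.
calls <- list(
  quote(ggplot(tmpdata, aes(x, y))),
  quote(geom_point()),
  quote(theme_bw())
)

# accumulate = TRUE keeps every intermediate result of the fold.
stages <- Reduce(function(lhs, rhs) call("+", lhs, rhs), calls, accumulate = TRUE)

# Text of each partial plotting command, one per animation frame.
stage_text <- vapply(
  stages,
  function(s) paste(deparse(s), collapse = ""),
  character(1)
)
stage_text
```

Evaluating each element of stages in turn would yield one partial plot per step, which is the kind of sequence an animation needs.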
A supplementary data object (e.g. for use in a geom_* or scale_* call) can be added to the ggghost object
myColors <- c("alpha" = "red", "beta" = "blue", "gamma" = "green")
supp_data(z) <- myColors
These will be recovered along with the primary data.
For full reproducibility, the entire structure can be saved to an .rds file for re-loading at a later point. This may not have made much sense for a ggplot2 object, but now both the original data and the calls to generate the plot are saved. Should the environment that generated the plot be destroyed, all is not lost.
saveRDS(z, file = "README_supp/mycoolplot.rds")
rm(z)
rm(tmpdata)
rm(myColors)
exists("z")
#> [1] FALSE
exists("tmpdata")
#> [1] FALSE
exists("myColors")
#> [1] FALSE
After reading the ggghost object back into the session, both the relevant data and the plot-generating calls can be re-executed.
z <- readRDS("README_supp/mycoolplot.rds")
str(z)
#> List of 6
#> $ : language ggplot(tmpdata, aes(x, y))
#> $ : language geom_point(col = "steelblue")
#> $ : language theme_bw()
#> $ : language labs(title = "My cool ggplot")
#> $ : language labs(x = "x axis", y = "y axis")
#> $ : language geom_smooth()
#> - attr(*, "class")= chr [1:2] "ggghost" "gg"
#> - attr(*, "data")=List of 2
#> ..$ data_name: chr "tmpdata"
#> ..$ data :'data.frame': 100 obs. of 2 variables:
#> .. ..$ x: int [1:100] 1 2 3 4 5 6 7 8 9 10 ...
#> .. ..$ y: num [1:100] 0.893 -2.222 -0.518 0.473 -0.172 ...
#> - attr(*, "suppdata")=List of 2
#> ..$ supp_data_name: chr "myColors"
#> ..$ supp_data : Named chr [1:3] "red" "blue" "green"
#> .. ..- attr(*, "names")= chr [1:3] "alpha" "beta" "gamma"
recover_data(z, supp = TRUE)
head(tmpdata)
#> x y
#> 1 1 0.8930142
#> 2 2 -2.2215165
#> 3 3 -0.5178338
#> 4 4 0.4729639
#> 5 5 -0.1718715
#> 6 6 0.1927056
myColors
#> alpha beta gamma
#> "red" "blue" "green"
z
We now have a proper reproducible graphic.
The data must be supplied as an argument to the ggplot2 call, not piped in to it. Pipelines such as z %g<% tmpdata %>% ggplot() won’t work… yet.
Only the data from the original ggplot(data = x) call is stored. If you require supplementary data for some geom then you need to manage storage/consistency of that.
When removing labs calls, an argument must be present. It doesn’t need to be the actual one (all will be removed) but it must evaluate in scope. TRUE will do fine.