Rpadrino
provides an interface to the PADRINO database.
PADRINO houses metadata on published Integral Projection Models, and all
the information needed to rebuild them. Rpadrino
provides a
set of functions that wrap around ipmr
so that
you can rebuild these models using R. Additionally, there’s
some data downloading and management functionality, as well as tools to
help report and cite studies used in an analysis.
Below are some very brief examples of how to select data based on the
Metadata table, rebuild those models, and conduct an analysis. The first
step for any of this is to access PADRINO. For first time users, this
will always mean downloading it. For returning users, you may save and
re-load PADRINO, but this example will not make use of the
saving/loading functionality provided in pdb_load()
and
pdb_save()
.
library(Rpadrino)
<- pdb_download(save = FALSE) pdb
We’ve now downloaded the PADRINO database. It consists of 10 tables,
all linked by ipm_id
.
We can get a brief overview of the information using the
print
method:
pdb
## A 'pdb' object with 56 unique species, 40 publications, and 280 models.
## Please cite all publications used in an analysis! These can be accessed with
## 'pdb_citations()'.
##
## The following models have continuously varying environments:
## aaaa15, aaaa16, aaaa21, aaaa54, aaaa59
## These can take longer to re-build - adjust your expectations accordingly!
We see that some of the IPMs have continuously varying environments -
we don’t want to use those for now because they take a while to run. We
can use the pdb_subset()
function to get only deterministic
models. Because of the way PADRINO is structured,
pdb_subset()
only accepts a set of ipm_id
’s
that we want to keep (as opposed to other subset
functions, which may accept some arbitrary logic). Thus, we need to
create an index of those, which we can pass to pdb_subset()
like so:
<- setdiff(pdb$Metadata$ipm_id,
sub_ind c("aaaa15", "aaaa16", "aaaa21", "aaaa54", "aaaa59"))
<- pdb_subset(pdb, ipm_ids = sub_ind) det_pdb
Great! Say we want to find all the Asteraceae in PADRINO. We
can first check and see which ipm_id
s correspond to those
by querying the tax_family
column of
pdb$Metadata
.
<- det_pdb$Metadata$ipm_id[det_pdb$Metadata$tax_family == "Asteraceae"]
aster_ind
<- pdb_subset(pdb, ipm_ids = aster_ind) aster_pdb
Next, we can rebuild their IPMs. This is a two step process in
Rpadrino
. The first is to create proto_ipm
s
with pdb_make_proto_ipm()
. This function takes a database
object and, optionally, a subset of ipm_id
s, and constructs
a list of proto_ipm
s. proto_ipm
s are a common
data structure used to represent IPM objects before they are actually
constructed. One advantage of this extra step is that you can combine
proto_ipm
s generated from your own data with
ipmr
with proto_ipm
s generated from PADRINO,
so you can augment syntheses with your own data. An example of that is
provided in the Other Data Sources vignette.
The simplest way to make proto_ipm
s for a whole
pdb
object is:
<- pdb_make_proto_ipm(aster_pdb) proto_list
## 'ipm_id' aaa326 has the following notes that require your attention:
## aaa326: 'Demographic data from Metcalf Funct Ecol 2006'
## 'ipm_id' aaa329 has the following notes that require your attention:
## aaa329: 'Based on IPM from Rose Ecology 2005; The GPS coordinates were approximated
## to the closest geographic location described in the reference'
We see that there are some notes that require our attention. The
first tells us that this IPM is using data from another publication. The
second warns us that GPS coordinates aren’t exact, so we should be
cautious if we wanted to merge the outputs from this analysis with other
spatially referenced datasets (e.g. gridded climate data). We can
inspect the IPM structure by printing each proto_ipm
object:
proto_list
## This list of 'proto_ipm's contains the following species:
## Cirsium arvense
## Cirsium canescens
##
## You can inspect each model by printing it individually.
Next, we can reconstruct the IPMs using pdb_make_ipm()
.
There are a variety of different options we can specify here, and we’ll
get into how that works in the next example.
<- pdb_make_ipm(proto_list)
cirsiums
<- lambda(cirsiums)
lambdas
cirsiums
## $aaa326
## A simple, density independent, deterministic IPM with 2 sub-kernel(s) defined.
## Deterministic lambda = 1
## $aaa329
## A simple, density dependent, deterministic IPM with 1 sub-kernel(s) defined.
## Lambda for the final time step of the model is: 1.146
## Call lambda(x, type_lambda = "all") for deterministic lambdas
## from each iteration.
## attr(,"class")
## [1] "pdb_ipm" "list"
We’ve rebuilt our first set of IPMs! Next, we’ll explore how to extract a bit more information from these.
Often times, we’ll want more than just the deterministic population
growth rates. We’ll explore how to run some more complicated analyses
here. Specifically, we’ll compute the probability of surviving to age
\(a\), \(l_a(z_0)\) and the average per-capita
fecundity at age \(a\), \(f_a(z_0)\). After that, we’ll compute mean
and variance in lifespan (\(\bar\eta(z_0)\) and \(\sigma_\eta^2 (z_0)\), respectively). Along
the way, we’ll learn more about the structure of IPM objects in
Rpadrino
and ipmr
.
We’re going to use a subset of PADRINO IPMs to illustrate how to implement these calculations. We’ll select simple IPMs that are density-independent, deterministic, and from North America, so that things run a bit faster.
# This creates a table of the number of state variables per ipm_id. Since
# simple IPMs, by defintion, have only 1 state variable, we can use the names
# of the vector that it returns to choose only those.
<- table(pdb$StateVariables$ipm_id)
n_state_vars <- names(n_state_vars)[n_state_vars == 1]
simple_mod_ind
# Next, we want to capture any models that have stochastic dynamics, multiple
# values per parameter, or density dependence.
<- unique(c(pdb$EnvironmentalVariables$ipm_id, # Stochastic models
rm_mod_ind $ParSetIndices$ipm_id, # Multiple parameter values
pdb$Metadata$ipm_id[pdb$Metadata$has_dd], # Density dependent
pdb$Metadata$ipm_id[pdb$Metadata$continent != "n_america"]))
pdb
# Remove these IDs from our simple_mod_ind vector
<- simple_mod_ind[!simple_mod_ind %in% rm_mod_ind]
simple_mod_ind
# We're also going to remove monocarpic perennials, as their survival/growth
# kernels are slightly trickier to work with (note that there is an example
# of working with these in the manuscript/si/case_study_1.pdf file in PADRINO
# GitHub repository).
<- pdb$Metadata$ipm_id[pdb$Metadata$organism_type == "biennial"]
monocarps
<- simple_mod_ind[!simple_mod_ind %in% monocarps]
simple_mod_ind
# Finally, we'll subset the database and build the IPM objects!
<- pdb_subset(pdb, simple_mod_ind) %>%
simple_pdb pdb_make_proto_ipm() %>%
pdb_make_ipm()
## 'ipm_id' aaa310 has the following notes that require your attention:
## aaa310: 'Geo and time info retrieved from COMPADRE (v.X.X.X.4)'
## 'ipm_id' aaa385 has the following notes that require your attention:
## aaa385: 'Same data as AAA385. State variable Height (Cm)'
## 'ipm_id' aaa388 has the following notes that require your attention:
## aaa388: 'Same data as AAA388. State variable Height (Cm)'
## 'ipm_id' dddd30 has the following notes that require your attention:
## dddd30: 'Frankenstein IPM'
## 'ipm_id' dddd31 has the following notes that require your attention:
## dddd31: 'Frankenstein IPM'
## 'ipm_id' dddd32 has the following notes that require your attention:
## dddd32: 'Frankenstein IPM'
We now have 28 distinct ipm_id
s to work with!
Age-specific calculations are straightforward once we’ve extracted
sub-kernels. We can define functions that accept a single IPM object and
then lapply()
them to do our computations. We’ll start with
\(l_a(z_0)\) and \(f_a(z_0)\). These are defined by the
following equations:
\(l_a(z_0) = eP^a\), and
\(f_a(z_0) = (eFP^a)/l_a\).
\(P\) and \(F\) are survival/growth and fecundity kernels, respectively. \(e\) is a constant function \(e(z) \equiv 1\). Left multiplication with this function has the effect of summing columns. \(a\) is the age we wish to do the calculations for.
<- function(ipm, a) {
l_a
<- ipm$sub_kernels$P
P
# %^% is a function from ipmr that raises matrices to a power, rather than
# a pointwise power that ^ does.
<- P %^% a
P_a
colSums(P_a)
}
<- function(ipm, a) {
f_a
<- ipm$sub_kernels$P
P <- ipm$sub_kernels$F
F
<- l_a(ipm, a)
l_age
<- P %^% a
P_a
colSums(F %*% P_a) / l_age
}
Now, we just need to apply our functions to the IPMs. We’ll compute survival and fecundities for 5 year olds, and then plot the results:
<- lapply(simple_pdb,
l_as function(x, a) l_a(x, a),
a = 5)
<- lapply(simple_pdb,
f_as function(x, a) f_a(x, a),
a = 5)
# This only plots the figures for the first two species.
# Remove the [1:2] to see all of them.
# Uncomment the par(mfrow = c(...)) line to get an arrangement you like
# par(mfrow = c(2, 2))
for(i in seq_along(l_as)[1:2]) {
<- pdb$Metadata$species_accepted[pdb$Metadata$ipm_id == names(l_as)[i]]
nm
plot(l_as[[i]], type = "l",
# ylim = c(0, 1),
main = paste0(nm,": Probability of survival to age 5"),
xlab = expression(paste("Initial size z"[0])),
ylab = "Pr(s)")
plot(f_as[[i]], type = "l",
# ylim = c(0, 1),
main = paste0(nm,": Expected Fecundity at age 5 (given survival)"),
xlab = expression(paste("Initial size z"[0])),
ylab = "E[f]")
}
We can also generate survivorship curves for each one to investigate type I/II/III species. These require simulating cohorts for a number of years using the \(P\) and \(F\) kernels. Because this can take some time to run, we’ll only do it for a single model for each species in our data set.
<- pdb_subset(pdb, simple_mod_ind) %>%
keep_ind $Metadata %>%
.!duplicated(.$species_accepted), "ipm_id"]
.[
<- simple_pdb[keep_ind]
use_ipms
<- 10
n_yrs
<- right_ev(use_ipms) init_pops
## 'x' did not converge to asymptotic dynamics after 51 iterations.
## Will re-iterate the model 100 times and check for convergence.
## model is now converged :)
## 'x' did not converge to asymptotic dynamics after 51 iterations.
## Will re-iterate the model 100 times and check for convergence.
## model is now converged :)
## 'x' did not converge to asymptotic dynamics after 51 iterations.
## Will re-iterate the model 100 times and check for convergence.
## model is now converged :)
## 'x' did not converge to asymptotic dynamics after 51 iterations.
## Will re-iterate the model 100 times and check for convergence.
## model is now converged :)
## 'x' did not converge to asymptotic dynamics after 51 iterations.
## Will re-iterate the model 100 times and check for convergence.
## model is now converged :)
# As above, remove the [1:2] to see all plots and use par() to control their
# arrangement
# par(mfrow = c(3, 2))
for(i in seq_along(init_pops)[1:2]) {
<- l_age <- numeric(n_yrs)
f_age
<- diag(nrow(use_ipms[[i]]$sub_kernels[[1]]))
P_a
for(j in seq_len(n_yrs)) {
<- use_ipms[[i]]$sub_kernels$P
P_now <- use_ipms[[i]]$sub_kernels$F
F_now
<- sum(colSums(P_a) * init_pops[[i]][[1]])
l_age[j] <- sum(colSums(F_now %*% P_a) * init_pops[[i]][[1]])
f_age[j]
<- P_now %*% P_a
P_a
}
<- f_age / l_age
f_age
<- pdb$Metadata$species_accepted[pdb$Metadata$ipm_id == names(init_pops)[i]]
nm
plot(l_age, type = "l",
ylim = c(0, 1),
main = paste0(nm, ": Probability of survival"),
xlab = "Age",
ylab = "Pr(s)")
plot(f_age, type = "l",
# ylim = c(0, 1),
main = paste0(nm, ": Average Fecundity"),
xlab = "Age",
ylab = "E[f]")
}
\(\bar\eta(z_0)\) is given by the following equation: \(\bar\eta(z_0) = eN\), where \(N\) is the fundamental operator. This can be thought of as the expected amount of time an individual with initial state \(z_0\) spends in state \(z'\) before death. It is computed as \((I - P)^{-1}\), where \(I\) is an identity operator (in practice, it is an identity matrix with dimension equal to \(P\)), and \(P\) is the survival/growth kernel of the IPM. Thus, all we need are the \(P\) kernels from each model.
<- function(ipm) {
make_N
<- ipm$sub_kernel$P
P <- diag(nrow(P))
I <- solve(I - P)
N
return(N)
}<- function(ipm) {
eta_bar_z0
<- make_N(ipm)
N return(colSums(N))
}
<- lapply(use_ipms, eta_bar_z0) mean_lifespan
The formula for the variance in lifespan is only a bit more
complicated. It is given by \(\sigma_\eta^2(z_0) = e(2N^2 - N) -
(eN)^2\). For this, we also need to use ipmr
’s
%^%
operator to ensure we correctly exponentiate the first
term, and use the regular ^
to exponentiate the second
term. Calculating \(N\) is the same as
before, we’ll compute the variance and standard deviation of lifespan
(for plotting).
<- function(ipm) {
sigma_eta_z0
<- make_N(ipm)
N
<- colSums(2 * N %^% 2 - N) - (colSums(N)) ^ 2
out
return(out)
}
<- lapply(use_ipms, sigma_eta_z0)
var_lifespan
<- lapply(var_lifespan, sqrt) sd_lifespan
Warning? Huh? Let’s see what’s going on.
vapply(mean_lifespan, range, numeric(2L))
## aaaa34 aaaa36 aaa310 aaa341 aaa351 aaa385 ddddd5 dddd10
## [1,] 1.810595 1.590430 6.36749 -348825.70 1.045242 1.005169 2.106308 1.000000
## [2,] 3.739900 5.874642 13.68391 -30555.21 2.023906 3.631503 4.638129 1.626605
## dddd30
## [1,] 1.299821
## [2,] 1.537262
vapply(var_lifespan, range, numeric(2L))
## aaaa34 aaaa36 aaa310 aaa341 aaa351 aaa385 ddddd5
## [1,] 2.475952 2.774118 71.19712 20383705217 0.05439601 0.02589388 3.35095
## [2,] 6.023528 13.706128 115.84883 121676736576 0.99166204 9.44496008 13.28191
## dddd10 dddd30
## [1,] 0.0000000 0.3902612
## [2,] 0.9258931 0.7019200
The fundamental operator should not have any negative numbers, and
yet aaa341
contains them. This is because of the survival
function used in the model. We’ll show how to remedy this in the next
section, but for now, we’ll just remove it from our analysis. Next,
we’ll learn how to visualize our results quickly with
ggplot2
:
<- mean_lifespan[!names(mean_lifespan) %in% "aaa341"]
mean_lifespan <- sd_lifespan[!names(sd_lifespan) %in% "aaa341"]
sd_lifespan
library(ggplot2)
<- data.frame(
all_data id = NA,
species = NA,
mean_ls = NA,
upper = NA,
lower = NA,
z_0 = NA
)
for(i in seq_along(mean_lifespan)) {
<- data.frame(
temp id = names(mean_lifespan)[i],
species = pdb$Metadata$species_accepted[pdb$Metadata$ipm_id == names(mean_lifespan)[i]],
mean_ls = mean_lifespan[[i]],
upper = mean_lifespan[[i]] + 1.96 * sd_lifespan[[i]],
lower = mean_lifespan[[i]] - 1.96 * sd_lifespan[[i]],
z_0 = seq(1, length(mean_lifespan[[i]]), 1)
)
<- rbind(all_data, temp)
all_data
}
# Remove the NA dummy row, and restrict the lower CI to >= 0 (can't have negative
# lifespan)
<- all_data[-1, ]
all_data $lower <- ifelse(all_data$lower < 0, 0, all_data$lower)
all_data
# Now, ggplot using facet wrap and geom_ribbon to get the confidence interval
ggplot(all_data, aes(x = z_0, y = mean_ls)) +
geom_line() +
geom_ribbon(aes(ymin = lower,
ymax = upper),
fill = "grey50",
alpha = 0.5) +
facet_wrap( ~ species,
scales = "free") +
theme_bw()
We saw above that sometimes data in PADRINO can give values that make no sense. PADRINO is committed to providing IPMs as they are published, and does not take a stance on the technical correctness of these models. The analysis above provides an opportunity to quickly illustrate how to address some of these issues. There is a separate vignette with a more complete overview of known issues in PADRINO.
We’ll introduce two new functions -
vital_rate_exprs<-
and pdb_new_fun_form()
.
Since we’re computing all new survival values, we’ll have to modify the
proto_ipm
first, then re-build the IPM object (the species
in question is Lonicera maackii).
<- pdb_make_proto_ipm(pdb, "aaa341")
lonicera_proto
# Inspect the vital rate expressions
vital_rate_exprs(lonicera_proto)
## $aaa341
## s: 1/(1 + exp(-(si + ss1 * size_1 + ss2 * size_1^2)))
## g_mean: gi + gs * size_1
## g: dnorm(size_2, g_mean, g_sd)
## Fp: 1/(1 + exp(-(fpi + fps * size_1)))
## Fs: exp(fi + fs * size_1)
## Fd: dnorm(size_2, fd_mean, fd_sd)
We can see that the function s
is the one we want to
update. We’ll use the setter vital_rate_exprs<-
and
pdb_new_fun_form()
to do this. These two functions combine
with the following syntax:
vital_rate_exprs(proto_ipms) <- pdb_new_fun_form(
list(
<ipm_id_1> = list(
<vital_rate_name_1> = <expression_1>
<vital_rate_name_2> = <expression_2>
),<ipm_id_2> = list(
<vital_rate_name_3> = <expression_3>
)
) )
With a maximum survival probability of 0.98, our example of
aaa341
looks like this:
vital_rate_exprs(lonicera_proto) <-pdb_new_fun_form(
list(
aaa341 = list(
s = pmin(0.98, 1 / (1 + exp(-(si + ss1 * size_1 + ss2 * size_1 ^ 2))))
)
)
)
vital_rate_exprs(lonicera_proto)
## $aaa341
## s: pmin(0.98, 1/(1 + exp(-(si + ss1 * size_1 + ss2 * size_1^2))))
## g_mean: gi + gs * size_1
## g: dnorm(size_2, g_mean, g_sd)
## Fp: 1/(1 + exp(-(fpi + fps * size_1)))
## Fs: exp(fi + fs * size_1)
## Fd: dnorm(size_2, fd_mean, fd_sd)
Great! Let’s re-run the analysis above with our corrected model!
# Rebuild the IPM, then use our functions for mean, variance, and SD on it.
<- pdb_make_ipm(lonicera_proto)
lonicera_ipm
<- eta_bar_z0(lonicera_ipm$aaa341)
lonicera_mu_ls <- sigma_eta_z0(lonicera_ipm$aaa341)
lonicera_var_ls <- sqrt(lonicera_var_ls)
lonicera_sd_ls
<- data.frame(
temp id = "aaa341",
species = "Lonicera_maackii",
mean_ls = lonicera_mu_ls,
upper = lonicera_mu_ls + 1.96 * lonicera_sd_ls,
lower = lonicera_mu_ls - 1.96 * lonicera_sd_ls,
z_0 = seq(1, length(lonicera_mu_ls), 1)
)
<- rbind(all_data, temp)
all_data
# Again, restrict the lower CI to have minimum of 0
$lower <- ifelse(all_data$lower < 0, 0, all_data$lower)
all_data
# Rebuild our plot!
ggplot(all_data, aes(x = z_0, y = mean_ls)) +
geom_line() +
geom_ribbon(aes(ymin = lower,
ymax = upper),
fill = "grey50",
alpha = 0.5) +
facet_wrap( ~ species,
scales = "free") +
theme_bw()
The results look a bit better for Lonicera, and are at least now mathematically correct. When updating functional forms in PADRINO, it is important to explore how assumptions like the maximum survival probability in the modified form affect results. This is left as an exercise to you!
This is just a very brief overview of the analyses that are possible
with Rpadrino
. Ellner,
Childs & Rees 2016 give a comprehensive overview of IPM theory
and analyses. There are further vignettes in this package that describe
data cleaning and how to use PADRINO with other databases.