This vignette describes how to write formulas using
ipmr
’s notation for models derived from parameter sets.
Parameter sets may be derived from mixed models, or fixed effects that
are discrete. The notation is meant to mirror the standard mathematical
notation used to represent the models. The examples here will primarily
use lme4 to illustrate how
to go from vital rate models to functional forms that you can use in
ipmr
. You are of course free to use whatever model fitting
software you like, but the number of different possibilities are far too
great to cover in this vignette.
The ipmr
notation is meant to make implementing models
more concise, as it allows you to avoid re-typing/copy+pasting the same
kernel formats and just substituting, for example, plot name or
transition year. It works by appending an index suffix to kernels, vital
rates, and parameters that can take multiple values (e.g. P
becomes P_yr
). We then add a named list where the names
correspond to the index and the entries correspond to the values it can
take. We pass pass the list to par_set_indices
(i.e. par_set_indices = list(yr = 1:5)
), and set
uses_par_sets = TRUE
.
NB: because of the way these indices are handled internally, they should always appear at the end of the name they modify. For example, use
s_slope_site
instead ofs_site_slope
. Multiple indices are fine as long as they chained together at the end (e.g.s_slope_site_year
). Models may still work if formatted with indices in the middle of a name, but they also may not (often with obscure error messages).
Some quick translations between mathematical notation, R model
syntax, and ipmr
style notation. In the ipmr
column, index suffixes that are appended are highlighted in
bold.
Suffix names are not restricted to the examples below, and can be anything you want them to be.
The mathematical notation in this vignette will take the following form:
Coefficients/parameters: \(\alpha\) - intercept, \(\beta\) - slope, \(\mu\) - mean, \(\sigma\) standard deviation.
Vital rates: subscripts on the coefficients/parameters. For example, \(\alpha_G\) for the intercept of a growth model.
Kernels: bivariate kernels are denoted with capital letters (e.g. a growth kernel \(G(z',z)\)) and univariate functions are denoted with lower case letters (e.g. a survival function \(s(z)\)).
Indices: superscripts on coefficients. For example, \(\beta_g^{site}\) for a site specific growth slope.
Probability density functions: these are denoted with \(f_p\), where \(_p\) is replaced by the vital rate. For example, \(f_G(z',\mu_G, \sigma_G)\) would denote a probability density function for growth.
Math Formula | R Formula | ipmr | Model Number |
---|---|---|---|
\(\mu_G^{yr} = \alpha_G + \alpha_G^{yr} * \beta_G * z\) |
size_2 ~ size_1 + (1 | year), family = gaussian()
|
mu_g_yr = g_int + g_int_yr + g_slope * z | 1 |
\(G(z',z) = f_G(z', \mu_g^{yr}, \sigma_G)\) | g = dnorm(z_2, mu_g_yr, sd_g) | 1 | |
\(Logit(\mu_{s}^{plot,yr}) = \alpha_s + \alpha_{s}^{plot,yr} + \beta * z\) |
surv ~ size_1 + (1 | year) + (1 | plot), family = binomial()
|
s_pl_yr = s_int + s_int_pl_yr + s_slope * z | 2 |
surv ~ size_1 + (1 | year) + (1 | plot), family = binomial()
|
s_pl_yr = s_int + s_int_pl + s_int_yr + s_slope * z | 2 | |
s = inv_logit(s_pl_yr) | 2 | ||
\(log(\mu_{r}^{yr}) = \alpha_{r} + \alpha_r^{yr} + (\beta_r + \beta_r^{yr}) * z\) |
fec ~ size_1 + (size_1 | year), family = poisson()
|
r = exp(r_int + r_int_yr + (r_slope + r_slope_yr) * z) | 3 |
Internally, ipmr
generates the various levels using
expand.grid
. This means that every combination of the
values in par_set_indices
must exist in the
data_list
, or the model will fail with an error along the
lines of (Error in eval_tidy: object 'x_yz' not found
). In
order to exclude levels that do not exist in your data, you can add a
vector to the list in par_set_indices
called
drop_levels
. This should contain the values you wish to
exclude as a character vector. For example, if data are collected for
sites = c("a", "b", "c")
, and
years = c(2005:2008)
, but there is no data from site
"a"
in 2007, we can use
par_set_indices = list(site = c("a", "b", "c"), year = c(2005:2008), drop_levels = c("a_2007"))
.
When picking values to appear in par_set_indices
, it is
best to avoid longer words or phrases where words are separated by
underscores
(e.g. treatment = c("h_removal", "c_addition")
), as this
may cause problems when values are substituted into expressions.
CamelCase is generally safer in these cases (e.g. use
treatment = c("hRemoval", "cAddition")
instead of
treatment = c("h_removal", "c_addition")
).
These models are pretty common in the IPM literature. Examples include IPMs where a single site has been sampled over many years, or sampled many sites across one transition.
This example will use a hypothetical study that spans 5 transitions
and has 6 consecutive censuses. The temporal variation will be denoted
with a index/superscript appended to each term that it modifies (\(x^{yr}\)/x_yr
). To limit
complexity, let’s pretend our exploratory analysis and model selection
indicated the best overall models included random year-specifc
intercepts for growth (\(g^{yr}\)) and
survival (\(s^{yr}\)), but not
probability of reproduction (\(r_p\)),
recruit production (\(r_r\)), or the
recruit size distribution (\(r_d\)). We
will work with a simple model to start (i.e. 1 continuous state
variable, no discrete states). This yields the following IPM (\(z,z'\) represent size/height/weight at
time \(t\) and \(t+1\), respectively):
\(n(z', t + 1) = \int_L^U [P^{yr}(z',z) + F(z',z)]n(z,t)dz\)
\(P^{yr}(z', z) = s^{yr}(z) * g^{yr}(z',z)\)
\(F(z',z) = r_p(z) + r_r(z) + r_d(z')\)
Note that only the \(P\) kernel get
the \(yr\) subscript appended to them -
this is the only one that our model selection process decided had
substantial year to year variation. Thus, we can actually write the
\(F\) kernel the exact same way as we
would in a simple IPM when we implement the model in
ipmr
.
The vital rate models could be written on paper as follows:
Growth
\(\mu_G^{yr}(z) = (\alpha_G + \alpha_G^{yr})+ \beta_G * z\)
\(G^{yr}(z',z) = f_G(z', \mu_G^{yr}(z), \sigma_G)\)
Survival
Probability of reproducing
Recruit production
Recruit size distribution
and modeled in R as follows:
library(lme4)
library(ipmr)
<- lmer(size_2 ~ size_1 + (1 | year), data = grow_data)
grow_mod
<- glmer(surv ~ size_1 + (1 | year), data = surv_data, family = binomial)
surv_mod
<- glm(repr ~ size_1, data = repr_data, family = binomial)
repr_mod
<- glm(recr ~ size_1, data = recr_data, family = poisson)
recr_mod
<- mean(rcsz_data$size_2)
rcsz_mu <- sd(rcsz_data$size_2) rcsz_sd
We could then extract the fixed coefficients with the
fixef
method and the conditional modes of the random
effects via the ranef
method:
<- fixef(grow_mod)
beta_gs
<- ranef(grow_mod)
alpha_g_yrs
<- fixef(surv_mod)
beta_ss
<- ranef(surv_mod) alpha_s_yrs
The non-mixed models can use the coef
method to extract
a vector of coefficients:
<- coef(repr_mod)
repr_coef
<- coef(recr_mod) recr_coef
The next step is to transform these values into something
ipmr
can use. This must always be a named
list where the names of the list components are the names used
in the vital expressions or kernel formula. Transforming the \(\beta\)s is easier - the output from
fixef
is always a named vector, so we just use
as.list()
and change the names to whatever we want to use.
We then bundle it into one big list with all the fixed coefficients.
<- as.list(beta_gs)
g_list names(g_list) <- c("alpha_g", "beta_g")
<- as.list(beta_ss)
s_list names(s_list) <- c("alpha_s", "beta_s")
<- as.list(recr_coef)
r_p_list names(f_p_list ) <- c("alpha_r_p", "beta_r_p")
<- as.list(repr_coef)
r_r_list names(r_r_list) <- c("alpha_r_r", "beta_r_r")
<- list(mu_r_d = rcsz_mu, sigma_r_d = rcsz_sd)
r_d_list
<- c(g_list, s_list, r_p_list, r_r_list, r_d_list) all_fixed_params
For the random effects, we probably need to do some a bit more
processing, though this really depends on which modeling framework you
use and the format of the outputs. For lme4
, it looks like
this:
<- as.list(unlist(alpha_g_yrs))
g_alpha_list names(g_alpha_list) <- paste("alpha_g_", 2001:2006, sep = "")
<- as.list(unlist(alpha_s_yrs))
s_alpha_list names(s_alpha_list) <- paste("alpha_s_", 2001:2006, sep = "")
First, we strip away all of lme4
-related attributes with
unlist()
, then convert it to the flat list format that we
need. Next, we set the names. Now, we can combine it with our
all_fixed_params
list and we have all of the parameters we
need for the kernel functions:
<- c(all_fixed_params, g_alpha_list, s_alpha_list) all_params
We are now ready to begin implementing the kernels. This model will be a simple, density independent, stochastic kernel-resampled model. The parts where the index syntax is used are highlighted by comments in the code block:
<-init_ipm(sim_gen = "simple",
ex_ipm di_dd = "di",
det_stoch = "stoch",
kern_param = "kern") %>%
define_kernel(
# The _yr index is appended as a suffix with an underscore. Notice how the formula
# from Eq 2 is translated into the formula argument and the kernel name
name = "P_yr",
family = "CC",
formula = s_yr * g_yr,
# We also modify each parameter name is indexed as well.
# Here, I've split out the linear predictor and the inverse logit
# transformation into separate steps to avoid over cluttering
s_lin_p = alpha_s + alpha_s_yr + beta_s * z_1,
s_yr = 1 / (1 + exp(- s_lin_p)),
# We do the same with growth as we did with survival and the P_yr formula
g_yr = dnorm(z_2, mu_g_yr, sigma_g),
mu_g_yr = alpha_g + alpha_g_yr + beta_g * z_1,
data_list = all_params,
states = list(c("z")),
# We signal that the kernel has parameter sets
uses_par_sets = TRUE,
# And provide the values that each index can take as a named list.
# The name(s) in this list MUST match the index used in the expressions.
par_set_indices = list(yr = 2001:2006),
evict_cor = TRUE,
evict_fun = truncated_distributions("norm", "g_yr")
%>%
)
# This kernel doesn't get anything special because there are no
# time-varying parameter values.
define_kernel(
name = "F",
family = "CC",
formula = r_p * r_r * r_d,
r_p_lin_p = alpha_r_p + beta_r_p * z_1,
r_p = 1 / (1 + exp( -r_p_lin_p)),
r_r = exp(alpha_r_r + beta_r_r * z_1),
r_d = dnorm(z_2, mu_r_d, sigma_r_d),
data_list = all_params,
states = list(c("z")),
uses_par_sets = FALSE,
evict_cor = TRUE,
evict_fun = truncated_distributions("norm", "r_d")
%>%
) define_impl(
make_impl_args_list(
kernel_names = c("P_yr", "F"),
int_rule = rep("midpoint", 2),
state_start = rep("z", 2),
state_end = rep("z", 2)
)%>%
) define_domains(
z = c(0.1, 10, 50)
%>%
) define_pop_state(
n_z = runif(50)
%>%
) make_ipm(
iterate = TRUE,
iterations = 100,
kernel_seq = sample(2001:2006, size = 100, replace = TRUE),
normalize_pop_size = TRUE
)
Sometimes, models will have discretely and continuously varying components as functions of different variables. For example, a study could consider multiple sites (discrete component) and spatial and/or temporal fluctuations in the environment at each site (continuous component). We’ll use a hypothetical study that models demography as a function of environment across multiple sites to illustrate this.
We’ll implement a general IPM of a perennial plant species. Our data
will come from 6 sites, "A"
- "F"
, and the
vital rate intercepts for the plants will vary across each one. Its
seeds can either enter a seed bank or recruit immediately, producing
recruits at the next time step. We’ll pretend that data from the recruit
size distribution was pooled across sites, and so does not vary with
space. The seed bank will be age structured and its parameters will be
space- and time-invariant. Seeds in the first year can either survive
(\(s_{{sb}_1}\)/s_sb1
) and
recruit (\(r_{{sb}_1}\)), or survive
and become two year old seeds. Two year old seeds can either survive
(\(s_{{sb}_2}\)/s_sb2
) and
recruit (\(r_{{sb}_2}\)/r_sb2
) or
die.
The vital rate information fed into our hypothetical regression
models are (\(s^{site}\)/s_site
), growth
(\(g^{site}\)/g_site
),
probability of reproduction (\(c_{p}^{site}\)/c_p_site
), and
seed production (\(c_{r}^{site}\)/c_r_site
). They
are all functions of size (z
) and environmental covariates
(\(\theta^{t}\) and \(\theta^{p}\), denoted together in the
kernel formulas as just \(\theta\)).
The full IPM looks like this:
\[ n(z, t + 1) = \int_L^U [P^{site}(z', z,\theta) + r_i * F^{site}(z', z, \theta)]n(z, t)dz +\\ s_{{sb}_1} * r_{{sb}_1} * sb_1(t) * c_d(z') + \\s_{{sb}_2} * r_{{sb}_2} * sb_2(t) * c_d(z') \]
\(sb_1(t + 1) = (1-r_i) * \int_L^U [c_p^{site}(z, \theta) * c_r^{site}(z,\theta)] n(z, t)dz\)
\(sb_2(t + 1) = s_{{sb}_1} * (1-r_{{sb}_1}) * sb_1(t)\)
The sub kernels are:
\(P^{site}(z',z, \theta) = s^{site}(z, \theta) * G^{site}(z',z, \theta)\)
\(F^{site}(z',z, \theta) = c_{{p}}^{site}(z, \theta) * c_{r}^{site}(z, \theta) * c_d(z')\)
\(\theta^{t} \sim Norm(\mu^{t}, \sigma^{t})\)
\(\theta^{p} \sim Gamma(\alpha^{p}, \beta^{p})\)
And the size/climate dependent vital rate functions and example R code are:
\(Logit(s^{site}(z, \theta)) = (\alpha_s + \alpha_{s}^{site}) + \beta_s^{z} * z + \beta_s^{t} * \theta^{t} + \beta_s^{p} * \theta^{p}\)
glmer(survival ~ size + temp + precip + (1 | site), data = my_surv_data, family = binomial())
\(G^{site}(z',z,\theta) = f_G(z', \mu_G^{site}(z, \theta), \sigma_G)\)
\(\mu_G^{site}(z, \theta) = (\alpha_G + \alpha_G^{site}) + \beta_G^{z} * z + \beta_G^{t} * \theta^{t} + \beta_G^{p} * \theta^{p}\)
lmer(size_next ~ size + temp + precip + (1 | site), data = my_grow_data)
\(Logit(c_{p}^{site}(z, \theta)) = (\alpha_{c_p} + \alpha_{c_p}^{site}) + \beta_{c_p}^{z} * z + \beta_{c_p}^{t} * \theta^{t} + \beta_{c_p}^{p} * \theta^{p}\)
glmer(repro ~ size + temp + precip + (1 | site), data = my_repro_data, family = binomial())
\(log(c_{r}^{site}(z, \theta)) = (\alpha_{c_r} + \alpha_{c_r}^{site}) + \beta_{c_r}^{z} * z + \beta_{c_r}^{t} * \theta_{t} + \beta_{c_r}^{p} * \theta_{p}\)
glmer(flower_n ~ size + temp + precip + (1 | site), data = my_flower_data, family = poisson())
\(c_d(z') = f_{c_d}(z', \mu_{c_d}, \sigma_{c_d})\)
mean(my_recr_data$size_next); sd(my_recr_data$size_next)
Since this example doesn’t work with actual data, we need to simulate some parameters for each site. Additionally, we’ll specify the distribution parameters for each \(\theta\) and write a function that samples them once.
library(ipmr)
set.seed(51)
# Set up the sites, and the parameters
<- LETTERS[1:6]
sites
<- list(
all_pars r_i = 0.6,
r_sb1 = 0.2,
r_sb2 = 0.3,
s_sb1 = 0.4,
s_sb2 = 0.1,
mu_c_d = 4,
sigma_c_d = 0.9,
sigma_g = 3,
beta_s_z = 2.5,
beta_s_prec = 0.3,
beta_s_temp = -0.5,
beta_g_z = 0.99,
beta_g_prec = 0.01,
beta_g_temp = 0.1,
beta_c_p_z = 0.4,
beta_c_p_prec = 0.6,
beta_c_p_temp = -0.3,
beta_c_r_z = 0.005,
beta_c_r_prec = -0.3,
beta_c_r_temp = 0.9
)
<- rnorm(6, mean = 4.5, sd = 1) %>%
g_alphs as.list() %>%
setNames(
paste('alpha_g_', sites, sep = "")
)
<- rnorm(6, mean = -7, sd = 1.5) %>%
s_alphs as.list() %>%
setNames(
paste('alpha_s_', sites, sep = "")
)
<- rnorm(6, mean = -10, sd = 2) %>%
c_p_alphs as.list() %>%
setNames(
paste('alpha_c_p_', sites, sep = "")
)
<- runif(6, 0.01, 0.1) %>%
c_r_alphs as.list() %>%
setNames(
paste('alpha_c_r_', sites, sep = "")
)
<- c(all_pars,
all_pars
g_alphs,
s_alphs,
c_p_alphs,
c_r_alphs)
# This list contains parameters that define the distributions for environmental
# covariates
<- list(
env_params temp_mu = 12,
temp_sd = 2,
prec_shape = 1000,
prec_rate = 2
)
# This function samples the environmental covariate distributions and returns
# the values in a named list. We can reference the names in the list in our
# vital rate expressions.
<- function(env_params) {
sample_fun
<- rnorm(1, env_params$temp_mu, env_params$temp_sd)
temp
<- rgamma(1, env_params$prec_shape, env_params$prec_rate)
prec
<- list(temp = temp,
out prec = prec)
return(out)
}
We’ve written out our model - now we can implement it. We’ll use the general, density independent, stochastic parameter resampled machinery for this.
<- init_ipm(sim_gen = "general",
ex_ipm di_dd = "di",
det_stoch = "stoch",
kern_param = "param") %>%
define_kernel(
name = "P_site",
family = "CC",
# site is appended so that we don't have to write out s_A, s_B, etc.
formula = s_site * g_site * d_z,
s_site_lin = alpha_s_site +
* z_1 +
beta_s_z * prec +
beta_s_prec * temp,
beta_s_temp s_site = 1 / (1 + exp(-s_site_lin)),
mu_g_site = alpha_g_site +
* z_1 +
beta_g_z * prec +
beta_g_prec * temp,
beta_g_temp g_site = dnorm(z_2, mu_g_site, sigma_g),
data_list = all_pars,
states = list(c("z")),
uses_par_sets = TRUE,
par_set_indices = list(site = LETTERS[1:6]),
evict_cor = TRUE,
evict_fun = truncated_distributions("norm", "g_site")
%>%
) define_kernel(
name = "F_site",
family = "CC",
formula = c_p_site * c_r_site * c_d * d_z,
c_p_site_lin = alpha_c_p_site +
* z_1 +
beta_c_p_z * prec +
beta_c_p_prec * temp,
beta_c_p_temp c_p_site = 1 / (1 + exp(-c_p_site_lin)),
c_r_site = exp(alpha_c_r_site +
* z_1 +
beta_c_r_z * prec +
beta_c_r_prec * temp),
beta_c_r_temp c_d = dnorm(z_2, mu_c_d, sigma_c_d),
data_list = all_pars,
states = list(c("z", "sb1", "sb2")),
uses_par_sets = TRUE,
par_set_indices = list(site = LETTERS[1:6]),
evict_cor = TRUE,
evict_fun = truncated_distributions("norm", "c_d")
%>%
) define_kernel(
name = "go_sb_1_site",
family = "CD",
formula = (1 - r_i) * c_p_site * c_r_site * d_z,
c_p_site_lin = alpha_c_p_site +
* z_1 +
beta_c_p_z * prec +
beta_c_p_prec * temp,
beta_c_p_temp c_p_site = 1 / (1 + exp(-c_p_site_lin)),
c_r_site = exp(alpha_c_r_site +
* z_1 +
beta_c_r_z * prec +
beta_c_r_prec * temp),
beta_c_r_temp data_list = all_pars,
states = list(c("z", "sb1")),
uses_par_sets = TRUE,
par_set_indices = list(site = LETTERS[1:6]),
evict_cor = FALSE
%>%
) define_kernel(
name = "go_sb_2",
family = "DD",
formula = s_sb1 * (1 - r_sb1),
data_list = all_pars,
states = list(c("sb1", "sb2")),
uses_par_sets = FALSE,
evict_cor = FALSE
%>%
) define_kernel(
name = "leave_sb_1",
family = "DC",
formula = s_sb1 * r_sb1 * c_d,
c_d = dnorm(z_2, mu_c_d, sigma_c_d),
data_list = all_pars,
states = list(c("z", "sb1")),
uses_par_sets = FALSE,
evict_cor = TRUE,
evict_fun = truncated_distributions("norm", "c_d")
%>%
) define_kernel(
name = "leave_sb_2",
family = "DC",
formula = s_sb2 * r_sb2 * c_d,
c_d = dnorm(z_2, mu_c_d, sigma_c_d),
data_list = all_pars,
states = list(c("z", "sb2")),
uses_par_sets = FALSE,
evict_cor = TRUE,
evict_fun = truncated_distributions("norm", "c_d")
%>%
) define_impl(
make_impl_args_list(
kernel_names = c("P_site",
"F_site",
"go_sb_1_site",
"go_sb_2",
"leave_sb_1",
"leave_sb_2"),
int_rule = rep("midpoint", 6),
state_start = c("z", "z", "z", "sb1", "sb1", "sb2"),
state_end = c("z", "z", "sb1", "sb2", "z", "z")
)%>%
) define_domains(
z = c(0.1, 100, 200)
%>%
) define_pop_state(
n_z = runif(200),
n_sb1 = 20,
n_sb2 = 20
%>%
) define_env_state(
env_state = sample_fun(env_params),
data_list = list(env_params = env_params,
sample_fun = sample_fun)
%>%
) make_ipm(
iterations = 20,
kernel_seq = sample(LETTERS[1:6], 20, replace = TRUE),
normalize_pop_size = TRUE
)
lambda(ex_ipm)