Modifying Stochastic IPMs in PADRINO

Modifying the environment of a PADRINO IPM

Sometimes, we may want to modify the environmental sequence of a stochastic IPM stored in PADRINO. As of now, there is no single interace for doing this, and so you may need to write some code on your own. This vignette is meant to provide a bit of help with that. It will make use of Rpadrino and ipmr to modify proto_ipms, and doesn’t assume any prior knowledge of the data structure or of PADRINO itself (I don’t think, anyway. Let me know if I’m incorrect via a Github Issue).

Finding parameter-resampled IPMs

These aren’t explicitly marked in the Metadata table. However, we can find them by searching the EnvironmentalVariables table, specifically the ipm_id column.

library(Rpadrino)

pdb <- pdb_download(FALSE)

unique(pdb$EnvironmentalVariables$ipm_id)
## [1] "aaaa15" "aaaa16" "aaaa21" "aaaa54" "aaaa59"

For now, we’ll just work with the two IPMs from Westerband et al. (2016):

stoch_ids <- c("aaaa15", "aaaa16")

Ok, now we have the data we want to work with, we can check out various ways of modifying them to explore the questions we want to answer!

re-set environmental variables w/ pre-defined sequence of values

First, we’ll examine how to re-set IPMs to sample from a pre-determined sequence of environmental values. This is useful for exploring things like, for example, environmental autocorrelation and/or enhancing reproducibility of our subsequent analyses.

This will require a few steps:

  1. Create proto_ipm objects from the PADRINO data.

  2. Remove the current stochastic environmental state expressions

  3. Replace them with something that will generate the values we want.

Create the proto_ipm’s and find the stochastic parameter names

First, we should investigate what’s in each model by creating and printing the proto_ipms.

# Step 1 - create proto_ipms, and then figure out which parameters in the
# vital rate expressions we need to work with.

protos <- pdb_make_proto_ipm(pdb, stoch_ids)
## 'ipm_id' aaaa15 has resampled parameters, resetting 'det_stoch' to 'stoch' and 
## 'kern_param' to 'param'!
## Default number of iterations for 'pdb_make_ipm' will be 50. Modify this behavior 
## with the 'addl_args' argument of 'pdb_make_ipm'.
## 'ipm_id' aaaa16 has resampled parameters, resetting 'det_stoch' to 'stoch' and 
## 'kern_param' to 'param'!
## Default number of iterations for 'pdb_make_ipm' will be 50. Modify this behavior 
## with the 'addl_args' argument of 'pdb_make_ipm'.
protos[[1]]
## A simple, density independent, stochastic, parameter-resampled proto_ipm with 2 kernels defined:
##  P, F 
## 
## Kernel formulae:
## 
## P: s * g
## F: f
## 
## Vital rates:
## 
## s: 1/(1 + exp(-(s_0 + s_1 * leafarea_1 + s_2 * j + s_3 * leafarea_1 * 
##     j)))
## g_mu: g_0 + g_1 * leafarea_1 + g_2 * j + g_3 * A_max + g_4 * leafarea_1 * 
##     j + g_5 * leafarea_1 * A_max + g_6 * j * A_max + g_7 * leafarea_1 * 
##     j * A_max
## g: dnorm(leafarea_2, g_mu, g_sd)
## f: r_p * r_o * n_f * n_s * s_s * sdl_s * sdl_size
## r_p: 1/(1 + exp(-(r_0 + r_1 * leafarea_1 + r_2 * j + r_3 * leafarea_1 * 
##     j)))
## r_o: exp(i_0 + i_1 * leafarea_1 + i_2 * j + i_3 * leafarea_1 * j)
## s_s: ifelse(j < 6, s_sl, s_sh)
## sdl_s: ifelse(j < 6, sdl_sl, sdl_sh)
## sdl_size: ifelse(j < 6, sdl_m_l, sdl_m_h)
## sdl_m_l: dnorm(leafarea_2, sdl_meanl, sdl_sdl)
## sdl_m_h: dnorm(leafarea_2, sdl_meanh, sdl_sdh)
## 
## Parameter names:
## 
##  [1] "s_0"       "s_1"       "s_2"       "s_3"       "g_0"       "g_1"      
##  [7] "g_2"       "g_3"       "g_4"       "g_5"       "g_6"       "g_7"      
## [13] "g_sd"      "r_0"       "r_1"       "r_2"       "r_3"       "i_0"      
## [19] "i_1"       "i_2"       "i_3"       "n_f"       "n_s"       "s_sl"     
## [25] "s_sh"      "sdl_sl"    "sdl_sh"    "sdl_meanl" "sdl_meanh" "sdl_sdl"  
## [31] "sdl_sdh"  
## 
## All parameters in vital rate expressions found in 'data_list':  FALSE
## Did not test parameters in 'define_env_state()'.
## 
## Items in vital rate expressions not found in 'data_list':
## 
## *P*
## j
## A_max 
## 
## *F*
## j 
## 
## This may be because 'define_domains()' has not yet been called.
##  If it has been called, then double check the model definition and 'data_list'.
## 
## 
## Domains for state variables:
## 
## leafarea: lower_bound = 0.57, upper_bound = 11.9, n_meshpoints = 50
## 
## Population states defined:
## 
## n_leafarea: Pre-defined population state.
## 
## Internally generated model iteration procedure:
## 
## n_leafarea_t_1: right_mult(kernel = P, vectr = n_leafarea_t) + right_mult(kernel = F, 
##     vectr = n_leafarea_t)

It seems that j and A_max are the two that we need to create sequences for. Let’s double check the EnvironmentalVariables table to be sure:

pdb$EnvironmentalVariables[pdb$EnvironmentalVariables$ipm_id %in% stoch_ids, ]
EnvironmentalVariables Table, subsetted using ‘stoch_ids’
ipm_id env_variable vr_expr_name env_range env_function model_type
aaaa15 LightLevel j NULL sample(j_seq) Substituted
aaaa15 LightLevel j_seq 1:5 NULL Parameter
aaaa15 A_max A_max NULL Unif(a_max_low, a_max_high) Substituted
aaaa15 A_max a_max_low 5 NULL Parameter
aaaa15 A_max a_max_high 7 NULL Parameter
aaaa16 LightLevel j NULL sample(j_seq) Substituted
aaaa16 LightLevel j_seq 1:5 NULL Parameter
aaaa16 A_max A_max NULL Unif(a_max_low, a_max_high) Substituted
aaaa16 A_max a_max_low 5 NULL Parameter
aaaa16 A_max a_max_high 8 NULL Parameter

Create new stochastic parameter values

Next, we need to simulate values for j and A_max. This next chunk replicates what PADRINO is doing in a simulation, it just does the sampling ahead of time. In your own code, you’ll want to replace it with, say, creating autocorrelated sequences of vectors, or a specific climate change scenario.

use_j <- sample(1:5, size = 50, replace = TRUE)
use_a <- runif(50, 5, 8)

# The names of this data.frame need to be the same as the names of the variables
# we're substituting in the IPM. Otherwise, the function we write in the next
# block won't work!

use_env_data <- data.frame(j = use_j, A_max = use_a)

Write our sampling function

We now have our values! But how to insert them and then run the model? We need to write a function that samples these values in the order we want, and then sets their names correctly. It should return a named list. For this example, we’ll assume that above, we created them in the correct order, and that each row of the resulting data.frame corresponds to a time step. For now, don’t worry about where index will come from - that will become clear shortly.

sample_env <- function(env_data, index) {

  as.list(env_data[index, ]) %>%
    setNames(names(env_data))

}

Remove the current environmental variation

We’re ready to begin working with the actual proto_ipm objects! For now, we’ll just modify the first one. The first step is to eliminate the current environmental information from the proto_ipm. That information is stored in the env_state column. By default, it is NA unless there is a continuous environmental variation, so we want to re-set it to that state before we insert our own information.

# For this demonstration, we'll create copy in the 'protos' object and overwrite
# that.

protos$aaaa15_2 <- protos$aaaa15

protos$aaaa15_2$env_state <- lapply(
  protos$aaaa15_2$env_state, 
  function(x) return(NA)
)

# In practice, you may just want to to overwrite the proto_ipm object in place.
# We can still recover the object by re-building the proto_ipm list from the
# intact version of the 'pdb' object.

# protos$aaaa15$env_state <- lapply(
#   protos$aaaa15$env_state, 
#   function(x) return(NA)
# )

# If you want to do get rid of both at the same time, use a double-lapply:

# protos <- lapply(protos, 
#                  function(x) {
#                    x$env_state <- lapply(x$env_state, 
#                                          function(y) return(NA))
#                    
# })

Insert our environmental variation

Next, we’re going to insert our environmantal information using ipmr’s define_env_state(). In the call to sample_env(), we set index = t. t is an internal variable that ipmr defines to track what timestep of the simulation it is currently on. This will sample from the correct row of use_env_data. We also supply the use_env_data object and sample_env() function in the data_list slot so that they can be found when the simulation runs.

# Replace the environemntal set up with the one we just created using ipmrs'
# define_env_state.

protos$aaaa15_2 <- define_env_state(
  protos$aaaa15_2,
  env_vars = sample_env(use_env_data, index = t),
  data_list = list(use_env_data = use_env_data,
                   sample_env   = sample_env)
)

Rebuild the models

We can rebuild the models now using pdb_make_ipm():

ipms <- pdb_make_ipm(protos)

lambda(ipms)
## $aaaa15
##     lambda 
## -0.1035558 
## 
## $aaaa16
##    lambda 
## 0.2710374 
## 
## $aaaa15_2
##     lambda 
## -0.1034538
Environmental covariate sequence from unaltered PADRINO data (first 10 iterations)
j A_max
2 6.175675
2 5.713312
2 6.277382
4 6.123218
4 6.770578
1 5.061616
5 6.075462
1 6.907894
3 6.833537
2 5.919426
Environmental covariate sequence from custom function (first 10 iterations)
j A_max
5 7.420765
5 6.026861
5 7.400723
4 6.439495
4 6.635122
2 6.653187
1 6.017218
4 5.551615
1 7.911561
1 7.074661

And finally, a quick check to verify that our pre-defined sequence is indeed the one that got used for aaaa15_2:

all.equal(ipms$aaaa15_2$env_seq$j,     use_env_data$j) 
## [1] TRUE
all.equal(ipms$aaaa15_2$env_seq$A_max, use_env_data$A_max)
## [1] TRUE

Modifying parameter values

This is a bit simpler, because we just need to make some alterations to a table, rather than creating new expressions. Unfortunately, we can’t use the parameters()<- interface for environmental variation because that setter function does not modify the right place in the proto_ipm object.

Identify parameter names

The env_variable column of the EnvironmentalVariables usually provides a very brief summary of what these parameters are, but not always. In general, PADRINO notation in the vr_expr_name column mirrors the publication notation. So to understand the meaning of A_max, we need to consult the original publication (Westerband, A. C., Horvitz, C. C. (2017). Photosynthetic rates influence the population dynamics of understory herbs in stochastic light environments. Ecology, 98 (2): 370-381). Upon doing this, we see that A_max corresponds to photosynthetic capacity. For this instance, we’ll pretend that we want to set the lower limit of A_max to a smaller value, and the upper limit to a higher value.

EnvironmentalVariables table, subsetted using ‘stoch_ids’
ipm_id env_variable vr_expr_name env_range env_function model_type
aaaa15 LightLevel j NULL sample(j_seq) Substituted
aaaa15 LightLevel j_seq 1:5 NULL Parameter
aaaa15 A_max A_max NULL Unif(a_max_low, a_max_high) Substituted
aaaa15 A_max a_max_low 5 NULL Parameter
aaaa15 A_max a_max_high 7 NULL Parameter
aaaa16 LightLevel j NULL sample(j_seq) Substituted
aaaa16 LightLevel j_seq 1:5 NULL Parameter
aaaa16 A_max A_max NULL Unif(a_max_low, a_max_high) Substituted
aaaa16 A_max a_max_low 5 NULL Parameter
aaaa16 A_max a_max_high 8 NULL Parameter

We can see that A_max is given by a Uniform distribution with parameters A_max_low and A_max_high. These two, as their names imply, are the lower and upper limits for A_max, respectively. We can modify these just as we do any other data.frame. We’ll create a copy of the pdb object to modify, but you can modify it in place and re-download an unaltered copy, or save the unaltered copy before altering it to preserve the original version. Again, we’ll only modify aaaa15.

pdb_2 <- pdb

inds <- which(pdb_2$EnvironmentalVariables$vr_expr_name %in% paste0("a_max_",
                                                                    c("low", 
                                                                      "high")) &
                pdb_2$EnvironmentalVariables$ipm_id == "aaaa15")

pdb_2$EnvironmentalVariables$env_range[inds[1]] <- 3 # Bump min down two steps
pdb_2$EnvironmentalVariables$env_range[inds[2]] <- 9 # Bump max up two steps

# The rest is the usual building workflow: make a proto_ipm, then convert to
# ipm.

new_protos <- pdb_make_proto_ipm(pdb_2, "aaaa15")
## 'ipm_id' aaaa15 has resampled parameters, resetting 'det_stoch' to 'stoch' and 
## 'kern_param' to 'param'!
## Default number of iterations for 'pdb_make_ipm' will be 50. Modify this behavior 
## with the 'addl_args' argument of 'pdb_make_ipm'.
new_ipms   <- pdb_make_ipm(new_protos)

lambda(new_ipms)
## log(lambda) is returned by default for stochastic models. Set 'log = FALSE' for lambda on linear scale.
## $aaaa15
##     lambda 
## -0.1033656

Changing distributions

What if we want to change the distribution that a certain environmental variable is sampled from? For example, rather than use a uniform distribution, we decide we want to sample it from a Gamma distribution. There are a couple steps we need to go through. They are:

  1. Write a function that substitutes distribution names. We can find PADRINO’s distribution naming convention in the “Abbreviation” column here.

  2. Write a function to insert the new parameter values and names into the EnvironmentalVariables Table.

  3. Wrap 1-2 in a top-level function that we’ll actually use.

  4. Use it to modify EnvironmentalVariables, then rebuild the IPMs using pdb_make_proto_ipm() and pdb_make_ipm().

NB: The functions we define in 1-3 will probably be incorporated into Rpadrino at some point in the near future, but further testing for stability is needed before they become part of the package. As such, use them at your own risk!

The function in 1 will make use of rlang to safely modify function names and arguments. It is possible to do this with gsub("Unif", "Gamma", pdb$EnvironmentalVariables$env_function[index]) (where index is the row number of the function you want to update), but could lead to unexpected results sometimes.

Top-level function

This is probably the most straightforward of the functions to write. We know that there two steps we have to implement that this wraps: substitution the functioal form of the distribution, and substituting the parameter values. Additionally, we’re going to insert a subsetting capacity so that we can pass a larger database loop over a set of ipm_ids to rapidly update many models at once.

library(rlang)

sub_env_fun <- function(db,          # pdb object
                        parameter,   # parameter created by the distribution
                        new_fun,     # The new distribution
                        new_pars,    # The parameters for the new distribution
                        id = NULL) { # ipm_id to operate on
  
  if(!is.null(id)) {
    
    if(length(id) > 1L) stop("'id' should only have length 1!")
    
    db <- pdb_subset(db, id)
  } 
  
  ev_tab <- db$EnvironmentalVariables %>%
    sub_rng_fun(parameter, new_fun, new_pars) %>%
    sub_rng_pars(parameter, new_pars)
  
  db$EnvironmentalVariables <- ev_tab
  
  return(db)
}

Substituting functions

The first substituting function we’ll write is the one that substitutes the distribution we wish to use into the env_function column. This will use the names of the new_pars object as arguments in a call to the new_fun function. Don’t worry about finding the correct R r<distribution> function here - we use PADRINO’s syntax to modify these.

sub_rng_fun <- function(ev_tab, parameter, new_fun, new_pars) {

  # Get the old function form
  old_fun_ind <- which(ev_tab$vr_expr_name == parameter)

  # Create a new function call, and insert new arguments
  arg_nms     <- syms(names(new_pars))
  new_fun     <- call2(new_fun, !!! syms(names(new_pars)))
  
  # Convert back to a string and insert into the table
  ev_tab$env_function[old_fun_ind] <- expr_text(new_fun)
  
  return(ev_tab)
}

There is a lot going on in this function, and most of it is beyond the scope of this vignette (if you’re curious about the details, see the Metaprogramming topics here). The key takeaway is that we take our new function, "Gamma" and convert it into a function call with arguments that are named after new_pars. Then we stick it back into the table, overwriting the old "Unif" function.

Adding a parameter value to the table is far less complicated. First, we remove the parameters associated with the old distribution. We have to do this because of the way that Rpadrino finds the parameters to functions in the EnvironmentalVariables table. The first lines of the function accomplish this. After that, we create a version of the EnvironmentalVariables table that contains our new parameters and add those back in.

sub_rng_pars <- function(ev_tab, parameter, new_pars) { 
  
  # Remove the existing parameters associated with the distribution
  # that we've replaced.
  
  if(sum(ev_tab$env_variable == parameter) > 1) {
    rm_ind <- ev_tab$env_variable == parameter & ev_tab$model_type == "Parameter"
    ev_tab <- ev_tab[!rm_ind, ]
  }
  
  
  new_rows <- data.frame(
    ipm_id       = rep(unique(ev_tab$ipm_id), length(new_pars)),
    env_variable = rep(parameter, length(new_pars)),
    vr_expr_name = names(new_pars),
    env_range    = unlist(new_pars),
    env_function = rep("NULL", length(new_pars)),
    model_type   = "Parameter"
  )
  rownames(new_rows) <- NULL
  
  out <- rbind(ev_tab, new_rows)
  return(out)
}

small_pdb <- pdb %>%
  pdb_subset("aaaa15")

new_pdb <- sub_env_fun(db = small_pdb, 
                       id = NULL, 
                       "A_max",
                       "Gamma", 
                       list(a_max_shape = 14, a_max_rate = 3.5))
New ‘EnvironmentalVariables’ table
ipm_id env_variable vr_expr_name env_range env_function model_type
aaaa15 LightLevel j NULL sample(j_seq) Substituted
aaaa15 LightLevel j_seq 1:5 NULL Parameter
aaaa15 A_max A_max NULL Gamma(a_max_shape, a_max_rate) Substituted
aaaa15 A_max a_max_shape 14 NULL Parameter
aaaa15 A_max a_max_rate 3.5 NULL Parameter

Notice that our new Gamma distribution parameters have found their way into the parameter table. We can now rebuild the model as using the same pipeline as above.

new_ipm <- new_pdb %>% 
  pdb_make_proto_ipm() %>%
  pdb_make_ipm()

lambda(new_ipm)
## $aaaa15
##     lambda 
## -0.1022285

It is probably ok to copy/paste the functions we defined above and apply to in your own analyses. However, as they are not yet incorporated into Rpadrino, I cannot promise that the results will be error free.