In support of our manuscript, we developed an R (R Core Team 2023) package to help construct integrated species distribution models (ISDMs) from disparate datasets in a simple and reproducible framework. This R Markdown document presents an illustration of the package by creating an ISDM for red-listed plant species obtained via the Vascular Plant Field Notes survey program in Norway, as well as citizen-science data obtained from Global Biodiversity Information Facility (GBIF). The first step in exploring this document is to download the package using the following script:
intSDM has two main functions: startWorkflow()
and sdmWorkflow()
. The first of which is designed to setup
and specify all the individual components of the workflow using
different slot functions. The functions related to this object
include:
Function name | Function use |
---|---|
.$plot() |
Plot data and other objects required for the model. |
.$addStructured() |
Add data not available on GBIF. |
.$addMesh() |
Create an inla.mesh object. |
.$addGBIF() |
Add data from GBIF. |
.$addArea() |
Specify sampling domain. |
.$addCovariates() |
Add spatial covariates. |
.$crossValidation() |
Specify the cross-validation method. |
.$modelOptions() |
Add R-INLA (Martins et al. 2013), inlabru (Bachl et al. 2019) and PointedSDMs (Mostert and O’Hara 2023) options. |
.$specifySpatial() |
Add penalizing complexity priors to the spatial effects. |
.$biasFields() |
Specify an additional spatial effect for a dataset. |
.$workflowOutput() |
Specify the output of the workflow. |
.$obtainMeta() |
Obtain metadata for the occurrence records. |
To start the workflow, we need to specify the coordinate reference system (CRS) considered for the analysis as well as the species used. The three species selected for this analysis from the Vascular Plant Field Notes (arnica montana, fraxinus excelsior and ulmus glabra) have records predominantly spread across the southern and eastern part of Norway. However the species ulmus glabra (which has the largest spread of the three species selected), has some of the records approaching the middle and middle-upper parts of Norway.
The other arguments (saveOptions and Save) should be used if the user wants the objects to be saved in a folder created by the function.
proj <- '+proj=tmerc +lat_0=0 +lon_0=15 +k=0.9996 +datum=WGS84 +units=km +x_0=500 +y_0=0 +no_defs'
#proj <- '+proj=utm +zone=32 +datum=WGS84 +units=km +no_defs +type=crs'
workflow <- startWorkflow(
Projection = proj,
Species = c("Fraxinus_excelsior", "Ulmus_glabra", "Arnica_montana"),
saveOptions = list(projectName = 'Vascular'), Save = FALSE
)
Next we specify the study domain for the study: in this case Norway. This can be achieved using either the countryName argument which will then access the object from the giscoR (Hernangómez 2023) R package, or by using Object and supplying our own polygon object. In this case we choose to add our own polygon object because the rigid coastline of a unedited Norway polygon will cause issues further down.
We then use .$plot()
to see what the boundary looks
like.
Norway <- fm_transform(giscoR::gisco_get_countries(country = 'Norway', resolution = 60),
proj)
Norway <- st_cast(st_as_sf(Norway), 'POLYGON')
Norway <- Norway[which.max(st_area(Norway)),]
Norway <- rmapshaper::ms_simplify(Norway, keep = 0.8)
Norway <- st_as_sf(fm_extensions(Norway, convex = c(10, 20))[[1]])
workflow$addArea(Object = Norway)
workflow$plot()
Species’ occurrence data is certainly the most important component of
a SDM, and intSDM has two slot functions to help you add data
into the workflow: .$addGBIF()
and
.$addStructured()
.
The former of which uses the rgbif package to download data
directly from GBIF. For this function, we need to specify the name of
the dataset (datasetName) and the type of the dataset
(datasetType) – which can be one of PO, PA or
Counts. The … argument is used to specify any
addiditional arguments for rgbif::occ_data()
(Chamberlain, Oldoni, and Waller 2022) (in this
case, limit and datasetKey). If
datasetType = 'PA'
, absences may be generated
using generateAbsences = TRUE
. This will treat the
obtained data as a checklist survey data: combining all the sampling
locations for the species in the dataset, and creating absences when a
given species did not occur in a given region.
For this example we consider three sources of data. The Vascular Plant Field Notes is a collection of observations provided by the Norwegian University of Science and Technology’s (NTNU) (Norwegian University of Science and Technology 2023) University Museum and the University of Oslo (UiO) (University of Oslo 2023), containing records of standardized cross-lists of most vascular plants found in Norway. We treat these two datasets as detection/non-detection data, generating absences in sampling locations where the species does not occur.
The other source of data considered comes from the Norwegian Species Observation service (published by Artsdatabanken) (The Norwegian Biodiversity Information Centre 2023). This data is a collection of citizen science records – and as a result we treat it as presence-only data.
workflow$addGBIF(datasetName = 'CZ',
datasetType = 'PO',
coordinateUncertaintyInMeters = '0,100',
limit = 10000,
datasetKey = 'b124e1e0-4755-430f-9eab-894f25a9b59c')
workflow$addGBIF(datasetName = 'UiO',
datasetType = 'PA',
limit = 10000,
coordinateUncertaintyInMeters = '0,100',
generateAbsences = TRUE,
datasetKey = 'e45c7d91-81c6-4455-86e3-2965a5739b1f')
workflow$addGBIF(datasetName = 'NTNU',
datasetType = 'PA',
limit = 10000,
coordinateUncertaintyInMeters = '0,100',
generateAbsences = TRUE,
datasetKey = 'd29d79fd-2dc4-4ef5-89b8-cdf66994de0d')
workflow$plot(Species = TRUE)
Covariate data may be added to the model using
.$addCovariates()
. Layers from WorldClim (Fick and Hijmans 2017) may be accessed
using the worldClim argument, and land cover covariates may be
obtained using the landCover argument.. This in turn uses the
geodata (Hijmans et al. 2023) R
package to obtain spatRaster objects of the covariates cropped
around the study domain. Other covariate layers may be added using the
Object argument.
We can then view the metadata for the obtained occurrence records
using the .$obtainMeta()
function, which will give us the
citation for the datasets used in this workflow.
One of the objects required for our model is an fm_mesh_2d
object, which we will use in the approximation of our spatial random
fields. The .$addMesh()
function’s argument … uses
fmesher::fm_mesh_2d_inla()
to create this object.
Furthermore we also used penalizing complexity (PC) priors
in our model, which are designed to control the spatial range and
standard deviation in the GRF’s Matérn covariance function in order to
reduce over-fitting in the model (Simpson et al.
2017). An integrate-to-zero constraint is set to stabilize the
model, using constr = TRUE
.
We also specify priors for the intercept terms and the fixed effects of the models. In this vignette we choose tight priors (mean = 0; precision = 1).
We specify an additional spatial effect for the citizen science data
using .$biasFields()
to account biases in the collection
process (Simmonds et al. 2020). The
same
To simplify the model, we set the parameter used to connect the shared spatial effect between the likelioods to a fixed value of 1.
We them specify the model output using the function
.$workflowOutput
. In this workflow, we want to return the
R-INLA model objects and the maps of the predicted
intensity.
The workflow is then implemented using the sdmWorkflow()
function. We also specify some R-INLA options to speed and
stabilize the estimates of the model using inlaOptions
.
Due to the lengthy time it requires to produce this map, inference is not made in this vignette. However the script is available below for the user to run the model themselves.
Maps of the predicted intensity are given as follows.
And model summaries as: