Spatial autocorrelation can severely bias transfer function performance estimates. It can also bias reconstruction significance tests, but I suspect the bias is not as severe.
This vignette shows how to use the autosim argument of randomTF() to build the reference distribution from autocorrelated simulated environmental variables instead of the default uniformly distributed, independent environmental variables.
library(palaeoSig)
library(rioja)
library(sf)
library(gstat)
library(dplyr)
library(tibble)
library(tidyr)
library(purrr)
library(ggplot2)
set.seed(42) # for reproducibility
We use the foraminifera dataset by Kucera et al. (2005). We use some of the samples to represent a core.
# load data
data(Atlantic)
meta <- c("Core", "Latitude", "Longitude", "summ50")
Atlantic <- as.data.frame(Atlantic) # prevents rowname warnings

# pseudocore as no fossil foram data in palaeoSig
fosn <- Atlantic |>
  filter(between(summ50, 5, 10)) |>
  slice_sample(n = 20)
# remaining samples as training set
Atlantic <- Atlantic |>
  anti_join(fosn, by = "Core") |>
  slice_sample(n = 300) # random subset to speed analysis up
Atlantic_meta <- Atlantic |>
  select(one_of(meta)) # to keep rdist.earth happy
Atlantic <- Atlantic |> # species
  select(-one_of(meta))
fos <- fosn |>
  select(-one_of(meta))
We need to convert the metadata into an sf object for further calculations.
Atlantic_meta <- st_as_sf(
  x = Atlantic_meta,
  coords = c("Longitude", "Latitude"),
  crs = 4326
)
Fitting the variogram is the hardest part. There are several types of variogram model available, e.g. exponential “Exp”, spherical “Sph”, Gaussian “Gau” and Matérn “Mat”. These have different shapes, so it is important to find one that fits the empirical variogram well.
# Estimate the variogram model for the environmental variable of interest
ve <- variogram(summ50 ~ 1, data = Atlantic_meta)
vem <- fit.variogram(
  object = ve,
  model = vgm(40, "Mat", 5000, .1, kappa = 1.8)
)
plot(ve, vem)
vem
#>   model       psill    range kappa
#> 1   Nug   0.7358321    0.000   0.0
#> 2   Mat 185.3131517 3404.449   1.8
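To get a feel for how the candidate model shapes differ before choosing one, it can help to draw them on a common distance grid. The sketch below uses gstat's vgm() and variogramLine(); the sill, range, kappa and maximum distance are illustrative assumptions, not values fitted to the Atlantic data.

```r
library(gstat)

# Candidate model types named in the text
models <- c("Exp", "Sph", "Gau", "Mat")

# Evaluate each model on a common distance grid.
# psill, range, kappa and maxdist are illustrative values, not fitted ones.
curves <- do.call(rbind, lapply(models, function(m) {
  v <- vgm(psill = 1, model = m, range = 1000, kappa = 1.8)
  line <- variogramLine(v, maxdist = 5000, n = 100)
  line$model <- m
  line
}))

# One curve per model: semivariance against distance
matplot(
  x = unique(curves$dist),
  y = sapply(models, function(m) curves$gamma[curves$model == m]),
  type = "l", lty = 1, col = 1:4,
  xlab = "distance", ylab = "semivariance"
)
legend("bottomright", legend = models, col = 1:4, lty = 1)
```

The curves differ mainly near the origin and in how quickly they approach the sill, which is where a poor choice of model tends to show up against the empirical variogram.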
Now we can use gstat::krige to do unconditional Gaussian simulation and make simulated environmental fields with the same spatial structure as the observed variable. This step is quite slow with large datasets.
# Simulating environmental variables
sim <- krige(sim ~ 1,
  locations = Atlantic_meta,
  dummy = TRUE,
  nsim = 100,
  beta = mean(Atlantic_meta$summ50),
  model = vem,
  newdata = Atlantic_meta
)
#> [using unconditional Gaussian simulation]
# convert sf back to a regular data.frame
sim <- sim |> st_drop_geometry()
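As a rough illustration of what unconditional Gaussian simulation produces, the base R sketch below draws fields from a multivariate normal distribution whose covariance decays with distance. It uses a Cholesky factorisation rather than the simulation algorithm in gstat, and the locations, sill, range and mean are hypothetical values chosen only for the example.

```r
set.seed(1)

# Hypothetical sample locations on a plane (arbitrary units)
n <- 200
xy <- cbind(x = runif(n, 0, 100), y = runif(n, 0, 100))

# Exponential covariance: C(h) = psill * exp(-h / range)
psill <- 4
range <- 20
h <- as.matrix(dist(xy))
covmat <- psill * exp(-h / range)

# Unconditional simulation: field = mean + L z, where covmat = L L^T
# and z is a vector of independent standard normal draws
L <- t(chol(covmat))
nsim <- 100
z <- matrix(rnorm(n * nsim), nrow = n)
fields <- 10 + L %*% z # one simulated field per column

dim(fields)
```

Each column is one simulated variable: nearby locations get similar values, while different columns are independent of each other. The Cholesky approach scales poorly with the number of locations, so it is only meant to show the idea.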
Now we can run randomTF using the simulated environmental variables.
rtf_auto <- randomTF(
  spp = Atlantic,
  env = Atlantic_meta$summ50,
  fos = fos,
  autosim = sim,
  fun = MAT,
  col = "MAT.wm"
)
#> Warning in Merge(object$y, newdata, split = TRUE): Some row names were changed
#> to avoid duplicates.
#> Warning in Merge(object$y, newdata, split = TRUE): Some row names were changed
#> to avoid duplicates.
plot(rtf_auto)
# For comparison: the default reference distribution
# (independent, uniformly distributed variables; no autosim)
rtf_ind <- randomTF(
  spp = Atlantic,
  env = Atlantic_meta$summ50,
  fos = fos,
  fun = MAT,
  col = "MAT.wm"
)
#> Warning in Merge(object$y, newdata, split = TRUE): Some row names were changed
#> to avoid duplicates.
#> Warning in Merge(object$y, newdata, split = TRUE): Some row names were changed
#> to avoid duplicates.
plot(rtf_ind)
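The two plots differ only in the reference distribution against which the observed reconstruction is judged. The sketch below shows one common way a Monte Carlo p-value is obtained from such a distribution. The numbers are synthetic stand-ins, not output from randomTF(), and the exact convention used inside palaeoSig may differ.

```r
set.seed(1)

# Synthetic stand-ins, NOT output from randomTF():
# proportion of fossil variance explained by the real reconstruction ...
obs_ex <- 0.30
# ... and by reconstructions trained on 999 simulated variables
null_ex <- rbeta(999, shape1 = 2, shape2 = 10)

# Proportion of reference values at least as large as the observed one,
# counting the observation itself so the p-value is never zero
p_value <- mean(c(obs_ex, null_ex) >= obs_ex)
p_value
```

Changing how the reference variables are generated changes null_ex and therefore the p-value, which is why the choice between independent and autocorrelated simulations matters.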
Kucera, M., Weinelt, M., Kiefer, T., Pflaumann, U., Hayes, A., Weinelt, M., Chen, M.-T., Mix, A.C., Barrows, T.T., Cortijo, E., Duprat, J., Juggins, S., Waelbroeck, C. 2005. Reconstruction of the glacial Atlantic and Pacific sea-surface temperatures from assemblages of planktonic foraminifera: multi-technique approach based on geographically constrained calibration datasets. Quaternary Science Reviews 24, 951-998 doi:10.1016/j.quascirev.2004.07.014.
Telford, R.J., Birks, H.J.B. 2009. Evaluation of transfer functions in spatially structured environments. Quaternary Science Reviews 28, 1309-1316 doi:10.1016/j.quascirev.2008.12.020.