Compared to the univariate gridded Gaussian case, we now place the data irregularly and assume we observe counts rather than a Gaussian response.
library(magrittr)
library(dplyr)
library(ggplot2)
library(meshed)
set.seed(2021)
<- 30 # coord values for jth dimension
SS <- 2 # spatial dimension
dd <- SS^2 # number of locations
n <- 3 # number of covariates
p
# irregularly spaced
<- cbind(runif(n), runif(n))
coords colnames(coords) <- c("Var1", "Var2")
<- 1.5
sigmasq <- 0.5
nu <- 10
phi <- .1
tausq <- rnorm(n) * tausq^.5
ee
# covariance at coordinates
<- sigmasq * exp(-phi * as.matrix(dist(coords)))
CC # cholesky of covariance matrix
<- t(chol(CC))
LC # spatial latent effects are a GP
<- LC %*% rnorm(n)
w
# covariates and coefficients
<- matrix(rnorm(n*p), ncol=p)
X <- matrix(rnorm(p), ncol=1)
Beta
# univariate outcome, fully observed
<- 1 + X %*% Beta + w + ee
y_full
# now introduce some NA values in the outcomes
<- y_full %>% matrix(ncol=1)
y sample(1:n, n/5, replace=FALSE), ] <- NA
y[
<- coords %>%
simdata cbind(data.frame(Outcome_full= y_full,
Outcome_obs = y,
w = w))
%>% ggplot(aes(Var1, Var2, color=w)) +
simdata geom_point() +
scale_color_viridis_c() +
theme_minimal() + ggtitle("w: Latent GP")
%>% ggplot(aes(Var1, Var2, color=y)) +
simdata geom_point() +
scale_color_viridis_c() +
theme_minimal() + ggtitle("y: Observed outcomes")
In spmeshed
we can choose
settings$forced_grid=FALSE
, in which case we are fitting
the model \[ y = X \beta + w + \epsilon,
\] where \(w \sim MGP\) are
spatial random effects and \(\epsilon \sim
N(0, \tau^2 I_n)\). If instead we choose
settings$forced_grid=TRUE
, then the model becomes \[ y = X \beta + H w + \epsilon', \]
where \(w\) are now sampled on a grid
of knots, \(H\) is a matrix that
interpolates \(w\) to the observed
locations, and \(\epsilon \sim N(0, R + \tau^2
I_n)\) where \(R\) is a diagonal
matrix that corrects the measurement error variance. Refer to Peruzzi et
al (2021)1 for details.
Regardless of the model chosen, for spatial data, an exponential covariance function is used by default: \(C(h) = \sigma^2 \exp( -\phi h )\) where \(h\) is the spatial distance.
The prior for the spatial decay \(\phi\) is \(U[l,u]\) and the values of \(l\) and \(u\) must be specified. We choose \(l=1\), \(u=30\) for this dataset.2
Setting verbose
tells spmeshed
how many
intermediate messages to output while running MCMC. We set up MCMC to
run for 3000 iterations, of which we discard the first third. We use the
argument block_size=25
to specify the coarseness of domain
partitioning. In this case, the same result can be achieved by setting
axis_partition=c(10, 10)
.
Finally, we note that NA
values for the outcome are OK
since the full dataset is on a grid. spmeshed
will figure
this out and use settings optimized for partly observed lattices, and
automatically predict the outcomes at missing locations. On the other
hand, X
values are assumed to not be missing.
Let’s start with the model without gridded knots.
<- 200 # too small! this is just a vignette.
mcmc_keep <- 400
mcmc_burn <- 2
mcmc_thin
<- system.time({
mesh1_total_time <- spmeshed(y, X, coords,
meshout1 block_size = 25,
n_samples = mcmc_keep, n_burn = mcmc_burn, n_thin = mcmc_thin,
n_threads = 2,
verbose = 0,
settings=list(forced_grid=FALSE, cache=FALSE),
prior=list(phi=c(1,30))
)})
And now with the gridded knots. We build a grid with as many knots as there are observations, but we could use a smaller or a bigger one.
<- system.time({
mesh2_total_time <- spmeshed(y, X, coords,
meshout2 grid_size = c(30, 30),
block_size = 25,
n_samples = mcmc_keep, n_burn = mcmc_burn, n_thin = mcmc_thin,
n_threads = 2,
verbose = 0,
prior=list(phi=c(1,30))
)})
Post-processing proceeds the same way as when the data locations are on a grid. Let’s do that for the second model.
summary(meshout2$lambda_mcmc[1,1,]^2)
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 1.452 2.668 3.507 4.141 5.081 10.164
summary(meshout2$theta_mcmc[1,1,])
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 1.010 1.393 2.368 2.457 3.016 6.423
summary(meshout2$tausq_mcmc[1,])
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 0.03712 0.12541 0.18388 0.19330 0.23855 0.41074
summary(meshout2$beta_mcmc[1,1,])
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> -1.401 -1.349 -1.331 -1.332 -1.315 -1.263
We proceed to plot predictions across the domain along with the
recovered latent effects. We plot the predictions at the input
locations, i.e. the points not on the grid of knots, which we find via
forced_grid==0
. We plot the recovered latent process on the
grid, for a nicer picture.
# process means
<- data.frame(w_mgp = meshout2$w_mcmc %>% summary_list_mean())
wmesh # predictions
<- data.frame(y_mgp = meshout2$yhat_mcmc %>% summary_list_mean())
ymesh
<- meshout2$coordsdata %>%
outdf cbind(wmesh, ymesh) %>%
left_join(simdata)
%>% filter(forced_grid==1) %>%
outdf ggplot(aes(Var1, Var2, fill=w_mgp)) +
geom_raster() +
scale_fill_viridis_c() +
theme_minimal() + ggtitle("w: recovered")
%>% filter(forced_grid==0) %>%
outdf ggplot(aes(Var1, Var2, color=y_mgp)) +
geom_point() +
scale_color_viridis_c() +
theme_minimal() + ggtitle("y: predictions")
spmeshed
implements a model which can be
interpreted as assigning \(\sigma^2\) a
folded-t via parameter expansion if settings$ps=TRUE
, or an
inverse Gamma with parameters \(a=2\)
and \(b=1\) if
settings$ps=FALSE
, which cannot be changed at this time.
\(\tau^2\) is assigned an exponential
prior.↩︎