Petrus would like to think about co-occurrence of hunting spider species. He wants to know which species co-occur, and to what extent can these co-occurrence patterns be explained by joint response to environmental gradients.
What approach should he used to look at these questions?
He is clearly after methods for studying co-occurrence, so he wants to do covariance modelling on \(\Sigma\), e.g. using latent variables or graphical modelling.
library(mvabund)
data(spider)
library(ecoCopula)
par(mfrow=c(1,2), mgp=c(1.75,0.75,0), mar=c(3,3,1,1))
spiderAbund = mvabund(spider$abund)
spider_glmInt = manyglm(spiderAbund~1,data=spider$x)
ord_spiderInt = cord(spider_glmInt)
plot(ord_spiderInt, biplot=TRUE) #for a biplot
# now plot a correlation matrix
library(gclus)
#> Warning: package 'gclus' was built under R version 4.3.3
#> Loading required package: cluster
sortVars=order.single(ord_spiderInt$sigma)
library(corrplot)
#> corrplot 0.92 loaded
corrplot(ord_spiderInt$sigma[sortVars, sortVars], type="lower", diag=FALSE, method="square")
Repeat the analyses of Code Box 17.1 on presence-absence data,
which you can construct using spiderPA=pmin(spiderAbund,1)
.
Estimate the correlation matrix.
par(mfrow=c(1,2), mgp=c(1.75,0.75,0), mar=c(3,3,1,1))
spiderPA=pmin(spiderAbund,1)
spiderPA_glmInt = manyglm(spiderPA~1,data=spider$x, family="cloglog")
ord_spiderPAInt=cord(spiderPA_glmInt)
plot(ord_spiderPAInt, biplot=TRUE) #for a biplot
# now plot a correlation matrix
sortVars=order.single(ord_spiderPAInt$sigma)
corrplot(ord_spiderPAInt$sigma[sortVars, sortVars], type="lower",
diag=FALSE, method="square")
How does the correlation matrix compare to Figure 17.2b?
Correlations tend to be smaller, they range from about -0.4 up to 0.4.
Calculate the sum of squared loadings, with and without predictors, as in Code Box 17.2. Are the values smaller for the presence/absence data?
par(mfrow=c(1,2), mgp=c(1.75,0.75,0), mar=c(3,3,1,1))
spiderPA_glmX = manyglm(spiderPA~.,data=spider$x, family="cloglog")
ord_spiderPAX = cord(spiderPA_glmX)
plot(ord_spiderPAX, biplot=TRUE)
corrplot(ord_spiderPAX$sigma[sortVars,sortVars], type="lower", diag=FALSE, method="square")
ss = c(sum(ord_spiderPAInt$loadings^2), sum(ord_spiderPAX$loadings^2))
c(ss, 1-ss[2]/ss[1])
#> [1] 4.535242 1.353017 0.701666
absCor = c( sum(abs(ord_spiderPAInt$sigma)),
sum( abs(ord_spiderPAX$sigma)) ) - ncol(spiderPA)
c(absCor, 1-absCor[2]/absCor[1])
#> [1] 31.716950 1.473854 0.953531
Is this what you expected?
The fact that correlations became weaker is expected. But interestingly, predictors were able to explain most of the correlation, with responses being close to uncorrelated after environmental variables were included.
Are your conclusions any different to Code Box 17.2, in terms of the extent to which patterns in co-occurrence can be explained by environmental variables?
I guess you would conclude here that most (more than two thirds) of co-occurrence in presence-absence can be explained by environmental variables, whereas this number was about half when looking at abundance.
Françoise visited 51 sites near Lyon, France, and estimated abundance of 40 different species of birds (by listening to bird songs for 15 minutes) (Tatibouët, 1981). He would like to understand bird co-occurrence, and the extent to which it is explained by predictors related to urbanisation. Load the data and fit ordinal regressions to each response:
library(ade4)
#> Warning: package 'ade4' was built under R version 4.3.3
data(aviurba)
abund=mvabund(aviurba$fau)
library(ordinal)
ft_birdsInt=manyany(abund~1, "clm", family="ordinal", data=aviurba$mil)
Use cord to fit a latent variable model and estimate the correlation matrix.
par(mfrow=c(1,2),mgp=c(1.75,0.75,0),mar=c(3,3,1,1))
ord_birdsInt=cord(ft_birdsInt)
plot(ord_birdsInt, biplot=TRUE)
sortBirdVars=order.single(ord_birdsInt$sigma)
corrplot(ord_birdsInt$sigma[sortBirdVars, sortBirdVars], type="lower", diag=FALSE, method="square")
Looks like nothing much is happening here!
Add fields to the model, to study the extent to which presence or absence of fields explains co-occurrence patterns. Calculate the sum of squared loadings, for latent variable models with and without fields as a predictor.
par(mfrow=c(1,2), mgp=c(1.75,0.75,0), mar=c(3,3,1,1))
ft_birdsX=manyany(abund~fields, "clm", family="ordinal", data=aviurba$mil)
ord_birdsX=cord(ft_birdsX)
plot(ord_birdsX,biplot=TRUE)
corrplot(ord_birdsX$sigma[sortBirdVars,sortBirdVars],type="lower",diag=FALSE, method="square")
ss = c(sum(ord_birdsInt$loadings^2), sum(ord_birdsX$loadings^2))
c(ss, 1-ss[2]/ss[1])
#> [1] 4.9059113 4.2268432 0.1384183
absCor = c( sum(abs(ord_birdsInt$sigma)), sum( abs(ord_birdsX$sigma)) ) - ncol(abund)
c(absCor, 1-absCor[2]/absCor[1])
#> [1] 87.6423718 72.3964502 0.1739561
What can you conclude about co-occurrence patterns of these
birds, and the extent to which they are explained by presence or absence
of fields
?
The proportion of co-occurrence patterns that can be explained is
relatively small, so I guess we can say that fields
is not
a major driver of co-occurrence patterns of birds.
The ordinal
package has a bug in it (in version 2019.12)
so it conflicts with lme4
(specifically it overwrites the
ranef
function), [issue posted on Github] (https://github.com/runehaubo/ordinal/issues/48). So if
you are running analyses using both packages, you need to detach the
ordinal
package before continuing…
par(mfrow=c(1,2),mgp=c(1.75,0.75,0),mar=c(3,3,1,1))
graph_spiderInt = cgr(spider_glmInt)
plot(graph_spiderInt, vary.edge.lwd=TRUE)
graph_spiderX = cgr(spider_glmX, graph_spiderInt$all_graphs$lambda.opt)
#> Warning in cgr(spider_glmX, graph_spiderInt$all_graphs$lambda.opt): 'best'
#> model selected among supplied lambda only
plot(graph_spiderX, vary.edge.lwd=TRUE)
Recall that in Figure 16.1a, Alopecosa accentuata, Alopecosa fabrilis and Arctosa perita decreased in response to soil dryness, while all other species increased. Note that the “unconstrained” correlation matrix of Figure 17.2b found negative correlation patterns between these species and most others. To what extent do contrasting responses to soil dryness explain the negative correlations of Figure 17.2b? Answer this question by fitting a covariance model of your choice to the spider data with and without soil dryness as a predictor.
par(mfrow=c(1,2),mgp=c(1.75,0.75,0),mar=c(3,3,1,1))
spider_glmSoil = manyglm(spiderAbund~soil.dry,data=spider$x)
ord_spiderSoil=cord(spider_glmSoil)
corrplot(ord_spiderInt$sigma[sortVars,sortVars],type="lower",diag=FALSE, method="square")
corrplot(ord_spiderSoil$sigma[sortVars,sortVars],type="lower",diag=FALSE, method="square")
ss = c(sum(ord_spiderInt$loadings^2), sum(ord_spiderSoil$loadings^2))
c(ss, 1-ss[2]/ss[1])
#> [1] 7.490253 5.931681 0.208080
absCor = c( sum(abs(ord_spiderInt$sigma)),
sum( abs(ord_spiderSoil$sigma)) ) - ncol(spider$abund)
c(absCor, 1-absCor[2]/absCor[1])
#> [1] 52.7196966 40.1773826 0.2379057
This explained less covariation than the model with all six
environmental variables, more like a fifth of it rather than half. But
anyway, the negative correlations we saw previously (above, left) do
involve Arctperi
, Alopacce
and
Alopfabr
, the three species that have a negative
association with soil.dry
. These three species are all
positively correlated with each other but negatively correlated with
most other species.
After including soil.dry
in the model (above, right),
most of the negative associations have disappeared, and these species
become largely uncorrelated with all others. So while plenty of the
covariation does not seem to be explained by soil.dry
, it
is doing a good job of capturing the negative co-occurrence patterns in
spider abundance.