One of the main features of BayesTools is assistance in generating JAGS (Plummer, 2003) code based on formulas and prior distribution objects and subsequent estimation of marginal likelihoods with the bridgesampling R package (Gronau et al., 2020). Marginal likelihoods, \(p(\text{data} \mid \mathcal{M})\), are the key ingredient for computing Bayes factors,
\(\text{BF}_{10} = \frac{p(\text{data} \mid \mathcal{M}_{1})}{p(\text{data} \mid \mathcal{M}_{0})}\),
which quantify relative predictive performance of two competing models (Kass & Raftery, 1995; Rouder & Morey, 2019; Wrinch & Jeffreys, 1921). Convenient model specification then allows users and package developers to easily compute Bayes factors and test a wide range of informed hypotheses. See RoBMA (Bartoš & Maier, 2020) and (Bartoš, 2022) R packages for implementation examples.
However, when considering a simple regression, the model space of all possible models increases exponentially with additional predictors. I.e., the possibility of including vs. excluding \(k\) predictors leads to \(2^k\) possible submodels that need to be computed. Even relatively computationally simple models (e.g. ~ 1 min of computation) with 10 possible predictors would result in more than 17 hours of computation. Therefore, we might require more computationally efficient methods when performing variable selection with more than a few covariates. In this vignette, I showcase how to use BayesTools package to specify spike and slab priors that aim to explore most of the model space and obtain posterior inclusion probabilities for each predictor within a single MCMC run (Kuo & Mallick, 1998; O’Hara & Sillanpää, 2009).
To keep it simple, let’s consider a linear regression with one continuous predictor \(x\). We simulate \(N = 100\) observations of a dependent variable \(y\) under the presence of a small effect \(\beta\) of a continuous predictor \(x\).
set.seed(-68) # set seed for reproducibility
N <- 100 # number of observations
x <- rnorm(N) # continuous predictor
alpha <- -0.5 # intercept
beta <- 0.15 # small effect
# compute the mean parameter for each predictor value
mu <- alpha + beta * x
# generate the response for each observation
y <- rnorm(N, mean = mu, sd = 1)
We quickly verify that our simulated data correspond to the desired
settings (up to a random error) with the lm
function.
summary(lm(y ~ x))
#>
#> Call:
#> lm(formula = y ~ x)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -2.2960 -0.6832 0.0683 0.7448 2.3815
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -0.5487 0.1032 -5.315 6.7e-07 ***
#> x 0.1962 0.1010 1.943 0.0549 .
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 1.03 on 98 degrees of freedom
#> Multiple R-squared: 0.03708, Adjusted R-squared: 0.02725
#> F-statistic: 3.774 on 1 and 98 DF, p-value: 0.05494
We consider two following models:
where \(g()\) characterizes our hypothesis about the degree of the effect. In our example, we specify a simple two-sided hypothesis represented by a normal distribution with mean 0 and standard deviation 0.5, e.g., \(\beta \sim \text{Normal}(0, 0.5^2)\).
First, we compute the Bayes factor model comparison via marginal likelihoods. To do that, we need to specify the likelihood for the response variable \(y\),
where mu
corresponds to the mean parameter (that we
specify via a formula in the next step) and sigma
to a
standard deviation of the response variable (that we treat as a nuisance
parameter here).
We specify formulas for the mu
parameter of each of the
considered models,
where 1
corresponds to the intercept (it is not
necessary for the second model as it is included by default).
To finish the model specification, we set the prior distribution
corresponding to our hypothesis test of the beta parameter, set a broad
prior distributions for the nuisance intercept and sigma parameters, and
create a list containing data for the model specified within
model_likelihood
(in the first step) and a data frame for
the data contained within the formula for mu within
formula_M0
and formula_M1
(specified in the
second step).
# prior on the test parameter
prior_beta <- prior(distribution = "normal", parameters = list(mean = 0, sd = 0.5))
# priors on the nuisance parameters
prior_int <- prior(distribution = "normal", parameters = list(mean = 0, sd = 5))
prior_sigma <- prior(distribution = "normal", parameters = list(mean = 0, sd = 5), truncation = list(0, Inf))
# the data list
data_list <- list(
y = y,
N = N
)
data_formula <- data.frame(
x = x
)
We estimate the models with the JAGS_fit
function. Since
we are using the formula interface (which allows us to specify multiple
formulas for different parameters), we need to pass the arguments as
named lists,
M0 <- JAGS_fit(
# specification for the `model_likelihood` part
model_syntax = model_likelihood,
data = list(y = y, N = N),
prior_list = list("sigma" = prior_sigma),
# specification for the `formula_M0` part
formula_list = formula_M0,
formula_prior_list = list("mu" = list("intercept" = prior_int)),
formula_data_list = list("mu" = data_formula),
# seed for reproducibility
seed = 0
)
M1 <- JAGS_fit(
model_syntax = model_likelihood,
data = list(y = y, N = N),
prior_list = list("sigma" = prior_sigma),
formula_list = formula_M1,
formula_prior_list = list("mu" = list("intercept" = prior_int, "x" = prior_beta)),
formula_data_list = list("mu" = data_formula),
seed = 1
)
We quickly verify that our parameter estimates (from the full model)
are similar to the frequentist results obtained via lm
function earlier.
JAGS_estimates_table(M1)
#> Mean SD lCI Median uCI error(MCMC) error(MCMC)/SD
#> (mu) intercept -0.547 0.104 -0.753 -0.548 -0.341 0.00081 0.008
#> (mu) x 0.188 0.101 -0.012 0.189 0.385 0.00080 0.008
#> sigma 1.042 0.076 0.905 1.037 1.205 0.00078 0.010
#> ESS R-hat
#> (mu) intercept 16408 1.000
#> (mu) x 15964 1.000
#> sigma 9502 1.000
To obtain the marginal likelihoods and compute Bayes factors, we only
need to write the likelihood function corresponding to the JAGS model.
Importantly, BayesTools handles all priors and formula related
computation automatically, in other words, we do not need to worry about
computing the mean parameter based on the intercept and predictors since
we already obtain the computed mu in the parameters[["mu"]]
object (a vector with a value for each y),
log_posterior <- function(parameters, data){
sum(dnorm(
x = data[["y"]],
mean = parameters[["mu"]],
sd = parameters[["sigma"]],
log = TRUE
))
}
where the parameters
arguments is a list containing the
parameters and data
argument is a list containing data. We
use sum(dnorm(..., log = TRUE))
to sum the logarithmic
likelihood of all observations.
Finally, we pass our objects to the JAGS_bridgesampling
function to compute the marginal likelihoods.
marglik_model_H0 <- JAGS_bridgesampling(
# specification for the model part
fit = M0,
log_posterior = log_posterior,
data = list(y = y, N = N),
prior_list = list("sigma" = prior_sigma),
# specification for the formula` part
formula_list = formula_M0,
formula_prior_list = list("mu" = list("intercept" = prior_int)),
formula_data_list = list("mu" = data_formula)
)
marglik_model_H1 <- JAGS_bridgesampling(
fit = M1,
log_posterior = log_posterior,
data = list(y = y, N = N),
prior_list = list("sigma" = prior_sigma),
formula_list = formula_M1,
formula_prior_list = list("mu" = list("intercept" = prior_int, "x" = prior_beta)),
formula_data_list = list("mu" = data_formula),
)
We specify a BayesTools model ensemble object that we interrogate
with the ensemble_inference_table
function for information
about the test for the beta parameter.
models_list <- models_inference(list(
list(model = M0, marglik = marglik_model_H0, prior_weights = 1/2),
list(model = M1, marglik = marglik_model_H1, prior_weights = 1/2)
))
ensemble_info <- ensemble_inference(models_list, parameters = "x", is_null_list = list("x" = c(TRUE, FALSE)))
ensemble_inference_table(ensemble_info, parameters = "x")
#> Models Prior prob. Post. prob. Inclusion BF
#> x 1/2 0.500 0.541 1.181
We find absence of evidence for either of the hypotheses, \(\text{BF}_{10} = 1.181\), with posterior
probability of \(P(\mathcal{M}_{1} \mid
\text{data}) = 0.542\) (asuming equal prior probability specified
via prior_weights
in the models_inference
function previously).
The Kuo & Mallick (1998)’s spike an slab prior distribution is specified as a mixture of two prior distributions. A spike, a parameter value of zero corresponding to no effect, and a slab, a parameter value sampled from a continuous density corresponding to the alternative hypothesis. The parametrization uses two independent prior distributions: one for the parameter value, \(\beta \sim g()\), and one for the inclusion indicator, \(I_\beta \sim f()\), which assigns the prior model probability \(P(\mathcal{M}_{1})\) of inclusion.
The inclusion indicator can attain one of two values: either zero or one. Multiplying the parameter with the inclusion indicator, \(\beta I_\beta\), then results in setting the parameter to zero when the indicator is zero, or keeping its original value when the indicator is one. The proportion of times the indicator attains the value of one then corresponds to the posterior inclusion probability of the predictor, \(P(\mathcal{M}_{1} \mid \text{data})\). Since Bayes factor can be written as the change from prior to posterior odds,
\(\text{BF}_{10} = \frac{p(\mathcal{M}_{1} \mid \text{data})}{p(\mathcal{M}_{0} \mid \text{data})} / \frac{p(\mathcal{M}_{1})}{p(\mathcal{M}_{0})}\),
we can also estimate the Bayes factor via the inclusion indicator.
Now, we compare the two models using the spike and slab prior. We
have already specified the likelihood, data lists, prior distributions
for the nuisance parameters, and even the formulas (now we need only
formula for the full model) in the previous sections. Therefore, we
proceed directly by specifying the spike and slab prior distribution for
the predictor. We use the prior_spike_and_slab
which
follows similar notation as the prior
function. We need to
specify the distribution, the parameters (and we could also set
truncation if needed) corresponding to the alternative hypothesis.
Furthermore, we need to specify the prior distribution for the inclusion
via the prior_inclusion
argument. Here, we use a \(\text{Spike}(0.5)\) prior which sets the
prior inclusion probability to 1/2.
prior_beta_spike_and_slab <- prior_spike_and_slab(
prior_parameter = prior(distribution = "normal", parameters = list(mean = 0, sd = 0.5)),
prior_inclusion = prior(distribution = "spike", parameters = list(location = 0.5))
)
Then we can directly proceed to calling the JAGS_fit
function with the same specification as we used for the M1
model, however, changing the prior distribution object for the predictor
x
to the newly created
prior_beta_spike_and_slab
prior distribution.
MS <- JAGS_fit(
model_syntax = model_likelihood,
data = list(y = y, N = N),
prior_list = list("sigma" = prior_sigma),
formula_list = formula_M1,
formula_prior_list = list("mu" = list("intercept" = prior_int, "x" = prior_beta_spike_and_slab)),
formula_data_list = list("mu" = data_formula),
seed = 1
)
We can again verify that our parameter estimates match the previous
results. Now, we need to set the conditional = TRUE
argument in the JAGS_estimates_table
to obtain samples
assuming that the spike and slab parameter values are included. (The
function summarized the complete posterior distribution by default,
i.e., parameter estimates model-averaged across the spike and the
slab.)
JAGS_estimates_table(MS, conditional = TRUE)
#> Mean SD lCI Median uCI error(MCMC) error(MCMC)/SD
#> (mu) intercept -0.540 0.106 -0.749 -0.540 -0.333 0.00085 0.008
#> (mu) x 0.189 0.100 -0.009 0.190 0.385 0.00145 0.012
#> (mu) x (inclusion) 0.500 0.000 0.500 0.500 0.500 NA NA
#> sigma 1.050 0.076 0.912 1.047 1.211 0.00081 0.011
#> ESS R-hat
#> (mu) intercept 15590 1.000
#> (mu) x 6830 1.000
#> (mu) x (inclusion) NA NA
#> sigma 8894 1.000
The estimates are essentially identical to the estimates from the
previous models. Finally, we can also obtain summary of the inclusion
probabilities via the JAGS_inference_table
function
JAGS_inference_table(MS)
#> Parameter Prior prob. Post. prob. Inclusion BF
#> (mu) x 0.500 0.536 1.157
As before, we find absence of evidence for either of the hypotheses, \(\text{BF}_{10} = 1.157\), with posterior probability of \(P(\mathcal{M}_{1} \mid \text{data}) = 0.536\).