Bayes factors via spike and slab prior vs. bridge sampling

One of the main features of BayesTools is assistance in generating JAGS (Plummer, 2003) code based on formulas and prior distribution objects and subsequent estimation of marginal likelihoods with the bridgesampling R package (Gronau et al., 2020). Marginal likelihoods, \(p(\text{data} \mid \mathcal{M})\), are the key ingredient for computing Bayes factors,

\(\text{BF}_{10} = \frac{p(\text{data} \mid \mathcal{M}_{1})}{p(\text{data} \mid \mathcal{M}_{0})}\),

which quantify relative predictive performance of two competing models (Kass & Raftery, 1995; Rouder & Morey, 2019; Wrinch & Jeffreys, 1921). Convenient model specification then allows users and package developers to easily compute Bayes factors and test a wide range of informed hypotheses. See RoBMA (Bartoš & Maier, 2020) and (Bartoš, 2022) R packages for implementation examples.

However, when considering a simple regression, the model space of all possible models increases exponentially with additional predictors. I.e., the possibility of including vs. excluding \(k\) predictors leads to \(2^k\) possible submodels that need to be computed. Even relatively computationally simple models (e.g. ~ 1 min of computation) with 10 possible predictors would result in more than 17 hours of computation. Therefore, we might require more computationally efficient methods when performing variable selection with more than a few covariates. In this vignette, I showcase how to use BayesTools package to specify spike and slab priors that aim to explore most of the model space and obtain posterior inclusion probabilities for each predictor within a single MCMC run (Kuo & Mallick, 1998; O’Hara & Sillanpää, 2009).

Simulated Data

To keep it simple, let’s consider a linear regression with one continuous predictor \(x\). We simulate \(N = 100\) observations of a dependent variable \(y\) under the presence of a small effect \(\beta\) of a continuous predictor \(x\).

set.seed(-68) # set seed for reproducibility

N     <- 100      # number of observations
x     <- rnorm(N) # continuous predictor
alpha <- -0.5     # intercept
beta  <- 0.15     # small effect

# compute the mean parameter for each predictor value
mu <- alpha + beta * x

# generate the response for each observation 
y  <- rnorm(N, mean = mu, sd = 1)

We quickly verify that our simulated data correspond to the desired settings (up to a random error) with the lm function.

summary(lm(y ~ x))
#> 
#> Call:
#> lm(formula = y ~ x)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -2.2960 -0.6832  0.0683  0.7448  2.3815 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)  -0.5487     0.1032  -5.315  6.7e-07 ***
#> x             0.1962     0.1010   1.943   0.0549 .  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 1.03 on 98 degrees of freedom
#> Multiple R-squared:  0.03708,    Adjusted R-squared:  0.02725 
#> F-statistic: 3.774 on 1 and 98 DF,  p-value: 0.05494

Model Specification

We consider two following models:

\(\mathcal{M}_0\): \(\beta = 0\)
and \(\mathcal{M}_1\): \(\beta \sim g()\),

where \(g()\) characterizes our hypothesis about the degree of the effect. In our example, we specify a simple two-sided hypothesis represented by a normal distribution with mean 0 and standard deviation 0.5, e.g., \(\beta \sim \text{Normal}(0, 0.5^2)\).

Maginal Likelihoods

First, we compute the Bayes factor model comparison via marginal likelihoods. To do that, we need to specify the likelihood for the response variable \(y\),

model_likelihood <- 
"model{
  for(i in 1:N){
    y[i] ~ dnorm(mu[i], pow(sigma, -2))
  }
}
"

where mu corresponds to the mean parameter (that we specify via a formula in the next step) and sigma to a standard deviation of the response variable (that we treat as a nuisance parameter here).

We specify formulas for the mu parameter of each of the considered models,

formula_M0 <- list("mu" = ~ 1)
formula_M1 <- list("mu" = ~ 1 + x)

where 1 corresponds to the intercept (it is not necessary for the second model as it is included by default).

To finish the model specification, we set the prior distribution corresponding to our hypothesis test of the beta parameter, set a broad prior distributions for the nuisance intercept and sigma parameters, and create a list containing data for the model specified within model_likelihood (in the first step) and a data frame for the data contained within the formula for mu within formula_M0 and formula_M1 (specified in the second step).

# prior on the test parameter
prior_beta  <- prior(distribution = "normal", parameters = list(mean = 0, sd = 0.5))

# priors on the nuisance parameters
prior_int   <- prior(distribution = "normal", parameters = list(mean = 0, sd = 5))
prior_sigma <- prior(distribution = "normal", parameters = list(mean = 0, sd = 5), truncation = list(0, Inf))

# the data list
data_list <- list(
  y = y,
  N = N
)
data_formula <- data.frame(
  x = x
)

We estimate the models with the JAGS_fit function. Since we are using the formula interface (which allows us to specify multiple formulas for different parameters), we need to pass the arguments as named lists,

M0 <- JAGS_fit(
  # specification for the `model_likelihood` part
  model_syntax = model_likelihood,
  data         = list(y = y, N = N),
  prior_list   = list("sigma" = prior_sigma),

  # specification for the `formula_M0` part 
  formula_list       = formula_M0,
  formula_prior_list = list("mu" = list("intercept" = prior_int)),
  formula_data_list  = list("mu" = data_formula),
  
  # seed for reproducibility
  seed         = 0
)

M1 <- JAGS_fit(
  model_syntax = model_likelihood,
  data         = list(y = y, N = N),
  prior_list   = list("sigma" = prior_sigma),
  formula_list       = formula_M1,
  formula_prior_list = list("mu" = list("intercept" = prior_int, "x" = prior_beta)),
  formula_data_list  = list("mu" = data_formula),
  seed         = 1
)

We quickly verify that our parameter estimates (from the full model) are similar to the frequentist results obtained via lm function earlier.

JAGS_estimates_table(M1)
#>                  Mean    SD    lCI Median    uCI error(MCMC) error(MCMC)/SD
#> (mu) intercept -0.547 0.104 -0.753 -0.548 -0.341     0.00081          0.008
#> (mu) x          0.188 0.101 -0.012  0.189  0.385     0.00080          0.008
#> sigma           1.042 0.076  0.905  1.037  1.205     0.00078          0.010
#>                  ESS R-hat
#> (mu) intercept 16408 1.000
#> (mu) x         15964 1.000
#> sigma           9502 1.000

To obtain the marginal likelihoods and compute Bayes factors, we only need to write the likelihood function corresponding to the JAGS model. Importantly, BayesTools handles all priors and formula related computation automatically, in other words, we do not need to worry about computing the mean parameter based on the intercept and predictors since we already obtain the computed mu in the parameters[["mu"]] object (a vector with a value for each y),

log_posterior <- function(parameters, data){
  sum(dnorm(
    x    = data[["y"]],
    mean = parameters[["mu"]],
    sd   = parameters[["sigma"]],
    log  = TRUE
  ))
}

where the parameters arguments is a list containing the parameters and data argument is a list containing data. We use sum(dnorm(..., log = TRUE)) to sum the logarithmic likelihood of all observations.

Finally, we pass our objects to the JAGS_bridgesampling function to compute the marginal likelihoods.

marglik_model_H0 <- JAGS_bridgesampling(
  # specification for the model part
  fit           = M0,
  log_posterior = log_posterior,
  data          = list(y = y, N = N),
  prior_list    = list("sigma" = prior_sigma),

  # specification for the formula` part 
  formula_list       = formula_M0,
  formula_prior_list = list("mu" = list("intercept" = prior_int)),
  formula_data_list  = list("mu" = data_formula)
)

marglik_model_H1 <- JAGS_bridgesampling(
  fit           = M1,
  log_posterior = log_posterior,
  data          = list(y = y, N = N),
  prior_list    = list("sigma" = prior_sigma),
  formula_list       = formula_M1,
  formula_prior_list = list("mu" = list("intercept" = prior_int, "x" = prior_beta)),
  formula_data_list  = list("mu" = data_formula),
)

We specify a BayesTools model ensemble object that we interrogate with the ensemble_inference_table function for information about the test for the beta parameter.

models_list <- models_inference(list(
  list(model = M0, marglik = marglik_model_H0, prior_weights = 1/2),
  list(model = M1, marglik = marglik_model_H1, prior_weights = 1/2)
))
ensemble_info <- ensemble_inference(models_list, parameters = "x", is_null_list = list("x" = c(TRUE, FALSE)))

ensemble_inference_table(ensemble_info, parameters = "x")
#>   Models Prior prob. Post. prob. Inclusion BF
#> x    1/2       0.500       0.541        1.181

We find absence of evidence for either of the hypotheses, \(\text{BF}_{10} = 1.181\), with posterior probability of \(P(\mathcal{M}_{1} \mid \text{data}) = 0.542\) (asuming equal prior probability specified via prior_weights in the models_inference function previously).

Spike and Slab Priors

The Kuo & Mallick (1998)’s spike an slab prior distribution is specified as a mixture of two prior distributions. A spike, a parameter value of zero corresponding to no effect, and a slab, a parameter value sampled from a continuous density corresponding to the alternative hypothesis. The parametrization uses two independent prior distributions: one for the parameter value, \(\beta \sim g()\), and one for the inclusion indicator, \(I_\beta \sim f()\), which assigns the prior model probability \(P(\mathcal{M}_{1})\) of inclusion.

The inclusion indicator can attain one of two values: either zero or one. Multiplying the parameter with the inclusion indicator, \(\beta I_\beta\), then results in setting the parameter to zero when the indicator is zero, or keeping its original value when the indicator is one. The proportion of times the indicator attains the value of one then corresponds to the posterior inclusion probability of the predictor, \(P(\mathcal{M}_{1} \mid \text{data})\). Since Bayes factor can be written as the change from prior to posterior odds,

\(\text{BF}_{10} = \frac{p(\mathcal{M}_{1} \mid \text{data})}{p(\mathcal{M}_{0} \mid \text{data})} / \frac{p(\mathcal{M}_{1})}{p(\mathcal{M}_{0})}\),

we can also estimate the Bayes factor via the inclusion indicator.

Now, we compare the two models using the spike and slab prior. We have already specified the likelihood, data lists, prior distributions for the nuisance parameters, and even the formulas (now we need only formula for the full model) in the previous sections. Therefore, we proceed directly by specifying the spike and slab prior distribution for the predictor. We use the prior_spike_and_slab which follows similar notation as the prior function. We need to specify the distribution, the parameters (and we could also set truncation if needed) corresponding to the alternative hypothesis. Furthermore, we need to specify the prior distribution for the inclusion via the prior_inclusion argument. Here, we use a \(\text{Spike}(0.5)\) prior which sets the prior inclusion probability to 1/2.

prior_beta_spike_and_slab <- prior_spike_and_slab(
  prior_parameter = prior(distribution = "normal", parameters = list(mean = 0, sd = 0.5)),
  prior_inclusion = prior(distribution = "spike", parameters = list(location = 0.5)) 
)

Then we can directly proceed to calling the JAGS_fit function with the same specification as we used for the M1 model, however, changing the prior distribution object for the predictor x to the newly created prior_beta_spike_and_slab prior distribution.

MS <- JAGS_fit(
  model_syntax = model_likelihood,
  data         = list(y = y, N = N),
  prior_list   = list("sigma" = prior_sigma),
  formula_list       = formula_M1,
  formula_prior_list = list("mu" = list("intercept" = prior_int, "x" = prior_beta_spike_and_slab)),
  formula_data_list  = list("mu" = data_formula),
  seed         = 1
)

We can again verify that our parameter estimates match the previous results. Now, we need to set the conditional = TRUE argument in the JAGS_estimates_table to obtain samples assuming that the spike and slab parameter values are included. (The function summarized the complete posterior distribution by default, i.e., parameter estimates model-averaged across the spike and the slab.)

JAGS_estimates_table(MS, conditional = TRUE)
#>                      Mean    SD    lCI Median    uCI
#> (mu) intercept     -0.540 0.106 -0.749 -0.540 -0.333
#> (mu) x (inclusion)  0.536    NA     NA     NA     NA
#> (mu) x              0.189 0.100 -0.009  0.190  0.385
#> sigma               1.050 0.076  0.912  1.047  1.211

The estimates are essentially identical to the estimates from the previous models. Finally, we can also obtain summary of the inclusion probabilities via the JAGS_inference_table function

JAGS_inference_table(MS)
#>        Prior prob. Post. prob. Inclusion BF
#> (mu) x       0.500       0.536        1.157

As before, we find absence of evidence for either of the hypotheses, \(\text{BF}_{10} = 1.157\), with posterior probability of \(P(\mathcal{M}_{1} \mid \text{data}) = 0.536\).

References

Bartoš, F. (2022). RoBSA: An R package for robust Bayesian survival-analyses. https://CRAN.R-project.org/package=RoBSA

Bartoš, F., & Maier, M. (2020). RoBMA: An R package for robust Bayesian meta-analyses. https://CRAN.R-project.org/package=RoBMA

Gronau, Q. F., Singmann, H., & Wagenmakers, E.-J. (2020). bridgesampling: An R package for estimating normalizing constants. Journal of Statistical Software, 92(10), 1–29. https://doi.org/10.18637/jss.v092.i10

Kass, R. E., & Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical Association, 90(430), 773–795. https://doi.org/10.1080/01621459.1995.10476572

Kuo, L., & Mallick, B. (1998). Variable selection for regression models. Sankhyā: The Indian Journal of Statistics, Series B, 65–81.

O’Hara, R. B., & Sillanpää, M. J. (2009). A review of bayesian variable selection methods: What, how and which. Bayesian Analysis, 4(1), 85–117. https://doi.org/10.1214/09-BA403

Plummer, M. (2003). JAGS: A program for analysis of Bayesian graphical models using Gibbs sampling. Proceedings of the 3rd International Workshop on Distributed Statistical Computing, 124, 1–10.

Rouder, J. N., & Morey, R. D. (2019). Teaching Bayes’ theorem: Strength of evidence as predictive accuracy. The American Statistician, 73(2), 186–190. https://doi.org/10.1080/00031305.2017.1341334

Wrinch, D., & Jeffreys, H. (1921). On certain fundamental principles of scientific inquiry. Philosophical Magazine, 42, 369–390. https://doi.org/10.1080/14786442108633773