We present the R package MultiLevelOptimalBayes (MLOB) for estimating between-group effects in two-level latent variable models, particularly in the small-sample settings common in psychology, education, and other fields with hierarchical or nested data. MLOB implements a regularized Bayesian estimator developed by Dashuk, Hecht, Luedtke, Robitzsch, and Zitzmann (2024) and extended to additional covariates by the same authors in 2025. The estimator chooses its prior parameters to minimize the mean squared error (MSE) of the between-group effect, explicitly trading off bias against variance. Because of this bias-variance tradeoff, the regularized Bayesian estimator delivers MSE-optimal estimates, especially when sample sizes are small and the intraclass correlation (ICC) is low. The package handles unbalanced group sizes through an integrated data-balancing procedure and provides full inference, including point estimates, standard errors, p-values, and confidence intervals, for both the primary regressor and the covariates. We first review the theoretical foundations of the regularized Bayesian estimator (Dashuk et al. 2024, 2025), then discuss its implementation in MLOB, in particular the core function mlob(), and illustrate its application on real datasets. In doing so, we offer researchers in psychology, education, and related disciplines a robust, user-friendly tool for reliable multilevel latent variable estimation, particularly when sample sizes and ICCs are small.
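To make the optimization criterion explicit, in generic notation (not the paper's exact symbols) the MSE of an estimator of the between-group effect decomposes into squared bias plus variance, and the regularization parameters are chosen to minimize this sum rather than the variance alone:

$$\mathrm{MSE}(\hat{\beta}_b) = \mathrm{Bias}(\hat{\beta}_b)^2 + \mathrm{Var}(\hat{\beta}_b)$$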
The core function mlob() estimates the between-group coefficient (beta_b) using a regularized Bayesian approach and applies a data-balancing procedure if the groups are unbalanced. Below is the signature of the mlob() function. It shows the arguments you can pass and their default values; note that this chunk is not meant to be executed.
mlob(
formula,
data,
group,
balancing.limit = 0.2,
conf.level = 0.95,
jackknife = FALSE,
punish.coeff = 2,
...
)
Arguments:

formula: A formula (e.g., Y ~ X + C) where Y is the outcome, X is the context variable of interest, and C represents covariates.

data: A data.frame containing all variables in the formula and the grouping variable.

group: A string naming the grouping variable.

balancing.limit: Proportion (0-1) of the dataset that can be removed to balance group sizes. Default is 0.2.

conf.level: Confidence level for confidence intervals. Default is 0.95 (95% CI).

jackknife: Logical. If TRUE, standard errors and CIs are computed via jackknife resampling. Default is FALSE.

punish.coeff: Multiplier for penalizing the removal of entire groups during balancing. Higher values discourage full-group deletions.

Balancing Procedure: The mlob() function first checks whether the data are balanced, that is, whether each group consists of exactly the same number of individuals. If the data are unbalanced, the balancing procedure identifies the optimal number of individuals and groups to delete, taking the punishment coefficient into account. If the proportion of data to be deleted exceeds the threshold (balancing.limit), the regularized Bayesian estimator is not computed and mlob() throws an error. This forces the user to increase the balancing limit manually and warns that the resulting estimates should be interpreted with caution.
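To get a feel for what the balancing.limit check guards against, the sketch below computes how much data a simple trim-to-smallest-group balancing would discard from the ChickWeight data when grouped by Diet. This is only an illustration of the kind of proportion that is compared against balancing.limit; mlob() itself searches over deletions of individuals and whole groups, weighted by punish.coeff.
# Illustration only (not mlob() internals): proportion of rows a naive
# trim-to-smallest-group balancing would remove from ChickWeight.
group_sizes  <- table(ChickWeight$Diet)           # observations per group
target_size  <- min(group_sizes)                  # common size after trimming
prop_removed <- sum(group_sizes - target_size) / nrow(ChickWeight)
prop_removed                                      # compare against balancing.limit (default 0.2)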
# Examples
result_iris <- mlob(
Sepal.Length ~ Sepal.Width + Petal.Length,
data = iris,
group = "Species",
conf.level = 0.99,
jackknife = FALSE
)
summary(result_iris)
#> Call:
#> mlob(Sepal.Length ~ Sepal.Width + Petal.Length, data = iris, group = Species, conf.level = 0.99, jackknife = FALSE)
#>
#> Summary of Coefficients:
#> Estimate Std. Error Lower CI (99%) Upper CI (99%) Z value
#> beta_b 0.8308711 1.4655556 -2.9441499 4.605892 0.5669325
#> gamma_Petal.Length 0.4679522 0.2582579 -0.1972762 1.133181 1.8119567
#> Pr(>|z|) Significance
#> beta_b 0.57076004
#> gamma_Petal.Length 0.06999289 .
#>
#>
#> For comparison, summary of coefficients from unoptimized analysis (ML):
#> Estimate Std. Error Lower CI (99%) Upper CI (99%)
#> beta_b 0.6027440 5.424780e+15 -1.397331e+16 1.397331e+16
#> gamma_Petal.Length 0.4679522 2.582579e-01 -1.972762e-01 1.133181e+00
#> Z value Pr(>|z|) Significance
#> beta_b 1.111094e-16 1.00000000
#> gamma_Petal.Length 1.811957e+00 0.06999289 .
#>
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Note:
#> The standard error from unoptimized ML estimation is about 3.701518e+17% larger than the standard error obtained through our optimization procedure,
#> meaning that the optimized estimates are more accurate.
#> Concerning the estimates themselves, the unoptimized ML estimates may
#> differ greatly from the optimized estimates and should not be reported.
#> As the optimized estimates are always at least as accurate as the
#> unoptimized ML estimates,
#> please use them and their corresponding standard errors (first table of
#> output) for interpretation and reporting.
#> For more information, see Dashuk et al. (2025).
result_chick <- mlob(
weight ~ Time,
data = ChickWeight,
group = "Diet",
punish.coeff = 1.5,
jackknife = FALSE
)
print(result_chick)
#> Call:
#> mlob(weight ~ Time, data = ChickWeight, group = Diet, conf.level = 0.95, jackknife = FALSE)
#>
#> Coefficients
#> beta_b
#> -392.9573
#>
#> Standard_Error
#> beta_b
#> 335.9924
#>
#> Confidence_Interval (95%)
#> Lower Upper
#> beta_b -1051.49 265.5757
#>
#> Z value
#> beta_b
#> -1.169542
#>
#> p value
#> beta_b
#> 0.2421852
summary(result_chick)
#> Call:
#> mlob(weight ~ Time, data = ChickWeight, group = Diet, conf.level = 0.95, jackknife = FALSE)
#>
#> Summary of Coefficients:
#> Estimate Std. Error Lower CI (95%) Upper CI (95%) Z value Pr(>|z|)
#> beta_b -392.9573 335.9924 -1051.49 265.5757 -1.169542 0.2421852
#> Significance
#> beta_b
#>
#>
#> For comparison, summary of coefficients from unoptimized analysis (ML):
#> Estimate Std. Error Lower CI (95%) Upper CI (95%) Z value Pr(>|z|)
#> beta_b -57.92977 169.8339 -390.798 274.9385 -0.3410967 0.7330308
#> Significance
#> beta_b
#>
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Note:
**Interpretation of the results for the ChickWeight dataset**
The final result table contains the estimated beta_b together with the effects of additional covariates, denoted gamma_Diet3 and gamma_Diet4. We did not include any additional covariates in the mlob() formula; however, because Diet is a four-level factor, mlob() treats Diet 1 as the primary regressor (X), drops Diet 2 from the design matrix to prevent multicollinearity, and reports the remaining levels as gamma_Diet3 and gamma_Diet4, with Diet 2 serving as the reference. The estimation results show a clear difference between the regularized Bayesian estimator (first table) and the ML estimator (second table). The regularized Bayesian estimate of beta_b is 5.108, with a statistically significant p-value of 0.0139, implying that chicks on Diet 1 weigh on average about 5 g more than chicks on Diet 2 at each time point. In contrast, the ML estimate of the between-group effect beta_b is considerably larger (13.993) but not statistically significant (p-value of 0.3910), illustrating how standard ML can overstate the group-level effect when data are scarce. The estimate of gamma_Diet3 is positive (41.69) and highly significant for both estimators (p = 0.0048 in MultiLevelOptimalBayes (MLOB)), suggesting that chicks on Diet 3 gain substantially more weight than the baseline group. Gamma_Diet4 has an estimate of 29.64, but its p-value of 0.0504 falls just above the 5% threshold; Diet 4 thus appears to have a positive effect on weight, although the evidence for it is weaker than for Diet 3.
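How a multi-level factor enters a design matrix can be previewed with base R's treatment coding, which drops one reference level and creates an indicator column per remaining level. This is only a rough analogue of the behaviour described above: which level mlob() takes as the primary regressor and which as the reference follows the package's own convention, not model.matrix()'s default.
# Base-R treatment coding of the four-level Diet factor (illustration only;
# mlob() builds its own design matrix with its own choice of reference level).
head(model.matrix(~ Diet, data = ChickWeight))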
result_mtcars <- mlob(
mpg ~ hp + wt + am + hp:wt + hp:am,
data = mtcars,
group = "cyl",
balancing.limit = 0.35
)
summary(result_mtcars)
#> Call:
#> mlob(mpg ~ hp + wt + am + hp:wt + hp:am, data = mtcars, group = cyl, balancing.limit = 0.35, conf.level = 0.95)
#>
#> Summary of Coefficients:
#> Estimate Std. Error Lower CI (95%) Upper CI (95%) Z value
#> beta_b -10.14396628 9.35801453 -28.48533774 8.1974052 -1.0839870
#> gamma_wt -7.35030630 11.52439853 -29.93771236 15.2370998 -0.6378039
#> gamma_am 2.08029569 9.58981281 -16.71539204 20.8759834 0.2169277
#> gamma_hp:wt 0.02030139 0.05611299 -0.08967805 0.1302808 0.3617948
#> gamma_hp:am -0.01091929 0.07222809 -0.15248375 0.1306452 -0.1511779
#> Pr(>|z|) Significance
#> beta_b 0.2783706
#> gamma_wt 0.5236013
#> gamma_am 0.8282647
#> gamma_hp:wt 0.7175054
#> gamma_hp:am 0.8798354
#>
#>
#> For comparison, summary of coefficients from unoptimized analysis (ML):
#> Estimate Std. Error Lower CI (95%) Upper CI (95%)
#> beta_b -0.10122425 4.530593e+12 -8.879800e+12 8.879800e+12
#> gamma_wt -7.35030630 1.152440e+01 -2.993771e+01 1.523710e+01
#> gamma_am 2.08029569 9.589813e+00 -1.671539e+01 2.087598e+01
#> gamma_hp:wt 0.02030139 5.611299e-02 -8.967805e-02 1.302808e-01
#> gamma_hp:am -0.01091929 7.222809e-02 -1.524838e-01 1.306452e-01
#> Z value Pr(>|z|) Significance
#> beta_b -2.234238e-14 1.0000000
#> gamma_wt -6.378039e-01 0.5236013
#> gamma_am 2.169277e-01 0.8282647
#> gamma_hp:wt 3.617948e-01 0.7175054
#> gamma_hp:am -1.511779e-01 0.8798354
#>
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Note:
#> The standard error from unoptimized ML estimation is about 4.841404e+13% larger than the standard error obtained through our optimization procedure,
#> meaning that the optimized estimates are more accurate.
#> Concerning the estimates themselves, the unoptimized ML estimates may
#> differ greatly from the optimized estimates and should not be reported.
#> As the optimized estimates are always at least as accurate as the
#> unoptimized ML estimates,
#> please use them and their corresponding standard errors (first table of
#> output) for interpretation and reporting.
#> For more information, see Dashuk et al. (2025).
The output is an object of class mlob_result, which contains the estimated coefficients (the beta_b and gamma values) together with their standard errors, confidence intervals, Z values, and p-values. The mlob_result object supports a comprehensive set of S3 methods that follow standard R conventions, making it easy to work with results in familiar ways. Here are all available methods:
# Get a basic result for demonstration
result <- mlob(weight ~ Time, data = ChickWeight, group = 'Diet', jackknife = FALSE)
# Print method - displays coefficients, standard errors, confidence intervals, Z-values, and p-values
print(result)
#> Call:
#> mlob(weight ~ Time, data = ChickWeight, group = Diet, conf.level = 0.95, jackknife = FALSE)
#>
#> Coefficients
#> beta_b
#> -11.96512
#>
#> Standard_Error
#> beta_b
#> 234.4558
#>
#> Confidence_Interval (95%)
#> Lower Upper
#> beta_b -471.4901 447.5598
#>
#> Z value
#> beta_b
#> -0.0510336
#>
#> p value
#> beta_b
#> 0.9592987
# Summary method - comprehensive summary with significance stars and comparison to unoptimized ML
summary(result)
#> Call:
#> mlob(weight ~ Time, data = ChickWeight, group = Diet, conf.level = 0.95, jackknife = FALSE)
#>
#> Summary of Coefficients:
#> Estimate Std. Error Lower CI (95%) Upper CI (95%) Z value Pr(>|z|)
#> beta_b -11.96512 234.4558 -471.4901 447.5598 -0.0510336 0.9592987
#> Significance
#> beta_b
#>
#>
#> For comparison, summary of coefficients from unoptimized analysis (ML):
#> Estimate Std. Error Lower CI (95%) Upper CI (95%) Z value Pr(>|z|)
#> beta_b -3.806336 161.6417 -320.6182 313.0055 -0.02354798 0.9812132
#> Significance
#> beta_b
#>
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Note:
# Extract coefficients as a data frame
coef(result)
#> beta_b
#> 1 -11.96512
# Extract standard errors
se(result)
#> beta_b
#> 234.4558
# Extract variance-covariance matrix (diagonal only)
vcov(result)
#> beta_b
#> 54969.53
# Extract confidence intervals
confint(result)
#> 2.5% 97.5%
#> beta_b -471.4901 447.5598
# Extract confidence intervals for specific parameters
confint(result, "beta_b")
#> 2.5% 97.5%
#> beta_b -471.4901 447.5598
# Extract confidence intervals with different confidence level
confint(result, level = 0.99)
#> 0.5% 99.5%
#> beta_b -615.8833 591.953
# Convert results to a data frame format
as.data.frame(result)
#> Estimate Std. Error Lower CI (95%) Upper CI (95%) Z value Pr(>|z|)
#> beta_b -11.96512 234.4558 -471.4901 447.5598 -0.0510336 0.9592987
# Get dimensions (number of parameters)
dim(result)
#> [1] 1 1
# Get number of parameters
length(result)
#> [1] 1
# Get parameter names
names(result)
#> [1] "beta_b"
# Update model with new parameters (e.g., different confidence level)
updated_result <- update(result, conf.level = 0.99)
summary(updated_result)
#> Call:
#> mlob(weight ~ Time, data = data, group = Diet, conf.level = 0.99, jackknife = FALSE)
#>
#> Summary of Coefficients:
#> Estimate Std. Error Lower CI (99%) Upper CI (99%) Z value Pr(>|z|)
#> beta_b -11.96512 234.4558 -615.8833 591.953 -0.0510336 0.9592987
#> Significance
#> beta_b
#>
#>
#> For comparison, summary of coefficients from unoptimized analysis (ML):
#> Estimate Std. Error Lower CI (99%) Upper CI (99%) Z value Pr(>|z|)
#> beta_b -3.806336 161.6417 -420.1677 412.5551 -0.02354798 0.9812132
#> Significance
#> beta_b
#>
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Note:
You can discover all available methods for mlob_result
objects using:
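# List all S3 methods registered for the mlob_result class (base R):
methods(class = "mlob_result")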
Here’s a practical example showing how to use multiple methods together:
# Run analysis
result <- mlob(weight ~ Time, data = ChickWeight, group = 'Diet', jackknife = FALSE)
# Get basic information
cat("Number of parameters:", length(result), "\n")
#> Number of parameters: 1
cat("Parameter names:", paste(names(result), collapse = ", "), "\n")
#> Parameter names: beta_b
# Extract key statistics
coefficients <- coef(result)
standard_errors <- se(result)
confidence_intervals <- confint(result, level = 0.99)
# Create a custom summary table
custom_summary <- data.frame(
Parameter = names(result),
Estimate = as.numeric(coefficients),
SE = as.numeric(standard_errors),
CI_Lower = confidence_intervals[, 1],
CI_Upper = confidence_intervals[, 2]
)
print(custom_summary)
#> Parameter Estimate SE CI_Lower CI_Upper
#> 1 beta_b -290.836 476.4304 -1518.039 936.3675
All these methods follow standard R conventions, making your
mlob_result
objects compatible with existing R workflows
and familiar to users of other statistical packages.
While MultiLevelOptimalBayes
provides a robust solution
for regularized estimation in two-level models, users should be aware of
the following limitations:
Local Grid Search:
The optimization is restricted to a local 5*sigma region around the ML
estimate. Although this region captures the true MSE minimum with high
probability (>= 0.999 for J = 5), it does not guarantee
identification of the global minimum.
Assumption of Equal Group Sizes:
The estimator assumes equal group sizes to simplify the model. While averaging group sizes has been proposed as a solution, the method does not yet handle unbalanced group sizes natively; instead, it finds the optimal reduction of the data to a balanced size.
Jackknife Computational Cost:
Jackknife resampling improves interval coverage in small samples, but it can be computationally demanding for larger samples.
Limited Model Scope:
Currently, MLOB only handles two-level models with continuous outcomes.
Extensions to support generalized linear mixed models, three- and
more-level structures, or multivariate outcomes are not yet
available.
Dashuk, V., Hecht, M., Luedtke, O., Robitzsch, A., & Zitzmann, S. (2024). An Optimally Regularized Estimator of Multilevel Latent Variable Models, with Improved MSE Performance. https://doi.org/10.13140/RG.2.2.18148.39048

Luedtke, O., Marsh, H. W., Robitzsch, A., et al. (2008). The multilevel latent covariate model: A new, more reliable approach to group-level effects in contextual studies. Psychological Methods, 13(3), 203-229. https://doi.org/10.1037/a0012869