MPTmultiverse
provides a single function, fit_mpt()
, that performs a multiverse analysis for multinomial processing tree models. The idea of a multiverse analysis (Steegen, Tuerlinckx, Gelman, & Vanpaemel, 2016) is to perform and report all possible combinations of reasonable modeling choices. This contrasts with a more common analysis approach in which only one specific analysis (i.e., one path through a garden of forking paths) is reported.
fit_mpt
performs a multiverse analysis combining two different factors.
For the frequentist approaches, no pooling (with and without parametric or nonparametric bootstrap) and complete pooling are implemented using MPTinR
(Singmann & Kellen, 2013). For the Bayesian approaches, no pooling, complete pooling, and three different variants of partial pooling are implemented using TreeBUGS
(Heck, Arnold, & Arnold, 2017).
First of all, make sure that you have installed the latest version of R
, of all necessary R
packages, and of JAGS
. To install JAGS
, go to mcmc-jags.sourceforge.net and follow installation instructions. After that, install or update the required R
packages:
Here we use parts of the data from Experiment 1 reported in Bayen and Kuhlmann (2011) investigating source-memory of participants under two different cognitive load conditions. The 24 participants in the load
condition generated a random number between one and nine every time they heard a tone, which happened every 2 s during the entire study phase, and said it out loud. The 24 participants in the no_load
condition performed the study phase without a secondary task. Participants and both conditions performed the test phase in the same manner.
We use the same model as Bayen and Kuhlman (2011), model variant 4 of the 2-high threshold source-memory model (2HTSM) introduced by Bayen, Murnane, and Erdfelder (1996). Model variant 4 of the 2HTSM separates observed source-memory data into 4 parameters: Parameter \(D\) (item recognition); parameter \(d\) (source memory); parameter \(b\) (probability of guessing that an item is old); and parameter \(g\) (probability of guessing that an item comes from source A versus source B).
Both data and model come with package MPTmultiverse
so their location can be obtained using function system.file
. In other applications, the file paths need to be provided by the user to match the location of data and model on their file system. It often makes sense to set the working directory to the directory in which the data and model file resides, either via setwd()
or via the menu.
# load packages
library("MPTmultiverse")
# If you're running the analysis from an .rmd file, you only need to ensure that
# the .rmd, .eqn, and .csv files are all in the same directory.
# ------------------------------------------------------------------------------
# MPT model definition & data
EQN_FILE <- system.file("extdata", "2HTSM_Submodel4.eqn", package = "MPTmultiverse")
DATA_FILE <- system.file("extdata", "Kuhlmann_dl7.csv", package = "MPTmultiverse")
# if .csv format uses semicolons ";" (German format):
data <- read.csv2(DATA_FILE, fileEncoding = "UTF-8-BOM") ## use correct encoding if necessary
# if .csv format uses commata "," (international format):
# data <- read.csv(DATA_FILE, fileEncoding = "UTF-8-BOM")
# We first take a look at the data using head()
head(data)
#> Subject ExpCond Yee Yeu Yen Yue Yuu Yun Yne Ynu Ynn
#> 1 110 1 6 12 5 2 18 5 0 2 14
#> 2 138 1 11 8 6 3 16 4 2 2 12
#> 3 141 1 9 10 7 1 16 5 2 1 13
#> 4 149 1 10 9 4 0 19 6 0 0 16
#> 5 102 1 12 4 8 5 11 8 0 0 16
#> 6 105 1 14 2 5 1 19 7 2 0 14
## We then plot the response frequencies using plotFreq from the TreeBUGS package
TreeBUGS::plotFreq(data, boxplot = FALSE, eqn = EQN_FILE)
The look at the data.frame
tells us which columns contain the subject identifier and the variable encoding group membership. We need to record these variables for the use in fit_mpt
.
The plot of the individual response frequencies shows quite some individual variability, but nothing concerning.
Next, we prepare the data for the fitting.
COL_ID <- "Subject" # name of the variable encoding subject ID
COL_CONDITION <- "ExpCond" # name of the variable encoding group membership
# Experimental conditions should be labeled in a meaningful way. To accomplish
# this, you may want to use the `factor` function:
unique(data[, COL_CONDITION])
#> [1] 1 2
data[[COL_CONDITION]] <- factor(
data[[COL_CONDITION]]
, levels = c(1:2)
, labels = c("no_load", "load")
)
### check input data frame
head(data)
#> Subject ExpCond Yee Yeu Yen Yue Yuu Yun Yne Ynu Ynn
#> 1 110 no_load 6 12 5 2 18 5 0 2 14
#> 2 138 no_load 11 8 6 3 16 4 2 2 12
#> 3 141 no_load 9 10 7 1 16 5 2 1 13
#> 4 149 no_load 10 9 4 0 19 6 0 0 16
#> 5 102 no_load 12 4 8 5 11 8 0 0 16
#> 6 105 no_load 14 2 5 1 19 7 2 0 14
Every time the package MPTmultiverse
is loaded, it automatically sets some more or less useful defaults for model estimation, usage of multiple processor cores, number of posterior predictive samples, etc. By calling mpt_options()
without any parameters, you can inspect these default values. If you want to change them, call mpt_options
with the respective parameter specified, i.e. mpt_options(n.iter = 1000)
. For testing purposes, you can also specify mpt_options("test")
, which is a shorthand for setting fast, but highly unreliable settings. You can set options to defaults, again, by typing the shorthand mpt_options("default")
.
The main computations are performed with a call to fit_mpt
. In the default settings, the ten analysis options offered by MPTmultiverse
are performed. Type ?fit_mpt
in the R console if you want to see what these options are and find out more about the parameters of the function. The help page also contains a comprehensive overview of the results object returned by fit_mpt
.
Before fitting, we set a seed to make the analysis reproducible and set the options to the default settings.
set.seed(42)
mpt_options("default")
results <- fit_mpt(
model = EQN_FILE
, dataset = DATA_FILE
, data = data
, id = COL_ID
, condition = COL_CONDITION
, core = c("D", "d")
)
After fitting it is a good idea to save the results as a binary R
data file for later. This is easiest done using save()
. To save all information necessary to recreate the analysis we only need to save the results tibble
as it contains both data and model as attributes (see str(results, 1)
).
We can automatically create a file name for the file holding the results based on the model and data file.
In the current example this would usually lead to quite a long filename (e.g., see EQN_FILE
), so one can also use a custom filename.
One can also directly save in a subfolder of the current working directory (if the subfolder exists).
After computations finished, which may take between a couple of hours to days, check if model estimation worked by using the function check_results
.
check_results(results)
#> ## MPTinR: no pooling
#> Based on asymptotic method, proportion of participants with non-identified parameters:
#> condition core proportion
#> 1 load FALSE 0
#> 2 load TRUE 0.0417
#> 3 no_load FALSE 0.0208
#> 4 no_load TRUE 0.0208
#>
#> Based on asymptotic CIs, table of non-identified parameters:
#> condition core parameter n
#> 1 load TRUE d 2
#> 2 no_load FALSE g 1
#> 3 no_load TRUE d 1
#>
#> Based on PB/MLE method, proportion of participants with non-identified parameters:
#> condition core proportion
#> 1 load FALSE 0
#> 2 load TRUE 0.354
#> 3 no_load FALSE 0.0625
#> 4 no_load TRUE 0
#>
#> Based on PB/MLE CIs, table of non-identified parameters:
#> condition core parameter n
#> 1 load TRUE d 17
#> 2 no_load FALSE g 3
#>
#> Based on NPB/MLE method, proportion of participants with non-identified parameters:
#> condition core proportion
#> 1 load FALSE 0
#> 2 load TRUE 0.354
#> 3 no_load FALSE 0.0625
#> 4 no_load TRUE 0
#>
#> Based on NPB/MLE CIs, table of non-identified parameters:
#> condition core parameter n
#> 1 load TRUE d 17
#> 2 no_load FALSE g 3
#>
#>
#> ## MPTinR: complete pooling
#> No convergence problems.
#>
#>
#> ## /home/mariusbarth/R/x86_64-pc-linux-gnu-library/3.6/MPTmultiverse/extdata/2HTSM_Submodel4.eqn // /home/mariusbarth/R/x86_64-pc-linux-gnu-library/3.6/MPTmultiverse/extdata/Kuhlmann_dl7.csv // no // TreeBUGS // simple:
#> All Rhat < 1.05 .
#> All effect sample sizes > 2000 .
#>
#>
#> ## /home/mariusbarth/R/x86_64-pc-linux-gnu-library/3.6/MPTmultiverse/extdata/2HTSM_Submodel4.eqn // /home/mariusbarth/R/x86_64-pc-linux-gnu-library/3.6/MPTmultiverse/extdata/Kuhlmann_dl7.csv // complete // TreeBUGS // simple:
#> All Rhat < 1.05 .
#> All effect sample sizes > 2000 .
#>
#>
#> ## /home/mariusbarth/R/x86_64-pc-linux-gnu-library/3.6/MPTmultiverse/extdata/2HTSM_Submodel4.eqn // /home/mariusbarth/R/x86_64-pc-linux-gnu-library/3.6/MPTmultiverse/extdata/Kuhlmann_dl7.csv // partial // TreeBUGS // trait:
#> All Rhat < 1.05 .
#> All effect sample sizes > 2000 .
#>
#>
#> ## /home/mariusbarth/R/x86_64-pc-linux-gnu-library/3.6/MPTmultiverse/extdata/2HTSM_Submodel4.eqn // /home/mariusbarth/R/x86_64-pc-linux-gnu-library/3.6/MPTmultiverse/extdata/Kuhlmann_dl7.csv // partial // TreeBUGS // trait_uncorrelated:
#> All Rhat < 1.05 .
#> All effect sample sizes > 2000 .
#>
#>
#> ## /home/mariusbarth/R/x86_64-pc-linux-gnu-library/3.6/MPTmultiverse/extdata/2HTSM_Submodel4.eqn // /home/mariusbarth/R/x86_64-pc-linux-gnu-library/3.6/MPTmultiverse/extdata/Kuhlmann_dl7.csv // partial // TreeBUGS // beta:
#> All Rhat < 1.05 .
#> 5 core parameters with effect sample size n.eff < 2000 :
#> alph[d], alph[D], bet[d], bet[D], mean[d]
#> 0 auxiliary parameters with effect sample size n.eff < 2000 :
#>
#>
#>
#> ## /home/mariusbarth/R/x86_64-pc-linux-gnu-library/3.6/MPTmultiverse/extdata/2HTSM_Submodel4.eqn // /home/mariusbarth/R/x86_64-pc-linux-gnu-library/3.6/MPTmultiverse/extdata/Kuhlmann_dl7.csv // partial // TreeBUGS // betacpp:
#> All Rhat < 1.05 .
#> 12 core parameters with effect sample size n.eff < 2000 :
#> sd[d], sd[D], alph[d], alph[D], bet[d], bet[D], mean[d], alph[d], alph[D], bet[d], bet[D], d[21]
#> 0 auxiliary parameters with effect sample size n.eff < 2000 :
#>
In this example, for the no-pooling asymptotic approaches the rate of participants with non-identified parameters is very low. For the bootstrap-based approaches the results pattern is different. Here we see that the rate of participants with non-identified parameters in the load
condition is considerably higher, around .17 versus .03 in the no_load
condition. Particularly, the \(d\) parameter shows problematic behavior.
For the Bayesian approaches, the betacpp
did not reach an effective sample size \(\mathit{ESS} > 2{,}000\). Increasing the number of iterations by typing mpt_options(n.iter = 2e5)
, and re-running, should solve this problem.
fit_mpt
returns a tibble
with an additional class, multiverseMPT
, with one row per estimation method. When using the default setting and if all methods succeed, fit_mpt
uses ten estimation methods and thus this tibble
contains ten rows. The first five columns contain information about data and method and the remaining columns contain the results. Most of the results columns are list
columns and inspection of the content is performed most conveniently using packages dplyr
and tidyr
. We therefore load these packages before taking a glimpse at the columns.
library("dplyr")
library("tidyr")
glimpse(results)
#> Rows: 10
#> Columns: 18
#> $ model <chr> "/home/mariusbarth/R/x86_64-pc-linux-gnu-library/3.6…
#> $ dataset <chr> "/home/mariusbarth/R/x86_64-pc-linux-gnu-library/3.6…
#> $ pooling <chr> "complete", "no", "no", "no", "no", "complete", "par…
#> $ package <chr> "MPTinR", "MPTinR", "MPTinR", "MPTinR", "TreeBUGS", …
#> $ method <chr> "asymptotic", "asymptotic", "PB/MLE", "NPB/MLE", "si…
#> $ est_group <list> [<tbl_df[8 x 9]>, <tbl_df[8 x 9]>, <tbl_df[8 x 9]>,…
#> $ est_indiv <list> [<tbl_df[0 x 0]>, <tbl_df[192 x 11]>, <tbl_df[192 x…
#> $ est_rho <list> [<tbl_df[0 x 3]>, <tbl_df[0 x 3]>, <tbl_df[0 x 3]>,…
#> $ test_between <list> [<tbl_df[4 x 11]>, <tbl_df[4 x 11]>, <tbl_df[4 x 11…
#> $ test_within <list> [<tbl_df[12 x 14]>, <tbl_df[12 x 14]>, <tbl_df[12 x…
#> $ gof <list> [<tbl_df[1 x 6]>, <tbl_df[1 x 6]>, <tbl_df[1 x 6]>,…
#> $ gof_group <list> [<tbl_df[2 x 7]>, <tbl_df[2 x 7]>, <tbl_df[2 x 7]>,…
#> $ gof_indiv <list> [<tbl_df[0 x 0]>, <tbl_df[48 x 8]>, <tbl_df[48 x 8]…
#> $ fungibility <list> [<tbl_df[0 x 3]>, <tbl_df[0 x 3]>, <tbl_df[0 x 3]>,…
#> $ test_homogeneity <list> [<tbl_df[2 x 4]>, <tbl_df[2 x 4]>, <tbl_df[2 x 4]>,…
#> $ convergence <list> [<tbl_df[3 x 4]>, <tbl_df[48 x 5]>, <tbl_df[48 x 6]…
#> $ estimation <list> [<tbl_df[4 x 2]>, <tbl_df[1 x 2]>, <tbl_df[1 x 2]>,…
#> $ options <list> [<tbl_df[1 x 14]>, <tbl_df[1 x 14]>, <tbl_df[1 x 14…
Bayen and Kuhlman (2011) report a difference in the \(g\) parameter across conditions (they actually report an interaction with a further within-subjects factor, but this is not considered here). Thus, we can take a look at the difference in \(g\) parameter across conditions and methods in the following manner: We first select the column containing the results of the between-condition tests, unnest
this column, and then select only data containing the relevant parameter.
results %>%
select(pooling:method, test_between) %>%
unnest(cols = test_between) %>%
filter(parameter == "g") %>%
print(width = 150)
#> # A tibble: 10 x 14
#> pooling package method parameter core condition1 condition2
#> <chr> <chr> <chr> <chr> <lgl> <chr> <chr>
#> 1 complete MPTinR asymptotic g FALSE no_load load
#> 2 no MPTinR asymptotic g FALSE no_load load
#> 3 no MPTinR PB/MLE g FALSE no_load load
#> 4 no MPTinR NPB/MLE g FALSE no_load load
#> 5 no TreeBUGS simple g FALSE no_load load
#> 6 complete TreeBUGS simple g FALSE no_load load
#> 7 partial TreeBUGS trait g FALSE no_load load
#> 8 partial TreeBUGS trait_uncorrelated g FALSE no_load load
#> 9 partial TreeBUGS beta g FALSE no_load load
#> 10 partial TreeBUGS betacpp g FALSE no_load load
#> est_diff se p ci_0.025 ci_0.1 ci_0.9 ci_0.975
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 -0.186 0.0316 NA -0.248 -0.227 -0.146 -0.125
#> 2 -0.151 0.0781 0.0587 -0.304 -0.251 -0.0511 0.00189
#> 3 -0.133 0.0808 0.104 -0.292 -0.237 -0.0298 0.0250
#> 4 -0.133 0.0808 0.104 -0.292 -0.237 -0.0298 0.0250
#> 5 -0.145 0.0281 0 -0.200 -0.181 -0.109 -0.0888
#> 6 -0.187 0.0312 0 -0.247 -0.227 -0.147 -0.126
#> 7 -0.184 0.0978 0.0693 -0.370 -0.308 -0.0582 0.0157
#> 8 -0.199 0.0949 0.0462 -0.379 -0.318 -0.0759 -0.00323
#> 9 -0.153 0.0710 0.036 -0.289 -0.243 -0.0611 -0.0110
#> 10 -0.152 0.0712 0.0385 -0.288 -0.242 -0.0609 -0.00990
Inspecting the differences across the analysis multiverse shows that the estimated difference is negative in each case and the 95% credibility/confidence intervals around the estimate do not include zero for 6 out of the 10 methods. Only the CIs for the no-pooling frequentist methods as well as the most sophisticated model, the latent trait model, include 0. However, the 80% CIs do not include zero for all methods. Taken together, this provides evidence that the \(g\) parameter is larger in the load
compared to the no_load
condition.
In a similar manner, it is also possible to examine differences between parameter estimates within each group: We first select
the column containing within-condition tests, unnest
this column, and then select
only data containing the relevant parameters.
results %>%
select(pooling:method, test_within) %>%
unnest(cols = test_within) %>%
filter(condition == "no_load") %>%
filter(parameter1 == "d" & parameter2 == "D") %>%
print(width = 150)
#> # A tibble: 10 x 17
#> pooling package method condition parameter1 parameter2 core1
#> <chr> <chr> <chr> <chr> <chr> <chr> <lgl>
#> 1 complete MPTinR asymptotic no_load d D TRUE
#> 2 no MPTinR asymptotic no_load d D TRUE
#> 3 no MPTinR PB/MLE no_load d D TRUE
#> 4 no MPTinR NPB/MLE no_load d D TRUE
#> 5 no TreeBUGS simple no_load d D TRUE
#> 6 complete TreeBUGS simple no_load d D TRUE
#> 7 partial TreeBUGS trait no_load d D TRUE
#> 8 partial TreeBUGS trait_uncorrelated no_load d D TRUE
#> 9 partial TreeBUGS beta no_load d D TRUE
#> 10 partial TreeBUGS betacpp no_load d D TRUE
#> core2 est se statistic df p ci_0.025 ci_0.1 ci_0.9 ci_0.975
#> <lgl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 TRUE -0.00493 0.0530 -0.0930 NA 0.926 -0.109 -0.0728 0.0630 0.0989
#> 2 TRUE -0.0126 0.0542 -0.232 22 0.819 -0.125 -0.0842 0.0591 0.0999
#> 3 TRUE -0.0216 0.0527 -0.410 23 0.686 -0.131 -0.0911 0.0479 0.0874
#> 4 TRUE -0.0216 0.0527 -0.410 23 0.686 -0.131 -0.0911 0.0479 0.0874
#> 5 TRUE -0.000551 0.0438 NA NA 0.982 -0.0857 -0.0568 0.0551 0.0853
#> 6 TRUE -0.00221 0.0526 NA NA 0.951 -0.104 -0.0684 0.0657 0.102
#> 7 TRUE -0.000351 0.0743 NA NA 0.973 -0.143 -0.0925 0.0945 0.152
#> 8 TRUE -0.00389 0.0714 NA NA 0.944 -0.144 -0.0924 0.0865 0.141
#> 9 TRUE -0.00505 0.0615 NA NA 0.930 -0.125 -0.0833 0.0741 0.117
#> 10 TRUE -0.00597 0.0604 NA NA 0.911 -0.123 -0.0827 0.0722 0.115
In this example, these comparisons are probably not meaningful, but for other designs this column may be used for within-subjects comparisons.
The analysis output results
is an object of class multiverseMPT
, that has its own plot()
method. The plot
method returns ggplot2
objects, which allows further customization (such as using themes
). Type ?plot.multiverseMPT
to see the documentation of possible arguments to this method.
To plot group-level parameter estimates use:
To plot between-subjects comparisons across all parameters:
To plot overall goodness-of-fit use:
To plot group-wise goodness-of-fit use: