The goal of measr is to make it easy to estimate and evaluate diagnostic classification models (DCMs). DCMs are primarily useful for assessment or survey data where responses are recorded dichotomously (e.g., right/wrong, yes/no) or polytomously (e.g., strongly agree, agree, disagree, strongly disagree). When using DCMs, the measured skills, or attributes, are categorical. Thus, these models are particularly useful when you are measuring multiple attributes that exist in different states. For example, an educational assessment may be used to report whether or not students are proficient on a set of academic standards. Similarly, we might explore the presence or absence of attributes before and after an intervention.
There are two main classes of functions we need to get started. Estimation functions are used for building the DCM using the Stan probabilistic programming language and getting estimates of respondent proficiency. Evaluation functions can then be applied to the fitted model to assess how well the estimates represent the observed data.
Because measr uses Stan as a backend for estimating DCMs, an installation of rstan or cmdstanr is required.
Before installing rstan, your system must be configured to compile C++ code. You can find instructions for Windows, Mac, and Linux in the RStan Getting Started guide.
The rstan package can then be installed directly from CRAN:
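This is the standard CRAN installation call:

```r
# Install rstan (and its dependencies) from CRAN
install.packages("rstan")
```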
To verify the installation was successful, you can run a test model. If everything is set up correctly, the model should compile and sample. For additional troubleshooting help, see the RStan Getting Started guide.
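One way to run such a test model is the check suggested in the RStan Getting Started guide, which runs the (normally skipped) example for stan_model(). It is a sketch; compiling takes a minute or two on most systems.

```r
library(rstan)

# Compile and sample from the example model shipped with rstan's documentation
example(stan_model, package = "rstan", run.dontrun = TRUE)
```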
The cmdstanr package is not yet available on CRAN. The beta release can be installed from the Stan R package repository:
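A minimal sketch, assuming the Stan R package repository is at https://mc-stan.org/r-packages/ (newer cmdstanr documentation may point to a different URL, such as the stan-dev R-universe):

```r
# Look for cmdstanr in the Stan repository first, then fall back to the
# default repositories for its dependencies
install.packages("cmdstanr",
                 repos = c("https://mc-stan.org/r-packages/", getOption("repos")))
```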
Or the development version can be installed from GitHub:
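A sketch using the remotes package, which must be installed first:

```r
# install.packages("remotes")
remotes::install_github("stan-dev/cmdstanr")
```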
The cmdstanr package requires a suitable C++ toolchain. Requirements and instructions for ensuring your toolchain is properly set up are described in the CmdStan User Guide.
You can verify that the C++ toolchain is set up correctly with:
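cmdstanr provides check_cmdstan_toolchain() for this check. It reports any problems it finds with the installed toolchain:

```r
library(cmdstanr)

# Check that a working C++ toolchain is available for compiling Stan models
check_cmdstan_toolchain()
```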
Finally, cmdstanr requires CmdStan (the shell interface to Stan) to be installed. Once the toolchain is properly set up, CmdStan can be installed with:
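A minimal sketch using cmdstanr's install_cmdstan(). The cores value is only an example; it sets how many cores are used to build CmdStan.

```r
library(cmdstanr)

# Download and build CmdStan, compiling with 2 cores in parallel
install_cmdstan(cores = 2)
```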
For additional installation help, see the Getting Started with CmdStanR vignette.
Once rstan and/or cmdstanr have been installed, we are ready to install measr. The released version of measr can be installed directly from CRAN:
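This is the standard CRAN installation call:

```r
# Install the released version of measr from CRAN
install.packages("measr")
```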
Or, the development version can be installed from GitHub:
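A sketch using the remotes package. The repository name (r-dcm/measr) is an assumption here; check the measr website for the current GitHub location.

```r
# install.packages("remotes")
# Repository name assumed; adjust if the package has moved
remotes::install_github("r-dcm/measr")
```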
Once everything has been installed, we’re ready to start estimating and evaluating our DCMs.
To illustrate, we’ll fit a loglinear cognitive diagnostic model (LCDM) to an assessment of English language proficiency (see Templin & Hoffman, 2013). There are many different subtypes of DCMs that make different assumptions about how the attributes relate to each other. The LCDM is a general model that makes very few assumptions about the compensatory nature of the relationships between attributes. For details on the LCDM, see Henson & Templin (2019).
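As a brief illustration in the notation commonly used in the LCDM literature (not taken from measr itself), consider an item $i$ that measures two attributes $a$ and $b$. The LCDM models the log-odds of a correct response for respondent $r$ as an intercept, a main effect for each attribute, and their interaction, where $\alpha_{ra}, \alpha_{rb} \in \{0, 1\}$ indicate whether respondent $r$ is proficient on each attribute:

```latex
\operatorname{logit} P(X_{ri} = 1 \mid \boldsymbol{\alpha}_r)
  = \lambda_{i,0}
  + \lambda_{i,1,(a)}\,\alpha_{ra}
  + \lambda_{i,1,(b)}\,\alpha_{rb}
  + \lambda_{i,2,(a,b)}\,\alpha_{ra}\,\alpha_{rb}
```

Because the main effects and the interaction are all estimated, the data decide how compensatory the attributes are. More restrictive DCMs can be obtained by constraining these terms; for example, the DINA model keeps only the intercept and the highest-order interaction.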
The data set we’re using contains 28 items that together measure three attributes: morphosyntactic rules, cohesive rules, and lexical rules. The Q-matrix defines which attributes are measured by each item. For example, item E1 measures morphosyntactic and cohesive rules. The data are further described in ?ecpe.
library(measr)
ecpe_data
#> # A tibble: 2,922 × 29
#> resp_id E1 E2 E3 E4 E5 E6 E7 E8 E9 E10 E11
#> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
#> 1 1 1 1 1 0 1 1 1 1 1 1 1
#> 2 2 1 1 1 1 1 1 1 1 1 1 1
#> 3 3 1 1 1 1 1 1 0 1 1 1 1
#> 4 4 1 1 1 1 1 1 1 1 1 1 1
#> 5 5 1 1 1 1 1 1 1 1 1 1 1
#> 6 6 1 1 1 1 1 1 1 1 1 1 1
#> 7 7 1 1 1 1 1 1 1 1 1 1 1
#> 8 8 0 1 1 1 1 1 0 1 1 1 0
#> 9 9 1 1 1 1 1 1 1 1 1 1 1
#> 10 10 1 1 1 1 0 0 1 1 1 1 1
#> # ℹ 2,912 more rows
#> # ℹ 17 more variables: E12 <int>, E13 <int>, E14 <int>, E15 <int>, E16 <int>,
#> # E17 <int>, E18 <int>, E19 <int>, E20 <int>, E21 <int>, E22 <int>,
#> # E23 <int>, E24 <int>, E25 <int>, E26 <int>, E27 <int>, E28 <int>
ecpe_qmatrix
#> # A tibble: 28 × 4
#> item_id morphosyntactic cohesive lexical
#> <chr> <int> <int> <int>
#> 1 E1 1 1 0
#> 2 E2 0 1 0
#> 3 E3 1 0 1
#> 4 E4 0 0 1
#> 5 E5 0 0 1
#> 6 E6 0 0 1
#> 7 E7 1 0 1
#> 8 E8 0 1 0
#> 9 E9 0 0 1
#> 10 E10 1 0 0
#> # ℹ 18 more rows
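To make the Q-matrix structure concrete, the sketch below copies the first four rows shown above into a plain data frame. It then lists the attributes each item measures and counts how many of these items measure each attribute. The names qmatrix, attrs, and measured are illustrative only.

```r
# First four rows of the ECPE Q-matrix, copied from the output above
qmatrix <- data.frame(
  item_id = c("E1", "E2", "E3", "E4"),
  morphosyntactic = c(1, 0, 1, 0),
  cohesive = c(1, 1, 0, 0),
  lexical = c(0, 0, 1, 1)
)
attrs <- c("morphosyntactic", "cohesive", "lexical")

# For each item (row), keep the names of attributes with a 1 in the Q-matrix
measured <- lapply(seq_len(nrow(qmatrix)), function(i) {
  attrs[unlist(qmatrix[i, attrs]) == 1]
})
names(measured) <- qmatrix$item_id
measured

# Number of items (among these four) that measure each attribute
colSums(qmatrix[attrs])
```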
We can estimate the LCDM using the measr_dcm() function. We specify the data set, the Q-matrix, and the column names of the respondent and item identifiers in each (if they exist). We then add two more arguments. The method argument defines how the model should be estimated. For computational efficiency, I’ve selected "optim", which uses Stan’s optimizer to estimate the model. For a fully Bayesian estimation, you can change this to method = "mcmc". Finally, the type argument specifies the type of DCM to estimate. As previously discussed, we’re estimating an LCDM in this example. For more details and options for customizing the model specification and estimation, see the model estimation article on the measr website.
ecpe_lcdm <- measr_dcm(data = ecpe_data, qmatrix = ecpe_qmatrix,
resp_id = "resp_id", item_id = "item_id",
method = "optim", type = "lcdm")
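For a fully Bayesian fit, the same call can be switched to MCMC. This is a sketch. It assumes measr_dcm() accepts a backend argument for choosing between rstan and cmdstanr; check ?measr_dcm for the arguments and type values (such as "dina") supported by your version. Sampling is usually much slower than "optim".

```r
library(measr)

# Same LCDM as above, estimated with MCMC to obtain full posterior draws.
# backend = "rstan" is assumed; use "cmdstanr" if that is what you installed.
ecpe_lcdm_mcmc <- measr_dcm(data = ecpe_data, qmatrix = ecpe_qmatrix,
                            resp_id = "resp_id", item_id = "item_id",
                            method = "mcmc", type = "lcdm",
                            backend = "rstan")
```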
Once the model is estimated, we can add respondent-level estimates with add_respondent_estimates() and then use measr_extract() to pull out the probability that each respondent is proficient on each of the attributes. For example, the first respondent has probabilities near 1 for all attributes, indicating a high degree of confidence that they are proficient in all attributes. On the other hand, respondent 8 has a very low probability for morphosyntactic rules and a probability just below 0.5 for cohesive rules. This respondent is likely proficient only in lexical rules, although their status on cohesive rules is uncertain.
ecpe_lcdm <- add_respondent_estimates(ecpe_lcdm)
measr_extract(ecpe_lcdm, "attribute_prob")
#> # A tibble: 2,922 × 4
#> resp_id morphosyntactic cohesive lexical
#> <fct> <dbl> <dbl> <dbl>
#> 1 1 0.997 0.962 1.00
#> 2 2 0.995 0.900 1.00
#> 3 3 0.985 0.990 1.00
#> 4 4 0.998 0.991 1.00
#> 5 5 0.989 0.985 0.965
#> 6 6 0.993 0.991 1.00
#> 7 7 0.993 0.991 1.00
#> 8 8 0.00411 0.471 0.964
#> 9 9 0.949 0.986 0.999
#> 10 10 0.552 0.142 0.111
#> # ℹ 2,912 more rows
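Turning these probabilities into proficient/not proficient classifications requires choosing a cutoff. The sketch below assumes a cutoff of 0.5 (a common convention, not a measr default). It uses the rounded probabilities for respondents 1, 8, and 10 copied from the output above, and it flags classifications that fall within 0.1 of the cutoff as uncertain. The names probs, classes, and uncertain are illustrative only.

```r
# Rounded attribute probabilities copied from the output above
probs <- data.frame(
  resp_id = c(1, 8, 10),
  morphosyntactic = c(0.997, 0.00411, 0.552),
  cohesive = c(0.962, 0.471, 0.142),
  lexical = c(1.00, 0.964, 0.111)
)
attrs <- c("morphosyntactic", "cohesive", "lexical")
threshold <- 0.5  # assumed cutoff for "proficient"

# 1 = classified as proficient, 0 = not proficient
classes <- probs
classes[attrs] <- lapply(probs[attrs], function(p) as.integer(p > threshold))
classes

# TRUE where the probability is close to the cutoff (within 0.1)
uncertain <- abs(as.matrix(probs[attrs]) - threshold) < 0.1
uncertain
```

Under this rule, respondent 8 is classified as proficient only in lexical rules, but the cohesive classification is flagged as uncertain, which matches the interpretation above.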
There are many ways to evaluate our estimated model, including model fit, model comparisons, and reliability. For a complete listing of available options, see ?model_evaluation. To illustrate how these functions work, we’ll look at the classification accuracy and consistency metrics described by Johnson & Sinharay (2018).
We start by adding the reliability information to our estimated model using add_reliability(). We can then extract that information, again using measr_extract(). For these indices, numbers close to 1 indicate a high level of classification accuracy or consistency. For this model, the numbers are not perfect, but overall they look reasonably good. For guidance on cutoff values for “good,” “fair,” etc. reliability, see Johnson & Sinharay (2018).
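A self-contained sketch of these steps is shown below. It refits the model so it can run on its own and assumes that "classification_reliability" is the measr_extract() option for the attribute-level accuracy and consistency indices. See ?measr_extract for the options available in your version.

```r
library(measr)

# Fit the LCDM as above
ecpe_lcdm <- measr_dcm(data = ecpe_data, qmatrix = ecpe_qmatrix,
                       resp_id = "resp_id", item_id = "item_id",
                       method = "optim", type = "lcdm")

# Compute reliability indices and store them in the model object
ecpe_lcdm <- add_reliability(ecpe_lcdm)

# Attribute-level classification accuracy and consistency
# (Johnson & Sinharay, 2018); values near 1 are better
rel <- measr_extract(ecpe_lcdm, "classification_reliability")
rel
```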