Getting started with measr

The goal of measr is to make it easy to estimate and evaluate diagnostic classification models (DCMs). DCMs are primarily useful for assessment or survey data where responses are recorded dichotomously (e.g., right/wrong, yes/no) or polytomously (e.g., strongly agree, agree, disagree, strongly disagree). When using DCMs, the measured skills, or attributes, are categorical. Thus, these models are particularly useful when you are measuring multiple attributes that exist in different states. For example, an educational assessment may be interested in reporting whether or not students are proficient on a set of academic standards. Similarly, we might explore the presence or absence of attributes before and after an intervention.

There are two main classes of functions we need to get started. Estimation functions are used for building the DCM using the Stan probabilistic programming language and getting estimates of respondent proficiency. Evaluation functions can then be applied to the fitted model to assess how well the estimates represent the observed data.

Installation

Because measr uses Stan as a backend for estimating DCMs, an installation of rstan or cmdstanr is required.
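As a quick check before going further, the sketch below uses base R's requireNamespace() to report which of the two backends is already installed. The variable names are illustrative and are not part of measr.

```r
# Candidate Stan interfaces that measr can use as a backend.
backends <- c("rstan", "cmdstanr")

# requireNamespace() returns TRUE when a package can be loaded,
# without attaching it to the search path.
installed <- vapply(backends, requireNamespace, logical(1), quietly = TRUE)

print(installed)

if (!any(installed)) {
  message("Neither rstan nor cmdstanr was found; install one of them first.")
}
```

If both entries are FALSE, follow one of the installation routes described in this section.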

rstan

Before installing rstan, your system must be configured to compile C++ code. You can find instructions for Windows, Mac, and Linux in the RStan Getting Started guide.

The rstan package can then be installed directly from CRAN:

install.packages("rstan")

To verify the installation was successful, you can run a test model. If everything is set up correctly, the model should compile and sample. For additional troubleshooting help, see the RStan Getting Started guide.

library(rstan)

example(stan_model, package = "rstan", run.dontrun = TRUE)

cmdstanr

The cmdstanr package is not yet available on CRAN. The beta release can be installed from the Stan R package repository:

install.packages("cmdstanr",
                 repos = c("https://mc-stan.org/r-packages/",
                           getOption("repos")))

Or the development version can be installed from GitHub:

# install.packages("remotes")
remotes::install_github("stan-dev/cmdstanr")

The cmdstanr package requires a suitable C++ toolchain. Requirements and instructions for ensuring your toolchain is properly set up are described in the CmdStan User Guide.

You can verify that the C++ toolchain is set up correctly with:

library(cmdstanr)

check_cmdstan_toolchain()

Finally, cmdstanr requires that CmdStan (the shell interface to Stan) be installed. Once the toolchain is properly set up, CmdStan can be installed with:

install_cmdstan(cores = 2)

For additional installation help, see the Getting Started with CmdStanR vignette.
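One way to confirm that CmdStan itself is available is to ask cmdstanr for the installed version. The following is a minimal sketch that assumes nothing beyond cmdstanr's cmdstan_version() function. The cmdstan_ver variable is illustrative, and the guards are there so the code also runs on a machine where cmdstanr or CmdStan is missing.

```r
# Stays NULL unless a CmdStan installation is found.
cmdstan_ver <- NULL

if (requireNamespace("cmdstanr", quietly = TRUE)) {
  # cmdstan_version() signals an error when no CmdStan installation
  # has been located, so catch that case and return NULL instead.
  cmdstan_ver <- tryCatch(
    cmdstanr::cmdstan_version(),
    error = function(e) NULL
  )
}

if (is.null(cmdstan_ver)) {
  message("CmdStan was not found; see the installation steps above.")
} else {
  message("Found CmdStan version ", cmdstan_ver)
}
```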

measr

Once rstan and/or cmdstanr have been installed, we are ready to install measr. The released version of measr can be installed directly from CRAN:

install.packages("measr")

Or, the development version can be installed from GitHub:

# install.packages("remotes")
remotes::install_github("wjakethompson/measr")

Once everything has been installed, we’re ready to start estimating and evaluating our DCMs.

library(measr)

Model Estimation

To illustrate, we’ll fit a loglinear cognitive diagnostic model (LCDM) to an assessment of English language proficiency (see Templin & Hoffman, 2013). There are many different subtypes of DCMs that make different assumptions about how the attributes relate to each other. The LCDM is a general model that makes very few assumptions about the compensatory nature of the relationships between attributes. For details on the LCDM, see Henson & Templin (2019).
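To make the structure of the LCDM concrete, the sketch below computes response probabilities for a single hypothetical item that measures two attributes. On the log-odds scale, the model is an intercept, plus a main effect for each attribute the respondent has mastered, plus an interaction term when both are mastered. The parameter values are made up for illustration and are not estimates from the data used in this article.

```r
# Hypothetical LCDM parameters for one item measuring two attributes
# (illustrative values only, not estimated from any data).
intercept <- -1.0  # log-odds of a correct response with neither attribute
main_1    <-  1.5  # added log-odds for mastering attribute 1
main_2    <-  1.0  # added log-odds for mastering attribute 2
inter_12  <-  0.5  # extra log-odds when both attributes are mastered

# All mastery patterns for the two attributes (0 = absent, 1 = present).
profiles <- expand.grid(att1 = 0:1, att2 = 0:1)

# LCDM item response function on the logit scale.
logit <- intercept +
  main_1 * profiles$att1 +
  main_2 * profiles$att2 +
  inter_12 * profiles$att1 * profiles$att2

# Convert log-odds to probabilities of a correct response.
profiles$prob_correct <- plogis(logit)
profiles
```

Constraining the interaction and main effects in different ways yields more restrictive DCM subtypes, which is why the LCDM is described as a general model.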

The data set we’re using contains 28 items that together measure three attributes: morphosyntactic rules, cohesive rules, and lexical rules. The Q-matrix defines which attributes are measured by each item. For example, item E1 measures morphosyntactic and cohesive rules. The data is further described in ?ecpe.

ecpe_data
#> # A tibble: 2,922 × 29
#>    resp_id    E1    E2    E3    E4    E5    E6    E7    E8    E9   E10   E11
#>      <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
#>  1       1     1     1     1     0     1     1     1     1     1     1     1
#>  2       2     1     1     1     1     1     1     1     1     1     1     1
#>  3       3     1     1     1     1     1     1     0     1     1     1     1
#>  4       4     1     1     1     1     1     1     1     1     1     1     1
#>  5       5     1     1     1     1     1     1     1     1     1     1     1
#>  6       6     1     1     1     1     1     1     1     1     1     1     1
#>  7       7     1     1     1     1     1     1     1     1     1     1     1
#>  8       8     0     1     1     1     1     1     0     1     1     1     0
#>  9       9     1     1     1     1     1     1     1     1     1     1     1
#> 10      10     1     1     1     1     0     0     1     1     1     1     1
#> # ℹ 2,912 more rows
#> # ℹ 17 more variables: E12 <int>, E13 <int>, E14 <int>, E15 <int>, E16 <int>,
#> #   E17 <int>, E18 <int>, E19 <int>, E20 <int>, E21 <int>, E22 <int>,
#> #   E23 <int>, E24 <int>, E25 <int>, E26 <int>, E27 <int>, E28 <int>

ecpe_qmatrix
#> # A tibble: 28 × 4
#>    item_id morphosyntactic cohesive lexical
#>    <chr>             <int>    <int>   <int>
#>  1 E1                    1        1       0
#>  2 E2                    0        1       0
#>  3 E3                    1        0       1
#>  4 E4                    0        0       1
#>  5 E5                    0        0       1
#>  6 E6                    0        0       1
#>  7 E7                    1        0       1
#>  8 E8                    0        1       0
#>  9 E9                    0        0       1
#> 10 E10                   1        0       0
#> # ℹ 18 more rows
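To see what the Q-matrix implies, the sketch below rebuilds the ten rows printed above as a plain data frame, counts how many attributes each item measures, and enumerates the possible attribute profiles. It uses base R only. The object names (qmatrix_head, att_names, multi_items, profiles) are illustrative and are not part of measr.

```r
# First ten rows of the Q-matrix, copied from the printed output above.
qmatrix_head <- data.frame(
  item_id         = paste0("E", 1:10),
  morphosyntactic = c(1, 0, 1, 0, 0, 0, 1, 0, 0, 1),
  cohesive        = c(1, 1, 0, 0, 0, 0, 0, 1, 0, 0),
  lexical         = c(0, 0, 1, 1, 1, 1, 1, 0, 1, 0),
  stringsAsFactors = FALSE
)

# Every column other than the item identifier is an attribute.
att_names <- setdiff(names(qmatrix_head), "item_id")

# Number of attributes measured by each item.
qmatrix_head$n_attributes <- rowSums(qmatrix_head[att_names])

# Items that measure more than one attribute.
multi_items <- qmatrix_head$item_id[qmatrix_head$n_attributes > 1]
multi_items

# With three binary attributes there are 2^3 = 8 possible profiles.
profiles <- expand.grid(rep(list(0:1), length(att_names)))
names(profiles) <- att_names
nrow(profiles)
```

Among these ten items, E1, E3, and E7 each measure two attributes, and the remaining items each measure one.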

We can estimate the LCDM using the measr_dcm() function. We specify the data set, the Q-matrix, and the column names of the respondent and item identifiers in each (if they exist). We then add two additional arguments. The method argument defines how the model should be estimated. For computational efficiency, I’ve selected "optim", which uses Stan’s optimizer to estimate the model. For a fully Bayesian estimation, you can change this to method = "mcmc". Finally, the type argument specifies the type of DCM to estimate. As previously discussed, we’re estimating an LCDM in this example. For more details and options for customizing the model specification and estimation, see the model estimation article on the measr website.

ecpe_lcdm <- measr_dcm(data = ecpe_data, qmatrix = ecpe_qmatrix,
                       resp_id = "resp_id", item_id = "item_id",
                       method = "optim", type = "lcdm")

Once the model has been estimated, we add respondent estimates with add_respondent_estimates() and then use measr_extract() to pull out the probability that each respondent is proficient on each of the attributes. For example, the first respondent has probabilities near 1 for all attributes, indicating a high degree of confidence that they are proficient in all attributes. On the other hand, respondent 8 has relatively low probabilities for the morphosyntactic and cohesive attributes, and is likely only proficient in lexical rules.

ecpe_lcdm <- add_respondent_estimates(ecpe_lcdm)
measr_extract(ecpe_lcdm, "attribute_prob")
#> # A tibble: 2,922 × 4
#>    resp_id morphosyntactic cohesive lexical
#>    <fct>             <dbl>    <dbl>   <dbl>
#>  1 1               0.997      0.962   1.00 
#>  2 2               0.995      0.900   1.00 
#>  3 3               0.985      0.990   1.00 
#>  4 4               0.998      0.991   1.00 
#>  5 5               0.989      0.985   0.965
#>  6 6               0.993      0.991   1.00 
#>  7 7               0.993      0.991   1.00 
#>  8 8               0.00411    0.471   0.964
#>  9 9               0.949      0.986   0.999
#> 10 10              0.552      0.142   0.111
#> # ℹ 2,912 more rows
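These probabilities are often turned into proficient or not-proficient classifications. The sketch below does this in base R for the ten respondents printed above. The 0.5 cutoff is an assumption chosen for illustration, not a value prescribed by measr, and the object names are hypothetical.

```r
# Attribute probabilities for the first ten respondents,
# copied from the printed output above.
probs <- data.frame(
  resp_id = 1:10,
  morphosyntactic = c(0.997, 0.995, 0.985, 0.998, 0.989,
                      0.993, 0.993, 0.00411, 0.949, 0.552),
  cohesive        = c(0.962, 0.900, 0.990, 0.991, 0.985,
                      0.991, 0.991, 0.471, 0.986, 0.142),
  lexical         = c(1.00, 1.00, 1.00, 1.00, 0.965,
                      1.00, 1.00, 0.964, 0.999, 0.111)
)

att_names <- c("morphosyntactic", "cohesive", "lexical")

# Assumed cutoff: classify as proficient (1) when the probability
# is at least 0.5, otherwise as not proficient (0).
threshold <- 0.5

classes <- probs
classes[att_names] <- lapply(
  probs[att_names],
  function(p) as.integer(p >= threshold)
)

# Inspect the respondents discussed in the text, plus respondent 10.
classes[classes$resp_id %in% c(1, 8, 10), ]
```

Under this cutoff, respondent 1 is classified as proficient on all three attributes and respondent 8 only on lexical rules, which matches the interpretation given above.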

Model Evaluation

There are many ways to evaluate our estimated model including model fit, model comparisons, and reliability. For a complete listing of available options, see ?model_evaluation. To illustrate how these functions work, we’ll look at the classification accuracy and consistency metrics described by Johnson & Sinharay (2018).

We start by adding the reliability information to our estimated model using add_reliability(). We can then extract that information, again using measr_extract(). For these indices, numbers close to 1 indicate a high level of classification accuracy or consistency. Here, accuracy ranges from 0.858 to 0.918 and consistency from 0.809 to 0.858, so the classifications are not perfect but look reasonably good overall. For guidance on cutoff values for “good,” “fair,” etc. reliability, see Johnson & Sinharay (2018).

ecpe_lcdm <- add_reliability(ecpe_lcdm)
measr_extract(ecpe_lcdm, "classification_reliability")
#> # A tibble: 3 × 3
#>   attribute       accuracy consistency
#>   <chr>              <dbl>       <dbl>
#> 1 morphosyntactic    0.897       0.835
#> 2 cohesive           0.858       0.809
#> 3 lexical            0.918       0.858
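When there are more attributes than can be compared at a glance, it can help to sort and summarize these indices. The sketch below copies the values printed above into a base R data frame and identifies the attribute with the lowest accuracy. The object names are illustrative.

```r
# Reliability indices copied from the printed output above.
reliability <- data.frame(
  attribute   = c("morphosyntactic", "cohesive", "lexical"),
  accuracy    = c(0.897, 0.858, 0.918),
  consistency = c(0.835, 0.809, 0.858),
  stringsAsFactors = FALSE
)

# Sort attributes from least to most accurately classified.
reliability[order(reliability$accuracy), ]

# Range of each index across attributes.
range(reliability$accuracy)
range(reliability$consistency)

# Attribute with the lowest classification accuracy.
weakest <- reliability$attribute[which.min(reliability$accuracy)]
weakest
```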

References

Henson, R., & Templin, J. (2019). Loglinear cognitive diagnostic model (LCDM). In M. von Davier & Y.-S. Lee (Eds.), Handbook of Diagnostic Classification Models (pp. 171–185). Springer International Publishing. https://doi.org/10.1007/978-3-030-05584-4_8
Johnson, M. S., & Sinharay, S. (2018). Measures of agreement to assess attribute-level classification accuracy and consistency for cognitive diagnostic assessments. Journal of Educational Measurement, 55(4), 635–664. https://doi.org/10.1111/jedm.12196
Templin, J., & Hoffman, L. (2013). Obtaining diagnostic classification model estimates using Mplus. Educational Measurement: Issues and Practice, 32(2), 37–50. https://doi.org/10.1111/emip.12010