This vignette describes how analyses of Thurstonian models with ThurMod are performed. For the moment, we separate two different model types: the Thurstonian factor model and the Thurstonian IRT model.
The Thurstonian factor model was introduced by Maydeu-Olivares & Böckenholt (2005); the Thurstonian IRT model was introduced by Maydeu-Olivares & Brown (2010). For a review, see Jansen & Schulze (2023a); for further extensions and a discussion of the model types, see Jansen & Schulze (2023b). Both the factor and the IRT model can be further differentiated. For this differentiation we use the design matrix \(A\), which represents all paired comparisons of a design. For a design of four items, this would be:
     [,1] [,2] [,3] [,4]
[1,]    1   -1    0    0
[2,]    1    0   -1    0
[3,]    1    0    0   -1
[4,]    0    1   -1    0
[5,]    0    1    0   -1
[6,]    0    0    1   -1
The design matrix forms the basis of this differentiation (Jansen & Schulze, 2023b).
First, we load the package required in this vignette:
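Only the ThurMod package itself needs to be attached; lavaan functions are called via the lavaan:: prefix below, and Mplus is run externally.
# load the ThurMod package
library(ThurMod)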
The next step is to define the model we are interested in. For this, we have to define the number of factors, the numbers of items and respondents, the item-to-factor assignment, the latent variance-covariance matrix, the latent utility means, and the factor loadings. We simulate data of 1000 respondents with loadings between .30 and .95 and latent utility means between -1 and 1. We assume that the data result from rankings, that is, that transitivity between responses holds (the variance of the binary indicators is zero). Further, we assume uncorrelated traits. We will simulate the data of all possible paired comparisons (see Jansen & Schulze, 2023b).
Set up the simulation conditions:
nfactor <- 4
nitem <- 12
nperson <- 1000
itf <- rep(1:4, 3)   # item-to-factor assignment
varcov <- diag(1, 4) # latent trait variance-covariance matrix
# latent utility means
set.seed(69)
lmu <- runif(nitem, -1, 1)
# factor loadings
loadings <- runif(nitem, 0.30, 0.95)
Next, we simulate a data set based on the true parameter values:
data <- sim.data(nfactor = nfactor, nitem = nitem, nperson = nperson, itf = itf,
varcov = varcov, lmu = lmu, loadings = loadings)
#save the file
write.table(data, paste0(tempdir(),'/','myDataFile_f.dat'), quote = FALSE,
sep = " ", col.names = FALSE, row.names = FALSE)
The data set contains all \(12(12-1)/2 = 66\) possible paired comparison variables. With this data set we will perform analyses of the full design (IRT, CFA).
Full designs include all paired comparisons. The estimation of these designs can take a while because, with categorical data, a large correlation matrix must be estimated.
For all functions, the blocks we use must be defined. In a full design, there is only one block containing all items. The blocks must be defined as a matrix whose rows correspond to the blocks. Only for a full design, a vector such as 1:12 would also work, as shown in the sketch below.
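For the 12-item full design used here, a minimal definition of the blocks object (used in all syntax and fit calls below) is:
# a single block containing all 12 items (full design)
blocks <- 1:12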
We will analyse the data with Mplus and lavaan. We can do this in two ways: first, in three separate steps; second, directly.
Way 1, step 1: Build syntax
# Mplus
syntax.mplus(blocks, itf, model = 'lmean', input_path = 'myFC_model_f.inp',
             data_path = "myDataFile_f.dat")
In step 2, these syntaxes must be run (evaluation is not performed here):
#lavaan
results_lav1 <- lavaan::lavaan(modelsyn, data = data, ordered = TRUE, auto.fix.first = FALSE,
auto.var = TRUE, int.lv.free = TRUE, parameterization = "theta",
estimator = 'ULSMV')
If you replicate this, be patient; each run can take about 20 minutes, depending on your processing power.
Step 3: Read results. For Mplus, you can either open the output file or use read.mplus:
# Mplus
results_mplus1 <- read.mplus(blocks, itf, model = 'lmean', output_path = "myFC_model_f.out")
srmr cfi tli chisq df pvalue
0.0460 1.0000 1.0000 2131.5890 2171.0000 0.7231
blchisq bldf blpvalue rmsea rmsea_p rmsea.ci.lower
20861.0240 2145.0000 0.0000 0.0000 1.0000 0.0000
rmsea.ci.upper
0.0060
results_lav1 <- lavaan::fitmeasures(results_lav1)[c('chisq.scaled','df.scaled','pvalue.scaled',
'rmsea.scaled','rmsea.ci.lower.scaled','rmsea.ci.upper.scaled',
'rmsea.pvalue.scaled','cfi.scaled')]
chisq.scaled df.scaled pvalue.scaled rmsea.scaled
2.130962e+03 2.171000e+03 7.261660e-01 0.000000e+00
rmsea.ci.lower.scaled rmsea.ci.upper.scaled rmsea.pvalue.scaled cfi.scaled
0.000000e+00 5.706839e-03 1.000000e+00 1.000000e+00
The second way is to use fit.mplus or fit.lavaan; each function performs the above steps at once.
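For the latent-mean model of the full design, such direct calls might look as follows (the object names results_mplus2 and results_lav2 are illustrative):
# Mplus: build the syntax, run the model, and read the results in one step
results_mplus2 <- fit.mplus(blocks, itf, model = 'lmean', input_path = 'myFC_model_f.inp',
                            output_path = "myFC_model_f.out", data_path = "myDataFile_f.dat")
# lavaan: build the syntax and fit the model in one step
results_lav2 <- fit.lavaan(blocks, itf, model = 'lmean', data = data)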
For IRT models, the procedure is the same; just change model = 'lmean' to model = 'irt', for example:
# Mplus
results_mplus2irt <- fit.mplus(blocks, itf, model = 'irt', input_path = 'myFC_model_irt.inp',
output_path = "myFC_model_irt.out", data_path = "myDataFile_f.dat")
srmr cfi tli chisq df pvalue
0.0460 1.0000 1.0000 2076.5110 2116.0000 0.7261
blchisq bldf blpvalue rmsea rmsea_p rmsea.ci.lower
20861.0240 2145.0000 0.0000 0.0000 1.0000 0.0000
rmsea.ci.upper
0.0060
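The corresponding lavaan fit, producing the results_lav2irt object used below, can be obtained in one step with fit.lavaan:
#lavaan
results_lav2irt <- fit.lavaan(blocks, itf, model = 'irt', data = data)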
results_lav2irt <- lavaan::fitmeasures(results_lav2irt)[c('chisq.scaled','df.scaled','pvalue.scaled',
'rmsea.scaled','rmsea.ci.lower.scaled','rmsea.ci.upper.scaled',
'rmsea.pvalue.scaled','cfi.scaled')]
chisq.scaled df.scaled pvalue.scaled rmsea.scaled
2.075897e+03 2.116000e+03 7.291136e-01 0.000000e+00
rmsea.ci.lower.scaled rmsea.ci.upper.scaled rmsea.pvalue.scaled cfi.scaled
0.000000e+00 5.719821e-03 1.000000e+00 1.000000e+00
The first block designs were introduced as multidimensional forced-choice blocks (Brown & Maydeu-Olivares, 2011). However, it was shown that the designs must neither be multidimensional, nor must every paired comparison only be contained once (Jansen & Schulze, 2023b).
We first simulate a new data set. Set up the simulation conditions:
nfactor <- 5
nitem <- 30
nperson <- 1000
itf <- rep(1:5, 6)
varcov <- diag(1, 5)
# latent utility means
set.seed(69)
lmu <- runif(nitem, -1, 1)
loadings <- runif(nitem, 0.30, 0.95)
Next, we simulate a data set based on the true parameter values:
set.seed(1234)
data <- sim.data(nfactor = nfactor, nitem = nitem, nperson = nperson, itf = itf, varcov = varcov,
lmu = lmu, loadings = loadings)
#save the file
write.table(data,'myDataFile.dat', quote = FALSE, sep = " ", col.names = FALSE, row.names = FALSE)
Next, we consider unlinked blocks of three items (triplets):
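These blocks (they are printed again further below as the “original blocks”) can be defined, for example, as:
# ten triplet blocks of the 30 items (unlinked design)
blocks <- matrix(c(23, 22,  3,
                   14,  1, 15,
                    8,  2, 19,
                   24, 26, 12,
                   28, 11,  6,
                    4, 21, 18,
                   25, 13, 17,
                   16,  5, 20,
                   29,  9, 10,
                   27,  7, 30), ncol = 3, byrow = TRUE)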
The blocks are defined by a matrix whose rows are the blocks and whose number of columns is the number of items per block. Before we can fit the model, we must ensure that the data fit the syntax we create. The data simulated before assume that all items are in ascending order. This is important, as the order of the items determines the coding. We code the binary indicators as \[\begin{equation} y_{l} = \begin{cases} 1 & \text{if item $i$ is chosen over item $j$} \\ 0 & \text{else.} \end{cases} \end{equation}\] In a ranking task, all choice alternatives are presented at once, but the same coding scheme can be applied to each possible comparison. For example, consider \(n = 3\) items labeled {A, B, C}. Then we have the pairs {A, B}, {A, C}, and {B, C}. If, for {A, B}, A is preferred over B, then \(y_{A,B}=1\) and 0 otherwise. This can be done for every paired comparison and ranking task (Maydeu-Olivares & Böckenholt, 2005): for the ranking {A, C, B} the binary outcomes are \(y_{A,B}=1\), \(y_{A,C}=1\), and \(y_{B,C}=0\). However, if the order of the items within a binary indicator is changed, the coding changes accordingly, e.g., \(y_{B,A}=0\), \(y_{C,A}=0\), and \(y_{C,B}=1\).
We can get an overview of all paired comparisons implied by the block design by using pair.combn.
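A minimal call for the blocks defined above (its output is shown below):
# list all paired comparisons implied by the triplet blocks
pair.combn(blocks)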
[,1] [,2]
[1,] 1 15
[2,] 2 19
[3,] 4 18
[4,] 4 21
[5,] 5 20
[6,] 7 30
[7,] 8 2
[8,] 8 19
[9,] 9 10
[10,] 11 6
[11,] 13 17
[12,] 14 1
[13,] 14 15
[14,] 16 5
[15,] 16 20
[16,] 21 18
[17,] 22 3
[18,] 23 3
[19,] 23 22
[20,] 24 12
[21,] 24 26
[22,] 25 13
[23,] 25 17
[24,] 26 12
[25,] 27 7
[26,] 27 30
[27,] 28 6
[28,] 28 11
[29,] 29 9
[30,] 29 10
Note that some items are not in ascending order. The simulated data assume that the first-named item (item \(i\)) has the smaller number, e.g., i3i9. The blocks we created, however, contain some comparisons where the first item has the larger number, e.g., i9i3. There are two ways to match the syntax to the data: First, we can simply sort the blocks via blocksort.
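A sketch, assuming blocksort returns the block matrix with the items of each block in ascending order; the implied paired comparisons are then listed below:
# sort the items within each block into ascending order
blocks_sorted <- blocksort(blocks)
pair.combn(blocks_sorted)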
[,1] [,2]
[1,] 1 14
[2,] 1 15
[3,] 2 8
[4,] 2 19
[5,] 3 22
[6,] 3 23
[7,] 4 18
[8,] 4 21
[9,] 5 16
[10,] 5 20
[11,] 6 11
[12,] 6 28
[13,] 7 27
[14,] 7 30
[15,] 8 19
[16,] 9 10
[17,] 9 29
[18,] 10 29
[19,] 11 28
[20,] 12 24
[21,] 12 26
[22,] 13 17
[23,] 13 25
[24,] 14 15
[25,] 16 20
[26,] 17 25
[27,] 18 21
[28,] 22 23
[29,] 24 26
[30,] 27 30
Now all paired comparisons that can be derived are in ascending order.
The second way is to recode the corresponding binary indicators in the data, that is, to flip the coding scheme. We can recode the variables as follows:
# Get positions of binary indicators whose item pair is not in ascending order
tmp <- which(pair.combn(blocks)[,1] > pair.combn(blocks)[,2])
# get names
pair_names_b <- i.name(blocks)
pair_names_ori <- i.name(1:nitem)
# Rename
pair_names <- i.name(1:nitem)
if(length(tmp) != 0){
tmp1 <- pair_names_b[tmp]
tmp2 <- sub('^i.+i','i', tmp1)
tmp3 <- tmp1
for(j in 1:length(tmp)){
tmp3 <- paste0(tmp2[j], sub(paste0(tmp2[j],'$'), '', tmp1[j]))
pair_names[which(pair_names %in% tmp3)] <- pair_names_b[tmp[j]]
}
}
tmp <- which(!names(data) %in% pair_names)
# Clone data
data_recoded <- data
# Recode and rename
data_recoded[,tmp] <- abs(data[,tmp]-1)
names(data_recoded) <- pair_names
# Save data
write.table(data_recoded, 'myDataFile_rec.dat', quote = FALSE, sep = " ", col.names = FALSE,
row.names = FALSE)
Both ways yield the same results. We have to add the argument data_full = TRUE, as we simulated all paired comparisons in the data file but use only some of them. We demonstrate only the model = 'irt' case; however, these analyses can also be performed for the CFA model types (model = 'lmean', model = 'uc', or model = 'simple').
# Blocks_sorted
# Mplus
results_mplus_b <- fit.mplus(blocks_sorted, itf, model = 'irt', input_path = 'myFC_model_bu.inp',
output_path = "myFC_model_bu.out", data_path = "myDataFile.dat",
data_full = TRUE)
srmr cfi tli chisq df pvalue
0.041 1.000 1.000 356.264 375.000 0.749
blchisq bldf blpvalue rmsea rmsea_p rmsea.ci.lower
4171.100 435.000 0.000 0.000 1.000 0.000
rmsea.ci.upper
0.009
#lavaan
results_lav_b <- fit.lavaan(blocks_sorted, itf, model = 'irt', data = data)
results_lav_b_fm <- lavaan::fitmeasures(results_lav_b)
results_lav_b <- lavaan::fitmeasures(results_lav_b)[c('chisq.scaled','df.scaled','pvalue.scaled',
'rmsea.scaled','rmsea.ci.lower.scaled','rmsea.ci.upper.scaled',
'rmsea.pvalue.scaled','cfi.scaled')]
chisq.scaled df.scaled pvalue.scaled rmsea.scaled
3.560344e+02 3.750000e+02 7.517818e-01 0.000000e+00
rmsea.ci.lower.scaled rmsea.ci.upper.scaled rmsea.pvalue.scaled cfi.scaled
0.000000e+00 8.660242e-03 1.000000e+00 1.000000e+00
# Recoded data
# Mplus
results_mplus_brec <- fit.mplus(blocks, itf, model = 'irt', input_path = 'myFC_model_brecu.inp',
output_path = "myFC_model_brecu.out", data_path =
"myDataFile_rec.dat", data_full = TRUE)
srmr cfi tli chisq df pvalue
0.041 1.000 1.000 356.264 375.000 0.749
blchisq bldf blpvalue rmsea rmsea_p rmsea.ci.lower
4171.100 435.000 0.000 0.000 1.000 0.000
rmsea.ci.upper
0.009
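The corresponding lavaan analysis of the recoded data, producing the results_lav_brec object used below, might look like this (passing data_recoded is an assumption):
#lavaan, recoded data
results_lav_brec <- fit.lavaan(blocks, itf, model = 'irt', data = data_recoded)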
results_lav_brec <- lavaan::fitmeasures(results_lav_brec)[c('chisq.scaled','df.scaled','pvalue.scaled',
'rmsea.scaled','rmsea.ci.lower.scaled','rmsea.ci.upper.scaled',
'rmsea.pvalue.scaled','cfi.scaled')]
chisq.scaled df.scaled pvalue.scaled rmsea.scaled
3.560344e+02 3.750000e+02 7.517822e-01 0.000000e+00
rmsea.ci.lower.scaled rmsea.ci.upper.scaled rmsea.pvalue.scaled cfi.scaled
0.000000e+00 8.660236e-03 1.000000e+00 1.000000e+00
In the case of blocks, we must correct the fit indices, as there are redundancies among the thresholds and tetrachoric correlations. The redundancies can be determined with the function redundancies. We can also directly correct the fit with fit.correct:
#save fit measures
tmp <- results_lav_b_fm
fit.correct(1000, blocks, tmp['chisq.scaled'], tmp['df.scaled'], tmp['baseline.chisq.scaled'],
tmp['baseline.df.scaled'])
df.df_corr rmsea cfi
365 0 1
Now, we consider linked block designs (Jansen & Schulze, 2023b). The simplest way to construct these designs is to take the original unlinked design and link all blocks together, so that the rank of the design matrix is \(r_D = N - 1\), where \(N\) is the number of items (Jansen & Schulze, 2023b). The original blocks are:
[,1] [,2] [,3]
[1,] 23 22 3
[2,] 14 1 15
[3,] 8 2 19
[4,] 24 26 12
[5,] 28 11 6
[6,] 4 21 18
[7,] 25 13 17
[8,] 16 5 20
[9,] 29 9 10
[10,] 27 7 30
To get a linked design, we need extra blocks. Their number can be determined with count.xblocks.
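A sketch of the call (the exact arguments may differ; see ?count.xblocks):
# number of extra blocks needed to link the design (assumed call)
count.xblocks(blocks)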
[1] 5
Hence, we need five extra triplets in this case. We add for example the
following linking blocks:
blocks_extra <- matrix(c(23,14,8,24,28,4,25,16,29,22,26,13,1,21,30), ncol = 3, byrow = TRUE)
blocks_con <- rbind(blocks, blocks_extra)
We can determine the rank of the design matrix with rankA.
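A sketch, assuming rankA accepts the block matrix directly (see ?rankA):
# rank of the design matrix implied by the linked blocks (assumed call)
rankA(blocks_con)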
[1] 29
The rank is \(30-1=29\). The function metablock also gives an overview of which blocks are linked.
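A sketch of the call (the argument is assumed; see ?metablock):
# sets of items that are linked with each other (assumed call)
metablock(blocks_con)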
[[1]]
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
[29] 29 30
There is only one meta block; therefore, all items are linked. A counterexample arises if we do not add the fourth extra block:
blocks_extra <- matrix(c(23,14,8,24,28,4,25,16,29,1,21,30), ncol = 3, byrow = TRUE)
blocks_con <- rbind(blocks, blocks_extra)
Then the rank is
[1] 28
The rank is \(30-1=29 \neq 28\), and we have two “meta” blocks:
[[1]]
[1] 5 9 10 13 16 17 20 25 29
[[2]]
[1] 1 2 3 4 6 7 8 11 12 14 15 18 19 21 22 23 24 26 27 28 30
Instead of manually constructing the blocks, ThurMod enables the user to get extra blocks via the function get.xblocks. We have to define whether the blocks should be multidimensional (otherwise multidim = FALSE), and whether some items should not be considered (e.g., because they are negatively keyed):
blocks_extra <- get.xblocks(blocks, itf, multidim = TRUE, item_not = NULL)
blocks_con <- rbind(blocks, blocks_extra)
blocks_con
[,1] [,2] [,3]
[1,] 23 22 3
[2,] 14 1 15
[3,] 8 2 19
[4,] 24 26 12
[5,] 28 11 6
[6,] 4 21 18
[7,] 25 13 17
[8,] 16 5 20
[9,] 29 9 10
[10,] 27 7 30
[11,] 23 14 2
[12,] 12 11 18
[13,] 17 16 9
[14,] 7 23 26
[15,] 20 2 9
If we take the blocks as is, we need to recode the data set:
# Get positions of binary indicators whose item pair is not in ascending order
tmp <- which(pair.combn(blocks_con)[,1] > pair.combn(blocks_con)[,2])
# get names
pair_names_b <- i.name(blocks_con)
pair_names_ori <- i.name(1:nitem)
# Rename
pair_names <- i.name(1:nitem)
if(length(tmp)!=0){
tmp1 <- pair_names_b[tmp]
tmp2 <- sub('^i.+i','i', tmp1)
tmp3 <- tmp1
for(j in 1:length(tmp)){
tmp3 <- paste0(tmp2[j], sub(paste0(tmp2[j],'$'), '', tmp1[j]))
pair_names[which(pair_names %in% tmp3)] <- pair_names_b[tmp[j]]
}
}
tmp <- which(!names(data) %in% pair_names)
# Clone data
data_recoded <- data
# Recode and rename
data_recoded[,tmp] <- abs(data[,tmp]-1)
names(data_recoded) <- pair_names
# Save data
write.table(data_recoded, 'myDataFile_rec.dat', quote = FALSE, sep = " ", col.names = FALSE,
row.names = FALSE)
The estimation of linked designs proceeds in the same way; note that the sorted linked blocks used below must be created first, as shown in the following sketch.
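The object blocks_con_sorted used in the calls below is the sorted version of the linked blocks; a minimal sketch:
# sort the items within each linked block into ascending order
blocks_con_sorted <- blocksort(blocks_con)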
# Blocks_sorted
# Mplus
results_mplus_bc <- fit.mplus(blocks_con_sorted, itf, model = 'irt',
                              input_path = 'myFC_model_b_con.inp',
                              output_path = "myFC_model_b_con.out",
                              data_path = "myDataFile.dat", data_full = TRUE)
srmr cfi tli chisq df pvalue
0.0430 1.0000 1.0000 886.7800 921.0000 0.7858
blchisq bldf blpvalue rmsea rmsea_p rmsea.ci.lower
10190.6740 990.0000 0.0000 0.0000 1.0000 0.0000
rmsea.ci.upper
0.0060
#lavaan
results_lav_bc <- fit.lavaan(blocks_con_sorted, itf, model = 'irt', data = data)
results_lav_bc_fm <- lavaan::fitmeasures(results_lav_bc)
results_lav_bc <- lavaan::fitmeasures(results_lav_bc)[c('chisq.scaled','df.scaled','pvalue.scaled',
'rmsea.scaled','rmsea.ci.lower.scaled','rmsea.ci.upper.scaled',
'rmsea.pvalue.scaled','cfi.scaled')]
chisq.scaled df.scaled pvalue.scaled rmsea.scaled
8.863299e+02 9.210000e+02 7.888948e-01 0.000000e+00
rmsea.ci.lower.scaled rmsea.ci.upper.scaled rmsea.pvalue.scaled cfi.scaled
0.000000e+00 6.388203e-03 1.000000e+00 1.000000e+00
# Recoded data
# Mplus
results_mplus_bcrec <- fit.mplus(blocks_con, itf, model = 'irt',
                                 input_path = 'myFC_model_brec.inp',
                                 output_path = "myFC_model_brec.out",
                                 data_path = "myDataFile_rec.dat",
                                 data_full = TRUE, byblock = FALSE)
srmr cfi tli chisq df pvalue
0.0430 1.0000 1.0000 886.1220 920.0000 0.7835
blchisq bldf blpvalue rmsea rmsea_p rmsea.ci.lower
10190.6740 990.0000 0.0000 0.0000 1.0000 0.0000
rmsea.ci.upper
0.0060
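The lavaan fit of the recoded linked-design data, producing the results_lav_bcrec object used below, might look like this (passing data_recoded and the unsorted blocks_con is an assumption):
#lavaan, recoded data
results_lav_bcrec <- fit.lavaan(blocks_con, itf, model = 'irt', data = data_recoded)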
results_lav_bcrec <- lavaan::fitmeasures(results_lav_bcrec)[c('chisq.scaled','df.scaled','pvalue.scaled',
'rmsea.scaled','rmsea.ci.lower.scaled','rmsea.ci.upper.scaled',
'rmsea.pvalue.scaled','cfi.scaled')]
chisq.scaled df.scaled pvalue.scaled rmsea.scaled
8.856718e+02 9.200000e+02 7.866540e-01 0.000000e+00
rmsea.ci.lower.scaled rmsea.ci.upper.scaled rmsea.pvalue.scaled cfi.scaled
0.000000e+00 6.419601e-03 1.000000e+00 1.000000e+00
Here, we also must correct the fit indices, as there are redundancies among the thresholds and tetrachoric correlations. Again, the redundancies can be determined with the function redundancies. We directly correct the fit with fit.correct:
#save fit measures
tmp <- results_lav_bc_fm
fit.correct(1000, blocks_con, tmp['chisq.scaled'], tmp['df.scaled'], tmp['baseline.chisq.scaled'],
tmp['baseline.df.scaled'])
df.df_corr rmsea cfi
906 0 1
The estimation of a full design will seldom be of interest; more likely, (linked) block designs are. Especially in simulations with many items (more than 50), many binary variables are simulated and saved. While simulating all paired comparisons is important (see Jansen & Schulze, 2023b), saving all of them is not. sim.data can also return only the variables that are of relevance. For that, we need to specify the argument variables. Assume we have the linked block design specified before; we only have to pass the item names of the sorted blocks as the variables argument:
#Get the relevant names
tmp_names <- i.name(blocks_con_sorted)
#same example as before
set.seed(1234)
data <- sim.data(nfactor = nfactor, nitem = nitem, nperson = nperson, itf = itf, varcov = varcov,
lmu = lmu, loadings = loadings, variables = tmp_names)
#save the file
write.table(data, 'myDataFile.dat', quote = FALSE, sep = " ", col.names = FALSE,
row.names = FALSE)
For the Mplus analysis, the argument data_full must then be set to FALSE.
Taken together, we have demonstrated the basic steps of using the functions provided by ThurMod. Please report any bugs. If you are missing functions that could be helpful for the analysis of Thurstonian forced-choice data, please let me know.
Brown, A., & Maydeu-Olivares, A. (2011). Item response modeling of forced-choice questionnaires. Educational and Psychological Measurement, 71(3), 460–502. https://doi.org/10.1177/0013164410375112
Jansen, M. T., & Schulze, R. (2023a). Item scaling of social desirability using conjoint measurement: A comparison of ratings, paired comparisons, and rankings. Manuscript in preparation.
Jansen, M. T., & Schulze, R. (2023b, in press). The Thurstonian linked block design: Improving Thurstonian modeling for paired comparison and ranking data. Educational and Psychological Measurement.
Maydeu-Olivares, A., & Böckenholt, U. (2005). Structural equation modeling of paired-comparison and ranking data. Psychological Methods, 10(3), 285–304. https://doi.org/10.1037/1082-989X.10.3.285
Maydeu-Olivares, A., & Brown, A. (2010). Item response modeling of paired comparison and ranking data. Multivariate Behavioral Research, 45(6), 935–974. https://doi.org/10.1080/00273171.2010.531231