This R package performs regularization of differential item functioning (DIF) parameters in item response theory (IRT) models using a penalized expectation-maximization algorithm.
regDIF can:
To get the current released version from CRAN:
install.packages("regDIF")
To get the current development version from GitHub:
The regDIF package includes a simulated data set, ida, with six binary item responses and three background variables (age, gender, study):
library(regDIF)
head(ida)
#> item1 item2 item3 item4 item5 item6 age gender study
#> 1 0 0 0 0 0 0 -2 -1 -1
#> 2 0 0 0 0 0 0 0 -1 -1
#> 3 0 0 0 0 0 0 3 -1 -1
#> 4 0 1 1 1 1 1 1 -1 -1
#> 5 0 0 0 0 0 0 -2 -1 -1
#> 6 1 0 0 0 0 0 1 -1 -1
First, specify the item responses and the predictor (background) variables separately:
item.data <- ida[, 1:6]
pred.data <- ida[, 7:9]
Second, the regDIF() function fits models across a sequence of 10 tuning parameter values using a penalized EM algorithm, which assumes that a single normally distributed latent variable affects all item responses:
fit <- regDIF(item.data, pred.data, num.tau = 10)
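As a hedged sketch of what the DIF effects in the summary represent, assume a two-parameter logistic (2PL) model for each binary item, in which the background variables shift the item intercept (intercept DIF, which the `.int.` labels in the output appear to denote) and the item slope on the latent variable (slope DIF, the `.slp.` labels). The function `p_dif()` and its argument names are hypothetical, and regDIF's exact parameterization may differ:

```r
# Probability of endorsing a binary item under a 2PL model whose intercept
# and slope both depend linearly on background covariates (DIF).
# All argument names are hypothetical; this is not regDIF's internal code.
p_dif <- function(theta, x, c0, a0, gamma, beta) {
  stopifnot(length(x) == length(gamma), length(x) == length(beta))
  intercept <- c0 + sum(gamma * x)  # intercept DIF: covariates shift the threshold
  slope     <- a0 + sum(beta * x)   # slope DIF: covariates change discrimination
  plogis(intercept + slope * theta) # inverse logit link
}

# Two respondents with the same latent score who differ only on gender.
# Covariate codes and DIF values are chosen for illustration only.
gamma <- c(age = 0, gender = -0.5, study = 0)
beta  <- c(age = 0, gender = 0,    study = 0)
p_a <- p_dif(theta = 0, x = c(0, -1, 0), c0 = 0, a0 = 1, gamma = gamma, beta = beta)
p_b <- p_dif(theta = 0, x = c(0,  1, 0), c0 = 0, a0 = 1, gamma = gamma, beta = beta)
```

When every element of `gamma` and `beta` is zero, the item functions identically for all respondents with the same latent score, which is what "no DIF" means for that item.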
The DIF results are shown below:
summary(fit)
#> Call:
#> regDIF(item.data = item.data, pred.data = pred.data, num.tau = 10)
#>
#> Optimal model (out of 10):
#> tau bic
#> 0.1753246 4081.6941000
#>
#> Non-zero DIF effects:
#> item4.int.age item5.int.age item5.int.gender item5.int.study
#> 0.2153 -0.0897 -0.5717 0.6018
#> item4.slp.study item5.slp.gender
#> -0.0936 -0.1764
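The "Non-zero DIF effects" list reflects a defining property of the LASSO penalty: small effects are set exactly to zero rather than merely shrunk. A minimal sketch of the soft-thresholding step that penalized estimators of this kind commonly rely on, shown for one coefficient under a simple quadratic loss (the function name and input values are hypothetical, and this is not regDIF's internal code):

```r
# Soft-thresholding: the closed-form minimizer of
#   0.5 * (b - z)^2 + tau * abs(b)
# for a single coefficient b, given an unpenalized value z.
soft_threshold <- function(z, tau) {
  sign(z) * pmax(abs(z) - tau, 0)
}

# Hypothetical unpenalized DIF estimates and tuning parameter.
z <- c(0.30, -0.10, 0.05, -0.80)
shrunk <- soft_threshold(z, tau = 0.2)
shrunk
```

Estimates whose magnitude falls below `tau` become exactly zero, and the rest move toward zero by `tau`; larger tuning parameter values therefore flag fewer items as showing DIF.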
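When estimation is slow, an observed proxy score (here, each respondent's sum of item responses) may be used in place of latent score estimation. Because num.tau is omitted, the output below reports an optimal model out of 100 rather than 10:
fit_proxy <- regDIF(item.data, pred.data, prox.data = rowSums(item.data))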
summary(fit_proxy)
#> Call:
#> regDIF(item.data = item.data, pred.data = pred.data, prox.data = rowSums(item.data))
#>
#> Optimal model (out of 100):
#> tau bic
#> 0.2766486 3540.8070000
#>
#> Non-zero DIF effects:
#> item3.int.gender item4.int.age item5.int.gender item5.int.study
#> 0.0955 0.2200 -0.5118 0.7040
#> item2.slp.gender item4.slp.study item5.slp.gender
#> 0.1102 -0.1413 -0.1384
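The proxy passed as `prox.data = rowSums(item.data)` is simply each respondent's sum score across the six items. Re-entering the six rows printed by `head(ida)` shows what that vector looks like; the name `ida_head` is hypothetical and is used only so the sketch runs without loading regDIF:

```r
# The six rows printed by head(ida) above, re-entered by hand.
ida_head <- data.frame(
  item1  = c(0, 0, 0, 0, 0, 1),
  item2  = c(0, 0, 0, 1, 0, 0),
  item3  = c(0, 0, 0, 1, 0, 0),
  item4  = c(0, 0, 0, 1, 0, 0),
  item5  = c(0, 0, 0, 1, 0, 0),
  item6  = c(0, 0, 0, 1, 0, 0),
  age    = c(-2, 0, 3, 1, -2, 1),
  gender = c(-1, -1, -1, -1, -1, -1),
  study  = c(-1, -1, -1, -1, -1, -1)
)

item.data <- ida_head[, 1:6]  # item responses
pred.data <- ida_head[, 7:9]  # background covariates (not used by the proxy)

# Sum score proxy: one observed value per respondent, used in place of
# an estimated latent score.
proxy <- rowSums(item.data)
proxy
```

Respondent 4, who endorsed items 2 through 6, receives a proxy score of 5, while respondent 6, who endorsed only item 1, receives 1.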
Penalty functions other than the LASSO may also be used. For instance, the elastic net penalty uses a second tuning parameter, alpha, to vary the mix of LASSO and ridge penalties:
fit_proxy_net <- regDIF(item.data, pred.data, prox.data = rowSums(item.data), alpha = 0.5)
summary(fit_proxy_net)
#> Call:
#> regDIF(item.data = item.data, pred.data = pred.data, prox.data = rowSums(item.data),
#> alpha = 0.5)
#>
#> Optimal model (out of 100):
#> tau bic
#> 0.5685967 3563.7495000
#>
#> Non-zero DIF effects:
#> item3.int.gender item4.int.age item5.int.age item5.int.gender
#> 0.0681 0.1672 -0.0939 -0.3463
#> item5.int.study item2.slp.gender item4.slp.study item5.slp.gender
#> 0.4346 0.0778 -0.1172 -0.1379
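To illustrate how alpha mixes the two penalties, assume the common glmnet-style convention tau * (alpha * |b| + (1 - alpha) / 2 * b^2), under which alpha = 1 gives the LASSO and alpha = 0 gives ridge; regDIF's exact scaling may differ. For one coefficient with unpenalized value z, the penalized solution is a soft-threshold at tau * alpha followed by division by 1 + tau * (1 - alpha). The function name `elastic_net_step()` and the input values are hypothetical:

```r
# Closed-form minimizer of
#   0.5 * (b - z)^2 + tau * (alpha * abs(b) + (1 - alpha) / 2 * b^2)
# for one coefficient b (glmnet-style parameterization, assumed here).
elastic_net_step <- function(z, tau, alpha) {
  thresholded <- sign(z) * pmax(abs(z) - tau * alpha, 0)  # LASSO part: exact zeros
  thresholded / (1 + tau * (1 - alpha))                   # ridge part: proportional shrinkage
}

z <- c(0.30, -0.10, 0.05, -0.80)             # hypothetical unpenalized estimates
elastic_net_step(z, tau = 0.2, alpha = 1)    # pure LASSO
elastic_net_step(z, tau = 0.2, alpha = 0.5)  # equal mix of LASSO and ridge
elastic_net_step(z, tau = 0.2, alpha = 0)    # pure ridge: no exact zeros
```

Lowering alpha reduces the threshold below which effects become exactly zero, while the ridge term adds proportional shrinkage to every surviving effect.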
Please send any questions to wbelzak@gmail.com.