README

The goal of skewlmm is to fit skew robust linear mixed models, using scale mixture of skew-normal linear mixed models with possible within-subject dependence structure, using an EM-type algorithm. In addition, some tools for model adequacy evaluation are available.

For more information about the model formulation and estimation, please see Schumacher, F. L., Lachos, V. H., and Matos, L. A. (2021). Scale mixture of skew‐normal linear mixed models with within‐subject serial dependence. Statistics in Medicine. DOI: 10.1002/sim.8870.

Installation

remotes::install_github("fernandalschumacher/skewlmm")

install.packages("skewlmm")

Example

library(skewlmm)
#> Loading required package: optimParallel
#> Loading required package: parallel
#> 
#> Attaching package: 'skewlmm'
#> The following object is masked from 'package:stats':
#> 
#>     nobs

dat1 <- as.data.frame(nlme::Orthodont)
fm1 <- smsn.lmm(dat1, formFixed = distance ~ age, formRandom = ~ age,
                groupVar = "Subject", distr = "st",
                control = lmmControl(quiet = TRUE))
summary(fm1)
#> Linear mixed models with distribution st and dependency structure UNC 
#> Call:
#> smsn.lmm(data = dat1, formFixed = distance ~ age, groupVar = "Subject", 
#>     formRandom = ~age, distr = "st", control = lmmControl(quiet = TRUE))
#> 
#> Distribution st with nu = 4.662322 
#> 
#> Random effects: 
#>   Formula: ~age
#>   Structure:  
#>   Estimated variance (D):
#>             (Intercept)         age
#> (Intercept)   6.5378399 -0.55063265
#> age          -0.5506326  0.07893262
#> 
#> Fixed effects: distance ~ age
#> with approximate confidence intervals
#>                  Value Std.error CI 95% lower CI 95% upper
#> (Intercept) 17.0163263 0.9456852   15.1628173   18.8698353
#> age          0.6248518 0.1242525    0.3813214    0.8683822
#> 
#> Dependency structure: UNC
#>   Estimate(s):
#>  sigma2 
#> 0.81705 
#> 
#> Skewness parameter estimate: -3.000814 2.202111
#> 
#> Model selection criteria:
#>    logLik     AIC     BIC
#>  -209.837 437.675 461.814
#> 
#> Number of observations: 108 
#> Number of groups: 27

Several methods are available for SMSN and SMN objects, such as: print, summary, plot, fitted, residuals, predict, and update.

acf1<- acfresid(fm1, calcCI = TRUE)
plot(acf1)

plot(mahalDist(fm1), nlabels = 2)

healy.plot(fm1, calcCI = TRUE)

fm2 <- smn.lmm(dat1, formFixed = distance ~ age, formRandom = ~ age,
               groupVar = "Subject", distr = "t",
               control = lmmControl(quiet = TRUE))
summary(fm2)
#> Linear mixed models with distribution t and dependency structure UNC 
#> Call:
#> smn.lmm(data = dat1, formFixed = distance ~ age, groupVar = "Subject", 
#>     formRandom = ~age, distr = "t", control = lmmControl(quiet = TRUE))
#> 
#> Distribution t with nu = 4.966122 
#> 
#> Random effects: 
#>   Formula: ~age
#>   Structure:  
#>   Estimated variance (D):
#>             (Intercept)         age
#> (Intercept)   3.2735098 -0.16423589
#> age          -0.1642359  0.03246643
#> 
#> Fixed effects: distance ~ age
#> with approximate confidence intervals
#>                 Value  Std.error CI 95% lower CI 95% upper
#> (Intercept) 17.274030 0.67741340   15.9463240   18.6017357
#> age          0.593514 0.06218718    0.4716294    0.7153986
#> 
#> Dependency structure: UNC
#>   Estimate(s):
#>    sigma2 
#> 0.8926729 
#> 
#> Model selection criteria:
#>    logLik     AIC     BIC
#>  -211.351 436.701 455.476
#> 
#> Number of observations: 108 
#> Number of groups: 27

Now, for performing a LRT for testing if the skewness parameter is 0 (\(\text{H}_0: \lambda_i=0, \forall i\)), one can use the following:

lr.test(fm1,fm2)
#> 
#> Model selection criteria:
#>       logLik     AIC     BIC
#> fm1 -209.837 437.675 461.814
#> fm2 -211.351 436.701 455.476
#> 
#>     Likelihood-ratio Test
#> 
#> chi-square statistics =  3.026434 
#> df =  2 
#> p-value =  0.2202005 
#> 
#> The null hypothesis that both models represent the 
#> data equally well is not rejected at level  0.05

By default, the functions smsn.lmm and smn.lmm now use the DAAREM method (a method for EM accelaration, for details see help(package="daarem")) for estimation, to improve the computational performance. This method usually greatly reduces the convergence time, but its use can result in numerical errors, specially for small samples. In this cases, the EM algorithm can be used, as follows:

fm2EM <- smn.lmm(dat1, formFixed = distance ~ age, formRandom = ~ age, distr = 't',
                 groupVar = "Subject", control = lmmControl(algorithm = "EM", 
                                                            quiet = TRUE))
fm2EM
#> Linear mixed models with distribution t and dependency structure UNC 
#> Call:
#> smn.lmm(data = dat1, formFixed = distance ~ age, groupVar = "Subject", 
#>     formRandom = ~age, distr = "t", control = lmmControl(algorithm = "EM", 
#>         quiet = TRUE))
#> 
#> Fixed: distance ~ age
#> Random:
#>   Formula: ~age
#>   Structure: General positive-definite 
#>   Estimated variance (D):
#>             (Intercept)        age
#> (Intercept)   3.1584628 -0.1533659
#> age          -0.1533659  0.0314773
#> 
#> Estimated parameters:
#>      (Intercept)    age sigma2 Dsqrt11 Dsqrt12 Dsqrt22    nu1
#>          17.2876 0.5958 0.8982  1.7754 -0.0793  0.1587 4.9883
#> s.e.      0.6684 0.0616 0.2460  0.8421  0.0931  0.0518     NA
#> 
#> Model selection criteria:
#>    logLik     AIC     BIC
#>  -211.351 436.701 455.476
#> 
#> Number of observations: 108 
#> Number of groups: 27

Also, we can fit a t-LMM with diagonal scale matrix for the random effects by using:

fm2diag <- update(fm2, covRandom = "pdDiag")
fm2diag
#> Linear mixed models with distribution t and dependency structure UNC 
#> Call:
#> smn.lmm(data = dat1, formFixed = distance ~ age, groupVar = "Subject", 
#>     formRandom = ~age, distr = "t", covRandom = "pdDiag", control = lmmControl(quiet = TRUE))
#> 
#> Fixed: distance ~ age
#> Random:
#>   Formula: ~age
#>   Structure: Diagonal 
#>   Estimated variance (D):
#>             (Intercept)        age
#> (Intercept)    1.546268 0.00000000
#> age            0.000000 0.01789115
#> 
#> Estimated parameters:
#>      (Intercept)    age sigma2 Dsqrt11 Dsqrt22    nu1
#>          17.2827 0.5959 0.9699  1.2435  0.1338 4.9841
#> s.e.      0.5864 0.0540 0.2388  0.6191  0.0551     NA
#> 
#> Model selection criteria:
#>    logLik     AIC    BIC
#>  -211.598 435.197 451.29
#> 
#> Number of observations: 108 
#> Number of groups: 27

We can compare the information criteria for all fitted models using the criteria function, as follows:

criteria(list(`ST-LMM` = fm1, `t-LMM` = fm2, `t-LMM(EM)` = fm2EM, `t-LMM-diag` = fm2diag))
#>               logLik npar      AIC      BIC
#> ST-LMM     -209.8374    9 437.6748 461.8140
#> t-LMM      -211.3506    7 436.7012 455.4761
#> t-LMM(EM)  -211.3506    7 436.7013 455.4762
#> t-LMM-diag -211.5985    6 435.1969 451.2897

Handling censored/missing observations

An extension of the methods to account for censoring in SMSN-LMM is under development. Tools for accommodating left, right, or interval censored observations in the symmetrical family SMN-LMM are now available using the function smn.clmm.

For more information on censored models, we refer to Matos, L. A., Prates, M. O., Chen, M. H., and Lachos, V. H. (2013). Likelihood-based inference for mixed-effects models with censored response using the multivariate-t distribution. Statistica Sinica. DOI: 10.5705/ss.2012.043.