A quick tour of SNMoE

Introduction

SNMoE (Skew-Normal Mixtures-of-Experts) provides a flexible modelling framework for heterogeneous data with possibly skewed distributions, generalizing the standard Normal mixture-of-experts model. SNMoE consists of a mixture of K skew-Normal expert regressors (polynomials of degree p) gated by a softmax gating network (of degree q).
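In notation commonly used for this family of models, the conditional density of the response y given the input x can be sketched as follows. The symbols are assumptions chosen for illustration (none are defined in this text), and the last expert is assumed to be the reference class of the gating network, with its coefficient vector fixed at zero.

```latex
\begin{aligned}
f(y \mid x; \boldsymbol{\Psi}) &= \sum_{k=1}^{K} \pi_k(x; \boldsymbol{\alpha}) \,
  \mathrm{SN}\!\left(y; \mu(x; \boldsymbol{\beta}_k), \sigma_k, \lambda_k\right), \\
\pi_k(x; \boldsymbol{\alpha}) &= \frac{\exp\!\left(\boldsymbol{\alpha}_k^{\top} \tilde{x}_q\right)}
  {\sum_{l=1}^{K} \exp\!\left(\boldsymbol{\alpha}_l^{\top} \tilde{x}_q\right)},
  \qquad \tilde{x}_q = (1, x, \ldots, x^q)^{\top}, \\
\mu(x; \boldsymbol{\beta}_k) &= \boldsymbol{\beta}_k^{\top} \tilde{x}_p,
  \qquad \tilde{x}_p = (1, x, \ldots, x^p)^{\top}, \\
\mathrm{SN}(y; \mu, \sigma, \lambda) &= \frac{2}{\sigma} \,
  \phi\!\left(\frac{y - \mu}{\sigma}\right) \Phi\!\left(\lambda \, \frac{y - \mu}{\sigma}\right).
\end{aligned}
```

Here φ and Φ denote the standard Normal density and distribution function, and Ψ collects all gating, regression, scale and skewness parameters. Setting every λ_k to zero recovers the standard Normal mixture-of-experts model.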

Model estimation is performed by a dedicated expectation conditional maximization (ECM) algorithm that maximizes the observed-data log-likelihood. The examples below illustrate the use of the model in model-based clustering of heterogeneous regression data and in fitting non-linear regression functions, first on a simulated dataset and then on a real one.
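To make the quantity being maximized concrete, here is a minimal base-R sketch of the observed-data log-likelihood of a skew-Normal mixture of experts. It is not the package's implementation: the function names `dskewnorm`, `gating_probs` and `snmoe_loglik` are hypothetical, and the parameter layout (one column per expert, last expert as the gating reference class) is an assumption.

```r
# Skew-normal density with location mu, scale sigma and skewness lambda:
# (2 / sigma) * phi(z) * Phi(lambda * z), where z = (y - mu) / sigma.
dskewnorm <- function(y, mu, sigma, lambda) {
  z <- (y - mu) / sigma
  2 / sigma * dnorm(z) * pnorm(lambda * z)
}

# Softmax gating probabilities. Assumption: alpha is (q + 1) x (K - 1) and the
# last expert is the reference class (its coefficients are fixed at zero).
gating_probs <- function(x, alpha, q) {
  Xg <- outer(x, 0:q, "^")            # n x (q + 1) polynomial design matrix
  eta <- cbind(Xg %*% alpha, 0)       # n x K linear predictors
  eta <- eta - apply(eta, 1, max)     # stabilise the exponentials
  exp(eta) / rowSums(exp(eta))
}

# Observed-data log-likelihood: sum_i log sum_k pi_k(x_i) * SN(y_i; ...).
# beta is (p + 1) x K; sigma and lambda have length K.
snmoe_loglik <- function(x, y, alpha, beta, sigma, lambda, p, q) {
  n <- length(y)
  Xe <- outer(x, 0:p, "^")            # n x (p + 1) polynomial design matrix
  mu <- Xe %*% beta                   # n x K expert locations
  pik <- gating_probs(x, alpha, q)    # n x K gating probabilities
  # Expand sigma and lambda column-wise so they line up with mu.
  dens <- pik * dskewnorm(y, mu, rep(sigma, each = n), rep(lambda, each = n))
  sum(log(rowSums(dens)))
}
```

With all skewness parameters set to zero, this reduces to the log-likelihood of a Normal mixture of experts, which gives a simple sanity check for the sketch.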

This vignette was written in R Markdown, using the knitr package for production.

See help(package="meteorits") for further details and references provided by citation("meteorits").

Application to a simulated dataset

Generate sample

library(meteorits) # Provides sampleUnivSNMoE(), emSNMoE() and the tempanomalies data

n <- 500 # Size of the sample
alphak <- matrix(c(0, 8), ncol = 1) # Parameters of the gating network
betak <- matrix(c(0, -2.5, 0, 2.5), ncol = 2) # Regression coefficients of the experts
lambdak <- c(3, 5) # Skewness parameters of the experts
sigmak <- c(1, 1) # Standard deviations of the experts
x <- seq.int(from = -1, to = 1, length.out = n) # Inputs (predictors)

# Generate sample of size n
sample <- sampleUnivSNMoE(alphak = alphak, betak = betak, sigmak = sigmak, 
                          lambdak = lambdak, x = x)
y <- sample$y
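For intuition about what such a sample looks like, the following base-R sketch draws from a skew-Normal mixture of experts in two stages: pick an expert from the softmax gate, then draw from that expert's skew-Normal distribution. It is an illustration under stated assumptions, not the code behind `sampleUnivSNMoE`: the helper name `sample_snmoe_sketch` is hypothetical, the last expert is assumed to be the gating reference class, and the draw uses the standard stochastic representation of the skew-Normal distribution.

```r
# Hypothetical helper (not part of meteorits): draw one response per input
# from a skew-normal mixture of experts with a softmax gating network.
sample_snmoe_sketch <- function(x, alphak, betak, sigmak, lambdak) {
  n <- length(x)
  K <- ncol(betak)
  Xg <- outer(x, 0:(nrow(alphak) - 1), "^")  # gating design matrix
  Xe <- outer(x, 0:(nrow(betak) - 1), "^")   # expert design matrix

  # Gating probabilities; assumption: the last expert is the reference class.
  eta <- cbind(Xg %*% alphak, 0)
  pik <- exp(eta) / rowSums(exp(eta))

  # Pick one expert label per observation.
  z <- apply(pik, 1, function(prob) sample.int(K, size = 1, prob = prob))

  # Location of the selected expert for each observation.
  mu <- rowSums(Xe * t(betak)[z, , drop = FALSE])

  # Skew-normal draw: mu + sigma * (delta * |U| + sqrt(1 - delta^2) * V),
  # with U, V independent standard normals and
  # delta = lambda / sqrt(1 + lambda^2).
  delta <- lambdak[z] / sqrt(1 + lambdak[z]^2)
  y <- mu + sigmak[z] * (delta * abs(rnorm(n)) + sqrt(1 - delta^2) * rnorm(n))

  list(y = y, z = z)
}
```

Called with the `alphak`, `betak`, `sigmak`, `lambdak` and `x` defined above, it returns the simulated responses in `y` and the expert labels in `z`. The labels are useful for checking a fitted clustering against the truth.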

Set up SNMoE model parameters

K <- 2 # Number of regressors/experts
p <- 1 # Order of the polynomial regression (regressors/experts)
q <- 1 # Order of the logistic regression (gating network)
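As a rough illustration of what these three settings imply, the sketch below builds the polynomial design matrices of degree p and q and counts the free parameters. The count assumes one gating coefficient vector is fixed at zero (reference expert) and that each expert has its own scale and skewness parameter. Under those assumptions it gives 10, which agrees with the `df` value printed by the summary further down. The names `x_demo`, `X_experts`, `X_gating` and `n_params` are hypothetical.

```r
K <- 2; p <- 1; q <- 1
x_demo <- seq(-1, 1, length.out = 5)  # small illustrative input vector

# Polynomial design matrices: columns are x^0, x^1, ..., x^degree.
X_experts <- outer(x_demo, 0:p, "^")  # used by each expert regression
X_gating  <- outer(x_demo, 0:q, "^")  # used by the softmax gating network

# Free parameters under the assumptions stated above.
n_params <- (K - 1) * (q + 1) +  # gating coefficients (one reference expert)
  K * (p + 1) +                  # regression coefficients
  K +                            # scale parameters
  K                              # skewness parameters
n_params
```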

Set up EM parameters

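n_tries <- 1 # Number of EM runs (the run with the highest log-likelihood is kept)
max_iter <- 1500 # Maximum number of EM iterations
threshold <- 1e-6 # Convergence threshold on the relative change in log-likelihood
verbose <- TRUE # Print the log-likelihood at each iteration
verbose_IRLS <- FALSE # Print the IRLS criterion (gating network update) at each iteration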

Estimation

snmoe <- emSNMoE(X = x, Y = y, K, p, q, n_tries, max_iter, 
                 threshold, verbose, verbose_IRLS)
## EM - SNMoE: Iteration: 1 | log-likelihood: -527.287937164066
## EM - SNMoE: Iteration: 2 | log-likelihood: -488.149669819772
## EM - SNMoE: Iteration: 3 | log-likelihood: -486.613979894615
## EM - SNMoE: Iteration: 4 | log-likelihood: -486.302628698495
## EM - SNMoE: Iteration: 5 | log-likelihood: -486.222460715282
## EM - SNMoE: Iteration: 6 | log-likelihood: -486.184660195025
## EM - SNMoE: Iteration: 7 | log-likelihood: -486.153034476555
## EM - SNMoE: Iteration: 8 | log-likelihood: -486.122006681072
## EM - SNMoE: Iteration: 9 | log-likelihood: -486.091566542363
## EM - SNMoE: Iteration: 10 | log-likelihood: -486.062270874981
## EM - SNMoE: Iteration: 11 | log-likelihood: -486.034460049825
## EM - SNMoE: Iteration: 12 | log-likelihood: -486.008263297245
## EM - SNMoE: Iteration: 13 | log-likelihood: -485.983682395756
## EM - SNMoE: Iteration: 14 | log-likelihood: -485.960662236847
## EM - SNMoE: Iteration: 15 | log-likelihood: -485.93911281283
## EM - SNMoE: Iteration: 16 | log-likelihood: -485.91889800827
## EM - SNMoE: Iteration: 17 | log-likelihood: -485.899948784636
## EM - SNMoE: Iteration: 18 | log-likelihood: -485.882220335441
## EM - SNMoE: Iteration: 19 | log-likelihood: -485.865585665219
## EM - SNMoE: Iteration: 20 | log-likelihood: -485.849975250496
## EM - SNMoE: Iteration: 21 | log-likelihood: -485.835317464871
## EM - SNMoE: Iteration: 22 | log-likelihood: -485.8215462664
## EM - SNMoE: Iteration: 23 | log-likelihood: -485.808607071084
## EM - SNMoE: Iteration: 24 | log-likelihood: -485.796444399875
## EM - SNMoE: Iteration: 25 | log-likelihood: -485.784990363605
## EM - SNMoE: Iteration: 26 | log-likelihood: -485.774197263514
## EM - SNMoE: Iteration: 27 | log-likelihood: -485.764028131654
## EM - SNMoE: Iteration: 28 | log-likelihood: -485.754440985716
## EM - SNMoE: Iteration: 29 | log-likelihood: -485.745404877648
## EM - SNMoE: Iteration: 30 | log-likelihood: -485.736886260643
## EM - SNMoE: Iteration: 31 | log-likelihood: -485.728830856893
## EM - SNMoE: Iteration: 32 | log-likelihood: -485.721230890484
## EM - SNMoE: Iteration: 33 | log-likelihood: -485.714036912717
## EM - SNMoE: Iteration: 34 | log-likelihood: -485.707220850139
## EM - SNMoE: Iteration: 35 | log-likelihood: -485.700770898581
## EM - SNMoE: Iteration: 36 | log-likelihood: -485.694657650289
## EM - SNMoE: Iteration: 37 | log-likelihood: -485.688853535926
## EM - SNMoE: Iteration: 38 | log-likelihood: -485.683371909014
## EM - SNMoE: Iteration: 39 | log-likelihood: -485.678178306597
## EM - SNMoE: Iteration: 40 | log-likelihood: -485.673241061917
## EM - SNMoE: Iteration: 41 | log-likelihood: -485.668553347505
## EM - SNMoE: Iteration: 42 | log-likelihood: -485.664108229458
## EM - SNMoE: Iteration: 43 | log-likelihood: -485.659891312708
## EM - SNMoE: Iteration: 44 | log-likelihood: -485.65587084941
## EM - SNMoE: Iteration: 45 | log-likelihood: -485.652051592504
## EM - SNMoE: Iteration: 46 | log-likelihood: -485.648423458796
## EM - SNMoE: Iteration: 47 | log-likelihood: -485.644956903056
## EM - SNMoE: Iteration: 48 | log-likelihood: -485.641651379967
## EM - SNMoE: Iteration: 49 | log-likelihood: -485.638504265308
## EM - SNMoE: Iteration: 50 | log-likelihood: -485.63550427347
## EM - SNMoE: Iteration: 51 | log-likelihood: -485.632648684527
## EM - SNMoE: Iteration: 52 | log-likelihood: -485.629926044387
## EM - SNMoE: Iteration: 53 | log-likelihood: -485.627320251661
## EM - SNMoE: Iteration: 54 | log-likelihood: -485.624829419361
## EM - SNMoE: Iteration: 55 | log-likelihood: -485.622453305036
## EM - SNMoE: Iteration: 56 | log-likelihood: -485.620178199553
## EM - SNMoE: Iteration: 57 | log-likelihood: -485.617996552235
## EM - SNMoE: Iteration: 58 | log-likelihood: -485.615918885241
## EM - SNMoE: Iteration: 59 | log-likelihood: -485.61393912745
## EM - SNMoE: Iteration: 60 | log-likelihood: -485.61203778135
## EM - SNMoE: Iteration: 61 | log-likelihood: -485.610218075827
## EM - SNMoE: Iteration: 62 | log-likelihood: -485.608475863347
## EM - SNMoE: Iteration: 63 | log-likelihood: -485.606800073052
## EM - SNMoE: Iteration: 64 | log-likelihood: -485.605189380751
## EM - SNMoE: Iteration: 65 | log-likelihood: -485.603648407257
## EM - SNMoE: Iteration: 66 | log-likelihood: -485.60217125484
## EM - SNMoE: Iteration: 67 | log-likelihood: -485.600766619527
## EM - SNMoE: Iteration: 68 | log-likelihood: -485.599407085375
## EM - SNMoE: Iteration: 69 | log-likelihood: -485.59809908388
## EM - SNMoE: Iteration: 70 | log-likelihood: -485.59684184304
## EM - SNMoE: Iteration: 71 | log-likelihood: -485.595629799638
## EM - SNMoE: Iteration: 72 | log-likelihood: -485.59447564897
## EM - SNMoE: Iteration: 73 | log-likelihood: -485.593371612486
## EM - SNMoE: Iteration: 74 | log-likelihood: -485.592313444969
## EM - SNMoE: Iteration: 75 | log-likelihood: -485.591295083416
## EM - SNMoE: Iteration: 76 | log-likelihood: -485.590316544476
## EM - SNMoE: Iteration: 77 | log-likelihood: -485.5893686805
## EM - SNMoE: Iteration: 78 | log-likelihood: -485.588445462352
## EM - SNMoE: Iteration: 79 | log-likelihood: -485.587558943622
## EM - SNMoE: Iteration: 80 | log-likelihood: -485.586704633952
## EM - SNMoE: Iteration: 81 | log-likelihood: -485.585878110093
## EM - SNMoE: Iteration: 82 | log-likelihood: -485.585078538216
## EM - SNMoE: Iteration: 83 | log-likelihood: -485.584310754457
## EM - SNMoE: Iteration: 84 | log-likelihood: -485.583572491005
## EM - SNMoE: Iteration: 85 | log-likelihood: -485.582860765507
## EM - SNMoE: Iteration: 86 | log-likelihood: -485.58217443264
## EM - SNMoE: Iteration: 87 | log-likelihood: -485.581510965869
## EM - SNMoE: Iteration: 88 | log-likelihood: -485.580867196463
## EM - SNMoE: Iteration: 89 | log-likelihood: -485.580242663066
## EM - SNMoE: Iteration: 90 | log-likelihood: -485.579645636856
## EM - SNMoE: Iteration: 91 | log-likelihood: -485.579071362399
## EM - SNMoE: Iteration: 92 | log-likelihood: -485.578512662018
## EM - SNMoE: Iteration: 93 | log-likelihood: -485.577973190244
## EM - SNMoE: Iteration: 94 | log-likelihood: -485.577452194271
## EM - SNMoE: Iteration: 95 | log-likelihood: -485.576948142351
## EM - SNMoE: Iteration: 96 | log-likelihood: -485.576456396579
## EM - SNMoE: Iteration: 97 | log-likelihood: -485.575974064756
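The trace stops at iteration 97, well before `max_iter`. The printed values are consistent with a stopping rule based on the relative change in log-likelihood between consecutive iterations. The sketch below checks this on the last three values; it is inferred from the numbers shown, not copied from the package source, and `loglik_tail` and `rel_change` are hypothetical names.

```r
threshold <- 1e-6

# Log-likelihood values printed for iterations 95, 96 and 97.
loglik_tail <- c(-485.576948142351, -485.576456396579, -485.575974064756)

# Relative change |(l_t - l_{t-1}) / l_{t-1}| for iterations 96 and 97.
rel_change <- abs(diff(loglik_tail) / loglik_tail[-length(loglik_tail)])

# TRUE where the change falls at or below the threshold.
rel_change <= threshold
```

The relative change is just above `1e-6` at iteration 96 and just below it at iteration 97, which matches the point where the run ends.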

Summary

snmoe$summary()
## -----------------------------------------------
## Fitted Skew-Normal Mixture-of-Experts model
## -----------------------------------------------
## 
## SNMoE model with K = 2 experts:
## 
##  log-likelihood df      AIC      BIC       ICL
##        -485.576 10 -495.576 -516.649 -516.6574
## 
## Clustering table (Number of observations in each expert):
## 
##   1   2 
## 249 251 
## 
## Regression coefficients:
## 
##     Beta(k = 1) Beta(k = 2)
## 1      1.051904    1.013374
## X^1    3.004689   -2.778066
## 
## Variances:
## 
##  Sigma2(k = 1) Sigma2(k = 2)
##      0.3738266     0.4534028
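The information criteria in the summary can be recomputed by hand. The printed numbers are consistent with the "larger is better" convention, AIC = loglik - df and BIC = loglik - df * log(n) / 2. That convention is inferred from the values shown rather than taken from the package documentation. The variable names in this sketch are hypothetical.

```r
loglik_final <- -485.575974064756  # final value in the EM trace
df_model <- 10                     # number of free parameters (summary: df)
n_obs <- 500                       # sample size

aic <- loglik_final - df_model
bic <- loglik_final - df_model * log(n_obs) / 2
round(c(AIC = aic, BIC = bic), 3)
```

Both values agree with the summary table to the printed precision. Under this convention, models with larger AIC or BIC are preferred when comparing different choices of K, p and q.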

Plots

Mean curve

snmoe$plot(what = "meancurve")

Confidence regions

snmoe$plot(what = "confregions")

Clusters

snmoe$plot(what = "clusters")

Log-likelihood

snmoe$plot(what = "loglikelihood")

Application to a real dataset

Load data

data("tempanomalies")
x <- tempanomalies$Year
y <- tempanomalies$AnnualAnomaly

Set up SNMoE model parameters

K <- 2 # Number of regressors/experts
p <- 1 # Order of the polynomial regression (regressors/experts)
q <- 1 # Order of the logistic regression (gating network)

Set up EM parameters

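n_tries <- 1 # Number of EM runs (the run with the highest log-likelihood is kept)
max_iter <- 1500 # Maximum number of EM iterations
threshold <- 1e-6 # Convergence threshold on the relative change in log-likelihood
verbose <- TRUE # Print the log-likelihood at each iteration
verbose_IRLS <- FALSE # Print the IRLS criterion (gating network update) at each iteration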

Estimation

snmoe <- emSNMoE(X = x, Y = y, K, p, q, n_tries, max_iter, 
                 threshold, verbose, verbose_IRLS)
## EM - SNMoE: Iteration: 1 | log-likelihood: 67.1393912546267
## EM - SNMoE: Iteration: 2 | log-likelihood: 86.3123763058244
## EM - SNMoE: Iteration: 3 | log-likelihood: 88.4049020398015
## EM - SNMoE: Iteration: 4 | log-likelihood: 88.7786025096324
## EM - SNMoE: Iteration: 5 | log-likelihood: 88.9863371759242
## EM - SNMoE: Iteration: 6 | log-likelihood: 89.2159102763086
## EM - SNMoE: Iteration: 7 | log-likelihood: 89.4166837570103
## EM - SNMoE: Iteration: 8 | log-likelihood: 89.5378228423525
## EM - SNMoE: Iteration: 9 | log-likelihood: 89.6078941897507
## EM - SNMoE: Iteration: 10 | log-likelihood: 89.6506081922485
## EM - SNMoE: Iteration: 11 | log-likelihood: 89.680679493927
## EM - SNMoE: Iteration: 12 | log-likelihood: 89.7054127986757
## EM - SNMoE: Iteration: 13 | log-likelihood: 89.7271627052861
## EM - SNMoE: Iteration: 14 | log-likelihood: 89.7466422435391
## EM - SNMoE: Iteration: 15 | log-likelihood: 89.7644359313908
## EM - SNMoE: Iteration: 16 | log-likelihood: 89.7808442763708
## EM - SNMoE: Iteration: 17 | log-likelihood: 89.7959623872005
## EM - SNMoE: Iteration: 18 | log-likelihood: 89.8098298887156
## EM - SNMoE: Iteration: 19 | log-likelihood: 89.8224765128155
## EM - SNMoE: Iteration: 20 | log-likelihood: 89.8339351359208
## EM - SNMoE: Iteration: 21 | log-likelihood: 89.8444584666489
## EM - SNMoE: Iteration: 22 | log-likelihood: 89.8539391972029
## EM - SNMoE: Iteration: 23 | log-likelihood: 89.8623392185522
## EM - SNMoE: Iteration: 24 | log-likelihood: 89.8697291463709
## EM - SNMoE: Iteration: 25 | log-likelihood: 89.8763827151644
## EM - SNMoE: Iteration: 26 | log-likelihood: 89.8811754383375
## EM - SNMoE: Iteration: 27 | log-likelihood: 89.8860645132145
## EM - SNMoE: Iteration: 28 | log-likelihood: 89.8901911599733
## EM - SNMoE: Iteration: 29 | log-likelihood: 89.8939229923584
## EM - SNMoE: Iteration: 30 | log-likelihood: 89.897264155598
## EM - SNMoE: Iteration: 31 | log-likelihood: 89.9007321568667
## EM - SNMoE: Iteration: 32 | log-likelihood: 89.9035508488742
## EM - SNMoE: Iteration: 33 | log-likelihood: 89.9060694862566
## EM - SNMoE: Iteration: 34 | log-likelihood: 89.9086672705961
## EM - SNMoE: Iteration: 35 | log-likelihood: 89.9109149161921
## EM - SNMoE: Iteration: 36 | log-likelihood: 89.9130049122629
## EM - SNMoE: Iteration: 37 | log-likelihood: 89.9151466747962
## EM - SNMoE: Iteration: 38 | log-likelihood: 89.9170490540402
## EM - SNMoE: Iteration: 39 | log-likelihood: 89.9189455614356
## EM - SNMoE: Iteration: 40 | log-likelihood: 89.920722490437
## EM - SNMoE: Iteration: 41 | log-likelihood: 89.9223861223175
## EM - SNMoE: Iteration: 42 | log-likelihood: 89.9240011170035
## EM - SNMoE: Iteration: 43 | log-likelihood: 89.9255444752544
## EM - SNMoE: Iteration: 44 | log-likelihood: 89.9270147197148
## EM - SNMoE: Iteration: 45 | log-likelihood: 89.9284205205757
## EM - SNMoE: Iteration: 46 | log-likelihood: 89.929768350036
## EM - SNMoE: Iteration: 47 | log-likelihood: 89.9310655713287
## EM - SNMoE: Iteration: 48 | log-likelihood: 89.9323114372458
## EM - SNMoE: Iteration: 49 | log-likelihood: 89.9335083111587
## EM - SNMoE: Iteration: 50 | log-likelihood: 89.9346590487228
## EM - SNMoE: Iteration: 51 | log-likelihood: 89.9357648946395
## EM - SNMoE: Iteration: 52 | log-likelihood: 89.9368284790995
## EM - SNMoE: Iteration: 53 | log-likelihood: 89.9378517785344
## EM - SNMoE: Iteration: 54 | log-likelihood: 89.9388344884152
## EM - SNMoE: Iteration: 55 | log-likelihood: 89.9397794710125
## EM - SNMoE: Iteration: 56 | log-likelihood: 89.9406929038835
## EM - SNMoE: Iteration: 57 | log-likelihood: 89.9415721977169
## EM - SNMoE: Iteration: 58 | log-likelihood: 89.9424179529526
## EM - SNMoE: Iteration: 59 | log-likelihood: 89.9432317703868
## EM - SNMoE: Iteration: 60 | log-likelihood: 89.9440151036607
## EM - SNMoE: Iteration: 61 | log-likelihood: 89.9447720669891
## EM - SNMoE: Iteration: 62 | log-likelihood: 89.9455021664009
## EM - SNMoE: Iteration: 63 | log-likelihood: 89.9462065398637
## EM - SNMoE: Iteration: 64 | log-likelihood: 89.9468856981156
## EM - SNMoE: Iteration: 65 | log-likelihood: 89.9475410134714
## EM - SNMoE: Iteration: 66 | log-likelihood: 89.9481732090574
## EM - SNMoE: Iteration: 67 | log-likelihood: 89.9487828085701
## EM - SNMoE: Iteration: 68 | log-likelihood: 89.9493709174674
## EM - SNMoE: Iteration: 69 | log-likelihood: 89.9499393216653
## EM - SNMoE: Iteration: 70 | log-likelihood: 89.9504915641522
## EM - SNMoE: Iteration: 71 | log-likelihood: 89.9510234324277
## EM - SNMoE: Iteration: 72 | log-likelihood: 89.9515375509019
## EM - SNMoE: Iteration: 73 | log-likelihood: 89.9520343897918
## EM - SNMoE: Iteration: 74 | log-likelihood: 89.9525147730548
## EM - SNMoE: Iteration: 75 | log-likelihood: 89.952979526795
## EM - SNMoE: Iteration: 76 | log-likelihood: 89.9534287405897
## EM - SNMoE: Iteration: 77 | log-likelihood: 89.9538633332105
## EM - SNMoE: Iteration: 78 | log-likelihood: 89.9542840954176
## EM - SNMoE: Iteration: 79 | log-likelihood: 89.9546914335969
## EM - SNMoE: Iteration: 80 | log-likelihood: 89.9550861492999
## EM - SNMoE: Iteration: 81 | log-likelihood: 89.9554686454909
## EM - SNMoE: Iteration: 82 | log-likelihood: 89.9558386903462
## EM - SNMoE: Iteration: 83 | log-likelihood: 89.9561975428098
## EM - SNMoE: Iteration: 84 | log-likelihood: 89.956545549163
## EM - SNMoE: Iteration: 85 | log-likelihood: 89.9568826067365
## EM - SNMoE: Iteration: 86 | log-likelihood: 89.9572095986266
## EM - SNMoE: Iteration: 87 | log-likelihood: 89.9575263695436
## EM - SNMoE: Iteration: 88 | log-likelihood: 89.9578328566839
## EM - SNMoE: Iteration: 89 | log-likelihood: 89.9581293780223
## EM - SNMoE: Iteration: 90 | log-likelihood: 89.9584173442332
## EM - SNMoE: Iteration: 91 | log-likelihood: 89.958697543531
## EM - SNMoE: Iteration: 92 | log-likelihood: 89.95897017134
## EM - SNMoE: Iteration: 93 | log-likelihood: 89.9592343217354
## EM - SNMoE: Iteration: 94 | log-likelihood: 89.959490268592
## EM - SNMoE: Iteration: 95 | log-likelihood: 89.9597407658552
## EM - SNMoE: Iteration: 96 | log-likelihood: 89.9599830242252
## EM - SNMoE: Iteration: 97 | log-likelihood: 89.960219158931
## EM - SNMoE: Iteration: 98 | log-likelihood: 89.9604487697759
## EM - SNMoE: Iteration: 99 | log-likelihood: 89.9606701685812
## EM - SNMoE: Iteration: 100 | log-likelihood: 89.9608852187594
## EM - SNMoE: Iteration: 101 | log-likelihood: 89.9610939894636
## EM - SNMoE: Iteration: 102 | log-likelihood: 89.9612985304711
## EM - SNMoE: Iteration: 103 | log-likelihood: 89.961496994385
## EM - SNMoE: Iteration: 104 | log-likelihood: 89.9616903747286
## EM - SNMoE: Iteration: 105 | log-likelihood: 89.9618790690262
## EM - SNMoE: Iteration: 106 | log-likelihood: 89.9620614678624
## EM - SNMoE: Iteration: 107 | log-likelihood: 89.9622377985414
## EM - SNMoE: Iteration: 108 | log-likelihood: 89.9624112482239
## EM - SNMoE: Iteration: 109 | log-likelihood: 89.9625810627667
## EM - SNMoE: Iteration: 110 | log-likelihood: 89.9627449576569
## EM - SNMoE: Iteration: 111 | log-likelihood: 89.9629049110195
## EM - SNMoE: Iteration: 112 | log-likelihood: 89.9630633947957
## EM - SNMoE: Iteration: 113 | log-likelihood: 89.9632165833158
## EM - SNMoE: Iteration: 114 | log-likelihood: 89.9633637034398
## EM - SNMoE: Iteration: 115 | log-likelihood: 89.9635083452088
## EM - SNMoE: Iteration: 116 | log-likelihood: 89.9636499016958
## EM - SNMoE: Iteration: 117 | log-likelihood: 89.9637870583276
## EM - SNMoE: Iteration: 118 | log-likelihood: 89.9639202934018
## EM - SNMoE: Iteration: 119 | log-likelihood: 89.9640519846681
## EM - SNMoE: Iteration: 120 | log-likelihood: 89.964180667269
## EM - SNMoE: Iteration: 121 | log-likelihood: 89.9643046747079
## EM - SNMoE: Iteration: 122 | log-likelihood: 89.9644253123161
## EM - SNMoE: Iteration: 123 | log-likelihood: 89.9645423331732
## EM - SNMoE: Iteration: 124 | log-likelihood: 89.9646558210273
## EM - SNMoE: Iteration: 125 | log-likelihood: 89.9647663127239
## EM - SNMoE: Iteration: 126 | log-likelihood: 89.9648744243076
## EM - SNMoE: Iteration: 127 | log-likelihood: 89.9649800581561
## EM - SNMoE: Iteration: 128 | log-likelihood: 89.9650828879559
## EM - SNMoE: Iteration: 129 | log-likelihood: 89.9651846451398
## EM - SNMoE: Iteration: 130 | log-likelihood: 89.9652861758818
## EM - SNMoE: Iteration: 131 | log-likelihood: 89.9653850801511
## EM - SNMoE: Iteration: 132 | log-likelihood: 89.9654812002778
## EM - SNMoE: Iteration: 133 | log-likelihood: 89.9655748811957
## EM - SNMoE: Iteration: 134 | log-likelihood: 89.9656663487702
## EM - SNMoE: Iteration: 135 | log-likelihood: 89.9657558236992

Summary

snmoe$summary()
## -----------------------------------------------
## Fitted Skew-Normal Mixture-of-Experts model
## -----------------------------------------------
## 
## SNMoE model with K = 2 experts:
## 
##  log-likelihood df      AIC      BIC      ICL
##        89.96576 10 79.96576 65.40248 65.31117
## 
## Clustering table (Number of observations in each expert):
## 
##  1  2 
## 70 66 
## 
## Regression coefficients:
## 
##       Beta(k = 1)  Beta(k = 2)
## 1   -14.046791397 -33.81591372
## X^1   0.007206665   0.01720159
## 
## Variances:
## 
##  Sigma2(k = 1) Sigma2(k = 2)
##      0.0143586    0.01759203
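The slopes in the summary are easier to compare on a per-decade scale. The sketch below copies the printed coefficients into a matrix and rescales them, assuming that `Year` is measured in calendar years. These coefficients describe each expert's location line. For a skew-Normal expert, the conditional mean is shifted away from that line by an amount that depends on the scale and skewness parameters. The names `beta_hat`, `slope_per_decade` and `slope_ratio` are hypothetical.

```r
# Coefficients copied from the summary above (rows: intercept, slope).
beta_hat <- matrix(c(-14.046791397, 0.007206665,
                     -33.81591372,  0.01720159),
                   nrow = 2,
                   dimnames = list(c("1", "X^1"), c("k = 1", "k = 2")))

# Slope of each expert's location line, per decade instead of per year.
slope_per_decade <- 10 * beta_hat["X^1", ]

# How much steeper the second expert's trend is than the first.
slope_ratio <- beta_hat["X^1", "k = 2"] / beta_hat["X^1", "k = 1"]

slope_per_decade
slope_ratio
```

The second expert's trend is a little more than twice as steep as the first expert's.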

Plots

Mean curve

snmoe$plot(what = "meancurve")

Confidence regions

snmoe$plot(what = "confregions")

Clusters

snmoe$plot(what = "clusters")

Log-likelihood

snmoe$plot(what = "loglikelihood")