tidychangepoint

Tidy methods for changepoint analysis

library(tidychangepoint)

The tidychangepoint package allows you to use any number of algorithms for detecting changepoint sets in univariate time series with a common, tidyverse-compliant interface. It also provides model-fitting procedures for commonly-used parametric models, tools for computing various penalty functions, and graphical diagnostic displays.

Changepoint sets are computed using the segment() function, which takes a numeric vector that is coercible into a ts object, and a string indicating the algorithm you wish to use. segment() always returns a tidycpt object.

x <- segment(DataCPSim, method = "pelt")
class(x)
#> [1] "tidycpt"

Various methods are available for tidycpt objects. For example, as.ts() returns the original data as a ts object, and changepoints() returns the set of changepoint indices.

changepoints(x)
#> [1] 547 822 972
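
To make the link between changepoint indices and regions concrete, here is a minimal base-R sketch. It assumes, based on the region labels and observation counts reported elsewhere in this README, that each changepoint marks the first index of a new region. The variable names are illustrative, and this is not necessarily how tidychangepoint constructs regions internally.

```r
# Series length and changepoint indices taken from the example above
n <- 1096
tau <- c(547, 822, 972)

# Assign every time index to a half-open interval [a, b);
# include.lowest = TRUE closes the final interval on the right
region <- cut(
  seq_len(n),
  breaks = c(0, tau, n),
  right = FALSE,
  include.lowest = TRUE
)

# Number of observations that fall in each region
table(region)
```

Under this convention the four regions contain 546, 275, 150, and 125 observations, which sum to the 1,096 observations in the series.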

Retrieving information using the broom interface

tidychangepoint follows the design interface of the broom package. Therefore, augment(), tidy(), and glance() methods exist for tidycpt objects.

augment(x)
#> Registered S3 method overwritten by 'tsibble':
#>   method               from 
#>   as_tibble.grouped_df dplyr
#> # A tsibble: 1,096 x 5 [1]
#> # Groups:    region [4]
#>    index     y region  .fitted  .resid
#>    <int> <dbl> <fct>     <dbl>   <dbl>
#>  1     1  35.5 [0,547)    35.3   0.232
#>  2     2  29.0 [0,547)    35.3  -6.27 
#>  3     3  35.6 [0,547)    35.3   0.357
#>  4     4  33.0 [0,547)    35.3  -2.29 
#>  5     5  29.5 [0,547)    35.3  -5.74 
#>  6     6  25.4 [0,547)    35.3  -9.87 
#>  7     7  28.8 [0,547)    35.3  -6.45 
#>  8     8  50.3 [0,547)    35.3  15.0  
#>  9     9  24.9 [0,547)    35.3 -10.3  
#> 10    10  58.9 [0,547)    35.3  23.6  
#> # ℹ 1,086 more rows
tidy(x)
#> # A tibble: 4 × 10
#>   region  num_obs   min   max  mean    sd begin   end param_mu param_sigma_hatsq
#>   <chr>     <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>             <dbl>
#> 1 [0,547)     546  13.7  92.8  35.3  11.3     0   547     35.3              127.
#> 2 [547,8…     275  20.5 163.   58.1  19.3   547   822     58.1              372.
#> 3 [822,9…     150  39.2 215.   96.7  30.5   822   972     96.7              924.
#> 4 [972,1…     125  67.2 299.  156.   49.6   972  1096    156.              2442.
glance(x)
#> # A tibble: 1 × 8
#>   pkg      version algorithm seg_params model_name criteria fitness elapsed_time
#>   <chr>    <pckg_> <chr>     <list>     <chr>      <chr>      <dbl> <drtn>      
#> 1 changep… 2.2.4   PELT      <list [1]> meanvar    MBIC       9403. 0.07 secs
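
As a rough illustration of what the .fitted and .resid columns and the per-region summaries represent for a mean-and-variance model, the following base-R sketch computes comparable quantities by hand. The data are simulated here (this is not DataCPSim), all variable names are hypothetical, and the divide-by-n variance is an assumption inferred from comparing the sd and param_sigma_hatsq columns above, not a statement about the package's internals.

```r
set.seed(42)

n <- 1096
tau <- c(547, 822, 972)

# Segment lengths implied by the changepoints: 546, 275, 150, 125
seg_len <- diff(c(1, tau, n + 1))

# Simulated piecewise-Gaussian series (illustrative values only)
y <- rnorm(
  n,
  mean = rep(c(35, 58, 97, 156), times = seg_len),
  sd = rep(c(11, 19, 30, 50), times = seg_len)
)

# Label each index with its region, as half-open intervals
region <- cut(seq_len(n), breaks = c(0, tau, n),
              right = FALSE, include.lowest = TRUE)

# Fitted value = mean of the region; residual = observed - fitted
fitted_vals <- ave(y, region)
resid_vals <- y - fitted_vals

# Per-region summary, loosely analogous to the tidy() output
summary_by_region <- data.frame(
  region = levels(region),
  num_obs = as.integer(table(region)),
  mean = as.numeric(tapply(y, region, mean)),
  sigma_hatsq = as.numeric(
    tapply(y, region, function(v) mean((v - mean(v))^2))
  )
)
summary_by_region
```

Because each fitted value is a region mean, the residuals sum to zero (up to floating-point error) within every region.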

Other methods

The plot() method leverages ggplot2 to provide an informative plot, with the regions defined by the changepoint set clearly demarcated, and the means within each region also indicated.

plot(x)

Other generic functions defined for tidycpt objects include fitness(), as.model(), and exceedances(). For example, fitness() returns a named vector with the value of the penalized objective function used.

fitness(x)
#>     MBIC 
#> 9403.391

Structure

Every tidycpt object contains two main children: a segmenter and a model.

Both segmenters and models implement methods for the generic functions changepoints(), as.ts(), nobs(), logLik(), model_name(), and glance(). However, it is important to note that while tidychangepoint does its best to match the model used by the segmenter to its corresponding model-fitting function, exact matches do not always exist. Thus, the logLik() of the segmenter may not always match the logLik() of the model. Reconciling these values is the focus of ongoing work.
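
For reference, the log-likelihood of a piecewise Gaussian mean-and-variance model has a closed form once each region's mean and variance are set to their maximum-likelihood estimates. The sketch below uses simulated data and hypothetical variable names. It illustrates one standard definition and is not necessarily the exact computation behind either logLik() method.

```r
set.seed(7)

n <- 1096
tau <- c(547, 822, 972)
seg_len <- diff(c(1, tau, n + 1))

# Simulated piecewise-Gaussian series (illustrative values only)
y <- rnorm(
  n,
  mean = rep(c(35, 58, 97, 156), times = seg_len),
  sd = rep(c(11, 19, 30, 50), times = seg_len)
)

region <- cut(seq_len(n), breaks = c(0, tau, n),
              right = FALSE, include.lowest = TRUE)

# Maximum-likelihood estimates per region (variance divides by n_j)
n_j <- as.numeric(table(region))
mu_hat <- ave(y, region)
sigma2_hat <- as.numeric(
  tapply(y, region, function(v) mean((v - mean(v))^2))
)

# Closed form: sum over regions of -n_j / 2 * (log(2 * pi * sigma2_j) + 1)
loglik_closed <- sum(-n_j / 2 * (log(2 * pi * sigma2_hat) + 1))

# Direct evaluation of the Gaussian log-density at every observation
loglik_direct <- sum(dnorm(
  y,
  mean = mu_hat,
  sd = sqrt(sigma2_hat[as.integer(region)]),
  log = TRUE
))

all.equal(loglik_closed, loglik_direct)
```

Small differences in convention, such as where a region boundary falls or how the variance is estimated, change these numbers. The segmenter and model parameter estimates shown in this README do differ slightly.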

Segmenters

In the example above, the segmenter is of class cpt, because segment() simply wraps the cpt.meanvar() function from the changepoint package.

x |>
  as.segmenter() |>
  str()
#> Formal class 'cpt' [package "changepoint"] with 12 slots
#>   ..@ data.set : Time-Series [1:1096] from 1 to 1096: 35.5 29 35.6 33 29.5 ...
#>   ..@ cpttype  : chr "mean and variance"
#>   ..@ method   : chr "PELT"
#>   ..@ test.stat: chr "Normal"
#>   ..@ pen.type : chr "MBIC"
#>   ..@ pen.value: num 28
#>   ..@ minseglen: num 2
#>   ..@ cpts     : int [1:4] 547 822 972 1096
#>   ..@ ncpts.max: num Inf
#>   ..@ param.est:List of 2
#>   .. ..$ mean    : num [1:4] 35.3 58.2 96.8 156.5
#>   .. ..$ variance: num [1:4] 127 371 921 2406
#>   ..@ date     : chr "Wed Jul 17 14:23:24 2024"
#>   ..@ version  : chr "2.2.4"

In addition to the generic functions listed above, segmenters implement methods for the generic functions fitness(), model_args(), and seg_params().

Models

The model object in this case is created by fit_meanvar(), and is of class mod_cpt.

x |>
  as.model() |>
  str()
#> List of 6
#>  $ data         : Time-Series [1:1096] from 1 to 1096: 35.5 29 35.6 33 29.5 ...
#>  $ tau          : int [1:3] 547 822 972
#>  $ region_params: tibble [4 × 3] (S3: tbl_df/tbl/data.frame)
#>   ..$ region           : chr [1:4] "[0,547)" "[547,822)" "[822,972)" "[972,1.1e+03]"
#>   ..$ param_mu         : num [1:4] 35.3 58.1 96.7 155.9
#>   ..$ param_sigma_hatsq: Named num [1:4] 127 372 924 2442
#>   .. ..- attr(*, "names")= chr [1:4] "[0,547)" "[547,822)" "[822,972)" "[972,1.1e+03]"
#>  $ model_params : NULL
#>  $ fitted_values: num [1:1096] 35.3 35.3 35.3 35.3 35.3 ...
#>  $ model_name   : chr "meanvar"
#>  - attr(*, "class")= chr "mod_cpt"

In addition to the generic functions listed above, models implement methods for the generic functions fitted(), residuals(), coef(), augment(), tidy(), and plot().