Tweedie likelihood computations

library(tweedie)

Tweedie distributions

Tweedie distributions are exponential dispersion models, with a mean \(\mu\) and a variance \(\phi \mu^\xi\), for some dispersion parameter \(\phi > 0\) and a power index \(\xi\) (sometimes called \(p\)) that uniquely defines the distribution within the Tweedie family (for all real values of \(\xi\) not between 0 and 1).

Special cases of the Tweedie distributions are:

For all other values of \(\xi\), the probability functions and distribution functions have no closed forms.

Power index, \(\xi\) Distribution Support
\(\xi = 0\) Normal \((-\infty, \infty)\)
\(0 < \xi < 1\) Do not exist
\(\xi = 1\), \(\phi = 1\) Poisson Discrete: \((0, 1, 2, \dots)\)
\(1 < \xi < 2\) Poisson–gamma \([ 0, \infty)\)
\(\xi = 2\) gamma \((0, \infty)\)
\(\xi > 2\) Skewed right \((0, \infty\))
\(\xi = 3\) inverse Gaussian \((0, \infty)\)

For \(\xi < 1\), applications are limited (non-existent so far?), but have support on the entire real line and \(\mu > 0\).

For \(1 < \xi < 2\), Tweedie distributions can be represented as a Poisson sum of gamma distributions. These distributions are continuous for \(Y > 0\) but have a discrete mass at \(Y = 0\).

Examples

Likelihood computations are not need to fit a Tweedie generalized linear model, using the tweedie() family from the statmod package:

library(statmod)

set.seed(96)
N <- 25
# Mean of the Poisson (lambda) and Gamma (shape/scale)
lambda <- 1.5
# Generating Compound Poisson-Gamma data manually
y <- replicate(N, {
  n_events <- rpois(1, lambda = lambda)
  if (n_events == 0) 0 else sum(rgamma(n_events, shape = 2, scale = 1))
})

mod.tw <- glm(y ~ 1, 
              family = statmod::tweedie(var.power = 1.5, link.power = 0) )
              # link.power = 0  means the log-link

However, likelihood computations are necessary in other situations.

Generating random numbers

Random numbers from a Tweedie distribution are generated using rtweedie():

tweedie::rtweedie(10, xi = 1.1, mu = 2, phi = 1)
#>  [1] 1.536587 2.520597 4.624115 1.999447 2.532301 4.259932 1.725366 0.000000
#>  [9] 0.000000 2.545023

Plotting density and probability functions

Accurate density functions are generated using dtweedie():

y <- seq(0, 2, length = 100)
xi <- 1.1
mu <- 0.5
phi <- 0.4

twden <- tweedie::dtweedie(y, xi = xi, mu = mu, phi = phi)  
twdtn <- tweedie::ptweedie(y, xi = xi, mu = mu, phi = phi)

plot( twden[y > 0] ~ y[y > 0], 
      type ="l",
      lwd = 2,
      xlab = expression(italic(y)),
      ylab = "Density function")
points(twden[y==0] ~ y[y == 0],
      lwd = 2,
      pch = 19,
      xlab = expression(italic(y)),
      ylab = "Distribution function")

Accurate probability functions are generated using ptweedie():

plot(twdtn ~ y,
     type = "l",
      lwd = 2,
     ylim = c(0, 1),
      xlab = expression(italic(y)),
      ylab = "Distribution function")

However, the function tweedie_plot() is sometimes more convenient (especially for \(1 < \xi \ < 2\), when a probability mass at \(Y = 0\) is present):

tweedie::tweedie_plot(y, xi = xi, mu = mu, phi = phi,
                      ylab = "Density function",
                      xlab = expression(italic(y)),
                      lwd = 2)

tweedie::tweedie_plot(y, xi = xi, mu = mu, phi = phi, 
                      ylab = "Distribution function",
                      xlab = expression(italic(y)),
                      lwd = 2, 
                      ylim = c(0, 1), 
                      type = "cdf")

Computing the quantile residuals:

Quantile residuals can be computed to assess the fit of a glm:

library(tweedie)

qqnorm( statmod::qresid(mod.tw) )

Estimating \(\xi\)

To use a Tweedie distribution in a glm, the value of \(\xi\) is needed. To find the most suitable value of \(\xi\), tweedie_profile() can be used:

out <- tweedie::tweedie.profile(y ~ 1, 
                                xi.vec = seq(1.2, 1.8, by = 0.05), 
                                do.plot = TRUE)
#> Warning: `tweedie.profile()` was deprecated in tweedie 3.0.5.
#> ℹ Please use `tweedie_profile()` instead.
#> This warning is displayed once per session.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.
#> .............Done.


# The estimated power index:
out$xi.max
#> [1] 1.432653