Tweedie distributions are exponential dispersion models, with a mean \(\mu\) and a variance \(\phi \mu^\xi\), for some dispersion parameter \(\phi > 0\) and a power index \(\xi\) (sometimes called \(p\)) that uniquely defines the distribution within the Tweedie family (for all real values of \(\xi\) not between 0 and 1).
Special cases of the Tweedie distributions are:
For all other values of \(\xi\), the probability functions and distribution functions have no closed forms.
| Power index, \(\xi\) | Distribution | Support |
|---|---|---|
| \(\xi = 0\) | Normal | \((-\infty, \infty)\) |
| \(0 < \xi < 1\) | Do not exist | |
| \(\xi = 1\), \(\phi = 1\) | Poisson | Discrete: \((0, 1, 2, \dots)\) |
| \(1 < \xi < 2\) | Poisson–gamma | \([ 0, \infty)\) |
| \(\xi = 2\) | gamma | \((0, \infty)\) |
| \(\xi > 2\) | Skewed right | \((0, \infty\)) |
| \(\xi = 3\) | inverse Gaussian | \((0, \infty)\) |
For \(\xi < 1\), applications are limited (non-existent so far?), but have support on the entire real line and \(\mu > 0\).
For \(1 < \xi < 2\), Tweedie distributions can be represented as a Poisson sum of gamma distributions. These distributions are continuous for \(Y > 0\) but have a discrete mass at \(Y = 0\).
Likelihood computations are not need to fit a Tweedie generalized
linear model, using the tweedie() family from the statmod
package:
library(statmod)
set.seed(96)
N <- 25
# Mean of the Poisson (lambda) and Gamma (shape/scale)
lambda <- 1.5
# Generating Compound Poisson-Gamma data manually
y <- replicate(N, {
n_events <- rpois(1, lambda = lambda)
if (n_events == 0) 0 else sum(rgamma(n_events, shape = 2, scale = 1))
})
mod.tw <- glm(y ~ 1,
family = statmod::tweedie(var.power = 1.5, link.power = 0) )
# link.power = 0 means the log-linkHowever, likelihood computations are necessary in other situations.
Random numbers from a Tweedie distribution are generated using rtweedie():
Accurate density functions are generated using dtweedie():
y <- seq(0, 2, length = 100)
xi <- 1.1
mu <- 0.5
phi <- 0.4
twden <- tweedie::dtweedie(y, xi = xi, mu = mu, phi = phi)
twdtn <- tweedie::ptweedie(y, xi = xi, mu = mu, phi = phi)
plot( twden[y > 0] ~ y[y > 0],
type ="l",
lwd = 2,
xlab = expression(italic(y)),
ylab = "Density function")
points(twden[y==0] ~ y[y == 0],
lwd = 2,
pch = 19,
xlab = expression(italic(y)),
ylab = "Distribution function")Accurate probability functions are generated using ptweedie():
plot(twdtn ~ y,
type = "l",
lwd = 2,
ylim = c(0, 1),
xlab = expression(italic(y)),
ylab = "Distribution function")However, the function tweedie_plot() is sometimes more convenient (especially for \(1 < \xi \ < 2\), when a probability mass at \(Y = 0\) is present):
tweedie::tweedie_plot(y, xi = xi, mu = mu, phi = phi,
ylab = "Density function",
xlab = expression(italic(y)),
lwd = 2)tweedie::tweedie_plot(y, xi = xi, mu = mu, phi = phi,
ylab = "Distribution function",
xlab = expression(italic(y)),
lwd = 2,
ylim = c(0, 1),
type = "cdf")To use a Tweedie distribution in a glm, the value of \(\xi\) is needed. To find the most suitable value of \(\xi\), tweedie_profile() can be used:
out <- tweedie::tweedie.profile(y ~ 1,
xi.vec = seq(1.2, 1.8, by = 0.05),
do.plot = TRUE)
#> Warning: `tweedie.profile()` was deprecated in tweedie 3.0.5.
#> ℹ Please use `tweedie_profile()` instead.
#> This warning is displayed once per session.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.
#> .............Done.