The recent pandemic highlighted the need for quick patient access to new drugs and vaccines. However, stability assessment represents a bottleneck when it is based on real-time data covering 2 or 3 years. To accelerate decisions and ultimately the time-to-market, accelerated stability studies may be used, with data obtained over 6 months (sometimes less) under stressed conditions (higher temperature or higher humidity). Data from studies at accelerated and stress conditions can be used compellingly to establish shelf life and to predict the effect of temperatures that differ from the recommended condition (for example, during transportation). Three or four temperatures are typically tested, with at least 3 time points in addition to time zero.
The kinetic Arrhenius model and its related Arrhenius plot are too simplistic for extrapolating a critical quality attribute (CQA) over time. Furthermore, data obtained at refrigerator temperature (5C) might show a positive degradation slope due to measurement error. This document illustrates the use of the ordinary differential equation (ODE) from the Sestak-Berggren model. This equation models all the data obtained at different temperatures jointly, allowing for more accurate extrapolation of the degradation in both time and temperature.
In this document, the focus is the Sestak-Berggren 1-step model with a variable that decreases over time (e.g. the loss of potency over time), as it covers a large variety of accelerated stability studies:
\[\begin{equation}\frac{\text{d}\alpha}{\text{d}t}=A_1 \exp{\left(\frac{-E_1}{RT}\right)}\left(1-\alpha\right)^{n_1},\end{equation}\]
where \(t\) is the time, \(\alpha\) is the degradation (the reaction progress), \(A_1\) is the kinetic parameter, \(E_1\) is the activation energy, \(R\) is the universal gas constant and \(T\) is the temperature (in Kelvin). For the statistical modelling, this formula is rewritten as:
\[\begin{equation}\frac{\text{d}\alpha}{\text{d}t}= \exp{\left(k_1\right)} \exp{\left(\frac{-k_2}{T}\right)}\left(1-\alpha\right)^{k_3},\end{equation}\]
where \(k_1\), \(k_2\) and \(k_3\) are the 3 kinetic parameters to estimate together with the intercept, \(C_0\), which is the concentration (or any CQA) at time zero. This equation can be integrated by assuming that the degradation starts at time zero (\(\alpha=0\) at \(t=0\)). The resulting formula is then a non-linear model given by:
\[\begin{equation}Y = C_0 \sqrt[1-k_3]{1-\left(1-k_3\right)t\exp{\left(k_1-\frac{k_2}{T}\right)}}\end{equation}\]
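For completeness, the integration step can be sketched as follows. Separating the variables and using \(\alpha=0\) at \(t=0\) gives
\[\begin{equation}\int_0^{\alpha}\left(1-u\right)^{-k_3}\,\text{d}u=\frac{1-\left(1-\alpha\right)^{1-k_3}}{1-k_3}=t\exp{\left(k_1-\frac{k_2}{T}\right)},\end{equation}\]
so that \(\left(1-\alpha\right)^{1-k_3}=1-\left(1-k_3\right)t\exp{\left(k_1-\frac{k_2}{T}\right)}\), and substituting \(Y=C_0\left(1-\alpha\right)\) yields the non-linear model.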
The 2 kinetic parameters \(k_1\) and \(k_2\) are highly correlated. An alternative parameterisation reduces this correlation by including the mean temperature of the study, \(\bar{T}\), as follows:
\[\begin{equation}Y = C_0 \sqrt[1-k_3]{1-\left(1-k_3\right)t\exp{\left(k_1-\frac{k_2}{T}+\frac{k_2}{\bar{T}}\right)}}\end{equation}\]
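This reparameterisation leaves the fitted curve unchanged, since the exponent is simply the original one with a shifted \(k_1\):
\[\begin{equation}k_1-\frac{k_2}{T}+\frac{k_2}{\bar{T}}=\left(k_1+\frac{k_2}{\bar{T}}\right)-\frac{k_2}{T}=k_1^{*}-\frac{k_2}{T},\end{equation}\]
so only the interpretation (and the correlation) of \(k_1\) changes, not the model predictions.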
In the special case of a zero-order reaction (\(k_3=0\)), the ODE model reduces to a straight line per temperature: \[\begin{equation}Y = C_0 \left(1-t\exp{\left(k_1-\frac{k_2}{T}\right)}\right)\end{equation}\] In this document, normality of the data and homoscedasticity (equal variances across time and temperature) are assumed.
The R package AccelStab can be used to analyse an accelerated stability study and to visualise the statistical intervals. Two real data sets are included: antigenicity and potency.
The antigenicity of a candidate vaccine is investigated in an accelerated stability study where 4 temperatures are tested with time points from 0 to 5 months. The goal is to predict (extrapolate) the antigenicity at the end of the expected shelf life (3 years) at 5C. The data set contains a total of 50 antigenicity measurements and can be loaded as follows:
library(AccelStab)
data(antigenicity)
antigenicity$Validate = as.factor(ifelse(antigenicity$time <= 0.5, 0, 1))
head(antigenicity)
#> time Celsius K conc N.days validA Validate
#> 1 0.00000000 5 278.15 96.94 0 0 0
#> 2 0.00000000 5 278.15 102.40 0 0 0
#> 3 0.00000000 5 278.15 97.35 0 0 0
#> 4 0.00000000 5 278.15 89.65 0 0 0
#> 5 0.03835616 5 278.15 103.32 14 0 0
#> 6 0.03835616 37 310.15 67.98 14 0 0
Four different temperatures are tested at several time points:
table(antigenicity$Celsius)
#>
#> 5 20 32 37
#> 20 12 14 10
unique(antigenicity$time)
#> [1] 0.00000000 0.03835616 0.04109589 0.05753425 0.08493151 0.09589041
#> [7] 0.12328767 0.15890411 0.17808219 0.19178082 0.19452055 0.21643836
#> [13] 0.23013699 0.23287671 0.25479452 0.25753425 0.29315068 0.33972603
#> [19] 0.35068493 0.35342466 0.40273973 0.44383562 0.46575342 0.46849315
#> [25] 1.11232877 1.11506849
Data obtained after 6 months will be used for validation only (they are not used to fit the model). The raw data of the study can be visualised quickly with the function step1_plot_desc as follows:
step1_plot_desc(data = antigenicity, .time = "time", y = "conc", C = "Celsius", validation = "Validate", yname = "Antigenicity")
This plot displays the data points with one colour per temperature and connects the data points over time (through the mean values for replicated data).
The main function step1_down contains several options to fit the model for a decreasing variable.
args(step1_down)
#> function (data, y, .time, K = NULL, C = NULL, validation = NULL,
#> draw = 10000, parms = NULL, temp_pred_C = NULL, max_time_pred = NULL,
#> confidence_interval = 0.95, by = 101, reparameterisation = FALSE,
#> zero_order = FALSE, ...)
#> NULL
The user must supply the data frame, the y variable (column name), the time variable and a Celsius or Kelvin variable. The starting values for the non-linear fitting algorithm can be given with the argument parms. If NULL (the default), the algorithm uses random numbers as starting values (our recommendation unless the user has reasonable values). The model is then fitted by calling step1_down, and the summary of the fit and other statistics are obtained with generic R functions:
res = step1_down(data = antigenicity, y = "conc", .time = "time", C = "Celsius", validation = "Validate")
summary(res$fit)
#>
#> Parameters:
#> Estimate Std. Error t value Pr(>|t|)
#> k1 42.405 4.043 10.488 3.15e-14 ***
#> k2 11966.540 1139.461 10.502 3.02e-14 ***
#> k3 8.472 1.109 7.641 5.99e-10 ***
#> c0 98.058 1.602 61.203 < 2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 3.939 on 50 degrees of freedom
#> Number of iterations to termination: 7
#> Reason for termination: Relative error in the sum of squares is at most `ftol'.
confint(res$fit)
#> 2.5 % 97.5 %
#> k1 34.480490 50.32897
#> k2 9733.236370 14199.84284
#> k3 6.299113 10.64522
#> c0 94.917336 101.19777
The degrees of freedom are equal to 50 and the residual standard deviation is 3.9. All the parameters (the 3 kinetic parameters and the intercept) are highly significant. Note that the intercept is close to 100 (the theoretical antigenicity value at \(t=0\)) and that its 95% confidence interval (CI), [94.9, 101.2], contains 100. The same results would be obtained by fixing the starting values in a list through the argument parms. This is useful if the model does not converge to an optimal solution with random starting values.
res = step1_down(data = antigenicity, y = "conc", .time = "time", C = "Celsius", validation = "Validate", parms = list(k1 = 50, k2 = 10000, k3 = 3, c0 = 100))
summary(res$fit)
#>
#> Parameters:
#> Estimate Std. Error t value Pr(>|t|)
#> k1 42.405 4.043 10.488 3.16e-14 ***
#> k2 11966.548 1139.496 10.502 3.02e-14 ***
#> k3 8.472 1.109 7.641 5.99e-10 ***
#> c0 98.058 1.602 61.204 < 2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 3.939 on 50 degrees of freedom
#> Number of iterations to termination: 15
#> Reason for termination: Relative error in the sum of squares is at most `ftol'.
confint(res$fit)
#> 2.5 % 97.5 %
#> k1 34.480318 50.32922
#> k2 9733.176758 14199.91957
#> k3 6.299131 10.64525
#> c0 94.917431 101.19774
The predictions are calculated, by default, for all temperatures included in the data set and from time zero to the largest time point. The number of time points used for the predictions is controlled by the argument by (set to 101 by default). All the predictions are stored in a data frame called prediction within the fitted object.
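As a sketch of this behaviour (an assumption about how the grid is built, not a call into the package internals), the prediction times form an evenly spaced sequence from time zero to the largest observed time point (about 1.115 years in the antigenicity data):

```r
# Hypothetical reconstruction of the default prediction time grid:
# 101 evenly spaced points between time zero and the largest time point.
grid <- seq(0, 1.11506849, length.out = 101)
head(grid, 3)  # 0.00000000 0.01115068 0.02230137, matching res$prediction
```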
head(res$prediction[,1:5])
#> time K Degradation Response Celsius
#> 1 0.00000000 278.15 0.000000000 98.05758 5
#> 2 0.01115068 278.15 0.005866975 97.48228 5
#> 3 0.02230137 278.15 0.011454537 96.93438 5
#> 4 0.03345205 278.15 0.016786723 96.41152 5
#> 5 0.04460274 278.15 0.021884650 95.91163 5
#> 6 0.05575342 278.15 0.026766966 95.43288 5
The predictions can be visualised with the function step1_plot_pred. The plot is drawn with the R package ggplot2, so one can easily add aesthetics or lines to the ggplot object if needed.
As the goal of an accelerated stability study is to extrapolate the CQA over a longer time period, the predictions can be extrapolated in AccelStab by re-running step1_down with the argument max_time_pred. For example, the following code predicts the antigenicity up to 3 years and adds a lower specification limit (LSL) at 65.
res = step1_down(data = antigenicity, y = "conc", .time = "time", C = "Celsius", validation = "Validate", parms = list(k1 = 50, k2 = 10000, k3 = 3, c0 = 100), max_time_pred = 3)
graph = step1_plot_pred(res, yname = "Antigenicity")
graph = graph + geom_hline(aes(yintercept = 65), linetype = "dotted")
graph
The model can also extrapolate the degradation for any temperature not included in the design. This is illustrated below by extrapolating the degradation at 2C using the argument temp_pred_C of the step1_down function.
res = step1_down(data = antigenicity, y = "conc", .time = "time", C = "Celsius", validation = "Validate", parms = list(k1 = 50, k2 = 10000, k3 = 3, c0 = 100), max_time_pred = 3, temp_pred_C = 2)
step1_plot_pred(res, yname = "Antigenicity")
This option is very useful for predicting long-term storage at 5C when no data are collected at 5C. This is illustrated below by re-running step1_down without the data at 5C (except at time zero), then extrapolating the predictions at 5C.
subdat = antigenicity[!(antigenicity$Celsius == "5" & antigenicity$time != 0),]
res = step1_down(data = subdat, y = "conc", .time = "time", C = "Celsius", max_time_pred = 3, temp_pred_C = 5)
step1_plot_pred(res, yname = "Antigenicity")
Pointwise confidence intervals (CIs) and prediction intervals (PIs) are given in the data frame prediction, with a default 95% confidence level.
res = step1_down(data = antigenicity, y = "conc", .time = "time", C = "Celsius", max_time_pred = 3, validation = "Validate")
head(res$prediction[,-c(3,6:8)])
#> time K Response Celsius CI1 CI2 PI1 PI2
#> 1 0.00 278.15 98.05757 5 94.91144 101.29335 89.67451 106.43820
#> 2 0.03 278.15 96.57083 5 93.91913 98.99138 88.48467 104.76014
#> 3 0.06 278.15 95.25576 5 92.97427 97.23725 87.14420 103.21862
#> 4 0.09 278.15 94.07849 5 92.01689 95.83432 86.00612 101.79644
#> 5 0.12 278.15 93.01412 5 91.06667 94.67567 85.01921 100.80798
#> 6 0.15 278.15 92.04387 5 90.10237 93.69388 84.04460 99.86015
Our recommended way to calculate the statistical intervals is to sample the coefficients from the multivariate t-distribution using the argument draw. This gives, by default, 10^4 sets of coefficients, similar to a posterior distribution in Bayesian statistics (without any prior). The predictions are calculated for every set of coefficients. The quantiles (typically 2.5 and 97.5%) are then calculated for each temperature and each time point in the prediction data frame in order to obtain the confidence or prediction intervals. The drawing technique does not require any additional model fit, which is a main advantage compared to other techniques like bootstrapping. If the draw option is set to NULL, the delta method is used instead (giving symmetric intervals around the predicted curves).
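The idea can be sketched in a few lines of base R. This is a toy illustration with made-up point estimates, covariance matrix and a zero-order model, not the AccelStab implementation:

```r
# Toy sketch: draw coefficient sets from a multivariate t-distribution,
# compute one prediction per set, then take quantiles for the interval.
set.seed(1)
mu    <- c(k1 = 42.4, k2 = 11966.5)              # assumed point estimates
Sigma <- matrix(c(16, 4500, 4500, 1.3e6), 2, 2)  # assumed covariance matrix
df    <- 50                                      # residual degrees of freedom
n     <- 10^4
z     <- matrix(rnorm(2 * n), ncol = 2) %*% chol(Sigma)
w     <- sqrt(df / rchisq(n, df))                # scale normal draws into multi-t
draws <- sweep(z * w, 2, mu, "+")                # n draws of (k1, k2)
# one prediction per coefficient set (zero-order toy model, t = 1, T = 278.15 K)
pred  <- 100 * (1 - exp(draws[, 1] - draws[, 2] / 278.15))
ci    <- quantile(pred, c(0.025, 0.975))         # pointwise 95% interval
```

In AccelStab this sampling and quantile step is repeated for every temperature and time point of the prediction grid.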
Three main functions are available in AccelStab to visualise the statistical intervals: step1_plot_CI displays the predictions with confidence intervals, step1_plot_PI displays the predictions with prediction intervals, and step1_plot_T displays the confidence and prediction intervals for a given temperature. A ribbon can be added easily with the argument ribbon.
res = step1_down(data = antigenicity, y = "conc", .time = "time", C = "Celsius", max_time_pred = 3, validation = "Validate")
#> [1] "The 95% Wald Confidence Interval for k3 includes 0, k3 is estimated as 8056. We suggest considering option zero_order = TRUE"
#> 50.42% of the bootstraps for k3 are below zero, this might have an adverse effect on the confidence interval, particularly if this value exceeds the confidence %.
#> We suggest considering option zero_order = TRUE
step1_plot_CI(res, yname = "Antigenicity")
A temperature excursion is any deviation from the product’s optimal temperature range during transport, handling or storage. The excursion function within AccelStab allows for straightforward estimation of the effect that temperature excursions have on a product, together with the corresponding confidence and prediction intervals. We have modified the Sestak-Berggren model to ‘carry over’ degradation from the previous phase. There are two new variables: \(t_p\), the time since the start of the current phase, and \(\alpha'\), the degradation at the end of the previous phase.
\[\begin{equation}Y = C_0 \sqrt[1-k_3]{(1-\alpha')^{(1-k_3)}-\left(1-k_3\right)t_p\exp{\left(k_1-\frac{k_2}{T}\right)}}\end{equation}\]
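Note that this reduces to the original model when \(\alpha'=0\), and at the start of each phase the response is continuous with the end of the previous one, since
\[\begin{equation}Y\big|_{t_p=0} = C_0 \sqrt[1-k_3]{\left(1-\alpha'\right)^{1-k_3}} = C_0\left(1-\alpha'\right).\end{equation}\]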
This updated equation is incorporated into the excursion function. The following example estimates a temperature excursion on top of the antigenicity data set. Initially the product is stored at 5 degrees for 6 months, then it is subjected to 35 degrees for 1 month, then it is returned to 6 degrees for 17 months. Predictions and intervals are available alongside a plot.
res = step1_down(data = antigenicity, y = "conc", .time = "time", C = "Celsius", max_time_pred = 3, validation = "Validate")
exc <- excursion(res, temp_changes = c(5,35,6), time_changes = c(6/12,7/12,24/12), yname = "Antigenicity")
#> [1] "Sample draw progress: 10%"
#> [1] "Sample draw progress: 20%"
#> [1] "Sample draw progress: 30%"
#> [1] "Sample draw progress: 40%"
#> [1] "Sample draw progress: 50%"
#> [1] "Sample draw progress: 60%"
#> [1] "Sample draw progress: 70%"
#> [1] "Sample draw progress: 80%"
#> [1] "Sample draw progress: 90%"
#> [1] "Sample draw progress: 100%"
tail(exc$predictions[,-c(4)])
#> phase_time temps conc phase total_time CI1b CI2b PI1b
#> 298 1.345833 6 61.79954 3 1.929167 59.46089 63.20626 53.77476
#> 299 1.360000 6 61.78206 3 1.943333 59.43187 63.19301 53.59059
#> 300 1.374167 6 61.76462 3 1.957500 59.39952 63.18019 53.63916
#> 301 1.388333 6 61.74723 3 1.971667 59.36825 63.16569 53.58811
#> 302 1.402500 6 61.72987 3 1.985833 59.33898 63.15384 53.42880
#> 303 1.416667 6 61.71256 3 2.000000 59.30656 63.14229 53.57260
#> PI2b
#> 298 69.44693
#> 299 69.64421
#> 300 69.42691
#> 301 69.56450
#> 302 69.44825
#> 303 69.43375
exc$excursion_plot
It is straightforward to calculate any statistic by sampling from the multi-t distribution. For example, the loss in antigenicity after 1 year at 5C can be calculated, and its 95% CI obtained, from the sets of coefficients drawn from the multi-t. The function step1_sample_mvt samples coefficients from the multi-t distribution.
draws = step1_sample_mvt(data = antigenicity, y = "conc", .time = "time", C = "Celsius", validation = "Validate", draw = 10^4)
draws = as.data.frame(draws)
head(draws)
#> k1 k2 k3 c0
#> 1 51.28547 14485.00 10.450851 97.96261
#> 2 44.71924 12728.63 8.691783 95.68006
#> 3 40.11267 11259.98 8.341292 99.31406
#> 4 41.07097 11750.76 7.006221 95.87220
#> 5 57.10405 16127.53 11.668159 97.15919
#> 6 45.90543 12910.85 9.825832 98.57293
The predictions at 1 year can be calculated “by hand” as follows and compared to the value at time zero:
p1 = draws$c0 * (1 - (1 - draws$k3) * 1 * exp(draws$k1 - draws$k2 / (5 + 273.15)))^(1 / (1 - draws$k3))
loss_1y = 1 - p1/draws$c0
mean(loss_1y)
#> [1] 0.1950138
quantile(loss_1y, c(0.025, 0.975))
#> 2.5% 97.5%
#> 0.1482715 0.2462301
hist(loss_1y, main = "Loss (%) in 1 year at 5C")
abline(v = mean(loss_1y), lwd = 2, col = 3)
abline(v = quantile(loss_1y, c(0.025, 0.975)), lwd = 2, col = 3, lty = 2)
One can see that the expected loss is 19.5% with a 95% CI given by [14.8, 24.6]%.
In this data set, the potency of a vaccine is measured at 3 temperatures with time points from 0 to 6 months (0, 3 and 6 months at 5C; 1.5, 3 and 6 months at 25C; and 1, 2 and 4 weeks at 37C). The goal is to predict the vaccine drug product potency at 5C after 3 years. The data set contains a total of 78 potency measurements, including 55 measurements up to 6 months and 23 measurements obtained a posteriori at 5C (from 6 months to 3 years) used to validate the model.
data(potency)
potency$Validate = as.factor(ifelse(potency$Time < 8, 0, 1))
head(potency)
#> Time Potency Celsius Validate
#> 1 0 9.5 5 0
#> 2 3 9.5 5 0
#> 3 6 9.4 5 0
#> 4 9 9.3 5 1
#> 5 12 9.5 5 1
#> 6 15 9.4 5 1
step1_plot_desc(data = potency, .time = "Time", y = "Potency", C = "Celsius", validation = "Validate", yname = "Potency")
The model is fitted by calling step1_down:
res = step1_down(data = potency, y = "Potency", .time = "Time", C = "Celsius", validation = "Validate")
#> k3 is fitted to be exactly 0, we strongly suggest using option zero_order = TRUE
#> The model will continue with k3 = 0, so degradation is linear over time
#>
#> 50.35% of the bootstraps for k3 are below zero, this might have an adverse effect on the confidence interval, particularly if this value exceeds the confidence %.
#> We suggest considering option zero_order = TRUE
summary(res$fit)
#>
#> Parameters:
#> Estimate Std. Error t value Pr(>|t|)
#> k1 3.794e+01 1.779e+00 21.33 <2e-16 ***
#> k2 1.250e+04 5.544e+02 22.55 <2e-16 ***
#> k3 0.000e+00 2.679e+00 0.00 1
#> c0 9.524e+00 2.455e-02 387.88 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.1177 on 51 degrees of freedom
#> Number of iterations to termination: 37
#> Reason for termination: Relative error between `par' and the solution is at most `ptol'.
confint(res$fit)
#> 2.5 % 97.5 %
#> k1 34.456714 41.430699
#> k2 11412.774783 13586.003806
#> k3 -5.250377 5.250377
#> c0 9.476266 9.572519
One can see that the kinetic order is estimated as exactly zero by the model. A warning is printed suggesting that the user rerun the code with the option zero_order = TRUE.
res = step1_down(data = potency, y = "Potency", .time = "Time", C = "Celsius", validation = "Validate", zero_order = TRUE)
summary(res$fit)
#>
#> Parameters:
#> Estimate Std. Error t value Pr(>|t|)
#> k1 3.946e+01 1.631e+00 24.19 <2e-16 ***
#> k2 1.296e+04 4.939e+02 26.23 <2e-16 ***
#> c0 9.525e+00 2.059e-02 462.71 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.1157 on 52 degrees of freedom
#> Number of iterations to termination: 22
#> Reason for termination: Relative error in the sum of squares is at most `ftol'.
confint(res$fit)
#> 2.5 % 97.5 %
#> k1 36.261657 42.655302
#> k2 11989.456755 13925.699195
#> c0 9.484622 9.565314
Built-in visualisations of residuals are produced as a list of five plots, the first of which is a histogram.
The root mean square error (RMSE) can be calculated with the function step1_down_rmse for any set of starting values. This can be useful for choosing the starting values before fitting the model with the step1_down function. The following heat map illustrates the RMSE for multiple sets of k1 and k2 values. A log scale is used for better visualisation.
RMSE = step1_down_rmse(data = potency, y = "Potency", .time = "Time",
C = "Celsius", parms = list(c0 = 9.5, k1 = seq(38, 42, 0.02),
k2 = seq(12000, 14000, 5), k3 = 0))
RMSE$logrmse = log(RMSE$rmse)
plot = ggplot() + geom_point(data=RMSE, mapping=aes(x= k1, y = k2, colour = logrmse), size = 1.5, stroke = 0)
plot = plot + labs( x = "k1", y = "k2")
plot = plot + scale_color_gradient2(midpoint = mean(RMSE$logrmse), low = "blue", mid = "yellow", high = "green")
plot
Note that the two kinetic parameters k1 and k2 are highly correlated. One might use the option reparameterisation = TRUE if needed.
The predictions and statistical intervals can be obtained as previously explained.
head(res$prediction[,-(6:8)])
#> time K Degradation Response Celsius CI1 CI2 PI1 PI2
#> 1 0.00 278.15 0.0000000000 9.524968 5 9.483672 9.565645 9.291800 9.758105
#> 2 0.36 278.15 0.0002893051 9.522212 5 9.480942 9.562735 9.298402 9.752758
#> 3 0.72 278.15 0.0005786103 9.519456 5 9.478173 9.559793 9.284793 9.749726
#> 4 1.08 278.15 0.0008679154 9.516701 5 9.475456 9.556934 9.287075 9.746416
#> 5 1.44 278.15 0.0011572206 9.513945 5 9.472560 9.553951 9.283673 9.742523
#> 6 1.80 278.15 0.0014465257 9.511189 5 9.469803 9.551094 9.280809 9.741191
plot = step1_plot_pred(res, yname = "Potency")
plot = plot + scale_y_continuous(limits = c(5,10))
plot
plot = step1_plot_T(res, focus_T = 5, yname = "Potency", ribbon = TRUE)
plot = plot + scale_y_continuous(limits = c(5,10))
plot
To illustrate sampling from the multi-t distribution, consider a lower specification limit equal to 9. It is then straightforward to calculate the time needed to reach this limit at 5C: the coefficients of the fit are sampled from the multi-t distribution and the inverse prediction is calculated for each set of coefficients.
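Under the zero-order model used here, the inverse prediction has a closed form: setting \(Y\) to the lower specification limit \(L\) in \(Y = C_0 \left(1-t\exp{\left(k_1-\frac{k_2}{T}\right)}\right)\) and solving for \(t\) gives
\[\begin{equation}t_L=\frac{1-L/C_0}{\exp{\left(k_1-\frac{k_2}{T}\right)}},\end{equation}\]
which is evaluated below for each drawn coefficient set with \(L=9\) and \(T=278.15\) K.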
draws = step1_sample_mvt(data = potency, y = "Potency", .time = "Time", C = "Celsius", validation = "Validate", draw = 10^4, zero_order = TRUE)
draws = as.data.frame(draws)
head(draws)
#> k1 k2 c0
#> 1 39.45317 12938.85 9.564305
#> 2 37.62054 12399.34 9.563955
#> 3 40.66143 13324.32 9.543633
#> 4 38.30754 12604.80 9.548509
#> 5 40.46231 13281.99 9.510039
#> 6 40.84026 13353.62 9.564208
T9 = (1 - 9/draws$c0) / exp(draws$k1 - draws$k2 / (5+273.15))
The vector T9 is the distribution of the time needed to reach the lower specification limit (9). This distribution can be visualised with a histogram, where the estimated mean and its 95% CI are added.
mean(T9)
#> [1] 69.35768
quantile(T9, c(0.025, 0.975))
#> 2.5% 97.5%
#> 50.96537 92.76726
hist(T9, main = "Time to reach the lower specification at 5C")
abline(v = mean(T9), lwd = 2, col = 3)
abline(v = quantile(T9, c(0.025, 0.975)), lwd = 2, col = 3, lty = 2)