One person has become infected with COVID-19 on 28.12.2021. When will the contact person become ill and show the first symptoms?
get_serial_interval_density
The function get_serial_interval_density()
creates a
dataframe containing the probability that a contact will start showing
symptoms (serial interval) at a particular date/time. Therefore, only
the symptom begin date of the infected person is needed. Furthermore,
the probability when the infected contacts of infected contacts show
symptoms can be calculated, known as the second and also third
generation of contacts.
Multiple arguments are needed for the function
get_serial_interval_density()
:
First of all, the symptom_begin_date
has to be
specified, which is defined as the date when the person started to show
symptoms.
Furthermore, the max_serial_interval_days
is needed,
which defines the interval length of the distribution output.
The remaining two inputs shape_serial
and
rate_serial
are the parameters of the log-normal
distribution, which models the serial interval.
<- as.Date("2021-12-28")
symptom_begin_date <- 20
max_serial_interval_days <- 2.154631545
shape_serial <- 0.377343528
rate_serial
<- get_serial_interval_density(symptom_begin_date,
serial_in_df_v1
max_serial_interval_days,
shape_serial, rate_serial)
The default values of the parameters for the distribution, when the infected contacts will show symptoms, are from the paper Najafi et al. [1], in which a gamma distribution for the serial interval between symptom onsets of infected persons and the symptom onset of infected contacts was estimated.
For the second generation of contacts the probability equals the
summation of random variables because we assume that the infection from
infected person to the first contact generation is independent from the
first to the second generation. Thus, a convolution of two identical
gamma distributions has to be conducted. For gamma distributions the
convolutions of gamma distributions equals the summation of their first
parameters [2]. Thus, the density for the second generation of infected
persons showing symptoms is given by a gamma distribution with \(2 \cdot\) shape_serial
and the
same rate_serial
. In general, the serial interval for the
\(i\)-th generation of contacts can be
calculated with a gamma distribution with parameters \(i \cdot\) shape_serial
and
rate_serial
. This holds in an analogous way for the third
generation.
The function call returns the following data set:
dates | distribution | |
---|---|---|
100 | 2022-01-01 04:00:00 | 0.1233038 |
101 | 2022-01-01 05:00:00 | 0.1227971 |
102 | 2022-01-01 06:00:00 | 0.1222783 |
103 | 2022-01-01 07:00:00 | 0.1217479 |
104 | 2022-01-01 08:00:00 | 0.1212064 |
105 | 2022-01-01 09:00:00 | 0.1206542 |
106 | 2022-01-01 10:00:00 | 0.1200916 |
107 | 2022-01-01 11:00:00 | 0.1195191 |
108 | 2022-01-01 12:00:00 | 0.1189372 |
109 | 2022-01-01 13:00:00 | 0.1183462 |
The data frame shows for each hour beginning at
symptom_begin_date
until
max_serial_interval_days
the resulting density of the gamma
distribution. This density can be used for calculating the most probable
period for a contact person start showing symptoms. The same data frame
is obtained for the second generation. Here, the time interval of the
distribution is larger.
<- as.Date("2021-12-28")
symptom_begin_date <- 20
max_serial_interval_days <- 2 * 2.154631545
shape_serial <- 0.377343528
rate_serial
<- get_serial_interval_density(symptom_begin_date,
serial_in_df_v2
max_serial_interval_days,
shape_serial, rate_serial)
dates | distribution | |
---|---|---|
100 | 2022-01-01 04:00:00 | 0.0383758 |
101 | 2022-01-01 05:00:00 | 0.0390547 |
102 | 2022-01-01 06:00:00 | 0.0397325 |
103 | 2022-01-01 07:00:00 | 0.0404089 |
104 | 2022-01-01 08:00:00 | 0.0410838 |
105 | 2022-01-01 09:00:00 | 0.0417569 |
106 | 2022-01-01 10:00:00 | 0.0424281 |
107 | 2022-01-01 11:00:00 | 0.0430971 |
108 | 2022-01-01 12:00:00 | 0.0437638 |
109 | 2022-01-01 13:00:00 | 0.0444279 |
get_serial_interval_density
The following code generates a plot with the gamma distribution of the illness probability of the first contact generation and the 80% and 95% high density intervals. In addition, the illness probability for the second generation is plotted in violet.
<- function(probability, df) {
.calculate_qstart_qend <- hdr(den = data.frame(x = 1:length(df$distribution), y = df$distribution),
hdr_df p = probability * 100)$hdr
<- (hdr_df[1, 1] - 1) / 24
qstart <- (hdr_df[1, 2] - 1) / 24
qend return(list("qstart" = qstart, "qend" = qend))
}
<- function(df, qstart, qend, fill = "red", alpha = 0.4) {
.shade_curve <- df[floor(qstart * 24):ceiling(qend * 24), ]
subset_df geom_area(data = subset_df,
aes(x = x, y = y),
fill = fill,
color = NA,
alpha = alpha)
}
<- as.Date("2021-12-28")
symptom_begin_date
<- get_serial_interval_density(symptom_begin_date,
df max_serial_interval_days = 20,
shape_serial = 2.154631545,
rate_serial = 0.377343528)
<- .calculate_qstart_qend(0.8, df)
period_80 <- .calculate_qstart_qend(0.95, df)
period_95
<- get_serial_interval_density(symptom_begin_date,
df_2 max_serial_interval_days = 20,
shape_serial = 2 * 2.154631545,
rate_serial = 0.377343528)
<- as.POSIXct(format(as.POSIXct(symptom_begin_date, tz = "CET"), "%Y-%m-%d"))
symp_date_posixct_start <- as.POSIXct(format(as.POSIXct(symptom_begin_date + 1, tz = "CET"), "%Y-%m-%d"))
symp_date_posixct_end <- symp_date_posixct_start - as.numeric(difftime(symp_date_posixct_start,
symp_date_posixct_mid units = "hours")) / 2 * 3600
symp_date_posixct_end,
<- ggplot() +
g
scale_x_datetime(breaks = scales::date_breaks("1 days"), labels = scales::date_format("%d %b")) +
theme(axis.text.x = element_text(angle = 90)) +
# scale_x_continuous(breaks = x_tick,
# labels = x_label) +
# theme(axis.ticks.x = element_line(color = c(rbind(rep("black", length(x_label) / 2), rep(NA, length(x_label) / 2))), linetype = 2, size = 1)) +
geom_path(aes(x = df$dates, y = df$distribution), color = "red", size = 1) +
geom_path(aes(x = df_2$dates, y = df_2$distribution), color = "purple", size = 1) +
.shade_curve(df = data.frame(x = df$dates, y = df$distribution),
$qstart,
period_80$qend) +
period_80.shade_curve(df = data.frame(x = df$dates, y = df$distribution),
$qstart,
period_95$qend,
period_95alpha = 0.2) +
geom_rect(data = data.frame(xmin = symp_date_posixct_start,
xmax = symp_date_posixct_end,
ymin = -Inf,
ymax = Inf),
aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax),
fill = "brown", alpha = 0.3) +
geom_label(aes(x = symp_date_posixct_mid, y = 0.9*max(df$distribution), label = "symptom\nonset"),
colour = "brown", fill = "white", size = 5, label.size = NA) +
ylab("probability") +
xlab("timeline") +
labs(color = 'Verteilung') +
# ggtitle("Visualization of get_infection_date_density ") +
theme(legend.position = "none", text = element_text(size = 16*5/5)) +
theme(axis.text.x = element_text(colour = "black", face = "bold", angle = 30, hjust = 1)) +
theme(axis.title.x = element_text(colour = "black", face = "bold")) +
theme(axis.text.y = element_text(colour = "gray50")) +
theme(axis.title.y = element_text(colour = "gray50"))
g
[1] Najafi F, Izadi N, Hashemi-Nazari S-S, Khosravi-Shadmani F, Nikbakht R, Shakiba E. Serial interval and time-varying reproduction number estimation for COVID-19 in western Iran. New Microbes and New Infections. 2020; 36: 100715.