This R Markdown document illustrates the sample size calculation for a fixed follow-up design, in which the treatment allocation is 3:1 and the hazard ratio is 0.3. This is a case for which neither the Schoenfeld method nor the Lakatos method provides an accurate sample size estimate, and simulation tools are needed to obtain a more accurate result.
Consider a fixed design in which the control group has a hazard rate of 0.95 per year, the hazard ratio of the experimental group to the control group is 0.3, the randomization ratio is 3:1, the enrollment rate is 5 patients per month, the 2-year drop-out rate is 10%, and each patient has a planned fixed follow-up of 26 weeks (entered as 26/4 = 6.5 months in the calls below, with all rates converted to a monthly scale). The target power is 90%, and we want to find the number of patients to enroll to achieve it.
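The unit conversions implied by this setup can be sketched in base R. This is only an illustration: the variable names (`hr`, `lambda1`, `lambda2`, `gamma`, `followup`) are chosen here for readability and simply mirror the arguments passed to the function calls in this document, with months as the time unit and an exponential model assumed for both events and dropouts.

```r
# Design inputs from the text, converted to a monthly time scale.
hr       <- 0.3                  # hazard ratio, experimental vs. control
lambda2  <- 0.95 / 12            # control hazard per month (0.95 per year)
lambda1  <- hr * lambda2         # experimental hazard per month
gamma    <- -log(1 - 0.1) / 24   # dropout hazard giving 10% dropout by 24 months
followup <- 26 / 4               # 26 weeks treated as 6.5 months (4 weeks per month)

round(c(lambda1 = lambda1, lambda2 = lambda2, gamma = gamma, followup = followup), 5)

# Sanity check: the dropout hazard reproduces the stated 2-year dropout rate.
dropout_2yr <- 1 - exp(-gamma * 24)
dropout_2yr
```

The last line should recover the 10% two-year dropout rate, which confirms that the `-log(1-0.1)/24` expression is the monthly exponential hazard matching that rate.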
Using the Schoenfeld formula, the required number of events is 39. This requires 191 patients enrolled over 38.2 months. Denote this design as design 1.
library(lrstat)  # provides lrsamplesize() and lrsim()
lrsamplesize(beta = 0.1, kMax = 1, criticalValues = 1.96,
             allocationRatioPlanned = 3, accrualIntensity = 5,
             lambda2 = 0.95/12, lambda1 = 0.3*0.95/12,
             gamma1 = -log(1-0.1)/24, gamma2 = -log(1-0.1)/24,
             accrualDuration = NA, followupTime = 26/4,
             fixedFollowup = TRUE,
             typeOfComputation = "schoenfeld")
#> $resultsUnderH1
#>
#> Fixed design for log-rank test
#> Overall power: 0.902, overall significance level (1-sided): 0.025
#> Number of events: 39
#> Number of dropouts: 4.8
#> Number of subjects: 191
#> Information: 7.31
#> Study duration: 43.1
#> Accrual duration: 38.2, follow-up duration: 6.5, fixed follow-up: TRUE
#> Allocation ratio: 3
#>
#>
#> Efficacy boundary (Z) 1.960
#> Efficacy boundary (HR) 0.484
#> Efficacy boundary (p) 0.0250
#> HR 0.300
#>
#> $resultsUnderH0
#>
#> Fixed design for log-rank test
#> Overall power: 0.025, overall significance level (1-sided): 0.025
#> Number of events: 39
#> Number of dropouts: 2.2
#> Number of subjects: 113
#> Information: 7.31
#> Study duration: 22.6
#> Accrual duration: 22.6, follow-up duration: 6.5, fixed follow-up: TRUE
#> Allocation ratio: 3
#>
#>
#> Efficacy boundary (Z) 1.960
#> Efficacy boundary (HR) 0.484
#> Efficacy boundary (p) 0.0250
#> HR 1.000
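As a rough cross-check of the output above, the Schoenfeld approximation can be evaluated by hand in base R. The sketch below is not how the package computes its result internally; it assumes the textbook formula for the number of events, the usual approximation of the information as events times p(1 - p), and the corresponding hazard ratio boundary. The variable names are illustrative.

```r
alpha <- 0.025   # one-sided significance level
beta  <- 0.1     # type II error (target power 90%)
hr    <- 0.3     # hazard ratio under the alternative
r     <- 3       # allocation ratio, experimental : control
p     <- r / (1 + r)   # proportion randomized to the experimental arm

# Schoenfeld formula for the required number of events, rounded up.
d_raw <- (qnorm(1 - alpha) + qnorm(1 - beta))^2 / (p * (1 - p) * log(hr)^2)
d     <- ceiling(d_raw)

# Approximate information for the log hazard ratio, and the hazard
# ratio at the efficacy boundary implied by that information.
info     <- d * p * (1 - p)
hr_bound <- exp(-qnorm(1 - alpha) / sqrt(info))

c(events = d, information = info, boundary_hr = hr_bound)
```

Under these assumptions the values agree with the 39 events, information of 7.31, and boundary hazard ratio of 0.484 reported for design 1.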
On the other hand, the output from the default lrsamplesize computation (typeOfComputation = "direct") implies that we need only 26 events with 127 subjects enrolled over 25.4 months, a dramatic difference from the Schoenfeld formula. Denote this design as design 2.
lrsamplesize(beta = 0.1, kMax = 1, criticalValues = 1.96,
             allocationRatioPlanned = 3, accrualIntensity = 5,
             lambda2 = 0.95/12, lambda1 = 0.3*0.95/12,
             gamma1 = -log(1-0.1)/24, gamma2 = -log(1-0.1)/24,
             accrualDuration = NA, followupTime = 26/4,
             fixedFollowup = TRUE,
             typeOfComputation = "direct")
#> $resultsUnderH1
#>
#> Fixed design for log-rank test
#> Overall power: 0.902, overall significance level (1-sided): 0.025
#> Number of events: 26
#> Number of dropouts: 3.2
#> Number of subjects: 127
#> Information: 4.46
#> Study duration: 31.1
#> Accrual duration: 25.4, follow-up duration: 6.5, fixed follow-up: TRUE
#> Allocation ratio: 3
#>
#>
#> Efficacy boundary (Z) 1.960
#> Efficacy boundary (HR) 0.463
#> Efficacy boundary (p) 0.0250
#> HR 0.300
#>
#> $resultsUnderH0
#>
#> Fixed design for log-rank test
#> Overall power: 0.025, overall significance level (1-sided): 0.025
#> Number of events: 26
#> Number of dropouts: 1.4
#> Number of subjects: 80.3
#> Information: 4.88
#> Study duration: 16.1
#> Accrual duration: 16.1, follow-up duration: 6.5, fixed follow-up: TRUE
#> Allocation ratio: 3
#>
#>
#> Efficacy boundary (Z) 1.960
#> Efficacy boundary (HR) 0.412
#> Efficacy boundary (p) 0.0250
#> HR 1.000
To check the accuracy of either solution, we run simulations using the lrsim function.
lrsim(kMax = 1, criticalValues = 1.96,
      allocation1 = 3, allocation2 = 1,
      accrualIntensity = 5,
      lambda2 = 0.95/12, lambda1 = 0.3*0.95/12,
      gamma1 = -log(1-0.1)/24, gamma2 = -log(1-0.1)/24,
      accrualDuration = 38.2, followupTime = 6.5,
      fixedFollowup = TRUE,
      plannedEvents = 39,
      maxNumberOfIterations = 10000, seed = 12345)
#>
#> Fixed design for log-rank test
#> Overall power: 0.949
#> Expected # events: 37
#> Expected # dropouts: 4.4
#> Expected # subjects: 186.3
#> Expected study duration: 39.3
#> Accrual duration: 38.2, fixed follow-up: TRUE
#>
lrsim(kMax = 1, criticalValues = 1.96,
      allocation1 = 3, allocation2 = 1,
      accrualIntensity = 5,
      lambda2 = 0.95/12, lambda1 = 0.3*0.95/12,
      gamma1 = -log(1-0.1)/24, gamma2 = -log(1-0.1)/24,
      accrualDuration = 25.4, followupTime = 6.5,
      fixedFollowup = TRUE,
      plannedEvents = 26,
      maxNumberOfIterations = 10000, seed = 12345)
#>
#> Fixed design for log-rank test
#> Overall power: 0.833
#> Expected # events: 24.3
#> Expected # dropouts: 2.9
#> Expected # subjects: 124.1
#> Expected study duration: 27
#> Accrual duration: 25.4, fixed follow-up: TRUE
#>
The simulated power is about 95% for design 1, and 83% for design 2. Neither is close to the target 90% power.
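To judge whether these gaps could be simulation noise, one can attach a Monte Carlo standard error to each simulated power. The following is a minimal sketch that treats each of the 10,000 replications as an independent Bernoulli trial and uses a normal-approximation interval; the power values are copied from the simulation output above, and the object names are illustrative.

```r
n_sim     <- 10000
power_hat <- c(design1 = 0.949, design2 = 0.833)

# Binomial standard error of a simulated proportion.
se <- sqrt(power_hat * (1 - power_hat) / n_sim)

# Approximate 95% Monte Carlo intervals for the true power.
ci <- cbind(lower = power_hat - qnorm(0.975) * se,
            upper = power_hat + qnorm(0.975) * se)

round(cbind(se = se, ci), 4)
```

Both standard errors are well below half a percentage point, and neither interval covers 0.90, so the departures from the target power are not explained by simulation error.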
We use the following formula to adjust the number of events, and hence the sample size, to attain the target power:
\[
D = D_0 \left( \frac{\Phi^{-1}(1-\alpha) + \Phi^{-1}(1-\beta)}
{\Phi^{-1}(1-\alpha) + \Phi^{-1}(1-\beta_0)} \right)^2
\]
where \(D_0\) and \(\beta_0\) are the initial number of events and the corresponding type II error, and \(D\) and \(\beta\) are the required number of events and the target type II error, respectively. For \(\alpha=0.025\) and \(\beta=0.1\), plugging in \((D_0=39, \beta_0=0.05)\) and \((D_0=26, \beta_0=0.17)\) yields \(D \approx 32\) in both cases after rounding. For \(D=32\), we need about 156 patients, enrolled over 31.2 months at 5 patients per month, based on
\[
N = \frac{D}{ \frac{r}{1+r}\frac{\lambda_1}{\lambda_1+\gamma_1} (1 -
\exp(-(\lambda_1+\gamma_1)T_f)) +
\frac{1}{1+r}\frac{\lambda_2}{\lambda_2+\gamma_2} (1 -
\exp(-(\lambda_2+\gamma_2)T_f)) }
\]
where \(r\) is the allocation ratio, \(\lambda_1\) and \(\lambda_2\) are the hazard rates of the experimental and control groups, \(\gamma_1\) and \(\gamma_2\) are the corresponding drop-out hazard rates, and \(T_f\) is the fixed follow-up time. The denominator is the probability that a randomized patient has an observed event within the fixed follow-up period. Simulation results confirm the accuracy of this sample size estimate.
lrsim(kMax = 1, criticalValues = 1.96,
      allocation1 = 3, allocation2 = 1,
      accrualIntensity = 5,
      lambda2 = 0.95/12, lambda1 = 0.3*0.95/12,
      gamma1 = -log(1-0.1)/24, gamma2 = -log(1-0.1)/24,
      accrualDuration = 31.2, followupTime = 6.5,
      fixedFollowup = TRUE,
      plannedEvents = 32,
      maxNumberOfIterations = 10000, seed = 12345)
#>
#> Fixed design for log-rank test
#> Overall power: 0.905
#> Expected # events: 30.1
#> Expected # dropouts: 3.6
#> Expected # subjects: 152.4
#> Expected study duration: 32.6
#> Accrual duration: 31.2, fixed follow-up: TRUE
#>
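The event adjustment and the conversion from events to subjects described above can be reproduced in base R. This is a sketch under the same exponential event and dropout assumptions used throughout; the helper names `adjust_events` and `p_event` are hypothetical and not part of lrstat.

```r
alpha <- 0.025
beta  <- 0.1

# Rescale an initial event count d0, whose simulated type II error is beta0,
# so that the design attains the target type II error beta.
adjust_events <- function(d0, beta0) {
  d0 * ((qnorm(1 - alpha) + qnorm(1 - beta)) /
        (qnorm(1 - alpha) + qnorm(1 - beta0)))^2
}

d_adj <- c(design1 = adjust_events(39, 0.05),
           design2 = adjust_events(26, 0.17))
round(d_adj, 1)   # both are approximately 32

# Probability that a patient has an observed event within the fixed
# follow-up tf, with exponential event (lambda) and dropout (gamma) hazards.
p_event <- function(lambda, gamma, tf) {
  lambda / (lambda + gamma) * (1 - exp(-(lambda + gamma) * tf))
}

r       <- 3
lambda2 <- 0.95 / 12
lambda1 <- 0.3 * lambda2
gamma   <- -log(1 - 0.1) / 24
tf      <- 6.5

# Average event probability across arms, weighted by the allocation ratio.
p_bar <- r / (1 + r) * p_event(lambda1, gamma, tf) +
         1 / (1 + r) * p_event(lambda2, gamma, tf)

n       <- 32 / p_bar   # required number of subjects
accrual <- n / 5        # months of enrollment at 5 patients per month
c(subjects = n, accrual_months = accrual)
```

Starting from either initial design gives roughly 32 events, which translates to about 156 subjects and an accrual period of about 31.2 months, the inputs used in the final simulation.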