In this vignette we explore the functionality and arguments of the
summariseSequenceRatios() function, which computes the sequence ratios
of the sequence symmetry analysis (SSA). Because this function takes as
input the output of the generateSequenceCohortSet() function (explained
in detail in the vignette Step 1. Generate a sequence cohort), we pick
up where the previous vignette left off.
Recall that in the previous vignette (Step 1. Generate a sequence
cohort) we generated cdm$aspirin and cdm$acetaminophen, and then used
them to generate cdm$intersect with generateSequenceCohortSet().
One can obtain the crude and adjusted sequence ratios (with their
corresponding confidence intervals) using the summariseSequenceRatios()
function:
summariseSequenceRatios(
cohort = cdm$intersect
) |>
dplyr::glimpse()
#> Rows: 10
#> Columns: 13
#> $ result_id <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1
#> $ cdm_name <chr> "Synthea synthetic health database", "Synthea synthet…
#> $ group_name <chr> "index_cohort_name &&& marker_cohort_name", "index_co…
#> $ group_level <chr> "1191_aspirin &&& 161_acetaminophen", "1191_aspirin &…
#> $ strata_name <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ strata_level <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ variable_name <chr> "crude", "adjusted", "crude", "crude", "adjusted", "a…
#> $ variable_level <chr> "sequence_ratio", "sequence_ratio", "sequence_ratio",…
#> $ estimate_name <chr> "point_estimate", "point_estimate", "lower_CI", "uppe…
#> $ estimate_type <chr> "numeric", "numeric", "numeric", "numeric", "numeric"…
#> $ estimate_value <chr> "1.8108504398827", "1.78715329996299", "1.64970963817…
#> $ additional_name <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ additional_level <chr> "overall", "overall", "overall", "overall", "overall"…
The output is in the summarised result format. In a later vignette (Step 3. Visualise results) we explore how to visualise the results more intuitively.
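To make the two point estimates more concrete, here is a minimal base R sketch, assuming the usual SSA definitions: the crude sequence ratio is the number of individuals who initiated the index drug first divided by the number who initiated the marker drug first, and the adjusted sequence ratio is the crude ratio divided by the null sequence ratio. The helper functions crudeSequenceRatio() and adjustedSequenceRatio() and the toy counts are hypothetical and are not part of the package.

```r
# Hypothetical helpers, not part of CohortSymmetry.
# nIndexFirst:  individuals who initiated the index drug before the marker drug
# nMarkerFirst: individuals who initiated the marker drug before the index drug
crudeSequenceRatio <- function(nIndexFirst, nMarkerFirst) {
  stopifnot(nMarkerFirst > 0)
  nIndexFirst / nMarkerFirst
}

# The adjusted ratio divides the crude ratio by the null sequence ratio,
# i.e. the ratio expected from prescribing trends alone (assumed given here).
adjustedSequenceRatio <- function(nIndexFirst, nMarkerFirst, nullSR) {
  crudeSequenceRatio(nIndexFirst, nMarkerFirst) / nullSR
}

# Toy counts, for illustration only
crude <- crudeSequenceRatio(120, 80)
adjusted <- adjustedSequenceRatio(120, 80, nullSR = 1.2)
print(c(crude = crude, adjusted = adjusted))
```

A null sequence ratio above 1 (here the assumed 1.2) pulls the adjusted estimate below the crude one, which is the direction seen in the output above.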
cohortId
This parameter is used to subset the cohort table passed to
summariseSequenceRatios(). If the user only wants to include entries
with cohort_definition_id \(= 1\) from cdm$intersect, one could do the
following:
summariseSequenceRatios(cohort = cdm$intersect,
cohortId = 1) |>
dplyr::glimpse()
#> Rows: 10
#> Columns: 13
#> $ result_id <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1
#> $ cdm_name <chr> "Synthea synthetic health database", "Synthea synthet…
#> $ group_name <chr> "index_cohort_name &&& marker_cohort_name", "index_co…
#> $ group_level <chr> "1191_aspirin &&& 161_acetaminophen", "1191_aspirin &…
#> $ strata_name <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ strata_level <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ variable_name <chr> "crude", "adjusted", "crude", "crude", "adjusted", "a…
#> $ variable_level <chr> "sequence_ratio", "sequence_ratio", "sequence_ratio",…
#> $ estimate_name <chr> "point_estimate", "point_estimate", "lower_CI", "uppe…
#> $ estimate_type <chr> "numeric", "numeric", "numeric", "numeric", "numeric"…
#> $ estimate_value <chr> "1.8108504398827", "1.78715329996299", "1.64970963817…
#> $ additional_name <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ additional_level <chr> "overall", "overall", "overall", "overall", "overall"…
In this case the subsetting has no effect, because every entry in
cdm$intersect has cohort_definition_id \(= 1\).
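Conceptually, cohortId restricts the rows of the cohort table by cohort_definition_id before any ratios are computed. The base R sketch below illustrates that idea on a hypothetical toy table with the standard OMOP cohort columns. The helper subsetCohort() is hypothetical, and treating a NULL cohortId as "keep every cohort" is an assumption made for illustration.

```r
# Hypothetical toy cohort table with the standard OMOP cohort columns
toyCohort <- data.frame(
  cohort_definition_id = c(1L, 1L, 2L),
  subject_id = c(101L, 102L, 103L),
  cohort_start_date = as.Date(c("2020-01-01", "2020-02-01", "2020-03-01")),
  cohort_end_date = as.Date(c("2020-01-31", "2020-02-28", "2020-03-31"))
)

# Hypothetical helper: keep only the requested cohort definition ids.
# Assumption: a NULL cohortId keeps every cohort.
subsetCohort <- function(cohort, cohortId = NULL) {
  if (is.null(cohortId)) {
    return(cohort)
  }
  cohort[cohort$cohort_definition_id %in% cohortId, , drop = FALSE]
}

print(subsetCohort(toyCohort, cohortId = 1))
```

If every row already has cohort_definition_id 1, as in cdm$intersect, the subset equals the full table, which is why the two outputs above are identical.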
confidenceInterval
By default, the summariseSequenceRatios() function uses a 95%
(two-sided) confidence interval. If another level is desired, for
example a 99% confidence interval, one can use the confidenceInterval
argument:
summariseSequenceRatios(
cohort = cdm$intersect,
confidenceInterval = 99) |>
dplyr::glimpse()
#> Rows: 10
#> Columns: 13
#> $ result_id <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1
#> $ cdm_name <chr> "Synthea synthetic health database", "Synthea synthet…
#> $ group_name <chr> "index_cohort_name &&& marker_cohort_name", "index_co…
#> $ group_level <chr> "1191_aspirin &&& 161_acetaminophen", "1191_aspirin &…
#> $ strata_name <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ strata_level <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ variable_name <chr> "crude", "adjusted", "crude", "crude", "adjusted", "a…
#> $ variable_level <chr> "sequence_ratio", "sequence_ratio", "sequence_ratio",…
#> $ estimate_name <chr> "point_estimate", "point_estimate", "lower_CI", "uppe…
#> $ estimate_type <chr> "numeric", "numeric", "numeric", "numeric", "numeric"…
#> $ estimate_value <chr> "1.8108504398827", "1.78715329996299", "1.60240541369…
#> $ additional_name <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ additional_level <chr> "overall", "overall", "overall", "overall", "overall"…
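A higher confidence level yields a wider interval. In the output above, the first lower bound drops from 1.6497… at 95% to 1.6024… at 99%. The sketch below shows this with one common approach: an exact binomial interval for the proportion of individuals who initiated the index drug first, mapped to the ratio scale via p / (1 - p). This is an assumption for illustration; the package's own interval method may differ. The helper sequenceRatioCI() and the counts are hypothetical, and confidenceInterval is given as a percentage to mirror the call above.

```r
# Hypothetical toy counts
nIndexFirst <- 120
nMarkerFirst <- 80

# Hypothetical helper: exact (Clopper-Pearson) binomial CI for the proportion
# initiating the index drug first, converted to the sequence ratio scale.
sequenceRatioCI <- function(nIndexFirst, nMarkerFirst, confidenceInterval = 95) {
  test <- binom.test(nIndexFirst, nIndexFirst + nMarkerFirst,
                     conf.level = confidenceInterval / 100)
  p <- as.numeric(test$conf.int)  # lower and upper bounds for the proportion
  p / (1 - p)                     # monotone map from proportion to ratio
}

ci95 <- sequenceRatioCI(nIndexFirst, nMarkerFirst, 95)
ci99 <- sequenceRatioCI(nIndexFirst, nMarkerFirst, 99)
print(rbind(ci95, ci99))
```

Because p / (1 - p) is increasing, the interval for the ratio always contains the point estimate 120 / 80 = 1.5.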
movingAverageRestriction
The idea of moving average restriction is necessary only for the null
sequence ratio calculation, please refer to Lai et al. (2017) for more
details on this parameter (parameter d when calculating P in page 578).
Following Tsiropoulos et al. (2009), by default, the argument
movingAverageRestriction
is set to be \(548\) (\(18\) months). Should one wish to modify
this, one could do something like:
summariseSequenceRatios(
cohort = cdm$intersect,
movingAverageRestriction = 600) |>
dplyr::glimpse()
#> Rows: 10
#> Columns: 13
#> $ result_id <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1
#> $ cdm_name <chr> "Synthea synthetic health database", "Synthea synthet…
#> $ group_name <chr> "index_cohort_name &&& marker_cohort_name", "index_co…
#> $ group_level <chr> "1191_aspirin &&& 161_acetaminophen", "1191_aspirin &…
#> $ strata_name <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ strata_level <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ variable_name <chr> "crude", "adjusted", "crude", "crude", "adjusted", "a…
#> $ variable_level <chr> "sequence_ratio", "sequence_ratio", "sequence_ratio",…
#> $ estimate_name <chr> "point_estimate", "point_estimate", "lower_CI", "uppe…
#> $ estimate_type <chr> "numeric", "numeric", "numeric", "numeric", "numeric"…
#> $ estimate_value <chr> "1.8108504398827", "1.78574458428809", "1.64970963817…
#> $ additional_name <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ additional_level <chr> "overall", "overall", "overall", "overall", "overall"…
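Comparing with the default output, only the adjusted estimate changes (1.78715… to 1.78574…) while the crude estimate stays at 1.81085…, consistent with the restriction entering only through the null sequence ratio. The sketch below illustrates the general idea of a window-restricted null sequence ratio, assuming daily counts a[u] of index initiators and b[v] of marker initiators, and a window of d days (playing the role of movingAverageRestriction). The helper nullSequenceRatio() is hypothetical and simplified; see Lai et al. (2017) for the exact formulation, which the package may implement differently.

```r
# Hypothetical, simplified window-restricted null sequence ratio.
# a[u]: number of index-drug initiators on day u
# b[v]: number of marker-drug initiators on day v
# d:    only marker initiations within d days of day u are counted
nullSequenceRatio <- function(a, b, d) {
  n <- length(b)
  stopifnot(length(a) == n, d >= 1)
  after <- 0   # weighted marker initiations in the d days after each index day
  before <- 0  # weighted marker initiations in the d days before each index day
  for (u in seq_len(n)) {
    if (u < n) after <- after + a[u] * sum(b[(u + 1):min(u + d, n)])
    if (u > 1) before <- before + a[u] * sum(b[max(u - d, 1):(u - 1)])
  }
  pNull <- after / (after + before)  # chance of index first under the null
  pNull / (1 - pNull)
}

# Flat marker uptake gives a null ratio of 1; rising uptake gives a ratio above 1
print(nullSequenceRatio(rep(1, 10), rep(1, 10), d = 2))
print(nullSequenceRatio(rep(1, 10), 1:10, d = 2))
```

Changing d alters only the null ratio, and hence only the adjusted estimate, in this sketch.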
minCellCount
By default, the minimum number of events to be reported is 5; results based on fewer events are obscured. If minCellCount is set to 0, all results are reported, which the user could do via:
summariseSequenceRatios(cohort = cdm$intersect,
minCellCount = 0) |>
dplyr::glimpse()
#> Rows: 10
#> Columns: 13
#> $ result_id <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1
#> $ cdm_name <chr> "Synthea synthetic health database", "Synthea synthet…
#> $ group_name <chr> "index_cohort_name &&& marker_cohort_name", "index_co…
#> $ group_level <chr> "1191_aspirin &&& 161_acetaminophen", "1191_aspirin &…
#> $ strata_name <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ strata_level <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ variable_name <chr> "crude", "adjusted", "crude", "crude", "adjusted", "a…
#> $ variable_level <chr> "sequence_ratio", "sequence_ratio", "sequence_ratio",…
#> $ estimate_name <chr> "point_estimate", "point_estimate", "lower_CI", "uppe…
#> $ estimate_type <chr> "numeric", "numeric", "numeric", "numeric", "numeric"…
#> $ estimate_value <chr> "1.8108504398827", "1.78715329996299", "1.64970963817…
#> $ additional_name <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ additional_level <chr> "overall", "overall", "overall", "overall", "overall"…
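The thresholding rule described above can be sketched in a few lines of base R. The helper obscureSmallCounts() is hypothetical, and representing obscured values as NA is an assumption made for illustration; the package's own suppression may mark them differently.

```r
# Hypothetical helper: obscure counts below minCellCount by setting them to NA
obscureSmallCounts <- function(counts, minCellCount = 5) {
  out <- as.numeric(counts)
  out[counts < minCellCount] <- NA
  out
}

print(obscureSmallCounts(c(3, 5, 12)))                    # 3 is obscured
print(obscureSmallCounts(c(0, 3, 5), minCellCount = 0))   # nothing is obscured
```

With minCellCount = 0 no count can fall below the threshold, so every result is kept.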