This vignette provides examples of how to use the phe_sii function with different kinds of indicators.
The following packages must be installed and loaded if not already available:
# source functions required
library(PHEindicatormethods)
library(dplyr)
phe_sii is an aggregate function, returning the slope index of inequality (SII) statistic for each grouping set in the input data frame, with lower and upper confidence limits based on the specified confidence level. The user can choose whether to also return the Relative Index of Inequality (RII) via an optional argument in the function.
Each grouping set in the input data should have a row for each quantile, labelled with the quantile number, which contains the associated population, indicator value and 95% confidence limits. The user has the option to provide the standard error instead of the 95% confidence limits, in which case this is used directly rather than being calculated by the function.
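As an illustration of the expected input shape, the sketch below builds a small made-up dataset with one row per quantile per area, then flags any area that does not have a record for every quantile. The column names, the values and the completeness check are all hypothetical and written in base R; this is not how phe_sii itself validates its input.

```r
# Hypothetical toy input: two areas, five quantiles each (all values made up)
toy <- data.frame(
  area       = rep(c("A", "B"), each = 5),
  quantile   = rep(1:5, times = 2),
  population = c(1000, 1200, 1100, 900, 1000, 800, 950, 1000, 1050, 900),
  value      = c(75.1, 76.4, 77.8, 79.0, 80.6, 74.2, 75.9, 77.1, 78.8, 80.0),
  lowercl    = c(74.3, 75.7, 77.0, 78.1, 79.8, 73.3, 75.1, 76.3, 78.0, 79.1),
  uppercl    = c(75.9, 77.1, 78.6, 79.9, 81.4, 75.1, 76.7, 77.9, 79.6, 80.9)
)

# Drop one row to mimic an area with a missing quantile
toy_incomplete <- toy[!(toy$area == "B" & toy$quantile == 3), ]

# Count the distinct quantiles present in each area
n_quantiles <- tapply(toy_incomplete$quantile, toy_incomplete$area,
                      function(q) length(unique(q)))

# Areas with fewer quantiles than the maximum observed are incomplete
incomplete_areas <- names(n_quantiles)[n_quantiles < max(n_quantiles)]
incomplete_areas
```

A check like this before calling the function makes it easier to see in advance which groups are likely to be dropped from the output.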
The user can also specify the indicator type from “0 - default”, “1 - rate” or “2 - proportion”, where different transformations are applied to the input indicator value and confidence limits in the case of a rate or proportion. Examples are provided below for the three cases.
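To make the three indicator types more concrete, here is a minimal sketch of how a standard error might be derived from 95% confidence limits under each transformation. The helper se_from_cl is hypothetical and not part of PHEindicatormethods; it assumes the limits are symmetric on the transformed scale and that the standard error is the interval width divided by twice the normal quantile.

```r
# 97.5th percentile of the standard normal, used for two-sided 95% limits
z <- qnorm(0.975)

# Hypothetical helper (not a PHEindicatormethods function):
# value_type 0 = no transformation, 1 = log (rate), 2 = logit (proportion)
se_from_cl <- function(lower_cl, upper_cl, value_type = 0) {
  if (value_type == 1) {
    lower_cl <- log(lower_cl)
    upper_cl <- log(upper_cl)
  } else if (value_type == 2) {
    lower_cl <- qlogis(lower_cl)  # qlogis() is the logit function
    upper_cl <- qlogis(upper_cl)
  }
  # Interval width on the (possibly transformed) scale, halved and divided by z
  (upper_cl - lower_cl) / (2 * z)
}

se_from_cl(78.1, 79.9)                   # default: original scale
se_from_cl(120, 150, value_type = 1)     # rate: log scale
se_from_cl(0.20, 0.30, value_type = 2)   # proportion: logit scale
```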
The example below calculates the SII on some life expectancy data. This is assumed to have symmetric confidence intervals around the indicator values, so the default standard error calculation applies, with no prior transformation.
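For readers unfamiliar with the statistic, the sketch below shows one common way to obtain an SII point estimate when no transformation is involved: a population-weighted linear regression of the indicator value on each quantile's relative rank, taken as the midpoint of its cumulative population share. The function sii_point, the toy data and the assumption that quantiles are ordered by their number are all illustrative; the exact regression used inside phe_sii may differ.

```r
# Hypothetical helper: SII as the slope of a population-weighted regression
sii_point <- function(d) {
  d <- d[order(d$quantile), ]                   # order quantiles by number
  share <- d$population / sum(d$population)     # population share per quantile
  d$rank_mid <- cumsum(share) - share / 2       # midpoint of cumulative share
  fit <- lm(value ~ rank_mid, data = d, weights = population)
  unname(coef(fit)["rank_mid"])                 # slope across the full 0-1 range
}

# Made-up life expectancy values for five quantiles
toy <- data.frame(
  quantile   = 1:5,
  population = c(1000, 1200, 1100, 900, 1000),
  value      = c(75.1, 76.4, 77.8, 79.0, 80.6)
)

sii_point(toy)
```

Because the relative rank runs from 0 to 1, the slope can be read as the modelled difference in the indicator between the two ends of the distribution.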
The relevant fields in the input dataset are specified for the arguments quantile, population and value. value_type is kept equal to 0 (default), and the number of repetitions is set to 1000 so that the function runs faster for this demonstration.
The standard error (se) has been provided here in the input dataset, meaning this will be used directly and lower/upper 95% confidence limits of the indicator values are not needed.
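The role of the standard error and the number of repetitions can be illustrated with a small simulation. The sketch below, which uses made-up data and hypothetical names, draws each quantile's value from a normal distribution centred on the observed value with the supplied standard error, refits a population-weighted regression for every draw, and takes percentiles of the simulated slopes as confidence limits. It is an approximation of the general idea under these assumptions, not a reproduction of the phe_sii internals.

```r
set.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical quantile-level data with a supplied standard error (made up)
d <- data.frame(
  quantile   = 1:5,
  population = c(1000, 1200, 1100, 900, 1000),
  value      = c(75.1, 76.4, 77.8, 79.0, 80.6),
  se         = c(0.40, 0.35, 0.38, 0.45, 0.41)
)

# Relative rank: midpoint of each quantile's cumulative population share
share      <- d$population / sum(d$population)
d$rank_mid <- cumsum(share) - share / 2

# Population-weighted regression slope for one set of quantile values
slope <- function(values) {
  coef(lm(values ~ d$rank_mid, weights = d$population))[[2]]
}

sii_point <- slope(d$value)

# Draw each quantile's value from Normal(value, se) and refit, 1000 times
repetitions <- 1000
sims <- replicate(repetitions,
                  slope(rnorm(nrow(d), mean = d$value, sd = d$se)))

# Percentile-based 95% limits from the simulated slopes
sii_limits <- quantile(sims, probs = c(0.025, 0.975))
sii_limits
```

With fewer repetitions the limits are quicker to compute but vary more from run to run, which is why a small number is suitable only for testing.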
A warning is generated because one of the GeoCodes (E06000053) in the data does not contain a record for every quantile so no output is provided for this area.
# Pass data through SII function ---------------------------------------
LE_data_SII <- LE_data %>%
  # Group the input dataframe to create subgroups to calculate the SII for
  group_by(Sex, GeoCode) %>%
  # Run SII function on grouped dataset
  phe_sii(quantile = Decile,
          population = Pop,
          value = LifeExp,
          value_type = 0, # specify default indicator type
          confidence = c(0.95, 0.998),
          se = SE,
          repetitions = 1000, # use smaller no. of repetitions e.g. for testing
          rii = FALSE,
          type = "full")
## Warning in phe_sii(., quantile = Decile, population = Pop, value = LifeExp, :
## WARNING: some records have been removed due to incomplete or invalid data
# View first 10 rows of results
knitr::kable(head(LE_data_SII, 10))
Sex | GeoCode | sii | sii_lower95_0cl | sii_upper95_0cl | sii_lower99_8cl | sii_upper99_8cl | indicator_type | multiplier | transform | CI_confidence | CI_method |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | E06000001 | 11.68886 | 9.163157 | 14.03102 | 7.636848 | 15.91570 | normal | 1 | none | 95%, 99.8% | simulation 1000 reps |
1 | E06000002 | 12.54785 | 10.530798 | 14.63620 | 8.998043 | 15.55380 | normal | 1 | none | 95%, 99.8% | simulation 1000 reps |
1 | E06000003 | 10.05084 | 7.832693 | 12.24529 | 5.777159 | 13.73026 | normal | 1 | none | 95%, 99.8% | simulation 1000 reps |
1 | E06000004 | 14.85223 | 13.131610 | 16.64977 | 12.179546 | 17.84701 | normal | 1 | none | 95%, 99.8% | simulation 1000 reps |
1 | E06000005 | 11.68095 | 9.153319 | 14.20844 | 7.675025 | 15.08127 | normal | 1 | none | 95%, 99.8% | simulation 1000 reps |
1 | E06000006 | 12.27526 | 9.860600 | 14.36077 | 8.721377 | 15.15166 | normal | 1 | none | 95%, 99.8% | simulation 1000 reps |
1 | E06000007 | 11.59893 | 10.002898 | 13.01182 | 8.521268 | 13.67681 | normal | 1 | none | 95%, 99.8% | simulation 1000 reps |
1 | E06000008 | 10.72510 | 8.760912 | 12.73142 | 7.588910 | 13.83343 | normal | 1 | none | 95%, 99.8% | simulation 1000 reps |
1 | E06000009 | 13.59387 | 11.503227 | 15.83495 | 10.200896 | 16.66205 | normal | 1 | none | 95%, 99.8% | simulation 1000 reps |
1 | E06000010 | 11.20094 | 9.738255 | 12.75270 | 8.739445 | 13.68133 | normal | 1 | none | 95%, 99.8% | simulation 1000 reps |
Note that some areas are missing quantiles in the dataset, and these are subsequently excluded from the function output with a warning given.
The example below calculates both the SII and RII on Directly Standardised Rate (DSR) data. The value_type argument is set to 1 to specify this indicator is a rate; this means a log transformation will be applied to the value, lower_cl and upper_cl fields before calculating the standard error. The transform argument is set to TRUE because rates do not show a linear relationship across the quantiles, so a log transformation will be applied before calculating the SII and then reverted once the SII has been calculated, to ensure the SII is given in the original units.
As the number of repetitions is not specified, the function will run on the default 100,000. To return the RII, the rii argument is set to TRUE.
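The following sketch shows one way a log transformation could be applied and then reverted so that the SII is expressed in the original units, with the RII obtained from the same fit. It assumes the regression is fitted to log(value), that the modelled values at relative ranks 0 and 1 are back-transformed with exp(), and that the SII is their difference and the RII their ratio. The function name and data are hypothetical, and the details inside phe_sii may differ.

```r
# Hypothetical helper: SII and RII from a regression on the log scale
sii_rii_log <- function(d) {
  d <- d[order(d$quantile), ]
  share <- d$population / sum(d$population)
  d$rank_mid <- cumsum(share) - share / 2       # midpoint of cumulative share
  fit <- lm(log(value) ~ rank_mid, data = d, weights = population)
  # Modelled rates at the two ends of the distribution, back on the original scale
  ends <- exp(predict(fit, newdata = data.frame(rank_mid = c(0, 1))))
  c(sii = unname(ends[2] - ends[1]),            # absolute gap, original units
    rii = unname(ends[2] / ends[1]))            # relative gap, unitless
}

# Made-up directly standardised rates for five quintiles
toy_rates <- data.frame(
  quantile   = 1:5,
  population = c(5000, 5200, 4900, 5100, 5000),
  value      = c(142.0, 138.5, 134.1, 131.0, 127.6)
)

sii_rii_log(toy_rates)
```

Under these assumptions a negative SII always goes with an RII below 1, which matches the pattern seen in the results for this example.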
Finally, setting reliability_stat = TRUE will run additional sample sets of the SII/RII confidence limits and return a Mean Average Difference (MAD) value for each subgroup. See below for guidance on how to use this.
# Pass data through SII function ---------------------------------------
DSR_data_SII <- DSR_data %>%
  # Group the input dataframe to create subgroups to calculate the SII for
  group_by(Period) %>%
  # Run SII function on grouped dataset
  phe_sii(quantile = Quintile,
          population = total_pop,
          value = value,
          value_type = 1, # specifies indicator is a rate
          lower_cl = lowercl,
          upper_cl = uppercl,
          transform = TRUE,
          rii = TRUE, # returns RII as well as SII (default is FALSE)
          reliability_stat = TRUE) # returns reliability stats (default is FALSE)
# View results
knitr::kable(DSR_data_SII)
Period | sii | rii | sii_lower95_0 | sii_upper95_0 | rii_lower95_0 | rii_upper95_0 | sii_mad95_0 | rii_mad95_0 | indicator_type | multiplier | transform | CI_confidence | CI_method |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2010 | -14.03684 | 0.9010655 | -17.93809 | -10.128497 | 0.8753835 | 0.9275700 | 0.0178387 | 0.0001188 | rate | 1 | log | 95% | simulation 1e+05 reps |
2011 | -12.83196 | 0.9086641 | -16.72766 | -8.962340 | 0.8826517 | 0.9352799 | 0.0201048 | 0.0001370 | rate | 1 | log | 95% | simulation 1e+05 reps |
2012 | -11.09234 | 0.9199455 | -14.94129 | -7.262016 | 0.8937156 | 0.9468290 | 0.0120692 | 0.0000837 | rate | 1 | log | 95% | simulation 1e+05 reps |
2013 | -10.06041 | 0.9266342 | -13.85162 | -6.268165 | 0.9004255 | 0.9536282 | 0.0103028 | 0.0000713 | rate | 1 | log | 95% | simulation 1e+05 reps |
2014 | -10.06085 | 0.9263086 | -13.82532 | -6.288312 | 0.9001742 | 0.9532752 | 0.0141002 | 0.0000992 | rate | 1 | log | 95% | simulation 1e+05 reps |
2015 | -8.53077 | 0.9368534 | -12.17939 | -4.805368 | 0.9110936 | 0.9639195 | 0.0393208 | 0.0002759 | rate | 1 | log | 95% | simulation 1e+05 reps |
This example calculates the SII for a prevalence indicator. Proportions need to be between 0 and 1; this formatting is done in the mutate command below, before passing the grouped dataset to the phe_sii function.
The value_type argument is set to 2 to specify the indicator is a proportion, and a logit transformation is applied to the value, lower_cl and upper_cl fields before calculating the standard error. The transform argument is set to TRUE because proportions do not show a linear relationship across the quantiles, so a logit transformation will be applied before calculating the SII and then reverted once the SII has been calculated, to ensure the SII is given in the original units.
The function will again run on the default 100,000 repetitions, and neither the RII nor the MAD values will be returned.
There is the option to specify a numeric multiplier in the arguments, which will scale the SII, SII_lowerCL, SII_upperCL (and SII_MAD) before outputting. This could be used if an absolute (i.e. positive) slope is desired for an indicator, where the “high is bad” polarity would otherwise give negative SII results.
Below, a multiplier of -100 is used, to output absolute prevalence figures that are expressed on a scale between 0 and 100.
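As a small worked illustration of the scaling, the sketch below applies a multiplier of -100 to a made-up SII and its confidence limits expressed on the 0 to 1 scale. Re-ordering the limits after scaling, so that the lower limit stays below the upper limit, is an assumption of this sketch rather than documented behaviour of phe_sii; all numbers and names are hypothetical.

```r
# Made-up SII and 95% limits on the 0-1 proportion scale (negative slope)
sii   <- -0.1096
lower <- -0.1149
upper <- -0.1044

multiplier <- -100

# Scale the point estimate and both limits by the multiplier
sii_out    <- sii * multiplier
scaled_cls <- c(lower, upper) * multiplier

# A negative multiplier flips the order of the limits, so re-order them
# (assumption of this sketch)
lower_out <- min(scaled_cls)
upper_out <- max(scaled_cls)

c(sii = sii_out, lower = lower_out, upper = upper_out)
```

The scaled figures are positive and read as percentage points, which is the presentation the multiplier is intended to achieve.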
# Pass data through SII function ---------------------------------------
prevalence_SII <- prevalence_data %>%
  # Group the input dataframe to create subgroups to calculate the SII for
  group_by(Period, SchoolYear, AreaCode) %>%
  # Format prevalences to be between 0 and 1
  mutate(Rate = Rate/100,
         LCL = LCL/100,
         UCL = UCL/100) %>%
  # Run SII function on grouped dataset
  phe_sii(quantile = Decile,
          population = Measured,
          value = Rate,
          value_type = 2, # specifies indicator is a proportion
          lower_cl = LCL,
          upper_cl = UCL,
          transform = TRUE,
          multiplier = -100) # negative multiplier to scale SII outputs
# View first 10 rows of results
knitr::kable(head(prevalence_SII, 10))
Period | SchoolYear | AreaCode | sii | sii_lower95_0 | sii_upper95_0 | indicator_type | multiplier | transform | CI_confidence | CI_method |
---|---|---|---|---|---|---|---|---|---|---|
607 | 6 | E92000001 | 10.964626 | 10.440909 | 11.493183 | proportion | -100 | logit | 95% | simulation 1e+05 reps |
607 | R | E92000001 | 5.970062 | 5.548527 | 6.392306 | proportion | -100 | logit | 95% | simulation 1e+05 reps |
708 | 6 | E92000001 | 11.271798 | 10.881040 | 11.661851 | proportion | -100 | logit | 95% | simulation 1e+05 reps |
708 | R | E92000001 | 6.200092 | 5.893460 | 6.508868 | proportion | -100 | logit | 95% | simulation 1e+05 reps |
809 | 6 | E92000001 | 11.804672 | 11.421474 | 12.187102 | proportion | -100 | logit | 95% | simulation 1e+05 reps |
809 | R | E92000001 | 6.911770 | 6.618138 | 7.209215 | proportion | -100 | logit | 95% | simulation 1e+05 reps |
910 | 6 | E92000001 | 12.459329 | 12.076054 | 12.841594 | proportion | -100 | logit | 95% | simulation 1e+05 reps |
910 | R | E92000001 | 7.141644 | 6.846238 | 7.433707 | proportion | -100 | logit | 95% | simulation 1e+05 reps |
1011 | 6 | E92000001 | 13.057800 | 12.673589 | 13.441555 | proportion | -100 | logit | 95% | simulation 1e+05 reps |
1011 | R | E92000001 | 7.139306 | 6.855987 | 7.422353 | proportion | -100 | logit | 95% | simulation 1e+05 reps |
If reliability_stat is set to TRUE in the function, a MAD value is returned for each subgroup as a measure of how much the SII (or RII) confidence limits vary.
Note: this option will increase the runtime of the function, as the MAD calculation requires an additional 9 sample sets of the confidence limits to be taken.
A MAD of 0.005 implies that, on rerunning the phe_sii function, the confidence limits can be expected to change by approximately 0.005. The more repetitions the function is run on, the smaller this statistic should be. The tolerance will depend on the level of accuracy to which the user wishes to present the confidence limits: ideally, to display them to 1 d.p. the MAD should be smaller than 0.01, to 2 d.p. smaller than 0.001, and so on.
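The rule of thumb above can be written as a short check. In the sketch below, the MAD is taken to be the mean absolute difference between a first confidence limit and the limits from nine further sample sets; that definition, the helper mad_supports_dp and the ten made-up limits are assumptions for illustration only.

```r
# Made-up lower confidence limits: the first run plus nine additional sample sets
lower_limits <- c(9.163, 9.171, 9.158, 9.166, 9.160,
                  9.169, 9.155, 9.167, 9.162, 9.170)

# Mean absolute difference between the first limit and the nine others
# (assumed definition for this sketch)
mad <- mean(abs(lower_limits[-1] - lower_limits[1]))

# Hypothetical helper: is the MAD small enough to display limits to `dp` decimal places?
# Follows the rule of thumb: 1 d.p. needs MAD < 0.01, 2 d.p. needs MAD < 0.001, ...
mad_supports_dp <- function(mad, dp) {
  mad < 10^(-(dp + 1))
}

mad
mad_supports_dp(mad, dp = 1)
mad_supports_dp(mad, dp = 2)
```

If the check fails at the desired precision, increasing the number of repetitions is the natural next step.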