Introduction to hesim

2024-09-18

1 Overview

This article provides an overview of the hesim package and a quick example. The other articles provide more in depth examples.

hesim supports three types of health economic models: (i) cohort discrete time state transition models (cDTSTMs), (ii) N-state partitioned survival models (PSMs), and (iii) individual-level continuous time state transition models (iCTSTMs). cDTSTMs are Markov cohort models and can be time-homogeneous or time-inhomogeneous. iCTSTMs are individual-level simulations that can encompass both Markov and semi-Markov processes. All models are implemented as R6 classes and have methods for simulating disease progression, QALYs, and costs.

Economic model R6 class
N-state partitioned survival model (PSM) hesim::Psm
Cohort discrete time state transition model (cDTSTM) hesim::CohortDtstm
Individual-level continuous time state transition model (iCTSTM) hesim::IndivCtstm

Each economic model consists of submodels for disease progression, utility, and costs (usually for multiple cost categories). As shown in the figure, a typical analysis proceeds in a 3-steps:




  1. Parameterization: An economic model is parameterized by estimating statistical models for disease progression, utilities, and costs using “estimation” datasets, such as individual patient data (IPD) from a single study or aggregate data from multiple studies.
  2. Simulation: The statistical models estimated in Step 1 are combined to construct an economic model. For a given model structure, disease progression, QALYs, and costs are simulated from “input data”, based on the target population and treatment strategies of interest.
  3. Decision analysis: Simulated outcomes from Step 2 are used to perform decision analysis using approaches such as cost-effectiveness analysis (CEA) and multi-criteria decision analysis (MCDA), although only CEA is currently supported.

The entire analysis is inherently Bayesian, as uncertainty in the parameters from the statistical models is propagated throughout the economic model and decision analysis with probabilistic sensitivity analysis (PSA). Furthermore, since the statistical and economic models are integrated, patient heterogeneity can be easily introduced with patient level covariates.

2 Treatment strategies, target population, and model structure

Before beginning an analysis, it is necessary to define the treatment strategies of interest, the target population, and the model structure. This can be done in hesim by creating a hesim_data object with the function hesim_data(). Integer valued identification (ID) variables are used to uniquely identify strategies (strategy_id), patients (patient_id), non-death health-states (state_id), and (if applicable) health-state transitions (transition_id). Subgroups can optionally be identified with grp_id.

Let’s consider an example where we use an iCTSTM to evaluate two competing treatment strategies, the standard of care (SOC) and a New treatment. We will consider a generic model of disease progression with three health states (stage 1, stage 2, and death) with four transitions (stage 1 -> stage 2, stage 2 -> stage 1, stage 1 -> death, and stage 2 -> death). Since we are using an individual-level model, we must simulate a target population that is sufficiently large so that uncertainty reflects uncertainty in the model parameters, rather than variability across simulated individuals. For the sake of illustration, we will create subgroups stratified by sex.

library("hesim")
library("data.table")

# Treatment strategies
strategies <- data.table(strategy_id = c(1, 2),
                         strategy_name  = c("SOC", "New"))

# Patients
n_patients <- 1000
patients <- data.table(patient_id = 1:n_patients,
                       age = rnorm(n_patients, mean = 45, sd = 7),
                       female = rbinom(n_patients, size = 1, prob = .51))
patients[, grp_id := ifelse(female == 1, 1, 2)]
patients[, grp_name := ifelse(female == 1, "Female", "Male")]

# (Non-death) health states
states <- data.table(state_id = c(1, 2),
                     state_name = c("Stage 1", "Stage 2")) 

# Transitions
tmat <- rbind(c(NA, 1, 2),
              c(3, NA, 4),
              c(NA, NA, NA))
colnames(tmat) <- rownames(tmat) <- c("Stage 1", "Stage 2", "Death")
transitions <- create_trans_dt(tmat)
transitions[, trans := factor(transition_id)]

# Combining
hesim_dat <- hesim_data(strategies = strategies,
                        patients = patients, 
                        states = states,
                        transitions = transitions)
print(hesim_dat)
## $strategies
##    strategy_id strategy_name
##          <num>        <char>
## 1:           1           SOC
## 2:           2           New
## 
## $patients
##       patient_id      age female grp_id grp_name
##            <int>    <num>  <int>  <num>   <char>
##    1:          1 42.51256      1      1   Female
##    2:          2 44.25106      1      1   Female
##    3:          3 45.25576      1      1   Female
##    4:          4 51.60264      1      1   Female
##    5:          5 43.94890      1      1   Female
##   ---                                           
##  996:        996 49.37283      1      1   Female
##  997:        997 43.01727      1      1   Female
##  998:        998 54.49378      1      1   Female
##  999:        999 47.39527      1      1   Female
## 1000:       1000 46.12863      0      2     Male
## 
## $states
##    state_id state_name
##       <num>     <char>
## 1:        1    Stage 1
## 2:        2    Stage 2
## 
## $transitions
##    transition_id  from    to from_name to_name  trans
##            <num> <int> <int>    <char>  <char> <fctr>
## 1:             1     1     2   Stage 1 Stage 2      1
## 2:             2     1     3   Stage 1   Death      2
## 3:             3     2     1   Stage 2 Stage 1      3
## 4:             4     2     3   Stage 2   Death      4
## 
## attr(,"class")
## [1] "hesim_data"

When presenting results, it may be preferable to have more informative labels that the ID variables. These can be generated from a hesim_data object using get_labels().

labs <- get_labels(hesim_dat)
print(labs)
## $strategy_id
## SOC New 
##   1   2 
## 
## $grp_id
## Female   Male 
##      1      2 
## 
## $state_id
## Stage 1 Stage 2   Death 
##       1       2       3

3 Parameterization

Each submodel contains fields for the model parameters and the input data. Models can be parameterized by either fitting statistical models using R, inputting values directly, or from a combination of the two. There are two types of parameter objects, standard parameter objects prefixed by “params” and “transformed” parameter objects prefixed by “tparams”. The former contain the underlying parameters of a statistical model and are used alongside the input data to make predictions. The latter contain parameters more immediate to prediction that have already been transformed as function of the input data. The regression coefficients of a logistic regression are an example of a parameter objects while the predicted probabilities are examples of a transformed parameter object.

3.1 Disease progression

As shown in the table below, the statistical model used to parameterize the disease model varies by the type of economic model. For example, multinomial logistic regressions can be used to parameterize a cDTSTM, a set of N-1 independent survival models are used to parameterize an N-state partitioned survival model, and multi-state models can be used to parameterize an iCTSTM.

Economic model (R6 class) Statistical model Parameter object Model object
hesim::CohortDtstm Custom hesim::tparams_transprobs msm::msm
Multinomial logistic regressions hesim::params_mlogit_list hesim::multinom_list
hesim::Psm Independent survival models hesim::params_surv_list hesim::flexsurvreg_list
hesim::IndivCtstm Multi-state model (joint likelihood) hesim::params_surv flexsurv::flexsurvreg
Multi-state model (transition-specific) hesim::params_surv_list hesim::flexsurvreg_list

The parameters of a survival model are stored in a params_surv object and a params_surv_list can be used to store the parameters of multiple survival models. The latter is useful for storing the parameters of a multi-state model or the independent survival models required for a PSM. The parameters of a multinomial logistic regression are stored in a params_mlogit object and can be created by fitting a model for each row in a transition probability matrix with nnet::multinom(). tparams_transprobs objects are examples of transformed parameter objects that store transition probability matrices. They can be predicted from a fitted multi-state model using the msm package or constructed “by hand” in a custom manner.

We illustrate an example of a statistical model of disease progression fit with R by estimating a multi-state model with a joint likelihood using flexsurv::flexsurvreg().

library("flexsurv")
mstate_data <- data.table(mstate3_exdata$transitions)
mstate_data[, trans := factor(trans)]
fit_wei <- flexsurv::flexsurvreg(Surv(years, status) ~ trans + 
                                                       factor(strategy_id):trans +
                                                       age:trans + 
                                                       female: trans +
                                                       shape(trans), 
                                 data = mstate_data, 
                                 dist = "weibull")

3.2 Costs and utility

State values (i.e., utilities and costs) do not depend on the choice of disease model. They can currently either be modeled using a linear model or with predicted means.

Statistical model Parameter object Model object
Predicted means hesim::tparams_mean hesim::stateval_tbl
Linear model hesim::params_lm stats::lm

The most straightforward way to construct state values is with stateval_tbl(), which creates a special object used to assign values (i.e. predicted means) to health states that can vary across PSA samples, treatment strategies, patients, and/or time intervals. State values can be specified either as moments (e.g., mean and standard error) or parameters (e.g., shape and scale of gamma distribution) of a probability distribution, or by pre-simulating values from a suitable probability distribution (e.g., from a Bayesian model). Here we will use stateval_tbl objects for utility and two cost categories (drug and medical).

# Utility
utility_tbl <- stateval_tbl(
  data.table(state_id = states$state_id,
             mean = mstate3_exdata$utility$mean,
             se = mstate3_exdata$utility$se),
  dist = "beta"
)

# Costs
drugcost_tbl <- stateval_tbl(
  data.table(strategy_id = strategies$strategy_id,
             est = mstate3_exdata$costs$drugs$costs),
  dist = "fixed"
)

medcost_tbl <- stateval_tbl(
  data.table(state_id = states$state_id,
             mean = mstate3_exdata$costs$medical$mean,
             se = mstate3_exdata$costs$medical$se),
  dist = "gamma"
)
print(utility_tbl)
##    state_id  mean        se
##       <num> <num>     <num>
## 1:        1  0.65 0.1732051
## 2:        2  0.85 0.2000000
print(drugcost_tbl)
##    strategy_id   est
##          <num> <num>
## 1:           1  5000
## 2:           2 10000
print(medcost_tbl)
##    state_id  mean       se
##       <num> <num>    <num>
## 1:        1  1000 10.95445
## 2:        2  1600 14.14214

4 Simulation

4.1 Constructing an economic model

The utility and cost models are always hesim::StateVals objects, whereas the disease models vary by economic model. The disease model is used to simulate survival curves in a PSM and health state transitions in a cDTSTM and iCTSTM.

Economic model Disease model Utility model Cost model(s)
hesim::CohortDtstm hesim::CohortDtstmTrans hesim::StateVals hesim::StateVals
hesim::Psm hesim::PsmCurves hesim::StateVals hesim::StateVals
hesim::IndivCtstm hesim::IndivCtstmTrans hesim::StateVals hesim::StateVals

The submodels are constructed from (i) parameter or model objects and (ii) input data (if a transformed parameter object is not used). They can be instantiated using S3 generic methods prefixed by “create” or with the R6 constructor method $new(). We illustrate use of the former below.

In all cases, it is necessary to specify the number of parameter samples to use for the PSA.

n_samples <- 1000

4.1.1 Disease model

The disease model is constructed as a function of the fitted multi-state model (using the stored regression coefficients) and input data. The input data must be an object of class expanded_hesim_data, which is a data.table containing the covariates for the statistical model. In our multi-state model, each row is a unique treatment strategy, patient, and health-state transition.

An expanded_hesim_data object can be created by expanding an object of class hesim_data using expand.hesim_data().

transmod_data <- expand(hesim_dat, 
                        by = c("strategies", "patients", "transitions"))
head(transmod_data)
##    strategy_id patient_id transition_id strategy_name      age female grp_id
##          <num>      <int>         <num>        <char>    <num>  <int>  <num>
## 1:           1          1             1           SOC 42.51256      1      1
## 2:           1          1             2           SOC 42.51256      1      1
## 3:           1          1             3           SOC 42.51256      1      1
## 4:           1          1             4           SOC 42.51256      1      1
## 5:           1          2             1           SOC 44.25106      1      1
## 6:           1          2             2           SOC 44.25106      1      1
##    grp_name  from    to from_name to_name  trans
##      <char> <int> <int>    <char>  <char> <fctr>
## 1:   Female     1     2   Stage 1 Stage 2      1
## 2:   Female     1     3   Stage 1   Death      2
## 3:   Female     2     1   Stage 2 Stage 1      3
## 4:   Female     2     3   Stage 2   Death      4
## 5:   Female     1     2   Stage 1 Stage 2      1
## 6:   Female     1     3   Stage 1   Death      2

The disease model is instantiated using the create_IndivCtstmTrans() generic method. Parameters for the PSA are, by default, drawn from the multivariate normal distribution of the maximum likelihood estimate of the regression coefficients, although we make this explicit with the uncertainty argument.

transmod <- create_IndivCtstmTrans(fit_wei, transmod_data,
                                   trans_mat = tmat, n = n_samples,
                                   uncertainty = "normal")
class(transmod)
## [1] "IndivCtstmTrans" "CtstmTrans"      "R6"

4.1.2 Cost and utility models

Since we are using predicted means for utilities and costs, we do not need to specify input data. Instead, we can construct the utility and cost models directly from the stateval_tbl objects.

# Utility
utilitymod <- create_StateVals(utility_tbl, n = n_samples, hesim_data = hesim_dat)

# Costs
drugcostmod <- create_StateVals(drugcost_tbl, n = n_samples, hesim_data = hesim_dat)
medcostmod <- create_StateVals(medcost_tbl, n = n_samples, hesim_data = hesim_dat)
costmods <- list(drugs = drugcostmod,
                 medical = medcostmod)

4.1.3 Combining the disease progression, cost, and utility models

Once the disease, utility, and cost models have been constructed, we combine them to create the full economic model using $new().

ictstm <- IndivCtstm$new(trans_model = transmod,
                         utility_model = utilitymod,
                         cost_models = costmods)

4.2 Simulating outcomes

Each economic model contains methods (i.e., functions) for simulating disease progression, QALYs, and costs.

Economic model (R6 class) Disease progression QALYs Costs
hesim::CohortDtstm $sim_stateprobs() $sim_qalys() $sim_costs()
hesim::Psm $sim_survival() and $sim_stateprobs() $sim_qalys() $sim_costs()
hesim::IndivCtstm $sim_disease() and $sim_stateprobs() $sim_qalys() $sim_costs()

Although all models simulate state probabilities, they do so in different ways. The cDTSTM uses discrete time Markov chains, the PSM calculates differences in probabilities from simulated survival curves, and the iCTSTM aggregates individual trajectories simulated using random number generation. The individual-level simulation is advantageous because it can be used for semi-Markov processes where transition rates depend on time since entering a health state (rather than time since the start of the model).

The utility and cost models always simulate QALYs and costs from the simulated progression of disease with the methods $sim_qalys() and $sim_costs(), respectively. In the cohort models, QALYs and costs are computed as a function of the state probabilities whereas in individual-level models they are based on the simulated individual trajectories. Like the disease model, the individual-level simulation is more flexible because QALYs and costs can depend on time since entering the health state.

We illustrate with the iCTSTM. The first step is to simulate disease progression for each patient.

ictstm$sim_disease()
head(ictstm$disprog_)
##    sample strategy_id patient_id grp_id  from    to final time_start time_stop
##     <num>       <int>      <int>  <int> <num> <num> <int>      <num>     <num>
## 1:      1           1          1      1     1     3     1   0.000000  1.888201
## 2:      1           1          2      1     1     2     0   0.000000  1.400140
## 3:      1           1          2      1     2     1     0   1.400140  3.136599
## 4:      1           1          2      1     1     2     0   3.136599  4.152644
## 5:      1           1          2      1     2     1     0   4.152644  6.043755
## 6:      1           1          2      1     1     2     0   6.043755 10.222769

The disease trajectory is summarized with $sim_stateprobs().

ictstm$sim_stateprobs(t = c(0:10))
head(ictstm$stateprobs_)
##    sample strategy_id grp_id state_id     t  prob
##     <num>       <int>  <int>    <num> <num> <num>
## 1:      1           1      1        1     0 0.497
## 2:      1           1      1        1     1 0.439
## 3:      1           1      1        1     2 0.369
## 4:      1           1      1        1     3 0.317
## 5:      1           1      1        1     4 0.293
## 6:      1           1      1        1     5 0.250

Finally, we compute QALYs and costs (using a discount rate of 3 percent).

# QALYs
ictstm$sim_qalys(dr = .03)
head(ictstm$qalys_)
##    sample strategy_id grp_id state_id    dr     qalys       lys
##     <num>       <int>  <int>    <num> <num>     <num>     <num>
## 1:      1           1      1        1  0.03 0.7168197 3.4432892
## 2:      1           1      1        2  0.03 0.5496901 0.5775029
## 3:      1           1      2        1  0.03 0.8248263 3.9621058
## 4:      1           1      2        2  0.03 0.7193772 0.7557757
## 5:      1           2      1        1  0.03 0.9594360 4.6087122
## 6:      1           2      1        2  0.03 0.7222966 0.7588428
# Costs
ictstm$sim_costs(dr = .03)
head(ictstm$costs_)
##    sample strategy_id grp_id state_id    dr category     costs
##     <num>       <int>  <int>    <num> <num>   <char>     <num>
## 1:      1           1      1        1  0.03    drugs 17216.446
## 2:      1           1      1        2  0.03    drugs  2887.514
## 3:      1           1      2        1  0.03    drugs 19810.529
## 4:      1           1      2        2  0.03    drugs  3778.878
## 5:      1           2      1        1  0.03    drugs 46087.122
## 6:      1           2      1        2  0.03    drugs  7588.428

5 Decision analysis

Once output has been simulated with an economic model, a decision analysis can be performed. CEAs can be conducted using other R packages such as BCEA or directly with hesim.

To perform a CEA, simulated QALYs and costs are summarized and a ce object is created, which contains mean QALYs and costs for each sample from the PSA by treatment strategy. QALYs and costs can either be summarized by subgroup (by_grp = TRUE) or aggregated across all patients (by_grp = FALSE).

ce <- ictstm$summarize(by_grp = FALSE)
print(ce)
## $costs
##       category    dr sample strategy_id     costs grp_id
##         <char> <num>  <num>       <int>     <num>  <num>
##    1:    drugs  0.03      1           1  43693.37      1
##    2:    drugs  0.03      1           2 111130.98      1
##    3:    drugs  0.03      2           1  55460.50      1
##    4:    drugs  0.03      2           2 106310.60      1
##    5:    drugs  0.03      3           1  58318.34      1
##   ---                                                   
## 5996:    total  0.03    998           2 119439.13      1
## 5997:    total  0.03    999           1  71977.59      1
## 5998:    total  0.03    999           2 106759.44      1
## 5999:    total  0.03   1000           1  64661.79      1
## 6000:    total  0.03   1000           2 125083.29      1
## 
## $qalys
##          dr sample strategy_id    qalys grp_id
##       <num>  <num>       <int>    <num>  <num>
##    1:  0.03      1           1 2.810713      1
##    2:  0.03      1           2 3.469447      1
##    3:  0.03      2           1 8.725548      1
##    4:  0.03      2           2 8.316823      1
##    5:  0.03      3           1 9.807074      1
##   ---                                         
## 1996:  0.03    998           2 7.015627      1
## 1997:  0.03    999           1 9.480516      1
## 1998:  0.03    999           2 7.712823      1
## 1999:  0.03   1000           1 6.755360      1
## 2000:  0.03   1000           2 7.025920      1
## 
## attr(,"class")
## [1] "ce"

The functions cea() and cea_pw() are used to perform a CEA. The former simultaneously accounts for all treatment strategies while the latter makes pairwise comparisons between interventions and a chosen comparator.

cea_out <- cea(ce, dr_qalys = .03, dr_costs = .03)
cea_pw_out <- cea_pw(ce, dr_qalys = .03, dr_costs = .03, comparator = 1)

Summary and plotting functions are available to analyze the output. For instance, we can use plot_ceac() to quickly plot a cost-effectiveness acceptability curve (CEAC), which displays the probability that each treatment strategy is the most cost-effective at a given willingness to pay for a QALY. The labels we constructed earlier are used to give the treatment strategies informative names.

library("ggplot2")
plot_ceac(cea_out, labels = labs) +
  theme_minimal()

6 Next steps

This article provided an overview of the hesim package. We recommend exploring the examples in the other articles to learn more.

cDTSTMs (i.e., Markov cohort models) are probably the most commonly used models in health economics and there are examples demonstrating multiple ways to build them with hesim. One approach that has not yet been discussed is a functional one that allows users to define a model (with define_model()) in terms of expressions that transform underlying parameter draws from a PSA into relevant transformed parameters (e.g., transition probability matrices, mean state values) as a function of input data.

Other relevant topics include more through treatments of CEA, multi-state modeling, individual-level simulations based on aggregate data, and partitioned survival analysis. As the examples illustrate, any analysis can be performed either for a single group or in the context of multiple subgroups.