This vignette describes how the pccc package generates the Complex Chronic Condition Categories (CCC) from ICD-9 and ICD-10 codes.
A CCC is “any medical condition that can be reasonably expected to last at least 12 months (unless death intervenes) and to involve either several different organ systems or 1 organ system severely enough to require specialty pediatric care and probably some period of hospitalization in a tertiary care center.” The categorization is based on the work of Feudtner et al. (2000 & 2014), as referenced below.
A reference document listing the codes for each category was published as a supplement to Feudtner et al. (2014), and we have made it available as part of the pccc package. After installing the package, you can find the file on your system with the system.file call below. Open the file with any program that reads .docx files (Word, etc.).
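If the exact path is not at hand, one way to locate the document is to search the installed package directory. This is a minimal sketch using only base R. It assumes the supplement ships as a .docx file somewhere under the package's install directory, as described above; the variable names are illustrative.

```r
# system.file() returns "" when the package is not installed.
pkg_dir <- system.file(package = "pccc")

# Search the install directory recursively for .docx files.
docx_files <- if (nzchar(pkg_dir)) {
  list.files(pkg_dir, pattern = "\\.docx$", recursive = TRUE, full.names = TRUE)
} else {
  character(0)
}

docx_files
```

An empty result means either that pccc is not installed or that no .docx file was found under its directory.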
To evaluate the code chunks in this example you will need to load the following R packages.
There are 12 total categories of CCCs used in this package. The first group of 10 are mutually exclusive; only one of them can be derived from a single ICD code:
The last 2 can be selected in addition to the above categories. For example, one ICD code could generate CCC categorization as both Gastrointestinal and Technology Dependency:
To see actual specific ICD codes by category, see pccc-icd-codes.
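The split between the two groups can be made concrete with the column names that appear in the output of ccc later in this vignette. The sketch below assumes that tech_dep and transplant are the two additional categories (based on their names and their position in that output). It uses a hand-typed, hypothetical one-row result rather than real package output.

```r
# Column names as they appear in the ccc() output shown in this vignette.
# Assumption: the first ten are the mutually exclusive categories.
body_system_cols <- c("neuromusc", "cvd", "respiratory", "renal", "gi",
                      "hemato_immu", "metabolic", "congeni_genetic",
                      "malignancy", "neonatal")
additional_cols  <- c("tech_dep", "transplant")

# Hypothetical single-row result, typed in by hand for illustration:
# one mutually exclusive category plus one additional category.
all_cols    <- c(body_system_cols, additional_cols)
one_patient <- as.data.frame(as.list(setNames(rep(0L, length(all_cols)), all_cols)))
one_patient$gi       <- 1L
one_patient$tech_dep <- 1L

# Names of the categories flagged for this row.
flagged <- names(one_patient)[unlist(one_patient) == 1L]
flagged
```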
The ccc function is the workhorse here. Simply put, a user provides ICD codes as strings and ccc returns CCC categories. CCC codes for ICD-9-CM are matched on substrings and ICD-10 codes are matched on full codes, but the ccc function uses the same “starts with substring” matching logic for both, except in a few cases described in the next paragraph.
Some datasets contain ICD-9-CM codes at different degrees of specificity, which can cause problems with substring matching for certain codes. For example, consider a patient with Congenital hereditary muscular dystrophy. The least specific ICD-9-CM code for Muscular dystrophy is 359, which is a CCC code. The exact ICD-9-CM code specifying Congenital hereditary muscular dystrophy is 3590. Even when describing the same patient, one dataset may contain the 359 code while another contains the 3590 code. If we use the substring matching logic above and match on 359, we capture the patient in both datasets, but we also capture non-CCC diagnoses such as 3594, Toxic myopathy. If we instead match on 3590, we capture the patient only in the dataset with the more specific ICD-9-CM codes.
We address this problem by exact matching for less specific codes (e.g., the code 359 will match only if the dataset contains the 3-digit code 359) and substring matching for more specific codes (e.g., code 3590 will match any code beginning with 3590). This approach improves the sensitivity of detecting CCCs in datasets with less specific codes (e.g., 359) and also reduces misclassification errors in datasets with more specific codes (e.g., 3590).
We have listed these exact match exceptions under their corresponding CCC category in the pccc-icd-codes description.
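The two matching rules can be sketched in a few lines of base R. This is an illustration of the logic described above, not the package's internal implementation. The lookup vectors and the function name is_ccc_match are hypothetical and hold only the codes from the muscular dystrophy example.

```r
# Hypothetical lookup tables for illustration only.
exact_codes  <- c("359")   # less specific: the whole code must match
prefix_codes <- c("3590")  # more specific: match any code starting with this

is_ccc_match <- function(code) {
  exact_hit  <- code %in% exact_codes
  prefix_hit <- vapply(
    code,
    function(x) any(startsWith(x, prefix_codes)),
    logical(1),
    USE.NAMES = FALSE
  )
  exact_hit | prefix_hit
}

# "35900" is a made-up longer string used only to show prefix matching.
codes  <- c("359", "3590", "35900", "3594")
result <- data.frame(code = codes, ccc = is_ccc_match(codes))
result
```

Under these rules 359 matches exactly, 3590 and the longer string match by prefix, and 3594 (Toxic myopathy) is not flagged.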
Users of the pccc package will need to pre-process the ICD-9 and ICD-10 codes in their data so that the strings are formatted in a way that the pccc package will recognize.
Specific rules to format ICD Codes correctly:
Potential issues with improperly formatted ICD codes:
Users of PCCC may find the R Package ICD useful.
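Before calling ccc, it can help to screen a column for codes that are likely to be missed. The sketch below is one possible check in base R, not a function of the pccc package. It assumes that a well-formed code contains only letters and digits, so it flags decimal points and stray whitespace. The function name has_format_problem is hypothetical.

```r
# Flag codes containing anything other than letters and digits,
# for example a decimal point or surrounding whitespace.
# Missing values are not flagged.
has_format_problem <- function(codes) {
  !is.na(codes) & grepl("[^A-Za-z0-9]", codes)
}

codes    <- c("4251", "425.1", " 3751", "37.51", NA)
problems <- codes[has_format_problem(codes)]
problems
```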
To illustrate how input formatting affects the identification of a CCC, consider the data.frame named dat below. These data have information about three patients (A-C). Each subject has the same ICD-9-CM diagnosis code (Hypertrophic obstructive cardiomyopathy, ICD-9-CM 425.1, which should be sent as 4251) and the same ICD-9-CM procedure code (Heart transplantation, ICD-9-CM 37.51, which should be sent as 3751), but each input is formatted differently. Based on the ICD-9-CM diagnosis code, the ccc function will only identify subject A as having a CCC. Based on the ICD-9-CM procedure code, the ccc function will only identify subject B as having a CCC and will also flag the Transplantation category. Subject C, whose codes both contain decimal points, is not identified at all.
dat <- data.frame(ids   = c("A", "B", "C"),
                  dxs   = c("4251", "425.1", "425.1"),
                  procs = c("37.51", "3751", "37.51"))
dat
#>   ids   dxs procs
#> 1   A  4251 37.51
#> 2   B 425.1  3751
#> 3   C 425.1 37.51

ccc(dat,
    id      = ids,
    dx_cols = dxs,
    pc_cols = procs,
    icdv    = 9)
#>   ids neuromusc cvd respiratory renal gi hemato_immu metabolic congeni_genetic
#> 1   A         0   1           0     0  0           0         0               0
#> 2   B         0   1           0     0  0           0         0               0
#> 3   C         0   0           0     0  0           0         0               0
#>   malignancy neonatal tech_dep transplant ccc_flag
#> 1          0        0        0          0        1
#> 2          0        0        0          1        1
#> 3          0        0        0          0        0
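One way to repair inputs like these is to strip the decimal points before calling ccc. The following is a minimal sketch in base R. The names dat_clean and code_cols are illustrative, and the ccc call runs only if the pccc package is installed.

```r
dat <- data.frame(ids   = c("A", "B", "C"),
                  dxs   = c("4251", "425.1", "425.1"),
                  procs = c("37.51", "3751", "37.51"),
                  stringsAsFactors = FALSE)

# Remove decimal points from both code columns.
code_cols <- c("dxs", "procs")
dat_clean <- dat
dat_clean[code_cols] <- lapply(dat[code_cols],
                               function(x) gsub(".", "", x, fixed = TRUE))
dat_clean

# Re-run the categorization only if pccc is available.
if (requireNamespace("pccc", quietly = TRUE)) {
  print(pccc::ccc(dat_clean,
                  id      = ids,
                  dx_cols = dxs,
                  pc_cols = procs,
                  icdv    = 9))
}
```

After cleaning, all three subjects carry identical diagnosis and procedure strings, so they should be categorized identically.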
This example uses a tool developed by Seth Russell (available at icd_file_generator) to create sample data files for ICD-9-CM and ICD-10-CM. Each generated data file contains randomly generated ICD codes for 1,000 patients and comprises 10 columns of diagnosis codes (d_cols), 10 columns of procedure codes (p_cols), and 10 columns of other data (g_cols).
Sample of how the ICD-9-CM test file was generated:
pccc_icd9_dataset <- generate_sample(
  v      = 9,
  n_rows = 1000,
  d_cols = 10,
  p_cols = 10,
  g_cols = 10
)
save(pccc_icd9_dataset, file = "pccc_icd9_dataset.rda")
Example using sample patient data set:
library(dplyr)
library(pccc)

ccc_result <-
  ccc(pccc::pccc_icd9_dataset[, c(1:21)], # get id, dx, and pc columns
      id      = id,
      dx_cols = dplyr::starts_with("dx"),
      pc_cols = dplyr::starts_with("pc"),
      icdv    = 9)

# review results
head(ccc_result)
#>   id neuromusc cvd respiratory renal gi hemato_immu metabolic congeni_genetic
#> 1  1         0   0           0     0  1           0         0               0
#> 2  2         0   0           0     0  0           0         0               0
#> 3  3         1   0           0     0  1           0         0               0
#> 4  4         0   0           0     0  0           0         0               0
#> 5  5         0   0           0     0  0           0         0               0
#> 6  6         0   1           0     0  0           0         0               0
#>   malignancy neonatal tech_dep transplant ccc_flag
#> 1          0        0        0          0        1
#> 2          0        0        0          0        0
#> 3          0        0        1          0        1
#> 4          0        0        0          0        0
#> 5          0        0        0          0        0
#> 6          0        1        1          0        1

# view number of patients with each CCC
sum_results <- dplyr::summarize_at(ccc_result, vars(-id), sum) %>% print.data.frame
#>   neuromusc cvd respiratory renal  gi hemato_immu metabolic congeni_genetic
#> 1       102 150          64   118 103          61        80              25
#>   malignancy neonatal tech_dep transplant ccc_flag
#> 1        400       20      287         61      741

# view percent of total population with each CCC
dplyr::summarize_at(ccc_result, vars(-id), mean) %>% print.data.frame
#>   neuromusc  cvd respiratory renal    gi hemato_immu metabolic congeni_genetic
#> 1     0.102 0.15       0.064 0.118 0.103       0.061      0.08           0.025
#>   malignancy neonatal tech_dep transplant ccc_flag
#> 1        0.4     0.02    0.287      0.061    0.741
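The same counts and proportions can also be computed without dplyr. The sketch below uses base R on a small, hypothetical result that has the same shape as the output of ccc (an id column followed by 0/1 flag columns). The values in toy_result are made up for illustration.

```r
# Hypothetical result with an id column and 0/1 flag columns.
toy_result <- data.frame(id       = 1:4,
                         cvd      = c(1, 0, 1, 0),
                         tech_dep = c(0, 0, 1, 0),
                         ccc_flag = c(1, 0, 1, 0))

# Every column except the identifier is a flag.
flag_cols <- setdiff(names(toy_result), "id")

# Number of patients with each flag set.
counts <- colSums(toy_result[flag_cols])

# Proportion of patients with each flag set.
proportions <- colMeans(toy_result[flag_cols])

counts
proportions
```

Because the flags are 0 or 1, the column sum is the number of flagged patients and the column mean is the flagged share of the population.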