Introduction

This vignette describes how the pccc package generates the Complex Chronic Condition Categories (CCC) from ICD-9 and ICD-10 codes.

A CCC is “any medical condition that can be reasonably expected to last at least 12 months (unless death intervenes) and to involve either several different organ systems or 1 organ system severely enough to require specialty pediatric care and probably some period of hospitalization in a tertiary care center.” The categorization is based on the work of Feudtner et al. (2000 & 2014), as referenced below.

A supplemental reference document listing the codes for each category was published with Feudtner et al. (2014), and we have included it in the pccc package. After installing the package, you can locate the file on your system with the system.file call below. Open the file with any program that reads .docx files (Word, etc.).

system.file("pccc_references/Categories_of_CCCv2_and_Corresponding_ICD.docx", package = "pccc")

To evaluate the code chunks in this vignette, you will need to load the following R packages.

library(pccc)
library(dplyr)

Logic employed

There are 12 CCC categories in total in this package. The first 10 are mutually exclusive, so a single ICD code can produce at most one of them:

Neurologic and Neuromuscular
Cardiovascular
Respiratory
Renal and Urologic
Gastrointestinal
Hematologic or immunologic
Metabolic
Other Congenital or Genetic Defect
Malignancy
Premature and Neonatal

The last 2 can be selected in addition to the categories above. For example, one ICD code could generate both the Gastrointestinal and the Technology Dependency categories:

Technology Dependency
Transplant
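The evaluation order can be pictured with a small base R sketch. The category lists below are toy subsets, not the lists pccc actually uses: 4251 and 0492 come from examples in this vignette, and 3751 is placed in both cvd and transplant because the Basic Example output flags both categories for that procedure code. The helper categorize_code() is hypothetical; it only illustrates first-match evaluation of the exclusive categories plus independent checks for the last 2, and is not the pccc implementation.

```r
# Toy prefix lists for illustration only; these are NOT the lists used by pccc.
exclusive <- list(             # evaluated in order, first match wins
  neuromusc = c("0492"),
  cvd       = c("4251", "3751")
)
additive <- list(              # evaluated independently of the list above
  tech_dep   = character(0),
  transplant = c("3751")
)

# Hypothetical helper: 0/1 flags produced by a single, already formatted code.
categorize_code <- function(code) {
  categories <- c(names(exclusive), names(additive))
  flags <- setNames(integer(length(categories)), categories)
  for (category in names(exclusive)) {
    if (any(startsWith(code, exclusive[[category]]))) {
      flags[[category]] <- 1L
      break                    # skip the remaining exclusive categories
    }
  }
  for (category in names(additive)) {
    if (any(startsWith(code, additive[[category]]))) flags[[category]] <- 1L
  }
  flags
}

categorize_code("3751")   # cvd and transplant are both 1
categorize_code("25042")  # no flags with these toy lists
```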

To see the specific ICD codes in each category, see pccc-icd-codes.

Generating CCC categories from ICD codes

The ccc function is the workhorse here: the user provides ICD codes as strings, and ccc returns CCC categories. CCC codes for ICD-9-CM are matched on substrings and ICD-10 codes are matched on full codes, but the ccc function uses the same “starts with substring” matching logic for both, except in a few cases described below.
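A minimal base R sketch of “starts with substring” matching, using a hypothetical helper matches_any_prefix() that is not part of pccc. Because a full code is also a prefix of itself, the same test covers ICD-10 codes that are listed as full codes.

```r
# Hypothetical helper (not part of pccc): TRUE for each code that starts with
# at least one of the given CCC prefixes.
matches_any_prefix <- function(codes, prefixes) {
  vapply(codes, function(code) any(startsWith(code, prefixes)),
         logical(1), USE.NAMES = FALSE)
}

codes <- c("4251", "42511", "425.1", "3751")
matches_any_prefix(codes, prefixes = "4251")
#> [1]  TRUE  TRUE FALSE FALSE
```

Note that "425.1" does not match: the embedded period breaks the prefix, which is why the formatting rules below matter.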

Substring matching exceptions

Some datasets record ICD-9-CM codes at different levels of specificity, which can cause problems with substring matching for certain codes. For example, consider a patient with congenital hereditary muscular dystrophy. The least specific ICD-9-CM code for muscular dystrophy is 359, which is a CCC code, while the exact ICD-9-CM code for congenital hereditary muscular dystrophy is 3590. Even when describing the same patient, one dataset may contain 359 while another contains 3590. If we use the substring matching logic above and match on 359, we capture the patient in both datasets, but we also capture non-CCC diagnoses such as 3594 (toxic myopathy). If we instead match on 3590, we capture the patient only in the dataset with more specific ICD-9-CM codes.

We address this problem by using exact matching for less specific codes (e.g., code 359 matches only if the dataset contains the 3-digit code 359) and substring matching for more specific codes (e.g., code 3590 matches any code beginning with 3590). This approach improves the sensitivity of CCC detection in datasets with less specific codes (e.g., 359) and reduces misclassification in datasets with more specific codes (e.g., 3590).

We have listed these exact match exceptions under their corresponding CCC category in the pccc-icd-codes description.
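The two matching rules can be combined as in the sketch below. matches_ccc() is a hypothetical helper, and its exact and prefix lists contain only the muscular dystrophy codes discussed above; pccc's internal implementation may differ.

```r
# Hypothetical helper: codes in `exact` must equal the whole string, while
# codes in `prefix` only need to match the start of the string.
matches_ccc <- function(codes, exact, prefix) {
  starts <- vapply(codes, function(code) any(startsWith(code, prefix)),
                   logical(1), USE.NAMES = FALSE)
  (codes %in% exact) | starts
}

codes <- c("359", "3590", "3594")
startsWith(codes, "359")                             # plain substring match
#> [1] TRUE TRUE TRUE
matches_ccc(codes, exact = "359", prefix = "3590")   # with the exception
#> [1]  TRUE  TRUE FALSE
```

With plain substring matching on 359, the toxic myopathy code 3594 is wrongly captured; with the exception, 359 is still captured but 3594 is not.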

Preparing ICD-9-CM and ICD-10-CM codes for analysis using the PCCC package

Users of the pccc package will need to pre-process the ICD-9 and ICD-10 codes in their data so that the strings are formatted in the way that the pccc package will recognize them.

Specific rules to format ICD Codes correctly:

Codes should be alphanumeric only (e.g. Diabetes with renal manifestations, type II or unspecified type, uncontrolled, ICD-9-CM 250.42, should be sent as 25042)
Codes should NOT contain periods, spaces, or other separator characters (e.g. ICD-9-CM procedure code 04.92 will only be matched by the string “0492”)
ICD-9-CM diagnosis codes should have 3 digits before the decimal point, so shorter codes must be left-padded with zeros:
- Codes less than 10 should be left-padded with 2 zeros, e.g. Cholera due to vibrio cholerae el tor, ICD-9-CM 001.1, should be sent as 0011
- Codes less than 100 should be left-padded with 1 zero, e.g. Whooping cough, unspecified organism, ICD-9-CM 033.9, should be sent as 0339
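The rules above can be applied to diagnosis codes that are stored with a period using a helper like the one sketched here. format_icd9_dx() is hypothetical, not part of pccc, and assumes that codes without a period are already in the expected format. It is meant for diagnosis codes only: ICD-9-CM procedure codes have 2 digits before the decimal point (04.92 becomes 0492), so they must not be padded to 3 digits.

```r
# Hypothetical helper for ICD-9-CM *diagnosis* codes written with a period.
format_icd9_dx <- function(codes) {
  vapply(strsplit(trimws(codes), ".", fixed = TRUE), function(p) {
    if (anyNA(p)) return(NA_character_)
    if (length(p) != 2) return(paste(p, collapse = ""))  # no period: keep as-is
    major <- p[1]
    # Left-pad purely numeric categories to 3 digits ("1" -> "001");
    # V and E codes are left untouched.
    if (grepl("^[0-9]+$", major)) {
      major <- paste0(strrep("0", max(0, 3 - nchar(major))), major)
    }
    paste0(major, p[2])
  }, character(1))
}

format_icd9_dx(c("1.1", "33.9", "250.42", "425.1", "V45.1"))
#> [1] "0011"  "0339"  "25042" "4251"  "V451"
```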

Potential issues with improperly formatted ICD codes:

Apart from the exact-match exceptions described above, all codes in all categories use “starts with substring” matching logic. Because of this, if a code to be evaluated starts with a code listed in one of the CCC categories, a match will be found. For example, if a record with an ICD-9-CM procedure code field of “0492,25042” is passed because an input file was not parsed properly, PCCC would report a match for the Neuromuscular CCC, since 0492 is one of the Neuromuscular CCC procedure substrings.
CCCs are matched in the order shown in the “Logic employed” section. Once a match is found, the remaining categories are not evaluated, except for the Technology Dependency and Transplant CCCs.
If there are changes in either ICD-9-CM or ICD-10-CM, this library may require updating to continue functioning correctly.
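One way to avoid the unparsed-field problem is to split delimited fields into separate columns before calling ccc, since ccc accepts several code columns. The sketch below uses only base R; the input column name codes and the output names code1, code2, ... are assumptions chosen for illustration.

```r
# A record whose code field was not split when the input file was read.
raw <- data.frame(id = c("A", "B"),
                  codes = c("0492,25042", "3751"),
                  stringsAsFactors = FALSE)

# Split each field on commas and pad shorter records with NA.
pieces <- strsplit(raw$codes, ",", fixed = TRUE)
n_max  <- max(lengths(pieces))
padded <- lapply(pieces, function(x) c(x, rep(NA_character_, n_max - length(x))))
wide   <- matrix(unlist(padded), ncol = n_max, byrow = TRUE,
                 dimnames = list(NULL, paste0("code", seq_len(n_max))))

# One code per column, ready to be selected with dx_cols or pc_cols.
dat_split <- data.frame(id = raw$id, wide, stringsAsFactors = FALSE)
dat_split
```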

Users of PCCC may find the R package icd useful for working with ICD codes.

PCCC Examples

To illustrate how input formatting affects the identification of a CCC, consider the data frame named dat below, which holds information about three patients (A-C). Each patient has the same ICD-9-CM diagnosis code (Hypertrophic obstructive cardiomyopathy, ICD-9-CM 425.1, which should be sent as 4251) and the same ICD-9-CM procedure code (Heart transplantation, ICD-9-CM 37.51, which should be sent as 3751), but each input is formatted differently. Based on the ICD-9-CM diagnosis code, the ccc function identifies only patient A as having a CCC. Based on the ICD-9-CM procedure code, the ccc function identifies only patient B as having a (cardiovascular) CCC and also flags the transplant category.

Basic Example

dat <- data.frame(ids = c("A", "B", "C"), 
                  dxs = c("4251", "425.1", "425.1"), 
                  procs = c("37.51", "3751", "37.51"))
dat
#>   ids   dxs procs
#> 1   A  4251 37.51
#> 2   B 425.1  3751
#> 3   C 425.1 37.51
ccc(dat, 
    id = ids, 
    dx_cols = dxs, 
    pc_cols = procs, 
    icdv = 9)
#>   ids neuromusc cvd respiratory renal gi hemato_immu metabolic congeni_genetic
#> 1   A         0   1           0     0  0           0         0               0
#> 2   B         0   1           0     0  0           0         0               0
#> 3   C         0   0           0     0  0           0         0               0
#>   malignancy neonatal tech_dep transplant ccc_flag
#> 1          0        0        0          0        1
#> 2          0        0        0          1        1
#> 3          0        0        0          0        0
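Assuming the only problem in dat is the embedded periods, a minimal fix is to strip non-alphanumeric characters from the code columns before calling ccc. This base R sketch does not add the zero-padding described earlier, and dat_clean and code_cols are illustrative names.

```r
dat <- data.frame(ids = c("A", "B", "C"),
                  dxs = c("4251", "425.1", "425.1"),
                  procs = c("37.51", "3751", "37.51"),
                  stringsAsFactors = FALSE)

# Remove periods and any other non-alphanumeric characters from the code columns.
code_cols <- c("dxs", "procs")
dat_clean <- dat
dat_clean[code_cols] <- lapply(dat_clean[code_cols],
                               function(x) gsub("[^[:alnum:]]", "", x))
dat_clean$dxs    # "4251" for every patient
dat_clean$procs  # "3751" for every patient
```

Passing dat_clean to ccc with the same arguments as above, we would expect all three patients to be flagged the same way patient B is.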

Extended Example

This example uses a tool developed by Seth Russell (available at icd_file_generator) to create sample data files for ICD-9-CM and ICD-10-CM. The generated data files contain randomly generated ICD codes for 1,000 patients and consist of 10 columns of diagnosis codes (d_cols), 10 columns of procedure codes (p_cols), and 10 columns of other data (g_cols).

Sample of how the ICD-9-CM test file was generated:

pccc_icd9_dataset <- generate_sample(
  v = 9,
  n_rows = 1000,
  d_cols = 10,
  p_cols = 10,
  g_cols = 10
)

save(pccc_icd9_dataset, file="pccc_icd9_dataset.rda")

Example using the sample patient data set:

library(dplyr)
library(pccc)

ccc_result <-
    ccc(pccc::pccc_icd9_dataset[, 1:21], # get id, dx, and pc columns
        id      = id,
        dx_cols = dplyr::starts_with("dx"),
        pc_cols = dplyr::starts_with("pc"),
        icdv    = 9)

# review results
head(ccc_result)
#>   id neuromusc cvd respiratory renal gi hemato_immu metabolic congeni_genetic
#> 1  1         0   0           0     0  1           0         0               0
#> 2  2         0   0           0     0  0           0         0               0
#> 3  3         1   0           0     0  1           0         0               0
#> 4  4         0   0           0     0  0           0         0               0
#> 5  5         0   0           0     0  0           0         0               0
#> 6  6         0   1           0     0  0           0         0               0
#>   malignancy neonatal tech_dep transplant ccc_flag
#> 1          0        0        0          0        1
#> 2          0        0        0          0        0
#> 3          0        0        1          0        1
#> 4          0        0        0          0        0
#> 5          0        0        0          0        0
#> 6          0        1        1          0        1

# view number of patients with each CCC
sum_results <- dplyr::summarize_at(ccc_result, vars(-id), sum) %>% print.data.frame
#>   neuromusc cvd respiratory renal  gi hemato_immu metabolic congeni_genetic
#> 1       102 151          64   119 106          61        80              25
#>   malignancy neonatal tech_dep transplant ccc_flag
#> 1        400       20      287         61      741

# view percent of total population with each CCC
dplyr::summarize_at(ccc_result, vars(-id), mean) %>% print.data.frame
#>   neuromusc   cvd respiratory renal    gi hemato_immu metabolic congeni_genetic
#> 1     0.102 0.151       0.064 0.119 0.106       0.061      0.08           0.025
#>   malignancy neonatal tech_dep transplant ccc_flag
#> 1        0.4     0.02    0.287      0.061    0.741
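Because each flag is coded 0/1, its sum is a count of patients and its mean is a proportion. The same summaries can be computed in base R with colSums() and colMeans(); the sketch below uses a small made-up data frame, toy, in place of ccc_result. With dplyr 1.0.0 or later, summarize_at() is superseded by across(), for example dplyr::summarize(ccc_result, dplyr::across(-id, mean)).

```r
# Toy stand-in for a ccc() result (made-up values): one row per id,
# 0/1 indicator columns.
toy <- data.frame(id         = 1:4,
                  cvd        = c(1, 0, 0, 1),
                  transplant = c(0, 0, 0, 1),
                  ccc_flag   = c(1, 0, 0, 1))

flags <- toy[setdiff(names(toy), "id")]
colSums(flags)   # number of patients with each flag
colMeans(flags)  # proportion of patients with each flag
```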

Pediatric Complex Chronic Conditions

Peter DeWitt

Seth Russell

James Feinstein

Tell Bennett

2026-01-22