medicalcoder: An R package for working with ICD codes and Comorbidity Algorithms
medicalcoder is a lightweight, base-R package for working with ICD-9 and ICD-10 diagnosis and procedure codes. It provides fast, dependency-free tools to look up, validate, and manipulate ICD codes, and it implements widely used comorbidity algorithms such as Charlson, Elixhauser, and the Pediatric Complex Chronic Conditions (PCCC). Designed for portability and reproducibility, the package avoids external dependencies (it requires only R ≥ 3.5.0) yet offers a rich set of curated ICD code libraries from the United States' Centers for Medicare and Medicaid Services (CMS) and Centers for Disease Control and Prevention (CDC), and from the World Health Organization (WHO).
The package balances performance with elegance: its internal caching, efficient joins, and compact data structures make it practical for large-scale health data analyses, while its clean design makes it easy to extend or audit. Whether you need to flag comorbidities, explore ICD hierarchies, or standardize clinical coding workflows, medicalcoder provides a robust, transparent foundation for research and applied work in biomedical informatics.
The primary objectives of medicalcoder are:
medicalcoder was in 2025.
The packages suggested in the DESCRIPTION file are only needed for building vignettes, other documentation, and testing. They are not required to install the package.
medicalcoder does not import any non-base namespaces. This improves ease of maintenance and usability.
Passing a data.table to the comorbidities() function, rather than a base data.frame or a tibble from the tidyverse, can reduce computation time (see benchmarking).
If you have the .tar.gz source file and R ≥ 3.5.0, that is all you need to install and use the package.
medicalcoder will flag comorbidities based on present-on-admission indicators for the current encounter and can look back in time for a patient to flag a comorbidity if it was reported in a prior encounter. See examples.
medicalcoder
There are several tools for working with ICD codes and comorbidity algorithms. medicalcoder provides novel features:
comorbidities().
The major factors affecting the expected computation time for applying a comorbidity algorithm to a data set are:
medicalcoder has been built so that no imports of other namespaces are required. That said, when a data.table is passed to comorbidities() and the data.table namespace is available, S3 dispatch for merge is used, along with some other methods, to reduce memory use and computation time.
flag.method: “current” will take less time than the “cumulative” method.
Details on the benchmarking method, summary graphics, and tables can be found in the benchmarking directory of the medicalcoder GitHub repository.
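Using data.table opportunistically, without importing it, follows a common pattern for suggested packages: check at run time whether the caller passed a data.table and whether its namespace loads, and otherwise fall back to base R. The sketch below illustrates that pattern under those assumptions only; left_join_sketch() and the example data are hypothetical and are not medicalcoder's internal code, which may differ.

```r
# Hypothetical helper (not part of medicalcoder): left-join 'y' onto 'x'.
# data.table is only suggested: it is never attached or imported, and it is
# used only when the caller already passed a data.table and the namespace loads.
left_join_sketch <- function(x, y, by) {
  use_dt <- inherits(x, "data.table") &&
    requireNamespace("data.table", quietly = TRUE)
  if (use_dt) {
    # With the data.table namespace loaded, merge() dispatches to
    # data.table's S3 method for 'x'
    merge(x, y, by = by, all.x = TRUE)
  } else {
    # Base R path: coerce tibbles and other data.frame subclasses first
    merge(as.data.frame(x), as.data.frame(y), by = by, all.x = TRUE)
  }
}

# Hypothetical long-format codes and a hypothetical lookup of flagged codes
codes <- data.frame(patid = c(1L, 1L, 2L),
                    code  = c("E119", "I10", "J45909"),
                    stringsAsFactors = FALSE)
flags <- data.frame(code = c("E119", "J45909"), flag = 1L,
                    stringsAsFactors = FALSE)

# A plain data.frame takes the base R path; I10 gets flag = NA
left_join_sketch(codes, flags, by = "code")
```

Because the check happens at call time, the package still installs and runs with base R alone; data.table only changes which merge method does the work.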
# Release version from CRAN
install.packages("medicalcoder")
# Development version from GitHub (requires the remotes package)
remotes::install_github("dewittpe/medicalcoder")
If you have the .tar.gz file for version X.Y.Z, e.g., medicalcoder_X.Y.Z.tar.gz, you can install it from within R via:
install.packages(
pkgs = "medicalcoder_X.Y.Z.tar.gz", # replace file name with the file you have
repos = NULL,
type = "source"
)
From the command line:
R CMD INSTALL medicalcoder_X.Y.Z.tar.gz
Pediatric Complex Chronic Conditions (PCCC)
Charlson Comorbidities
Elixhauser Comorbidities
All of the methods are available from the same function, comorbidities(). It supports age scores for Charlson, present-on-admission flags for all methods, and longitudinal data.
See more examples in the vignettes.
vignette(topic = "comorbidities", package = "medicalcoder")
vignette(topic = "pccc", package = "medicalcoder")
vignette(topic = "charlson", package = "medicalcoder")
vignette(topic = "elixhauser", package = "medicalcoder")
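The look-back idea for longitudinal data (a condition recorded at an earlier encounter stays flagged at later encounters for the same patient) can be sketched in base R as a per-patient cumulative maximum. This is a hypothetical illustration of the concept, not how comorbidities() or its flag.method argument is implemented; the visits data and the coded, flag_current, and flag_lookback columns are made up, and each row is assumed to be one encounter.

```r
# Hypothetical encounter-level data (not from medicalcoder): one row per
# encounter; 'coded' is 1 when the condition of interest was recorded.
visits <- data.frame(
  patid = c(1L, 1L, 1L, 2L, 2L),
  date  = as.Date(c("2016-01-05", "2016-02-10", "2016-03-15",
                    "2016-01-20", "2016-04-01")),
  coded = c(0L, 1L, 0L, 0L, 0L)
)

# Sort by patient, then date, so the cumulative step runs forward in time
visits <- visits[order(visits$patid, visits$date), ]

# Flag only at the encounter where the condition was recorded
visits$flag_current <- visits$coded

# Look back: once recorded, stay flagged at every later encounter
# (cummax applied separately within each patient via ave())
visits$flag_lookback <- ave(visits$coded, visits$patid, FUN = cummax)

visits
```

Patient 1 is flagged at the second and third encounters under the look-back rule, but only at the second under the current-encounter rule; patient 2 is never flagged.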
Input data for comorbidities() is expected to be in a ‘long’ format: each row is one code, with additional columns for patient and/or encounter identifiers.
data(mdcr, mdcr_longitudinal, package = "medicalcoder")
str(mdcr)
#> 'data.frame': 319856 obs. of 4 variables:
#> $ patid: int 71412 71412 71412 71412 71412 17087 64424 64424 84361 84361 ...
#> $ icdv : int 9 9 9 9 9 10 9 9 9 9 ...
#> $ code : chr "99931" "75169" "99591" "V5865" ...
#> $ dx : int 1 1 1 1 1 1 1 0 1 1 ...
head(mdcr)
#> patid icdv code dx
#> 1 71412 9 99931 1
#> 2 71412 9 75169 1
#> 3 71412 9 99591 1
#> 4 71412 9 V5865 1
#> 5 71412 9 V427 1
#> 6 17087 10 V441 1
head(mdcr_longitudinal)
#> patid date icdv code
#> 1 9663901 2016-03-18 10 Z77.22
#> 2 9663901 2016-03-24 10 IMO0002
#> 3 9663901 2016-03-24 10 V87.7XXA
#> 4 9663901 2016-03-25 10 J95.851
#> 5 9663901 2016-03-30 10 IMO0002
#> 6 9663901 2016-03-30 10 Z93.0
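If your data arrive in a ‘wide’ layout (one row per encounter, one column per diagnosis position), base R's stats::reshape() can stack them into the long format shown above. The data frame, its column names (encid, dx1, dx2), and the codes below are hypothetical; every stacked code is assumed to be a diagnosis, so dx is set to 1.

```r
# Hypothetical wide-format data: one row per encounter, one column per diagnosis
wide <- data.frame(
  encid = c(1L, 2L),
  icdv  = c(10L, 9L),
  dx1   = c("E119", "25000"),
  dx2   = c("I10", NA),
  stringsAsFactors = FALSE
)

# Stack dx1 and dx2 into a single 'code' column (one row per code)
long <- reshape(
  wide,
  direction = "long",
  varying   = c("dx1", "dx2"),
  v.names   = "code",
  timevar   = "position",
  idvar     = "encid"
)

# Drop empty diagnosis slots, keep the needed columns, and mark rows as dx codes
long <- long[!is.na(long$code), c("encid", "icdv", "code")]
long$dx <- 1L

# Order by encounter and reset row names for readability
long <- long[order(long$encid), ]
rownames(long) <- NULL
long
```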
The package contains internal data sets with reference information for US-based ICD-9 and ICD-10 diagnostic and procedure codes. These codes are supplemented with additional codes from the World Health Organization.
You can get a table of ICD codes via get_icd_codes().
str(medicalcoder::get_icd_codes())
#> 'data.frame': 227534 obs. of 9 variables:
#> $ icdv : int 9 9 9 9 9 9 9 9 9 9 ...
#> $ dx : int 0 0 0 0 0 0 1 0 1 0 ...
#> $ full_code : chr "00" "00.0" "00.01" "00.02" ...
#> $ code : chr "00" "000" "0001" "0002" ...
#> $ src : chr "cms" "cms" "cms" "cms" ...
#> $ known_start : int 2003 2003 2003 2003 2003 2003 1997 2003 1997 2003 ...
#> $ known_end : int 2015 2015 2015 2015 2015 2015 2015 2015 2015 2015 ...
#> $ assignable_start: int NA NA 2003 2003 2003 2003 NA NA 1997 2003 ...
#> $ assignable_end : int NA NA 2015 2015 2015 2015 NA NA 2015 2015 ...
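For CMS codes, the year columns (known_start, known_end, assignable_start, assignable_end) are United States fiscal years, which run October 1 through September 30 and are named for the calendar year in which they end. Comparing an encounter date with those columns therefore needs the date's fiscal year first; us_fiscal_year() below is a hypothetical base-R helper, not part of medicalcoder.

```r
# Hypothetical helper (not part of medicalcoder): map calendar dates to the
# US federal fiscal year (October 1 through September 30, named for the
# calendar year in which it ends).
us_fiscal_year <- function(date) {
  date  <- as.Date(date)
  year  <- as.integer(format(date, "%Y"))
  month <- as.integer(format(date, "%m"))
  # October, November, and December belong to the next fiscal year
  year + as.integer(month >= 10L)
}

us_fiscal_year(c("2012-09-30", "2012-10-01", "2013-09-30"))
#> [1] 2012 2013 2013
```

The last two dates both fall in fiscal year 2013, which started October 1, 2012, and concluded September 30, 2013.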
The columns are:
icdv
: integer value 9 or 10, for ICD-9 or ICD-10, respectively.
dx
: integer 0 or 1; 0 = procedure code, 1 = diagnostic code.
full_code
: character string for the ICD code with any appropriate decimal point.
code
: character string for the compact ICD code, that is, the ICD code without any decimal point; e.g., the full code C00.1 has the compact form C001.
src
: character string denoting the source of the ICD code information.
cms
: The ICD-9-CM (diagnosis and procedure), ICD-10-CM, or ICD-10-PCS codes curated by the Centers for Medicare and Medicaid Services (CMS).
cdc
: CDC mortality coding.
who
: World Health Organization.
known_start
: The earliest (fiscal) year for which source data for the code are available in the medicalcoder source code. Years for CMS codes are United States fiscal years; years for CDC and WHO codes are calendar years. The United States fiscal year starts October 1 and concludes September 30. For example, fiscal year 2013 started October 1, 2012, and concluded September 30, 2013.
To reemphasize: the year refers to the data available within medicalcoder, not necessarily to when a code first went into effect. For ICD-9-CM, the codes went into effect for fiscal year 1980. The source code only has documented source files for the codes dating back to
known_end
: The latest (fiscal) year when the code was part of the ICD system and/or known within the medicalcoder lookup tables.
Assignable codes. Some codes are header codes, e.g., ICD-10-CM three-digit code Z94 is a header code because the four-digit codes Z94.0, Z94.1, Z94.2, Z94.3, Z94.4, Z94.5, Z94.6, Z94.7, Z94.8, and Z94.9 exist. All but Z94.8 are assignable codes because no five-digit codes with the same initial four-digits exist. Z94.8 is a header code because the five-digit codes Z94.81, Z94.82, Z94.83, Z94.84, and Z94.89 exist.
assignable_start
: Earliest (fiscal) year when the code was assignable.
assignable_end
: Latest (fiscal) year when the code was assignable.
For example, the Z94 codes from CMS:
subset(
  x = medicalcoder::lookup_icd_codes("^Z94", regex = TRUE, full.codes = TRUE, compact.codes = FALSE),
  subset = src == "cms",
  select = c("full_code", "known_start", "known_end", "assignable_start", "assignable_end")
)
#> full_code known_start known_end assignable_start assignable_end
#> 1 Z94 2014 2026 NA NA
#> 5 Z94.0 2014 2026 2014 2026
#> 9 Z94.1 2014 2026 2014 2026
#> 14 Z94.2 2014 2026 2014 2026
#> 17 Z94.3 2014 2026 2014 2026
#> 22 Z94.4 2014 2026 2014 2026
#> 25 Z94.5 2014 2026 2014 2026
#> 29 Z94.6 2014 2026 2014 2026
#> 33 Z94.7 2014 2026 2014 2026
#> 38 Z94.8 2014 2026 NA NA
#> 41 Z94.81 2014 2026 2014 2026
#> 42 Z94.82 2014 2026 2014 2026
#> 43 Z94.83 2014 2026 2014 2026
#> 44 Z94.84 2014 2026 2014 2026
#> 45 Z94.89 2014 2026 2014 2026
#> 46 Z94.9 2014 2026 2014 2026
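The header/assignable definition above can be applied to a set of codes directly: within one ICD version and code type, a code is a header when another code in the set extends it. The base-R sketch below applies that rule to the compact forms of the Z94 codes listed above. is_header_code() is a hypothetical helper, not a medicalcoder function; it ignores the time dimension (a code can change status between years) and is not necessarily how the package computes the assignable_* columns.

```r
# Hypothetical helper (not a medicalcoder function). Assumes all codes share
# one ICD version, one code type (diagnosis or procedure), and one point in
# time: a code is a header if another code in the set starts with it.
is_header_code <- function(codes) {
  vapply(codes, function(cd) {
    others <- codes[codes != cd]
    any(startsWith(others, cd))
  }, logical(1), USE.NAMES = FALSE)
}

# Compact forms of the Z94 codes listed above
z94 <- c("Z94", paste0("Z94", 0:9), paste0("Z948", c(1:4, 9)))

# Header codes; expected: "Z94" and "Z948"
z94[is_header_code(z94)]

# Assignable codes: the remaining 14 codes
z94[!is_header_code(z94)]
```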
Additionally, get_icd_codes() can provide descriptions and the ICD hierarchy via the with.descriptions and/or with.hierarchy arguments.
The functions lookup_icd_codes(), is_icd(), and icd_compact_to_full() are also provided for working with ICD codes.
More details and examples are in the vignette:
vignette(topic = "icd", package = "medicalcoder")