Glossary of REDCap and REDCapTidieR Terms

The REDCapTidieR package uses vocabulary that is standard for REDCap database architects but not necessarily well known to all R users. It also introduces several idiosyncratic terms.

Below we provide a rough mapping of REDCap concepts to their corresponding artifacts in REDCapTidieR. This is followed by a listing of term definitions.

REDCap              REDCapTidieR
------------------  ------------------------------------------------------------
Project, Database   Supertibble
Instrument, Form    One row of the supertibble; the data is in the data tibble
Field               Data column (a column of the data tibble)
Field name          Variable name of a data column
Field type          Data type of a data column
Field label         Variable label of a data column (only present if the supertibble is labelled)
Record              One or several rows of a data tibble; the record ID column is the first column of a data tibble
Event               redcap_event column of a data tibble (only present if the project is longitudinal)
Arm                 redcap_arm column of a data tibble (only present if the project is longitudinal with multiple arms)
Repeat Instrument   redcap_form_instance column of a data tibble (only present if the instrument is repeating)
Repeat Event        redcap_event_instance column of a data tibble (only present if the instrument is associated with a repeating event)
Data Access Group   redcap_data_access_group column of a data tibble (only present if the project has data access groups enabled)

Arm

An ordered group of events. Arms allow a single longitudinal project to define multiple distinct sequences of events.

Block matrix

A rectangular data structure (matrix) constructed from multiple smaller rectangular data structures (blocks). In the context of REDCap, the block matrix is the rectangular data set returned by the REDCap API, which contains data from multiple instruments.
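To make the idea concrete, here is a minimal sketch that splits a toy block matrix into one tibble per instrument using dplyr. It assumes a hypothetical project with one nonrepeating instrument (demographics, field age) and one repeating instrument (vitals, field heart_rate); in REDCap's flat export, rows belonging to a repeating instrument are flagged by the redcap_repeat_instrument and redcap_repeat_instance columns. The data are made up, and REDCapTidieR's own implementation handles many more cases (events, arms, checkbox fields, and so on).

```r
library(tibble)
library(dplyr)

# Toy block matrix (hypothetical): a nonrepeating instrument ("demographics")
# and a repeating instrument ("vitals") stacked in a single rectangle.
block_matrix <- tibble(
  record_id                = c(1, 1, 1, 2, 2),
  redcap_repeat_instrument = c(NA, "vitals", "vitals", NA, "vitals"),
  redcap_repeat_instance   = c(NA, 1, 2, NA, 1),
  age                      = c(34, NA, NA, 51, NA),  # demographics field
  heart_rate               = c(NA, 72, 80, NA, 65)   # vitals field
)

# Nonrepeating instrument: rows without a repeat instrument,
# giving one row per record.
demographics <- block_matrix |>
  filter(is.na(redcap_repeat_instrument)) |>
  select(record_id, age)

# Repeating instrument: rows flagged for "vitals", giving one row per
# record per instance; the instance column is renamed along the way.
vitals <- block_matrix |>
  filter(redcap_repeat_instrument == "vitals") |>
  select(record_id, redcap_form_instance = redcap_repeat_instance, heart_rate)
```

Each resulting tibble has a single, consistent granularity and no longer carries the empty cells that belonged to the other instrument.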

Column

A vertical series of cells in a data frame or tibble. Synonym: Variable. See also: Row.

Composite primary key

A primary key is a column whose value is distinct in each row of a table and therefore identifies that row. A composite primary key consists of multiple columns whose combined values are distinct in each row and therefore identify each row. Taken together, the identifier columns of a data tibble form a composite primary key, which makes it easy to join data tibbles together.
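A minimal sketch, assuming two hypothetical data tibbles from a longitudinal project (the instrument names, fields, and values are made up). Because record_id and redcap_event jointly identify each row, passing both to dplyr::left_join() matches every row to exactly one partner.

```r
library(tibble)
library(dplyr)

# Hypothetical data tibbles; record_id + redcap_event is the composite key.
vitals <- tibble(
  record_id    = c(1, 1, 2),
  redcap_event = c("baseline", "follow_up", "baseline"),
  heart_rate   = c(72, 80, 65)
)
labs <- tibble(
  record_id    = c(1, 1, 2),
  redcap_event = c("baseline", "follow_up", "baseline"),
  glucose      = c(5.1, 5.6, 4.9)
)

# Join on the full composite key so each row pairs with exactly one row.
combined <- left_join(vitals, labs, by = c("record_id", "redcap_event"))
```

Joining on record_id alone would instead pair each vitals row with every labs row for the same record, multiplying the rows for record 1.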

Choice

An option or category of a single-answer or multi-answer categorical field in REDCap. You can define choices using the REDCap Field Editor. Each choice has a raw value (a unique identifier, usually a serial number, although this can be changed) and a choice label (a human-readable description of the choice, which is displayed during data entry).

In the context of REDCapTidieR, choices come into play in two ways during the construction of the data tibble. The choice labels of single-answer fields (dropdown and radio) become the values of the data columns derived from those fields. The raw values of a multi-answer checkbox field are used to construct the names of the data columns derived from it.
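The base R sketch below illustrates both cases, assuming a hypothetical dropdown field with raw values 1 to 3 and a hypothetical checkbox field named symptoms whose raw values are 1, 2, and 4. The field___rawvalue naming shown for the checkbox is the convention REDCap uses in its exports; the exact column types REDCapTidieR produces for checkbox data are not shown here.

```r
# Hypothetical choices for a single-answer dropdown field.
severity_raw_values <- c("1", "2", "3")
severity_labels     <- c("Mild", "Moderate", "Severe")

# Raw values as they might arrive from the API for three records.
severity_raw <- c("2", "1", "3")

# Single-answer field: the choice labels become the values of a factor whose
# levels follow the order in which the choices were defined.
severity <- factor(severity_raw, levels = severity_raw_values,
                   labels = severity_labels)

# Hypothetical multi-answer checkbox field: REDCap exports one column per
# choice, named <field name>___<raw value> (three underscores).
symptom_raw_values <- c("1", "2", "4")
checkbox_columns   <- paste0("symptoms", "___", symptom_raw_values)
```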

Classic project

Also known as a traditional project, this is the simplest type of REDCap project. You can define one or multiple instruments (also called forms) for data entry. Both repeating and nonrepeating instruments are allowed. Nonrepeating instruments are completed only once for each record; for these, one row of the data tibble represents one record. Repeating instruments can be completed an arbitrary number of times for each record; for these, one row of the data tibble represents one repeat instance of one record. See also: Longitudinal project.

Data Access Group

The Data Access Group (DAG) feature of REDCap streamlines multi-group collaboration by partitioning the records of a single project into groups. This feature is particularly useful when you want certain users or groups of users to have access only to a specific subset of the data in a project.

In a multi-site study, for instance, you might want each site to have access only to its own data. By setting up a DAG for each site, you can ensure that site users can view and edit only the records that belong to their DAG. Super users (i.e., those with full privileges) can view and edit all records in the project, regardless of the DAG to which they belong.

When a project has DAGs enabled, the data tibbles include a redcap_data_access_group column that identifies the DAG to which each record belongs.

Database

In the context of REDCap, this is the same as project. We prefer the term “project” because it has a more specific meaning.

Data column

A column of the data tibble that is derived from data that were entered into the fields of a REDCap instrument.

Data tibble

A tibble that contains data that were entered into the fields of a specific REDCap instrument. The redcap_data column of the supertibble contains the data tibbles of a project. The columns of the data tibble include identifier columns that jointly identify each row and data columns that contain the data entered into REDCap. REDCapTidieR provides several functions to extract data tibbles from the supertibble. See also: Metadata tibble.

Data viewer

A feature of the RStudio IDE that lets you inspect data frames, tibbles, and some other data structures. It supports basic exploratory data analysis such as sorting, filtering, and searching. The supertibble is designed to work well with the data viewer.

Environment

A fundamental data structure in R that binds a set of names to a set of objects. The global environment is where the objects you create during interactive work, such as values and tibbles, are bound. The bind_tibbles() function takes a supertibble and binds its data tibbles to the global environment.
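bind_tibbles() does this for you. The base R sketch below only illustrates the underlying idea of binding names to objects in an environment; it is not how bind_tibbles() is implemented, and the tibble names and contents are hypothetical. A fresh environment is used instead of the global environment so that the example has no side effects.

```r
library(tibble)

# Hypothetical named list of data tibbles, standing in for the redcap_data
# column of a supertibble.
data_tibbles <- list(
  demographics = tibble(record_id = c(1, 2), age = c(34, 51)),
  vitals       = tibble(record_id = c(1, 1), heart_rate = c(72, 80))
)

# Bind each tibble to its name inside a new environment.
target_env <- new.env()
invisible(list2env(data_tibbles, envir = target_env))

# The tibbles can now be looked up by name in target_env.
ls(target_env)
```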

Event

A part of a longitudinal project. Each event can be associated with one or multiple instruments and may be either repeating or nonrepeating.

Factor variable

A data type in R for categorical data. By default, single-answer categorical REDCap field types (dropdown, radio) are represented as factor variables in the data tibble.

Field

An attribute of an entity (e.g., age or height) that can be captured in REDCap. Instruments are made up of fields. You can configure the fields of an instrument using the REDCap Field Editor. Each field has a field type and can have a descriptive field label. The data tibbles contain the data entered into the fields of a REDCap project.

Field label

A piece of text that acts as the prompt for data entry in REDCap. The make_labelled() function creates variable labels based on field labels.

Field type

The data type of the data that can be entered into a specific field. Important field types include:

Form

In the context of REDCap, this is the same as an instrument. We prefer the term “instrument” because it has a more specific meaning than “form.”

Format helper

A function provided by REDCapTidieR that helps turn the field labels of data columns into clean variable labels. See format-helpers.
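REDCapTidieR ships its own format helpers. Purely as an illustration of the kind of cleanup such a function performs, the sketch below defines a hypothetical helper, strip_label_noise(), that trims whitespace and drops a trailing colon from field labels. It is not one of the package's functions.

```r
# Hypothetical format helper: takes a character vector of field labels and
# returns cleaned-up variable labels.
strip_label_noise <- function(labels) {
  labels <- trimws(labels)  # remove leading and trailing whitespace
  sub(":$", "", labels)     # drop a single trailing colon, if present
}

strip_label_noise(c("  Age at enrollment: ", "Heart rate"))
# → "Age at enrollment" "Heart rate"
```

Trimming happens first so that a colon followed by trailing spaces is still recognized as the last character.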

Granularity

The level of detail that a specific row in a data tibble represents. This depends on the structure of the project (classic vs. longitudinal vs. longitudinal with arms), the structure of the instrument (repeating vs. nonrepeating), and, for longitudinal projects, the structure of the event (repeating vs. nonrepeating). For example, a data tibble containing data from a nonrepeating instrument in a longitudinal project with two arms has a granularity of one row per record per event per arm. See also: the section Longitudinal REDCap projects in the Diving Deeper vignette.
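One way to check the granularity of a data tibble is to confirm that its identifier columns are distinct in every row. The sketch below uses a hypothetical helper, is_key(), and made-up data for a repeating instrument in a longitudinal project, where the expected granularity is one row per record per event per instance. It is an illustration, not a REDCapTidieR function.

```r
library(tibble)

# Hypothetical data tibble for a repeating instrument in a longitudinal project.
vitals <- tibble(
  record_id            = c(1, 1, 1, 2),
  redcap_event         = c("baseline", "baseline", "follow_up", "baseline"),
  redcap_form_instance = c(1, 2, 1, 1),
  heart_rate           = c(72, 75, 80, 65)
)

# TRUE if no combination of values in `cols` occurs in more than one row.
is_key <- function(data, cols) anyDuplicated(data[cols]) == 0

is_key(vitals, c("record_id", "redcap_event", "redcap_form_instance"))  # TRUE
is_key(vitals, c("record_id", "redcap_event"))                          # FALSE
```

Dropping redcap_form_instance breaks uniqueness because record 1 has two instances at baseline, which is exactly the extra level of detail a repeating instrument adds.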

Identifier column

A column in the data tibble that serves to partially identify the entity described in a row. The record ID column is always present in the data tibble. Depending on the structure of the project (classic vs. longitudinal vs. longitudinal with arms), the structure of the instrument (nonrepeating vs. repeating), and the structure of the event (repeating vs. nonrepeating), there may be additional identifier columns, including redcap_event, redcap_arm, redcap_form_instance, and redcap_event_instance. Taken together, the identifier columns form a composite primary key. See also: the section Longitudinal REDCap projects in the Diving Deeper vignette.

Import

In the context of REDCapTidieR, this is the process of using the REDCap API to query data from a REDCap project and make it available inside the R environment. We use the term “import” in the sense described in R for Data Science, which is to “take data stored in a file, database, or web application programming interface (API), and load it into a data frame in R.” Of note, the term “import” is ambiguous: from the perspective of REDCap, “import” may mean writing external data into the database. To make the direction of the import clear, we named the main function of REDCapTidieR read_redcap(), which is analogous to other import functions in the tidyverse such as read_csv(). You can use the read_redcap() function to import data from a REDCap project.

Instrument

Also called form. An electronic data entry form in REDCap. An instrument contains fields into which data can be entered. In the supertibble, each row corresponds to one instrument. The instrument’s name and human-readable label are shown in the redcap_form_name and redcap_form_label columns of the supertibble, respectively. A data tibble contains all the data entered into a specific instrument.

labelled

The labelled R package provides functions to attach a human-readable description, called a variable label, to a variable. Labelled data can streamline data exploration and help with generating a data dictionary. Several packages support labelled data. The make_labelled() function attaches variable labels to the variables of a supertibble and to the variables of the data tibbles and metadata tibbles it contains.
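make_labelled() does this for an entire supertibble. As an illustration of the underlying mechanism only, the sketch below uses labelled::var_label() to attach field labels from a small, hypothetical metadata tibble to the matching columns of a data tibble. The field names and labels are made up.

```r
library(tibble)
library(labelled)

# Hypothetical data tibble and the matching rows of its metadata tibble.
demographics <- tibble(record_id = c(1, 2), age = c(34, 51))
metadata <- tibble(
  field_name  = c("record_id", "age"),
  field_label = c("Record ID", "Age at enrollment")
)

# Attach each field label as the variable label of the matching data column.
for (i in seq_len(nrow(metadata))) {
  var_label(demographics[[metadata$field_name[i]]]) <- metadata$field_label[i]
}

var_label(demographics$age)
# → "Age at enrollment"
```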

List column

A list is a fundamental data structure in R. A tibble can contain columns that are lists; these are called list columns. REDCapTidieR uses list columns to store tibbles inside the supertibble. For example, the redcap_data column of the supertibble is a list column that contains data tibbles, and redcap_metadata is a list column that contains metadata tibbles.
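The sketch below builds a minimal, hypothetical stand-in for a supertibble to show what a list column holds. In practice you would use the extraction functions REDCapTidieR provides; the plain subsetting here only makes the structure visible, and the instrument names and data are made up.

```r
library(tibble)

# Hypothetical stand-in for a supertibble: one row per instrument, with each
# instrument's data tibble stored in the redcap_data list column.
supertbl <- tibble(
  redcap_form_name = c("demographics", "vitals"),
  redcap_data = list(
    tibble(record_id = c(1, 2), age = c(34, 51)),
    tibble(record_id = c(1, 1, 2), heart_rate = c(72, 80, 65))
  )
)

# Each cell of the list column holds a whole tibble; pull one out with [[.
vitals <- supertbl$redcap_data[[which(supertbl$redcap_form_name == "vitals")]]

# Summaries such as row counts can be computed across the list column.
row_counts <- vapply(supertbl$redcap_data, nrow, integer(1))
```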

Longitudinal project

A type of REDCap project that contains events and optionally arms. One instrument can be associated with multiple events. This makes it possible to collect the same kind of data for the same record multiple times, which is useful for longitudinal research studies with multiple study visits. See also: Classic project.

Metadata tibble

A tibble that contains metadata about a specific REDCap instrument. The redcap_metadata column of the supertibble contains the metadata tibbles of a project. The rows of the metadata tibble represent fields of the instrument. The columns represent attributes of those fields. For example, the field_name, field_label, and field_type columns show the field’s name, a human-readable description (the field label), and its field type.

Nonrepeating Event

An event whose associated instruments can be filled out exactly once per record per event (and per arm, if applicable). See also: Repeating Event.

Nonrepeating Instrument

An instrument that can be filled out exactly once per record in a classic project and once per record per event instance (and per arm, if applicable) in a longitudinal project. See also: Repeating Instrument.

Project

Also called a database, a REDCap project is a self-contained collection of all the data and metadata related to some data collection activity (for example, a specific research study). A project may be classic or longitudinal. A classic project consists of instruments that contain fields. A longitudinal project may additionally include events and arms. You can use read_redcap() to import the data from a project.

Record

The set of information about a single entity (e.g., a study participant) for which data is being captured in a specific REDCap project. Each record consists of discrete data values organized into fields that can be spread across multiple instruments, events, and/or arms. Each record has a unique record ID. In the data tibble, the record ID is always the first column. The record ID column is one of the identifier columns.

REDCap API

The application programming interface (API) of a REDCap instance allows external programs to connect to it and to upload and download data. To access the REDCap API, a user needs the appropriate access privileges, an API token, and the uniform resource identifier (URI) of the API endpoint (something like “my.institution.edu/redcap/api”). The REDCapTidieR package uses REDCapR to query the REDCap API.

REDCapR

The REDCapR R package provides functions to interact with the REDCap API. REDCapTidieR builds on REDCapR to import data into R.

Repeating Event

An event whose associated instruments can be filled out zero, one, or multiple times per record per event (and per arm, if applicable). Note: REDCap does not allow repeating instruments inside repeating events. See also: Nonrepeating Event.

Repeating Instrument

An instrument that can be filled out zero, one, or multiple times per record in a classic project and zero, one, or multiple times per record per event (and per arm, if applicable) in a longitudinal project. Note: REDCap does not allow repeating instruments inside repeating events. See also: Nonrepeating Instrument.

Row

A horizontal series of cells in a data frame or tibble. One row of a supertibble represents an instrument. One row of a data tibble can represent different things, depending on the granularity of the data. See also: Column.

skimr

The skimr R package provides summary statistics to help users quickly skim and understand their data. REDCapTidieR’s add_skimr_metadata() function uses skimr to add summary statistics for each field to the metadata tibbles. See also: the section Adding summary statistics to the metadata with the skimr package in the Getting Started vignette.

Structure

The structure of an instrument can be repeating, nonrepeating, or mixed; mixed-structure instruments are supported as of REDCapTidieR v1.1.0. The supertibble shows the instrument’s structure in the structure column. The structure of a project can be classic, longitudinal, or longitudinal with arms. The structure of an event can be repeating or nonrepeating. The granularity of a data tibble depends on the structure of all three: the instrument, the project, and the events associated with the instrument. Note: REDCap does not allow repeating instruments inside a repeating event. See also: the section Longitudinal REDCap projects in the Diving Deeper vignette.

Supertibble

A special tibble that contains the data and metadata of a REDCap project, returned by the read_redcap() function. Each row of the supertibble corresponds to one instrument. The redcap_form_name and redcap_form_label columns identify the instrument. The redcap_data and redcap_metadata columns contain the instrument’s data tibble and metadata tibble. Additional columns contain useful information about the data tibble, such as row and column counts, size in memory, and the percentage of missing values in the data.

Survey

A special kind of instrument that can be completed by someone who is not a user on a REDCap project.

Tibble

A variant of the R data frame that makes data analysis in the tidyverse a little easier. The data structures generated by REDCapTidieR are based on tibbles. See also: chapter on Tibbles in R for Data Science.

Tidy

The term “tidy” is part of REDCapTidieR’s name because it underlies two key ideas of the package.

The first is the concept of Tidy Data. A rectangular data structure is tidy if:

  1. Each variable forms a column
  2. Each observation forms a row
  3. Each type of observational unit forms a table (i.e., the granularity of rows in a table is consistent)

Data returned by the REDCap API (the “block matrix”) often satisfies the first two requirements of tidy data. However, if the project contains both repeating and nonrepeating instruments and/or events, then the granularity is inconsistent from row to row. A key function of the REDCapTidieR package is to break down the block matrix by instrument. The resulting data tibbles tend to be tidier than the block matrix because the granularity within each individual data tibble is consistent, which makes them easier to work with.

The second is the idea of Tidy Tools, a set of design guidelines for the packages of the tidyverse. Tidy tools should follow these principles:

  1. Reuse existing data structures.
  2. Compose simple functions with the pipe.
  3. Embrace functional programming.
  4. Design for humans.

We strive to follow these principles in the design of the REDCapTidieR package.

Variable

A column of a data frame or tibble. See also: Column.

Variable label

A human-readable description (label) attached to a variable. See also: labelled.