The purpose of the sendigR
package is to extract data from a set of nonclinical studies stored in CDISC SEND format in a database to be used for cross study analysis. It can either be done in a script by execution of a set of functions from the package to extract data for further processing or by execution an encapsulated R Shiny application.
The package supports currently these types of databases:
This vignettes introduces how to use sendigR
to:
Familiarity with the CDISC SEND data model and at least basic knowledge about animal studies is necessary to follow this vignette and to use the package.
Before we are ready to use the functions in the package, we must ensure that a minimum of prerequisites are fulfilled.
Beside the required packages listed in the Imports section of DESCRIPTION, one or more additional package(s) must be installed:
If an existing SEND database is to be accessed, it must contain a set of tables and columns representing the domains and variables described in the SEND IG version 3.0 or 3.1 - or a union of the two versions.
It is required that all names and basic data types (i.e. numerical or character) are as described the the SEND IG.
If study data are to be imported into a SQLite database, data for each study must be located in a separate folder as SAS transport files - also konw as XPT files.
Multiple studies may be located together in a folder hierarchy as described below.
Several of the function in the packages uses CDISC SEND terminology code list to verify validity of the SEND data. The package includes the values for the relevant code lists extracted from the newest CT version at build time. If it is relevant to use another version - newer or older - this version must be downloaded in Excel format from NCI or CDISC library archives and saved in any folder which is accessible from the R script(s) using this package.
The package contains 5 sets of functions covering different areas - each of these are described below, including detailed examples.
Before any use of the package’s function, a database must be opened and the database must be closed again as the final action.
Function | Description |
---|---|
initEnvironment | Initialize the environment. It must be executed as the absolute first function before any other package functions. It opens an existing or create a new empty database and returns a handle to the database, which must be used in all following function calls. |
disconnectDB | Close the open database. It must be called as the absolute last function in a session. |
The call of initEnvironment
returns a handle with a reference to the open database and pointers to low level functions accessing the specific type of database. This handle must be given as the first input parameter in the execution of all other function in the package. It is possible to open and access multiple databases - also of different types - in the same R session. Just execute initEnvironment
for each relevant database and save the returned db handles in each distinct variable.
Examples in the following sections demonstrates the use of these functions in a complete work flow.
It is possible to build an SQLite database with SEND data imported from a set of studies.
Function | Description |
---|---|
dbCreateSchema | Create a SEND schema, i.e. all the tables to represent the domains documented in SEND IG, in an open and empty database |
dbCreateIndexes | Create a set of indexes on the tables in a SEND database to optimize performance of extraction of data |
dbImportOneStudy | Import SEND study data in SAS xport format into a SEND database from a single study folder |
dbImportStudies | Import SEND study data in SAS xport format into a SEND database from a hierarchy study folders |
dbDeleteStudies | Delete one or more studies in SEND database |
It should be possible to import data into an existing SQLite SEND database using dbImportOneStudy
and dbImportStudies
, even though the tables are not created by dbCreateSchema
. It is possible to delete studies in any SQLite SEND database dbDeleteStudies
- whether or not the tables are created by dbCreateSchema
.
To make it possible to import data for a set of studies with dbImportStudies
, the input data should be saved in folder structure like this:
/path/to/SEND/datasets
+-- study01
| +-- ts.xpt
| +-- dm.xpt
| +-- ex.xpt
| +-- etc.
+-- study02
| +-- ts.xpt
| +-- dm.xpt
| +-- ex.xpt
| +-- etc.
+-- proj1234
+-- study11
| +-- ts.xpt
| +-- dm.xpt
| +-- ex.xpt
| +-- etc.
+-- study12
| +-- ts.xpt
| +-- dm.xpt
| +-- ex.xpt
| +-- etc.
+-- etc.
It is recommended to download DB Browser for SQLite to look at the data imported into the database.
Create a new database, create all the SEND domain tables and import data for a set of studies XPT files saved in a folder hierarchy - create a set of indexes to optimize data extraction performance for the sendigR functions:
# The database is created by the call of initEnvironment with the parameter dbCreate = TRUE:
<- initEnvironment(dbType='sqlite',
dbToken dbPath='/path/to/db/send.db',
dbCreate=TRUE)
# The tables must be created before any study can be imported
dbCreateSchema(dbToken)
<- dbImportStudies(dbToken, '/path/to/SEND/datasets',
status # Print contiously the status for import each study:
verbose = TRUE,
# sand save the status in a log file:
logFilePath = '/path/to/log file')
# Create a set of indexes to increase query performance for the data extraction functions
# - they may be created before of after import of data
dbCreateteIndexes(dbToken)
disconnectDB(dbToken)
Open an existing database and import data for a single study, allow it to replace potential existing data for the the same study in the database. Delete another study from the database:
<- initEnvironment(dbType='sqlite', dbPath='/path/to/db/send.db')
dbToken
dbImportOneStudy(dbToken, '/path/to/SEND/datasets/proj1234/study11', overWrite=TRUE)
dbDeleteStudies(dbToken, 'study99')
disconnectDB(dbToken)
These functions are divided into different groups:
genericQuery
gives the possibility to execute a SQL query against the database, i.e.it executes any user defined select statement and returns the selected rows. It is used internally by the functions described below, but can also be used directly to fetcg rows which ar e not possible with any of the specific extraction functions described below.TS
subset of input studies. In the latter case, the extraction/filtering is based on the full TS
table.getStudies
TSPARMCD
, where TSPARMCD
is the specific TSPARMCD value handled by the function - e.g. STSTDTC
.getControlSubj
, which extracts and return the set of control animals for a given input set of study id values.getSubj
attributes
where attributes
is the specific attribute handled by the function - e.g. Sex
.getSubjData
, which extracts and return all rows for a given input set of animals from a specified subject level table - it can be any SEND table including USUBJID
column.getFindings
attributes
where attributes
is the specific attribute handled by the function - e.g. Phase
.The functions rely on the data.table package - and all input/output sets of data are data tables.
The complete list of functions:
Function | Description |
---|---|
genericQuery | Execute database query and returns fetched rows |
getStudiesSDESIGN | Extract a list of SEND studies with a specified study design |
getStudiesSTSTDTC | Extract a list of SEND studies with study start date within a specified interval |
getControlSubj | Extract a list of control animals for a list of studies |
getSubjRoute | Extract the set of animals of the specified route of administration - or just add actual route of administration for each animal. |
getSubjSpeciesStrain | Extract the set of animals of the specified species and strain - or just add the species and strain for each animal. |
getSubjSex | Extract the set of animals of the specified sex - or just add the sex of each animal. |
getSubjData | Extract data from a subject level domain. |
getFindingsPhase | Extract a set of findings for a specified study phase - or just add phase for each animal |
getFindingsSubjAge | Add the subject age at finding time - and optionally extract the set of findings within a specified range of ages |
gen_vocab | Create json file for vocabulary mappings |
standardize_file | Standardizes SEND xpt files using CDISC controlled terminologies |
All the functions have an input parameter called inclUncertain
, default value is FALSE
.
If a function which is doing any kind of filtering is executed with:
inclUncertain = FALSE
getSubjSpeciesStrain
cannot decide the species for an animal because the value does not match a value in the CDISC SEND CT code list ‘value SPECIES, the animal is excluded even though it could be a simple misspelling like ’RAAT’ instead of ‘RAT’inclUncertain = TRUE
UNCERTAIN_MSG
is included in the output data set - it contains an explanation of the uncertainty for the uncertain rows.function id
:
explanation
, where function id
indicates from which function the message originates from, e.g. SpeciesStrain
for function getSubjSpeciesStrain
. Multiple messages from same function are separated by &
.UNCERTAIN_MSG
variable, uncertain messages are added to this variable. Messages from multiple functions are concatenated and separated by |
.As mentioned above, some of the functions may be executed with no filter conditions just to add the relevant attributes to a set of data, e.g. a SEX
variable is added to the output data set by execution of getSubjSex
. In these cases, the additional variable is populated with the found value for all rows, also if the value cannot be confidently identified - e.g. when a DM.SEX value like ‘Male’ does not match any value in the CDISC SEND CT code list value SEX.
These functions have an input parameter called noFilterReportUncertain
, which is equivalent to the inclUncertain
parameter described above.
If one of these functions is executed with no filter condition and noFilterReportUncertain = FALSE
(default), no explanation of whether the identified values are confident or not is included.
If a function is executed with no filter condition and noFilterReportUncertain = TRUE
, a NOT_VALID_MSG
variable is included in the output data set with an explanation of the reason for not confident values which may for found.
The format of this variable is equivalent to the UNCERTAIN_MSG
variable described above.
The intention with these functions are to build a work flow or pipeline to extract set(s) of animals, findings or other kinds of subject level data to be used for further analysis or similar - see an example below.
Extract data to make this analysis:
What are the most common microscopic findings in control Sprague-Dawley male rats 10 to 14 weeks old at finding time that were dosed by oral or oral gavage. Include parallel studies after 2016 and findings from the treatment phase. Include uncertain records.
<- initEnvironment(dbType='sqlite',
db dbPath='/path/to/db/send.db')
# Extract the relevant set of studies into data.table 'studies'
<-
studies getStudiesSDESIGN(db,
studyDesignFilter = 'PARALLEL',
inclUncertain = TRUE) %>%
getStudiesSTSTDTC(db, .,
fromDTC = '2016',
inclUncertain = TRUE)
# Extract the complete set of control animals for the set of extracted studies
# into data.table 'controlAnimals':
<-
controlAnimals getControlSubj(db, studies,
inclUncertain = TRUE)
# Extract all
# 1: the males
# 2: Sprague-Dawley rats
# 3: animals dosed by oral or oral gavage
# from the set of control animals into data.table 'animals':
<-
animals getSubjSex(db, controlAnimals,
sexFilter = 'M',
inclUncertain = TRUE) %>%
getSubjSpeciesStrain(db, .,
speciesFilter = 'RAT',
strainFilter = 'Sprague-Dawley',
inclUncertain = TRUE) %>%
getSubjRoute(db, .,
routeFilter = c('ORAL', 'ORAL GAVAGE'),
inclUncertain = TRUE)
# Extract microscopic findings
# 1: all MI rows for the extrated set of animals
# 2: keep the MI rows from the treatment phase
# 3: keep the MI rows where the animals are between 10 and 14 weeks at the time
# of finding
# into data.table 'dataMI':
<-
dataMI getSubjData(db, animals, 'MI') %>%
getFindingsPhase(db, .,
phaseFilter = "treatment",
inclUncertain = TRUE) %>%
getFindingsSubjAge(db, .,
fromAge = '10w',
toAge = '14w',
inclUncertain = TRUE)
disconnectDB(db)
The resulting data.table dataMI
contains all the extracted findings fulfilling the wanted conditions and can be used for further analysis.
It includes both findings
animal.UNCERTAIN_MSG
columndataMI.UNCERTAIN_MSG
column.The sendigR package contains an encapsulated Shiny app - SendDashboard. The app enables the user to search and extract historical control data in a SEND database.
Function | Description |
---|---|
execSendDashboard | Starts the SendDashboard app |
Connect to the database, start the app. Remember to disconnect from database when leaving the app.
library(sendigR)
<- initEnvironment(dbType='sqlite', dbPath='/path/to/db/send.db')
dbToken
execSendDashboard(dbToken)
disconnectDB(dbToken)