Directly quoting from Fornes O, Castro-Mondragon JA, Khan A, et al:
JASPAR (https://jaspar.elixir.no) is an open-access database of curated, non-redundant transcription factor (TF)-binding profiles stored as position frequency matrices (PFMs) for TFs across multiple species in six taxonomic groups. In this 8th release of JASPAR, the CORE collection has been expanded with 245 new PFMs (169 for vertebrates, 42 for plants, 17 for nematodes, 10 for insects, and 7 for fungi), and 156 PFMs were updated (125 for vertebrates, 28 for plants and 3 for insects). These new profiles represent an 18% expansion compared to the previous release.
source:
Fornes O, Castro-Mondragon JA, Khan A, et al. JASPAR 2020: update of the open-access database of transcription factor binding profiles. Nucleic Acids Res. 2019; doi: 10.1093/nar/gkz1001
JASPAR is a database of transcription factor binding matrices with annotations and metadata. These entities are organized in a hierarchical fashion that we will explore next.
In addition to the latest JASPAR database release (2020), other
active releases are also available. Most of the rbioapi JASPAR functions
have a release argument that allows you to use other database releases.
Within a release, matrix profiles are organized into collections. You
can use rba_jaspar_collections() to get a list of available
collections, or read the “JASPAR Collections” section of the
documentation page on the JASPAR website for a thorough review.
## To get a list of available collections in release 2020:
rba_jaspar_collections(release = 2020)
#> name url
#> 1 CORE https://jaspar.elixir.no/api/v1/collections/CORE/
#> 2 UNVALIDATED https://jaspar.elixir.no/api/v1/collections/UNVALIDATED/
## You can list information of all matrices available in a collection:
mat_in_core_2020 <- rba_jaspar_collections_matrices(collection = "CORE")
Within each collection, the matrix profiles are organized based on main taxonomic groups:
## To get a list of taxonomic groups in release 2020:
rba_jaspar_taxons(release = 2020)
#> name url
#> 1 plants https://jaspar.elixir.no/api/v1/taxon/plants/
#> 2 vertebrates https://jaspar.elixir.no/api/v1/taxon/vertebrates/
#> 3 insects https://jaspar.elixir.no/api/v1/taxon/insects/
#> 4 urochordates https://jaspar.elixir.no/api/v1/taxon/urochordates/
#> 5 nematodes https://jaspar.elixir.no/api/v1/taxon/nematodes/
#> 6 fungi https://jaspar.elixir.no/api/v1/taxon/fungi/
#> 7 diatoms https://jaspar.elixir.no/api/v1/taxon/diatoms/
#> 8 trematodes https://jaspar.elixir.no/api/v1/taxon/trematodes/
#> 9 dictyostelium https://jaspar.elixir.no/api/v1/taxon/dictyostelium/
#> 10 cnidaria https://jaspar.elixir.no/api/v1/taxon/cnidaria/
#> 11 oomycota https://jaspar.elixir.no/api/v1/taxon/oomycota/
## You can list information of all matrices available in a taxonomic group:
mat_in_insects <- rba_jaspar_taxons_matrices(tax_group = "insects")
As we go down the data organization hierarchy, each taxonomic group consists of species:
## To get a list of species in release 2020:
species <- rba_jaspar_species(release = 2020)
head(species)
#> tax_id species
#> 1 5037 Ajellomyces capsulatus
#> 2 4151 Antirrhinum majus
#> 3 81972 Arabidopsis lyrata subsp. lyrata
#> 4 3702 Arabidopsis thaliana
#> 5 9913 Bos taurus
#> 6 6238 Caenorhabditis briggsae
#> url
#> 1 https://jaspar.elixir.no/api/v1/species/5037/
#> 2 https://jaspar.elixir.no/api/v1/species/4151/
#> 3 https://jaspar.elixir.no/api/v1/species/81972/
#> 4 https://jaspar.elixir.no/api/v1/species/3702/
#> 5 https://jaspar.elixir.no/api/v1/species/9913/
#> 6 https://jaspar.elixir.no/api/v1/species/6238/
#> matrix_url
#> 1 https://jaspar.elixir.no/api/v1/species/5037/
#> 2 https://jaspar.elixir.no/api/v1/species/4151/
#> 3 https://jaspar.elixir.no/api/v1/species/81972/
#> 4 https://jaspar.elixir.no/api/v1/species/3702/
#> 5 https://jaspar.elixir.no/api/v1/species/9913/
#> 6 https://jaspar.elixir.no/api/v1/species/6238/
## You can list information of all matrices available in a species:
mat_in_human <- rba_jaspar_species_matrices(tax_id = 9606)
Retrieving a list of every matrix available in a given category is
not the only option. You can also build a search query using
rba_jaspar_matrix_search. Because this is a search function, you are
not required to fill in every argument; you may use any combination of
arguments you see fit to build your query. You can even call the
function without any argument to get a list of all the matrix
profiles. For instance:
## Get a list of all the available matrix profiles:
all_matrices <- rba_jaspar_matrix_search()
## Search FOX:
FOX_matrices <- rba_jaspar_matrix_search(term = "FOX")
## Transcription factors named FOXP3
FOXP3_matrices <- rba_jaspar_matrix_search(term = "FOXP3")
## Transcription factors of Zipper-Type Class
zipper_matrices <- rba_jaspar_matrix_search(tf_class = "Zipper-Type")
## Transcription factors of Zipper-Type Class in PBM collection
zipper_pbm_matrices <- rba_jaspar_matrix_search(tf_class = "Zipper-Type",
                                                collection = "PBM")
Since JASPAR release 2010, the matrix profiles are versioned. A
matrix profile identifier follows a “base_id.version” naming scheme; for
example, “MA0600.2” corresponds to the second version of a matrix with
base ID MA0600. You can use rba_jaspar_matrix_versions to get a list of
matrix profiles with a given base ID. Also note that some functions,
generally those that are used to list available matrices, have an
argument called only_last_version which, as its name suggests, limits
the results to the latest version of each matrix profile.
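The “base_id.version” scheme described above is easy to take apart in base R. The following is a minimal sketch; split_matrix_id() is a hypothetical helper written for this illustration and is not part of rbioapi.

```r
# split_matrix_id() is a hypothetical helper, not an rbioapi function.
# It splits identifiers such as "MA0600.2" into a base ID and a version.
split_matrix_id <- function(ids) {
  stopifnot(is.character(ids))
  # Text before the last dot is the base ID; the digits after it are the version
  pattern <- "^(.+)\\.([0-9]+)$"
  valid <- grepl(pattern, ids)
  base_id <- rep(NA_character_, length(ids))
  version <- rep(NA_integer_, length(ids))
  base_id[valid] <- sub(pattern, "\\1", ids[valid])
  version[valid] <- as.integer(sub(pattern, "\\2", ids[valid]))
  data.frame(matrix_id = ids, base_id = base_id, version = version,
             stringsAsFactors = FALSE)
}

parsed <- split_matrix_id(c("MA0600.2", "MA0039.4"))
print(parsed)
```

Identifiers that do not match the scheme (for example a bare base ID) get NA in both columns, which makes them easy to spot before passing IDs on to other functions.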
Now that you have listed or searched for matrix profiles, you can use
rba_jaspar_matrix to retrieve them. There are two ways in which you can
use this function: retrieving the matrix profile as an R object, or
saving it as a file.
To retrieve a matrix profile as an R object, only fill in the matrix_id
argument of rba_jaspar_matrix:
pfm_matrix <- rba_jaspar_matrix(matrix_id = "MA0600.2")
## you can find the matrix in the pfm element along with
## other elements which correspond to annotations and details
str(pfm_matrix)
#> List of 24
#> $ matrix_id : chr "MA0600.2"
#> $ name : chr "RFX2"
#> $ base_id : chr "MA0600"
#> $ version : int 2
#> $ collection : chr "CORE"
#> $ sequence_logo: chr "https://jaspar.elixir.no/static/logos/svg/MA0600.2.svg"
#> $ versions_url : chr "https://jaspar.elixir.no/api/v1/matrix/MA0600/versions"
#> $ sites_url : NULL
#> $ pfm : num [1:4, 1:16] 1381 5653 4042 2336 270 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : chr [1:4] "A" "C" "G" "T"
#> .. ..$ : NULL
#> $ class : chr "Fork head/winged helix factors"
#> $ family : chr "RFX-related factors"
#> $ tfe_id : list()
#> $ medline : chr "8754849"
#> $ pazar_tf_id : list()
#> $ remap_tf_name: chr "RFX2"
#> $ source : chr "23332764"
#> $ tax_group : chr "vertebrates"
#> $ type : chr "HT-SELEX"
#> $ tfe_ids : list()
#> $ pubmed_ids : chr "8754849"
#> $ pazar_tf_ids : list()
#> $ uniprot_ids : chr "P48378"
#> $ species :'data.frame': 1 obs. of 2 variables:
#> ..$ tax_id: int 9606
#> ..$ name : chr "Homo sapiens"
#> $ tffm :List of 7
#> ..$ tffm_id : chr "TFFM0576.1"
#> ..$ base_id : chr "TFFM0576"
#> ..$ version : int 1
#> ..$ log_p_1st_order: num 6275
#> ..$ log_p_detailed : num 6660
#> ..$ experiment_name: chr "CistromeDB_58298"
#> ..$ tffm_url : chr "https://jaspar.elixir.no/api/v1/tffm/TFFM0576.1/"
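As the output above shows, the pfm element is a numeric matrix with one row per nucleotide (A, C, G, T) and one column per position. A common next step is to turn the counts into per-position probabilities. The sketch below does this in base R on a small toy matrix; the counts are made up for illustration and are not the values of MA0600.2.

```r
# Toy 4 x 3 position frequency matrix with the same row layout as the
# pfm element. The counts are invented for illustration only.
toy_pfm <- matrix(
  c(10, 20, 30, 40,   # position 1
    25, 25, 25, 25,   # position 2
     0,  0, 50, 50),  # position 3
  nrow = 4,
  dimnames = list(c("A", "C", "G", "T"), NULL)
)

# Divide every column by its total so that each position sums to 1
toy_ppm <- sweep(toy_pfm, 2, colSums(toy_pfm), "/")

# Most frequent base at each position (ties resolve to the first row)
consensus <- rownames(toy_pfm)[apply(toy_pfm, 2, which.max)]

print(toy_ppm)
print(consensus)
```

Assuming the retrieved object has the structure shown above, the same two steps should apply to pfm_matrix$pfm in place of toy_pfm.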
JASPAR provides position frequency matrices (PFMs) formatted as
Raw PFM, JASPAR, TRANSFAC, YAML, and MEME. You can download a matrix
profile as a file in any of these formats. To do that, use the
file_format and save_to arguments available in rba_jaspar_matrix.
There are two notes here:
1. In this case, the function will save your matrix as a file and return the un-parsed content of the file as a character string.
2. The save_to argument in this function, and in fact in any rbioapi
function, can be used in several ways:
2.1. save_to = NA: rbioapi will automatically generate a file path under
your working directory, save the file to that path, and inform you
with a message.
2.2. save_to = a file name without a path: rbioapi will save the file
with your supplied name in your working directory.
2.3. save_to = a directory path (without file): rbioapi will save the
file with a proper name in that directory.
2.4. save_to = a file path (i.e. ending with .extension): rbioapi will
save the file exactly to this path. Make sure that the file extension of
the path matches your requested file format. If it does not, rbioapi
will save the file with the extension supplied in the path, but will
issue a warning to inform you about that.
In any of the aforementioned cases, the file path can be absolute or relative.
## Different ways in which you can save the matrix file:
meme_matrix1 <- rba_jaspar_matrix(matrix_id = "MA0600.2",
                                  file_format = "meme")
meme_matrix2 <- rba_jaspar_matrix(matrix_id = "MA0600.2",
                                  file_format = "meme",
                                  save_to = "my_matrix.meme")
meme_matrix3 <- rba_jaspar_matrix(matrix_id = "MA0600.2",
                                  file_format = "meme",
                                  save_to = "c:/rbioapi")
meme_matrix4 <- rba_jaspar_matrix(matrix_id = "MA0600.2",
                                  file_format = "meme",
                                  save_to = "c:/rbioapi/my_matrix.meme")
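To make the four save_to cases easier to tell apart, here is a rough base R sketch that labels a save_to value by the case it appears to fall under. describe_save_to() is a hypothetical helper written for this illustration; it is not part of rbioapi and does not claim to reproduce how rbioapi itself decides.

```r
# describe_save_to() is a hypothetical helper for illustration only;
# it is NOT an rbioapi function and does not mirror rbioapi's internals.
describe_save_to <- function(save_to) {
  if (length(save_to) == 1 && is.na(save_to)) {
    return("auto")                      # case 2.1: path generated automatically
  }
  has_extension <- nzchar(tools::file_ext(save_to))
  has_directory <- dirname(save_to) != "."
  if (!has_extension) {
    "directory"                         # case 2.3: directory path, no file name
  } else if (has_directory) {
    "file path"                         # case 2.4: saved exactly to this path
  } else {
    "file name"                         # case 2.2: saved in the working directory
  }
}

cases <- list(NA, "my_matrix.meme", "c:/rbioapi", "c:/rbioapi/my_matrix.meme")
labels <- vapply(cases, describe_save_to, character(1))
print(labels)
```

The extension test is only a heuristic: a directory whose name contains a dot would be labelled as a file, so treat this as a reading aid rather than a validation step.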
JASPAR also stores and assigns identifiers to TF flexible models (TFFMs). Just as with position frequency matrices (PFMs), you can search TFFMs or retrieve information and annotations using a TFFM identifier. TFFM IDs are versioned, meaning that they are in base_id.version format.
## Search TFFMs. This is a search function. Thus, what has been presented
## in `Search Matrix Profiles` section also applies here:
## Get a list of all the available TFFMs:
all_tffms <- rba_jaspar_tffm_search()
## Search FOX:
FOX_tffms <- rba_jaspar_tffm_search(term = "FOX")
## Transcription factors named FOXP3
FOXP3_tffms <- rba_jaspar_tffm_search(term = "FOXP3")
## Transcription factors of insects taxonomic group
insects_tffms <- rba_jaspar_tffm_search(tax_group = "insects")
## Now that you have a TFFM ID, you can retrieve it
TFFM0056 <- rba_jaspar_tffm("TFFM0056.3")
str(TFFM0056)
#> List of 10
#> $ tffm_id : chr "TFFM0056.3"
#> $ base_id : chr "TFFM0056"
#> $ version : int 3
#> $ matrix_base_id : chr "MA0039"
#> $ matrix_id : chr "MA0039.4"
#> $ matrix_url : chr "https://jaspar.elixir.no/api/v1/matrix/MA0039.4/"
#> $ matrix_version : int 4
#> $ experiment_name: chr "CistromeDB_33718"
#> $ first_order :List of 5
#> ..$ log_p : num 7420
#> ..$ dense_logo : chr "https://jaspar.elixir.no/static/TFFM/TFFM0056.3/TFFM_first_order_trained_dense_logo.svg"
#> ..$ summary_logo: chr "https://jaspar.elixir.no/static/TFFM/TFFM0056.3/TFFM_first_order_trained_summary_logo.svg"
#> ..$ hits : chr "https://jaspar.elixir.no/static/TFFM/TFFM0056.3/TFFM_first_order_trained.hits.svg"
#> ..$ xml : chr "https://jaspar.elixir.no/static/TFFM/TFFM0056.3/TFFM_first_order_trained.xml"
#> $ detailed :List of 5
#> ..$ log_p : num 6854
#> ..$ dense_logo : chr "https://jaspar.elixir.no/static/TFFM/TFFM0056.3/TFFM_detailed_trained_dense_logo.svg"
#> ..$ summary_logo: chr "https://jaspar.elixir.no/static/TFFM/TFFM0056.3/TFFM_detailed_trained_summary_logo.svg"
#> ..$ hits : chr "https://jaspar.elixir.no/static/TFFM/TFFM0056.3/TFFM_detailed_trained.hits.svg"
#> ..$ xml : chr "https://jaspar.elixir.no/static/TFFM/TFFM0056.3/TFFM_detailed_trained.xml"
To cite JASPAR (Please see https://jaspar.elixir.no/faq/):
To cite rbioapi:
#> R version 4.3.3 (2024-02-29 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 11 x64 (build 22631)
#>
#> Matrix products: default
#>
#>
#> locale:
#> [1] LC_COLLATE=C
#> [2] LC_CTYPE=English_United States.utf8
#> [3] LC_MONETARY=English_United States.utf8
#> [4] LC_NUMERIC=C
#> [5] LC_TIME=English_United States.utf8
#>
#> time zone: Europe/Brussels
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] rbioapi_0.8.1
#>
#> loaded via a namespace (and not attached):
#> [1] digest_0.6.35 R6_2.5.1 fastmap_1.1.1 xfun_0.43
#> [5] magrittr_2.0.3 cachem_1.0.8 knitr_1.45 htmltools_0.5.8
#> [9] rmarkdown_2.26 lifecycle_1.0.4 DT_0.32 cli_3.6.2
#> [13] sass_0.4.9 jquerylib_0.1.4 compiler_4.3.3 httr_1.4.7
#> [17] rstudioapi_0.16.0 tools_4.3.3 curl_5.2.1 evaluate_0.23
#> [21] bslib_0.6.2 yaml_2.3.8 htmlwidgets_1.6.4 rlang_1.1.3
#> [25] jsonlite_1.8.8 crosstalk_1.2.1