TFactSR 0.99.0
TFactS predicts which transcription factors (TFs) are regulated in a biological condition, based on lists of differentially expressed genes (DEGs) obtained from transcriptome experiments. This package builds on the TFactS concept and extends it, allowing users to perform a TFactS-like enrichment analysis. The package can import and use the original catalogue file from the TFactS website, as well as user-defined catalogues for organisms that TFactS does not support (e.g., Arabidopsis).
This vignette is largely based on the TFactS manual. For details about TFactS, please see the original paper by Essaghir et al. (2010).
Briefly, the current package assumes a Sign-Less catalogue, i.e. one that contains no information on the type of regulation (up- or down-regulation). TFactSR compares the list of query DEGs (up- and/or down-regulated) with a catalogue of target gene signatures. The core algorithm is Fisher's exact test on the following contingency table:
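| TF | DEGs: Present | DEGs: Absent | Total |
|---|---|---|---|
| Catalogue: Present | k | m - k | m |
| Catalogue: Absent | n - k | N + k - n - m | N - m |
| Total | n | N - n | N |

Here N is the total number of genes considered, m the number of catalogue targets of the TF, n the number of query DEGs, and k the number of DEGs that are catalogue targets of the TF.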
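The contingency table above can be filled directly from gene sets. The sketch below is a minimal illustration, not TFactSR's internal code: the helper `contingency_table()` and the toy gene IDs are hypothetical, and it assumes the background universe of N genes is passed in explicitly.

```r
# Hypothetical helper (not part of TFactSR): build the 2 x 2 table from a TF's
# catalogue targets, the query DEGs and an explicitly given background universe.
contingency_table <- function(targets, degs, universe) {
  targets <- intersect(unique(targets), universe)
  degs    <- intersect(unique(degs), universe)
  N <- length(universe)                    # all genes considered
  m <- length(targets)                     # catalogue targets of the TF
  n <- length(degs)                        # query DEGs
  k <- length(intersect(targets, degs))    # DEGs that are targets of the TF
  # filled column by column: DEGs Present (k, n - k), DEGs Absent (m - k, N + k - n - m)
  matrix(c(k, n - k, m - k, N + k - n - m), nrow = 2,
         dimnames = list(Catalogue = c("Present", "Absent"),
                         DEGs = c("Present", "Absent")))
}

# Toy example: 20 genes, 4 targets, 3 DEGs, 2 of which are targets
universe <- paste0("g", 1:20)
tab <- contingency_table(targets = c("g1", "g2", "g3", "g4"),
                         degs = c("g1", "g2", "g10"),
                         universe = universe)
tab
addmargins(tab)  # row totals m, N - m; column totals n, N - n
```

The margins reproduce the Total row and column of the table, which is a quick sanity check on a hand-built catalogue.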
The p-value is the upper-tail hypergeometric probability of observing k or more catalogue targets among the n DEGs:
\[ Pval = \sum_{i=k}^{\min(m,\,n)} \frac{\binom{m}{i} \binom{N-m}{n-i}}{\binom{N}{n}} \]
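The p-value is the upper tail P(X ≥ k) of a hypergeometric variable X (n DEGs drawn from N genes, m of which are targets of the TF), i.e. the hypergeometric term summed over i from k to min(m, n). A minimal sketch using only base R `stats` functions computes it three equivalent ways; the function names are illustrative, not TFactSR exports. The IRF9 row of the human example (m = 3, n = 18, N = 6838, k = 1) gives approximately 7.877e-03 with each method, matching the printed `p.value`.

```r
# Upper-tail p-value P(X >= k) via the hypergeometric CDF
pvalue_phyper <- function(k, m, n, N) {
  phyper(k - 1, m, N - m, n, lower.tail = FALSE)
}

# The same quantity as an explicit sum of hypergeometric terms
pvalue_sum <- function(k, m, n, N) {
  if (k > min(m, n)) return(0)
  i <- k:min(m, n)
  # dhyper(i, m, N - m, n) = choose(m, i) * choose(N - m, n - i) / choose(N, n)
  sum(dhyper(i, m, N - m, n))
}

# The same quantity from a one-sided Fisher's exact test on the 2 x 2 table
pvalue_fisher <- function(k, m, n, N) {
  tab <- matrix(c(k, n - k, m - k, N + k - n - m), nrow = 2)
  fisher.test(tab, alternative = "greater", conf.int = FALSE)$p.value
}

# IRF9 row of the human example
c(phyper = pvalue_phyper(1, 3, 18, 6838),
  sum    = pvalue_sum(1, 3, 18, 6838),
  fisher = pvalue_fisher(1, 3, 18, 6838))
```

`phyper()` is the cheapest of the three; the explicit sum and `fisher.test()` are shown only to connect the code to the formula and to the contingency table.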
The E-value is the p-value multiplied by the number of tests performed (\(T\), one test per TF):
\[ Eval = Pval \times T \]
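As a small check of the definition, the E-values in the human example equal the p-values times 29 (the ratio `e.value / p.value` in the printed output), which suggests T = 29 there; in the Arabidopsis example, p = 1 gives E = 32, so E-values are not capped at 1. The helper name `evalue()` below is hypothetical.

```r
# Hypothetical helper: E-value = p-value x T (number of tests, one per TF).
# Not capped at 1, consistent with the printed Arabidopsis output (p = 1 -> E = 32).
evalue <- function(p, n.tests = length(p)) p * n.tests

# p-values of the first three rows of the human example, with T = 29
p <- c(FOXO3 = 5.499085e-10, FOXO1 = 9.012711e-08, FOXO4 = 2.330683e-04)
evalue(p, n.tests = 29)
```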
FDR.BH is the false discovery rate (FDR) obtained with the Benjamini and Hochberg (1995) procedure, calculated with the p.adjust() function. Note that, under default settings, TFactSR does not use the q-value (Storey 2003).
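The FDR.BH column can be reproduced with `p.adjust()` from base R `stats`. Only the top six of the 29 p-values are printed in the human example, so the sketch passes `n = 29`; as documented for `p.adjust()`, the unobserved p-values are then treated as 1 for the BH method. With that assumption, the result matches the printed FDR.BH values for these rows.

```r
# Top six p-values printed for the human example (T = 29 TFs in total)
p <- c(FOXO3 = 5.499085e-10, FOXO1 = 9.012711e-08, FOXO4 = 2.330683e-04,
       IRF9  = 7.877425e-03, STAT1 = 1.092615e-02, SMAD5 = 1.309644e-02)

# Benjamini-Hochberg adjustment over all 29 tests
fdr <- p.adjust(p, method = "BH", n = 29)
signif(fdr, 7)
```

Note that STAT1 and SMAD5 share the same adjusted value: BH takes a running minimum from the largest p-value downwards, so adjusted values are monotone in the p-values.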
RC is the percentage of random simulations of the user's list, over a specified number of repetitions, in which a TF is called significant at an E-value threshold \(\lambda\):
\[ RC_{(TF)} = \frac{\#\left\{ Eval(TF) \leq \lambda \right\} \times 100}{\#\left\{ rep \right\}} \]
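The RC formula can be illustrated with a short simulation. This is a hypothetical sketch, not TFactSR's implementation: the function `random_control()`, the toy catalogue and the sampling scheme are assumptions. Each repetition draws a random list of n genes without replacement from the background, computes every TF's E-value with the hypergeometric upper tail, and counts how often it falls at or below \(\lambda\).

```r
# Hypothetical sketch of RC (not TFactSR's implementation). Assumes each
# repetition draws n genes without replacement from the background and that
# a TF counts as significant when its E-value <= lambda.
random_control <- function(catalogue, background, n, n.rep = 100, lambda = 0.05) {
  N <- length(background)
  n.tests <- length(catalogue)                 # T: one test per TF
  m <- vapply(catalogue, function(tg) length(intersect(tg, background)), integer(1))
  hits <- integer(n.tests)
  for (r in seq_len(n.rep)) {
    random.degs <- sample(background, n)       # stand-in for the user's list
    k <- vapply(catalogue, function(tg) sum(random.degs %in% tg), integer(1))
    p <- phyper(k - 1, m, N - m, n, lower.tail = FALSE)
    hits <- hits + (p * n.tests <= lambda)     # count E-value hits per TF
  }
  setNames(hits * 100 / n.rep, names(catalogue))   # percentage of repetitions
}

# Toy catalogue with made-up gene IDs
set.seed(1)
background <- paste0("g", 1:200)
catalogue <- list(TF_A = paste0("g", 1:10), TF_B = paste0("g", 11:60))
rc <- random_control(catalogue, background, n = 20, n.rep = 50)
rc
```

A TF with a high RC is frequently called significant on random lists, so its enrichment on the real list is less informative.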
The TFactSR package requires (1) a list of DEGs and (2) a catalogue of interest. For Arabidopsis, we prepared a catalogue based on AtRegNet and ATRM. For human data, the calculation can be run with the default settings.
The original TFactS supports human, rat and mouse genes. As shown below, given a list of DEGs and a catalogue, you can perform an enrichment analysis to find which TFs are regulated.
For human/rat/mouse data, we can do the TFactS analysis as follows.
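library(TFactSR)
data(DEGs)     # example list of DEGs
data(catalog)  # example TF-target catalogue
# extract the TFs and the set of all target genes from the catalogue
tftg <- extractTFTG(DEGs, catalog)
TFs <- tftg$TFs
all.targets <- tftg$all.targets
# run the TFactS-like enrichment test for every TF
res <- calculateTFactS(DEGs, catalog, TFs, all.targets)
head(res)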
## TFs m n N k p.value e.value FDR.BH RC
## 8 FOXO3 78 18 6838 7 5.499085e-10 1.594735e-08 1.594735e-08 2
## 7 FOXO1 161 18 6838 7 9.012711e-08 2.613686e-06 1.306843e-06 1
## 9 FOXO4 9 18 6838 2 2.330683e-04 6.758980e-03 2.252993e-03 0
## 12 IRF9 3 18 6838 1 7.877425e-03 2.284453e-01 5.711133e-02 0
## 23 STAT1 61 18 6838 2 1.092615e-02 3.168585e-01 6.329948e-02 0
## 19 SMAD5 5 18 6838 1 1.309644e-02 3.797969e-01 6.329948e-02 0
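To shortlist candidate regulators, `res` can be filtered on its `e.value` and `RC` columns. The sketch below copies the six printed rows into a small data frame so it runs without the package; the cut-offs (E-value at most 0.05, RC at most 5) are illustrative choices, not TFactSR defaults.

```r
# The six rows printed by head(res) in the human example, copied by hand
top <- data.frame(
  TFs     = c("FOXO3", "FOXO1", "FOXO4", "IRF9", "STAT1", "SMAD5"),
  k       = c(7, 7, 2, 1, 2, 1),
  e.value = c(1.594735e-08, 2.613686e-06, 6.758980e-03,
              2.284453e-01, 3.168585e-01, 3.797969e-01),
  RC      = c(2, 1, 0, 0, 0, 0),
  stringsAsFactors = FALSE
)

# Keep TFs with a small E-value that are rarely significant on random lists
# (illustrative thresholds; with TFactSR loaded, apply the same filter to `res`)
candidates <- top[top$e.value <= 0.05 & top$RC <= 5, ]
candidates$TFs
```

With these cut-offs the three FOXO family members remain, while IRF9, STAT1 and SMAD5 are dropped because their E-values exceed 0.05.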
The options TF.col and TG.col specify which columns of your catalogue contain the TFs and their target genes, respectively. Choose them carefully so that they capture the TF-target relationships, as follows.
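data(AtCatalog)     # Arabidopsis catalogue
data(GenesUp_SH1H)  # example list of up-regulated genes
# TF.col and TG.col name the TF and target-gene columns of the catalogue
d <- extractTFTG(GenesUp_SH1H, AtCatalog,
                 TF.col = "TF",
                 TG.col = "target.genes")
res <- calculateTFactS(GenesUp_SH1H, AtCatalog, d$TFs, d$all.targets,
                       TF.col = "TF")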
head(res)
## TFs m n N k p.value e.value FDR.BH RC
## 17 AT3G23250 3 74 18910 1 0.01169456 0.3742258 0.3720668 0
## 15 AT2G47460 6 74 18910 1 0.02325417 0.7441336 0.3720668 0
## 26 AT5G11260 280 74 18910 1 0.66914068 21.4125018 1.0000000 1
## 1 AT1G04370 2 74 18910 0 1.00000000 32.0000000 1.0000000 0
## 2 AT1G09530 649 74 18910 0 1.00000000 32.0000000 1.0000000 1
## 3 AT1G24260 4101 74 18910 0 1.00000000 32.0000000 1.0000000 0
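The exact layout of AtCatalog is not shown here. Assuming a long-format catalogue with one TF-target pair per row, where TF.col and TG.col name the TF and target columns, grouping target genes by TF can be sketched in base R as below. `catalogue_to_list()` and the toy identifiers are hypothetical and not part of TFactSR; the sketch only illustrates why the two column names must point at the right fields.

```r
# Hypothetical helper (not a TFactSR function): group target genes by TF,
# assuming one TF-target pair per row of a long-format catalogue.
catalogue_to_list <- function(catalog, TF.col = "TF", TG.col = "target.genes") {
  stopifnot(all(c(TF.col, TG.col) %in% colnames(catalog)))
  targets <- split(as.character(catalog[[TG.col]]),
                   as.character(catalog[[TF.col]]))
  lapply(targets, unique)   # drop duplicated TF-target pairs
}

# Toy catalogue with made-up identifiers
toy <- data.frame(
  TF           = c("TF1", "TF1", "TF2", "TF2"),
  target.genes = c("G1", "G2", "G2", "G2"),
  stringsAsFactors = FALSE
)
tf.targets  <- catalogue_to_list(toy, TF.col = "TF", TG.col = "target.genes")
all.targets <- sort(unique(unlist(tf.targets, use.names = FALSE)))
tf.targets
all.targets
```

Swapping the two column names would silently treat targets as TFs, which is why the columns should be checked before running the enrichment.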
We thank the Bio"Pack"athon community for helpful discussions. This work was supported by JSPS KAKENHI Grant Numbers 26850024 and 17K07663.
Here is the output of sessionInfo() on the system on which this document was compiled:
## R version 4.3.1 (2023-06-16)
## Platform: x86_64-apple-darwin20 (64-bit)
## Running under: macOS Monterey 12.6.3
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
##
## locale:
## [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: Asia/Tokyo
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] TFactSR_0.99.0 BiocStyle_2.28.0
##
## loaded via a namespace (and not attached):
## [1] digest_0.6.33 R6_2.5.1 bookdown_0.35
## [4] fastmap_1.1.1 xfun_0.40 cachem_1.0.8
## [7] knitr_1.43 htmltools_0.5.6 rmarkdown_2.24
## [10] cli_3.6.1 sass_0.4.7 jquerylib_0.1.4
## [13] compiler_4.3.1 rstudioapi_0.15.0 tools_4.3.1
## [16] evaluate_0.21 bslib_0.5.1 yaml_2.3.7
## [19] BiocManager_1.30.22 jsonlite_1.8.7 rlang_1.1.1