The omics era produces huge amounts of gene data, which raises the problem of how to uncover their potential biological effects. One effective approach is gene enrichment analysis.
Within gene enrichment analysis, access to gene sets is the central and fundamental step, whether for the traditional over-representation analysis (ORA) method or for the more advanced functional class scoring (FCS) methods (e.g. Gene Set Enrichment Analysis (GSEA)).
Currently, many enrichment analysis tools provide built-in data sets for only a few model species, or ask users to download gene sets online. As a result, users working on non-model species need to download different gene sets from various public databases. For example, enrichGO() and gseGO() of clusterProfiler rely on organism-level annotation packages, which cover about 20 species. If the research target is not among these organisms, the user needs to build an annotation package via AnnotationHub or download annotations from biomaRt or Blast2GO, which is a time-consuming and difficult task for biologists without programming skills.
Here, we develop an R package named “geneset”, which aims to provide access to up-to-date gene sets in less time.
It includes GO (BP, CC and MF), KEGG (pathway, module, enzyme, network, drug and disease), WikiPathway, MSigDB, EnrichrDb, Reactome, MeSH, DisGeNET, Disease Ontology (DO), Network of Cancer Genes (NCG) (v6 and v7) and COVID-19. It supports both model and non-model species.
For more details, please refer to this site.
All gene sets are stored on our website and can be accessed with simple functions.
We will update them monthly to provide a better user experience.
# install from CRAN
install.packages("geneset")
# or install from GitHub
remotes::install_github("GangLiLab/geneset")
# or install from Gitee
remotes::install_git("https://gitee.com/genekitr/pacakge_geneset")
For more details, please refer to genekitr book.
The package now includes eight functions: getGO(), getKEGG(), getMesh(), getMsigdb(), getWiki(), getReactome(), getEnrichrdb() and getHgDisease().
All functions take org (organism) as input. Several functions have their own arguments, such as ont (ontology) of getGO().
Take Human GO MF gene sets for example:
library(geneset)
x <- getGO(org = "human", ont = "mf")
str(x)
# List of 4
# $ geneset :'data.frame': 280115 obs. of 2 variables:
# ..$ mf : chr [1:280115] "GO:0000009" "GO:0000009" "GO:0000010" "GO:0000010" ...
# ..$ gene: chr [1:280115] "PIGV" "ALG12" "PDSS1" "PDSS2" ...
# $ geneset_name:'data.frame': 4878 obs. of 2 variables:
# ..$ go_id: chr [1:4878] "GO:0000009" "GO:0000010" "GO:0000014" "GO:0000016" ...
# ..$ Term : chr [1:4878] "alpha-1,6-mannosyltransferase activity" "trans-hexaprenyltranstransferase activity" "single-stranded DNA endodeoxyribonuclease activity" "lactase activity" ...
# $ organism : chr "hsapiens"
# $ type : chr "mf"
head(x$geneset)
# mf gene
# GO:0000009 PIGV
# GO:0000009 ALG12
# GO:0000010 PDSS1
# GO:0000010 PDSS2
# GO:0000014 ENDOG
# GO:0000014 ERCC1
head(x$geneset_name)
# go_id Term
# GO:0000009 alpha-1,6-mannosyltransferase activity
# GO:0000010 trans-hexaprenyltranstransferase activity
# GO:0000014 single-stranded DNA endodeoxyribonuclease activity
# GO:0000016 lactase activity
# GO:0000026 alpha-1,2-mannosyltransferase activity
# GO:0000030 mannosyltransferase activity
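The two tables in the returned list share the gene set ID, so they can be joined to attach a readable term to each ID-gene pair. The following is a minimal sketch in base R; the two small data frames are toy stand-ins that copy a few rows from the output above, so the code runs without downloading anything. With a real result, `x$geneset` and `x$geneset_name` would take their place.

```r
# Toy stand-ins for x$geneset and x$geneset_name (rows copied from the output above)
geneset <- data.frame(
  mf   = c("GO:0000009", "GO:0000009", "GO:0000010", "GO:0000010"),
  gene = c("PIGV", "ALG12", "PDSS1", "PDSS2"),
  stringsAsFactors = FALSE
)
geneset_name <- data.frame(
  go_id = c("GO:0000009", "GO:0000010"),
  Term  = c("alpha-1,6-mannosyltransferase activity",
            "trans-hexaprenyltranstransferase activity"),
  stringsAsFactors = FALSE
)

# Join on the ID column, which is named differently in the two tables
annotated <- merge(geneset, geneset_name, by.x = "mf", by.y = "go_id")

# Look up the genes annotated to one term by its name
target_term <- "alpha-1,6-mannosyltransferase activity"
genes_in_term <- annotated$gene[annotated$Term == target_term]
print(genes_in_term)
```

Note that the name of the ID column in `geneset` follows the requested ontology (here `mf`), so the `by.x` value is an assumption that would change for other gene set types.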
Take human KEGG Pathway as an example:
gs <- geneset::getKEGG('hsa', 'pathway')
gs_df <- gs$geneset
# count the number of distinct pathways
length(unique(gs_df$id))
# 347
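library(GSVA)
# firstly: turn the gene set table into a named list (one element per pathway)
gs_list <- split(gs_df$gene, gs_df$id)
# secondly: pass your expression dataset "express_data" to the gsva() function
# note: this is the classic gsva() interface; recent GSVA releases use
# parameter objects (e.g. ssgseaParam()) instead
ssgsea_mat <- gsva(expr = express_data,
                   method = "ssgsea", # "gsva" (default), "zscore", "plage"
                   gset.idx.list = gs_list,
                   verbose = FALSE,
                   parallel.sz = 4)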
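The first step above, turning the two-column gene set table into a named list, can be tried on toy data before running it on a full download. This is only a sketch: the data frame below is an illustrative stand-in for `gs$geneset` with the same `id` and `gene` columns, and `min_size` is a hypothetical threshold added here to show how very small sets could be dropped.

```r
# Toy stand-in for gs$geneset (illustrative rows only)
gs_df <- data.frame(
  id   = c("hsa00010", "hsa00010", "hsa00010", "hsa00020", "hsa00020", "hsa00030"),
  gene = c("HK1", "GPI", "PFKM", "CS", "ACO2", "G6PD"),
  stringsAsFactors = FALSE
)

# One list element per pathway ID, each holding that pathway's genes
gs_list <- split(gs_df$gene, gs_df$id)

# Optionally drop gene sets with fewer than min_size genes (hypothetical cutoff)
min_size <- 2
gs_list <- gs_list[lengths(gs_list) >= min_size]

print(names(gs_list))
print(lengths(gs_list))
```

A named list of character vectors is the shape that the `gset.idx.list` argument in the call above receives.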
hg_gs <- geneset::getGO(org = "human", ont = "mf")
# ORA (input_id is your vector of genes of interest)
go_ent <- genekitr::genORA(input_id, geneset = hg_gs)
# GSEA (input is a pre-ranked gene list with logFC value)
gse <- genekitr::genGSEA(genelist = geneList, geneset = hg_gs)
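To make the ORA step less of a black box, here is a minimal sketch of the test that over-representation analysis is generally based on: a one-sided hypergeometric test on the overlap between the input genes and one gene set. It is not genekitr's implementation, and all gene names and sizes below are hypothetical toy values.

```r
# Hypothetical toy inputs
universe <- paste0("g", 1:100)                        # background genes
gene_set <- paste0("g", 1:10)                         # one gene set with 10 genes
input_id <- c(paste0("g", 1:5), paste0("g", 51:55))   # 10 genes of interest

k <- length(intersect(input_id, gene_set))  # input genes that fall in the set
M <- length(gene_set)                       # set size
N <- length(universe)                       # background size
n <- length(input_id)                       # number of input genes

# P(overlap >= k) when n genes are drawn at random from the background
p_value <- phyper(k - 1, M, N - M, n, lower.tail = FALSE)
print(p_value)
```

In a real analysis this test is repeated for every gene set and the p-values are adjusted for multiple testing, for example with `p.adjust()`.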