Somatic mutations play an important role in cancer development and disease progression, and mutational profiling is used far more commonly than other ‘omics’ analyses in clinical practice. With the development of PD-1 immunotherapy technology, many experiments show that patients with high TMB have been shown more clinical benefits of immunotherapy (PD-1/PD-L1). However, to date, WES remains confined to research settings, due to the high cost of the large genomic space sequenced. In the clinical setting, instead, targeted enrichment panels (gene panels) of various genomic sizes are emerging as the routine technology for TMB assessment. This package attempts to develop a new pathway-based gene panel for TMB assessment (pathway-based tumor mutational burden, PTMB), using somatic mutations files in an efficient manner from either TCGA sources or any in-house studies as long as the data is in MAF format. Besides, we develop a multiple machine learning method using sample PTMB profiles to identify cancer-specific dysfunction pathways, which can be a biomarker of prognostic and predictive for cancer immunotherapy.
MAF files contain many fields ranging from chromosome names to cosmic annotations. However, most of the analysis in PTMB uses the following fields.
Complete specification of MAF files can be found on NCI GDC documentation page.
install.packages("pathwayTMB")
library(pathwayTMB)
The pathwayTMB package is a Bioinformatics tool to identify Cancer-specific pathways. And pathwayTMB functions can be categorized into mainly Visualization and Analysis modules. Each of these functions and a short description is summarized as shown below:
1.Obtain survival benefit mutations frequency matrix.
2.Calculate the length of encoded genes.
3.Infer pathway-based tumor mutational burden profiles (PTMB).
4.Identify Cancer-specific dysfunction pathways.
5.Visualization results:
5.1 Plot Patients’ Kaplan-Meier Survival Curves.
5.2 Plot patient-specific dysfunction pathways and user-interested geneset mutually exclusive and co-occurrence plots.
5.3 Plot patient-specific dysfunction pathways’ waterfall plots.
5.4 Plot the ROC curve.
The function get_gene_length can extract coding genes’ length from gene transfer format (GTF) file. The GTF file can download from GENCODE.
#calculate coding genes' length, filepath--the path of the GTF file
#get_gene_length(filepath)
To identify cancer-specific pathways, PTMB requires sample mutation data with survival information. For each pathway, PTMB is defined as pathway tumor mutational burden corrected by genes’ length and number, the calculation formula is as follows:
\[PTMB_{ij}=\frac{\sum_{k=1}^N\frac{mut_{ijk}}{length_{jk}}}{N}\tag{1}\]
where \(mut_{ij}\) is the \(i\) th sample’s number of mutations of the \(k\) th gene in the \(j\) th pathway; \(length_{jk}\) is the length of \(k\) th gene in the \(j\) th pathway; \(N\) is the total number of genes in pathway \(j\).The \(PTMB_{ij}\) denotes the pathway \(j\) tumor mutational burden for sample \(i\), which was calculated in the context of survival-related genes’ mutation frequency matrix.
#perform the function `get_PTMB`
#PTMB_matrix<-get_PTMB(freq_matrix=mut_matrix,genesmbol=genesmbol,gene_path=gene_path)
#show the first six lines of PTMB matrix
head(PTMB_matrix)[1:6,1:6]
#> Pat03 Pat06 Pat07 Pat08
#> Hematopoietic cell lineage 0.0000000 0.00000 0.0000000 0
#> Complement and coagulation cascades 0.0000000 0.00000 4.1867281 0
#> Platelet activation 0.0000000 0.00000 2.1018828 0
#> Toll-like receptor signaling pathway 0.0000000 0.00000 0.8930421 0
#> NOD-like receptor signaling pathway 0.4252838 2.25137 2.8148339 0
#> RIG-I-like receptor signaling pathway 0.0000000 0.00000 2.6852846 0
#> Pat100 Pat101
#> Hematopoietic cell lineage 0.000000 0.000000
#> Complement and coagulation cascades 3.145643 0.000000
#> Platelet activation 3.938654 0.000000
#> Toll-like receptor signaling pathway 0.000000 0.000000
#> NOD-like receptor signaling pathway 1.962122 0.000000
#> RIG-I-like receptor signaling pathway 0.000000 3.281809
In order to identify cancer-specific dysfunction pathways, we develop a multiple machine learning method using pathway-based tumor mutational burden profiles. The multiple machine learning method consists of three sequential steps, as follows:
The function get_final_signature in the pathwayTMB package can implement the above process.
# filter the survival-related pathways
#set.seed(1)
#final_character<-get_final_signature(PTMB=PTMB_matrix,sur=sur)
#view the final_character
final_character#> a1
#> "Natural killer cell mediated cytotoxicity"
#> a4
#> "Antigen processing and presentation"
#Drawing Kaplan Meier Survival Curves using the final survival-related PTMB.
plotKMcurves(t(PTMB_matrix[final_character,]),sur=sur,returnAll = FALSE,risk.table = TRUE)
#a mutually exclusive co-occurrence chart showing the top 20 genes for mutation rates
<-apply(mut_matrix,1,function(x){length(which(x!=0))/length(x)})
gene_fre<-names(sort(gene_fre,decreasing = TRUE))[1:20]
genesplotMutInteract(freq_matrix=mut_matrix, genes=genes,returnAll = FALSE)
#pathways' mutually exclusive co-occurrence chart
plotMutInteract(freq_matrix=PTMB_matrix,genes=final_character,
nShiftSymbols =0.3,returnAll = FALSE)
#calculate the PTMB-related riskscore
<-plotKMcurves(t(PTMB_matrix[final_character,]),sur=sur,plots = FALSE)$risk_score
riskscore<-median(riskscore)
cut_off#draw an GenePathwayOncoplots
GenePathwayOncoplots(maffile=maf,freq_matrix =mut_matrix,risk_score=riskscore,
cut_off=cut_off,final_character=final_character,gene_path = gene_path,removeNonMutated = FALSE)
#> Natural killer cell mediated cytotoxicity not found in MAF. Excluding it..
#> Antigen processing and presentation not found in MAF. Excluding it..
#> -Reading
#> -Validating
#> -Silent variants: 26
#> -Summarizing
#> --Possible FLAGS among top ten genes:
#> MUC16
#> USH2A
#> -Processing clinical data
#> --Missing clinical data
#> -Finished in 0.190s elapsed (0.170s cpu)
#> -Reading
#> -Validating
#> -Silent variants: 26
#> -Summarizing
#> --Possible FLAGS among top ten genes:
#> MUC16
#> USH2A
#> -Processing clinical data
#> -Finished in 0.170s elapsed (0.170s cpu)
#> --Possible FLAGS among top ten genes:
#> MUC16
#> USH2A
#> -Processing clinical data
#get the path of samples' immunotherapy response data
<- system.file("extdata","response.csv",package = "pathwayTMB")
res_path<-read.csv(res_path,header=TRUE,stringsAsFactors =FALSE,row.name=1)
responseplotROC(riskscore=riskscore,response=response,main="Objective Response",print.auc=TRUE,grid = TRUE)
#> Setting levels: control = 0, case = 1
#> Setting direction: controls > cases