This vignette illustrates how to easily use the ICDS package. This package can identifying cancer risk subpathways which can explain the disturbed biological functions in more detail and accurately.
This package provides the
getExpp
,getMethp
,getCnvp
function
to calculate p-values or corrected p-values for each gene.
This package provides the
coverp2zscore
,combinep_two
,combinep_three
function to Convert p-values or corrected p-values to z-scores.
This package provides the FindSubPath
function to
search for interested subpathways in each entire pathway.
This package provides the opt_subpath
function to
Optimize interested subpathways.
This package provides the Permutation
function to to
calculate statistical significance for these interested subpathways
.
We can use function getExampleData to return example data and environment variables, such as the data of exp_data.
library(ICDS)
# obtain the expression profile data
exp_data<-GetExampleData("exp_data")
#view first six rows and six colmns of data
exp_data[1:6, 1:6]
## TCGA.38.4632.11 TCGA.44.5645.11 TCGA.44.6144.11 TCGA.44.6145.11
## GALM 15.55599000 18.3477000 13.4976300 12.4355000
## HK3 9.65274100 12.4103200 12.3639500 17.5058100
## G6PC 0.06917292 0.1689286 0.1555786 0.1847281
## PGM1 24.33696000 69.3011200 68.6039000 48.9442400
## HKDC1 0.65847880 1.2588160 3.0050750 2.8253770
## GPI 73.56174000 58.7149700 62.7976900 58.1940700
## TCGA.44.6146.11 TCGA.44.6147.11
## GALM 16.5563600 8.3897860
## HK3 6.8267430 8.2327620
## G6PC 0.1555417 0.1721295
## PGM1 32.4310600 43.4132300
## HKDC1 1.1371050 0.5703535
## GPI 53.6260600 41.9420700
#obtain the labels of the samples of the expression profile, the label vector is a vector of 0/1s,
# 0 represents the case sample and 1 represents the control sample
label1<-GetExampleData("label1")
#view first ten label
label1[1:10]
## [1] 0 0 0 0 0 0 0 0 0 0
we used the Student’s t-test to calculate the p-value for expression level and methylation level of each gene in the tumor and normal samples.label,0 represents normal samples;1 represents cancer samples.
#calculate p-values or corrected p-values for each gene
exp.p<-getExpp(exp_data,label = label1,p.adjust = FALSE)
label2<-GetExampleData("label2")
meth_data<-GetExampleData("meth_data")
meth.p<-getMethp(meth_data,label = label2,p.adjust = FALSE)
exp.p[1:10,]
## gene p
## GALM GALM 0.00382104698965552
## HK3 HK3 1.22893635962516e-14
## G6PC G6PC 0.309765961639907
## PGM1 PGM1 6.50659302913753e-05
## HKDC1 HKDC1 1.27479647059785e-05
## GPI GPI 1.37454189773835e-11
## FBP1 FBP1 2.09855354282047e-12
## ALDOA ALDOA 2.54043616919423e-14
## G6PC3 G6PC3 0.000265142879987233
## ALDH7A1 ALDH7A1 0.00626937915913354
For copy number data, we can calculate p-value through the following way:
#obtain Copy number variation data
cnv_data<-GetExampleData("cnv_data")
#obtion amplified genes
amp_gene<-GetExampleData("amp_gene")
#obtion deleted genes
del_gene<-GetExampleData(("del_gene"))
#calculate p-values or corrected p-values for each gene
cnv.p<-getCnvp(exp_data,cnv_data,amp_gene,del_gene,p.adjust=FALSE,method="fdr")
cnv.p[1:10,]
## gene p
## ACSS1 ACSS1 0.00641326042511501
## ACSS2 ACSS2 1
## ADH1A ADH1A 0.454656092798328
## ADH1B ADH1B 0.429492188406724
## ADH5 ADH5 0.00253178462835118
## AKR1A1 AKR1A1 1
## ALDH1A3 ALDH1A3 1
## ALDH1B1 ALDH1B1 1
## ALDH2 ALDH2 0.145658828705581
## ALDH3B1 ALDH3B1 1
With the above data, we constructed an integrative gene z-score (Z) based three datasets.
Firstly,the gene score calculated through combined p-values of Fisher’s Inverse chi-square tests. This method computes a combined statistic S from the p-values of the difference coefficient obtained from three individual datasets .Usually,the statistic S follows a chi-square distribution with 2k degrees,we can calculate the null hypothesis p-value of the statistic S.
Secondly,we convert the p-value to z-score according to Gaussian distribution , which is taken as the gene z-score(Z) .
#obtain the p-values of expression profile data,methylation profile data and Copy number variation data
exp.p<-GetExampleData("exp.p")
meth.p<-GetExampleData("meth.p")
cnv.p<-GetExampleData("cnv.p")
#calculate z-scores for p-values of each kind of data
zexp<-coverp2zscore(exp.p)
zmeth<-coverp2zscore(meth.p)
zcnv<-coverp2zscore(cnv.p)
#combine two kinds of p-values,then,calculate z-score for them
zz<-combinep_two(exp.p,meth.p)
#combine three kinds of p-values,then,calculate z-score for them
zzz<-combinep_three(exp.p,meth.p,cnv.p)
zzz[1:6,]
## padjust z_score exp.p
## AACS 1.49856652283489e-12 6.97785849233781 4.27591802855694e-05
## AASDHPPT 3.59090577618182e-09 5.78661689489162 1.51758714809107e-05
## ACAA1 2.26766115435344e-06 4.58520914055579 0.00254706069215554
## ACACA 7.81825182198367e-05 3.78073660442176 0.0414461349050663
## ACACB 3.17474389450551e-24 10.0863369031681 1.0796730435591e-25
## ACAD8 8.80185318920442e-08 5.22301085773967 2.21022111058924e-05
## meth.p cnv.p
## AACS 3.79351155555026e-07 0.000154005933715148
## AASDHPPT 0.138052336368975 4.95190895519194e-06
## ACAA1 4.80814465758352e-06 1
## ACACA 1.62339209249856e-05 1
## ACACB 0.152770581756488 0.0979253199784077
## ACAD8 0.19898879195769 7.67894501521561e-05
For a priori KEGG path, we perform a greedy algorithm to search for interested subpathways.We provide two statistical test methods of the subpathway, which one is whole gene-based perturbation, and the other is the local gene perturbation in a particular pathway.
## padjust z_score exp.p
## AACS 1.4985665228349e-12 6.97785849233781 4.27591802855694e-05
## AASDHPPT 3.59090577618182e-09 5.78661689489162 1.51758714809107e-05
## ACAA1 2.26766115435345e-06 4.58520914055579 0.00254706069215554
## ACACA 7.81825182198368e-05 3.78073660442176 0.0414461349050663
## ACACB 3.17474389450549e-24 10.0863369031681 1.0796730435591e-25
## ACAD8 8.80185318920443e-08 5.22301085773967 2.21022111058924e-05
## ACADL 6.14820533862155e-54 15.4184802326598 2.63073053599382e-55
## ACAT1 1.1545332900424e-06 4.72430235586153 6.58517201812735e-06
## ACER1 2.96364608769913e-06 4.5289664035691 1
## ACER3 1.63555761386473e-07 5.10711682637474 5.9116689567922e-07
## meth.p cnv.p
## AACS 3.79351155555026e-07 0.000154005933715148
## AASDHPPT 0.138052336368975 4.95190895519194e-06
## ACAA1 4.80814465758352e-06 1
## ACACA 1.62339209249856e-05 1
## ACACB 0.152770581756488 0.0979253199784077
## ACAD8 0.19898879195769 7.67894501521561e-05
## ACADL 0.00265801159320894 1
## ACAT1 0.287245454170909 0.00305344462196611
## ACER1 1.65137200042873e-08 1
## ACER3 0.00112767915011995 1
Optimize interested subpathways.If the number of genes shared by the two pathways accounted for more than the Overlap ratio of each pathway genes,than combine two pathways.
subpathdata<-GetExampleData("subpathdata")
keysubpathways<-opt_subpath(subpathdata,zz,overlap=0.6)
head(keysubpathways)
## SubpathwayID pathway
## [1,] "path00010_1" "Glycolysis / Gluconeogenesis"
## [2,] "path00010_2" "Glycolysis / Gluconeogenesis"
## [3,] "path00010_3" "Glycolysis / Gluconeogenesis"
## [4,] "path00010_4" "Glycolysis / Gluconeogenesis"
## [5,] "path00010_5" "Glycolysis / Gluconeogenesis"
## [6,] "path00010_6" "Glycolysis / Gluconeogenesis"
## Subgene
## [1,] "G6PC/HK3/PGM1/HKDC1/GPI/FBP1/ALDOA/G6PC2/GALM/G6PC3"
## [2,] "ADH5/ADH1B/ADH1A/ALDH3B2/ALDH2/AKR1A1/ALDH7A1/ALDH3B1/ALDH1B1/ALDH1A3"
## [3,] "PKLR/ENO1/LDHA/PGAM1/MINPP1/PCK1/LDHAL6A/LDHAL6B"
## [4,] "ACSS1/ALDH3B2/ADH1B/ADH1A/ALDH2/DLAT/ACSS2"
## [5,] "PFKP/ALDOA/GAPDH/FBP1/GPI/ALDOC/GAPDHS"
## [6,] "PGAM4/ENO1/PGAM1/MINPP1/PCK1/PGK2"
## Size SubpathwayZScore
## [1,] "10" "20.8492758044976"
## [2,] "10" "20.9317587858665"
## [3,] "8" "19.0899605586423"
## [4,] "7" "19.7078248333379"
## [5,] "7" "18.3020024323356"
## [6,] "6" "16.7772761794305"
the perturbation test was used to calculate statistical significance for these interested subpathways.
keysubpathways<-Permutation(keysubpathways,zz,nperm1=100,method1=TRUE,nperm2=100,method2=FALSE)
head(keysubpathways)
## SubpathwayID pathway
## [1,] "path00010_1" "Glycolysis / Gluconeogenesis"
## [2,] "path00010_2" "Glycolysis / Gluconeogenesis"
## [3,] "path00010_3" "Glycolysis / Gluconeogenesis"
## [4,] "path00010_4" "Glycolysis / Gluconeogenesis"
## [5,] "path00010_5" "Glycolysis / Gluconeogenesis"
## [6,] "path00010_6" "Glycolysis / Gluconeogenesis"
## Subgene
## [1,] "G6PC/HK3/PGM1/HKDC1/GPI/FBP1/ALDOA/G6PC2/GALM/G6PC3"
## [2,] "ADH5/ADH1B/ADH1A/ALDH3B2/ALDH2/AKR1A1/ALDH7A1/ALDH3B1/ALDH1B1/ALDH1A3"
## [3,] "PKLR/ENO1/LDHA/PGAM1/MINPP1/PCK1/LDHAL6A/LDHAL6B"
## [4,] "ACSS1/ALDH3B2/ADH1B/ADH1A/ALDH2/DLAT/ACSS2"
## [5,] "PFKP/ALDOA/GAPDH/FBP1/GPI/ALDOC/GAPDHS"
## [6,] "PGAM4/ENO1/PGAM1/MINPP1/PCK1/PGK2"
## Size SubpathwayZScore p1 fdr1
## [1,] "10" "20.8492758044976" "0.08" "0.476279069767442"
## [2,] "10" "20.9317587858665" "0.1" "0.501960784313725"
## [3,] "8" "19.0899605586423" "0.08" "0.476279069767442"
## [4,] "7" "19.7078248333379" "0.03" "0.333913043478261"
## [5,] "7" "18.3020024323356" "0.07" "0.471578947368421"
## [6,] "6" "16.7772761794305" "0.08" "0.476279069767442"