RAINBOWR
has been published in PLOS
Computational Biology (https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1007663).
If you use this RAINBOWR
in your paper, please cite
RAINBOWR
as follows:RAINBOWR
package is now
available at the CRAN (Comprehensive R
Archive Network).In this repository, the R
package RAINBOWR
is available. Here, we describe how to install and how to use
RAINBOWR
.
RAINBOWR
RAINBOWR
(Reliable Association INference By Optimizing
Weights with R) is a package to perform several types of
GWAS
as follows.
RGWAS.normal
functionRGWAS.multisnp
function (which tests multiple SNPs at the
same time)RGWAS.epistasis
(very slow and less reliable)RGWAS.normal
functionRGWAS.multisnp
function (which tests multiple
SNPs at the same time)RAINBOWR
also offers some functions to solve the linear
mixed effects model.
EM3.general
function (using gaston
,
MM4LMM
, or RAINBOWR
packages; fast for
gaston
and MM4LMM
)EMM.cpp
functionEM3.cpp
function (for the general kernel, not so fast)EM3.linker.cpp
function (for the linear kernel, fast)By utilizing these functions, you can estimate the genomic
heritability and perform genomic prediction (GP
).
Finally, RAINBOWR
offers other useful functions.
qq
and manhattan
function to draw Q-Q plot
and Manhattan plotmodify.data
function to match phenotype and marker
genotype dataCalcThreshold
function to calculate thresholds for GWAS
resultsSee
function to see a brief view of data (like
head
function, but more useful)genetrait
function to generate pseudo phenotypic values
from marker genotypeSS_GWAS
function to summarize GWAS results (only for
simulation study)estPhylo
and estNetwork
functions to
estimate phylogenetic tree or haplotype network and haplotype effects
with non-linear kernels for haplotype blocks of interest.convertBlockList
function to convert haplotype block
list estimated by PLINK to the format which can be inputted as a
gene.set
argument in RGWAS.multisnp
,
RGWAS.multisnp.interaction
, and
RGWAS.epistasis
functions.The stable version of RAINBOWR
is now available at the
CRAN
(Comprehensive R Archive Network). The latest version of
RAINBOWR
is also available at the
KosukeHamazaki/RAINBOWR
repository in the GitHub
,
so please run the following code in the R console.
#### Stable version of RAINBOWR ####
install.packages("RAINBOWR")
#### Latest version of RAINBOWR ####
### If you have not installed yet, ...
install.packages("devtools")
### Install RAINBOWR from GitHub
devtools::install_github("KosukeHamazaki/RAINBOWR")
If you get some errors via installation, please check if the
following packages are correctly installed. (We removed a dependency on
rgl
package!)
Rtools
for Windows userBiocManager::install("ggtree")
In RAINBOWR
, since part of the code is written in
Rcpp
(C++
in R
), please check if
you can use C++
in R
. For Windows
users, you should install Rtools
.
If you have some questions about installation, please contact us by e-mail (hamazaki@ut-biomet.org).
First, import RAINBOWR
package and load example
datasets. These example datasets consist of marker genotype (scored with
{-1, 0, 1}, 1,536 SNP chip (Zhao et al., 2010; PLoS One 5(5): e10780)),
map with physical position, and phenotypic data (Zhao et al., 2011;
Nature Communications 2:467). Both datasets can be downloaded from
Rice Diversity
homepage (http://www.ricediversity.org/data/). Also, the dataset
includes a list of haplotype blocks from the version 0.1.30. This list
was estimated by the PLINK 1.9 (Taliun et al., 2014; BMC Bioinformatics,
15).
### Import RAINBOWR
require(RAINBOWR)
#> Loading required package: RAINBOWR
### Load example datasets
data("Rice_Zhao_etal")
Rice_geno_score <- Rice_Zhao_etal$genoScore
Rice_geno_map <- Rice_Zhao_etal$genoMap
Rice_pheno <- Rice_Zhao_etal$pheno
Rice_haplo_block <- Rice_Zhao_etal$haploBlock
### View each dataset
See(Rice_geno_score)
#> L1 L2 L3 L4 L5 L6
#> class <integer> <integer> <integer> <integer> <integer> <integer>
#> id1000223 1 1 -1 -1 -1 -1
#> id1000556 -1 -1 1 1 -1 1
#> id1000673 -1 1 -1 1 -1 1
#> id1000830 -1 1 1 1 1 1
#> id1000955 -1 1 1 1 -1 1
#> id1001073 -1 -1 -1 1 1 -1
#> [1] "class: data.frame"
#> [1] "dimension: 1311 x 395"
See(Rice_geno_map)
#> marker chr pos
#> class <factor> <integer> <integer>
#> id1000223 id1000223 1 420422
#> id1000556 id1000556 1 655693
#> id1000673 id1000673 1 740153
#> id1000830 id1000830 1 913806
#> id1000955 id1000955 1 1041748
#> id1001073 id1001073 1 1172387
#> [1] "class: data.frame"
#> [1] "dimension: 1311 x 3"
See(Rice_pheno)
#> Flowering.time.at.Arkansas Flowering.time.at.Faridpur
#> class <numeric> <integer>
#> L1 75.08333333 64
#> L3 89.5 66
#> L4 94.5 67
#> L5 87.5 70
#> L6 89.08333333 73
#> L7 105 <NA>
#> Flowering.time.at.Aberdeen FT.ratio.of.Arkansas.Aberdeen
#> class <integer> <numeric>
#> L1 81 0.926954733
#> L3 83 1.078313253
#> L4 93 1.016129032
#> L5 108 0.810185185
#> L6 101 0.882013201
#> L7 158 0.664556962
#> FT.ratio.of.Faridpur.Aberdeen Culm.habit
#> class <numeric> <numeric>
#> L1 0.790123457 4
#> L3 0.795180723 7.5
#> L4 0.720430108 6
#> L5 0.648148148 3.5
#> L6 0.722772277 6
#> L7 <NA> 3
#> [1] "class: data.frame"
#> [1] "dimension: 413 x 36"
See(Rice_haplo_block)
#> block marker
#> class <character> <character>
#> 1 haploblock_1 id1005261
#> 2 haploblock_1 id1005263
#> 3 haploblock_2 id1009557
#> 4 haploblock_2 id1009616
#> 5 haploblock_3 id1020154
#> 6 haploblock_3 id1020166
#> [1] "class: data.frame"
#> [1] "dimension: 74 x 2"
You can check the original data format by See
function.
Then, select one trait (here, Flowering.time.at.Arkansas
)
for example.
### Select one trait for example
trait.name <- "Flowering.time.at.Arkansas"
y <- Rice_pheno[, trait.name, drop = FALSE]
For GWAS, first you can remove SNPs whose MAF <= 0.05 by
MAF.cut
function.
### Remove SNPs whose MAF <= 0.05
x.0 <- t(Rice_geno_score)
MAF.cut.res <- MAF.cut(x.0 = x.0, map.0 = Rice_geno_map)
x <- MAF.cut.res$x
map <- MAF.cut.res$map
Next, we estimate additive genomic relationship matrix (GRM) by using
calcGRM
function.
Next, we modify these data into the GWAS format of
RAINBOWR
by modify.data
function.
### Modify data
modify.data.res <- modify.data(pheno.mat = y, geno.mat = x, map = map,
return.ZETA = TRUE, return.GWAS.format = TRUE)
#> Warning in modify.data(pheno.mat = y, geno.mat = x, map = map, return.ZETA =
#> TRUE, : The following lines have phenotypes but no genotypes: L14, L31, L33,
#> L68, L86, L97, L98, L102, L111, L136, L173, L175, L185, L193, L212, L223, L226,
#> L305, L358, L361, L648, L30, L36, L104, L229, L295, L319, L90, L253, L246
pheno.GWAS <- modify.data.res$pheno.GWAS
geno.GWAS <- modify.data.res$geno.GWAS
ZETA <- modify.data.res$ZETA
### View each data for RAINBOWR
See(pheno.GWAS)
#> Sample_names Flowering.time.at.Arkansas
#> class <character> <numeric>
#> L1 L1 75.08333
#> L3 L3 89.50000
#> L4 L4 94.50000
#> L5 L5 87.50000
#> L6 L6 89.08333
#> L7 L7 105.00000
#> [1] "class: data.frame"
#> [1] "dimension: 383 x 2"
See(geno.GWAS)
#> marker chrom pos L1 L3 L4
#> class <factor> <integer> <integer> <integer> <integer> <integer>
#> 1 id1000223 1 420422 1 -1 -1
#> 2 id1000556 1 655693 -1 1 1
#> 3 id1000673 1 740153 -1 -1 1
#> 4 id1000830 1 913806 -1 1 1
#> 5 id1000955 1 1041748 -1 1 1
#> 6 id1001073 1 1172387 -1 -1 1
#> [1] "class: data.frame"
#> [1] "dimension: 1264 x 386"
str(ZETA)
#> List of 1
#> $ A:List of 2
#> ..$ Z: num [1:383, 1:383] 1 0 0 0 0 0 0 0 0 0 ...
#> .. ..- attr(*, "dimnames")=List of 2
#> .. .. ..$ : chr [1:383] "L1" "L3" "L4" "L5" ...
#> .. .. ..$ : chr [1:383] "L1" "L3" "L4" "L5" ...
#> ..$ K: num [1:383, 1:383] 1.0112 -0.45 -0.417 -0.0454 -0.4051 ...
#> .. ..- attr(*, "dimnames")=List of 2
#> .. .. ..$ : chr [1:383] "L1" "L3" "L4" "L5" ...
#> .. .. ..$ : chr [1:383] "L1" "L3" "L4" "L5" ...
ZETA
is a list of genomic relationship matrix (GRM) and
its design matrix.
Finally, we can perform GWAS
using these data. First, we
perform single-SNP GWAS by RGWAS.normal
function as
follows.
### Perform single-SNP GWAS
normal.res <- RGWAS.normal(pheno = pheno.GWAS, geno = geno.GWAS,
plot.qq = FALSE, plot.Manhattan = FALSE,
ZETA = ZETA, n.PC = 4, P3D = TRUE,
skip.check = TRUE, count = FALSE)
#> [1] "GWAS for trait: Flowering.time.at.Arkansas"
#> [1] "Variance components estimated. Testing markers."
#> Time difference of 2.881387 secs
See(normal.res$D) ### Column 4 contains -log10(p) values for markers
#> marker chrom pos Flowering.time.at.Arkansas
#> class <factor> <integer> <integer> <numeric>
#> 1 id1000223 1 420422 0.4947885
#> 2 id1000556 1 655693 0.3805267
#> 3 id1000673 1 740153 0.3443146
#> 4 id1000830 1 913806 0.1364734
#> 5 id1000955 1 1041748 1.0212223
#> 6 id1001073 1 1172387 0.5772126
#> [1] "class: data.frame"
#> [1] "dimension: 1264 x 4"
Next, we perform SNP-set GWAS by RGWAS.multisnp
function.
### Perform SNP-set GWAS (by regarding 11 SNPs as one SNP-set, first 300 SNPs)
SNP_set.res <- RGWAS.multisnp(pheno = pheno.GWAS, geno = geno.GWAS[1:300, ], ZETA = ZETA,
plot.qq = FALSE, plot.Manhattan = FALSE, count = FALSE,
n.PC = 4, test.method = "LR", kernel.method = "linear",
gene.set = NULL, skip.check = TRUE,
test.effect = "additive", window.size.half = 5, window.slide = 11)
#> [1] "GWAS for trait: Flowering.time.at.Arkansas"
#> Time difference of 3.196992 secs
See(SNP_set.res$D) ### Column 4 contains -log10(p) values for markers
#> marker chrom pos Flowering.time.at.Arkansas
#> class <factor> <integer> <integer> <numeric>
#> 1 id1000223 1 420422 0
#> 12 id1002158 1 2723270 0
#> 23 id1004109 1 5067948 0
#> 34 id1005263 1 6972700 0
#> 45 id1007975 1 11107052 0
#> 56 id1009557 1 14413616 0
#> [1] "class: data.frame"
#> [1] "dimension: 28 x 4"
You can perform SNP-set GWAS with sliding window by setting
window.slide = 1
. You can perform SNP-set GWAS with sliding
window by setting window.slide = 1
. And you can also
perform gene-set (or haplotype-block based) GWAS by assigning the
following data set to gene.set
argument. (You can check the
example also by See(Rice_haplo_block)
in R.)
ex.)
gene (or haplotype block) | marker |
---|---|
haploblock_1 | id1005261 |
haploblock_1 | id1005263 |
haploblock_2 | id1009557 |
haploblock_2 | id1009616 |
haploblock_3 | id1020154 |
… | … |
### Perform haplotype-block based GWAS (by using hapltype blocks estimated by PLINK, first 400 SNPs)
haplo_block.res <- RGWAS.multisnp(pheno = pheno.GWAS, geno = geno.GWAS[1:400, ], ZETA = ZETA,
plot.qq = FALSE, plot.Manhattan = FALSE, count = FALSE,
n.PC = 4, test.method = "LR", kernel.method = "linear",
gene.set = Rice_haplo_block, skip.check = TRUE,
test.effect = "additive")
#> [1] "GWAS for trait: Flowering.time.at.Arkansas"
#> [1] "Now generating map for gene set. Please wait."
#> Time difference of 1.799074 secs
See(haplo_block.res$D) ### Column 4 contains -log10(p) values for markers
#> marker chr pos Flowering.time.at.Arkansas
#> class <character> <integer> <numeric> <numeric>
#> 1 haploblock_1 1 6971965 0.0000000
#> 2 haploblock_2 1 14423002 0.3010615
#> 3 haploblock_3 1 32961726 0.3931964
#> 4 haploblock_4 1 33508558 0.0000000
#> 5 haploblock_5 2 8000411 0.0000000
#> 6 haploblock_6 3 36321431 0.0000000
#> [1] "class: data.frame"
#> [1] "dimension: 7 x 4"
There is no significant block for this dataset because the number of markers and blocks is too small for this dataset. However, when whole-genome sequencing data is available, the impact of using SNP-set/gene-set/haplotype-block methods becomes larger and we strongly recommend you use these methods. Please see Hamazaki and Iwata (2020, PLOS Comp Biol) for more details of the features of these methods.
If you have some help before performing GWAS
with
RAINBOWR
, please see the help for each function by
?function_name
.
Kennedy, B.W., Quinton, M. and van Arendonk, J.A. (1992) Estimation of effects of single genes on quantitative traits. J Anim Sci. 70(7): 2000-2012.
Storey, J.D. and Tibshirani, R. (2003) Statistical significance for genomewide studies. Proc Natl Acad Sci. 100(16): 9440-9445.
Yu, J. et al. (2006) A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nat Genet. 38(2): 203-208.
Kang, H.M. et al. (2008) Efficient Control of Population Structure in Model Organism Association Mapping. Genetics. 178(3): 1709-1723.
Kang, H.M. et al. (2010) Variance component model to account for sample structure in genome-wide association studies. Nat Genet. 42(4): 348-354.
Zhang, Z. et al. (2010) Mixed linear model approach adapted for genome-wide association studies. Nat Genet. 42(4): 355-360.
Endelman, J.B. (2011) Ridge Regression and Other Kernels for Genomic Selection with R Package rrBLUP. Plant Genome J. 4(3): 250.
Endelman, J.B. and Jannink, J.L. (2012) Shrinkage Estimation of the Realized Relationship Matrix. G3 Genes, Genomes, Genet. 2(11): 1405-1413.
Su, G. et al. (2012) Estimating Additive and Non-Additive Genetic Variances and Predicting Genetic Merits Using Genome-Wide Dense Single Nucleotide Polymorphism Markers. PLoS One. 7(9): 1-7.
Zhou, X. and Stephens, M. (2012) Genome-wide efficient mixed-model analysis for association studies. Nat Genet. 44(7): 821-824.
Listgarten, J. et al. (2013) A powerful and efficient set test for genetic markers that handles confounders. Bioinformatics. 29(12): 1526-1533.
Lippert, C. et al. (2014) Greater power and computational efficiency for kernel-based association testing of sets of genetic variants. Bioinformatics. 30(22): 3206-3214.
Jiang, Y. and Reif, J.C. (2015) Modeling epistasis in genomic selection. Genetics. 201(2): 759-768.
Hamazaki, K. and Iwata, H. (2020) RAINBOW: Haplotype-based genome-wide association study using a novel SNP-set method. PLOS Computational Biology, 16(2): e1007663.