Coding-variant Allelic Series Test

2024-07-24

Data

To run an allelic series test, there are 4 key inputs:

The example data used below were generated using the DGP function provided with the package. The data set includes 100 subjects, 300 variants, and a continuous phenotype. The true effect sizes follow an allelic series, with magnitudes proportional to c(1, 2, 3) for BMVs, DMVs, and PTVs respectively.

set.seed(101)
n <- 100
data <- AllelicSeries::DGP(
  n = n,
  snps = 300,
  beta = c(1, 2, 3) / sqrt(n),
)

# Annotations.
anno <- data$anno
head(anno)
## [1] 2 0 0 0 0 1
# Covariates.
covar <- data$covar
head(covar)
##      int         age sex         pc1        pc2        pc3
## [1,]   1  0.06340401   1  1.54356127 -2.3816890  0.3378446
## [2,]   1 -0.26084149   1 -0.34178454 -1.4227627 -0.2141389
## [3,]   1  0.10148711   1  1.25528541  0.4608104 -0.6585620
## [4,]   1  0.48607024   1 -0.87800487  0.4316516  0.3552285
## [5,]   1 -0.52207993   1 -0.03553949 -0.1124594 -0.4243847
## [6,]   1 -1.41018843   0  0.37024952  1.3373260 -0.4423871
# Genotypes.
geno <- data$geno
head(geno[,1:5])
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    0    0    0    0    0
## [2,]    0    0    0    0    0
## [3,]    0    0    0    0    0
## [4,]    0    0    0    0    0
## [5,]    0    0    0    0    0
## [6,]    0    0    0    0    0
# Phenotype.
pheno <- data$pheno
head(pheno)
## [1]  3.1059934  1.1354490  0.4306503  0.7834604  1.0421679 -0.4964317

The example data generated by the preceding are available under vignettes/vignette_data.

Running the alleic series test

The COding-variant Allelic Series Test (COAST) is run using the COAST function. By default, the output of COAST includes a data.frame of counts showing the number of alleles, variants, and carriers in each class that contributed to the test, and a data.frame of p-values, with the omni test denoting the final, overall p-value. Inspection of the component p-values may be useful for determining which model(s) detected evidence of an association. In the present case, the baseline count model provided the greatest power.

results <- AllelicSeries::COAST(
  anno = anno,
  geno = geno,
  pheno = pheno,
  covar = covar
)
show(results)
## Counts:
##   anno alleles variants carriers
## 1    0     287      163       96
## 2    1     162      109       82
## 3    2      61       28       45
## 
## 
## P-values:
##           test   type     pval
## 1     baseline burden 3.11e-26
## 2          ind burden 1.32e-09
## 3    max_count burden 3.08e-10
## 4      max_ind burden 5.37e-09
## 5    sum_count burden 1.66e-20
## 6      sum_ind burden 2.55e-11
## 7 allelic_skat   skat 2.66e-07
## 8         omni   omni 3.74e-25

The counts data.frame is accessed via:

results@Counts
##   anno alleles variants carriers
## 1    0     287      163       96
## 2    1     162      109       82
## 3    2      61       28       45

and the p-values data.frame via:

results@Pvals
##           test   type         pval
## 1     baseline burden 3.112702e-26
## 2          ind burden 1.322084e-09
## 3    max_count burden 3.076876e-10
## 4      max_ind burden 5.374363e-09
## 5    sum_count burden 1.661854e-20
## 6      sum_ind burden 2.554417e-11
## 7 allelic_skat   skat 2.658137e-07
## 8         omni   omni 3.735235e-25

Test options

AllelicSeries::COAST(
  anno = anno,
  geno = geno,
  pheno = pheno,
  covar = covar,
  apply_int = TRUE
)
AllelicSeries::COAST(
  anno = anno,
  geno = geno,
  pheno = pheno,
  covar = covar,
  include_orig_skato_all = TRUE,
  include_orig_skato_ptv = TRUE,
)
AllelicSeries::COAST(
  anno = anno,
  geno = geno,
  pheno = 1 * (pheno > 0),
  covar = covar,
  is_pheno_binary = TRUE
)
AllelicSeries::COAST(
  anno = anno,
  geno = geno,
  pheno = pheno,
  covar = covar,
  min_mac = 2
)
AllelicSeries::COAST(
  anno = anno,
  geno = geno,
  pheno = pheno,
  covar = covar,
  return_omni_only = TRUE
)
AllelicSeries::COAST(
  anno = anno,
  geno = geno,
  pheno = pheno,
  covar = covar,
  score_test = TRUE
)
AllelicSeries::COAST(
  anno = anno,
  geno = geno,
  pheno = pheno,
  covar = covar,
  weights = c(1, 2, 3)
)