| Version: | 1.13.3 | 
| Title: | Proteomics Data Analysis Functions | 
| Author: | Wolfgang Raffelsberger [aut, cre] | 
| Maintainer: | Wolfgang Raffelsberger <w.raffelsberger@gmail.com> | 
| Description: | Data analysis of proteomics experiments by mass spectrometry is supported by this collection of functions mostly dedicated to the analysis of (bottom-up) quantitative (XIC) data. Fasta-formatted proteomes (eg from UniProt Consortium <doi:10.1093/nar/gky1049>) can be read with automatic parsing and multiple annotation types (like species origin, abbreviated gene names, etc) extracted. Initial results from multiple software for protein (and peptide) quantitation can be imported (to a common format): MaxQuant (Tyanova et al 2016 <doi:10.1038/nprot.2016.136>), Dia-NN (Demichev et al 2020 <doi:10.1038/s41592-019-0638-x>), Fragpipe (da Veiga et al 2020 <doi:10.1038/s41592-020-0912-y>), ionbot (Degroeve et al 2021 <doi:10.1101/2021.07.02.450686>), MassChroq (Valot et al 2011 <doi:10.1002/pmic.201100120>), OpenMS (Strauss et al 2021 <doi:10.1038/nmeth.3959>), ProteomeDiscoverer (Orsburn 2021 <doi:10.3390/proteomes9010015>), Proline (Bouyssie et al 2020 <doi:10.1093/bioinformatics/btaa118>), AlphaPept (preprint Strauss et al <doi:10.1101/2021.07.23.453379>) and Wombat-P (Bouyssie et al 2023 <doi:10.1021/acs.jproteome.3c00636>). Meta-data provided by initial analysis software and/or in sdrf format can be integrated into the analysis. Quantitative proteomics measurements frequently contain multiple NA values, due to physical absence of given peptides in some samples, limitations in sensitivity or other reasons. Help is provided to inspect the data graphically to investigate the nature of NA-values via their respective replicate measurements and to help/confirm the choice of NA-replacement algorithms. Meta-data in sdrf-format (Perez-Riverol et al 2020 <doi:10.1021/acs.jproteome.0c00376>) or similar tabular formats can be imported and included. Missing values can be inspected and imputed based on the concept of NA-neighbours or other methods. 
Dedicated filtering and statistical testing using the framework of package 'limma' <doi:10.18129/B9.bioc.limma> can be run, enhanced by multiple rounds of NA-replacements to provide robustness towards rare stochastic events. Multi-species samples, as frequently used in benchmark-tests (eg Navarro et al 2016 <doi:10.1038/nbt.3685>, Ramus et al 2016 <doi:10.1016/j.jprot.2015.11.011>), can be run with special options considering such sub-groups during normalization and testing. Subsequently, ROC curves (Hand and Till 2001 <doi:10.1023/A:1010920819831>) can be constructed to compare multiple analysis approaches. As a detailed example, the data-set from Ramus et al 2016 (<doi:10.1016/j.jprot.2015.11.011>) quantified by MaxQuant, ProteomeDiscoverer, and Proline is provided with a detailed analysis of heterologous spike-in proteins. | 
| Depends: | R (≥ 3.5.0) | 
| Imports: | grDevices, graphics, knitr, limma, stats, utils, wrMisc (≥ 1.15.2) | 
| Suggests: | data.table, fdrtool, kableExtra, MASS, RColorBrewer, readxl, ROTS, rmarkdown, R.utils, sm, wrGraph (≥ 1.3.7) | 
| License: | GPL-3 | 
| Encoding: | UTF-8 | 
| VignetteBuilder: | knitr | 
| RoxygenNote: | 7.3.2 | 
| NeedsCompilation: | no | 
| Packaged: | 2025-08-21 14:33:46 UTC; wraff | 
| Repository: | CRAN | 
| Date/Publication: | 2025-08-22 08:20:02 UTC | 
Molecular mass for Elements
Description
This function returns the molecular mass of the main elements found in biology/proteomics, as average and mono-isotopic mass. The result includes H, C, N, O, P, S, Se and the electron. The values are based on http://www.ionsource.com/Card/Mass/mass.htm in ref to http://physics.nist.gov/Comp (as of 2019).
Usage
.atomicMasses()
Value
This function returns a numeric matrix with mass values
See Also
Examples
.atomicMasses()
Checking presence of knitr and rmarkdown
Description
This function allows checking presence of knitr and rmarkdown
Usage
.checkKnitrProt(tryF = FALSE)
Arguments
| tryF | (logical) | 
Value
This function returns a logical value
See Also
Examples
.checkKnitrProt()
Additional/final Check And Adjustments To Sample-order After readSampleMetaData()
Description
This (low-level) function performs an additional/final check & adjustments to sample-names after readSampleMetaData()
Usage
.checkSetupGroups(
  abund,
  setupSd,
  gr = NULL,
  sampleNames = NULL,
  quantMeth = NULL,
  silent = FALSE,
  callFrom = NULL,
  debug = FALSE
)
Arguments
| abund | (matrix or data.frame) abundance data, only the colnames will be used | 
| setupSd | (list) describing sample-setup, typically produced by  | 
| gr | (factor) optional custom information about replicate-layout, has priority over setupSd | 
| sampleNames | (character) custom sample-names, has priority over abund and setupSd | 
| quantMeth | (character) 2-letter abbreviation of name of quantitation-software (eg 'MQ') | 
| silent | (logical) suppress messages | 
| callFrom | (character) allow easier tracking of messages produced | 
| debug | (logical) display additional messages for debugging | 
Value
This function returns an enlarged/updated list 'setupSd' (set setupSd$sampleNames, setupSd$groups)
See Also
used in readProtDiscovererFile,  readMaxQuantFile, readProlineFile, readFragpipeFile
Examples
abun1 <- matrix(1:16, ncol=8, dimnames=list(NULL,paste("samp", LETTERS[8:1], sep="_")))
sdrf1 <- data.frame(source.name=paste(rep(LETTERS[1:4],each=2), 1:2, sep="_"), 
  assay.name=paste0("run", 1:8), comment.data.file.=paste0("MSrun", 8:1))
setU1 <- list(level=gl(4,2), meth="lowest", sampleNames=paste("samp", LETTERS[1:8], sep="_"), 
  sdrfDat=sdrf1, annotBySoft=NULL)
.checkSetupGroups(abun1, setU1)
Get Matrix With UniProt Abbreviations For Selected Species As Well As Simple Names
Description
This (low-level) function allows accessing matrix with UniProt abbreviations for species frequently used in research. This information may be used to harmonize species descriptions or extract species information out of protein-names.
Usage
.commonSpecies()
Value
This function returns a 2-column matrix with species names
See Also
used eg in readProtDiscovererFile,  readMaxQuantFile, readProlineFile, readFragpipeFile
Examples
.commonSpecies()
Extract Additional Information To Construct The Column 'SpecType', Allows Adding Information From Fasta
Description
This (low-level) function creates the column annot[,'SpecType'] which may help distinguishing different lines/proteins.
This information may, for example, be used to normalize only to all proteins of a common background matrix (species).
In order to compare specPref a species-column will be added to the annotation (annot) - if not already present
If $mainSpecies or $conta: match to annot[,"Species"], annot[,"EntryName"], annot[,"GeneName"], if length==1 grep in  annot[,"Species"]
Usage
.extrSpecPref(
  specPref,
  annot,
  useColumn = c("Species", "EntryName", "GeneName", "Accession"),
  suplInp = NULL,
  soft = NA,
  silent = FALSE,
  debug = FALSE,
  callFrom = NULL
)
Arguments
| specPref | (list) may contain $mainSpecies, $conta ... | 
| annot | (matrix) main protein annotation | 
| useColumn | (factor) columns from annot to use/mine | 
| suplInp | (matrix) additional custom annotation | 
| soft | (character, length=1) additional info which software was initially used (so far only special treatment for IB) | 
| silent | (logical) suppress messages | 
| debug | (logical) display additional messages for debugging (starting with 'mainSpecies','conta' and others - later may overwrite prev settings) | 
| callFrom | (character) allow easier tracking of messages produced | 
Details
Unlike readSampleMetaData, this function also considers the main annotation as extracted with the main quantification data.
For example, this function can complement protein annotation data if columns 'Accession','EntryName' or 'SpecType' are missing
Value
This function returns a matrix with additional column 'SpecType'
See Also
used in readProtDiscovererFile,  readMaxQuantFile, readProlineFile, readFragpipeFile
Examples
annot1 <- cbind( Leading.razor.protein=c("sp|P00925|ENO2_YEAST",
  "sp|Q3E792|RS25A_YEAST", "sp|P09938|RIR2_YEAST", "sp|P09938|RIR2_YEAST",
  "sp|Q99186|AP2M_YEAST", "sp|P00915|CAH1_HUMAN"), 
  Species= rep(c("Saccharomyces cerevisiae","Homo sapiens"), c(5,1)))
specPref1 <- list(conta="CON_|LYSC_CHICK", 
  mainSpecies="OS=Saccharomyces cerevisiae", spike="P00915")   # MQ type
.extrSpecPref(specPref1, annot1, useColumn=c("Species","Leading.razor.protein"))  
Basic NA-imputation (main)
Description
This (lower-level) function performs the basic NA-imputation.
Note, at this point the information from argument gr is not used.
Usage
.imputeNA(
  dat,
  gr = NULL,
  impParam,
  exclNeg = TRUE,
  inclLowValMod = TRUE,
  silent = FALSE,
  debug = FALSE,
  callFrom = NULL
)
Arguments
| dat | (matrix or data.frame) main data (may contain  | 
| gr | (character or factor) grouping of columns of  | 
| impParam | (numeric) 1st for mean; 2nd for sd; 3rd for seed | 
| exclNeg | (logical) exclude negative | 
| inclLowValMod | (logical) label on x-axis on plot | 
| silent | (logical) suppress messages | 
| debug | (logical) supplemental messages for debugging | 
| callFrom | (character) allow easier tracking of messages produced | 
Value
This function returns a list with $data and $datImp
See Also
for more complex treatment matrixNAneighbourImpute;
Examples
dat1 <- matrix(11:22, ncol=4)
dat1[3:4] <- NA
.imputeNA(dat1, impParam=c(mean(dat1, na.rm=TRUE), 0.1))
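The principle of such a basic imputation can be sketched in base R. This is only an illustration under the assumption (taken from the argument description) that impParam carries a mean, a standard deviation and optionally a seed; the helper imputeNaSketch is hypothetical, is not part of wrProteo, and the real .imputeNA() may differ in detail.

```r
# Hypothetical helper (not part of wrProteo): replace NA by random draws from a normal distribution
imputeNaSketch <- function(dat, impParam) {
  dat <- as.matrix(dat)
  naPos <- which(is.na(dat))                       # linear indices of missing values
  # optional 3rd element of impParam is used as seed (assumption)
  if (length(impParam) > 2 && !is.na(impParam[3])) set.seed(impParam[3])
  datImp <- stats::rnorm(length(naPos), mean = impParam[1], sd = impParam[2])
  out <- dat
  out[naPos] <- datImp                             # only NA positions are modified
  list(data = out, datImp = datImp)
}

dat1 <- matrix(11:22, ncol = 4)
dat1[3:4] <- NA
res1 <- imputeNaSketch(dat1, impParam = c(mean(dat1, na.rm = TRUE), 0.1, 2024))
res1$data
```

Measured values are left untouched; only the NA positions receive random values, which are also returned separately as $datImp.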
Generic Plotting Of Density Distribution For Quantitation Import-functions
Description
This (low-level) function allows (generic) plotting of density distribution for quantitation import-functions
Usage
.plotQuantDistr(
  abund,
  quant,
  custLay = NULL,
  normalizeMeth = NULL,
  softNa = NULL,
  refLi = NULL,
  refLiIni = NULL,
  notLogAbund = NA,
  figMarg = c(3.5, 3.5, 3, 1),
  tit = NULL,
  las = NULL,
  cexAxis = 0.8,
  nameSer = NULL,
  cexNameSer = NULL,
  silent = FALSE,
  callFrom = NULL,
  debug = FALSE
)
Arguments
| abund | (matrix or data.frame) abundance data, will be plotted as distribution | 
| quant | (matrix or data.frame) optional additional abundance data, to plot 2nd distribution, eg of normalized data | 
| custLay | (matrix) describing sample-setup, typically produced by | 
| normalizeMeth | (character, length=1) name of normalization method (will be displayed in title of figure) | 
| softNa | (character, length=1) name of quantitation-software (typically 2-letter abbreviation, eg 'MQ') | 
| refLi | (integer) to display number reference lines | 
| refLiIni | (integer) to display initial number reference lines | 
| notLogAbund | (logical) set to  | 
| figMarg | (numeric, length=4) custom figure margins (will be passed to  | 
| tit | (character) custom title | 
| las | (integer) indicate orientation of text in axes | 
| cexAxis | (numeric) size of numeric axis labels as cex-expansion factor (see also  | 
| nameSer | (character) custom label for data-sets or columns (length must match number of data-sets) | 
| cexNameSer | (numeric) size of individual data-series labels as cex-expansion factor (see also  | 
| silent | (logical) suppress messages | 
| callFrom | (character) allow easier tracking of messages produced | 
| debug | (logical) display additional messages for debugging | 
Value
This function returns a logical value (if data were valid for plotting) and produces a density distribution figure (if data were found valid)
See Also
used in readProtDiscovererFile,  readMaxQuantFile, readProlineFile, readFragpipeFile
Examples
set.seed(2018);  datT8 <- matrix(round(rnorm(800) +3,1), nc=8, dimnames=list(paste(
  "li",1:100,sep=""), paste(rep(LETTERS[1:3],c(3,3,2)),letters[18:25],sep="")))
.plotQuantDistr(datT8, quant=NULL, refLi=NULL, tit="Synthetic Data Distribution")                                
Molecular mass for amino-acids
Description
Calculate molecular mass based on atomic composition
Usage
AAmass(massTy = "mono", inPept = TRUE, inclSpecAA = FALSE)
Arguments
| massTy | (character) 'mono' or 'average' | 
| inPept | (logical) remove H2O corresponding to water loss at peptide bond formation | 
| inclSpecAA | (logical) include ornithine O & selenocysteine U | 
Value
This function returns a vector with masses for all amino-acids (argument 'massTy' to switch from mono-isotopic to average mass)
See Also
Examples
massDeFormula(c("12H12O","HO"," 2H 1 Se, 6C 2N","HSeCN"," ","e"))
AAmass()
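As an illustration of what "mass based on atomic composition" means, the following base R sketch computes the mono-isotopic mass of glycine from its formula and then subtracts water, as is done for residues within a peptide (inPept=TRUE). The element masses are rounded literature values typed in by hand and the helper massFromComposition is hypothetical; neither is taken from wrProteo.

```r
# Mono-isotopic masses of the elements needed here (rounded literature values, assumption)
atomMono <- c(H = 1.00782503, C = 12, N = 14.0030740, O = 15.9949146)

# Hypothetical helper: mass from an atomic composition given as named vector of counts
massFromComposition <- function(compo, atomMass = atomMono) {
  sum(compo * atomMass[names(compo)])
}

glyFree   <- massFromComposition(c(C = 2, H = 5, N = 1, O = 2))   # free glycine, C2H5NO2
water     <- massFromComposition(c(H = 2, O = 1))
glyInPept <- glyFree - water          # residue mass after water loss at peptide bond formation
round(c(free = glyFree, inPeptide = glyInPept), 4)
```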
AUC from ROC-curves
Description
This function calculates the AUC (area under the curve) from ROC data in matrix of specificity and sensitivity values,
as provided in the output from  summarizeForROC.
Usage
AucROC(
  dat,
  useCol = c("spec", "sens"),
  returnIfInvalid = NA,
  silent = FALSE,
  debug = FALSE,
  callFrom = NULL
)
Arguments
| dat | (matrix or data.frame) main input containing sensitivity and specificity data (from  | 
| useCol | (character or integer) column names to be used: 1st for specificity and 2nd for sensitivity count columns | 
| returnIfInvalid | ( | 
| silent | (logical) suppress messages | 
| debug | (logical) additional messages for debugging | 
| callFrom | (character) allows easier tracking of messages produced | 
Value
This function returns a numeric value for the AUC
See Also
preparing ROC data summarizeForROC, (re)plot the ROC figure plotROC;
note that numerous other packages also provide support for working with ROC-curves : Eg rocPkgShort,
ROCR, pROC or ROCit, etc.
Examples
set.seed(2019); test1 <- list(annot=cbind(Species=c(rep("b",35), letters[sample.int(n=3,
  size=150,replace=TRUE)])), BH=matrix(c(runif(35,0,0.01), runif(150)), ncol=1))
roc1 <- summarizeForROC(test1, spec=c("a","b","c"), annotCol="Species")
AucROC(roc1)
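For readers who want to see the calculation itself, the area under a ROC curve can be approximated by the trapezoid rule. The base R sketch below assumes a matrix with columns named 'spec' and 'sens' (as in the default of useCol); the helper aucTrapezoid is hypothetical and not necessarily how AucROC() is implemented.

```r
# Hypothetical helper: AUC by trapezoid rule from specificity and sensitivity values
aucTrapezoid <- function(dat, useCol = c("spec", "sens")) {
  fpr  <- 1 - dat[, useCol[1]]        # false positive rate = 1 - specificity
  sens <- dat[, useCol[2]]
  ord  <- order(fpr, sens)            # integrate from left to right
  fpr  <- fpr[ord]
  sens <- sens[ord]
  sum(diff(fpr) * (sens[-1] + sens[-length(sens)]) / 2)
}

rocPerfect <- cbind(spec = c(1, 1, 0),   sens = c(0, 1, 1))     # ideal separation
rocRandom  <- cbind(spec = c(1, 0.5, 0), sens = c(0, 0.5, 1))   # diagonal, no separation
aucTrapezoid(rocPerfect)
aucTrapezoid(rocRandom)
```

An ideal classifier gives an area of 1, while a curve along the diagonal gives 0.5.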
Deprecated Volcano-plot
Description
Please use VolcanoPlotW() from package wrGraph. This function does NOT produce a plot any more.
Usage
VolcanoPlotW2(
  Mvalue,
  pValue = NULL,
  useComp = 1,
  filtFin = NULL,
  ProjNa = NULL,
  FCthrs = NULL,
  FdrList = NULL,
  FdrThrs = NULL,
  FdrType = NULL,
  subTxt = NULL,
  grayIncrem = TRUE,
  col = NULL,
  pch = 16,
  compNa = NULL,
  batchFig = FALSE,
  cexMa = 1.8,
  cexLa = 1.1,
  limM = NULL,
  limp = NULL,
  annotColumn = NULL,
  annColor = NULL,
  cexPt = NULL,
  cexSub = NULL,
  cexTxLab = 0.7,
  namesNBest = NULL,
  NbestCol = 1,
  sortLeg = "descend",
  NaSpecTypeAsContam = TRUE,
  useMar = c(6.2, 4, 4, 2),
  returnData = FALSE,
  callFrom = NULL,
  silent = FALSE,
  debug = FALSE
)
Arguments
| Mvalue | (numeric or matrix) data to plot; M-values are typically calculated as difference of log2-abundance values and 'pValue' the mean of log2-abundance values;
M-values and p-values may be given as 2 columns of a matrix, in this case the argument  | 
| pValue | (numeric, list or data.frame) if  | 
| useComp | (integer, length=1) choice of which of multiple comparisons to present in  | 
| filtFin | (matrix or logical) The data may get filtered before plotting: If  | 
| ProjNa | (character) custom title | 
| FCthrs | (numeric) Fold-Change threshold (display as line) give as Fold-change and NOT log2(FC), default at 1.5, set to  | 
| FdrList | (numeric) FDR data or name of list-element | 
| FdrThrs | (numeric) FDR threshold (display as line), default at 0.05, set to  | 
| FdrType | (character) FDR-type to extract if  | 
| subTxt | (character) custom sub-title | 
| grayIncrem | (logical) if  | 
| col | (character) custom color(s) for points of plot (see also  | 
| pch | (integer) type of symbol(s) to plot (default=16) (see also  | 
| compNa | (character) names of groups compared | 
| batchFig | (logical) if  | 
| cexMa | (numeric) font-size of title, as expansion factor (see also  | 
| cexLa | (numeric) size of axis-labels, as expansion factor (see also  | 
| limM | (numeric, length=2) range of axis M-values | 
| limp | (numeric, length=2) range of axis FDR / p-values | 
| annotColumn | (character) column names of annotation to be extracted (only if  | 
| annColor | (character or integer) colors for specific groups of annotation (only if  | 
| cexPt | (numeric) size of points, as expansion factor (see also  | 
| cexSub | (numeric) size of subtitle, as expansion factor (see also  | 
| cexTxLab | (numeric) size of text-labels for points, as expansion factor (see also  | 
| namesNBest | (integer or character) number of best points to add names in figure; if 'passThr' all points passing FDR and FC-filters will be selected; 
if the initial object  | 
| NbestCol | (character or integer) colors for text-labels of best points | 
| sortLeg | (character) sorting of 'SpecType' annotation either ascending ('ascend') or descending ('descend'), no sorting if  | 
| NaSpecTypeAsContam | (logical) consider lines/proteins with  | 
| useMar | (numeric,length=4) custom margins (see also  | 
| returnData | (logical) optional returning data.frame with (ID, Mvalue, pValue, FDRvalue, passFilt) | 
| callFrom | (character) allow easier tracking of message(s) produced | 
| silent | (logical) suppress messages | 
| debug | (logical) additional messages for debugging | 
Value
deprecated - returns nothing
See Also
this function was replaced by VolcanoPlotW in package wrGraph
Examples
set.seed(2005); mat <- matrix(round(runif(900),2), ncol=9)
Selective batch cleaning of sample- (ie column-) names in list
Description
This function allows to manipulate sample-names (ie colnames of abundance data) in a batch-wise manner from data stored as multiple matrixes or data.frames of a list.
Import functions such as readMaxQuantFile() organize initial flat files into lists (of matrixes) of the different types of data.
Many times all column names in such lists carry long names including redundant information, like the overall experiment name or date, etc.
The aim of this function is to facilitate 'cleaning' the sample- (ie column-) names to obtain short and concise names.
Character terms to be removed (via argument rem) and/or replaced/substituted (via argument subst) should be given as they are, characters with special behaviour in grep (like '.') will be protected internally.
Note, that the character substitution part will be done first, and the removal part (without character replacement) afterwards.
Usage
cleanListCoNames(
  dat,
  rem = NULL,
  subst = c("-", "_"),
  lstE = c("raw", "quant", "counts"),
  mathOper = NULL,
  silent = FALSE,
  debug = FALSE,
  callFrom = NULL
)
Arguments
| dat | (list) main input | 
| rem | (character) character string to be removed, may be named 'left' and 'right' for more specific exact pattern matching
(this part will be performed before character substitutions by  | 
| subst | (character of length=2, or matrix with 2 columns) pair(s) of character-strings for replacement (1st as search-item and 2nd as replacement); this part is performed after character-removal via  | 
| lstE | (character, length=1) names of list-elements where colnames should be cleaned | 
| mathOper | (character, length=1) optional mathematical operation on numerical part of sample-names (eg  | 
| silent | (logical) suppress messages | 
| debug | (logical) additional messages for debugging | 
| callFrom | (character) allow easier tracking of messages produced | 
Value
This function returns a list (equivalent to input dat)
See Also
Examples
dat1 <- matrix(1:12, ncol=4, dimnames=list(1:3, paste0("sample_R.",1:4)))
dat1 <- list(raw=dat1, quant=dat1, notes="other..")
cleanListCoNames(dat1, rem=c(left="sample_"), c(".","-"))
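The batch-wise cleaning can be pictured with a few lines of base R. The sketch below uses fixed (literal) matching so that characters like '.' carry no special meaning, which mimics the internal protection mentioned above. The helper cleanColNamesSketch is hypothetical, it ignores the 'left'/'right' naming of rem, and it applies removal before substitution, which is an arbitrary choice made for this illustration.

```r
# Hypothetical helper (not part of wrProteo): clean colnames in selected list-elements
cleanColNamesSketch <- function(dat, rem = NULL, subst = c("-", "_"),
                                lstE = c("raw", "quant", "counts")) {
  for (el in intersect(lstE, names(dat))) {        # only treat list-elements that exist
    cn <- colnames(dat[[el]])
    if (is.null(cn)) next
    for (r in rem) cn <- gsub(r, "", cn, fixed = TRUE)                  # literal removal
    if (length(subst) == 2) cn <- gsub(subst[1], subst[2], cn, fixed = TRUE)
    colnames(dat[[el]]) <- cn
  }
  dat
}

mat1 <- matrix(1:12, ncol = 4, dimnames = list(1:3, paste0("sample_R.", 1:4)))
lst1 <- list(raw = mat1, quant = mat1, notes = "other..")
lst2 <- cleanColNamesSketch(lst1, rem = "sample_", subst = c(".", "-"))
colnames(lst2$raw)
```

List-elements not named in lstE (here 'notes') pass through unchanged.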
Combine Multiple Filters On NA-imputed Data
Description
In most omics data-analysis one needs to employ a certain number of filtering strategies to avoid getting artifacts to the step of statistical testing.
combineMultFilterNAimput takes on one side the original data and on the other side NA-imputed data to create several different filters and to finally combine them.
A filter aiming to take away the least abundant values (using the imputed data) can be fine-tuned by the argument abundThr. 
This step compares the means for each group and line, at least one group-mean has to be > the threshold (based on the hypothesis 
that if all conditions represent extremely low measures their differential may not be determined with certainty).
In contrast, the filter addressing the number of missing values (NA) uses the original data, the arguments colTotNa, minSpeNo and minTotNo 
are used at this step. Basically, this step allows defining a minimum content of 'real' (ie non-NA) values for further considering the measurements as reliable.
This part uses internally presenceFilt for filtering elevated content of NA per line.
Finally, this function combines both filters (as matrix of FALSE and TRUE) on NA-imputed and original data 
and returns a vector of logical values indicating whether the corresponding lines pass all filter criteria.
Usage
combineMultFilterNAimput(
  dat,
  imputed,
  grp,
  annDat = NULL,
  abundThr = NULL,
  colRazNa = NULL,
  colTotNa = NULL,
  minSpeNo = 1,
  minTotNo = 2,
  silent = FALSE,
  debug = FALSE,
  callFrom = NULL
)
Arguments
| dat | (matrix or data.frame) main data (may contain  | 
| imputed | (character)  same as 'dat' but with all  | 
| grp | (character or factor) define groups of replicates (in columns of 'dat') | 
| annDat | (matrix or data.frame) annotation data (should match lines of 'dat') | 
| abundThr | (numeric) optional threshold filter for minimum abundance | 
| colRazNa | (character) if razor peptides are used: column name for razor peptide count | 
| colTotNa | (character) column name for total peptide count | 
| minSpeNo | (integer) minimum number of specific peptides for maintaining proteins | 
| minTotNo | (integer) minimum total ie max razor number of peptides | 
| silent | (logical) suppress messages | 
| debug | (logical) additional messages for debugging | 
| callFrom | (character) allows easier tracking of messages produced | 
Value
This function returns a vector of logical values if corresponding line passes filter criteria
See Also
Examples
set.seed(2013)
datT6 <- matrix(round(rnorm(300)+3,1), ncol=6,
  dimnames=list(paste0("li",1:50), letters[19:24]))
datT6 <- datT6 +matrix(rep(1:nrow(datT6),ncol(datT6)), ncol=ncol(datT6))
datT6[6:7,c(1,3,6)] <- NA
datT6[which(datT6 < 11 & datT6 > 10.5)] <- NA
datT6[which(datT6 < 6 & datT6 > 5)] <- NA
datT6[which(datT6 < 4.6 & datT6 > 4)] <- NA
datT6b <- matrixNAneighbourImpute(datT6, gr=gl(2,3))
datT6c <- combineMultFilterNAimput(datT6, datT6b, grp=gl(2,3), abundThr=2)
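The logic of combining an abundance filter (on imputed data) with a filter on the content of real values (on original data) can be sketched in base R as follows. This is a simplified illustration: the constant used for imputation, the threshold minReal and the per-group counting of non-NA values are assumptions made for this sketch and do not reproduce presenceFilt or the peptide-count arguments of combineMultFilterNAimput.

```r
set.seed(2013)
ori <- matrix(round(rnorm(60) + 5, 1), ncol = 6,
  dimnames = list(paste0("li", 1:10), letters[19:24]))
ori[1, 1:5] <- NA                 # line with too few real values
ori[2, c(1, 4)] <- NA             # line with few NA only
grp <- gl(2, 3)
imp <- ori
imp[is.na(imp)] <- 2              # crude stand-in for a real NA-imputation (assumption)

## filter 1 (imputed data): at least one group-mean must exceed the threshold
abundThr <- 3
grpMeans <- sapply(levels(grp), function(g) rowMeans(imp[, grp == g, drop = FALSE]))
passAbund <- rowSums(grpMeans > abundThr) > 0

## filter 2 (original data): at least one group needs a minimum number of non-NA values
minReal <- 2                      # hypothetical parameter of this sketch
nReal <- sapply(levels(grp), function(g) rowSums(!is.na(ori[, grp == g, drop = FALSE])))
passNa <- rowSums(nReal >= minReal) > 0

## combine: a line is retained only if it passes both filters
passAll <- passAbund & passNa
table(passAll)
```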
Molecular mass for amino-acids
Description
This function calculates the molecular mass of one-letter code amino-acid sequences.
Usage
convAASeq2mass(
  x,
  massTy = "mono",
  seqName = TRUE,
  silent = FALSE,
  debug = FALSE,
  callFrom = NULL
)
Arguments
| x | (character) aminoacid sequence (single upper case letters for describing a peptide/protein) | 
| massTy | (character) default 'mono' for mono-isotopic masses (alternative 'average') | 
| seqName | (logical) optional (alternative) names for the content of 'x' (ie aa seq) as name (always if 'x' has no names) | 
| silent | (logical) suppress messages | 
| debug | (logical) additional messages for debugging | 
| callFrom | (character) allows easier tracking of messages produced | 
Value
This function returns a vector with masses for all amino-acids (argument 'massTy' to switch from mono-isotopic to average mass)
See Also
massDeFormula, AAmass, convToNum
Examples
convAASeq2mass(c("PEPTIDE","fPROTEINES"))
pep1 <- c(aa="AAAA", de="DEFDEF")
convAASeq2mass(pep1, seqN=FALSE)
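The underlying arithmetic is a sum of residue masses plus one water for the free termini. The base R sketch below uses a small hand-typed table of rounded mono-isotopic residue masses covering only the letters needed here; the table and the helper seqMassSketch are hypothetical and not taken from wrProteo.

```r
# Rounded mono-isotopic residue masses (in-peptide, ie after water loss); partial table (assumption)
resMono <- c(A = 71.03711, D = 115.02694, E = 129.04259, I = 113.08406,
             P = 97.05276, T = 101.04768)
waterMono <- 18.01056

# Hypothetical helper: mass of a peptide = sum of its residue masses + one water
seqMassSketch <- function(x, resMass = resMono) {
  sapply(strsplit(x, ""), function(aa) sum(resMass[aa]) + waterMono)
}

seqMassSketch(c("PEPTIDE", "AAAA"))
```

Letters missing from the table give NA in this sketch, which makes incomplete look-up tables easy to spot.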
Order Columns In List Of Matrixes, Data.frames And Vectors
Description
This function orders columns in list of matrixes (or matrix) according to argument sampNames and also offers an option for changing names of columns.
It was (initially) designed to adjust/correct the order of samples after import using readMaxQuantFile(), readProteomeDiscovererFile() etc.
The input may also be MArrayLM-type object from package limma or 
from functions moderTestXgrp or moderTest2grp.
Usage
corColumnOrder(
  dat,
  sampNames,
  replNames = NULL,
  useListElem = c("quant", "raw", "counts"),
  annotElem = "sampleSetup",
  newNames = NULL,
  silent = FALSE,
  debug = FALSE,
  callFrom = NULL
)
Arguments
| dat | (matrix, list or MArrayLM-object from limma) main input of which columns should get re-ordered, may be output from  | 
| sampNames | (character) column-names in desired order for output (its content must match colnames of  | 
| replNames | (character) option for replacing column-names by new/different colnames; should be vector of NEW column-names (in order as input from  | 
| useListElem | (character) in case  | 
| annotElem | (character) name of list-element of  | 
| newNames | deprecated, please use  | 
| silent | (logical) suppress messages | 
| debug | (logical) display additional messages for debugging | 
| callFrom | (character) allows easier tracking of messages produced | 
Value
This function returns an object of same class as input dat  (ie matrix, list or MArrayLM-object from limma)
See Also
readMaxQuantFile, readProteomeDiscovererFile; moderTestXgrp or moderTest2grp
Examples
grp <- factor(rep(LETTERS[c(3,1,4)], c(2,3,3)))
dat1 <- matrix(1:15, ncol=5, dimnames=list(NULL,c("D","A","C","E","B")))
corColumnOrder(dat1, sampNames=LETTERS[1:5])
dat2 <- list(quant=dat1, raw=dat1)
dat2
corColumnOrder(dat2, sampNames=LETTERS[1:5])
corColumnOrder(dat2, sampNames=LETTERS[1:5], replNames=c("Dd","Aa","Cc","Ee","Bb"))
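In essence, re-ordering by sample-names is a match() of the desired names against the current colnames, applied to each relevant list-element. The base R sketch below illustrates this for matrixes and lists of matrixes only; the helper reorderColsSketch is hypothetical and does not cover MArrayLM-objects, annotation elements or renaming.

```r
# Hypothetical helper (not part of wrProteo): bring columns into the order of sampNames
reorderColsSketch <- function(dat, sampNames, useListElem = c("quant", "raw", "counts")) {
  reorderOne <- function(m) {
    stopifnot(all(sampNames %in% colnames(m)))     # all requested names must exist
    m[, match(sampNames, colnames(m)), drop = FALSE]
  }
  if (is.matrix(dat)) return(reorderOne(dat))
  for (el in intersect(useListElem, names(dat))) dat[[el]] <- reorderOne(dat[[el]])
  dat
}

mat1 <- matrix(1:15, ncol = 5, dimnames = list(NULL, c("D", "A", "C", "E", "B")))
lst1 <- list(quant = mat1, raw = mat1)
lst2 <- reorderColsSketch(lst1, sampNames = LETTERS[1:5])
lst2$quant
```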
Compare in-silico digested proteomes for unique and shared peptides, counts per protein or as peptides
Description
Compare in-silico digested proteomes for unique and shared peptides, counts per protein or as peptides. The in-silico digestion may be performed separately using the package cleaver. Note: input must be list (or multiple named lists) of proteins with their respective peptides (eg by in-silico digestion).
Usage
countNoOfCommonPeptides(
  ...,
  prefix = c("Hs", "Sc", "Ec"),
  sep = "_",
  silent = FALSE,
  debug = FALSE,
  callFrom = NULL
)
Arguments
| ... | (list) multiple lists of (in-silico) digested proteins (typically protein ID as names) with their respective peptides (AA sequence), one entry for each species | 
| prefix | (character) optional (species-) prefix for entries in '...', will be only considered if '...' has no names | 
| sep | (character) concatenation symbol | 
| silent | (logical) suppress messages | 
| debug | (logical) display additional messages for debugging | 
| callFrom | (character) allow easier tracking of message(s) produced | 
Value
This function returns a list with $byPep as list of logical matrixes for each peptide (as lines) and unique/shared/etc for each species; $byProt as list of matrixes with count data per protein (as line) for each species; $tab with simple summary-type count data
See Also
readFasta2 and/or cleave-methods in package cleaver
Examples
## The example mimics a proteomics experiment where extracts from E coli and 
## Saccharomyces cerevisiae were mixed, thus not all peptides may occur unique.  
(mi2 = countNoOfCommonPeptides(Ec=list(E1=letters[1:4],E2=letters[c(3:7)],
  E3=letters[c(4,8,13)],E4=letters[9]),Sc=list(S1=letters[c(2:3,6)], 
  S2=letters[10:13],S3=letters[c(5,6,11)],S4=letters[c(11)],S5="n")))
##  a .. uni E, b .. inteR, c .. inteR(+intra E), d .. intra E  (no4), e .. inteR, 
##  f .. inteR +intra E   (no6), g .. uni E, h .. uni E  (no8), i .. uni E, 
##  j .. uni S (no10), k .. intra S  (no11), l .. uni S (no12), m .. inteR  (no13)
lapply(mi2$byProt,head)
mi2$tab
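The distinction between peptides unique to one species and peptides shared between species comes down to set operations. The base R sketch below re-uses the toy peptides from the example above (single letters standing in for peptide sequences); it only illustrates the idea and does not reproduce the output structure of countNoOfCommonPeptides().

```r
ec <- list(E1 = letters[1:4], E2 = letters[3:7], E3 = letters[c(4, 8, 13)], E4 = letters[9])
sc <- list(S1 = letters[c(2:3, 6)], S2 = letters[10:13], S3 = letters[c(5, 6, 11)],
           S4 = letters[11], S5 = "n")

pepEc <- unique(unlist(ec))            # all distinct peptides of species 1
pepSc <- unique(unlist(sc))            # all distinct peptides of species 2

shared <- intersect(pepEc, pepSc)      # found in both species
onlyEc <- setdiff(pepEc, pepSc)        # found in species 1 only
onlySc <- setdiff(pepSc, pepEc)        # found in species 2 only

## per protein of species 1: number of peptides also found in species 2
sharedPerProtEc <- sapply(ec, function(p) sum(p %in% pepSc))
list(shared = shared, onlyEc = onlyEc, onlySc = onlySc, sharedPerProtEc = sharedPerProtEc)
```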
Export As Wombat-P Set Of Files
Description
This function allows exporting objects created from wrProteo to the format of Wombat-P.
Usage
exportAsWombatP(
  wrProtObj,
  path = ".",
  combineFractions = "mean",
  silent = FALSE,
  debug = FALSE,
  callFrom = NULL
)
Arguments
| wrProtObj | (list produced by any import-function from wrProteo) object which will be exported as Wombat-P format | 
| path | (character) the location where the data should be exported to | 
| combineFractions | ( | 
| silent | (logical) suppress messages | 
| debug | (logical) display additional messages for debugging | 
| callFrom | (character) allows easier tracking of messages produced | 
Value
This function creates a set of files (README.md, test_params.yml), plus a sub-directory containing file(s) (stand_prot_quant_method.csv); finally the function returns  (NULL)
See Also
readMaxQuantFile, readProteomeDiscovererFile; moderTestXgrp or moderTest2grp
Examples
path1 <- system.file("extdata", package="wrProteo")
fiNa <- "proteinGroupsMaxQuant1.txt.gz"
specPr <- c(conta="conta|CON_|LYSC_CHICK", mainSpecies="YEAST", spike="HUMAN_UPS")
dataMQ <- readMaxQuantFile(path1, file=fiNa, specPref=specPr, tit="tiny MaxQuant")
exportAsWombatP(dataMQ, path=tempdir())
Export Sample Meta-data from Quantification-Software as Sdrf-draft
Description
Sample/experimental annotation meta-data from MaxQuant that was previously imported can be formatted in sdrf-style and exported using this function to write a draft-sdrf-file. Please note that this information will not be _complete_ with respect to all information used in data-bases like Pride. Sdrf-files provide additional meta-information about samples and MS-runs in a standardized format; they may also be part of submissions to Pride.
Usage
exportSdrfDraft(
  lst,
  fileName = "sdrfDraft.tsv",
  correctFileExtension = TRUE,
  silent = FALSE,
  debug = FALSE,
  callFrom = NULL
)
Arguments
| lst | (list) object created by import-function (MaxQuant) | 
| fileName | (character) file-name (and path) to be used when exporting | 
| correctFileExtension | (logical) if  | 
| silent | (logical) suppress messages | 
| debug | (logical) additional messages for debugging | 
| callFrom | (character) allow easier tracking of messages produced | 
Details
Gathering as much information as possible about samples and MS-runs requires that the additional files created by the quantification software (like MaxQuant, read using readMaxQuantFile) 
are present and were imported when calling the import-function (eg using the argument _suplAnnotFile=TRUE_).
Please note that this functionality was designed for the case where no (external) sdrf-file is available. 
Thus, when data was imported including external sdrf (using the _sdrf=_ argument), exporting incomplete annotation-data from MaxQuant-produced files does not make any sense and therefore won't be possible.
After exporting the draft sdrf the user is advised to check and complete the information in the resulting file.
Unfortunately, not all information present in a standard sdrf-file (like on Pride) can be gathered automatically,
but key columns are already present and thus may facilitate completing.
Please note, that the file-format has been defined as .tsv, thus columns/fields should be separated by tabs.
During manual editing and completion, some editing- or spreadsheet-software may change the file-extension to .tsv.txt,
in this case the final files should be renamed as .tsv to remain compatible with Pride.
At this point only the import of data from MaxQuant via readMaxQuantFile has been developed to extract information for creating a draft-sdrf.
Other data/file-import functions may be further developed to gather as much as possible equivalent information in the future.
Value
This function writes an Sdrf draft to file
See Also
This function may be used after reading/importing data by readMaxQuantFile in absence of sdrf
Examples
path1 <- system.file("extdata", package="wrProteo")
fiNaMQ <- "proteinGroups.txt.gz"
dataMQ <- readMaxQuantFile(path1, file=fiNaMQ, refLi="mainSpe", sdrf=FALSE, suplAnnotFile=TRUE)
## Here we'll write simply in the current temporary directory of this R-session
exportSdrfDraft(dataMQ, file.path(tempdir(),"testSdrf.tsv"))
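The renaming step mentioned in the Details can be sketched in base R. This is only an illustration, assuming that an editor saved the completed draft as 'editedSdrf.tsv.txt' (a hypothetical file name with placeholder content); it does not use any wrProteo function.

```r
## Hypothetical file as it might be left behind by a spreadsheet editor
badName <- file.path(tempdir(), "editedSdrf.tsv.txt")
writeLines("source name\tcomment[data file]", badName)   # placeholder content, tab-separated

## Strip the trailing '.txt' so that the extension is '.tsv' again
goodName <- sub("\\.tsv\\.txt$", ".tsv", badName)
if(!identical(goodName, badName)) file.rename(badName, goodName)
file.exists(goodName)
```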
Extract species annotation
Description
extrSpeciesAnnot identifies species-related annotation (as suffix to identifiers) for data combining multiple species and returns alternative (short) names.  
This function also removes extra leading or trailing space or punctuation characters.
In case multiple tags are found, the last tag is reported and an alert message may be displayed.
Usage
extrSpeciesAnnot(
  annot,
  spec = c("_CONT", "_HUMAN", "_YEAST", "_ECOLI"),
  shortNa = c("cont", "H", "S", "E"),
  silent = FALSE,
  debug = FALSE,
  callFrom = NULL
)
Arguments
| annot | (character) vector with initial annotation | 
| spec | (character) the tags to be identified | 
| shortNa | (character) the final abbreviation used, order and length must fit to argument  | 
| silent | (logical) suppress messages | 
| debug | (logical) display additional messages for debugging | 
| callFrom | (character) allow easier tracking of messages produced | 
Value
This function returns a character vector with single (last of multiple) term if found in argument annot
See Also
Examples
spec <- c("keratin_CONT","AB_HUMAN","CD_YEAST","EF_G_HUMAN","HI_HUMAN_ECOLI","_YEAST_012")
extrSpeciesAnnot(spec) 
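The idea of reporting the last of multiple tags can be sketched in base R. This is not the implementation used by extrSpeciesAnnot; it assumes that 'last' means the tag occurring furthest to the right in each entry, and it skips the cleaning of spaces and punctuation.

```r
annot <- c("keratin_CONT", "AB_HUMAN", "CD_YEAST", "EF_G_HUMAN", "HI_HUMAN_ECOLI", "_YEAST_012")
spec <- c("_CONT", "_HUMAN", "_YEAST", "_ECOLI")
shortNa <- c("cont", "H", "S", "E")

## position of each tag in each annotation (-1 if absent); rows = annotations, columns = tags
pos <- sapply(spec, function(tag) as.integer(regexpr(tag, annot, fixed=TRUE)))
## keep the tag occurring furthest to the right, NA if no tag was found (assumption, see above)
lastTag <- apply(pos, 1, function(p) if(all(p < 0)) NA_integer_ else which.max(p))
res <- shortNa[lastTag]
res
```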
Extract Results From Moderated t-tests
Description
This function allows convenient access to results produced using the functions moderTest2grp or moderTestXgrp.
The user can define the threshold and which type of multiple testing correction should be used
(as long as the multiple testing correction method was actually performed as part of testing).
Usage
extractTestingResults(
  stat,
  compNo = 1,
  statTy = "BH",
  thrsh = 0.05,
  FCthrs = 1.5,
  annotCol = c("Accession", "EntryName", "GeneName"),
  nSign = 6,
  addTy = c("allMeans"),
  filename = NULL,
  fileTy = "csvUS",
  silent = FALSE,
  debug = FALSE,
  callFrom = NULL
)
Arguments
| stat | ('MArrayLM'-object or list) Designed for the output from  | 
| compNo | (integer) the comparison number/index to be used | 
| statTy | (character) the multiple-testing correction type to be considered when looking for significant changes  with threshold  | 
| thrsh | (numeric) the threshold to be applied on  | 
| FCthrs | (numeric) Fold-Change threshold given as Fold-change and NOT log2(FC), default at 1.5 (for filtering at M-value =0.585) | 
| annotCol | (character) column-names from the annotation to be included | 
| nSign | (integer) number of significant digits when returning results | 
| addTy | (character) additional groups to add (so far only "allMeans" available) in addition to the means used in the pairwise comparison | 
| filename | (character) optional (path and) file-name for exporting results to csv-file | 
| fileTy | (character) file-type to be used with argument  | 
| silent | (logical) suppress messages | 
| debug | (logical) display additional messages for debugging | 
| callFrom | (character) allow easier tracking of messages produced | 
Value
This function returns a limma-type MA-object (which can be handled just like a list)
See Also
testRobustToNAimputation, moderTestXgrp or moderTest2grp
Examples
grp <- factor(rep(LETTERS[c(3,1,4)],c(2,3,3)))
set.seed(2017); t8 <- matrix(round(rnorm(208*8,10,0.4),2), ncol=8,
  dimnames=list(paste(letters[],rep(1:8,each=26),sep=""), paste(grp,c(1:2,1:3,1:3),sep="")))
t8[3:6,1:2] <- t8[3:6,1:2] +3                    # augment lines 3:6 (c-f) 
t8[5:8,c(1:2,6:8)] <- t8[5:8,c(1:2,6:8)] -1.5    # lower lines 
t8[6:7,3:5] <- t8[6:7,3:5] +2.2                  # augment lines 
## expect to find C/A in c,d,g, (h)
## expect to find C/D in c,d,e,f
## expect to find A/D in f,g,(h) 
library(wrMisc)     # for testing we'll use this package
test8 <- moderTestXgrp(t8, grp) 
extractTestingResults(test8)
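The relation between the arguments thrsh and FCthrs can be illustrated in base R. The sketch below uses simulated p-values and log2-ratios (both hypothetical) and only shows the kind of combined filter these thresholds describe; it is not the code used inside extractTestingResults.

```r
set.seed(2017)
pRaw <- c(runif(5, 0, 1e-4), runif(95))        # simulated raw p-values, the first 5 mimic true changes
Mval <- c(rep(1.2, 5), rnorm(95, 0, 0.3))      # simulated log2-ratios (M-values)

FCthrs <- 1.5                                  # threshold on linear scale, as for argument FCthrs
log2(FCthrs)                                   # approx 0.585, ie the corresponding M-value

pBH <- p.adjust(pRaw, method="BH")             # Benjamini-Hochberg correction
keep <- pBH <= 0.05 & abs(Mval) >= log2(FCthrs)
sum(keep)
```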
Add arrow for expected Fold-Change to VolcanoPlot or MA-plot
Description
NOTE : This function is deprecated, please use foldChangeArrow instead !!
This function was made for adding an arrow indicating a fold-change to MA- or Volcano-plots. 
When comparing multiple concentrations of standards in benchmark-tests it may be useful to indicate the expected ratio in a pair-wise comparison.
In case of main input as list or MArrayLM-object (as generated from limma), the column-names of multiple pairwise comparisons can be used 
for extracting a numeric content (supposed as concentrations in sample-names) which will be used to determine the expected ratio used for plotting. 
Optionally the ratio used for plotting can be returned as numeric value.
Usage
foldChangeArrow2(
  FC,
  useComp = 1,
  isLin = TRUE,
  asX = TRUE,
  col = 1,
  arr = c(0.005, 0.15),
  lwd = NULL,
  addText = c(line = -0.9, cex = 0.7, txt = "expected", loc = "toright"),
  returnRatio = FALSE,
  silent = FALSE,
  debug = FALSE,
  callFrom = NULL
)
Arguments
| FC | (numeric, list or MArrayLM-object) main information for drawing arrow : either numeric value for fold-change/log2-ratio or object to search for colnames of statistical testing for extracting numeric part | 
| useComp | (integer) only used in case FC is list or MArrayLM-object and has multiple pairwise-comparisons | 
| isLin | (logical) indicate if  | 
| asX | (logical) indicate if arrow should be on x-axis | 
| col | (integer or character) custom color | 
| arr | (numeric, length=2) start- and end-points of arrow (as relative to entire plot) | 
| lwd | (numeric) line-width of arrow | 
| addText | (logical or named vector) indicate if text explaining arrow should be displayed, use  | 
| returnRatio | (logical) return ratio | 
| silent | (logical) suppress messages | 
| debug | (logical) display additional messages for debugging | 
| callFrom | (character) allow easier tracking of message(s) produced | 
Details
The argument addText also allows specifying a fixed position when using addText=c(loc="bottomleft"), also bottomright, topleft, topright, toleft and toright may be used.
In this case the elements side and adjust will be redefined to accommodate the text in the corner specified. 
Ultimately this function will be integrated to the package wrGraph.
Value
plots arrow only (and explicative text), if returnRatio=TRUE also returns numeric value for extracted ratio
See Also
new version : foldChangeArrow; used with MAplotW, VolcanoPlotW
Examples
plot(rnorm(20,1.5,0.1),1:20)
#deprecated# foldChangeArrow2(FC=1.5) 
Combine Multiple Proteomics Data-Sets
Description
This function allows combining up to 3 separate data-sets previously imported using wrProteo.
Usage
fuseProteomicsProjects(
  x,
  y,
  z = NULL,
  columnNa = "Accession",
  NA.rm = TRUE,
  listNa = c(quant = "quant", annot = "annot"),
  all = FALSE,
  textModif = NULL,
  shortNa = NULL,
  retProtLst = FALSE,
  silent = FALSE,
  debug = FALSE,
  callFrom = NULL
)
Arguments
| x | (list) First Proteomics data-set | 
| y | (list) Second Proteomics data-set | 
| z | (list) optional third Proteomics data-set | 
| columnNa | (character) column names from annotation | 
| NA.rm | (logical) remove  | 
| listNa | (character) names of key list-elements from  | 
| all | (logical) decide whether the union or the intersect should be used when merging x, y and z | 
| textModif | (character) Additional modifications to the identifiers from argument  | 
| shortNa | (character) for appending to output-colnames | 
| retProtLst | (logical) return list-object similar to input, otherwise a matrix of fused/aligned quantitation data | 
| silent | (logical) suppress messages | 
| debug | (logical) additional messages for debugging | 
| callFrom | (character) allow easier tracking of messages produced | 
Details
Some quantification software may give some identifiers multiple times, ie as multiple lines (eg for different modifications or charge states, etc).
In this case this function first tries to summarize all lines with identical identifiers (using the function combineRedundLinesInList
which uses by default the median value). 
Thus, it is very important to know your data and to understand when lines that appear with the same identifiers should/may be fused/summarized without 
doing damage to the later biological interpretation ! The user may specify for each dataset the column out of the protein/peptide-annotation to use
via the argument columnNa. 
Then, this content will be matched as identical match, so when combining data from different software special care should be taken !
Please note that (at this point) the data from different series/objects will be joined as they are, ie without any additional normalization. It is up to the user to inspect the resulting data and to decide if and which type of normalization may be suitable !
Please do NOT try combining protein and peptide quantification data.
Value
This function returns a list with the same number of list-elements as  $x, ie typically this contains :
$raw (initial/raw abundance values), $quant with final normalized quantitations, 
$annot, optionally $counts an array with number of peptides, $quantNotes or $notes
See Also
Examples
path1 <- system.file("extdata", package="wrProteo")
dataMQ <- readMaxQuantFile(path1, specPref=NULL, normalizeMeth="median")
MCproFi1 <- "tinyMC.RData"
dataMC <- readMassChroQFile(path1, file=MCproFi1, plotGraph=FALSE)
dataFused <- fuseProteomicsProjects(dataMQ, dataMC)
dim(dataMQ$quant)
dim(dataMC$quant)
dim(dataFused$quant)
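The two steps described in the Details (summarizing redundant identifiers by their median, then aligning by exact match of the identifier) can be sketched in base R. The two mini data-sets and their column names are hypothetical, and the sketch does not call combineRedundLinesInList or fuseProteomicsProjects.

```r
## Two hypothetical mini data-sets: identifiers plus one quantitation column each
x <- data.frame(Accession=c("P1", "P1", "P2", "P3"), quantX=c(10, 12, 20, 30))   # 'P1' occurs twice
y <- data.frame(Accession=c("P2", "P3", "P4"), quantY=c(21, 29, 40))

## Step 1: summarize redundant identifiers by their median
xSum <- aggregate(quantX ~ Accession, data=x, FUN=median)

## Step 2: align by exact match of the identifier
fusedIntersect <- merge(xSum, y, by="Accession", all=FALSE)   # only identifiers found in both
fusedUnion <- merge(xSum, y, by="Accession", all=TRUE)        # all identifiers, gaps filled with NA
fusedIntersect
fusedUnion
```

Note that no normalization is applied between the two series, which mirrors the warning given in the Details.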
UniProt Accession-Numbers And Names Of UPS1 Proteins
Description
UPS1 (see https://www.sigmaaldrich.com/FR/en/product/sigma/ups1) and UPS2 are commercial products consisting of a mix of 48 human (purified) proteins.
They are frequently used as standard in spike-in experiments, available from Sigma-Aldrich (https://www.sigmaaldrich.com/GB/en).
This function allows accessing their protein accession numbers and associated names on UniProt
Usage
getUPS1acc(updated = TRUE, silent = FALSE, debug = FALSE, callFrom = NULL)
Arguments
| updated | (logical) return updated accession number (of UBB) | 
| silent | (logical) suppress messages | 
| debug | (logical) additional messages for debugging | 
| callFrom | (character) allows easier tracking of messages produced | 
Details
Please note that the UniProt accession 'P62988' for 'UBIQ_HUMAN' (as originally cited by Sigma-Aldrich)
has been withdrawn and replaced in 2010 by UniProt by the accessions 'P0CG47', 'P0CG48', 'P62979', and 'P62987'.
This initial accession is available via getUPS1acc()$acOld, now getUPS1acc()$ac contains 'P0CG47'.
Value
This function returns a data.frame with accession-numbers as stated by the supplier ($acFull),
trimmed accession-numbers, ie without version numbers ($ac), 
and associated entry-names ($EntryName) from UniProt 
as well as the species designation for the collection of 48 human UPS1 or UPS2 proteins.
Examples
head(getUPS1acc())
Inspect Species Indication Or Group of Proteins
Description
This function inspects its main argument to convert a species indication to the scientific name or to return all protein-accession numbers for a name of a standard collection like UPS1.
Usage
inspectSpeciesIndic(x, silent = FALSE, debug = FALSE, callFrom = NULL)
Arguments
| x | (character) species indication or name of collection of proteins (so far only UPS1 & UPS2) | 
| silent | (logical) suppress messages | 
| debug | (logical) display additional messages for debugging | 
| callFrom | (character) allows easier tracking of messages produced | 
Value
This function returns a character vector
See Also
Examples
inspectSpeciesIndic("Human")
inspectSpeciesIndic("UPS1")
Isolate NA-neighbours
Description
This function extracts all replicate-values where at least one of the replicates is NA and sorts by number of NAs per group.
A list with all NA-neighbours organized by the number of NAs gets returned.
Usage
isolNAneighb(mat, gr, silent = FALSE, debug = FALSE, callFrom = NULL)
Arguments
| mat | (matrix or data.frame) main data (may contain  | 
| gr | (character or factor) grouping of columns of 'mat', replicate association | 
| silent | (logical) suppress messages | 
| debug | (logical) display additional messages for debugging | 
| callFrom | (character) allow easier tracking of messages produced | 
Value
This function returns a list with NA-neighbours sorted by number of NAs in replicate group
See Also
This function gets used by matrixNAneighbourImpute and testRobustToNAimputation; estimation of mode stableMode; detection of NAs na.fail
Examples
mat1 <- c(22.2, 22.5, 22.2, 22.2, 21.5, 22.0, 22.1, 21.7, 21.5, 22, 22.2, 22.7,
  NA, NA, NA, NA, NA, NA, NA, 21.2,   NA, NA, NA, NA,
  NA, 22.6, 23.2, 23.2,  22.4, 22.8, 22.8, NA,  23.3, 23.2, NA, 23.7,
  NA, 23.0, 23.1, 23.0,  23.2, 23.2, NA, 23.3,  NA, NA, 23.3, 23.8)
mat1 <- matrix(mat1, ncol=12, byrow=TRUE)
gr4 <- gl(3, 4)
isolNAneighb(mat1, gr4)
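The concept of NA-neighbours can be sketched in base R as follows. This is a simplified illustration on a small hypothetical matrix, not the code of isolNAneighb; the list names 'n1', 'n2', ... (number of NAs in the replicate group) are chosen here for illustration only.

```r
mat <- matrix(c(22.2, 22.5,   NA, 22.2,   21.5, 22.0, 22.1, 21.7,
                  NA,   NA, 21.2,   NA,     NA,   NA,   NA,   NA,
                23.0,   NA,   NA, 23.1,   23.2, 23.3,   NA, 23.8), ncol=8, byrow=TRUE)
gr <- gl(2, 4)                                   # 2 groups with 4 replicates each

naNeighb <- list()
for(g in levels(gr)) {
  sub1 <- mat[, gr == g, drop=FALSE]
  nNA <- rowSums(is.na(sub1))
  for(i in which(nNA > 0 & nNA < ncol(sub1))) {  # lines with NA(s) and at least one measured value
    key <- paste0("n", nNA[i])
    naNeighb[[key]] <- c(naNeighb[[key]], sub1[i, !is.na(sub1[i, ])])
  }
}
naNeighb <- naNeighb[order(names(naNeighb))]     # sort by number of NAs
naNeighb
```

Lines where all replicates of a group are NA contribute nothing, since there is no measured value next to the NAs.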
Molecular mass from chemical formula
Description
Calculate molecular mass based on atomic composition
Usage
massDeFormula(
  comp,
  massTy = "mono",
  rmEmpty = FALSE,
  silent = FALSE,
  callFrom = NULL
)
Arguments
| comp | (character) atomic composition | 
| massTy | (character) 'mono' or 'average' | 
| rmEmpty | (logical) suppress empty entries | 
| silent | (logical) suppress messages | 
| callFrom | (character) allow easier tracking of messages produced | 
Value
This function returns a numeric vector with mass
See Also
Examples
massDeFormula(c("12H12O","HO"," 2H 1 Se, 6C 2N","HSeCN"," ","e"))
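The underlying calculation (sum of atom counts multiplied by atomic masses) can be sketched in base R. The helper simpleMonoMass is hypothetical and not part of wrProteo; it only knows the four elements listed, assumes that the count precedes the element symbol (as in the example above) and uses monoisotopic masses of the most abundant isotopes.

```r
## Monoisotopic masses of a few elements (most abundant isotope, in Da)
monoMass <- c(H=1.007825032, C=12, N=14.003074004, O=15.994914620)

## Hypothetical helper (not part of wrProteo); the count is expected before the element symbol
simpleMonoMass <- function(comp) {
  comp <- gsub("[[:space:],]", "", comp)         # drop spaces and commas
  tok <- regmatches(comp, gregexpr("[0-9]*[A-Z][a-z]?", comp))[[1]]
  n <- sub("[A-Za-z]+$", "", tok)                # leading count, empty if none given
  n[n == ""] <- "1"
  el <- sub("^[0-9]*", "", tok)                  # element symbol
  sum(as.numeric(n) * monoMass[el])              # NA if an element is not in 'monoMass'
}
simpleMonoMass("2H1O")     # water
simpleMonoMass("HO")
```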
Histogram of content of NAs in matrix
Description
matrixNAinspect makes histograms of the full data and shows sub-population of NA-neighbour values.  
The aim of this function is to investigate the nature of NA values in matrix (of experimental measures) where replicate measurements are available.
If a given element was measured twice, and one of these measurements revealed a NA while the other one gave a (finite) numeric value, the non-NA-value is considered a NA-neighbour.  
The subpopulation of these NA-neighbour values will then be highlighted in the resulting histogram.
In a number of experimental settings some actual measurements may not meet an arbitrarily defined baseline (as 'zero') or may be too low to be distinguishable from noise, so that 
associated measures were initially recorded as NA. This may happen in several types of measurements in proteomics and transcriptomics.
So this function allows collecting all NA-neighbour values and comparing them to the global distribution of the data to investigate if NA-neighbours are typically very low values.
In case of data with multiple replicates NA-neighbour values may be distinguished for the case of 2 NA per group/replicate-set.
The resulting plots are typically used to decide if and how NA values may get replaced by imputed random values or whether measures containing NA-values should rather be omitted.
Of course, such decisions do have a strong impact on further steps of data-analysis and should be performed with care.
Usage
matrixNAinspect(
  dat,
  gr = NULL,
  retnNA = TRUE,
  xLab = NULL,
  tit = NULL,
  xLim = NULL,
  silent = FALSE,
  debug = FALSE,
  callFrom = NULL
)
Arguments
| dat | (matrix or data.frame) main numeric data | 
| gr | (character or factor) grouping of columns of dat indicating who is a replicate of whom (ie the length of 'gr' must be equivalent to the number of columns in 'dat') | 
| retnNA | (logical) report number of NAs in graphic | 
| xLab | (character) custom x-label | 
| tit | (character) custom title | 
| xLim | (numerical,length=2) custom x-axis limits | 
| silent | (logical) suppress messages | 
| debug | (logical) additional messages for debugging | 
| callFrom | (character) allow easier tracking of messages produced | 
Value
This function produces a graphic (to the current graphical device)
See Also
Examples
set.seed(2013)
datT6 <- matrix(round(rnorm(300)+3,1), ncol=6, 
  dimnames=list(paste("li",1:50,sep=""), letters[19:24]))
datT6 <- datT6 +matrix(rep(1:nrow(datT6),ncol(datT6)), ncol=ncol(datT6))
datT6[6:7,c(1,3,6)] <- NA
datT6[which(datT6 < 11 & datT6 > 10.5)] <- NA
datT6[which(datT6 < 6 & datT6 > 5)] <- NA
datT6[which(datT6 < 4.6 & datT6 > 4)] <- NA
matrixNAinspect(datT6, gr=gl(2,3)) 
Imputation of NA-values based on non-NA replicates
Description
It is assumed that NA-values appear in data when quantitation values are very low (as this appears eg in quantitative shotgun proteomics).
Here, the concept of (technical) replicates is used to investigate what kind of values appear in the other replicates next to NA-values for the same line/protein.
Groups of replicate samples are defined via argument gr (which describes the columns of dat).
Then, they are inspected for each line to gather NA-neighbour values (ie those values where NAs and regular measures are observed at the same time).
Eg, let's consider a line containing a set of 4 replicates for a given group. Now, if 2 of them are NA-values, the remaining 2 non-NA-values will be considered as NA-neighbours.
Ultimately, the aim is to replace all NA-values based on values from a normal distribution resembling their respective NA-neighbours.
Usage
matrixNAneighbourImpute(
  dat,
  gr,
  imputMethod = "mode2",
  retnNA = TRUE,
  avSd = c(0.15, 0.5),
  avSdH = NULL,
  NAneigLst = NULL,
  plotHist = c("hist", "mode"),
  xLab = NULL,
  xLim = NULL,
  yLab = NULL,
  yLim = NULL,
  tit = NULL,
  figImputDetail = TRUE,
  seedNo = NULL,
  silent = FALSE,
  callFrom = NULL,
  debug = FALSE
)
Arguments
| dat | (matrix or data.frame) main data (may contain  | 
| gr | (character or factor) grouping of columns of 'dat', replicate association | 
| imputMethod | (character) choose the imputation method (may be 'mode2'(default), 'mode1', 'datQuant', 'modeAdopt' or 'informed') | 
| retnNA | (logical) decide (if = | 
| avSd | (numerical,length=2) population characteristics 'high' (mean and sd) for >1  | 
| avSdH | deprecated, please use  | 
| NAneigLst | (list) option for repeated rounds of imputations: list of  | 
| plotHist | (character or logical) decide if supplemental figure with histogram should be drawn, the details 'Hist','quant' (display quantile of original data), 'mode' (display mode of original data) can be chosen explicitly | 
| xLab | (character) label on x-axis on plot | 
| xLim | (numeric, length=2) custom x-axis limits | 
| yLab | (character) label on y-axis on plot | 
| yLim | (numeric, length=2) custom y-axis limits | 
| tit | (character) title on plot | 
| figImputDetail | (logical) display details about data (number of NAs) and imputation in graph (min number of NA-neighbours per protein and group, quantile to model, mean and sd of imputed) | 
| seedNo | (integer) seed-value for normal random values | 
| silent | (logical) suppress messages | 
| callFrom | (character) allow easier tracking of messages produced | 
| debug | (logical) supplemental messages for debugging | 
Details
By default a histogram gets plotted showing the initial, imputed and final distribution to check the global hypothesis that NA-values arose
from very low measurements and to appreciate the impact of the imputed values to the overall final distribution.
There are a number of experimental settings where low measurements may be reported as NA.
Sometimes an arbitrarily defined baseline (as 'zero') may cause values found below it to be reported as NA or as 0 (in case of MaxQuant).
In quantitative proteomics (DDA-mode) the presence of numerous high-abundance peptides has the consequence that a number of less
intense MS-peaks don't get identified properly and will then be reported as NA in the respective samples,
while the same peptides may be correctly identified and quantified in other (replicate) samples.
So, if a given protein/peptide gets properly quantified in some replicate samples but reported as NA in other replicate samples
one may thus speculate that values similar to those of the successful quantifications may have occurred.
Thus, imputation of NA-values may be done on the basis of NA-neighbours.
When extracting NA-neighbours, a slightly more focussed approach gets checked, too, the 2-NA-neighbours : In case a set of replicates for a given protein
contains at least 2 non-NA-values (instead of just one) it will be considered as a (min) 2-NA-neighbour as well as regular NA-neighbour.
If >300 of these (min) 2-NA-neighbours get found, they will be used instead of the regular NA-neighbours.
For creating a collection of normal random values one may use directly the mode of the NA-neighbours (or 2-NA-neighbours, if >300 such values available).
To do so, the first value of argument avSd must be set to NA. Otherwise, the first value avSd will be used as quantile of all data to define the mean
for the imputed data (ie as quantile(dat, avSd[1], na.rm=TRUE)). The sd for generating normal random values will be taken from the sd of all  NA-neighbours (or 2-NA-neighbours)
multiplied by the second value in argument avSd (or avSd, if >300 2-NA-neighbours), since the sd of the NA-neighbours is usually quite high.
In extremely rare cases it may happen that no NA-neighbours are found (ie if NAs occur, all replicates are NA).
Then, this function replaces NA-values based on the normal random values obtained as described above.
Value
This function returns a list with $data .. matrix of data where NA are replaced by imputed values, $nNA .. number of NA by group, $randParam .. parameters used for making random data
See Also
this function gets used by testRobustToNAimputation; estimation of mode stableMode; detection of NAs na.fail
Examples
set.seed(2013)
datT6 <- matrix(round(rnorm(300)+3,1), ncol=6, dimnames=list(paste("li",1:50,sep=""),
  letters[19:24]))
datT6 <- datT6 +matrix(rep(1:nrow(datT6), ncol(datT6)), ncol=ncol(datT6))
datT6[6:7, c(1,3,6)] <- NA
datT6[which(datT6 < 11 & datT6 > 10.5)] <- NA
datT6[which(datT6 < 6 & datT6 > 5)] <- NA
datT6[which(datT6 < 4.6 & datT6 > 4)] <- NA
datT6b <- matrixNAneighbourImpute(datT6, gr=gl(2,3))
head(datT6b$data)
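The way the argument avSd enters the imputation, as described in the Details, can be sketched in base R. This is a strongly simplified illustration on a hypothetical matrix: it ignores the different imputMethod options and the switch to 2-NA-neighbours, and it is not the code of matrixNAneighbourImpute.

```r
dat <- matrix(c(20.1, 19.8,   NA,   22.3, 22.0, 22.4,
                  NA, 18.2, 18.5,   21.0,   NA,   NA,
                23.4, 23.1, 23.6,     NA,   NA,   NA,
                19.0,   NA, 19.4,   20.2, 20.5, 20.1), ncol=6, byrow=TRUE)
gr <- gl(2, 3)                                   # 2 groups with 3 replicates each
avSd <- c(0.15, 0.5)

## collect NA-neighbours: non-NA values from replicate groups that also contain NA
naNeighb <- unlist(lapply(levels(gr), function(g) {
  sub1 <- dat[, gr == g, drop=FALSE]
  nNA <- rowSums(is.na(sub1))
  sub2 <- sub1[nNA > 0 & nNA < ncol(sub1), , drop=FALSE]
  sub2[!is.na(sub2)]
}))

impMean <- quantile(dat, avSd[1], na.rm=TRUE)    # mean of imputed values as quantile of all data
impSd <- sd(naNeighb) * avSd[2]                  # sd of NA-neighbours, scaled down by avSd[2]
set.seed(2013)
datImp <- dat
datImp[is.na(dat)] <- rnorm(sum(is.na(dat)), mean=impMean, sd=impSd)
round(datImp, 2)
```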
Plot ROC curves
Description
plotROC plots ROC curves based on results from summarizeForROC.
This function plots only, it does not return any data. It allows printing simultaneously multiple ROC curves from different studies,
it is also compatible with data from 3 species mix as in proteomics benchmark.
Input can be prepared using moderTest2grp followed by summarizeForROC.
Usage
plotROC(
  dat,
  ...,
  useColumn = 2:3,
  methNames = NULL,
  col = NULL,
  pch = 1,
  bg = NULL,
  tit = NULL,
  xlim = NULL,
  ylim = NULL,
  point05 = 0.05,
  pointSi = 0.85,
  nByMeth = NULL,
  speciesOrder = NULL,
  txtLoc = NULL,
  legCex = 0.72,
  las = 1,
  addSuplT = TRUE,
  silent = FALSE,
  debug = FALSE,
  callFrom = NULL
)
Arguments
| dat | (matrix) from testing (eg   | 
| ... | optional additional data-sets to include as separate ROC-curves to same plot (must be of same type of format as 'dat') | 
| useColumn | (integer or character, length=2) columns from  | 
| methNames | (character) names of methods (data-sets) to be displayed | 
| col | (character) custom colors for lines and text (choose one color for each different data-set) | 
| pch | (integer) type of symbol to be used (see also  | 
| bg | (character) background color in plot (see also  | 
| tit | (character) custom title | 
| xlim | (numeric, length=2) custom x-axis limits | 
| ylim | (numeric, length=2) custom y-axis limits | 
| point05 | (numeric) specific point to highlight in plot (typically at alpha=0.05) | 
| pointSi | (numeric) size of points (as expansion factor  | 
| nByMeth | (integer) value of n to display | 
| speciesOrder | (integer) custom order of species in legend | 
| txtLoc | (numeric, length=3) location for text (x, y location and proportional factor for line-offset, default is c(0.4,0.3,0.04)) | 
| legCex | (numeric) cex expansion factor for legend (see also  | 
| las | (numeric) factor for text-orientation (see also  | 
| addSuplT | (logical) add text with information about precision,accuracy and FDR | 
| silent | (logical) suppress messages | 
| debug | (logical) display additional messages for debugging | 
| callFrom | (character) allow easier tracking of message(s) produced | 
Value
This function returns only a plot with ROC curves
See Also
summarizeForROC, moderTest2grp
Examples
roc0 <- cbind(alph=c(2e-6,4e-5,4e-4,2.7e-3,1.6e-2,4.2e-2,8.3e-2,1.7e-1,2.7e-1,4.1e-1,5.3e-1,
	 6.8e-1,8.3e-1,9.7e-1), spec=c(1,1,1,1,0.957,0.915,0.915,0.809,0.702,0.489,0.362,0.234,
  0.128,0.0426), sens=c(0,0,0.145,0.942,2.54,2.68,3.33,3.99,4.71,5.87,6.67,8.04,8.77,
  9.93)/10, n.pos.a=c(0,0,0,0,2,4,4,9,14,24,36,41) )
plotROC(roc0)
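How a table with the columns alph, spec and sens (as in the example above) relates to a ROC curve can be sketched in base R. The truth labels and p-values below are simulated and hypothetical; the sketch uses neither summarizeForROC nor plotROC.

```r
set.seed(2021)
isPos <- rep(c(TRUE, FALSE), c(20, 80))          # hypothetical truth: 20 spike-in, 80 invariant proteins
pVal <- c(rbeta(20, 0.5, 8), runif(80))          # simulated p-values, smaller for true positives

alph <- c(0.001, 0.01, 0.05, 0.1, 0.25, 0.5, 1)
roc <- t(sapply(alph, function(a) {
  called <- pVal <= a
  c(alph=a, spec=sum(!called & !isPos) / sum(!isPos),    # specificity = TN / (TN + FP)
            sens=sum(called & isPos) / sum(isPos))       # sensitivity = TP / (TP + FN)
}))
roc
plot(1 - roc[, "spec"], roc[, "sens"], type="b", xlim=c(0, 1), ylim=c(0, 1),
  xlab="1 - specificity", ylab="sensitivity", main="ROC sketch")
points(1 - roc[alph == 0.05, "spec"], roc[alph == 0.05, "sens"], pch=16)   # highlight alpha=0.05
```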
Filter based on either number of total peptides and specific peptides or number of razor peptides
Description
razorNoFilter filters based on either a) number of total peptides and specific peptides or b) number of razor peptides.
This function was designed for filtering using a minimum number of (PSM-) count values following the common practice to consider results with 2 or more peptide counts as reliable. 
The function may be (re-)run independently on each of various questions (comparisons).
Note: Non-integer data will be truncated to integer (equivalent to  floor).
Usage
razorNoFilter(
  annot,
  speNa = NULL,
  totNa = NULL,
  minRazNa = NULL,
  minSpeNo = 1,
  minTotNo = 2,
  silent = FALSE,
  debug = FALSE,
  callFrom = NULL
)
Arguments
| annot | (matrix or data.frame) main data (may contain NAs) with (PSM-) count values for each protein | 
| speNa | (integer or character) indicate which column of 'annot' has number of specific peptides | 
| totNa | (integer or character) indicate which column of 'annot' has number of total peptides | 
| minRazNa | (integer or character) name of column with number of razor peptides, alternative to 'minSpeNo'& 'minTotNo' | 
| minSpeNo | (integer) minimum number of specific peptides | 
| minTotNo | (integer) minimum total ie max razor number of peptides | 
| silent | (logical) suppress messages | 
| debug | (logical) display additional messages for debugging | 
| callFrom | (character) allow easier tracking of messages produced | 
Value
This function returns a vector of logical values if corresponding line passes filter criteria
See Also
Examples
set.seed(2019); datT <- matrix(sample.int(20,60,replace=TRUE), ncol=6,
  dimnames=list(letters[1:10], LETTERS[1:6])) -3
datT[,2] <- datT[,2] +2
datT[which(datT <0)] <- 0
razorNoFilter(datT, speNa="A", totNa="B")
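The type of filter described here can be sketched in base R. The count values are hypothetical, and combining both criteria with a logical AND is an assumption made for this illustration; it is not the code of razorNoFilter.

```r
counts <- cbind(specific=c(0, 1, 2.7, 1, 3), total=c(1, 1, 4.2, 2, 3))   # hypothetical peptide counts
rownames(counts) <- letters[1:5]
counts <- floor(counts)                          # non-integer values get truncated

minSpeNo <- 1
minTotNo <- 2
## assumption: a line passes if both minimum numbers are reached
pass <- counts[, "specific"] >= minSpeNo & counts[, "total"] >= minTotNo
pass
```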
Read (Normalized) Quantitation Data Files Produced By AlphaPept
Description
Protein quantification results from AlphaPept can be read using this function. Input files compressed as .gz can be read as well. The protein abundance values (XIC) get extracted. Since protein annotation is not very extensive with this format of data, the function allows reading the initial fasta files (from the directory above the quantitation-results) allowing to extract more protein-annotation (like species). Sample-annotation (if available) can be extracted from sdrf files, too. The protein abundance values may be normalized using multiple methods (median normalization as default), the determination of normalization factors can be restricted to specific proteins (normalization to bait protein(s), or to invariable matrix of spike-in experiments). The protein annotation data gets parsed to extract specific fields (ID, name, description, species ...). Besides, a graphical display of the distribution of protein abundance values may be generated before and after normalization.
Usage
readAlphaPeptFile(
  fileName = "results_proteins.csv",
  path = NULL,
  fasta = NULL,
  isLog2 = FALSE,
  normalizeMeth = "none",
  quantCol = "_LFQ$",
  contamCol = NULL,
  read0asNA = TRUE,
  refLi = NULL,
  sampleNames = NULL,
  specPref = NULL,
  extrColNames = NULL,
  remRev = TRUE,
  remConta = FALSE,
  separateAnnot = TRUE,
  gr = NULL,
  sdrf = NULL,
  suplAnnotFile = NULL,
  groupPref = list(lowNumberOfGroups = TRUE, chUnit = TRUE),
  titGraph = NULL,
  wex = 1.6,
  plotGraph = TRUE,
  silent = FALSE,
  debug = FALSE,
  callFrom = NULL
)
Arguments
| fileName | (character) name of file to be read (default 'results_proteins.csv'). Gz-compressed files can be read, too. | 
| path | (character) path of file to be read | 
| fasta | (logical or character) if  | 
| isLog2 | (logical) typically data read from AlphaPept are expected NOT to be  | 
| normalizeMeth | (character) normalization method, defaults to  | 
| quantCol | (character or integer) exact col-names, or if length=1 content of  | 
| contamCol | (character or integer, length=1) which columns should be used for contaminants | 
| read0asNA | (logical) decide if initial quantifications at 0 should be transformed to NA (thus avoid -Inf in log2 results) | 
| refLi | (character or integer) custom specify which line of data should be used for normalization, ie which line is main species; if character (eg 'mainSpe'), the column 'SpecType' in $annot will be searched for exact match of the (single) term given | 
| sampleNames | (character) custom column-names for quantification data; this argument has priority over  | 
| specPref | (character) prefix to identifiers allowing to separate i) recognize contamination database, ii) species of main identifications and iii) spike-in species | 
| extrColNames | (character or  | 
| remRev | (logical) option to remove all protein-identifications based on reverse-peptides | 
| remConta | (logical) option to remove all proteins identified as contaminants | 
| separateAnnot | (logical) if  | 
| gr | (character or factor) custom defined pattern of replicate association, will override final grouping of replicates from  | 
| sdrf | (character, list or data.frame) optional extraction and adding of experimental meta-data: if character, this may be the ID at ProteomeExchange,
the second & third elements may give further indications for automatic organization of groups of replicates.
Besides, the output from  | 
| suplAnnotFile | (logical or character) optional reading of supplemental files produced by Compomics; if  | 
| groupPref | (list) additional parameters for interpreting meta-data to identify structure of groups (replicates), will be passed to  | 
| titGraph | (character) custom title to plot of distribution of quantitation values | 
| wex | (numeric) relative expansion factor of the violin in plot | 
| plotGraph | (logical) optional plot vioplot of initial and normalized data (using  | 
| silent | (logical) suppress messages | 
| debug | (logical) additional messages for debugging | 
| callFrom | (character) allow easier tracking of messages produced | 
Details
Meta-data describing the samples and experimental setup may be available from a sdrf-file (from the directory above the analysis/quantification results).
If available, the meta-data will be examined for determining groups of replicates and
the results thereof can be found in $sampleSetup$levels.
Alternatively, a dataframe formatted like sdrf-files (ie for each sample a separate line, see also function readSdrf) may be given, too.
This import-function has been developed using AlphaPept version x.x.
The final output is a list containing these elements: $raw, $quant, $annot, $counts, $sampleSetup, $quantNotes, $notes, or (if separateAnnot=FALSE) data.frame
with annotation- and main quantification-content. If sdrf information has been found, an additional list-element setup
will be added containing the entire meta-data as setup$meta and the suggested organization as setup$lev.
Value
This function returns a list with  $raw (initial/raw abundance values), $quant with final normalized quantitations, $annot (columns ), $counts an array with 'PSM' and 'NoOfRazorPeptides',
$quantNotes, $notes and optional setup for meta-data from sdrf; or a data.frame with quantitation and annotation if separateAnnot=FALSE
See Also
read.table, normalizeThis, readProteomeDiscovererFile, readProlineFile (and other import-functions), matrixNAinspect
Examples
path1 <- system.file("extdata", package="wrProteo")
# Here we'll load a short/trimmed example file
fiNaAP <- "tinyAlpaPeptide.csv.gz"
dataAP <- readAlphaPeptFile(file=fiNaAP, path=path1, tit="tiny AlphaPaptide ")
summary(dataAP$quant)
Read Tabulated Files Exported by DIA-NN At Protein Level
Description
This function allows importing protein identification and quantification results from DIA-NN.
Data should be exported as tabulated text (tsv) as protein-groups (pg) to allow import by this function.
Quantification data and other relevant information will be parsed and extracted (similar to the other import-functions from this package).
The final output is a list containing as (main) elements: $annot, $raw and $quant, or a data.frame with the quantification data and a part of the annotation if argument separateAnnot=FALSE.
Usage
readDiaNNFile(
  fileName,
  path = NULL,
  normalizeMeth = "median",
  sampleNames = NULL,
  read0asNA = TRUE,
  quantCol = "\\.raw$",
  annotCol = NULL,
  refLi = NULL,
  separateAnnot = TRUE,
  FDRCol = NULL,
  groupPref = list(lowNumberOfGroups = TRUE, chUnit = TRUE),
  plotGraph = TRUE,
  titGraph = "DiaNN",
  wex = 1.6,
  specPref = c(conta = "CON_|LYSC_CHICK", mainSpecies = "OS=Homo sapiens"),
  gr = NULL,
  sdrf = NULL,
  suplAnnotFile = FALSE,
  silent = FALSE,
  debug = FALSE,
  callFrom = NULL
)
Arguments
| fileName | (character) name of file to be read | 
| path | (character) path of file to be read | 
| normalizeMeth | (character) normalization method, defaults to  | 
| sampleNames | (character) custom column-names for quantification data; this argument has priority over  | 
| read0asNA | (logical) decide if initial quantifications at 0 should be transformed to NA (thus avoid -Inf in log2 results) | 
| quantCol | (character or integer) exact col-names, or if length=1 content of  | 
| annotCol | (character) column names to be read/extracted for the annotation section (default c("Accession","Description","Gene","Contaminant","Sum.PEP.Score","Coverage....","X..Peptides","X..PSMs","X..Unique.Peptides", "X..AAs","MW..kDa.") ) | 
| refLi | (character or integer) custom specify which line of data is main species, if character (eg 'mainSpe'), the column 'SpecType' in $annot will be searched for exact match of the (single) term given | 
| separateAnnot | (logical) if  | 
| FDRCol | - not used (the argument was kept to retain the same syntax as the other import functions of this package) | 
| groupPref | (list) additional parameters for interpreting meta-data to identify structure of groups (replicates), will be passed to  | 
| plotGraph | (logical or integer) optional plot of type vioplot of initial and normalized data (using  | 
| titGraph | (character) custom title to plot of distribution of quantitation values | 
| wex | (numeric) relative expansion factor of the violin-plot (will be passed to  | 
| specPref | (character or list) define characteristic text for recognizing (main) groups of species (1st for contaminants - will be marked as 'conta', 2nd for main species - marked as 'mainSpe',
and optional following ones for supplemental tags/species - marked as 'species2','species3',...);
if list and list-element has multiple values they will be used for exact matching of accessions (ie 2nd of argument  | 
| gr | (character or factor) custom defined pattern of replicate association, will override final grouping of replicates from  | 
| sdrf | (character, list or data.frame) optional extraction and adding of experimental meta-data: if character, this may be the ID at ProteomeXchange,
the second element may give further indications for automatic organization of groups of replicates.
Besides, the output from  | 
| suplAnnotFile | (logical or character) optional reading of supplemental files; however, if  | 
| silent | (logical) suppress messages | 
| debug | (logical) additional messages for debugging | 
| callFrom | (character) allow easier tracking of messages produced | 
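The roles of quantCol and read0asNA can be illustrated with a minimal base-R sketch. The table and its column names are made up for illustration, and this is not the package's internal code; it only shows how a pattern like "\\.raw$" may select the quantitation columns and why zeros are set to NA before taking log2.

```r
## Hypothetical protein-group table with two quantitation columns ending in '.raw'
tab <- data.frame(
  Protein.Group = c("P1", "P2", "P3"),
  run1.raw = c(1024, 0, 4096),
  run2.raw = c(2048, 512, 0),
  check.names = FALSE
)

## Select quantitation columns via the regular expression used as default for quantCol
quantCols <- grep("\\.raw$", colnames(tab))
quant <- as.matrix(tab[, quantCols])

## Replace 0 by NA, since log2(0) would give -Inf
quant[which(quant == 0)] <- NA
log2Quant <- log2(quant)
log2Quant
```

Without the replacement step the two zero entries would turn into -Inf, which tends to disturb later summary statistics and plots.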
Details
This function has been developed using DIA-NN version 1.8.x. Note that reading gene-group (gg) files is in principle possible, but the resulting files typically lack protein-identifiers, which may be less convenient in later steps of analysis. Thus, it is suggested to read protein-group (pg) files instead.
Using the argument suplAnnotFile it is possible to specify a specific file (or search for default file) to read for extracting file-names as sample-names and other experiment related information.
Value
This function returns a list with $raw (initial/raw abundance values), $quant with final normalized quantitations, $annot, $counts an array with number of peptides, $quantNotes
and $notes; or if separateAnnot=FALSE the function returns a data.frame with annotation and quantitation only
See Also
read.table, normalizeThis, readMaxQuantFile, readProtDiscovFile, readProlineFile
Examples
diaNNFi1 <- "tinyDiaNN1.tsv.gz"   
## This file contains much less identifications than one may usually obtain
path1 <- system.file("extdata", package="wrProteo")
## let's define the main species and allow tagging some contaminants
specPref1 <- c(conta="conta|CON_|LYSC_CHICK", mainSpecies="HUMAN")
dataNN <- readDiaNNFile(path1, file=diaNNFi1, specPref=specPref1, tit="Tiny DIA-NN Data")
summary(dataNN$quant)
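How a specPref vector like the one above may translate into species tags can be sketched with grepl. The identifiers below are illustrative only, and the priority rule (contaminant tag wins) is an assumption of this sketch, not a statement about the package's internal logic.

```r
## Same patterns as in the example above
specPref1 <- c(conta = "conta|CON_|LYSC_CHICK", mainSpecies = "HUMAN")

## Illustrative identifiers (not taken from the example file)
ids <- c("sp|P02768|ALBU_HUMAN", "CON_P00761", "sp|P00698|LYSC_CHICK", "sp|P00330|ADH1_YEAST")

specType <- rep(NA_character_, length(ids))
specType[grepl(specPref1["mainSpecies"], ids)] <- "mainSpe"
## Assumption of this sketch: the contaminant tag overrides other tags
specType[grepl(specPref1["conta"], ids)] <- "conta"

data.frame(ids, specType)
```

Entries matching none of the patterns (here the yeast protein) stay NA in this sketch.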
Read Tabulated Files Exported by DiaNN At Peptide Level
Description
This function allows importing peptide identification and quantification results from DiaNN.
Data should be exported as tabulated text (tsv) to allow import by this function.
Quantification data and other relevant information will be extracted similarly to the other import-functions from this package.
The final output is a list containing as (main) elements: $annot, $raw and $quant, or a data.frame with the quantification data and a part of the annotation if argument separateAnnot=FALSE.
Usage
readDiaNNPeptides(
  fileName,
  path = NULL,
  normalizeMeth = "median",
  sampleNames = NULL,
  read0asNA = TRUE,
  quantCol = "\\.raw$",
  annotCol = NULL,
  refLi = NULL,
  separateAnnot = TRUE,
  FDRCol = NULL,
  groupPref = list(lowNumberOfGroups = TRUE, chUnit = TRUE),
  plotGraph = TRUE,
  titGraph = "DiaNN",
  wex = 1.6,
  specPref = c(conta = "CON_|LYSC_CHICK", mainSpecies = "OS=Homo sapiens"),
  gr = NULL,
  sdrf = NULL,
  suplAnnotFile = FALSE,
  silent = FALSE,
  debug = FALSE,
  callFrom = NULL
)
Arguments
| fileName | (character) name of file to be read | 
| path | (character) path of file to be read | 
| normalizeMeth | (character) normalization method, defaults to  | 
| sampleNames | (character) custom column-names for quantification data; this argument has priority over  | 
| read0asNA | (logical) decide if initial quantifications at 0 should be transformed to NA (thus avoid -Inf in log2 results) | 
| quantCol | (character or integer) exact col-names, or if length=1 content of  | 
| annotCol | (character) column names to be read/extracted for the annotation section (default c("Accession","Description","Gene","Contaminant","Sum.PEP.Score","Coverage....","X..Peptides","X..PSMs","X..Unique.Peptides", "X..AAs","MW..kDa.") ) | 
| refLi | (character or integer) custom specify which line of data is main species, if character (eg 'mainSpe'), the column 'SpecType' in $annot will be searched for exact match of the (single) term given | 
| separateAnnot | (logical) if  | 
| FDRCol | (list) - not used | 
| groupPref | (list) additional parameters for interpreting meta-data to identify structure of groups (replicates), will be passed to  | 
| plotGraph | (logical or integer) optional plot of type vioplot of initial and normalized data (using  | 
| titGraph | (character) custom title to plot of distribution of quantitation values | 
| wex | (numeric) relative expansion factor of the violin-plot (will be passed to  | 
| specPref | (character or list) define characteristic text for recognizing (main) groups of species (1st for contaminants - will be marked as 'conta', 2nd for main species - marked as 'mainSpe',
and optional following ones for supplemental tags/species - marked as 'species2','species3',...);
if list and list-element has multiple values they will be used for exact matching of accessions (ie 2nd of argument  | 
| gr | (character or factor) custom defined pattern of replicate association, will override final grouping of replicates from  | 
| sdrf | (character, list or data.frame) optional extraction and adding of experimental meta-data: if character, this may be the ID at ProteomeXchange,
the second element may give further indications for automatic organization of groups of replicates.
Besides, the output from  | 
| suplAnnotFile | (logical or character) optional reading of supplemental files; however, if  | 
| silent | (logical) suppress messages | 
| debug | (logical) additional messages for debugging | 
| callFrom | (character) allow easier tracking of messages produced | 
Details
This function has been developed using DiaNN version 1.8.x.
Using the argument suplAnnotFile it is possible to specify a specific file (or search for default file) to read for extracting file-names as sample-names and other experiment related information.
Value
This function returns a list with $raw (initial/raw abundance values), $quant with final normalized quantitations, $annot, $counts an array with number of peptides, $quantNotes
and $notes; or if separateAnnot=FALSE the function returns a data.frame with annotation and quantitation only
See Also
read.table, normalizeThis, readMaxQuantFile, readProtDiscovFile, readProlineFile
Examples
diaNNFi1 <- "tinyDiaNN1.tsv.gz"
## This file contains much less identifications than one may usually obtain
path1 <- system.file("extdata", package="wrProteo")
## let's define the main species and allow tagging some contaminants
specPref1 <- c(conta="conta|CON_|LYSC_CHICK", mainSpecies="HUMAN")
dataNN <- readDiaNNFile(path1, file=diaNNFi1, specPref=specPref1, tit="Tiny DIA-NN Data")
summary(dataNN$quant)
Read File Of Protein Sequences In Fasta Format
Description
Reads a fasta-formatted file (from UniProt) to extract (protein) sequences and names.
If tableOut=TRUE output may be organized as matrix for separating meta-annotation (eg uniqueIdentifier, entryName, proteinName, GN) in separate columns.
Usage
readFasta2(
  filename,
  delim = "|",
  databaseSign = c("sp", "tr", "generic", "gi"),
  removeEntries = NULL,
  tableOut = FALSE,
  UniprSep = c("OS=", "OX=", "GN=", "PE=", "SV="),
  strictSpecPattern = TRUE,
  cleanCols = TRUE,
  silent = FALSE,
  callFrom = NULL,
  debug = FALSE
)
Arguments
| filename | (character) names fasta-file to be read | 
| delim | (character) delimiter at header-line | 
| databaseSign | (character) characters at beginning right after the '>' (typically specifying the data-base-origin), they will be excluded from the sequence-header | 
| removeEntries | (character) if  | 
| tableOut | (logical) toggle to return named character-vector or matrix with enhanced parsing of fasta-header. 
The resulting matrix will contain the columns 'database','uniqueIdentifier','entryName','proteinName','sequence' and further columns depending on argument  | 
| UniprSep | (character) separators for further separating entry-fields if  | 
| strictSpecPattern | (logical or character) pattern for recognizing EntryName which typically precedes ProteinName (separated by ' '); if  | 
| cleanCols | (logical) remove columns with all entries NA, if  | 
| silent | (logical) suppress messages | 
| callFrom | (character) allows easier tracking of messages produced | 
| debug | (logical) supplemental messages for debugging | 
Value
This function returns (depending on argument tableOut) either a) a simple character vector (of sequences) with the (entire) UniProt annotation as names or
b) a matrix with columns: 'database','uniqueIdentifier','entryName','proteinName','sequence' and further columns depending on argument UniprSep
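To make the column names of the table output more concrete, the following base-R sketch splits a single UniProt-style header into comparable fields. It is a simplified illustration written for this purpose and not the parser used by readFasta2; the header string is an example and the helper extractField is hypothetical.

```r
## Example header in UniProt style (illustrative)
header <- ">sp|P02768|ALBU_HUMAN Albumin OS=Homo sapiens OX=9606 GN=ALB PE=1 SV=2"

## Separate the identifier part (before the first space) from the description
firstSpace <- regexpr(" ", header, fixed = TRUE)
idPart <- substr(header, 2, firstSpace - 1)          # drop the leading '>'
descr <- substr(header, firstSpace + 1, nchar(header))

## Split identifiers at the delimiter '|'
idFields <- strsplit(idPart, "|", fixed = TRUE)[[1]]
names(idFields) <- c("database", "uniqueIdentifier", "entryName")

## Protein name = text before the first 'OS=' tag
proteinName <- sub(" OS=.*$", "", descr)

## Extract the content following each tag, cutting at the next tag
uniprSep <- c("OS=", "OX=", "GN=", "PE=", "SV=")
tagPat <- paste0("(", paste(uniprSep, collapse = "|"), ")")
extractField <- function(tag) {     # hypothetical helper for this sketch
  if (!grepl(tag, descr, fixed = TRUE)) return(NA_character_)
  rest <- sub(paste0("^.*", tag), "", descr)
  sub(paste0(" ", tagPat, ".*$"), "", rest)
}
fields <- vapply(uniprSep, extractField, character(1))

idFields
proteinName
fields
```

Headers from other databases or with missing tags would need additional handling, which this sketch only covers by returning NA for absent tags.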
See Also
writeFasta2 for writing as fasta; for reading, scan or read.fasta from the package seqinr
Examples
## Tiny example with common contaminants
path1 <- system.file('extdata', package='wrProteo')
fiNa <-  "conta1.fasta.gz"
fasta1 <- readFasta2(file.path(path1, fiNa))
## now let's read and further separate annotation-fields
fasta2 <- readFasta2(file.path(path1, fiNa), tableOut=TRUE)
str(fasta1)
Read Tabulated Files Exported by FragPipe At Protein Level
Description
This function allows importing protein identification and quantification results from Fragpipe
which were previously exported as tabulated text (tsv). Quantification data and other relevant information will be extracted similarly to the other import-functions from this package.
The final output is a list containing the elements: $annot, $raw and $quant, or a data.frame with the quantification data and a part of the annotation if argument separateAnnot=FALSE.
Usage
readFragpipeFile(
  fileName,
  path = NULL,
  normalizeMeth = "median",
  sampleNames = NULL,
  read0asNA = TRUE,
  quantCol = "Intensity$",
  annotCol = NULL,
  refLi = NULL,
  separateAnnot = TRUE,
  FDRCol = list("Protein.Probability", lim = 0.99),
  groupPref = list(lowNumberOfGroups = TRUE, chUnit = TRUE),
  plotGraph = TRUE,
  titGraph = "FragPipe",
  wex = 1.6,
  specPref = c(conta = "CON_|LYSC_CHICK", mainSpecies = "OS=Homo sapiens"),
  gr = NULL,
  sdrf = NULL,
  suplAnnotFile = FALSE,
  silent = FALSE,
  debug = FALSE,
  callFrom = NULL
)
Arguments
| fileName | (character) name of file to be read | 
| path | (character) path of file to be read | 
| normalizeMeth | (character) normalization method, defaults to  | 
| sampleNames | (character) custom column-names for quantification data; this argument has priority over  | 
| read0asNA | (logical) decide if initial quantifications at 0 should be transformed to NA (thus avoid -Inf in log2 results) | 
| quantCol | (character or integer) exact col-names, or if length=1 content of  | 
| annotCol | (character) column names to be read/extracted for the annotation section (default c("Accession","Description","Gene","Contaminant","Sum.PEP.Score","Coverage....","X..Peptides","X..PSMs","X..Unique.Peptides", "X..AAs","MW..kDa.") ) | 
| refLi | (character or integer) custom specify which line of data is main species, if character (eg 'mainSpe'), the column 'SpecType' in $annot will be searched for exact match of the (single) term given | 
| separateAnnot | (logical) if  | 
| FDRCol | (list) optional indication to search for protein FDR information | 
| groupPref | (list) additional parameters for interpreting meta-data to identify structure of groups (replicates), will be passed to  | 
| plotGraph | (logical or integer) optional plot of type vioplot of initial and normalized data (using  | 
| titGraph | (character) custom title to plot of distribution of quantitation values | 
| wex | (numeric) relative expansion factor of the violin-plot (will be passed to  | 
| specPref | (character or list) define characteristic text for recognizing (main) groups of species (1st for contaminants - will be marked as 'conta', 2nd for main species - marked as 'mainSpe',
and optional following ones for supplemental tags/species - marked as 'species2','species3',...);
if list and list-element has multiple values they will be used for exact matching of accessions (ie 2nd of argument  | 
| gr | (character or factor) custom defined pattern of replicate association, will override final grouping of replicates from  | 
| sdrf | (character, list or data.frame) optional extraction and adding of experimental meta-data: if character, this may be the ID at ProteomeXchange,
the second & third elements may give further indications for automatic organization of groups of replicates.
Besides, the output from  | 
| suplAnnotFile | (logical or character) optional reading of supplemental files; however, if  | 
| silent | (logical) suppress messages | 
| debug | (logical) additional messages for debugging | 
| callFrom | (character) allow easier tracking of messages produced | 
Details
This function has been developed using Fragpipe versions 18.0 and 19.0.
Using the argument suplAnnotFile it is possible to specify a specific file (or search for default file) to read for extracting file-names as sample-names and other experiment related information.
Value
This function returns a list with $raw (initial/raw abundance values), $quant with final normalized quantitations, $annot, $counts an array with number of peptides, $quantNotes
and $notes; or if separateAnnot=FALSE the function returns a data.frame with annotation and quantitation only
See Also
read.table, normalizeThis, readMaxQuantFile, readProtDiscovFile, readProlineFile
Examples
FPproFi1 <- "tinyFragpipe1.tsv.gz"
path1 <- system.file("extdata", package="wrProteo")
## let's define the main species and allow tagging some contaminants
specPref1 <- c(conta="conta|CON_|LYSC_CHICK", mainSpecies="MOUSE")
dataFP <- readFragpipeFile(path1, file=FPproFi1, specPref=specPref1, tit="Tiny Fragpipe Data")
summary(dataFP$quant)
Read Tabulated Files Exported by Ionbot At Peptide Level
Description
This function allows importing initial peptide identification and quantification results from Ionbot
which were exported as tabulated text (tsv); relevant information gets extracted.
The final output is a list containing 3 main elements: $annot, $raw and optional $quant, or returns data.frame with entire content of file if separateAnnot=FALSE.
Usage
readIonbotPeptides(
  fileName,
  path = NULL,
  normalizeMeth = "median",
  sampleNames = NULL,
  gr = NULL,
  sdrf = NULL,
  read0asNA = TRUE,
  quantCol = "^Abundances*",
  annotCol = NULL,
  contamCol = "Contaminant",
  refLi = NULL,
  separateAnnot = TRUE,
  FDRCol = list(c("^Protein.FDR.Confidence", "High"), c("^Found.in.Sample.", "High")),
  plotGraph = TRUE,
  suplAnnotFile = TRUE,
  groupPref = list(lowNumberOfGroups = TRUE, chUnit = TRUE),
  titGraph = "Ionbot",
  wex = 1.6,
  specPref = c(conta = "CON_|LYSC_CHICK", mainSpecies = "OS=Homo sapiens"),
  silent = FALSE,
  debug = FALSE,
  callFrom = NULL
)
Arguments
| fileName | (character) name of file to be read | 
| path | (character) path of file to be read | 
| normalizeMeth | (character) normalization method, defaults to  | 
| sampleNames | (character) new column-names for quantification data (ProteomeDiscoverer does not automatically use file-names from spectra); this argument has priority over  | 
| gr | (character or factor) custom defined pattern of replicate association, will override final grouping of replicates from  | 
| sdrf | (character, list or data.frame) optional extraction and adding of experimental meta-data: if character, this may be the ID at ProteomeXchange,
the second & third elements may give further indications for automatic organization of groups of replicates.
Besides, the output from  | 
| read0asNA | (logical) decide if initial quantifications at 0 should be transformed to NA | 
| quantCol | (character or integer) exact col-names, or if length=1 content of  | 
| annotCol | (character) column names to be read/extracted for the annotation section (default c("Accession","Description","Gene","Contaminant","Sum.PEP.Score","Coverage....","X..Peptides","X..PSMs","X..Unique.Peptides", "X..AAs","MW..kDa.") ) | 
| contamCol | (character or integer, length=1) which columns should be used for contaminants marked by ProteomeDiscoverer.
If a column named  | 
| refLi | (character or integer) custom specify which line of data is main species, if character (eg 'mainSpe'), the column 'SpecType' in $annot will be searched for exact match of the (single) term given | 
| separateAnnot | (logical) if  | 
| FDRCol | (list) optional indication to search for protein FDR information | 
| plotGraph | (logical or integer) optional plot of type vioplot of initial and normalized data (using  | 
| suplAnnotFile | (logical or character) optional reading of supplemental files produced by ProteomeDiscoverer; however, if  | 
| groupPref | (list) additional parameters for interpreting meta-data to identify structure of groups (replicates), will be passed to  | 
| titGraph | (character) deprecated custom title to plot, please use 'tit' | 
| wex | (numeric) relative expansion factor of the violin-plot (will be passed to  | 
| specPref | (character or list) define characteristic text for recognizing (main) groups of species (1st for contaminants - will be marked as 'conta', 2nd for main species - marked as 'mainSpe',
and optional following ones for supplemental tags/species - marked as 'species2','species3',...);
if list and list-element has multiple values they will be used for exact matching of accessions (ie 2nd of argument  | 
| silent | (logical) suppress messages | 
| debug | (logical) additional messages for debugging | 
| callFrom | (character) allows easier tracking of messages produced | 
Details
Using the argument suplAnnotFile it is possible to specify a specific file (or search for default file) to read for extracting file-names as sample-names 
and other experiment related information.
Value
This function returns a list with $raw (initial/raw abundance values), $quant with final normalized quantitations, $annot, $counts an array with number of peptides, $quantNotes
and $notes; or if separateAnnot=FALSE the function returns a data.frame with annotation and quantitation only
See Also
read.table, readMaxQuantFile, readProteomeDiscovererFile, normalizeThis
Examples
path1 <- system.file("extdata", package="wrProteo")
fiIonbot <- "tinyIonbotFile1.tsv.gz"
datIobPep <- readIonbotPeptides(fiIonbot, path=path1) 
Read tabulated files imported from MassChroQ
Description
Quantification results using MassChroQ should be initially treated using the R-package MassChroqR (both distributed by the PAPPSO at http://pappso.inrae.fr/) for initial normalization on peptide-level and combination of peptide values into protein abundances.
Usage
readMassChroQFile(
  fileName,
  path = NULL,
  normalizeMeth = "median",
  sampleNames = NULL,
  refLi = NULL,
  separateAnnot = TRUE,
  titGraph = "MassChroQ",
  wex = NULL,
  specPref = c(conta = "CON_|LYSC_CHICK", mainSpecies = "OS=Homo sapiens"),
  gr = NULL,
  sdrf = NULL,
  suplAnnotFile = FALSE,
  groupPref = list(lowNumberOfGroups = TRUE, chUnit = TRUE),
  plotGraph = TRUE,
  silent = FALSE,
  debug = FALSE,
  callFrom = NULL
)
Arguments
| fileName | (character) name of file to be read (may be tsv, csv, rda or rdata); both US and European csv formats are supported | 
| path | (character) path of file to be read | 
| normalizeMeth | (character) normalization method (will be sent to   | 
| sampleNames | (character) custom column-names for quantification data; this argument has priority over  | 
| refLi | (character or integer) custom specify which line of data is main species, if character (eg 'mainSpe'), the column 'SpecType' in $annot will be searched for exact match of the (single) term given | 
| separateAnnot | (logical) if  | 
| titGraph | (character) custom title to plot of distribution of quantitation values | 
| wex | (numeric) relative expansion factor of the violin-plot (will be passed to  | 
| specPref | (character or list) define characteristic text for recognizing (main) groups of species (1st for contaminants - will be marked as 'conta', 2nd for main species - marked as 'mainSpe',
and optional following ones for supplemental tags/species - marked as 'species2','species3',...);
if list and list-element has multiple values they will be used for exact matching of accessions (ie 2nd of argument  | 
| gr | (character or factor) custom defined pattern of replicate association, will override final grouping of replicates from  | 
| sdrf | (character, list or data.frame) optional extraction and adding of experimental meta-data: if character, this may be the ID at ProteomeXchange,
the second & third elements may give further indications for automatic organization of groups of replicates.
Besides, the output from  | 
| suplAnnotFile | (logical or character) optional reading of supplemental files produced by ProteomeDiscoverer; however, if  | 
| groupPref | (list) additional parameters for interpreting meta-data to identify structure of groups (replicates), will be passed to  | 
| plotGraph | (logical) optional plot of type vioplot of initial and normalized data (using  | 
| silent | (logical) suppress messages | 
| debug | (logical) additional messages for debugging | 
| callFrom | (character) allow easier tracking of messages produced | 
Details
The final output of this function is a list containing the elements $annot, $raw, $quant and $notes, or a data.frame with the entire content of the file if separateAnnot=FALSE. Other list-elements remain empty to keep the format compatible with other import functions.
This function has been developed using MassChroQ version 2.2 and R-package MassChroqR version 0.4.0. Both are distributed by the PAPPSO (http://pappso.inrae.fr/). When saving quantifications generated in R as RData (with extension .rdata or .rda) using the R-packages associated with MassChroq, the ABUNDANCE_TABLE produced by mcq.get.compar(XICAB) should be used.
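The general mechanics of storing a table as RData and restoring it can be sketched in base R. The object abundTab below is a made-up stand-in and not the result of any MassChroqR function; the sketch only shows the save/load round trip for a file with the extension .rda.

```r
## Hypothetical abundance table standing in for real MassChroqR output
abundTab <- matrix(c(20.1, 21.3, 19.8, 22.0), nrow = 2,
  dimnames = list(c("protA", "protB"), c("s1", "s2")))

## Save as RData (extension .rda) to a temporary file
tmpFi <- tempfile(fileext = ".rda")
save(abundTab, file = tmpFi)

## Load into a separate environment to avoid overwriting objects in the workspace
env1 <- new.env()
objNames <- load(tmpFi, envir = env1)   # load() returns the names of the restored objects
restored <- get(objNames[1], envir = env1)
unlink(tmpFi)
restored
```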
After import data get (re-)normalized according to normalizeMeth and refLi, and boxplots or vioplots drawn.
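The principle of median normalization restricted to selected reference lines can be sketched as follows. This assumes log2-scale data and an additive shift per sample; the actual work is delegated to normalizeThis, whose details may differ, and the values below are invented.

```r
## Hypothetical log2 abundances: 4 proteins (rows) x 3 samples (columns)
log2Quant <- cbind(s1 = c(10, 12, 14, 11),
                   s2 = c(11, 13, 15, 12),
                   s3 = c(9.5, 11.5, 13.5, 10.5))

## Lines (proteins) used for determining the normalization factors (cf. argument refLi)
refLi <- 1:3

## Median per sample over the reference lines, then shift all samples to a common median
colMed <- apply(log2Quant[refLi, , drop = FALSE], 2, median, na.rm = TRUE)
shift <- colMed - mean(colMed)
normQuant <- sweep(log2Quant, 2, shift, "-")

## After normalization the reference-line medians agree across samples
newMed <- apply(normQuant[refLi, , drop = FALSE], 2, median, na.rm = TRUE)
newMed
```

All lines get shifted, but only the reference lines contribute to the shift; this is the idea behind normalizing to bait proteins or to an invariable matrix in spike-in experiments.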
Value
This function returns a list with $raw (initial/raw abundance values), $quant with final normalized quantitations, $annot, $counts an array with number of peptides, $quantNotes and $notes; or if separateAnnot=FALSE the function returns a data.frame with annotation and quantitation only
See Also
read.table, normalizeThis, readProlineFile
Examples
path1 <- system.file("extdata", package="wrProteo")
fiNa <- "tinyMC.RData"
dataMC <- readMassChroQFile(file=fiNa, path=path1)
Read Quantitation Data-Files (proteinGroups.txt) Produced From MaxQuant At Protein Level
Description
Protein quantification results from MaxQuant can be read using this function and relevant information extracted. Input files compressed as .gz can be read as well. The protein abundance values (XIC), peptide counting information like number of unique razor-peptides or PSM values and sample-annotation (if available) can be extracted, too. The protein abundance values may be normalized using multiple methods (median normalization as default), the determination of normalization factors can be restricted to specific proteins (normalization to bait protein(s), or to invariable matrix of spike-in experiments). The protein annotation data gets parsed to extract specific fields (ID, name, description, species ...). Besides, a graphical display of the distribution of protein abundance values may be generated before and after normalization.
Usage
readMaxQuantFile(
  path,
  fileName = "proteinGroups.txt",
  normalizeMeth = "median",
  quantCol = "LFQ.intensity",
  contamCol = "Potential.contaminant",
  pepCountCol = c("Razor + unique peptides", "Unique peptides", "MS.MS.count"),
  read0asNA = TRUE,
  refLi = NULL,
  sampleNames = NULL,
  extrColNames = c("Majority.protein.IDs", "Fasta.headers", "Number.of.proteins"),
  specPref = c(conta = "conta|CON_|LYSC_CHICK", mainSpecies = "OS=Homo sapiens"),
  remRev = TRUE,
  remConta = FALSE,
  separateAnnot = TRUE,
  gr = NULL,
  sdrf = NULL,
  suplAnnotFile = NULL,
  groupPref = list(lowNumberOfGroups = TRUE, chUnit = TRUE),
  titGraph = NULL,
  wex = 1.6,
  plotGraph = TRUE,
  silent = FALSE,
  debug = FALSE,
  callFrom = NULL
)
Arguments
| path | (character) path of file to be read | 
| fileName | (character) name of file to be read (default 'proteinGroups.txt' as typically generated by MaxQuant in txt folder). Gz-compressed files can be read, too. | 
| normalizeMeth | (character) normalization method, defaults to  | 
| quantCol | (character or integer) exact col-names, or if length=1 content of  | 
| contamCol | (character or integer, length=1) which columns should be used for contaminants | 
| pepCountCol | (character) pattern to search among column-names for count data (1st entry for 'Razor + unique peptides', 2nd for 'Unique peptides', 3rd for 'MS.MS.count' (PSM)) | 
| read0asNA | (logical) decide if initial quantifications at 0 should be transformed to NA (thus avoid -Inf in log2 results) | 
| refLi | (character or integer) custom specify which line of data should be used for normalization, ie which line is main species; if character (eg 'mainSpe'), the column 'SpecType' in $annot will be searched for exact match of the (single) term given | 
| sampleNames | (character) custom column-names for quantification data; this argument has priority over  | 
| extrColNames | (character) column names to be read (1st position: prefix for LFQ quantitation, default 'LFQ.intensity'; 2nd: column name for protein-IDs, default 'Majority.protein.IDs'; 3rd: column names of fasta-headers, default 'Fasta.headers', 4th: column name for number of protein IDs matching, default 'Number.of.proteins') | 
| specPref | (character) prefix to identifiers allowing to separate i) recognize contamination database, ii) species of main identifications and iii) spike-in species | 
| remRev | (logical) option to remove all protein-identifications based on reverse-peptides | 
| remConta | (logical) option to remove all proteins identified as contaminants | 
| separateAnnot | (logical) if  | 
| gr | (character or factor) custom defined pattern of replicate association, will override final grouping of replicates from  | 
| sdrf | (character, list or data.frame) optional extraction and adding of experimental meta-data: if character, this may be the ID at ProteomeXchange,
the second & third elements may give further indications for automatic organization of groups of replicates.
Besides, the output from  | 
| suplAnnotFile | (logical or character) optional reading of supplemental files produced by MaxQuant; if  | 
| groupPref | (list) additional parameters for interpreting meta-data to identify structure of groups (replicates), will be passed to  May contain  | 
| titGraph | (character) custom title to plot of distribution of quantitation values | 
| wex | (numeric) relative expansion factor of the violin in plot | 
| plotGraph | (logical) optional plot vioplot of initial and normalized data (using  | 
| silent | (logical) suppress messages | 
| debug | (logical) additional messages for debugging | 
| callFrom | (character) allow easier tracking of messages produced | 
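The effect of remRev and remConta can be pictured with a small base-R sketch. It assumes that reverse hits and contaminants are flagged by "+" in columns named 'Reverse' and 'Potential.contaminant'; the table is invented and the code is not the package's internal implementation.

```r
## Hypothetical excerpt of a proteinGroups table
pg <- data.frame(
  Majority.protein.IDs = c("P02768", "REV__P12345", "CON__P00761", "P00330"),
  Reverse = c("", "+", "", ""),
  Potential.contaminant = c("", "", "+", ""),
  stringsAsFactors = FALSE
)

remRev <- TRUE      # remove identifications based on reverse-peptides
remConta <- FALSE   # keep contaminants (they may still be tagged later)

keep <- rep(TRUE, nrow(pg))
if (remRev) keep <- keep & pg$Reverse != "+"
if (remConta) keep <- keep & pg$Potential.contaminant != "+"
pgFilt <- pg[keep, ]
pgFilt$Majority.protein.IDs
```

With these settings only the reverse hit is dropped, while the flagged contaminant stays in the table.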
Details
MaxQuant is proteomics quantification software provided by the Max Planck Institute.
By default MaxQuant writes the results of each run to the path combined/txt, from there (only) the files
'proteinGroups.txt' (main quantitation at protein level), 'summary.txt' and 'parameters.txt' will be used.
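Before importing, one might check which of these files are present. The following sketch uses a hypothetical run directory name (myMaxQuantRun) and only base-R path functions.

```r
## Hypothetical location of a MaxQuant run; adapt to the real folder
runDir <- "myMaxQuantRun"
txtDir <- file.path(runDir, "combined", "txt")

## The files mentioned above
mqFiles <- file.path(txtDir, c("proteinGroups.txt", "summary.txt", "parameters.txt"))
present <- file.exists(mqFiles)
data.frame(file = basename(mqFiles), present)
```

The path txtDir would then be the natural candidate for the argument path.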
Meta-data describing the samples and experimental setup may be available from two sources :
a) The file summary.txt which gets produced by MaxQuant in the same folder as the main quantification data.
b) Furthermore, meta-data deposited as sdrf at Pride can be retrieved (via the respective github page) when giving the accession number in argument sdrf.
Then, the meta-data will be examined for determining groups of replicates and
the results thereof can be found in $sampleSetup$levels.
Alternatively, a dataframe formatted like sdrf-files (ie for each sample a separate line, see also function readSdrf) may be given.
In tricky cases it is also possible to specify the column-name to use for defining the groups of replicates or the method for automatically choosing
the most suited column via the 2nd value of the argument sdrf.
Please note, that sdrf is still experimental and only a small fraction of proteomics-data on Pride have been annotated accordingly.
If a valid sdrf is furnished, its information has priority over the information extracted from the MaxQuant produced file summary.txt.
This import-function has been developed using MaxQuant versions 1.6.10.x to 2.0.x, the format of the resulting file 'proteinGroups.txt' is typically well conserved between versions.
The final output is a list containing these elements: $raw, $quant, $annot, $counts, $sampleSetup, $quantNotes, $notes, or (if separateAnnot=FALSE) data.frame
with annotation- and main quantification-content. If sdrf information has been found, an additional list-element setup
will be added containing the entire meta-data as setup$meta and the suggested organization as setup$lev.
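As a rough illustration of the grouping step described above, the following sketch builds a small sdrf-like data.frame (one line per sample) and derives groups of replicates from one of its columns. All column names and values are invented for illustration. Only base R is used, and the sketch does not reproduce the automatic choice of the most suited column performed by the package.

```r
# Hypothetical sdrf-like table: one line per sample (all names are made up)
sdrfLike <- data.frame(
  source.name = paste0("sample", 1:6),
  treatment   = rep(c("ctrl", "treated"), each = 3),
  data.file   = paste0("run", 1:6, ".raw"),
  stringsAsFactors = FALSE
)

# Use one column to define the groups of replicates,
# keeping the levels in order of first appearance
grp <- factor(sdrfLike$treatment, levels = unique(sdrfLike$treatment))

# Number of replicates per group
table(grp)
```

A factor of this kind has the form described for the argument gr (custom defined pattern of replicate association).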
Value
This function returns a list with  $raw (initial/raw abundance values), $quant with final normalized quantitations, $annot (columns ), $counts an array with 'PSM' and 'NoOfRazorPeptides',
$quantNotes, $notes and optional setup for meta-data from sdrf; or a data.frame with quantitation and annotation if separateAnnot=FALSE
See Also
read.table, normalizeThis, readProteomeDiscovererFile, readProlineFile (and other import functions), matrixNAinspect
Examples
path1 <- system.file("extdata", package="wrProteo")
# Here we'll load a short/trimmed example file (thus not the MaxQuant default name)
fiNa <- "proteinGroupsMaxQuant1.txt.gz"
specPr <- c(conta="conta|CON_|LYSC_CHICK", mainSpecies="YEAST", spike="HUMAN_UPS")
dataMQ <- readMaxQuantFile(path1, file=fiNa, specPref=specPr, tit="tiny MaxQuant")
summary(dataMQ$quant)
matrixNAinspect(dataMQ$quant, gr=gl(3,3))
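The grouping gl(3,3) used in the last line of this example assigns 9 samples to 3 groups of 3 consecutive replicates. The sketch below shows, on an invented matrix and with base R only, how such a grouping can be used to count NA values per protein and group. It is meant as an illustration of the idea and not as a description of what matrixNAinspect does internally.

```r
# gl(3, 3) creates 3 groups with 3 consecutive replicates each
grp <- gl(3, 3)

# Mock matrix of log2 abundances: 4 proteins x 9 samples (values are invented)
set.seed(1)
mat <- matrix(round(rnorm(36, mean = 20, sd = 2), 1), nrow = 4,
  dimnames = list(paste0("prot", 1:4), paste0("s", 1:9)))
mat[1, 1:3] <- NA      # absent in all replicates of group 1
mat[2, 5] <- NA        # single missing value in group 2

# Count NA values per protein (rows) and group (columns)
naPerGroup <- t(apply(is.na(mat), 1, function(x) tapply(x, grp, sum)))
naPerGroup
```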
Read Peptide Identification and Quantitation Data-Files (peptides.txt) Produced By MaxQuant
Description
Peptide level identification and quantification data produced by MaxQuant can be read using this function and relevant information extracted. Input files compressed as .gz can be read as well. The peptide abundance values (XIC), peptide counting information and sample-annotation (if available) can be extracted, too.
Usage
readMaxQuantPeptides(
  path,
  fileName = "peptides.txt",
  normalizeMeth = "median",
  quantCol = "Intensity",
  contamCol = "Potential.contaminant",
  pepCountCol = "Experiment",
  refLi = NULL,
  sampleNames = NULL,
  extrColNames = c("Sequence", "Proteins", "Leading.razor.protein", "Start.position",
    "End.position", "Mass", "Missed.cleavages", "Unique..Groups.", "Unique..Proteins.",
    "Charges"),
  specPref = c(conta = "conta|CON_|LYSC_CHICK", mainSpecies = "HUMAN"),
  remRev = TRUE,
  remConta = FALSE,
  separateAnnot = TRUE,
  gr = NULL,
  sdrf = NULL,
  suplAnnotFile = NULL,
  groupPref = list(lowNumberOfGroups = TRUE, chUnit = TRUE),
  titGraph = NULL,
  wex = 1.6,
  plotGraph = TRUE,
  silent = FALSE,
  debug = FALSE,
  callFrom = NULL
)
Arguments
| path | (character) path of file to be read | 
| fileName | (character) name of file to be read (default 'peptides.txt' as typically generated by MaxQuant in txt folder). Gz-compressed files can be read, too. | 
| normalizeMeth | (character) normalization method (for details see  | 
| quantCol | (character or integer) exact col-names, or if length=1 content of  | 
| contamCol | (character or integer, length=1) which columns should be used for contaminants | 
| pepCountCol | (character) pattern to search among column-names for count data (defaults to 'Experiment') | 
| refLi | (character or integer) custom specify which line of data should be used for normalization, ie which line is main species; if character (eg 'mainSpe'), the column 'SpecType' in $annot will be searched for exact match of the (single) term given | 
| sampleNames | (character) custom column-names for quantification data; this argument has priority over  | 
| extrColNames | (character) column names to be read (1st position: prefix for quantitation, default 'intensity'; 2nd: column name for peptide-IDs, default ) | 
| specPref | (character) prefix to identifiers allowing to recognize i) the contamination database, ii) the species of main identifications and iii) spike-in species | 
| remRev | (logical) option to remove all peptide-identifications based on reverse-peptides | 
| remConta | (logical) option to remove all peptides identified as contaminants | 
| separateAnnot | (logical) if  | 
| gr | (character or factor) custom defined pattern of replicate association, will override final grouping of
replicates from  | 
| sdrf | (character, list or data.frame) optional extraction and adding of experimental meta-data: if character, this may be the ID at ProteomeXchange,
the second & third elements may give further indications for automatic organization of groups of replicates.
Besides, the output from  | 
| suplAnnotFile | (logical or character) optional reading of supplemental files produced by MaxQuant; if  | 
| groupPref | (list) additional parameters for interpreting meta-data to identify structure of groups (replicates), will be passed to  | 
| titGraph | (character) custom title to plot | 
| wex | (numeric) relative expansion factor of the violin in plot | 
| plotGraph | (logical) optional plot vioplot of initial and normalized data (using  | 
| silent | (logical) suppress messages | 
| debug | (logical) additional messages for debugging | 
| callFrom | (character) allows easier tracking of messages produced | 
Details
The peptide annotation data gets parsed to extract specific fields (ID, name, description, species ...). Besides, a graphical display of the distribution of peptide abundance values may be generated before and after normalization.
MaxQuant is proteomics quantification software provided by the Max Planck Institute.
By default MaxQuant writes the results of each run to the path combined/txt, from there (only) the files
'peptides.txt' (main quantitation at peptide level), 'summary.txt' and 'parameters.txt' will be used for this function.
Meta-data describing the samples and experimental setup may be available from two sources:
a) The file summary.txt, which is produced by MaxQuant in the same folder as the main quantification data.
b) Meta-data deposited as sdrf at Pride, which can be retrieved (via the respective github page) when giving
the accession number in argument sdrf.
Then, the meta-data will be examined for determining groups of replicates and
the results thereof can be found in $sampleSetup$levels.
Alternatively, a dataframe formatted like sdrf-files (ie for each sample a separate line, see also function readSdrf) may be given.
In tricky cases it is also possible to specify the column-name to use for defining the groups of replicates, or the method for automatically choosing
the most suited column, via the 2nd value of the argument sdrf; see also the function defineSamples (which gets used internally).
Please note that sdrf is still experimental and only a small fraction of proteomics-data on Pride have been annotated accordingly.
If a valid sdrf is provided, its information has priority over the information extracted from the MaxQuant-produced file summary.txt.
This function has been developed using MaxQuant versions 1.6.10.x to 2.0.x; the format of the resulting file 'peptides.txt'
is typically well conserved between versions.
The final output is a list containing these elements: $raw, $quant, $annot, $counts, $sampleSetup,
$quantNotes, $notes, or (if separateAnnot=FALSE) data.frame
with annotation- and main quantification-content. If sdrf information has been found, an additional list-element setup
will be added containing the entire meta-data as setup$meta and the suggested organization as setup$lev.
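To make the effect of normalization (argument normalizeMeth, default "median") more concrete, here is a minimal base R sketch of one common form of median normalization on log2 data: each column is shifted so that all column medians coincide. The values are invented, and the method actually applied by the package may differ in detail.

```r
# Mock log2 peptide abundances: 5 peptides x 3 samples (values are invented)
raw <- cbind(
  s1 = c(20.0, 21.5, 19.0, 22.0, 18.5),
  s2 = c(21.0, 22.5, 20.0, 23.0, 19.5),   # same profile, shifted by +1
  s3 = c(19.5, 21.0, 18.5, 21.5, 18.0))   # same profile, shifted by -0.5

# Shift each column so that its median equals the mean of all column medians
colMed <- apply(raw, 2, median, na.rm = TRUE)
quant <- sweep(raw, 2, colMed - mean(colMed), FUN = "-")

# After normalization all column medians are the same
apply(quant, 2, median, na.rm = TRUE)
```

Comparing the distributions of raw and quant (for example as violin plots or boxplots) corresponds to the graphical display before and after normalization mentioned above.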
Value
This function returns a list with  $raw (initial/raw abundance values), $quant with final normalized quantitations, $annot (columns ), $counts an array with 'PSM' and 'NoOfRazorPeptides',
$quantNotes, $notes and optional setup for meta-data from sdrf; or a data.frame with quantitation and annotation if separateAnnot=FALSE
See Also
read.table, normalizeThis; for reading protein-level data see readMaxQuantFile, readProlineFile
Examples
# Here we'll load a short/trimmed example file (thus not the MaxQuant default name)
MQpepFi1 <- "peptides_tinyMQ.txt.gz"
path1 <- system.file("extdata", package="wrProteo")
specPref1 <- c(conta="conta|CON_|LYSC_CHICK", mainSpecies="YEAST", spec2="HUMAN")
dataMQpep <- readMaxQuantPeptides(path1, file=MQpepFi1, specPref=specPref1,
  tit="Tiny MaxQuant Peptides")
summary(dataMQpep$quant)
Read csv files exported by OpenMS
Description
Protein quantification results from OpenMS
which were exported as .csv can be imported and relevant information extracted.
Peptide data get summarized by protein by top3 or sum methods.
The final output is a list containing the elements: $annot, $raw, $quant (ie normalized final quantifications), or returns data.frame with entire content of file if separateAnnot=FALSE.
Usage
readOpenMSFile(
  fileName = NULL,
  path = NULL,
  normalizeMeth = "median",
  refLi = NULL,
  sampleNames = NULL,
  quantCol = "Intensity",
  sumMeth = "top3",
  minPepNo = 1,
  protNaCol = "ProteinName",
  separateAnnot = TRUE,
  plotGraph = TRUE,
  tit = "OpenMS",
  wex = 1.6,
  specPref = c(conta = "LYSC_CHICK", mainSpecies = "OS=Homo sapiens"),
  silent = FALSE,
  debug = FALSE,
  callFrom = NULL
)
Arguments
| fileName | (character) name of file to be read | 
| path | (character) path of file to be read | 
| normalizeMeth | (character) normalization method (will be sent to   | 
| refLi | (character or integer) custom specify which line of data is main species, if character (eg 'mainSpe'), the column 'SpecType' in $annot will be searched for exact match of the (single) term given | 
| sampleNames | (character) new column-names for quantification data (by default the names from files with spectra will be used) | 
| quantCol | (character or integer) exact col-names, or if length=1 content of  | 
| sumMeth | (character) method for summarizing peptide data (so far 'top3' and 'sum' available) | 
| minPepNo | (integer) minimum number of peptides to be used for returning quantification | 
| protNaCol | (character) column name to be read/extracted for the annotation section (default "ProteinName") | 
| separateAnnot | (logical) if  | 
| plotGraph | (logical) optional plot of type vioplot of initial and normalized data (using  | 
| tit | (character) custom title to plot | 
| wex | (integer) relative expansion factor of the violin-plot (will be passed to  | 
| specPref | (character or list) define characteristic text for recognizing (main) groups of species (1st for contaminants - will be marked as 'conta', 2nd for main species - marked as 'mainSpe',
and optional following ones for supplemental tags/species - marked as 'species2','species3',...);
if list and list-element has multiple values they will be used for exact matching of accessions (ie 2nd of argument  | 
| silent | (logical) suppress messages | 
| debug | (logical) display additional messages for debugging | 
| callFrom | (character) allow easier tracking of message(s) produced | 
Details
This function has been developed based on the OpenMS peptide-identification and label-free-quantification module. Csv input files may also be compressed as .gz.
Note: With this version the information about protein-modifications (PTMs) may not yet get exploited fully.
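The two summarization options named by the argument sumMeth ('top3' and 'sum') can be sketched in base R as follows. The peptide table is invented, and 'top3' is taken here as the mean of the (up to) three most intense peptides per protein; this is an assumption for illustration, since the exact definition used by the function is not given in this text.

```r
# Mock peptide table (linear-scale intensities, invented values)
pep <- data.frame(
  ProteinName = c("P1", "P1", "P1", "P1", "P2", "P2", "P3"),
  Intensity   = c(100, 400, 200, 300, 50, 70, 10)
)

# 'sum': add up all peptide intensities of each protein
sumQuant <- tapply(pep$Intensity, pep$ProteinName, sum)

# 'top3' (assumed here: mean of the up to 3 most intense peptides per protein)
top3Quant <- tapply(pep$Intensity, pep$ProteinName,
  function(x) mean(head(sort(x, decreasing = TRUE), 3)))

# Require a minimum number of peptides per protein (cf. argument minPepNo)
nPep <- table(pep$ProteinName)
minPepNo <- 2
top3Filtered <- top3Quant[names(nPep)[nPep >= minPepNo]]
top3Filtered
```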
Value
This function returns a list with $raw (initial/raw abundance values), $quant with final normalized quantitations, $annot, $counts an array with number of peptides, $quantNotes,$expSetup and $notes; or if separateAnnot=FALSE the function returns a data.frame with annotation and quantitation only
See Also
read.table, normalizeThis, readMaxQuantFile, readProlineFile, readProtDiscovFile
Examples
path1 <- system.file("extdata", package="wrProteo")
fiNa <- "OpenMS_tiny.csv.gz"
dataOM <- readOpenMSFile(file=fiNa, path=path1, tit="tiny OpenMS example")
summary(dataOM$quant)
Read xlsx, csv or tsv files exported from Proline and MS-Angel
Description
Quantification results from Proline and MS-Angel exported as xlsx format can be read directly using this function.
Besides, files in tsv, csv (European and US format) or tabulated txt can be read, too.
Then relevant information gets extracted; the data can optionally be normalized and displayed as boxplot or vioplot.
The final output is a list containing 6 elements: $raw, $quant,  $annot, $counts, $quantNotes and $notes.
Alternatively, a data.frame with annotation and quantitation data may be returned if separateAnnot=FALSE.
Note: There is no normalization by default since quite frequently data produced by Proline are already sufficiently normalized.
The figure produced using the argument plotGraph=TRUE may help judging if the data appear sufficiently normalized (distributions should align).
Usage
readProlineFile(
  fileName,
  path = NULL,
  normalizeMeth = "median",
  logConvert = TRUE,
  sampleNames = NULL,
  quantCol = "^abundance_",
  annotCol = c("accession", "description", "is_validated", "protein_set_score",
    "X.peptides", "X.specific_peptides"),
  remStrainNo = TRUE,
  pepCountCol = c("^psm_count_", "^peptides_count_"),
  trimColnames = FALSE,
  refLi = NULL,
  separateAnnot = TRUE,
  plotGraph = TRUE,
  titGraph = NULL,
  wex = 2,
  specPref = c(conta = "_conta\\|", mainSpecies = "OS=Homo sapiens"),
  gr = NULL,
  sdrf = NULL,
  suplAnnotFile = TRUE,
  groupPref = list(lowNumberOfGroups = TRUE, chUnit = TRUE),
  silent = FALSE,
  callFrom = NULL,
  debug = FALSE
)
Arguments
| fileName | (character) name of file to read; .xlsx-, .csv-, .txt- and .tsv can be read (csv, txt and tsv may be gz-compressed). Reading xlsx requires package 'readxl'. | 
| path | (character) optional path (note: Windows backslash should be protected or written as '/') | 
| normalizeMeth | (character) normalization method (for details and options see  | 
| logConvert | (logical) convert numeric data as log2, will be placed in $quant | 
| sampleNames | (character) custom column-names for quantification data; this argument has priority over  | 
| quantCol | (character or integer) columns with main quantitation-data : precise colnames to extract, or if length=1 content of  | 
| annotCol | (character) precise colnames or if length=1 pattern to search among column-names for $annot | 
| remStrainNo | (logical) if  | 
| pepCountCol | (character) pattern to search among column-names for count data of PSM and NoOfPeptides | 
| trimColnames | (logical) optional trimming of column-names of any redundant characters from beginning and end | 
| refLi | (integer) custom decide which line of data is main species, if single character entry it will be used to choose a group of species (eg 'mainSpe') | 
| separateAnnot | (logical) separate annotation from numeric data (quantCol and annotCol must be defined) | 
| plotGraph | (logical or matrix of integer) optional plot vioplot of initial data; if integer, it will be passed to  | 
| titGraph | (character) custom title to plot of distribution of quantitation values | 
| wex | (integer) relative expansion factor of the violin-plot (will be passed to  | 
| specPref | (character or list) define characteristic text for recognizing (main) groups of species (1st for contaminants - will be marked as 'conta', 2nd for main species - marked as 'mainSpe',
and optional following ones for supplemental tags/species - marked as 'species2','species3',...);
if list and list-element has multiple values they will be used for exact matching of accessions (ie 2nd of argument  | 
| gr | (character or factor) custom defined pattern of replicate association, will override final grouping of replicates from  | 
| sdrf | (character, list or data.frame) optional extraction and adding of experimental meta-data: if character, this may be the ID at ProteomeXchange,
the second & third elements may give further indications for automatic organization of groups of replicates.
Besides, the output from  | 
| suplAnnotFile | (logical or character) optional reading of supplemental files produced by quantification software; however, if  | 
| groupPref | (list) additional parameters for interpreting meta-data to identify structure of groups (replicates), will be passed to  | 
| silent | (logical) suppress messages | 
| callFrom | (character) allow easier tracking of messages produced | 
| debug | (logical) display additional messages for debugging | 
Details
This function has been developed using Proline version 1.6.1 coupled with MS-Angel 1.6.1. The classical way of using this function consists in exporting results produced by Proline and MS-Angel as xlsx file. Besides, other formats may be read, too. This includes csv (eg the main sheet/table of the exported xlsx file saved as csv). WOMBAT represents an effort to automate quantitative proteomics experiments; using this route, data get exported as txt files which can be read, too.
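The following base R sketch illustrates how a pattern such as the default of quantCol ("^abundance_") selects the quantitation columns from an exported table, and what the log2 conversion (argument logConvert) does to the values. The table is invented. Setting 0 to NA before taking log2 is an assumption made here to avoid -Inf, not a documented behaviour of this function.

```r
# Mock table mimicking an exported protein table (names and values are invented)
tab <- data.frame(
  accession    = c("P1", "P2", "P3"),
  abundance_A1 = c(1024, 2048, 0),
  abundance_A2 = c(512, 4096, 256),
  psm_count_A1 = c(3, 5, 0),
  psm_count_A2 = c(2, 6, 1)
)

# Select quantitation columns by pattern (same pattern as the default of quantCol)
quantCols <- grep("^abundance_", colnames(tab), value = TRUE)
raw <- as.matrix(tab[, quantCols])
rownames(raw) <- tab$accession

# Treat 0 as missing before taking log2 (otherwise log2(0) gives -Inf)
raw[raw == 0] <- NA
quant <- log2(raw)
quant
```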
Value
This function returns a list with $raw (initial/raw abundance values), $quant with final normalized quantitations, $annot (columns ), $counts an array with 'PSM' and 'NoOfPeptides', $quantNotes and $notes; or a data.frame with quantitation and annotation if separateAnnot=FALSE
See Also
Examples
path1 <- system.file("extdata", package="wrProteo")
fiNa <- "exampleProlineABC.csv.gz"
dataABC <- readProlineFile(path=path1, file=fiNa)
summary(dataABC$quant)
Read Tabulated Files Exported By ProteomeDiscoverer At Protein Level, Deprecated
Description
Deprecated old version. Protein identification and quantification results from
Thermo ProteomeDiscoverer
which were exported as tabulated text can be imported and relevant information extracted.
The final output is a list containing 3 elements: $annot, $raw and optional $quant,
or returns data.frame with entire content of file if separateAnnot=FALSE.
Please use readProteomeDiscovererFile() from the same package instead!
Usage
readProtDiscovFile(
  fileName,
  path = NULL,
  normalizeMeth = "median",
  sampleNames = NULL,
  read0asNA = TRUE,
  quantCol = "^Abundances*",
  annotCol = NULL,
  contamCol = "Contaminant",
  refLi = NULL,
  separateAnnot = TRUE,
  FDRCol = list(c("^Protein.FDR.Confidence", "High"), c("^Found.in.Sample.", "High")),
  gr = NULL,
  sdrf = NULL,
  suplAnnotFile = TRUE,
  groupPref = list(lowNumberOfGroups = TRUE),
  specPref = c(conta = "CON_|LYSC_CHICK", mainSpecies = "OS=Homo sapiens"),
  plotGraph = TRUE,
  wex = 1.6,
  titGraph = "Proteome Discoverer",
  silent = FALSE,
  debug = FALSE,
  callFrom = NULL
)
Arguments
| fileName | (character) name of file to be read | 
| path | (character) path of file to be read | 
| normalizeMeth | (character) normalization method, defaults to  | 
| sampleNames | (character) custom column-names for quantification data (ProteomeDiscoverer does not automatically use file-names from spectra); this argument has priority over  | 
| read0asNA | (logical) decide if initial quantifications at 0 should be transformed to NA | 
| quantCol | (character or integer) exact col-names, or if length=1 content of  | 
| annotCol | (character) column names to be read/extracted for the annotation section (default c("Accession","Description","Gene","Contaminant","Sum.PEP.Score","Coverage....","X..Peptides","X..PSMs","X..Unique.Peptides", "X..AAs","MW..kDa.") ) | 
| contamCol | (character or integer, length=1) which columns should be used for contaminants marked by ProteomeDiscoverer.
If a column named  | 
| refLi | (character or integer) custom specify which line of data is main species, if character (eg 'mainSpe'), the column 'SpecType' in $annot will be searched for exact match of the (single) term given | 
| separateAnnot | (logical) if  | 
| FDRCol | (list) optional indication to search for protein FDR information | 
| gr | (character or factor) custom defined pattern of replicate association, will override final grouping of replicates from  | 
| sdrf | (character, list or data.frame) optional extraction and adding of experimental meta-data: if character, this may be the ID at ProteomeXchange,
the second element may give further indications for automatic organization of groups of replicates.
Besides, the output from  | 
| suplAnnotFile | (logical or character) optional reading of supplemental files produced by ProteomeDiscoverer; however, if  | 
| groupPref | (list) additional parameters for interpreting meta-data to identify structure of groups (replicates), will be passed to  | 
| specPref | (character or list) define characteristic text for recognizing (main) groups of species (1st for contaminants - will be marked as 'conta', 2nd for main species - marked as 'mainSpe',
and optional following ones for supplemental tags/species - marked as 'species2','species3',...);
if list and list-element has multiple values they will be used for exact matching of accessions (ie 2nd of argument  | 
| plotGraph | (logical or integer) optional plot of type vioplot of initial and normalized data (using  | 
| wex | (integer) relative expansion factor of the violin-plot (will be passed to  | 
| titGraph | (character) custom title to plot of distribution of quantitation values | 
| silent | (logical) suppress messages | 
| debug | (logical) additional messages for debugging | 
| callFrom | (character) allow easier tracking of messages produced | 
Details
This function has been replaced by readProteomeDiscovererFile (from the same package)!
The syntax and structure of output have remained the same, you can simply replace the name of the function called.
This function has been developed using Thermo ProteomeDiscoverer versions 2.2 to 2.5.
The format of resulting files at export also depends on which columns are chosen as visible inside ProteomeDiscoverer and subsequently get chosen for export.
Using the argument suplAnnotFile it is possible to specify a specific file (or search for default file) to read for extracting file-names as sample-names and other experiment related information.
If a column named contamCol is found, the data will later be filtered to remove all contaminants; set to NULL for keeping all contaminants.
This function replaces the deprecated function readPDExport.
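The tagging of species via specPref and the removal of contaminants via a contaminant column can be sketched in base R as shown below. The annotation table, its accessions and the values of the 'Contaminant' column are invented, and the logic is a simplified illustration rather than the procedure implemented in the package.

```r
# Mock annotation (invented values); 'Contaminant' mimics the column named by contamCol
annot <- data.frame(
  Accession   = c("ACC1", "ACC2", "ACC3", "ACC4"),
  Description = c("Protein one OS=Homo sapiens", "Lysozyme LYSC_CHICK OS=Gallus gallus",
    "Protein three OS=Homo sapiens", "Trypsin CON_ACC4 OS=Sus scrofa"),
  Contaminant = c("False", "False", "False", "True"),
  stringsAsFactors = FALSE
)
specPref <- c(conta = "CON_|LYSC_CHICK", mainSpecies = "OS=Homo sapiens")

# Tag each protein: main species first, contaminants override
specType <- rep(NA_character_, nrow(annot))
specType[grepl(specPref["mainSpecies"], annot$Description)] <- "mainSpe"
specType[grepl(specPref["conta"], annot$Description) |
  toupper(annot$Contaminant) == "TRUE"] <- "conta"

# Accessions remaining after removal of contaminants
kept <- annot$Accession[!is.na(specType) & specType != "conta"]
kept
```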
Value
This function returns a list with $raw (initial/raw abundance values), $quant with final normalized quantitations, $annot, $counts an array with number of peptides, $quantNotes
and $notes; or if separateAnnot=FALSE the function returns a data.frame with annotation and quantitation only
See Also
read.table, normalizeThis, readMaxQuantFile, readProlineFile, readFragpipeFile
Examples
path1 <- system.file("extdata", package="wrProteo")
fiNa <- "tinyPD_allProteins.txt.gz"
## Please use the function readProteomeDiscovererFile(), as shown below (same syntax)
dataPD <- readProteomeDiscovererFile(file=fiNa, path=path1, suplAnnotFile=FALSE)
summary(dataPD$quant)
Read Tabulated Files Exported by ProteomeDiscoverer At Peptide Level, Deprecated
Description
Deprecated old version. Peptide identification and quantification results from Thermo ProteomeDiscoverer
which were exported as tabulated text can be imported and relevant information extracted.
The final output is a list containing 3 elements: $annot, $raw and optional $quant, or returns data.frame with entire content of file if separateAnnot=FALSE.
Usage
readProtDiscovPeptides(
  fileName,
  path = NULL,
  normalizeMeth = "median",
  sampleNames = NULL,
  suplAnnotFile = TRUE,
  gr = NULL,
  sdrf = NULL,
  read0asNA = TRUE,
  quantCol = "^Abundances*",
  annotCol = NULL,
  contamCol = "Contaminant",
  refLi = NULL,
  separateAnnot = TRUE,
  FDRCol = list(c("^Protein.FDR.Confidence", "High"), c("^Found.in.Sample.", "High")),
  plotGraph = TRUE,
  titGraph = "Proteome Discoverer",
  wex = 1.6,
  specPref = c(conta = "CON_|LYSC_CHICK", mainSpecies = "OS=Homo sapiens"),
  silent = FALSE,
  debug = FALSE,
  callFrom = NULL
)
Arguments
| fileName | (character) name of file to be read | 
| path | (character) path of file to be read | 
| normalizeMeth | (character) normalization method, defaults to  | 
| sampleNames | (character) new column-names for quantification data (ProteomeDiscoverer does not automatically use file-names from spectra); this argument has priority over  | 
| suplAnnotFile | (logical or character) optional reading of supplemental files produced by ProteomeDiscoverer; however, if  | 
| gr | (character or factor) custom defined pattern of replicate association, will override final grouping of replicates from  | 
| sdrf | (character, list or data.frame) optional extraction and adding of experimental meta-data: if character, this may be the ID at ProteomeXchange,
the second element may give further indications for automatic organization of groups of replicates.
Besides, the output from  | 
| read0asNA | (logical) decide if initial quantifications at 0 should be transformed to NA | 
| quantCol | (character or integer) exact col-names, or if length=1 content of  | 
| annotCol | (character) column names to be read/extracted for the annotation section (default c("Accession","Description","Gene","Contaminant","Sum.PEP.Score","Coverage....","X..Peptides","X..PSMs","X..Unique.Peptides", "X..AAs","MW..kDa.") ) | 
| contamCol | (character or integer, length=1) which columns should be used for contaminants marked by ProteomeDiscoverer.
If a column named  | 
| refLi | (character or integer) custom specify which line of data is main species, if character (eg 'mainSpe'), the column 'SpecType' in $annot will be searched for exact match of the (single) term given | 
| separateAnnot | (logical) if  | 
| FDRCol | (list) optional indication to search for protein FDR information | 
| plotGraph | (logical or integer) optional plot of type vioplot of initial and normalized data (using  | 
| titGraph | (character) deprecated custom title to plot, please use 'tit' | 
| wex | (integer) relative expansion factor of the violin-plot (will be passed to  | 
| specPref | (character or list) define characteristic text for recognizing (main) groups of species (1st for contaminants - will be marked as 'conta', 2nd for main species - marked as 'mainSpe',
and optional following ones for supplemental tags/species - marked as 'species2','species3',...);
if list and list-element has multiple values they will be used for exact matching of accessions (ie 2nd of argument  | 
| silent | (logical) suppress messages | 
| debug | (logical) additional messages for debugging | 
| callFrom | (character) allow easier tracking of messages produced | 
Details
This function has been developed using Thermo ProteomeDiscoverer versions 2.2 to 2.5.
The format of resulting files at export also depends on which columns are chosen as visible inside ProteomeDiscoverer and subsequently get chosen for export.
Using the argument suplAnnotFile it is possible to specify a specific file (or search for default file) to read for extracting file-names as sample-names and other experiment related information.
Preceding and following amino acids (relative to identified protease recognition sites) will be removed from peptide sequences and displayed in $annot as columns 'prec' and 'foll'.
If a column named contamCol is found, the data will later be filtered to remove all contaminants; set to NULL for keeping all contaminants.
This function replaces the deprecated function readPDExport.
Besides, ProteomeDiscoverer version number and full raw-file path will be extracted for $notes in final output.
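The separation of flanking amino acids into the columns 'prec' and 'foll' can be illustrated with a regular expression in base R. The sketch assumes that annotated sequences are written as "[K].PEPTIDER.[A]"; this layout and the example sequences are assumptions for illustration, and real exports may differ.

```r
# Assumed layout of annotated sequences: "[prec].SEQUENCE.[foll]" (invented examples)
annSeq <- c("[K].LVNELTEFAK.[T]", "[R].YLYEIAR.[R]", "[-].MKWVTFISLLLLFSSAYSR.[G]")

# Pattern with 3 groups: preceding residue(s), peptide sequence, following residue(s)
pat <- "^\\[([^]]+)\\]\\.([A-Za-z]+)\\.\\[([^]]+)\\]$"

pepAnnot <- data.frame(
  seq  = sub(pat, "\\2", annSeq),
  prec = sub(pat, "\\1", annSeq),
  foll = sub(pat, "\\3", annSeq),
  stringsAsFactors = FALSE
)
pepAnnot
```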
Value
This function returns a list with $raw (initial/raw abundance values), $quant with final normalized quantitations, $annot, $counts an array with number of peptides, $quantNotes
and $notes; or if separateAnnot=FALSE the function returns a data.frame with annotation and quantitation only
See Also
read.table, normalizeThis, readMaxQuantFile, readProteomeDiscovererFile
Examples
path1 <- system.file("extdata", package="wrProteo")
readProtDiscovererPeptides, deprecated
Description
This function has been deprecated and replaced by readProteomeDiscovererPeptides (from this package).
Usage
readProtDiscovererPeptides(...)
Arguments
| ... | Actually, this function doesn't read any input any more | 
Value
This function returns NULL
See Also
readProteomeDiscovererFile, readProteomeDiscovererPeptides
Read Tabulated Files Exported By ProteomeDiscoverer At Protein Level
Description
Protein identification and quantification results from Thermo ProteomeDiscoverer which were exported as tabulated text can be imported and relevant information extracted.
Usage
readProteomeDiscovererFile(
  fileName,
  path = NULL,
  normalizeMeth = "median",
  sampleNames = NULL,
  read0asNA = TRUE,
  quantCol = "^Abundance",
  annotCol = NULL,
  contamCol = "Contaminant",
  refLi = NULL,
  separateAnnot = TRUE,
  FDRCol = list(c("^Protein.FDR.Confidence", "High"), c("^Found.in.Sample.", "High")),
  gr = NULL,
  sdrf = NULL,
  suplAnnotFile = TRUE,
  groupPref = list(lowNumberOfGroups = TRUE, chUnit = TRUE),
  specPref = c(conta = "CON_|LYSC_CHICK", mainSpecies = "OS=Homo sapiens"),
  plotGraph = TRUE,
  wex = 1.6,
  titGraph = "Proteome Discoverer",
  silent = FALSE,
  debug = FALSE,
  callFrom = NULL
)
Arguments
| fileName | (character) name of file to be read | 
| path | (character) path of file to be read | 
| normalizeMeth | (character) normalization method, defaults to  | 
| sampleNames | (character) custom column-names for quantification data (ProteomeDiscoverer does not automatically use file-names from spectra); this argument has priority over  | 
| read0asNA | (logical) decide if initial quantifications at 0 should be transformed to NA | 
| quantCol | (character or integer) define which columns should be extracted as quantitation data : The argument may be the exact column-names to be used, or if length=1 
content of  | 
| annotCol | (character) column names to be read/extracted for the annotation section (default c("Accession","Description","Gene","Contaminant","Sum.PEP.Score","Coverage....","X..Peptides","X..PSMs","X..Unique.Peptides", "X..AAs","MW..kDa.") ) | 
| contamCol | (character or integer, length=1) which columns should be used for contaminants marked by ProteomeDiscoverer.
If a column named  | 
| refLi | (character or integer) custom specify which line of data is main species, if character (eg 'mainSpe'), the column 'SpecType' in $annot will be searched for exact match of the (single) term given | 
| separateAnnot | (logical) if  | 
| FDRCol | (list) optional indication to search for protein FDR information | 
| gr | (character or factor) custom defined pattern of replicate association, will override final grouping of replicates from  | 
| sdrf | (character, list or data.frame) optional extraction and adding of experimental meta-data: if character, this may be the ID at ProteomeXchange,
the second & third elements may give further indications for automatic organization of groups of replicates.
Besides, the output from  | 
| suplAnnotFile | (logical or character) optional reading of supplemental files produced by ProteomeDiscoverer; however, if  | 
| groupPref | (list) additional parameters for interpreting meta-data to identify structure of groups (replicates), will be passed to  | 
| specPref | (character or list) define characteristic text for recognizing (main) groups of species (1st for contaminants - will be marked as 'conta', 2nd for main species - marked as 'mainSpe',
and optional following ones for supplemental tags/species - marked as 'species2','species3',...);
if list and list-element has multiple values they will be used for exact matching of accessions (ie 2nd of argument  | 
| plotGraph | (logical or integer) optional plot of type vioplot of initial and normalized data (using  | 
| wex | (integer) relative expansion factor of the violin-plot (will be passed to  | 
| titGraph | (character) custom title to plot of distribution of quantitation values | 
| silent | (logical) suppress messages | 
| debug | (logical) additional messages for debugging | 
| callFrom | (character) allow easier tracking of messages produced | 
Details
This function has been developed using Thermo ProteomeDiscoverer versions 2.2 to 2.5.
The format of resulting files at export also depends on which columns are chosen as visible inside ProteomeDiscoverer and subsequently get chosen for export.
Using the argument suplAnnotFile it is possible to specify a specific file (or search for the default file) to read for extracting file-names as sample-names and other experiment-related information.
If a column named contamCol is found, the data will later be filtered to remove all contaminants; set contamCol to NULL to keep all contaminants.
The final output is a list containing as (main) elements: $annot, $raw and optional $quant,
or returns data.frame with entire content of file if separateAnnot=FALSE.
This function replaces the deprecated function readProtDiscovFile, which will soon be removed from this package.
Value
This function returns a list with $raw (initial/raw abundance values), $quant with final normalized quantitations, $annot, $counts an array with number of peptides, $quantNotes
and $notes; or if separateAnnot=FALSE the function returns a data.frame with annotation and quantitation only
See Also
read.table, normalizeThis, readMaxQuantFile, readProlineFile, readFragpipeFile
Examples
path1 <- system.file("extdata", package="wrProteo")
fiNa <- "tinyPD_allProteins.txt.gz"
dataPD <- readProteomeDiscovererFile(file=fiNa, path=path1, suplAnnotFile=FALSE)
summary(dataPD$quant)
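The contaminant filtering described in Details, together with the treatment of zero values (the import functions of this package offer a read0asNA argument for this), can be pictured with a small base-R sketch. It does not call the package and is not its implementation; the toy matrix and the logical `contam` vector (standing in for a 'Contaminant' column) are assumptions made for illustration.

```r
# Toy abundance table: 5 proteins x 3 samples (all values are made up)
abund <- matrix(c(1200,    0,  880, 4100,  950,
                  1500,  310,    0, 5200, 1100,
                   900,  280,  700, 3900,    0),
                ncol = 3,
                dimnames = list(paste0("prot", 1:5), c("S1", "S2", "S3")))
# Hypothetical content of a 'Contaminant' column (TRUE = flagged as contaminant)
contam <- c(FALSE, FALSE, TRUE, FALSE, FALSE)

abund[abund == 0] <- NA                      # treat 0 as missing, avoids -Inf after log2
abundFilt <- abund[!contam, , drop = FALSE]  # drop lines flagged as contaminants
log2Ab <- log2(abundFilt)                    # log2 scale for downstream steps
summary(log2Ab)
```

Replacing zeros before the log2 transformation keeps missing measurements as NA, so they can be inspected or imputed later instead of distorting summaries as -Inf.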
Read Tabulated Files Exported By ProteomeDiscoverer At Peptide Level
Description
Initial peptide identification and quantification results from Thermo ProteomeDiscoverer
which were exported as tabulated text can be imported and relevant information extracted.
The final output is a list containing 3 elements: $annot, $raw and optional $quant, or returns data.frame with entire content of file if separateAnnot=FALSE.
Usage
readProteomeDiscovererPeptides(
  fileName,
  path = NULL,
  normalizeMeth = "median",
  sampleNames = NULL,
  suplAnnotFile = TRUE,
  gr = NULL,
  sdrf = NULL,
  read0asNA = TRUE,
  quantCol = "^Abundances*",
  annotCol = NULL,
  contamCol = "Contaminant",
  refLi = NULL,
  separateAnnot = TRUE,
  FDRCol = list(c("^Protein.FDR.Confidence", "High"), c("^Found.in.Sample.", "High")),
  plotGraph = TRUE,
  titGraph = "Proteome Discoverer",
  wex = 1.6,
  specPref = c(conta = "CON_|LYSC_CHICK", mainSpecies = "OS=Homo sapiens"),
  groupPref = list(lowNumberOfGroups = TRUE, chUnit = TRUE),
  silent = FALSE,
  debug = FALSE,
  callFrom = NULL
)
Arguments
| fileName | (character) name of file to be read | 
| path | (character) path of file to be read | 
| normalizeMeth | (character) normalization method, defaults to  | 
| sampleNames | (character) new column-names for quantification data (ProteomeDiscoverer does not automatically use file-names from spectra); this argument has priority over  | 
| suplAnnotFile | (logical or character) optional reading of supplemental files produced by ProteomeDiscoverer; however, if  | 
| gr | (character or factor) custom defined pattern of replicate association, will override final grouping of replicates from  | 
| sdrf | (character, list or data.frame) optional extraction and adding of experimental meta-data: if character, this may be the ID at ProteomeXchange,
the second element may give further indications for automatic organization of groups of replicates.
Besides, the output from  | 
| read0asNA | (logical) decide if initial quantifications at 0 should be transformed to NA | 
| quantCol | (character or integer) exact col-names, or if length=1 content of  | 
| annotCol | (character) column names to be read/extracted for the annotation section (default c("Accession","Description","Gene","Contaminant","Sum.PEP.Score","Coverage....","X..Peptides","X..PSMs","X..Unique.Peptides", "X..AAs","MW..kDa.") ) | 
| contamCol | (character or integer, length=1) which columns should be used for contaminants marked by ProteomeDiscoverer.
If a column named  | 
| refLi | (character or integer) custom specify which line of data is main species, if character (eg 'mainSpe'), the column 'SpecType' in $annot will be searched for exact match of the (single) term given | 
| separateAnnot | (logical) if  | 
| FDRCol | (list) optional indication to search for protein FDR information | 
| plotGraph | (logical or integer) optional plot of type vioplot of initial and normalized data (using  | 
| titGraph | (character) deprecated custom title to plot, please use 'tit' | 
| wex | (integer) relative expansion factor of the violin-plot (will be passed to  | 
| specPref | (character or list) define characteristic text for recognizing (main) groups of species (1st for contaminants - will be marked as 'conta', 2nd for main species - marked as 'mainSpe',
and optional following ones for supplemental tags/species - marked as 'species2','species3',...);
if list and list-element has multiple values they will be used for exact matching of accessions (ie 2nd of argument  | 
| groupPref | (list) additional parameters for interpreting meta-data to identify structure of groups (replicates), will be passed to  | 
| silent | (logical) suppress messages | 
| debug | (logical) additional messages for debugging | 
| callFrom | (character) allow easier tracking of messages produced | 
Details
This function has been developed using Thermo ProteomeDiscoverer versions 2.2 to 2.5.
The format of resulting files at export also depends on which columns are chosen as visible inside ProteomeDiscoverer and subsequently get chosen for export.
Using the argument suplAnnotFile it is possible to specify a specific file (or search for the default file) to read for extracting file-names as sample-names and other experiment-related information.
Preceding and following amino acids (relative to identified protease recognition sites) will be removed from peptide sequences and be displayed in $annot as columns 'prec' and 'foll'.
If a column named contamCol is found, the data will later be filtered to remove all contaminants; set contamCol to NULL to keep all contaminants.
This function replaces the deprecated function readPDExport.
Besides, ProteomeDiscoverer version number and full raw-file path will be extracted for $notes in final output.
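The separation of flanking amino acids into 'prec' and 'foll' can be sketched in base R as follows. The sketch assumes that peptide sequences arrive in a bracketed form such as "[K].DLGEEHFK.[G]"; this layout and the example peptides are assumptions for illustration, and the code is not the package's own parser.

```r
# Hypothetical annotated peptide sequences with flanking residues in brackets
annSeq <- c("[K].DLGEEHFK.[G]", "[R].LVNELTEFAK.[T]", "[-].MKWVTFISLLLLFSSAYSR.[G]")

# group 1 = preceding residue(s), group 2 = peptide, group 3 = following residue(s)
pat <- "^\\[([^]]*)\\]\\.(.*)\\.\\[([^]]*)\\]$"
stopifnot(all(grepl(pat, annSeq)))   # fail early if the assumed layout does not apply

pepAnnot <- data.frame(
  seq  = sub(pat, "\\2", annSeq),    # bare peptide sequence
  prec = sub(pat, "\\1", annSeq),    # residue(s) before the cleavage site
  foll = sub(pat, "\\3", annSeq),    # residue(s) after the cleavage site
  stringsAsFactors = FALSE)
pepAnnot
```

Keeping the flanking residues in separate columns leaves the bare sequence usable as a key while the cleavage context stays available.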
Value
This function returns a list with $raw (initial/raw abundance values), $quant with final normalized quantitations, $annot, $counts an array with number of peptides, $quantNotes
and $notes; or if separateAnnot=FALSE the function returns a data.frame with annotation and quantitation only
See Also
read.table, normalizeThis, readMaxQuantFile, readProteomeDiscovererFile
Examples
path1 <- system.file("extdata", package="wrProteo")
Read Sample Meta-data from Quantification-Software And/Or Sdrf And Align To Experimental Data
Description
Sample/experimental annotation meta-data from MaxQuant, ProteomeDiscoverer, FragPipe, Proline or similar, can be read using this function and relevant information extracted. Furthermore, annotation in sdrf-format can be added (the order of sdrf will be adjusted automatically, if possible). This function returns a list with grouping of samples into replicates and additional information gathered. Input files compressed as .gz can be read as well.
Usage
readSampleMetaData(
  quantMeth,
  sdrf = NULL,
  suplAnnotFile = NULL,
  path = ".",
  abund = NULL,
  groupPref = list(lowNumberOfGroups = TRUE, sampleNames = NULL, gr = NULL),
  chUnit = TRUE,
  silent = FALSE,
  debug = FALSE,
  callFrom = NULL
)
Arguments
| quantMeth | (character, length=1) quantification method used; 2-letter abbreviations like 'MQ','PD','PL','FP' etc may be used | 
| sdrf | (character, list or data.frame) optional extraction and adding of experimental meta-data:
This may be a matrix or data.frame with information respective to the experimental setup (to understand which lines=samples should be evaluated as replicates).
If _ | 
| suplAnnotFile | (logical or character) optional reading of supplemental files produced by MaxQuant; if  | 
| path | (character) optional path of file(s) to be read | 
| abund | (matrix or data.frame) experimental quantitation data; only column-names will be used for aligning order of annotated samples | 
| groupPref | (list) additional parameters for interpreting meta-data to identify structure of groups (replicates);
May contain  | 
| chUnit | (logical or character) optional adjusting of group-labels from sample meta-data in case multiple different unit-prefixes are used to single common prefix 
(eg adjust '100pMol' and '1nMol' to '100pMol' and '1000pMol') for better downstream analysis. This option will call  | 
| silent | (logical) suppress messages if  | 
| debug | (logical) additional messages for debugging | 
| callFrom | (character) allows easier tracking of messages produced | 
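The unit adjustment described for chUnit (eg '100pMol' and '1nMol' becoming '100pMol' and '1000pMol') can be illustrated with a short base-R sketch. It is not the function the package calls internally; the label pattern, the table of prefixes `siFac` and the choice of the smallest prefix as common unit are assumptions made for this example.

```r
labs <- c("100pMol", "1nMol", "250pMol", "2nMol")                # group labels with mixed prefixes
siFac <- c(f = 1e-15, p = 1e-12, n = 1e-9, u = 1e-6, m = 1e-3)   # prefixes handled by this sketch

pat <- "^([0-9.]+)([fpnum])Mol$"
stopifnot(all(grepl(pat, labs)))             # the sketch only covers labels of this shape
num <- as.numeric(sub(pat, "\\1", labs))     # numeric part
pre <- sub(pat, "\\2", labs)                 # unit prefix

# express everything in the smallest prefix present
target <- names(siFac)[min(match(pre, names(siFac)))]
newNum <- round(num * unname(siFac[pre]) / siFac[[target]], 6)
newLabs <- paste0(format(newNum, trim = TRUE, scientific = FALSE), target, "Mol")
newLabs
```

With a single common prefix the labels can be sorted or compared numerically, which is what makes them more convenient for downstream analysis.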
Details
When initially reading/importing quantitation data, typically very little is known about the setup of different samples in the underlying experiment. The overall aim is to read and mine the corresponding sample-annotation documented by the quantitation-software and/or from an sdrf repository and to attach it to the experimental data. This way, in subsequent steps of analysis (eg PCA, statistical tests) the user does not have to study the experimental setup to figure out which samples should be considered as replicates of each other.
Sample annotation meta-data can be obtained from two sources: a) from additional files produced (and exported) by the initial quantitation software (so far MaxQuant and ProteomeDiscoverer have been implemented) or b) from the universal sdrf-format (from Pride or user-supplied). Both types can be imported and checked in the same run; if valid sdrf-information is found this will be given priority. For more information about the sdrf format please see sdrf on github.
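The idea of aligning sample annotation to the quantitation data and deriving groups of replicates can be sketched in base R. The sketch is only an illustration under stated assumptions: the column names of `sdrfLike`, the file names and the choice of the grouping column are hypothetical, and the real function uses more elaborate checks.

```r
# Column names of the quantitation data (order as found in the abundance matrix)
abundCols <- c("runB_2.raw", "runA_1.raw", "runA_2.raw", "runB_1.raw")

# Hypothetical sdrf-like annotation, one line per sample, in a different order
sdrfLike <- data.frame(
  comment.data.file. = c("runA_1.raw", "runA_2.raw", "runB_1.raw", "runB_2.raw"),
  characteristics.spiked.compound. = c("100 amol", "100 amol", "500 amol", "500 amol"),
  stringsAsFactors = FALSE)

# bring the annotation into the order of the abundance columns
ord <- match(abundCols, sdrfLike$comment.data.file.)
stopifnot(!anyNA(ord))                 # every sample must be found in the annotation
sdrfAligned <- sdrfLike[ord, ]

# samples sharing the same annotation value are treated as replicates
groups <- sdrfAligned$characteristics.spiked.compound.
level <- match(groups, unique(groups))  # grouping given as integer
data.frame(sample = abundCols, groups, level)
```

Only the column names of the abundance data are needed for the alignment, which matches the role of the argument abund.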
Value
This function returns a list with $groups and $level (grouping of samples given as integer), and $meth (method by which grouping was determined).
If valid sdrf was given, the resultant list contains in addition $sdrfDat (data.frame of annotation).
Alternatively it may contain a $sdrfExport if sufficient information has been gathered (so far only for MaxQuant) for a draft sdrf for export (that should be revised and completed by the user).
If software annotation has been found it will be shown in $annotBySoft.
If all entries are invalid or entries do not pass the tests, this function returns an empty list.
See Also
This function is used internally by readMaxQuantFile, readProteomeDiscovererFile etc; uses readSdrf for reading sdrf-files, replicateStructure for mining annotation columns
Examples
sdrf001819Setup <- readSampleMetaData(quantMeth=NA, sdrf="PXD001819")
str(sdrf001819Setup)
Read proteomics meta-data as sdrf file
Description
This function allows reading proteomics meta-data from sdrf files, as they are provided on https://github.com/bigbio/proteomics-sample-metadata. A data.frame containing all annotation data will be returned. To conform with the (non-obligatory) recommendations, column names are shown in lower case.
Usage
readSdrf(
  fi,
  chCol = "auto",
  urlPrefix = "github",
  silent = FALSE,
  callFrom = NULL,
  debug = FALSE
)
Arguments
| fi | (character) main input; may be full path or url to the file with meta-annotation. If a short project-name is given,
it will be searched based at the location of  | 
| chCol | (character, length=1) optional checking of column-names | 
| urlPrefix | (character, length=1) prefix to add to search when no complete path or url is given on  | 
| silent | (logical) suppress messages | 
| callFrom | (character) allows easier tracking of messages produced | 
| debug | (logical) display additional messages for debugging | 
Details
The packages utils and wrMisc must be installed.
Please note that reading sdrf files (if not provided as local copy) will take a few seconds, depending on the responsiveness of github.
This function only handles the main reading of sdrf data and some diagnostic checks.
For mining sdrf data please look at replicateStructure and readSampleMetaData.
Value
This function returns the content of sdrf-file as data.frame (or NULL if the corresponding file was not found)
See Also
readSampleMetaData, replicateStructure
Examples
## This may take a few seconds...
sdrf001819 <- readSdrf("PXD001819")
str(sdrf001819)
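When no network access is available, the principle of reading a tab-separated sdrf table and showing its column names in lower case can be tried on a local file. The following base-R sketch writes a tiny sdrf-like file to tempdir(); its content is invented for illustration and the code does not reproduce the checks performed by readSdrf.

```r
# Write a tiny, made-up sdrf-like file (tab-separated, with header line)
sdrfFile <- tempfile(fileext = ".sdrf.tsv")
writeLines(c(
  "Source Name\tCharacteristics[organism]\tComment[data file]",
  "sample 1\tHomo sapiens\trun1.raw",
  "sample 2\tHomo sapiens\trun2.raw"), sdrfFile)

# Read as data.frame, keeping the original header text (no name mangling)
sdrfTab <- utils::read.delim(sdrfFile, check.names = FALSE, stringsAsFactors = FALSE)
colnames(sdrfTab) <- tolower(colnames(sdrfTab))   # column names in lower case
str(sdrfTab)
unlink(sdrfFile)                                   # remove the temporary file
```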
Read annotation files from UCSC
Description
This function allows reading and importing genomic UCSC-annotation data.
Files can be read as default UCSC export or as GTF-format. 
In the context of proteomics we noticed that sometimes UniProt tables from UCSC are hard to match to identifiers from UniProt Fasta-files, ie many protein-identifiers won't match.
For this reason additional support is given to reading 'Genes and Gene Predictions': Since this table does not include protein-identifiers, a non-redundant list of ENSxxx transcript identifiers 
can be exported as file for an additional step of conversion, eg using a batch conversion tool at the site of UniProt. 
The initial genomic annotation can then be complemented using readUniProtExport. 
Using this more elaborate route, we found higher coverage when trying to add genomic annotation to the protein-identifiers of proteomics results whose annotation is based on an initial Fasta-file.
Usage
readUCSCtable(
  fiName,
  exportFileNa = NULL,
  gtf = NA,
  simplifyCols = c("gene_id", "chr", "start", "end", "strand", "frame"),
  silent = FALSE,
  debug = FALSE,
  callFrom = NULL
)
Arguments
| fiName | (character) name (and path) of file to read | 
| exportFileNa | (character) optional file-name to be exported, if  | 
| gtf | (logical) specify if file  | 
| simplifyCols | (character) optional list of column-names to be used for simplification (if 6 column-headers are given) : the 1st value will be used to identify the column used as reference to summarize all lines with this ID; for the 2nd (typically chromosome names) a representative value will be taken, for the 3rd (typically gene start site) the minimum will be taken, for the 4th (typically gene end site) the maximum will be taken, for the 5th and 6th a representative value will be reported; | 
| silent | (logical) suppress messages | 
| debug | (logical) display additional messages for debugging | 
| callFrom | (character) allow easier tracking of message(s) produced | 
Value
This function returns a matrix, optionally the file 'exportFileNa' may be written
See Also
Examples
path1 <- system.file("extdata", package="wrProteo")
gtfFi <- file.path(path1, "UCSC_hg38_chr11extr.gtf.gz")
# here we'll write the file for UniProt conversion to tempdir() to keep things tidy
expFi <- file.path(tempdir(), "deUcscForUniProt2.txt")
UcscAnnot1 <- readUCSCtable(gtfFi, exportFileNa=expFi)
## results can be further combined with readUniProtExport() 
deUniProtFi <- file.path(path1, "deUniProt_hg38chr11extr.tab")
deUniPr1 <- readUniProtExport(deUniProtFi, deUcsc=UcscAnnot1,
  targRegion="chr11:1-135,086,622")  
deUniPr1[1:5,-5] 
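The summarization described for the argument simplifyCols (one line per ID, minimum of start, maximum of end, a representative value for the other columns) can be sketched in base R. This is only an illustration under stated assumptions: the toy table is invented, 'representative value' is taken here as the first value, and the result is a data.frame rather than the matrix returned by readUCSCtable.

```r
# Made-up GTF-like table with several lines (eg exons) per transcript ID
gtfLike <- data.frame(
  gene_id = c("ENST0001", "ENST0001", "ENST0002", "ENST0002", "ENST0002"),
  chr     = "chr11",
  start   = c(100, 250, 900, 1200, 1000),
  end     = c(200, 400, 950, 1500, 1100),
  strand  = c("+", "+", "-", "-", "-"),
  frame   = ".",
  stringsAsFactors = FALSE)

# one summary line per ID
byId <- split(gtfLike, gtfLike$gene_id)
simpl <- do.call(rbind, lapply(byId, function(x) data.frame(
  gene_id = x$gene_id[1],
  chr     = x$chr[1],        # representative value (here: first)
  start   = min(x$start),    # leftmost start
  end     = max(x$end),      # rightmost end
  strand  = x$strand[1],
  frame   = x$frame[1],
  stringsAsFactors = FALSE)))
simpl
```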
Read protein annotation as exported from UniProt batch-conversion
Description
This function allows reading and importing protein-ID conversion results from UniProt.
To do so, first copy/paste your query IDs into UniProt 'Retrieve/ID mapping' field called '1. Provide your identifiers' (or upload as file), verify '2. Select options'.
In a typical case of 'enst000xxx' IDs you may leave default settings, ie 'Ensembl Transcript' as input and 'UniProt KB' as output. Then, 'Submit' your search and retrieve results via 
'Download'; you need to specify a 'Tab-separated' format! If you download as 'Compressed' you need to decompress the .gz file before running the function readUCSCtable.
In addition, a file with UCSC annotation (Ensrnot accessions and chromosomal locations, obtained using readUCSCtable) can be integrated.
Usage
readUniProtExport(
  UniProtFileNa,
  deUcsc = NULL,
  targRegion = NULL,
  useUniPrCol = NULL,
  silent = FALSE,
  debug = FALSE,
  callFrom = NULL
)
Arguments
| UniProtFileNa | (character) name (and path) of file exported from UniProt (tabulated text file including headers) | 
| deUcsc | (data.frame) object produced by  | 
| targRegion | (character or list) optional marking of chromosomal locations to be part of a given chromosomal target region, 
may be given as character like  | 
| useUniPrCol | (character) optional declaration which columns from UniProt exported file should be used/imported (default 'EnsID','Entry','Entry.name','Status','Protein.names','Gene.names','Length'). | 
| silent | (logical) suppress messages | 
| debug | (logical) display additional messages for debugging | 
| callFrom | (character) allow easier tracking of message(s) produced | 
Details
In a typical use case, first chromosomal location annotation is extracted from UCSC for the species of interest and imported to R using readUCSCtable. 
However, the tables provided by UCSC don't contain UniProt IDs. Thus, an additional (batch-)conversion step needs to be added. 
For this reason readUCSCtable allows writing a file with Ensembl transcript IDs which can be converted to UniProt IDs at the site of UniProt. 
Then, UniProt annotation (downloaded as tab-separated) can be imported and combined with the genomic annotation using this function.
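How a target region given as text (like "chr11:1-135,086,622") may be turned into a chromosome plus a pair of positions, and then used to mark annotation lines, can be sketched in base R. The sketch is an assumption-based illustration and not the parsing used by readUniProtExport; the small `annot` table and its identifiers are invented.

```r
targRegion <- "chr11:1-135,086,622"

# remove thousands-separators, then split into chromosome, start and end
parts <- strsplit(gsub(",", "", targRegion), "[-:]")[[1]]
targ <- list(chr = parts[1], pos = as.numeric(parts[2:3]))

# Made-up genomic annotation (one line per transcript)
annot <- data.frame(
  Ensrnot = c("ENST_a", "ENST_b", "ENST_c"),
  chr     = c("chr11", "chr11", "chr1"),
  start   = c(5000, 200000000, 7000),
  end     = c(9000, 200000500, 8000),
  stringsAsFactors = FALSE)

# mark lines located entirely inside the target region
annot$inTarg <- annot$chr == targ$chr &
  annot$start >= targ$pos[1] & annot$end <= targ$pos[2]
annot
```

The list form `list("chr11", pos=c(start, end))` and the text form carry the same information, so both can be reduced to the structure held in `targ`.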
Value
This function returns a data.frame (with columns $EnsID, $Entry, $Entry.name, $Status, $Protein.names, $Gene.names, $Length; if deUcsc is integrated plus: $chr, $type, $start, $end, $score, $strand, $Ensrnot, $avPos)
See Also
Examples
path1 <- system.file("extdata",package="wrProteo")
deUniProtFi <- file.path(path1,"deUniProt_hg38chr11extr.tab")
deUniPr1a <- readUniProtExport(deUniProtFi) 
str(deUniPr1a)
## Workflow starting with UCSC annotation (gtf) files :
gtfFi <- file.path(path1,"UCSC_hg38_chr11extr.gtf.gz")
UcscAnnot1 <- readUCSCtable(gtfFi)
## Results of conversion at UniProt are already available (file "deUniProt_hg38chr11extr.tab")
myTargRegion <- list("chr1", pos=c(198110001,198570000))
myTargRegion2 <-"chr11:1-135,086,622"      # works equally well
deUniPr1 <- readUniProtExport(deUniProtFi,deUcsc=UcscAnnot1,
  targRegion=myTargRegion)
## Now UniProt IDs and genomic locations are both available :
str(deUniPr1)
Read (Normalized) Quantitation Data Files Produced By Wombat At Protein Level
Description
Protein quantification results from Wombat-P using the Bioconductor package Normalizer can be read using this function and relevant information extracted. Input files compressed as .gz can be read as well. The protein abundance values (XIC) and peptide counts get extracted. Since protein annotation is not very extensive with this format of data, the function allows reading the initial fasta files (from the directory above the quantitation-results), which allows extracting more protein-annotation (like species). Sample-annotation (if available) can be extracted from sdrf files, which are typically part of the Wombat output, too. The protein abundance values may be normalized using multiple methods (median normalization as default), the determination of normalization factors can be restricted to specific proteins (normalization to bait protein(s), or to invariable matrix of spike-in experiments). The protein annotation data gets parsed to extract specific fields (ID, name, description, species ...). Besides, a graphical display of the distribution of protein abundance values may be generated before and after normalization.
Usage
readWombatNormFile(
  fileName,
  path = NULL,
  quantSoft = "(quant software not specified)",
  fasta = NULL,
  isLog2 = TRUE,
  normalizeMeth = "none",
  quantCol = "abundance_",
  contamCol = NULL,
  pepCountCol = c("number_of_peptides"),
  read0asNA = TRUE,
  refLi = NULL,
  sampleNames = NULL,
  extrColNames = c("protein_group"),
  specPref = NULL,
  remRev = TRUE,
  remConta = FALSE,
  separateAnnot = TRUE,
  gr = NULL,
  sdrf = NULL,
  suplAnnotFile = NULL,
  groupPref = list(lowNumberOfGroups = TRUE, chUnit = TRUE),
  titGraph = NULL,
  wex = 1.6,
  plotGraph = TRUE,
  silent = FALSE,
  debug = FALSE,
  callFrom = NULL
)
Arguments
| fileName | (character) name of file to be read (default 'proteinGroups.txt' as typically generated by Compomics in txt folder). Gz-compressed files can be read, too. | 
| path | (character) path of file to be read | 
| quantSoft | (character) quantification-software used inside Wombat-P | 
| fasta | (logical or character) if  | 
| isLog2 | (logical) typically data read from Wombat are expected to be  | 
| normalizeMeth | (character) normalization method, defaults to  | 
| quantCol | (character or integer) exact col-names, or if length=1 content of  | 
| contamCol | (character or integer, length=1) which columns should be used for contaminants | 
| pepCountCol | (character) pattern to search among column-names for count data (1st entry for 'Razor + unique peptides', 2nd for 'Unique peptides', 3rd for 'MS.MS.count' (PSM)) | 
| read0asNA | (logical) decide if initial quantifications at 0 should be transformed to NA (thus avoid -Inf in log2 results) | 
| refLi | (character or integer) custom specify which line of data should be used for normalization, ie which line is main species; if character (eg 'mainSpe'), the column 'SpecType' in $annot will be searched for exact match of the (single) term given | 
| sampleNames | (character) custom column-names for quantification data; this argument has priority over  | 
| extrColNames | (character) column names to be read (1st position: prefix for LFQ quantitation, default 'LFQ.intensity'; 2nd: column name for protein-IDs, default 'Majority.protein.IDs'; 3rd: column names of fasta-headers, default 'Fasta.headers', 4th: column name for number of protein IDs matching, default 'Number.of.proteins') | 
| specPref | (character) prefix to identifiers allowing to separate i) recognize contamination database, ii) species of main identifications and iii) spike-in species | 
| remRev | (logical) option to remove all protein-identifications based on reverse-peptides | 
| remConta | (logical) option to remove all proteins identified as contaminants | 
| separateAnnot | (logical) if  | 
| gr | (character or factor) custom defined pattern of replicate association, will override final grouping of replicates from  | 
| sdrf | (logical, character, list or data.frame) optional extraction and adding of experimental meta-data:
if  | 
| suplAnnotFile | (logical or character) optional reading of supplemental files produced by Compomics; if  | 
| groupPref | (list) additional parameters for interpreting meta-data to identify structure of groups (replicates), will be passed to  | 
| titGraph | (character) custom title to plot of distribution of quantitation values | 
| wex | (numeric) relative expansion factor of the violin in plot | 
| plotGraph | (logical) optional plot vioplot of initial and normalized data (using  | 
| silent | (logical) suppress messages | 
| debug | (logical) additional messages for debugging | 
| callFrom | (character) allow easier tracking of messages produced | 
Details
The standard workflow of Wombat-P writes the results of each analysis-method/quantification-algorithm as .csv files. Meta-data describing the proteins may be available from two sources: a) the 1st column of the Wombat/normalizer output; b) the .fasta file in the directory above the analysis/quantification results of the Wombat-workflow.
Meta-data describing the samples and experimental setup may be available from a sdrf-file (from the directory above the analysis/quantification results).
If available, the meta-data will be examined for determining groups of replicates and
the results thereof can be found in $sampleSetup$levels.
Alternatively, a dataframe formatted like sdrf-files (ie for each sample a separate line, see also function readSdrf) may be given, too.
This import-function has been developed using Wombat-P version 1.x.
The final output is a list containing these elements: $raw, $quant, $annot, $counts, $sampleSetup, $quantNotes, $notes, or (if separateAnnot=FALSE) data.frame
with annotation- and main quantification-content. If sdrf information has been found, an additional list-element setup
will be added containing the entire meta-data as setup$meta and the suggested organization as setup$lev.
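Restricting the determination of normalization factors to specific proteins (as with the argument refLi) can be pictured with a base-R sketch of median normalization on log2 data. The values, the `specType` labels and the centering on the mean of the medians are assumptions for illustration; the package delegates normalization to its own routines, which are not reproduced here.

```r
# Made-up log2 abundances: 5 proteins x 3 samples
log2Ab <- matrix(c(20.1, 21.0, 19.5, 25.2, 24.8,
                   20.6, 21.4, 20.1, 24.9, 24.1,
                   19.8, 20.7, 19.2, 26.0, 25.5),
                 ncol = 3,
                 dimnames = list(paste0("prot", 1:5), c("S1", "S2", "S3")))
specType <- c("mainSpe", "mainSpe", "mainSpe", "species2", "species2")

# use only the lines of the main species to determine normalization factors
refLi <- which(specType == "mainSpe")
refMed <- apply(log2Ab[refLi, , drop = FALSE], 2, median, na.rm = TRUE)

# shift each sample so that the medians of the reference lines coincide;
# all lines (including spike-in proteins) are shifted by the same per-sample offset
normAb <- sweep(log2Ab, 2, refMed - mean(refMed), FUN = "-")
apply(normAb[refLi, , drop = FALSE], 2, median)
```

Spike-in proteins are expected to differ between groups, so leaving them out of the factor estimation avoids normalizing away the very effect under study.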
Value
This function returns a list with  $raw (initial/raw abundance values), $quant with final normalized quantitations, $annot (columns ), $counts an array with 'PSM' and 'NoOfRazorPeptides',
$quantNotes, $notes and optional setup for meta-data from sdrf; or a data.frame with quantitation and annotation if separateAnnot=FALSE
See Also
read.table, normalizeThis, readProteomeDiscovererFile; readProlineFile (and other import-functions), matrixNAinspect
Examples
path1 <- system.file("extdata", package="wrProteo")
# Here we'll load a short/trimmed example file (originating from Compomics)
fiNa <- "tinyWombCompo1.csv.gz"
dataWB <- readWombatNormFile(file=fiNa, path=path1, tit="tiny Wombat/Compomics, Normalized ")
summary(dataWB$quant)
Remove Samples/Columns From list of matrixes
Description
Remove samples (ie columns) from every instance of list of matrixes. Note: This function assumes same order of columns in list-elements 'listElem' !
Usage
removeSampleInList(
  dat,
  remSamp,
  listElem = c("raw", "quant", "counts", "sampleSetup"),
  silent = FALSE,
  debug = FALSE,
  callFrom = NULL
)
Arguments
| dat | (list) main input to be filtered | 
| remSamp | (integer) column number to exclude | 
| listElem | (character) names of list-elements where columns indicated with 'remSamp' should be removed | 
| silent | (logical) suppress messages | 
| debug | (logical) display additional messages for debugging | 
| callFrom | (character) allow easier tracking of message(s) produced | 
Value
This function returns a list of the same structure as the input dat, with the columns (samples) indicated by remSamp removed from the list-elements named in listElem
See Also
Examples
set.seed(2019)
datT6 <- matrix(round(rnorm(300)+3,1), ncol=6, dimnames=list(paste("li",1:50,sep=""),
  letters[19:24]))
datL <- list(raw=datT6, quant=datT6, annot=matrix(nrow=nrow(datT6), ncol=2))
datDelta2 <- removeSampleInList(datL, remSam=2)
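The operation itself amounts to dropping the same column from each of the chosen list-elements. A minimal base-R sketch of this idea, assuming identical column order in all elements and reusing the toy data from above, could look as follows (it is not the package's implementation and omits its checks):

```r
set.seed(2019)
datT6 <- matrix(round(rnorm(300) + 3, 1), ncol = 6,
                dimnames = list(paste0("li", 1:50), letters[19:24]))
datL <- list(raw = datT6, quant = datT6, annot = matrix(nrow = nrow(datT6), ncol = 2))

remSamp <- 2                                   # column number to exclude
listElem <- c("raw", "quant", "counts")        # elements that hold one column per sample

datL2 <- datL
for (el in intersect(listElem, names(datL2))) {
  # drop=FALSE keeps the matrix structure even if a single column remains
  datL2[[el]] <- datL2[[el]][, -remSamp, drop = FALSE]
}
sapply(datL2, ncol)
```

Elements not named in `listElem` (here `annot`) are left untouched, since their columns do not correspond to samples.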
Complement Missing EntryNames In Annotation
Description
This function helps replacing missing EntryNames (in $annot) after reading quantification results. 
To do so the column-names of annCol will be used: 
The content of 2nd element (and optional 3rd element) will be used to replace missing content in column defined by 1st element.
Usage
replMissingProtNames(
  x,
  annCol = c("EntryName", "Accession", "SpecType"),
  silent = FALSE,
  debug = FALSE,
  callFrom = NULL
)
Arguments
| x | (list) output of  | 
| annCol | (character) the column-names from  | 
| silent | (logical) suppress messages | 
| debug | (logical) display additional messages for debugging | 
| callFrom | (character) allow easier tracking of message(s) produced | 
Value
This function returns a list (like the input), but with missing elements of $annot completed (if available in other columns)
See Also
readMaxQuantFile, readProtDiscovFile, readProlineFile
Examples
dat <- list(quant=matrix(sample(11:99,9,replace=TRUE), ncol=3), annot=cbind(EntryName=c(
  "YP010_YEAST","",""),Accession=c("A5Z2X5","P01966","P35900"), SpecType=c("Yeast",NA,NA)))
replMissingProtNames(dat)
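The core of the replacement can be sketched in base R: entries that are NA or empty in the column named by the 1st element of annCol are filled from the column named by the 2nd element. How the optional 3rd element is used is not shown here, and the code is an illustration rather than the package's implementation.

```r
annot <- cbind(EntryName = c("YP010_YEAST", "", ""),
               Accession = c("A5Z2X5", "P01966", "P35900"),
               SpecType  = c("Yeast", NA, NA))
annCol <- c("EntryName", "Accession", "SpecType")

# which entries of the 1st column are missing (NA or empty text) ?
miss <- is.na(annot[, annCol[1]]) | annot[, annCol[1]] == ""

# fill them with the content of the 2nd column
annot[miss, annCol[1]] <- annot[miss, annCol[2]]
annot
```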
Get Short Names of Proteomics Quantitation Software
Description
Get/convert short names of various proteomics quantitation software names for software results handled by this package. A 2-letter abbreviation will be returned.
Usage
shortSoftwName(
  x,
  tryAsLower = TRUE,
  silent = FALSE,
  debug = FALSE,
  callFrom = NULL
)
Arguments
| x | (character) software (full) name | 
| tryAsLower | (logical) include lower-caps writing to search | 
| silent | (logical) suppress messages | 
| debug | (logical) additional messages for debugging | 
| callFrom | (character) allow easier tracking of messages produced | 
Details
So far this function recognizes the following software names: "DIA-NN", "ProteomeDiscoverer", "Compomics", "MaxQuant", "Proline", "TPP", "FragPipe", "MassChroQ", "OpenMS", "Ionbot" and "Sage".
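The principle of such a conversion is a case-insensitive lookup. The base-R sketch below covers only the four abbreviations mentioned elsewhere in this documentation ('MQ', 'PD', 'PL', 'FP'); the pairing of each abbreviation with a software name is inferred from the names, and the helper `shortName()` is hypothetical and not part of the package.

```r
# Lookup table, keys in lower case (pairing inferred, see note above)
abbrev <- c(maxquant = "MQ", proteomediscoverer = "PD", proline = "PL", fragpipe = "FP")

# hypothetical helper: returns NA for names that are not in the table
shortName <- function(x) unname(abbrev[tolower(x)])

shortName(c("MaxQuant", "Proline", "unknownTool"))
```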
Value
This function returns a vector with 2-letter abbreviation for the software
See Also
Examples
shortSoftwName(c("maxquant","DIANN"))
Summarize statistical test result for plotting ROC-curves
Description
This function takes statistical testing results (obtained using testRobustToNAimputation or moderTest2grp,
based on limma) and calculates specificity and sensitivity values for plotting ROC-curves along a panel of thresholds.
Based on annotation (from test$annot) with the user-defined column for species (argument 'spec') the counts of TP (true positives), FP (false positives), FN (false negatives) and TN (true negatives) are determined.
In addition, an optional plot may be produced.
Usage
summarizeForROC(
  test,
  useComp = 1,
  tyThr = "BH",
  thr = NULL,
  columnTest = NULL,
  FCthrs = NULL,
  spec = c("H", "E", "S"),
  annotCol = "Species",
  filterMat = "filter",
  batchMode = FALSE,
  tit = NULL,
  color = 1,
  plotROC = TRUE,
  pch = 1,
  bg = NULL,
  overlPlot = FALSE,
  silent = FALSE,
  debug = FALSE,
  callFrom = NULL
)
Arguments
| test | (list or class  | 
| useComp | (character or integer) in case of multiple comparisons (ie multiple columns 'test$tyThr'); which pairwise comparison to use | 
| tyThr | (character,length=1) type of statistical test-result to be used for sensitivity and specificity calculations (eg 'BH','lfdr' or 'p.value'), must be list-element of 'test' | 
| thr | (numeric) stat test (FDR/p-value) threshold, if  | 
| columnTest | deprecated, please use 'useComp' instead | 
| FCthrs | (numeric) Fold-Change threshold (displayed as line), given as Fold-change and NOT as log2(FC), default at 1.5, set to  | 
| spec | (character) labels for those species which should be matched to column  | 
| annotCol | (character, length=1) column name of  | 
| filterMat | (character) name (or index) of element of  | 
| batchMode | (logical) if  | 
| tit | (character) optional custom title in graph | 
| color | (character or integer) color in graph | 
| plotROC | (logical) toggle plot on or off | 
| pch | (integer) type of symbol to be used (see  | 
| bg | (character) background in plot (see  | 
| overlPlot | (logical) overlay to existing plot if  | 
| silent | (logical) suppress messages | 
| debug | (logical) additional messages for debugging | 
| callFrom | (character) allows easier tracking of messages produced | 
Details
Determining TP and FP counts requires 'ground truth' experiments, where it is known in advance which proteins are expected to change abundance between two groups of samples. Typically this is done by mixing proteins of different species origin; the first species noted by argument 'spec' designates the species to be considered constant (expected as TN in statistical tests). Then, one or multiple additional spike-in species can be defined. As the spike-in concentration should have been altered between different groups of samples, they are expected as TP.
The main aim of this function consists in providing specificity and sensitivity values, plus counts of TP (true positives), FP (false positives), FN (false negatives) and TN (true negatives), along various thresholds (specified in column 'alph') for statistical tests performed prior to calling this function.
Note that the choice of species-annotation plays a crucial role in how the counting results are obtained. In case of multiple spike-in species the user should pay attention to whether they all are expected to change abundance at the same ratio. If not, it is advised to run this function multiple times separately, each time only with the subset of those species expected to change at the same ratio.
The dot on the plotted curve shows the results at the level of the single threshold alpha=0.05.
For plotting multiple ROC curves as overlay and additional graphical parameters/options you may use plotROC.
See also ROC on Wikipedia for explanations of TP, FP, FN and TN as well as examples. Note that numerous other packages also provide support for building and plotting ROC-curves: eg rocPkgShort, ROCR, pROC or ROCit
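The counting of TP, FP, FN and TN along a panel of thresholds can be sketched in base R. The simulated FDR values, the species labels ('H' as constant species, 'E' as spike-in) and the panel `alph` are assumptions made for this example; the sketch covers only the counting and is not the package's implementation.

```r
set.seed(2019)
species <- c(rep("H", 150), rep("E", 50))      # 'H' constant, 'E' spiked-in (ground truth)
fdr <- c(runif(150), runif(50, 0, 0.1))        # simulated FDR values per protein
alph <- c(0.001, 0.01, 0.05, 0.1, 0.5)         # panel of thresholds

isPos <- species != "H"                        # proteins expected to change
roc <- t(sapply(alph, function(a) {
  called <- fdr <= a                           # retained as significant at threshold a
  TP <- sum(called & isPos);  FP <- sum(called & !isPos)
  FN <- sum(!called & isPos); TN <- sum(!called & !isPos)
  c(alph = a, spec = TN / (TN + FP), sens = TP / (TP + FN),
    TP = TP, FP = FP, FN = FN, TN = TN)
}))
roc
```

Plotting `1 - roc[, "spec"]` against `roc[, "sens"]` gives the points of the ROC curve; sensitivity can only stay equal or increase as the threshold is relaxed.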
Value
This function returns a numeric matrix containing the columns 'alph', 'spec', 'sens', 'prec', 'accur', 'FD' plus two columns with absolute numbers of lines (genes/proteins) passing the current threshold level alpha (1st species, all other species)
See Also
replot the figure using plotROC, calculate AUC using AucROC, robust test for preparing tables testRobustToNAimputation, moderTest2grp, test2grp, eBayes in package limma, t.test
Examples
set.seed(2019); test1 <- list(annot=cbind(Species=c(rep("b",35), letters[sample.int(n=3,
  size=150, replace=TRUE)])), BH=matrix(c(runif(35,0,0.01), runif(150)), ncol=1))
tail(roc1 <- summarizeForROC(test1, spec=c("a","b","c"), annotCol="Species"))
t-test each line of 2 groups of data
Description
test2grp performs t-tests on two groups of data using limma;
this is a custom implementation of moderTest2grp for proteomics.
The final object also includes the results without moderation by limma (eg BH-FDR in $nonMod.BH). 
Furthermore, there is an option to make use of package ROTS (note, this will increase the computation time considerably).
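To illustrate what testing 'without moderation' means, here is a minimal base-R sketch of a classical t-test per line followed by BH correction. It is an assumption-laden illustration (toy data, Welch t-test via t.test), not the code run by test2grp, and it does not reproduce the moderated statistics from limma.

```r
# Hypothetical toy data: 100 lines (proteins), 2 groups of 3 replicates
set.seed(2018)
dat <- matrix(rnorm(600, mean = 3), ncol = 6,
              dimnames = list(paste0("li", 1:100),
                              paste0(rep(c("A", "B"), each = 3), 1:3)))
dat[3:6, 1:3] <- dat[3:6, 1:3] + 3          # lines 3:6 increased in group A
grp <- gl(2, 3, labels = c("A", "B"))

# one classical (non-moderated) Welch t-test per line
pVal <- apply(dat, 1, function(x) t.test(x[grp == "A"], x[grp == "B"])$p.value)

# Benjamini-Hochberg correction across all lines
BH <- p.adjust(pVal, method = "BH")
head(sort(BH))
```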
Usage
test2grp(
  dat,
  questNo,
  useCol = NULL,
  grp = NULL,
  annot = NULL,
  ROTSn = 0,
  silent = FALSE,
  debug = FALSE,
  callFrom = NULL
)
Arguments
| dat | (matrix or data.frame) main data (may contain NAs) | 
| questNo | (integer) specify here which question, ie comparison, should be addressed | 
| useCol | (integer or character) | 
| grp | (character or factor) | 
| annot | (matrix or data.frame) | 
| ROTSn | (integer) number of iterations ROTS runs (stabilization of results may be seen with >300) | 
| silent | (logical) suppress messages | 
| debug | (logical) display additional messages for debugging | 
| callFrom | (character) allow easier tracking of message(s) produced | 
Value
This function returns a limma-type S3 object of class 'MArrayLM' (which can be accessed like a list); multiple testing correction types or modified testing by ROTS may get included ('p.value','FDR','BY','lfdr' or 'ROTS.BH')
See Also
moderTest2grp, pVal2lfdr, t.test, ROTS from the Bioconductor package ROTS
Examples
set.seed(2018);  datT8 <- matrix(round(rnorm(800)+3,1), nc=8, dimnames=list(paste(
  "li",1:100,sep=""), paste(rep(LETTERS[1:3],c(3,3,2)),letters[18:25],sep="")))
datT8[3:6,1:2] <- datT8[3:6,1:2] +3   # augment lines 3:6 (c-f) 
datT8[5:8,5:6] <- datT8[5:8,5:6] +3   # augment lines 5:8 (e-h) 
grp8 <- gl(3,3,labels=LETTERS[1:3],length=8)
datL <- list(data=datT8, filt= wrMisc::presenceFilt(datT8,grp=grp8,maxGrpM=1,ratMa=0.8))
testAvB0 <- wrMisc::moderTest2grp(datT8[,1:6], gl(2,3))
testAvB <- test2grp(datL, questNo=1)
Pair-wise testing robust to NA-imputation
Description
This function replaces NA values based on group neighbours (based on grouping of columns in argument gr), following the overall assumption of a close to Gaussian distribution.
Furthermore, it is assumed that NA-values originate from experimental settings where measurements at or below detection limit are recorded as NA.
In such cases (eg in proteomics) it is current practice to replace NA-values by very low (random) values in order to be able to perform t-tests.
However, random normal values used for replacing may in rare cases deviate from the average (the 'assumed' value); in particular, if multiple NA replacements are above the average, 
they may look like induced biological data and be misinterpreted as such.      
The statistical testing uses eBayes from Bioconductor package limma for robust testing in the context of small numbers of replicates. 
By repeating the process of replacing NA-values and subsequent testing multiple times, the results can be summarized afterwards by the median over all repeated runs to remove the stochastic effect of individual NA-imputation.
Thus, one may gain stability towards the random character of NA imputations by repeating imputation & test 'nLoop' times and summarizing p-values by median (results stabilize at 50-100 rounds).
It is necessary to define all groups of replicates in gr to obtain all possible pair-wise tests (multiple columns in $BH, $lfdr etc). 
The modified testing procedure of Bioconductor package ROTS may optionally be included, if desired.
This function returns a limma-like S3 list-object further enriched by additional fields/elements.
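The idea of repeating imputation and testing, then summarizing by median, can be sketched in base R as follows. This is a simplified illustration under stated assumptions: it uses a plain t.test instead of eBayes from limma, draws replacements from a fixed normal distribution instead of the NA-neighbour approach, and the names lowMean, lowSd and oneRound are hypothetical.

```r
# Hypothetical toy data: 20 proteins x 6 samples (2 groups of 3 replicates)
set.seed(2015)
dat <- matrix(rnorm(120, mean = 5), ncol = 6)
dat[1:3, 1:3] <- dat[1:3, 1:3] + 4          # lines 1:3 increased in group A
dat[4, 4:6] <- NA                           # protein absent in group B
dat[5, c(1, 5)] <- NA                       # sporadic NAs
grp <- gl(2, 3, labels = c("A", "B"))
isNA <- is.na(dat)

nLoop <- 50
lowMean <- 2; lowSd <- 0.5                  # assumed distribution of replacement values

# one round: fresh random imputation, then one t-test per protein
oneRound <- function() {
  imp <- dat
  imp[isNA] <- rnorm(sum(isNA), mean = lowMean, sd = lowSd)
  apply(imp, 1, function(x) t.test(x[grp == "A"], x[grp == "B"])$p.value)
}

pMat <- replicate(nLoop, oneRound())        # matrix: proteins x nLoop
pMed <- apply(pMat, 1, median)              # median p-value per protein
round(pMed[1:6], 4)
```

Proteins without NA give the same p-value in every round, so only lines with imputed values are affected by the summarization.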
Usage
testRobustToNAimputation(
  dat,
  gr = NULL,
  annot = NULL,
  retnNA = TRUE,
  avSd = c(0.15, 0.5),
  avSdH = NULL,
  plotHist = FALSE,
  xLab = NULL,
  tit = NULL,
  imputMethod = "mode2",
  seedNo = NULL,
  multCorMeth = NULL,
  nLoop = 100,
  lfdrInclude = NULL,
  ROTSn = NULL,
  silent = FALSE,
  debug = FALSE,
  callFrom = NULL
)
Arguments
| dat | (matrix or data.frame) main data (may contain  | 
| gr | (character or factor) replicate association; if  | 
| annot | (matrix or data.frame) annotation (lines must match lines of data !), if  | 
| retnNA | (logical) retain and report number of  | 
| avSd | (numerical,length=2) population characteristics (mean and sd) for >1  | 
| avSdH | deprecated, please use  | 
| plotHist | (logical) additional histogram of original, imputed and resultant distribution (made using  | 
| xLab | (character) custom x-axis label | 
| tit | (character) custom title | 
| imputMethod | (character) choose the imputation method (may be 'mode2'(default), 'mode1', 'datQuant', 'modeAdopt', 'informed' or 'none', for details see  | 
| seedNo | (integer) seed-value for normal random values | 
| multCorMeth | (character) define which method(s) for correction of multiple testing should be run (choices: 'BH','lfdr','BY','tValTab'; choosing several is possible) | 
| nLoop | (integer) number of runs of independent  | 
| lfdrInclude | (logical) deprecated, please use  | 
| ROTSn | (integer) deprecated, please use  | 
| silent | (logical) suppress messages | 
| debug | (logical) additional messages for debugging | 
| callFrom | (character) allows easier tracking of messages produced | 
Details
The argument multCorMeth allows choosing which multiple testing correction algorithms will be used and included in the final results.
Possible options are 'lfdr','BH','BY','tValTab', ROTSn='100' (name to element necessary) or 'noLimma' (to add initial p.values and BH to limma-results). By default 'lfdr' (local false discovery rate from package 'fdrtool') and 'BH' (Benjamini-Hochberg FDR) are chosen.
The option 'BY' refers to Benjamini-Yekutieli FDR, 'tValTab' allows exporting all individual t-values from the repeated NA-substitution and subsequent testing.
This function is compatible with automatic extraction of experimental setup based on sdrf or other quantitation-specific sample annotation.
In this case, the results of automated importing and mining of sample annotation should be stored as $sampleSetup$groups or $sampleSetup$lev.
For details on the choice of NA-imputation procedures with arguments 'imputMethod' and 'avSd' please see matrixNAneighbourImpute.
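For orientation, the 'BH' and 'BY' corrections named above are also available in base R through p.adjust. The following minimal sketch, using hypothetical raw p-values, shows how the two differ in stringency; it does not cover 'lfdr', which requires package fdrtool.

```r
# Hypothetical raw p-values: 10 strong effects among 100 tests
set.seed(2015)
pVal <- c(runif(10, 0, 0.002), runif(90))

adj <- cbind(p.value = pVal,
             BH = p.adjust(pVal, method = "BH"),   # Benjamini-Hochberg FDR
             BY = p.adjust(pVal, method = "BY"))   # Benjamini-Yekutieli FDR

# number of tests passing 0.05 without correction, with BH and with BY
colSums(adj < 0.05)
```

BY is never less conservative than BH, so fewer (or equally many) lines pass a given threshold.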
Value
This function returns a limma-type S3 object of class 'MArrayLM' (which can be accessed like a list); multiple results of testing or multiple testing correction types may get included ('p.value','FDR','BY','lfdr' or 'ROTS.BH')
See Also
NA-imputation via matrixNAneighbourImpute, moderated t-test without NA-imputation moderTest2grp, calculating lfdr pVal2lfdr, eBayes in Bioconductor package limma, t.test, ROTS of Bioconductor package ROTS
Examples
set.seed(2015); rand1 <- round(runif(600) +rnorm(600,1,2),3)
dat1 <- matrix(rand1,ncol=6) + matrix(rep((1:100)/20,6),ncol=6)
dat1[13:16,1:3] <- dat1[13:16,1:3] +2      # augment lines 13:16 
dat1[19:20,1:3] <- dat1[19:20,1:3] +3      # augment lines 19:20
dat1[15:18,4:6] <- dat1[15:18,4:6] +1.4    # augment lines 15:18 
dat1[dat1 <1] <- NA                        # mimic some NAs for low abundance
## normalize data
boxplot(dat1, main="data before normalization")
dat1 <- wrMisc::normalizeThis(as.matrix(dat1), meth="median")
## designate replicate relationships in samples ...  
grp1 <- gl(2, 3, labels=LETTERS[1:2])                   
## moderated t-test with repeated imputations (may take >10 sec,  >60 sec if ROTSn >0 !) 
PLtestR1 <- testRobustToNAimputation(dat=dat1, gr=grp1, retnNA=TRUE, nLoop=70)
names(PLtestR1)
Write sequences in fasta format to file
Description
Write sequences in fasta format to file
This function writes sequences from a character vector to a fasta-formatted file (as used by UniProt). 
Line-headers are based on names of elements of input vector prot.
This function also allows comparing the main vector of sequences with a reference vector ref to check if any of the sequences therein are truncated.
Usage
writeFasta2(
  prot,
  fileNa = NULL,
  ref = NULL,
  lineLength = 60,
  eol = "\n",
  truSuf = "_tru",
  silent = FALSE,
  debug = FALSE,
  callFrom = NULL
)
Arguments
| prot | (character) vector of sequences, names will be used for fasta-header | 
| fileNa | (character) name (and path) for file to be written | 
| ref | (character) optional/additional set of (reference-) sequences (only for comparison to  | 
| lineLength | (integer, length=1) number of sequence characters per line (default 60, should be >1 and <10000) | 
| eol | (character) the character(s) to print at the end of each line (row); for example, eol = "\r\n" will produce Windows' line endings on a Unix-alike OS | 
| truSuf | (character) suffix to be added for sequences found truncated when comparing with  | 
| silent | (logical) suppress messages | 
| debug | (logical) supplemental messages for debugging | 
| callFrom | (character) allows easier tracking of messages produced | 
Details
Sequences without any names will be given generic headers like protein01 ... etc.
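The line wrapping and the generic headers can be sketched in base R as follows. This is an illustration under stated assumptions, not the code of writeFasta2: the helper wrapSeq, the output file name and the exact numbering scheme of generic headers are hypothetical.

```r
prots <- c(SEQU1 = "ABCDEFGHIJKL", "CDEFGHIJKLMNOP")   # 2nd sequence has no name
lineLength <- 6

# generic headers for unnamed sequences (numbering scheme assumed for illustration)
noName <- names(prots) == ""
names(prots)[noName] <- sprintf("protein%02d", seq_len(sum(noName)))

# wrapSeq() is an illustrative helper, not part of wrProteo:
# cut one sequence into pieces of at most n characters
wrapSeq <- function(x, n) {
  from <- seq(1, nchar(x), by = n)
  substring(x, from, pmin(from + n - 1, nchar(x)))
}

# header line (starting with '>') followed by the wrapped sequence lines
fasta <- unlist(lapply(names(prots), function(na)
  c(paste0(">", na), wrapSeq(prots[[na]], lineLength))))

tmpFi <- file.path(tempdir(), "sketchWrite.fasta")
writeLines(fasta, tmpFi)
readLines(tmpFi)
```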
Value
This function writes the sequences from prot as a fasta-formatted file.
See Also
readFasta2 for reading fasta, write.fasta from the package seqinr
Examples
prots <- c(SEQU1="ABCDEFGHIJKL", SEQU2="CDEFGHIJKLMNOP")
writeFasta2(prots, fileNa=file.path(tempdir(),"testWrite.fasta"), lineLength=6)