General:
As of alakazam 1.3.0, alakazam::makeChangeoClone requires the parameter locus, with default value locus. This function is used in some examples and tests in shazam. We added a locus column to the package’s example data.
Distance Profiling:
Added to distToNearest the parameter locusValues=c("IGH") to specify the loci values to focus the analysis on.
Fixed a bug in distToNearest where grouping by fields was applied after grouping by genes, so the different subsets of data were not treated independently when identifying groups of genes. In practice, this means that if fields was set to treat samples independently (fields='sample_id'), single linkage was applied to all data, and two genes could be placed in the same group of genes if they were connected by an ambiguous gene call in any of the samples. Now, data is separated by fields (sample_id in this example) before creating the groups of genes, and ambiguities in other samples are not considered.
Mutation Profiling:
Bug fix in parallelization set up for functions slideWindowTune and slideWindowDb.
plotSlideWindowTune (slideWindowTunePlot): updated the possible values of the parameter plotFiltered, for easier usage. The new values (and their equivalent values in slideWindowTunePlot) are filtered (TRUE), remaining (FALSE), and per_mutation (NULL).
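When migrating old calls, the correspondence between the old and new plotFiltered values can be captured in a small lookup. The Python sketch below is illustrative only and is not part of shazam: the helper name translate_plot_filtered is hypothetical, and only the three value pairs come from the entry above.

```python
# Hypothetical helper (not part of shazam): map the legacy
# slideWindowTunePlot values of plotFiltered to the new string values.
LEGACY_TO_NEW = {True: "filtered", False: "remaining", None: "per_mutation"}


def translate_plot_filtered(value):
    """Return the new-style plotFiltered value for a legacy or new-style input."""
    if isinstance(value, str):
        # New-style values pass through unchanged once validated.
        if value not in LEGACY_TO_NEW.values():
            raise ValueError(f"unknown plotFiltered value: {value!r}")
        return value
    if value is None or isinstance(value, bool):
        return LEGACY_TO_NEW[value]
    raise ValueError(f"unknown plotFiltered value: {value!r}")


if __name__ == "__main__":
    for legacy in (True, False, None):
        print(legacy, "->", translate_plot_filtered(legacy))
```

The explicit bool check keeps integers such as 1 from being silently treated as TRUE.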
Deprecated: slideWindowTunePlot in favor of plotSlideWindowTune, for naming consistency.
General:
New feature: convertNumbering to convert between numbering systems (IMGT, Kabat).
Mutation Profiling:
shmulateTree has new argument nproc to specify the number of cores. Default values mutThresh and windowSize have been set to mutThresh=6 and windowSize=10.
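To make the two defaults concrete, here is a minimal Python sketch of a sliding-window check of the kind these parameters control: a sequence is flagged when any window of windowSize consecutive positions contains at least mutThresh mismatches against the germline. It is not shazam's R implementation. The function name is hypothetical, and counting every differing character as a mutation (gaps and N included) is a simplifying assumption.

```python
def exceeds_window_threshold(input_seq, germline_seq, mut_thresh=6, window_size=10):
    """Return True if any run of `window_size` consecutive positions
    holds at least `mut_thresh` mismatches against the germline."""
    if len(input_seq) != len(germline_seq):
        raise ValueError("sequences must be aligned to the same length")

    # 1 where the sequences differ, 0 where they agree.
    # Assumption: every differing character counts as a mutation.
    mismatches = [int(a != b) for a, b in zip(input_seq, germline_seq)]

    n = len(mismatches)
    width = min(window_size, n)  # short sequences form a single window

    # Count the first window, then slide one position at a time,
    # adding the entering position and dropping the leaving one.
    count = sum(mismatches[:width])
    if count >= mut_thresh:
        return True
    for end in range(width, n):
        count += mismatches[end] - mismatches[end - width]
        if count >= mut_thresh:
            return True
    return False


if __name__ == "__main__":
    germline = "A" * 30
    clustered = "A" * 10 + "C" * 6 + "A" * 14
    print(exceeds_window_threshold(clustered, germline))
```

With the defaults, six mutations packed into ten nucleotides trip the filter, while the same number spread evenly along the sequence does not.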
Added the option plotFiltered=NULL to slideWindowTunePlot.
Fixed a bug in listObservedMutations not returning a list when db had one sequence with one mutation.
Fixed shifted bars in plotMutability.
General:
Selection Analysis:
observedMutations, expectedMutations, and calcBaseline can analyze mutations in all regions (CDR1, CDR2, CDR3, FWR1, FWR2, FWR3 and FWR4) by specifying regionDefinition=IMGT_VDJ or regionDefinition=IMGT_VDJ_BY_REGIONS.
Added setRegionBoundaries to build sequence-specific RegionDefinition objects extending to CDR3 and FWR4.
Added makeGraphDf to facilitate mutational analysis on lineage trees.
Distance Profiling:
Fixed a bug in distToNearest where TRB and TRD sequences were ignored in distance calculation.
Fixed a bug in distToNearest causing a fatal error when cross was set.
Fixed a bug in nearestDist causing a fatal error when using model="aa" and crossGroups.
Targeting Models:
plotMutability.
Mutation Profiling:
Fixed a bug in observedMutations and calcObservedMutations causing mutation counting to fail when there are gap (-) characters in the germline sequence.
Targeting Models:
Fixed a bug in createTargetingModel causing empty counts in the numMutS and numMutR slots.
Distance Profiling:
distToNearest.
Renamed the groupUsingOnlyIGH argument of distToNearest to onlyHeavy.
Backwards Incompatible Changes:
Functions that previously used V_CALL (Change-O) as the default to identify the field that stored the V gene calls now use v_call (AIRR). This means that scripts that relied on default values (previously, v_call="V_CALL") will now fail if calls to the functions are not updated to reflect the correct value for the data. If data are in the Change-O format, the current default value v_call="v_call" will fail to identify the column with the V gene calls, as the column v_call doesn’t exist. In this case, v_call="V_CALL" needs to be specified in the function call.
ExampleDb converted to the AIRR Rearrangement standard and examples updated accordingly.
The labels slot of IMGT_V has changed from CDR_R, CDR_S, FWR_R and FWR_S to cdr_r, cdr_s, fwr_r and fwr_s, respectively.
CODON_TABLE and the different MUTATION_SCHEMES change from R, S and Stop to r, s and stop, respectively.
MU_COUNT_SEQ to mu_count_seq.
calcBaseline and related function output columns and S4 object slots. For example, from PVALUE, REGION and BASELINE_CI_PVALUE to pvalue, region and baseline_ci_pvalue, respectively.
createSubstitutionMatrix, createMutabilityMatrix and createTargetingModel, changed from model=c("S","RS") to model=c("s","rs").
General:
Targeting Models:
createMutabilityMatrix, extendMutabilityMatrix, createTargetingMatrix, and createTargetingModel now also return the numbers of silent and replacement mutations used for estimating the 5-mer mutabilities. These numbers are recorded in the numMutS and numMutR slots in the newly defined MutabilityModel, MutabilityModelWithSource, and TargetingMatrix classes.
Mutation Profiling:
shmulateSeq now also supports specifying the frequency of mutations to be introduced. (Previously, only the number of mutations was supported.)
General:
General:
Distance Calculation:
Fixed a bug in distToNearest that could potentially cause sequences from different partitions to be used for distance calculation.
General:
Distance Calculation:
plotDensityThreshold for negative densities.
distToNearest for performing subsampling while calculating cross-group nearest neighbor distances.
distToNearest now supports, via a new argument VJthenLen, either a 2-stage partitioning (first by V gene and J gene, then by junction length), or a 1-stage partitioning (simultaneously by V gene, J gene, and junction length). For 1-stage partitioning, distToNearest supports export of the partitioning information as a new column via keepVJLgroup.
distToNearest now supports single-cell input data with the addition of new arguments cellIdColumn, locusColumn, and groupUsingOnlyIGH.
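The difference between the 2-stage and 1-stage partitioning selected by VJthenLen only shows up when gene calls are ambiguous. The Python sketch below illustrates this under stated assumptions and is not shazam's implementation: the linking rule (two records link when they share at least one V call and at least one J call) is assumed for illustration, and the function and field names are hypothetical.

```python
from collections import defaultdict


def single_linkage(records, same_length):
    """Group record indices by single linkage.

    Assumed rule: two records link if they share a V call and a J call,
    and, when `same_length` is True, also have equal junction length."""
    parent = list(range(len(records)))

    def find(i):
        # Union-find lookup with path compression.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            a, b = records[i], records[j]
            linked = bool(a["v"] & b["v"]) and bool(a["j"] & b["j"])
            if same_length:
                linked = linked and a["length"] == b["length"]
            if linked:
                parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i in range(len(records)):
        groups[find(i)].append(i)
    return sorted(groups.values())


def partition(records, vj_then_len=True):
    """2-stage (V/J, then length) or 1-stage (V, J and length together)."""
    if not vj_then_len:
        return single_linkage(records, same_length=True)
    result = []
    for group in single_linkage(records, same_length=False):
        by_length = defaultdict(list)
        for i in group:
            by_length[records[i]["length"]].append(i)
        result.extend(by_length.values())
    return sorted(result)


if __name__ == "__main__":
    # Record 1 carries an ambiguous V call that bridges records 0 and 2.
    records = [
        {"v": {"V1"}, "j": {"J1"}, "length": 30},
        {"v": {"V1", "V2"}, "j": {"J1"}, "length": 33},
        {"v": {"V2"}, "j": {"J1"}, "length": 30},
    ]
    print(partition(records, vj_then_len=True))
    print(partition(records, vj_then_len=False))
```

In this example the ambiguous call joins records 0 and 2 under 2-stage partitioning, because length is only applied after the V/J groups are formed. Under 1-stage partitioning the bridging record has a different length, so no link is made.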
Mutation Profiling:
shmulateTree has new arguments, start and end, to specify the region in the sequence where mutations can be introduced.
Selection Analysis:
Added consensusSequence, which can be used to build a consensus sequence using a variety of methods.
General:
TargetingModel and RegionDefinition S4 classes.
General:
subsample argument to distToNearest function.
alakazam. Specifically, progressBar, getBaseTheme and checkColumns.
clearConsole, getnproc, and getPlatform functions.
Distance Calculation:
findThreshold method to density.
density method by retuning the bandwidth detection process. The density method should now also yield more consistent thresholds, on average.
The subsample argument to findThreshold now applies to both the density and gmm methods. Subsampling of distance is not performed by default.
Fixed a bug in plotDensityThreshold and plotGmmThreshold wherein the breaks argument was ignored when specifying xmax and/or xmin.
Selection Analysis:
Fixed a bug in plotBaselineDensity arising when the groupColumn and idColumn arguments were set to the same column.
Added the sizeElement argument to plotBaselineDensity to control line size.
Renamed the field_name argument to field in editBaseline.
Selection Analysis:
Fixed a bug in plotBaselineDensity which caused an empty plot to be generated if there was only a single value in the idColumn.
Fixed a bug in calcBaseline which caused a crash in summarizeBaseline and groupBaseline when input baseline is based on only 1 sequence (i.e. when nrow(baseline@db) is 1).
plot call on a Baseline object to plotBaselineDensity.
getBaselineStats function.
Added a summary method for Baseline objects that calls summarizeBaseline and returns a data.frame.
Mutation Profiling:
Fixed a bug in shmulateSeq which caused a crash when the input sequence contains gaps (.).
Renamed the argument mutations in shmulateSeq to numMutations.
shmulateSeq and shmulateTree.
calcExpectedMutations will now treat non-ACTG characters as Ns rather than produce an error.
Added RegionDefinition objects for the full V segment as a single region (IMGT_V_BY_SEGMENTS) and the V segment with each codon as a separate region (IMGT_V_BY_CODONS).
Targeting Models:
Added the calculateMutability function, which computes the aggregate mutability for sequences.
Fixed a bug that caused createSubstitutionMatrix to fail for data containing only a single V family.
(model="S") in createSubstitutionMatrix, createSubstitutionMatrix and createTargetingModel.
plot call on a TargetingModel object to plotMutability.
General:
Distance Calculation:
"gmm" method of findThreshold() that allows users to choose a mixture of two univariate density distribution functions among four available combinations: "norm-norm", "norm-gamma", "gamma-norm", or "gamma-gamma".
"gmm" method of findThreshold() from the best average sensitivity and specificity, the curve intersection or user defined sensitivity or specificity.
Renamed the cutEdge argument of findThreshold() to edge.
Mutation Profiling:
collapseClones(), adding various deterministic and stochastic methods to obtain effective clonal sequences, support for including ambiguous IUPAC characters in output, as well as extensive documentation. Removed calcClonalConsensus() from exported functions.
observedMutations() and calcObservedMutations().
calcObservedMutations() for sequences with non-triplet overhang at the tail.
Renamed the columns for observed mutations (previously OBSERVED) and expected mutations (previously EXPECTED) returned by observedMutations() and expectedMutations() to MU_COUNT and MU_EXPECTED, respectively.
Selection Analysis:
calcBaseline() no longer calls collapseClones() automatically if a CLONE column is present. As indicated by the documentation for calcBaseline(), users are advised to obtain effective clonal sequences (for example, by calling collapseClones()) before running calcBaseline().
calcBaseline().
Mutation Profiling:
Fixed a bug in collapseClones() that prevented it from running when nproc is greater than 1.
General:
Mutation Profiling:
Fixed a bug in collapseClones() that resulted in erroneous CLONAL_SEQUENCE and CLONAL_GERMLINE being returned.
observedMutations was running.
General:
Selection Analysis:
summarizeBaseline(). The returned p-value can now be either positive or negative. Its magnitude (without the sign) should be interpreted as per normal. Its sign indicates the direction of the selection detected. A positive p-value indicates positive selection, whereas a negative p-value indicates negative selection.
Added editBaseline() to exported functions, and a corresponding section in the vignette.
calcBaseline().
Targeting Models:
Added the numMutationsOnly argument to createSubstitutionMatrix(), enabling parameter tuning for minNumMutations.
Added functions minNumMutationsTune() and minNumSeqMutationsTune() to tune for parameters minNumMutations and minNumSeqMutations in functions createSubstitutionMatrix() and createMutabilityMatrix(), respectively. Also added function plotTune(), which helps visualize parameter tuning using the abovementioned two new functions.
(HKL_S5F).
Renamed HS5FModel as HH_S5F, MRS5NFModel as MK_RS5NF, and U5NModel as U5N.
(HH_S1F), human kappa and lambda light chain, silent, 1-mer, functional substitution model (HKL_S1F), and mouse kappa light chain, replacement and silent, 1-mer, non-functional substitution model (MK_RS1NF).
Added makeDegenerate5merSub and makeDegenerate5merMut, which make degenerate 5-mer substitution and mutability models, respectively, based on the 1-mer models. Also added makeAverage1merSub and makeAverage1merMut, which make 1-mer substitution and mutability models, respectively, by averaging over the 5-mer models.
Mutation Profiling:
Added the returnRaw argument to calcObservedMutations(), which if true returns the positions of point mutations and their corresponding mutation types, as opposed to counts of mutations (hence “raw”).
Added slideWindowSeq() and slideWindowDb(), which implement a sliding window approach towards filtering a single sequence or sequences in a data.frame which contain(s) equal to or more than a given number of mutations in a given number of consecutive nucleotides.
Added slideWindowTune(), which allows for parameter tuning for using slideWindowSeq() and slideWindowDb().
Added slideWindowTunePlot(), which visualizes parameter tuning by slideWindowTune().
Distance Calculation:
Fixed a bug in distToNearest wherein normalize="length" for 5-mer models was resulting in distances normalized by junction length squared instead of raw junction length.
Fixed a bug in distToNearest wherein symmetry="min" was calculating the minimum of the total distance between two sequences instead of the minimum distance at each mutated position.
Added the findThreshold function to infer clonal distance threshold from nearest neighbor distances returned by distToNearest.
Renamed the length option for the normalize argument of distToNearest to len so it matches Change-O.
Deprecated the HS1FDistance and M1NDistance distance models, which have been renamed to hs1f_compat and m1n_compat in the model argument of distToNearest. These deprecated models should be used for compatibility with DefineClones in Change-O v0.3.3. These models have been replaced by hh_s1f and mk_rs1nf, which are supported by Change-O v0.3.4.
Renamed the hs5f model in distToNearest to hh_s5f.
MK_RS5NF models to distToNearest.
Added calcTargetingDistance() to enable calculation of a symmetric distance matrix given a 1-mer substitution matrix normalized by row, such as HH_S1F.
findThreshold. The previous smoothed density method is available via the method="density" argument and the new GMM method is available via method="gmm".
Added plotGmmThreshold and plotDensityThreshold to plot the threshold detection results from findThreshold for the "gmm" and "density" methods, respectively.
Region Definition:
IMGT_V_NO_CDR3 and IMGT_V_BY_REGIONS_NO_CDR3. Updated IMGT_V and IMGT_V_BY_REGIONS so that neither includes CDR3 now.
Selection Analysis:
Targeting Models:
Added the numSeqMutationsOnly argument to createMutabilityMatrix(), enabling parameter tuning for minNumSeqMutations.
General:
InfluenzaDb data object, in favor of the updated ExampleDb provided in alakazam 0.2.4.
Distance Calculation:
Added the cross argument to distToNearest(), which allows restriction of distances to only distances across samples (i.e., excludes within-sample distances).
Added the mst flag to distToNearest(), which will return all distances to neighboring nodes in a minimum spanning tree.
aa model of distToNearest().
aa model of distToNearest().
Mutation Profiling:
MutationDefinition VOLUME_MUTATIONS.
Added shmulateSeq() and shmulateTree() to simulate mutations on sequences and lineage trees, respectively, using a 5-mer targeting model.
Renamed collapseByClone, calcDbExpectedMutations and calcDbObservedMutations to collapseClones, expectedMutations, and observedMutations, respectively.
Selection Analysis:
Fixed a bug wherein passing a Baseline object through groupBaseline() multiple times resulted in incorrect normalization.
Added title options to plotBaselineSummary() and plotBaselineDensity().
plotBaselineSummary() and plotBaselineDensity().
Added the testBaseline() function to test the significance of differences between two selection distributions.
General:
InfluenzaDb.
dplyr::tbl_df object instead of a data.frame.
Distance Calculation:
Fixed a bug wherein distToNearest() did not return the nearest neighbor with a non-zero distance.
Targeting Models:
createSubstitutionMatrix(), createMutabilityMatrix(), and plotMutability().
plotMutability().
Mutation Profiling:
Added the MutationDefinition objects MUTATIONS_CHARGE, MUTATIONS_HYDROPATHY, MUTATIONS_POLARITY providing alternate approaches to defining replacement and silent annotations to mutations when calling calcDBObservedMutations() and calcDBExpectedMutations().
Made the behavior of regionDefinition=NULL consistent for all mutation profiling functions. Now the entire sequence is used as the region and calculations are made accordingly.
calcDBObservedMutations() returns R and S mutations also when regionDefinition=NULL. Older versions reported the sum of R and S mutations. The function will add the columns OBSERVED_SEQ_R and OBSERVED_SEQ_S when frequency=FALSE, and MU_FREQ_SEQ_R and MU_FREQ_SEQ_S when frequency=TRUE.
General:
Distance Calculation:
Added the symmetry parameter to distToNearest to change how asymmetric distances (A->B != B->A) are combined to get the distance between A and B.
Mutation Profiling:
Selection Analysis:
Targeting Models:
Added the minNumMutations parameter to createSubstitutionMatrix. This is the minimum number of observed 5-mers required for the substitution model. The substitution rate of 5-mers with fewer observed mutations will be inferred from other 5-mers.
Added the minNumSeqMutations parameter to createMutabilityMatrix. This is the minimum number of mutations required in sequences containing the 5-mers of interest. The mutability of 5-mers with fewer observed mutations in the sequences will be inferred.
Added the returnModel parameter to createSubstitutionMatrix. This gives the user the option to return a 1-mer or 5-mer model.
Added the returnSource parameter to createMutabilityMatrix. If TRUE, the code will return a data frame indicating whether each 5-mer mutability is observed or inferred.
Initial public release.
General:
Fixed a bug wherein the Influenza.tab file did not load on Mac OS X.
citation("shazam") command.
Distance Calculation:
HS1FDistance, based on the Yaari et al, 2013 data.
Set hs1f as the default distance model for distToNearest().
distToNearest().
Mutation Profiling:
Fixed calcDBClonalConsensus() so that the function now works correctly when called with the argument collapseByClone=FALSE.
Added the frequency argument to calcObservedMutations() and calcDBObservedMutations(), which enables return of mutation frequencies rather than the default of mutation counts.
Targeting Models:
M3NModel and all options for using said model.
Fixed a bug in createSubstitutionMatrix() and createMutabilityMatrix() where IMGT gaps were not being handled.
General:
Targeting Models:
Targeting Models:
U5NModel, which is a uniform 5-mer model.
plotMutability() output.
Prerelease for review.