ggnewscale
. Thanks @eliocamp for the
PR.Removes defunct tests after ggplot2
update.
plot_confusion_matrix()
now shows total count when
add_normalized=FALSE
. Thanks @JianWang2016 for
reporting the issue.
Makes it more clear in the documentation that the
Balanced Accuracy
metric in multiclass classification is
the macro-averaged metric, not the average recall metric that
is sometimes used.
plot_confusion_matrix()
:Breaking: Adds slight 3D tile effect to help separate tiles with the same count. Not tested in all many-class scenarios.
Fixes image sizing (arrows and zero-shading) when there are different numbers of unique classes in targets and predictions.
Fixes bug with class_order
argument when there are
different numbers of unique classes in targets and predictions.
plot_confusion_matrix()
:NEW: We created a Plot
Confusion Matrix web application! It allows using
plot_confusion_matrix()
without code. Select from multiple
design templates or make your own.
For (palette=, sums_settings(palette=))
arguments,
tile color palettes can now be a custom gradient. Simply supply a named
list with hex colors for “low” and “high”
(e.g. list("low"="#B1F9E8", "high"="#239895")
).
Adds intensity_by
, intensity_lims
, and
intensity_beyond_lims
arguments to
sum_tile_settings()
to allow setting them separately for
sum tiles.
Adds intensity_lims
argument which allows setting a
custom range for the tile color intensities. Makes it easier to compare
plots for different prediction sets.
Adds intensity_beyond_lims
for specifying how to
handle counts / percentages outside the specified
intensity_lims
. Default is to truncate the
intensities.
Fixes bug where arrow size was not taking add_sums
into account.
plot_confusion_matrix()
:Adds option to set intensity_by
to a log/arcsinh
transformed version of the counts. This adds the options
"log counts"
, "log2 counts"
,
"log10 counts"
, "arcsinh counts"
to the
intensity_by
argument.
Fixes bug when add_sums = TRUE
and
counts_on_top = TRUE
.
Raises error for negative counts.
Fixes zero-division when all counts are 0.
Sets palette colors to lowest value when all counts are 0.
In plot_confusion_matrix()
, adds
sub_col
argument for passing in text to replace the bottom
text (counts
by default).
In plot_confusion_matrix()
, fixes direction of
arrows when class_order
is specified.
In update_hyperparameters()
, allows
hyperparameters
argument to be NULL
. Thanks @ggrothendieck
for reporting the issue.
In relevant contexts: Informs user once about the
positive
argument in evaluate()
and
cross_validate*()
not affecting the interpretation of
probabilities. I, myself, had forgotten about this in a project, so
seems useful to remind us all about :-)
Fixes usage of the "all"
name in
set_metrics()
after purrr v1.0.0
update.
Makes testing conditional on the availability of
xpectr
.
Fixes tidyselect
-related warnings.
parameters 0.19.0
. Thanks to @strengejacke.Fixes tests for CRAN.
Adds merDeriv
as suggested package.
parameters 0.15.0
. Thanks to @strengejacke.checkmate 2.1.0
.ggplot2
functions. Now
compatible with ggplot2 3.3.4
.In order to reduce dependencies, model coefficients are now
tidied with the parameters
package instead of
broom
and broom.mixed
. Thanks to @IndrajeetPatil for
the contributions.
In cross_validate()
and
cross_validate_fn()
, fold columns can now have a varying
number of folds in repeated cross-validation. Struggling to choose a
number of folds? Average over multiple settings.
In the Class Level Results
in multinomial
evaluations, the nested Confusion Matrix
and
Results
tibbles are now named with their class to ease
extraction and further work with these tibbles. The Results
tibble further gets a Class
column. This information might
be redundant, but could make life easier.
Adds vignette:
Multiple-k: Picking the number of folds for cross-validation
.
plot_confusion_matrix()
, where tiles with
a count > 0 but a rounded percentage of 0 did not have the percentage
text. Only tiles with a count of 0 should now be without text.Breaking change: In plot_confusion_matrix()
, the
targets_col
and predictions_col
arguments have
been renamed to target_col
and prediction_col
to be consistent with evaluate()
.
Breaking change: In evaluate_residuals()
, the
targets_col
and predictions_col
arguments have
been renamed to target_col
and prediction_col
to be consistent with evaluate()
.
Breaking change: In
process_info_gaussian/binomial/multinomial()
, the
targets_col
argument have been renamed to
target_col
to be consistent with
evaluate()
.
In binomial
most_challenging()
, the
probabilities are now properly of the second class
alphabetically.
In plot_confusion_matrix()
, adds argument
class_order
for manually setting the order of the classes
in the facets.
In plot_confusion_matrix()
, tiles with a count of
0
no longer has text in the tile by default. This adds the
rm_zero_percentages
(for column/row percentage) and
rm_zero_text
(for counts and normalized)
arguments.
In plot_confusion_matrix()
, adds optional sum tiles.
Enabling this (add_sums = TRUE
) adds an extra column and an
extra row with the sums. The corner tile contains the total count. This
adds the add_sums
and sums_settings
arguments.
A sum_tile_settings()
function has been added to control
the appearance of these tiles. Thanks to @MaraAlexeev for
the idea.
In plot_confusion_matrix()
, adds option
(intensity_by
) to set the color intensity of the tiles to
the overall percentages (normalized
).
In plot_confusion_matrix()
, adds option to only have
row and column percentages in the diagonal tiles. Thanks to @xgirouxb for the
idea.
Adds Process
information to output with the settings
used. Adds transparency. It has a custom print method, making it easy to
read. Underneath it is a list, why all information is available using
$
or similar. In most cases, the Family
information has been moved into the Process
object. Thanks
to @daviddalpiaz for
notifying me of the need for more transparency.
In outputs, the Family
information is (in most
cases) moved into the new Process
object.
In binomial
evaluate()
and
baseline()
, Accuracy
is now enabled by
default. It is still disabled in cross_validate*()
functions to guide users away from using it as the main criterion for
model selection (as it is well known to many but can be quite bad in
cases with imbalanced datasets.)
Fixes: In binomial evaluation, the probabilities are now properly of the second class alphabetically. When the target column was a factor where the levels were not in alphabetical order, the second level in that order was used. The levels are now sorted before extraction. Thanks to @daviddalpiaz for finding the bug.
Fixes: In grouped multinomial evaluation, when predictions are classes and there are different sets of classes per group, only the classes in the subset are used.
Fixes: Bug in ROC
direction parameter being set
wrong when positive
is numeric. In regression tests, the
AUC
scores were not impacted.
Fixes: 2-class multinomial
evaluation returns all
expected metrics.
In multinomial evaluation, the Class Level Results
are sorted by the Class
.
Imports broom.mixed
to allow tidying of coefficients
from lme4::lmer
models.
Exports process_info_binomial()
,
process_info_multinomial()
,
process_info_gaussian()
constructors to ensure the various
methods are available. They are not necessarily intended for external
use.
dplyr
version 1.0.0
.
NOTE: this version of dplyr
slows down some functions in
cvms
significantly, why it might be beneficial not to
update before version 1.1.0
, which is supposed to tackle
this problem.rsvg
and ggimage
are now only
suggested and plot_confusion_matrix()
throws
warning if either are not installed.
Additional input checks for evaluate()
.
In cross_validate()
and validate()
, the
models
argument is renamed to formulas
. This
is a more meaningful name that was recently introduced in
cross_validate_fn()
. For now, the models
argument is deprecated, will be used instead of formulas
if
specified, and will throw a warning.
In cross_validate()
and validate()
, the
model_verbose
argument is renamed to verbose
.
This is a more meaningful name that was recently introduced in
cross_validate_fn()
. For now, the
model_verbose
argument is deprecated, will be used instead
of verbose
if specified, and will throw a warning.
In cross_validate()
and validate()
, the
link
argument is removed. Consider using
cross_validate_fn()
or validate_fn()
instead,
where you have full control over the prediction type fed to the
evaluation.
In cross_validate_fn()
, the
predict_type
argument is removed. You now have to pass a
predict function as that is safer and more transparent.
In functions with family
/type
argument,
this argument no longer has a default, forcing the user to specify the
family/type of the task. This also means that arguments have been
reordered. In general, it is safer to name arguments when passing values
to them.
In evaluate()
, apply_softmax
now
defaults to FALSE
. Throws error if probabilities do not add
up to 1 row-wise (tolerance of 5 decimals) when type
is
multinomial
.
multinomial
MCC
is now the proper
multiclass generalization. Previous versions used
macro MCC
. Removes MCC
from the class level
results. Removes the option to enable
Weighted MCC
.
multinomial
AUC
is calculated with
pROC::multiclass.roc()
instead of in the one-vs-all
evaluations. This removes AUC
, Lower CI
, and
Upper CI
from the Class Level Results
and
removes Lower CI
and Upper CI
from the main
output tibble. Also removes option to enable “Weighted AUC”, “Weighted
Lower CI”, and “Weighted Upper CI”.
multinomial
AUC
is disabled by default,
as it can take a long time to calculate for a large set of
classes.
ROC
columns now return the ROC
objects
instead of the extracted sensitivities
and
specificities
, both of which can be extracted from the
objects.
In evaluate()
, it’s no longer possible to pass model
objects. It now only evaluates the predictions. This removes the the
AIC
, AICc
, BIC
, r2m
,
and r2c
metrics.
In cross_validate
and validate()
, the
r2m
, and r2c
metrics are now disabled by
default in gaussian
. The r-squared metrics are
non-predictive and should not be used for model selection. They can be
enabled with
metrics = list("r2m" = TRUE, "r2c" = TRUE)
.
In cross_validate_fn()
, the AIC
,
AICc
, BIC
, r2m
, and
r2c
metrics are now disabled by default in
gaussian
. Only some model types will allow the computation
of those metrics, and it is preferable that the user actively makes a
choice to include them.
In baseline()
, the AIC
,
AICc
, BIC
, r2m
, and
r2c
metrics are now disabled by default in
gaussian
. It can be unclear whether the IC metrics
(computed on the lm()
/lmer()
model objects)
can be compared to those calculated for a given other model function. To
avoid such confusion, it is preferable that the user actively makes a
choice to include the metrics. The r-squared metrics will only be
non-zero when random effects are passed. Given that we shouldn’t use the
r-squared metrics for model selection, it makes sense to not have them
enabled by default.
validate()
now returns a tibble with the model
objects nested in the Model
column. Previously, it returned
a list with the results and models. This allows for easier use in
magrittr
pipelines (%>%
).
In multinomial baseline()
, the aggregation approach
is changed. The summarized results now properly describe the random
evaluations tibble, except for the four new measures
CL_Max
, CL_Min
, CL_NAs
, and
CL_INFs
, which describe the class level results.
Previously, NAs
were removed before aggregating the
one-vs-all evaluations, meaning that some metric summaries could become
inflated if small classes had NA
s. It was also
non-transparent that the NA
s and INF
s were
counted in the class level results instead of being a count of random
evaluations with NA
s or INF
s.
cv_plot()
is removed. It wasn’t very useful and has
never been developed properly. We aim to provide specialized plotting
functions instead.
validate_fn()
is added. Validate your custom model
function on a test set.
confusion_matrix()
is added. Create a confusion
matrix and calculate associated metrics from your targets and
predictions.
evaluate_residuals()
is added. Calculate common
metrics from regression residuals.
summarize_metrics()
is added. Use it summarize the
numeric columns in your dataset with a set of common descriptors. Counts
the NA
s and Inf
s. Used by
baseline()
.
select_definitions()
is added. Select the columns
that define the models, such as Dependent
,
Fixed
, Random
, and the (unnested)
hyperparameters.
model_functions()
is added. Contains simple
model_fn
examples that can be used in
cross_validate_fn()
and validate_fn()
or as
starting points.
predict_functions()
is added. Contains simple
predict_fn
examples that can be used in
cross_validate_fn()
and validate_fn()
or as
starting points.
preprocess_functions()
is added. Contains simple
preprocess_fn
examples that can be used in
cross_validate_fn()
and validate_fn()
or as
starting points.
update_hyperparameters()
is added. For managing
hyperparameters when writing custom model functions.
most_challenging()
is added. Finds the data points
that were the most difficult to predict.
plot_confusion_matrix()
is added. Creates a
ggplot
representing a given confusion matrix. Thanks to
Malte Lau Petersen (@maltelau), Maris Sala (@marissala) and Kenneth
Enevoldsen (@KennethEnevoldsen) for
feedback.
plot_metric_density()
is added. Creates a ggplot
density plot for a metric column.
font()
is added. Utility for setting font settings
(size, color, etc.) in plotting functions.
simplify_formula()
is added. Converts a formula with
inline functions to a simple formula where all variables are added
together (e.g. y ~ x*z + log(a) + (1|b)
->
y ~ x + z + a + b
). This is useful when passing a formula
to recipes::recipe()
, which doesn’t allow the inline
functions.
gaussian_metrics()
, binomial_metrics()
,
and multinomial_metrics()
are added. Can be used to select
metrics for the metrics
argument in many cvms
functions.
baseline_gaussian()
,
baseline_binomial()
, baseline_multinomial()
are added. Simple wrappers for baseline()
that are easier
to use and have simpler help files. baseline()
has a lot of
arguments that are specific to a family, which can be a bit
confusing.
wines
dataset is added. Contains a list of wine
varieties in an approximately Zipfian distribution.
musicians
dataset is added. This has been
generated for multiclass classification
examples.
predicted.musicians
dataset is added. This contains
cross-validated predictions of the musicians
dataset by
three algorithms. Can be used to demonstrate working with predictions
from repeated 5-fold stratified cross-validation.
Adds NRMSE(RNG)
, NRMSE(IQR)
,
NRMSE(STD)
, NRMSE(AVG)
metrics to
gaussian
evaluations. The RMSE
is normalized
by either target range (RNG), target interquartile range (IQR), target
standard deviation (STD), or target mean (AVG). Only
NRMSE(IQR)
is enabled by default.
Adds RMSLE
, RAE
, RSE
,
RRSE
, MALE
, MAPE
,
MSE
, TAE
and TSE
metrics to
gaussian
evaluations. RMSLE
, RAE
,
and RRSE
are enabled by default.
Adds Information Criterion metrics (AIC
,
AICc
, BIC
) to the binomial
and
multinomial
output of some functions (disabled by default).
These are based on the fitted model objects and will only work for some
types of models.
Adds Positive Class
column to binomial
evaluations.
Adds optional hyperparameter
argument to
cross_validate_fn()
. Pass a list of hyperparameters and
every combination of these will be cross-validated.
Adds optional preprocess_fn
argument to
cross_validate_fn()
. This can, for instance, be used to
standardize the training and test sets within the function. E.g., by
extracting the scaling and centering parameters from the training set
and apply them to both the training set and the test fold.
Adds Preprocess
column to output when
preprocess_fn
is passed. Contains returned parameters
(e.g. mean, sd) used in preprocessing.
Adds preprocess_once
argument to
cross_validate_fn()
. When preprocessing does not depend on
the current formula or hyperparameters, we might as well perform it on
each train/test split once, instead of for every model.
Adds metrics
argument to baseline()
.
Enable the non-default metrics you want a baseline evaluation
for.
Adds preprocessing
argument to
cross_validate()
and validate()
. Currently
allows “standardize”, “scale”, “center”, and “range”. Results will
likely not be affected noticeably by the preprocessing.
Adds add_targets
and
add_predicted_classes
arguments to
multiclass_probability_tibble()
.
Adds Observation
column in the nested predictions
tibble in cross_validate()
,
cross_validate_fn()
, validate()
, and
validate_fn()
. These indices can be used to identify which
observations are difficult to predict.
Adds SD
column in the nested predictions tibble in
evaluate()
when performing ID aggregated evaluation with
id_method = 'mean'
. This is the standard deviation of the
predictions for the ID.
Adds vignette:
Cross-validating custom model functions with cvms
Adds vignette:
Creating a confusion matrix with cvms
Adds vignette:
The available metrics in cvms
Adds vignette: Evaluate by ID/group
The metrics
argument now allows setting a boolean
for "all"
inside the list to enable or disable all the
metrics. For instance, the following would disable all the metrics
except RMSE
:
metrics = list("all" = FALSE, "RMSE" = TRUE)
.
multinomial
evaluation results now contain the
Results
tibble with the results for each fold column. The
main metrics are now averages of these fold column results. Previously,
they were not aggregated by fold column first. In the unit tests, this
has not altered the results, but it is a more correct approach.
The prediction column(s) in evaluate()
must be
either numeric or character, depending on the format chosen.
In binomial
evaluate()
, it’s now
possible to pass predicted classes instead of probabilities.
Probabilities still carry more information though. Both the prediction
and target columns must have type character in this format.
Changes the required arguments in the predict_fn
function passed to cross_validate_fn()
.
Changes the required arguments in the model_fn
function passed to cross_validate_fn()
.
Warnings and messages from preprocess_fn
are caught
and added to Warnings and Messages
. Warnings are counted in
Other Warnings
.
Nesting is now done with dplyr::group_nest
instead
of tidyr::nest_legacy
for speed improvements.
caret
, mltools
, and
ModelMetrics
are no longer dependencies. The confusion
matrix metrics have instead been implemented in cvms
(see
confusion_matrix()
).
select_metrics()
now works with a wider range of
inputs as it no longer depends on a Family
column.
The Fixed
column in some of the output tibbles have
been moved to make it clearer which model was evaluated.
Better handling of inline functions in formulas.
evaluate()
, when used on a grouped data
frame. The row order in the output was not guaranteed to fit the
grouping keys.Fixes documentation in cross_validate_fn()
. The
examples section contained an unreasonable number of mistakes
:-)
In cross_validate_fn()
, warnings and messages from
the predict function are now included in
Warnings and Messages
. The warnings are counted in
Other Warnings
.
Breaking change: In evaluate()
, when
type
is multinomial
, the output is now a
single tibble. The Class Level Results
are included as a
nested tibble.
Breaking change: In baseline()
, lmer
models are now fitted with REML = FALSE
by
default.
Adds REML
argument to
baseline()
.
cross_validate_fn()
is added. Cross-validate custom
model functions.
Bug fix: the control
argument in
cross_validate()
was not being used. Now it is.
In cross_validate()
, the model is no longer fitted
twice when a warning is thrown during fitting.
Adds metrics
argument to
cross_validate()
and validate()
. Allows
enabling the regular Accuracy
metric in
binomial
or to disable metrics (will currently still be
computed but not included in the output).
AICc
is now computed with the MuMIn
package instead of the AICcmodavg
package, which is no
longer a dependency.
Adds lifecycle
badges to the function
documentation.
evaluate()
is added. Evaluate your model’s
predictions with the same metrics as used in
cross_validate()
.
Adds 'multinomial'
family/type to
baseline()
and evaluate()
.
Adds multiclass_probability_tibble()
for generating
a random probability tibble.
Adds random_effects
argument to
baseline()
for adding random effects to the Gaussian
baseline model.
Adds Zenodo DOI for easier citation.
In nested confusion matrices, the Reference column is renamed to Target, to use the same naming scheme as in the nested predictions.
Bug fix: p-values are correctly added to the nested coefficients tibble. Adds tests of this table as well.
Adds extra unit tests to increase code coverage.
When argument "model_verbose"
is TRUE
,
the used model function is now messaged instead of printed.
Adds badges to README, including travis-ci status, AppVeyor status, Codecov, min. required R version, CRAN version and monthly CRAN downloads. Note: Zenodo badge will be added post release.
R v. 3.5
Adds optional parallelization.
Results now contain a count of singular fit messages. See
?lme4::isSingular
for more information.
Argument "positive"
changes default value to 2. Now
takes either 1 or 2 (previously 0 and 1). If your dependent variable has
values 0 and 1, 1 is now the positive class by default.
AUC calculation has changed. Now explicitly sets the direction in
pROC::roc
.
Unit tests have been updated for the new random sampling
generator in R 3.6.0
. They will NOT run previous versions
of R.
Adds baseline()
for creating baseline
evaluations.
Adds reconstruct_formulas()
for reconstructing
formulas based on model definition columns in the results
tibble.
Adds combine_predictors()
for generating model
formulas from a set of fixed effects.
Adds select_metrics()
for quickly selecting the
metrics and model definition columns.
Breaking change: Metrics have been rearranged and a few metrics have been added.
Breaking change: Renamed argument folds_col
to
fold_cols
to better fit the new repeated cross-validation
option.
New: repeated cross-validation.
Created package :)