All messages, warnings and errors has been translated to use {cli} package (#517, #522).
Bug was fixed in roc_curve_survival() where wrong
weights were used. (#495, @asb2111).
Output of roc_curve_survival() now returns columns
in same order as roc_curve(). (#498)
The Brier score for survival data was added with
brier_survival().
The Integrated Brier score for survival data was added with
brier_survival_integrated().
The Concordance index for survival data was added with
concordance_survival().
Time-Dependent ROC curves estimation for right-censored data can
now be calculated with roc_curve_survival().
Time-Dependent ROC AUC estimation for right-censored data can now
be calculated with roc_auc_survival().
demographic_parity(), equalized_odds(),
and equal_opportunity() are new metrics for measuring model
fairness. Each is implemented with the
new_groupwise_metric() constructor, a general interface for
defining group-aware metrics that allows for quickly and flexibly
defining fairness metrics with the problem context in mind.
metric_set() can now be used with a combination of
dynamic and static survival metrics.
Added a print method for metrics and metric sets (#435).
All warnings and errors have been updated to use the cli package for increased clarity and consistency. (#456, #457, #458)
brier_survival_integrated() now throws an error if
input data only includes 1 evalution time point. (#460)
Clarifying documentation about how event_level
always default to "first. (#432)
estimate
argument is wrongly used. (#443)NA when missing values are found and
na_rm = FALSE. (#344)brier_class() (#139).The global option, yardstick.event_first, has been
hard deprecated in favor of using explicit argument,
event_level. Setting this option will now produce an
warning, but won’t have any effect. (#173)
Removed start-up message about event_level
argument.
yardstick metric functions now use a pure tidyselect interface
for truth, estimate, and the ...
of class probability metrics (#322).
Changed the default aspect ratio for PR curves to be 1.0.
Error messages now show what user-facing function was called (#348).
classification and probability metrics now fully support
class_pred objects from {probably} package (#341).
Using metric_set() on a metric created with
metric_tweak() will no longer produces an error, and will
favor arguments set with metric_tweak() (#351).
Metric summarizers no longer error if column names in
data conflict with argument names (#382).
conf_mat() no longer throw errors listed as internal
(#327).
metric_vec_template() is being soft deprecated in
favor of a more manual and flexible metric creation approach.
yardstick_remove_missing() and
yardstick_any_missing() have been added for treatment of
missing values. check_class_metric(),
check_numeric_metric(), and
check_prob_metric() have been added to perform standardized
input checking for classification, regression and class probability
metrics respectively. These changes mean that it is the developer’s
responsibility to perform validation of truth and
estimate input. (#337).
metric_summarizer() is being soft deprecated in
favor of the more specific newly added
class_metric_summarizer(),
numeric_metric_summarizer(),
prob_metric_summarizer(), and
curve_metric_summarizer() (#322).
dots_to_estimate() is being soft deprecated along
with metric_summarizer(). dots_to_estimate()
is not needed with prob_metric_summarizer(), and
curve_metric_summarizer() (#329).
Emil Hvitfeldt is now the maintainer (#315).
Improved on the chained error thrown by metric_set()
when one of the metric computations fails (#313).
All yardstick metrics now support case weights through the new
case_weights argument. This also includes metric-adjacent
functions like roc_curve(), pr_curve(),
conf_mat(), and metric_set().
The options argument of roc_curve(),
roc_auc(), roc_aunp(),
roc_aunu(), and metrics() that was passed
along to the pROC package is now deprecated and no longer has any
affect. This is a result of changing to an ROC curve implementation that
supports case weights, but does not support any of the previous options.
If you need these options, we suggest wrapping pROC yourself in a custom
metric (#296).
conf_mat() now ignores any inputs passed through
... and warns if you try to do such a thing. Previously,
those were passed on to base::table(), but with the
addition of case weight support, table() is no longer used
(#295).
Fixed a small mistake in ccc() where the unbiased
covariance wasn’t being used when bias = FALSE.
j_index() now throws a more correct warning if
0 is in the denominator when computing sens()
internally. Additionally, in the multiclass case it now removes the
levels where this occurs from the multiclass weighted average
computation, which is consistent with how other metrics were updated to
handle this in #118 (#265).
Improved on some possible ambiguity in the documentation of the
data argument for all metrics (#255).
purrr has been removed from Suggests.
The pROC package has been removed as a dependency (#300).
Moved the Custom Metrics vignette to tidymodels.org (#236).
New metric poisson_log_loss() was added
(#146).
sensitivity() and specificity() now
work correctly with the tune and workflowsets packages (#232).
roc_curve() now throws a more informative error if
truth doesn’t have any control or event
observations.
dplyr 1.0.0 is now required. This allowed us to remove multiple
usages of dplyr::do() in favor of
dplyr::summarise(), which can now return packed data frame
columns and multiple rows per group.
Removed internal hardcoding of "dplyr_error" to
avoid issues with an upcoming dplyr 1.0.8 release (#244).
Updated test suite to testthat 3e (#243).
Internal upkeep has been done to move from
rlang::warn(.subclass = ) to
rlang::warn(class = ), since the .subclass
argument has been deprecated (#225).
New metric_tweak() for adjusting the default values
of optional arguments in an existing yardstick metric. This is useful to
quickly adjust the defaults of a metric that will be included in a
metric_set(), especially if that metric set is going to be
used for tuning with the tune package (#206, #182).
New classification_cost() metric for computing the
cost of a poor class probability prediction using user-defined costs
(#3).
New msd() for computing the mean signed deviation
(also called mean signed difference, or mean signed error) (#183, @datenzauberai).
class_pred objects from the probably package are now
supported, and are automatically converted to factors before computing
any metric. Note that this means that any equivocal values are
materialized as NA (#198).
The kap() metric has a new weighting
argument to apply linear or quadratic weightings before computing the
kappa value (#2, #125, @jonthegeek).
When sens() is undefined when computing
ppv(), npv(), j_index(), or
bal_accuracy(), a sensitivity warning is now correctly
thrown, rather than a recall warning (#101).
The autoplot() method for gain curves now plots the
curve line on top of the shaded polygon, resulting in a sharper look for
the line itself (#192, @eddjberry).
The autoplot() methods for conf_mat now
respect user-defined dimension names added through
conf_mat(dnn = ) or from converting a table with dimension
names to a conf_mat (#191).
Added an as_tibble() method for
metric_set objects. Printing a metric_set now
uses this to print out a tibble rather than a data frame
(#186).
Re-licensed package from GPL-2 to MIT. See consent from copyright holders here (#204).
The global option, yardstick.event_first, has been
deprecated in favor of the new explicit argument,
event_level. All metric functions that previously supported
changing the “event” level have gained this new argument. The global
option was a historical design decision that can be classified as a case
of a hidden argument. Existing code that relied on this global option
will continue to work in this version of yardstick, however you will now
get a once-per-session warning that requests that you update to instead
use the explicit event_level argument. The global option
will be completely removed in a future version. To update, follow the
guide below (#163).
`options(yardstick.event_first = TRUE)`  -> `event_level = "first"` (the default)
`options(yardstick.event_first = FALSE)` -> `event_level = "second"`The roc_auc() Hand-Till multiclass estimator will
now ignore levels in truth that occur zero times in the
actual data. With other methods of multiclass averaging, this usually
returns an NA, however, ignoring levels in this manner is
more consistent with implementations in the HandTill2001 and pROC
packages (#123).
roc_auc() and roc_curve() now set
direction = "<" when computing the ROC curve using
pROC::roc(). Results were being computed incorrectly with
direction = "auto" when most probability values were
predicting the wrong class (#123).
mn_log_loss() now respects the (deprecated) global
option yardstick.event_first. However, you should instead
change the relevant event level through the event_level
argument.
metric_set() now strips the package name when
auto-labeling functions (@rorynolan, #151).
There are three new helper functions for more easily creating
custom metric functions: new_class_metric(),
new_prob_metric(), and
new_numeric_metric().
Rcpp has been removed as a direct dependency.
roc_auc() now warns when there are no events or
controls in the provided truth column, and returns
NA (@dpastling, #132).
Adds sensitivity() and specificity() as
aliases for sens() and spec()
respectively, avoids conflict with other packages
e.g. readr::spec().
roc_aunu() and roc_aunp() are two new
ROC AUC metrics for multiclass classifiers. These measure the AUC of
each class against the rest, roc_aunu() using the uniform
class distribution (#69) and roc_aunp() using the a priori
class distribution (#70).
The autoplot() heat map for confusion matrices now
places the predicted values on the x axis and the truth
values on the y axis to be more consistent with the
confusion matrix print() method.
The autoplot() mosaic plot for confusion matrices
had the x and y axis labels backwards. This
has been corrected.
iic() is a new numeric metric for computing the
index of ideality of correlation. It can be seen as a potential
alternative to the traditional correlation coefficient, and has been
used in QSAR models (@jyuu, #115).
average_precision() is a new probability metric that
can be used as an alternative to pr_auc(). It has the
benefit of avoiding any issues of ambiguity in the case where
recall == 0 and the current number of false positives is
0.
metric_set() output now includes a
metrics attribute which contains a list of the original
metric functions used to generate the metric set.
Each metric function now has a direction attribute
attached to it, specifying whether to minimize or maximize the
metric.
Classification metrics that can potentially have a 0
value denominator now throw an informative warning when this case
occurs. These include recall(), precision(),
sens(), and spec() (#98).
The autoplot() method for pr_curve()
has been improved to always set the axis limits to
c(0, 1).
All valid arguments to pROC::roc() are now utilized,
including those passed on to pROC::auc().
Documentation for class probability metrics has been improved with more informative examples (@rudeboybert, #100).
mn_log_loss() now uses the min/max rule before
computing the log of the estimated probabilities to avoid problematic
undefined log values (#103).
pr_curve() now places a 1 as the first
precision value, rather than NA. While NA is
technically correct as precision is undefined here, 1 is
practically more correct because it generates a correct PR Curve graph
and, more importantly, allows pr_auc() to compute the
correct AUC.
pr_curve() could generate the wrong results in the
somewhat rare case when two class probability estimates were the same,
but had different truth values.
pr_curve() (and subsequently pr_auc())
now generates the correct curve when there are duplicate class
probability values (reported by @dariyasydykova, #93).
Binary mcc() now avoids integer overflow when the
confusion matrix elements are large (#108).
mase() is a numeric metric for the mean absolute
scaled error. It is generally useful when forecasting with time series
(@alexhallam,
#68).
huber_loss() is a numeric metric that is less
sensitive to outliers than rmse(), but is more sensitive
than mae() for small errors (@blairj09, #71).
huber_loss_pseudo() is a smoothed form of
huber_loss() (@blairj09, #71).
smape() is a numeric metric that is based on
percentage errors (@riazhedayati, #67).
conf_mat objects now have two
ggplot2::autoplot() methods for easy visualization of the
confusion matrix as either a heat map or a mosaic plot (@EmilHvitfeldt,
#10).
metric_set() now returns a classed function. If numeric
metrics are used, a "numeric_metric_set" function is
returned. If class or probability metrics are used, a
"class_prob_metric_set" is returned.Tests related to the fixed R 3.6 sample() function
have been fixed.
f_meas() propagates NA values from
precision() and recall() correctly
(#77).
All "micro" estimators now propagate NA
values through correctly.
roc_auc(estimator = "hand_till") now correctly
computes the metric when the column names of the probability matrix are
not the exact same as the levels of truth. Note that the
computation still assumes that the order of the supplied probability
matrix columns still matches the order of levels(truth),
like other multiclass metrics (#86).
A desire to standardize the yardstick API is what drove these breaking changes. The output of each metric is now in line with tidy principles, returning a tibble rather than a single numeric. Additionally, all metrics now have a standard argument list so you should be able to switch between metrics and combine them together effortlessly.
All metrics now return a tibble rather than a single numeric
value. This format allows metrics to work with grouped data frames (for
resamples). It also allows you to bundle multiple metrics together with
a new function, metric_set().
For all class probability metrics, now only 1 column can be
passed to ... when a binary implementation is used. Those
metrics will no longer select only the first column when multiple
columns are supplied, and will instead throw an error.
The summary() method for conf_mat
objects now returns a tibble to be consistent with the change to the
metric functions.
For naming consistency, mnLogLoss() was renamed to
mn_log_loss()
mn_log_loss() now returns the
negative log loss for the multinomial
distribution.
The argument na.rm has been changed to
na_rm in all metrics to align with the
tidymodels model implementation principles.
Each metric now has a vector interface to go alongside the data
frame interface. All vector functions end in _vec(). The
vector interface accepts vector/matrix inputs and returns a single
numeric value.
Multiclass support has been added for each classification metric.
The support varies from one metric to the next, but generally macro and
micro averaging is available for all metrics, with some metrics having
specialized multiclass implementations (for example,
roc_auc() supports the multiclass generalization presented
in a paper by Hand and Till). For more information, see
vignette("multiclass", "yardstick").
All metrics now work with grouped data frames. This produces a tibble with as many rows as there are groups, and is useful when used alongside resampling techniques.
mape() calculates the mean absolute percent
error.
kap() is a metric similar to accuracy()
that calculates Cohen’s kappa.
detection_prevalence() calculates the number of
predicted positive events relative to the total number of
predictions.
bal_accuracy() calculates balanced accuracy as the
average of sensitivity and specificity.
roc_curve() calculates receiver operator curves and
returns the results as a tibble.
pr_curve() calculates precision recall
curves.
gain_curve() and lift_curve() calculate
the information used in gain and lift curves.
gain_capture() is a measure of performance similar
in spirit to AUC but applied to a gain curve.
pr_curve(), roc_curve(),
gain_curve() and lift_curve() all have
ggplot2::autoplot() methods for easy
visualization.
metric_set() constructs functions that calculate
multiple metrics at once.
The infrastructure for creating metrics has been exposed to allow
users to extend yardstick to work with their own metrics. You might want
to do this if you want your metrics to work with grouped data frames out
of the box, or if you want the standardization and error checking that
yardstick already provides. See
vignette("custom-metrics", "yardstick") for a few
examples.
A vignette describing the three classes of metrics used in
yardstick has been added. It also includes a list of every metric
available, grouped by class. See
vignette("metric-types", "yardstick").
The error messages in yardstick should now be much more
informative, with better feedback about the types of input that each
metric can use and about what kinds of metrics can be used together
(i.e. in metric_set()).
There is now a grouped_df method for
conf_mat() that returns a tibble with a list column of
conf_mat objects.
Each metric now has its own help page. This allows us to better document the nuances of each metric without cluttering the help pages of other metrics.
broom has been removed from Depends, and is replaced
by generics in Suggests.
tidyr and ggplot2 have been moved to
Suggests.
MLmetrics has been removed as a dependency.