This NEWS file now follows the Keep a Changelog format.
Removed lifecycle badge from README file.
The training data has to be explicitly passed in more cases when
using vi_permute(), vi_shap(), and
vi_firm().
Raised R version dependency to >= 4.1.0 (the release that introduced
the native pipe operator |>).
The vi_permute() function now relies on the yardstick
package for computing performance measures (e.g., RMSE and log loss);
consequently, user-supplied metric functions now need to conform to yardstick metric
argument names.
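As a rough illustration of what conforming to yardstick's argument names might look like, the sketch below defines a hypothetical user-supplied metric, my_rmse(), whose two arguments are named truth and estimate (the names used by yardstick's vector metrics such as rmse_vec()). The function name and the toy data are invented for this example, and only base R is used; how vi_permute() calls the metric internally is not shown here.

```r
# Hypothetical user-supplied metric following yardstick's naming convention:
# observed values go in `truth`, predictions go in `estimate`.
my_rmse <- function(truth, estimate) {
  sqrt(mean((truth - estimate)^2))
}

# Toy data (invented for this example)
obs  <- c(1, 2, 3, 4)
pred <- c(1, 2, 3, 6)

# Arguments are matched by name, so their order does not matter
res <- my_rmse(estimate = pred, truth = obs)
print(res)
```

A function shaped like this could then presumably be supplied through the metric argument of vi_permute().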
The var_fun argument in vi_firm() has
been deprecated; use the new var_continuous and
var_categorical instead.
The explicit ice argument in vi_firm()
has been removed; it was not really needed since it can be passed via
the ... argument.
Removed magrittr from imports; it’s easy enough to just load the package if you need it or use R’s newer native pipe operator.
Tweaked examples.
Tests based on fastshap now check to make sure it’s available.
Suppress loading of mixOmics in tests.
Switched lifecycle badge from “maturing”, which has been superseded, to “experimental.”
Fixed H2O
URL in vi_model.R.
Removed the unnecessary LazyData: true line from the
DESCRIPTION file.
Switched to using markdown syntax in roxygen2
comments.
vi_model() now supports lightgbm models.
Thanks to @nipnipj
for the suggestion (#146).
The permutation importance method (i.e., function
vi_permute()) now integrates with and uses yardstick
performance metrics.
list_metrics() gained an additional
smaller_is_better column indicating whether or not the
corresponding metric should be minimized
(smaller_is_better = TRUE) or maximized
(smaller_is_better = FALSE); thanks to @topedo. Additionally, all
the column names are now in lower case.
Added support for partial least squares via the mixOmics package (PR #129); thanks to @topedo.
Added support for the workflows and parsnip packages from the tidymodels ecosystem (PR #128); thanks to @topedo.
New pkgdown site and vignette based on our original R Journal article.
add_sparklines() seems out of scope and has
been removed.
vint() also seems out of scope and is too slow
to implement for most practical problems; for now, the function will
likely live on in the moreparty
package.
Add tools/ to .Rbuildignore.
Change http://spark.rstudio.com/mlib/ to https://spark.rstudio.com/mlib/ in NEWS.md.
Remove unnecessary codecov.yml file.
Removed several arguments from vip(); in particular,
bar, width, alpha,
color, fill, size, and
shape. Users should instead rely on the
mapping and aesthetics arguments; see
?vip::vip for details.
Fixed issues that occurred when using vi_model()
with the glmnet
package. In particular, we added a new lambda parameter for
specifying the value of the penalty term to use when extracting the
estimated coefficients. This is equivalent to the s
argument in glmnet::coef(); the name lambda
was chosen to not conflict with other arguments in vi().
Additionally, vi_model() did not return the absolute value
of the estimated coefficients for glmnet models as
advertised; this is now fixed (#103).
Switched from Travis-CI to GitHub Actions for continuous integration.
Added a CITATION file and a PDF-based vignette based on the published article in The R Journal (#109).
Switch from tibble::as.tibble()—which was deprecated
in tibble 2.0.0—to
tibble::as_tibble() in a few function calls (#101).
The Importance column from vi_model() no
longer contains “inner” names, in accordance with breaking changes in tibble 3.0.0.
Added support for SHAP-based feature importance which makes use
of the recent fastshap package
on CRAN. To use, simply call vi() or vip() and
specify method = "shap", or you can just call
vi_shap() directly (#87).
Added support for the parsnip, mlr, and mlr3 packages (#94).
Added support for "mvr" objects from the pls package (currently
just calls caret::varImp()) (#35).
The "lm" method for vi_model() gained a
new type argument that allows users to use either (1) the
raw coefficients if the features were properly standardized
(type = "raw"), or (2) the absolute value of the
corresponding t- or z-statistic
(type = "stat", the default) (#77).
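To make the two options concrete, here is a minimal base R sketch of the quantities they describe, computed by hand from a small linear model on the built-in mtcars data. This is not the package's internal code; the model formula and the object names (imp_stat, imp_raw) are chosen purely for illustration.

```r
# Fit a small linear model on the built-in mtcars data
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Coefficient table without the intercept row
coefs <- summary(fit)$coefficients[-1, , drop = FALSE]

# Analogue of type = "stat": absolute value of each t-statistic
imp_stat <- abs(coefs[, "t value"])

# Analogue of type = "raw": the raw coefficients, which are only
# comparable across features when the features share a common scale
imp_raw <- coefs[, "Estimate"]

print(imp_stat)
print(imp_raw)
```

Because wt and hp are measured in very different units here, the raw coefficients are not directly comparable, which is why the statistic-based scores are usually the safer choice for unstandardized features.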
New function gen_friedman() for simulating data from
the Friedman 1 benchmark problem; see ?vip::gen_friedman
for details.
The vi_pdp() and vi_ice() functions
have been deprecated and merged into a single new function called
vi_firm(). Consequently, setting
method = "pdp" and method = "ice" has also
been deprecated; use method = "firm" instead.
The metric and pred_wrapper arguments
to vi_permute() are no longer optional.
The vip() function gained a new argument,
geom, for specifying which type of plot to construct.
Current options are geom = "col" (the default),
geom = "point", geom = "boxplot", or
geom = "violin" (the latter two only work for
permutation-based importance with nsim > 1) (#79).
Consequently, the bar argument has been removed.
The vip() function gained two new arguments for
specifying aesthetics: mapping and aesthetics
(for fixed aesthetics like color = "red"). Consequently,
the arguments color, fill, etc. have been
removed (#80).
An example illustrating the above two changes is given below:
# Load required packages
library(ggplot2)    # for `aes_string()` function
library(gridExtra)  # for `grid.arrange()` function
library(vip)
# Load the sample data
data(mtcars)
# Fit a linear regression model
model <- lm(mpg ~ ., data = mtcars)
# Construct variable importance plots
p1 <- vip(model)  # default column plot (geom = "col")
p2 <- vip(model, mapping = aes_string(color = "Sign"))
p3 <- vip(model, geom = "point")
p4 <- vip(model, geom = "point", mapping = aes_string(color = "Variable"),
          aesthetics = list(size = 3))
# Display all four plots in a 2x2 grid
grid.arrange(p1, p2, p3, p4, nrow = 2)
The vip() function gained a new argument,
include_type, which defaults to FALSE. If
TRUE, the type of variable importance that was computed is
included in the appropriate axis label. Set
include_type = TRUE to revert to the old behavior.
Removed dependency on ModelMetrics;
the built-in family of performance metrics (metric_*())
is now documented and exported. See, for example,
?vip::metric_rmse (#93).
Minor documentation improvements.
The internal (i.e., not exported)
get_feature_names() function does a better job with
"nnet" objects containing factors. It also does a better
job at extracting feature names from model objects containing a
"formula" component.
vi_model() now works correctly for
"glm" objects with non-Gaussian families (e.g., logistic
regression) (#74).
Added appropriate sparklyr version dependency (#59).
Removed warnings from experimental functions.
vi_permute() gained a type argument (i.e.,
type = "difference" or type = "ratio"); this
argument can be passed via vi() or vip() as
well.
add_sparklines() creates an HTML widget to display
variable importance scores with a sparkline representation of each
feature's effect (i.e., its partial dependence function) (#64).
Added support for the Olden and Garson algorithms with neural networks fit using the neuralnet, nnet, and RSNNS packages (#28).
Added support for GLMNET models fit using the glmnet package (with and without cross-validation).
The pred_fun argument in vi_permute()
has been changed to pred_wrapper.
The FUN argument to vi(),
vi_pdp(), and vi_ice() has been changed to
var_fun.
Only the predicted class probabilities for the reference class
are required (as a numeric vector) for binary classification when
metric = "auc" or metric = "logloss".
vi_permute() gained a new logical keep
argument. If TRUE (the default), the raw permutation scores
from all nsim repetitions (provided
nsim > 1) will be stored in an attribute called
"raw_scores".
vip() gained new logical arguments
all_permutations and jitter which help to
visualize the raw permutation scores for all nsim
repetitions (provided nsim > 1).
You can now pass a type argument to
vi_permute() specifying how to compare the baseline and
permuted performance metrics. Current choices are
"difference" (the default) and
"ratio".
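The difference between the two comparisons can be sketched in a few lines of base R. The example below permutes a single feature by hand and compares the baseline and permuted RMSE both ways; it is an illustration under stated assumptions, not the package's implementation. The model, the rmse() helper, and the orientation of the comparison (permuted minus baseline, permuted over baseline, for a metric where smaller is better) are assumptions made for this example.

```r
set.seed(101)  # permutation is random; fix the seed for reproducibility

# Fit a small linear model on the built-in mtcars data
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Hypothetical helper: root mean squared error
rmse <- function(truth, estimate) sqrt(mean((truth - estimate)^2))

# Baseline performance on the unpermuted training data
baseline <- rmse(mtcars$mpg, predict(fit, newdata = mtcars))

# Performance after shuffling a single feature (here `wt`)
permuted_data <- mtcars
permuted_data$wt <- sample(permuted_data$wt)
permuted <- rmse(mtcars$mpg, predict(fit, newdata = permuted_data))

# Two ways to compare the scores (RMSE: smaller is better)
imp_difference <- permuted - baseline  # analogue of type = "difference"
imp_ratio      <- permuted / baseline  # analogue of type = "ratio"

print(c(difference = imp_difference, ratio = imp_ratio))
```

The ratio is unit-free, which can make scores easier to compare across models or metrics, while the difference stays on the scale of the metric itself.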
Improved documentation (especially for vi_permute()
and vi_model()).
Results from vi_model(), vi_pdp(),
vi_ice(), and vi_permute() now have class
"vi", making them easier to plot with
vip().
Added nsim argument to vi_permute() for
reducing the sampling variability induced by permuting each predictor (#36).
Added sample_size and sample_frac
arguments to vi_permute() for reducing the size of the
training sample for every Monte Carlo repetition (#41).
Greatly improved the documentation for vi_model()
and the various objects it supports.
New argument rank, which defaults to
FALSE, available in vi() (#55).
Added support for Spark (G)LMs.
vi() is now a generic which makes adding new methods
easier (e.g., to support DataRobot models).
Bug fixes.
Fixed bug in get_feature_names.ranger() so that it
never returns NULL; it either returns the feature names or
throws an error if they cannot be recovered from the model object (#43).
Added pkgdown site:
https://github.com/koalaverse/vip.
Changed truncate_feature_names argument of
vi() to abbreviate_feature_names which
abbreviates all feature names, rather than just truncating
them.
Added CRAN-related badges (#32).
New generic vi_permute() for constructing
permutation-based variable importance scores (#19).
Fixed bug and unnecessary error check in vint() (#38).
New vignette on using vip with unsupported models
(using the Keras API to TensorFlow as an example).
Added basic sparklyr support.
Added support for XGBoost models (i.e., objects of class
"xgb.Booster").
Added support for ranger models (i.e., objects of class
"ranger").
Added support for random forest models from the
party package (i.e., objects of class
"RandomForest").
vip() gained a new argument,
num_features, for specifying how many variable importance
scores to plot. The default is set to 10.
. was changed to _ in all argument
names.
vi() gained three new arguments:
truncate_feature_names (for truncating feature names in the
returned tibble), sort (a logical argument specifying
whether or not the resulting variable importance scores should be
sorted), and decreasing (a logical argument specifying
whether or not the variable importance scores should be sorted in
decreasing order).
vi_model.lm(), and hence vi(), contains
an additional column called Sign that contains the sign of
the original coefficients (#27).
vi() gained a new argument, scale, for
scaling the variable importance scores so that the largest is 100.
Default is FALSE (#24).
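One straightforward way to obtain scores whose largest value is 100 is to divide by the maximum and multiply by 100, as in the base R sketch below. The importance values are made-up numbers, and whether the package uses exactly this formula internally is an assumption.

```r
# Hypothetical raw importance scores (made-up numbers)
imp <- c(wt = 4.2, hp = 2.1, qsec = 0.7)

# Rescale so that the largest score equals 100
imp_scaled <- 100 * imp / max(imp)

print(imp_scaled)
```

Scaling changes only the units of the scores, not their ordering, so the ranking of features is unaffected.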
vip() gained two new arguments, size
and shape, for controlling the size and shape of the points
whenever bar = FALSE (#9).
Added support for "H2OBinomialModel",
"H2OMultinomialModel", and
"H2ORegressionModel" objects (#8).