The most significant updates are the addition of p-values for the ALE statistics, the launch of a pkgdown website, which will henceforth host the development version of the package, and parallelization of core functions with a resulting performance boost.
One of the key goals for the {ale} package is that it be truly model-agnostic: it should support any R object that can be considered a model, where a model is defined as an object that makes a prediction for each input row of data that it is provided. Towards this goal, we had to adjust the custom predict function to make it more flexible for various kinds of model objects. We are happy that our changes now enable support for tidymodels objects and various survival models (but for now, only those that return single-vector predictions). So, in addition to taking the required object and newdata arguments, the custom predict function pred_fun in the ale() function now also requires an argument for type to specify the prediction type, whether it is used or not. This change breaks previous code that used custom predict functions, but it allows ale to analyze many more model types than before. Code that did not require custom predict functions should not be affected by this change. See the updated documentation of the ale() function for details.
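As a rough illustration of the new three-argument shape, here is a minimal sketch in base R. The function name custom_predict and the toy lm model are assumptions made for this example only; they are not taken from the package. The sketch shows a function that accepts object, newdata, and type and returns one prediction per input row.

```r
# Hypothetical custom predict function (the name `custom_predict` is illustrative).
# It accepts `object`, `newdata`, and `type`, matching the three arguments
# that the changelog says a custom predict function must now take.
custom_predict <- function(object, newdata, type = "response") {
  # Return a plain numeric vector with one prediction per row of `newdata`
  as.numeric(stats::predict(object, newdata = newdata, type = type))
}

# Toy model on a built-in dataset, used only to exercise the function
model <- stats::lm(mpg ~ wt + hp, data = mtcars)
preds <- custom_predict(model, mtcars, type = "response")

# One prediction per input row
length(preds) == nrow(mtcars)
```

A function of this shape would presumably be supplied through the pred_fun argument of ale(); the exact calling convention is described in the ale() documentation.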
Another change that breaks former code is that the arguments for model_bootstrap() have been modified. Instead of a cumbersome model_call_string, model_bootstrap() now uses the {insight} package to automatically detect many R models and directly manipulate the model object as needed. So, the second argument is now the model object. However, for non-standard models that {insight} cannot automatically parse, a modified model_call_string is still available to ensure model-agnostic functionality. Although this change breaks former code that ran model_bootstrap(), we believe that the new function interface is much more user-friendly.
A slight change that might break some existing code is that the conf_regions output associated with ALE statistics has been restructured. The new structure provides more useful information. See help(ale) for details.
pkgdown website located at https://tripartio.github.io/ale/. This is where the most recent development features will be documented.
create_p_funs() function for details and an example.
vignette('ale-statistics') for details. The vignette has been expanded with more details on how to properly interpret normalized ALE statistics.
vignette('ale-statistics') for details.
{furrr} library. In our tests, we typically found speed-ups of n - 2, where n is the number of physical cores (machine learning is generally unable to use logical cores). For example, a computer with 4 physical cores should see at least ×2 speed-up and a computer with 6 physical cores should see at least ×4 speed-up. However, parallelization is tricky with our model-agnostic design. When users work with models that follow standard R conventions, the {ale} package should be able to automatically configure the system for parallelization. But for some non-standard models, users may have to explicitly list the model’s packages in the new model_packages argument so that each parallel thread can find all necessary functions. This is only a concern if you encounter unexpected errors. See help(ale) for details.
ale() function. See help(ale) for details.
median_band_pct argument to ale() now takes a vector of two numbers, one for the inner band and one for the outer.
{gridExtra} with {patchwork} for examples and vignettes for printing plots.
ale() function documentation from ale-package documentation.
alt tags to describe plots for accessibility.
{insight} package to automatically detect y_col and model call objects when possible; this increases the range of automatic model detection of the ale package in general.
{progressr} package for progress bars. With the cli progression handler, this enables accurate estimated times of arrival (ETA) for long procedures, even with parallel computing. A message is displayed once per session informing users of how to customize their progress bars. For details, see help(ale), particularly the documentation on progress bars and the silent argument.
{ggplot2} from a dependency to an import. So, it is no longer automatically loaded with the package.
var_summary() function. In particular, it encodes whether the user is using p-values (ALER band) or not (median band).
validation.R file.
compact_plots to plotting functions to strip plot environments to reduce the size of returned objects. See help(ale) for details.
package_scope environment.
ale_ixn()).
ale_ixn()).
ale() does not yet support multi-output model prediction types (e.g., multi-class classification and multi-time survival probabilities).
This version introduces various ALE-based statistics that let ALE be used for statistical inference, not just interpretable machine learning. A dedicated vignette introduces this functionality (see “ALE-based statistics for statistical inference and effect sizes” from the vignettes link on the main CRAN page at https://CRAN.R-project.org/package=ale). We introduce these statistics in detail in a working paper: Okoli, Chitu. 2023. “Statistical Inference Using Machine Learning and Classical Techniques Based on Accumulated Local Effects (ALE).” arXiv. https://doi.org/10.48550/arXiv.2310.09877. Please note that they might be further refined after peer review.
ale() and model_bootstrap() now output these statistics. (ale_ixn() will come later.)
ale package with the reference {ALEPlot} package: “Comparison between {ALEPlot} and {ale} packages” (available from the vignettes link on the main CRAN page at https://CRAN.R-project.org/package=ale).
var_cars is a modified version of mtcars that features many different types of variables.
census is a polished version of the adult income dataset used for a vignette in the {ALEPlot} package.
silent = TRUE to ale(), ale_ixn(), or model_bootstrap().
seed argument to ale(), ale_ixn(), or model_bootstrap().
By far the most extensive changes have been to ensure the accuracy and stability of the package from a software engineering perspective. Even though these are not visible to users, they make the package more robust with hopefully fewer bugs. Indeed, the extensive data validation may help users debug their own errors.
{assertthat} package; if not, the function fails quickly with an appropriate error message.
{testthat} package is now used for testing the outputs of each user-facing function. This should help the code base to be more robust going forward with future developments.
{ALEPlot} package. These tests should ensure that any future code that breaks the accuracy of ALE calculations will be caught quickly.
ale_ixn()).
ale_ixn()).
This is the first CRAN release of the ale package. Here is its official description with the initial release:
Accumulated Local Effects (ALE) were initially developed as a model-agnostic approach for global explanations of the results of black-box machine learning algorithms. (Apley, Daniel W., and Jingyu Zhu. “Visualizing the effects of predictor variables in black box supervised learning models.” Journal of the Royal Statistical Society Series B: Statistical Methodology 82.4 (2020): 1059-1086 doi:10.1111/rssb.12377.) ALE has two primary advantages over other approaches like partial dependency plots (PDP) and SHapley Additive exPlanations (SHAP): its values are not affected by the presence of interactions among variables in a model and its computation is relatively rapid. This package rewrites the original code from the ‘ALEPlot’ package for calculating ALE data and it completely reimplements the plotting of ALE values.
(This package uses the same GPL-2 license as the {ALEPlot} package.)
This initial release replicates the full functionality of the {ALEPlot} package and a lot more. It currently presents three functions:
ale(): create data for and plot one-way ALE (single variables). ALE values may be bootstrapped.
ale_ixn(): create data for and plot two-way ALE interactions. Bootstrapping of the interaction ALE values has not yet been implemented.
model_bootstrap(): bootstrap an entire model, not just the ALE values. This function returns the bootstrapped model statistics and coefficients as well as the bootstrapped ALE values. This is the appropriate approach for small samples.
This release provides more details in the following vignettes (they are all available from the vignettes link on the main CRAN page at https://CRAN.R-project.org/package=ale):
ale package
ale() function handling of various datatypes for x