% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/cv_super_learner.R
\name{cv_super_learner}
\alias{cv_super_learner}
\title{Cross-Validating a \code{super_learner}}
\usage{
cv_super_learner(
  data,
  learners,
  formulas,
  y_variable = NULL,
  n_folds = 5,
  determine_super_learner_weights = determine_super_learner_weights_nnls,
  ensemble_or_discrete = "ensemble",
  cv_schema = cv_random_schema,
  outcome_type = "continuous",
  extra_learner_args = NULL,
  cluster_ids = NULL,
  strata_ids = NULL,
  weights = NULL,
  loss_metric,
  use_complete_cases = FALSE
)
}
\arguments{
\item{data}{Data to use in training a \code{super_learner}.}

\item{learners}{A list of predictor/closure-returning-functions. See Details.}

\item{formulas}{Either a single regression formula or a vector of regression formulas.}

\item{y_variable}{Typically \code{y_variable} can be inferred automatically from the \code{formulas}, but if needed, \code{y_variable} can be specified explicitly.}

\item{n_folds}{The number of cross-validation folds to use in constructing the \code{super_learner}.}

\item{determine_super_learner_weights}{A function/method to determine the weights for each of the candidate \code{learners}. The default is to use \code{determine_super_learner_weights_nnls}.}

\item{ensemble_or_discrete}{Defaults to \code{'ensemble'}, but can be set to \code{'discrete'}. Discrete \code{super_learner()} gives exactly one of the candidate learners weight 1 in the resulting prediction algorithm,
while \code{ensemble} \code{super_learner()} combines predictions from one or more candidate learners, with respective weights adding up to 1.}

\item{cv_schema}{A function that takes \code{data} and \code{n_folds} and returns a list containing \code{training_data} and \code{validation_data}, each of which is a list of \code{n_folds} data frames.}

\item{outcome_type}{One of \code{'continuous'}, \code{'binary'}, \code{'multiclass'}, or \code{'density'}. \code{outcome_type} is used to infer the correct \code{determine_super_learner_weights} function if it is not explicitly passed.}

\item{extra_learner_args}{A list of equal length to the \code{learners} with additional arguments to pass to each of the specified learners.}

\item{cluster_ids}{(default: \code{NULL}) If specified, each cluster is assigned entirely to either training or validation (not both) in each cross-validation split.}

\item{strata_ids}{(default: \code{NULL}) If specified, strata are balanced across the training and validation splits so that every stratum appears in both.}

\item{weights}{If specified, (per observation) weights are used to
indicate that risk minimization across models (i.e., the meta-learning
step) should be targeted to higher weight observations.}

\item{loss_metric}{A loss metric function, like the mean-squared-error or negative-log-loss to be
used in evaluating the learners on held-out data and minimized through convex optimization.
A loss metric should take two (vector) arguments:
predictions, and true outcomes, and produce a single statistic summarizing the
performance of each learner. Defaults to the mean-squared-error \code{nadir:::mse()}.}

\item{use_complete_cases}{(default: \code{FALSE}) If \code{TRUE} and the \code{data} passed contain any missing values (\code{NA} or \code{NaN}), restrict the \code{data} to
\code{data[complete.cases(data),]}.}
}
\value{
A list containing \code{$trained_learners} and \code{$cv_loss}, which
respectively hold 1) the super learner models trained on each fold of the data, together with their holdout predictions, and
2) the cross-validated estimate of the risk (expected loss) on held-out data.
}
\description{
Produce cv-rmse for a \code{super_learner} specified by a closure that
accepts data and returns a \code{super_learner} prediction function.
}
\details{
The idea is that \code{cv_super_learner} splits the data into training/validation
splits, trains \code{super_learner} on each training split, and then
evaluates their predictions on the held-out validation data, calculating
a root-mean-squared-error on those held-out data.

This function prints a message if the \code{loss_metric} argument is
not set explicitly, letting the user know that the mean-squared-error will be
used by default. Pass in \code{loss_metric = nadir:::mse} to
\code{cv_super_learner()} if you'd like to suppress this message, or use a
similar approach for the appropriate loss function depending on context.
}
\examples{

  cv_super_learner(
    data = mtcars,
    formulas = mpg ~ cyl + hp,
    learners = list(lnr_mean, lnr_lm))
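
  # A minimal sketch, under stated assumptions, of the loss_metric and
  # cv_schema contracts described in the arguments. The helpers mae_loss and
  # cv_interleaved_schema are hypothetical and are not part of the package.

  # A loss metric takes two vectors (predictions, then true outcomes) and
  # returns a single summary statistic; here, the mean absolute error.
  mae_loss <- function(predictions, outcomes) {
    mean(abs(predictions - outcomes))
  }
  mae_loss(c(1, 2, 3), c(1, 2, 5))

  # A cv_schema takes data and n_folds and returns a list with training_data
  # and validation_data, each a list of n_folds data frames. This version
  # assigns rows to folds in a fixed, interleaved order instead of at random.
  cv_interleaved_schema <- function(data, n_folds = 5) {
    fold_id <- rep_len(seq_len(n_folds), nrow(data))
    list(
      training_data = lapply(
        seq_len(n_folds),
        function(k) data[fold_id != k, , drop = FALSE]),
      validation_data = lapply(
        seq_len(n_folds),
        function(k) data[fold_id == k, , drop = FALSE])
    )
  }
  splits <- cv_interleaved_schema(mtcars, n_folds = 4)
  sapply(splits$validation_data, nrow)

  # Assumption: a custom loss is supplied through the loss_metric argument
  # shown in the usage. The guard only keeps this snippet runnable when it is
  # copied outside of the package.
  if (requireNamespace("nadir", quietly = TRUE)) {
    nadir::cv_super_learner(
      data = mtcars,
      formulas = mpg ~ cyl + hp,
      learners = list(nadir::lnr_mean, nadir::lnr_lm),
      loss_metric = mae_loss)
  }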

}
