Builtin CPOs can be listed with listCPO().
listCPO()[, c("name", "category", "subcategory")]
NULLCPO is the neutral element of %>>%. It is returned by some functions when no other CPO or Retrafo is present.
NULLCPO
is.nullcpo(NULLCPO)
NULLCPO %>>% cpoScale()
NULLCPO %>>% NULLCPO
print(as.list(NULLCPO))
pipeCPO(list())
A simple CPO with one parameter which gets applied to the data as a CPO. This is different from a multiplexer in that its parameter is free and can take any value that behaves like a CPO. On the downside, this does not expose the wrapped CPO's parameters to the outside.
cpa = cpoWrap()
print(cpa, verbose = TRUE)
head(iris %>>% setHyperPars(cpa, wrap.cpo = cpoScale()))
head(iris %>>% setHyperPars(cpa, wrap.cpo = cpoPca()))
# attaching the cpo applicator to a learner gives this learner a "cpo" hyperparameter
# that can be set to any CPO.
getParamSet(cpoWrap() %>>% makeLearner("classif.logreg"))
Combine many CPOs into one, with an extra selected.cpo parameter that chooses between them.
cpm = cpoMultiplex(list(cpoScale, cpoPca))
print(cpm, verbose = TRUE)
head(iris %>>% setHyperPars(cpm, selected.cpo = "scale"))
# every CPO's Hyperparameters are exported
head(iris %>>% setHyperPars(cpm, selected.cpo = "scale", scale.center = FALSE))
head(iris %>>% setHyperPars(cpm, selected.cpo = "pca"))
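Because the multiplexer exports selected.cpo along with the inner CPOs' hyperparameters, the choice of preprocessing can itself be tuned once the multiplexer is attached to a learner. The following is a minimal sketch, assuming mlr's tuning API (tuneParams, makeTuneControlGrid) and the "classif.lda" learner are available:

```r
library(mlr)
library(mlrCPO)

# attach the multiplexer to a learner; its parameters become learner parameters
lrn = cpoMultiplex(list(cpoScale, cpoPca)) %>>% makeLearner("classif.lda")

# tune over which preprocessing operation to apply
ps = makeParamSet(makeDiscreteParam("selected.cpo", values = c("scale", "pca")))
res = tuneParams(lrn, iris.task, cv3, par.set = ps,
  control = makeTuneControlGrid())
res$x  # the preprocessing choice with the best cross-validated performance
```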
A CPO that builds data-dependent CPO networks. This is a generalized CPO multiplexer that takes a function which decides (from the data, and from user-specified hyperparameters) what CPO operation to perform. Besides optional arguments, the used CPOs' hyperparameters are exported as well. This is a generalization of cpoMultiplex; however, the `requires` expressions of the involved parameters are not adjusted, since this is impossible in principle.
s.and.p = cpoCase(pSS(logical.param: logical),
  export.cpos = list(cpoScale(), cpoPca()),
  cpo.build = function(data, target, logical.param, scale, pca) {
    if (logical.param || mean(data[[1]]) > 10) {
      scale %>>% pca
    } else {
      pca %>>% scale
    }
  })
print(s.and.p, verbose = TRUE)
The resulting CPO s.and.p performs scaling and PCA, with the order depending on the parameter logical.param and on whether the mean of the data's first column exceeds 10. If either of those is true, the data will first be scaled, then PCA'd; otherwise the order is reversed. All CPOs listed in export.cpos are passed to the cpo.build function.
cbinds the results of other CPOs as its operation. The cbinder makes it possible to build DAGs of CPOs that perform different operations on data and paste the results next to each other.
scale = cpoScale(id = "scale")
scale.pca = scale %>>% cpoPca()
cbinder = cpoCbind(scaled = scale, pcad = scale.pca, original = NULLCPO)
# cpoCbind recognises that "scale.scale" happens before "pca.pca" but is also fed to the
# result directly. The summary draws a (crude) ascii-art graph.
print(cbinder, verbose = TRUE)
head(iris %>>% cbinder)
# the unnecessary copies of "Species" are unfortunate. Remove them with cpoSelect:
selector = cpoSelect(type = "numeric")
cbinder.select = cpoCbind(scaled = selector %>>% scale, pcad = selector %>>% scale.pca, original = NULLCPO)
cbinder.select
head(iris %>>% cbinder.select)
# alternatively, we apply the cbinder only to numerical data
head(iris %>>% cpoWrap(cbinder, affect.type = "numeric"))
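As with any CPO, applying the trained cbinder yields a retrafo that replays the whole DAG on new data. A short sketch, reusing the cbinder object defined above:

```r
# applying the cbinder to training data also stores the trained retrafo
transformed = iris %>>% cbinder
rf = retrafo(transformed)

# the same DAG (scaling, PCA rotation, cbind) is replayed on new data
newdata = iris[c(1, 51, 101), ]
newdata %>>% rf
```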
cpoTransformParams wraps another CPO and sets some of its hyperparameters to the value of expressions depending on other hyperparameter values. This can be used to make a transformation of parameters similar to the trafo parameter of a Param in ParamHelpers, but it can also be used to set multiple parameters at the same time, depending on a single new parameter.
cpo = cpoTransformParams(cpoPca(), alist(pca.scale = pca.center))
retr = pid.task %>|% setHyperPars(cpo, pca.center = FALSE)
getCPOTrainedState(retr)$control  # both 'center' and 'scale' are FALSE

mplx = cpoMultiplex(list(cpoIca(export = "n.comp"), cpoPca(export = "rank")))
!mplx
mtx = cpoTransformParams(mplx, alist(ica.n.comp = comp, pca.rank = comp),
  pSS(comp: integer[1, ]), list(comp = 1))
head(iris %>>% setHyperPars(mtx, selected.cpo = "ica", comp = 2))
head(iris %>>% setHyperPars(mtx, selected.cpo = "pca", comp = 3))
Implements the base::scale function.
df = data.frame(a = 1:3, b = -(1:3) * 10)
df %>>% cpoScale()
df %>>% cpoScale(scale = FALSE)  # center = TRUE
Implements stats::prcomp. No scaling or centering is performed.
df %>>% cpoPca()
Dummy encoding of factorial variables. Optionally uses the first factor as reference variable.
head(iris %>>% cpoDummyEncode())
head(iris %>>% cpoDummyEncode(reference.cat = TRUE))
Select only certain columns of a dataset to use, by column index, name, or regex pattern.
head(iris %>>% cpoSelect(pattern = "Width"))
# selection is additive
head(iris %>>% cpoSelect(pattern = "Width", type = "factor"))
Drops constant features or numerics, with adjustable tolerance.
head(iris) %>>% cpoDropConstants()  # drops 'Species'
head(iris) %>>% cpoDropConstants(abs.tol = 0.2) # also drops 'Petal.Width'
Drops unused factors and makes sure prediction data has the same factor levels as training data.
levels(iris$Species)
irisfix = head(iris) %>>% cpoFixFactors()  # Species only has level 'setosa' in train
levels(irisfix$Species)
rf = retrafo(irisfix)
iris[c(1, 100, 140), ]
iris[c(1, 100, 140), ] %>>% rf
Creates columns indicating missing data. Most useful in combination with cpoCbind.
impdata = df
impdata[[1]][1] = NA
impdata
impdata %>>% cpoMissingIndicators()
impdata %>>% cpoCbind(NULLCPO, dummy = cpoMissingIndicators())
Apply a univariate function to data columns.
head(iris %>>% cpoApplyFun(function(x) sqrt(x) - 10, affect.type = "numeric"))
Convert (non-numeric) features to numeric.
head(iris[sample(nrow(iris), 10), ] %>>% cpoAsNumeric())
Combine low-prevalence factor levels. Set max.collapsed.class.prevalence to control how big the combined factor level may be.
iris2 = iris
iris2$Species = factor(c("a", "b", "c", "b", "b", "c", "b", "c",
  as.character(iris2$Species[-(1:8)])))
head(iris2, 10)
head(iris2 %>>% cpoCollapseFact(max.collapsed.class.prevalence = 0.2), 10)
Specify which columns get used, and how they are transformed, using a formula.
head(iris %>>% cpoModelMatrix(~0 + Species:Petal.Width))
# use . + ... to retain originals
head(iris %>>% cpoModelMatrix(~0 + . + Species:Petal.Width))
Scale values to a given range.
head(iris %>>% cpoScaleRange(-1, 1))
Multiply features to set the maximum absolute value.
head(iris %>>% cpoScaleMaxAbs(0.1))
Normalize values row-wise.
head(iris %>>% cpoSpatialSign())
There are two general and many specialised imputation CPOs. The general imputation CPOs have parameters that let them use different imputation methods on different columns. They are a thin wrapper around mlr's impute() and reimpute() functions. The specialised imputation CPOs each implement exactly one imputation method and are closer to the behaviour of typical CPOs.
cpoImpute and cpoImputeAll both have parameters very much like impute(). The latter assumes that all columns of its input are somehow being imputed, and can be prepended to a learner to give it the ability to work with missing data. It will, however, throw an error if data is missing after imputation.
impdata %>>% cpoImpute(cols = list(a = imputeMedian()))
impdata %>>% cpoImpute(cols = list(b = imputeMedian()))  # NAs remain
impdata %>>% cpoImputeAll(cols = list(b = imputeMedian()))  # error, since NAs remain
missing.task = makeRegrTask("missing.task", impdata, target = "b")
# the following gives an error, since 'cpoImpute' does not make sure all missings are removed
# and hence does not add the 'missings' property.
train(cpoImpute(cols = list(a = imputeMedian())) %>>% makeLearner("regr.lm"), missing.task)
# instead, the following works:
train(cpoImputeAll(cols = list(a = imputeMedian())) %>>% makeLearner("regr.lm"), missing.task)
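Since the imputing learner is an ordinary mlr learner, it can also be resampled directly; missing values in each split are then imputed using only the respective training data. A sketch, reusing the missing.task from above:

```r
# prepending cpoImputeAll gives "regr.lm" the 'missings' property,
# so resampling works even though the data contains NAs
imp.lrn = cpoImputeAll(cols = list(a = imputeMedian())) %>>% makeLearner("regr.lm")
resample(imp.lrn, missing.task, hout)  # 'hout' is mlr's holdout resampling description
```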
There is one for each imputation method.
impdata %>>% cpoImputeConstant(10)
getTaskData(missing.task %>>% cpoImputeMedian())
# The specialised impute CPOs are:
listCPO()[listCPO()$category == "imputation" & listCPO()$subcategory == "specialised",
c("name", "description")]
There is one general and many specialised feature filtering CPOs. The general filtering CPO, cpoFilterFeatures, is a thin wrapper around filterFeatures and takes the filtering method as its argument. The specialised CPOs each call a specific filtering method.
Most arguments of filterFeatures are reflected in the CPOs. The exceptions are:
1. For filterFeatures, the filter method arguments are given in a list filter.args, instead of in ...
2. The argument fval was dropped for the specialised filter CPOs.
3. The argument mandatory.feat was dropped. Use affect.* parameters to prevent features from being filtered.
head(getTaskData(iris.task %>>% cpoFilterFeatures(method = "variance", perc = 0.5)))
head(getTaskData(iris.task %>>% cpoFilterVariance(perc = 0.5)))
# The specialised filter CPOs are:
listCPO()[listCPO()$category == "featurefilter" & listCPO()$subcategory == "specialised",
c("name", "description")]