Ensemble learner: A machine learning algorithm that uses the input from multiple base
learners to inform its predictions.
H2O: An open source machine learning library with a distributed, Java-based back-end.
Level-one data: The independent predictions generated from validation (typically V-fold
cross-validation) of the base learners. This data is the input to the metalearner. This is
also called the set of cross-validated predicted values.
Level-zero data: The original training dataset that is used to train the base learners.
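As a minimal sketch of how level-one data can be built from level-zero data, the example below cross-validates two toy base learners and collects their held-out predictions into a matrix with one column per learner. The names fit_mean, fit_line, and level_one_data, the toy dataset, and the round-robin fold assignment are all assumptions made for illustration.

```python
# Hypothetical base learners: each "fit" function returns a "predict" function.
def fit_mean(xs, ys):
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    # Ordinary least squares with a single feature: y = a + b * x.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    return lambda x: a + b * x


def level_one_data(xs, ys, learners, v):
    """Return an n-by-(number of learners) matrix of cross-validated predictions."""
    n = len(xs)
    folds = [i % v for i in range(n)]  # assumed round-robin fold assignment
    z = [[None] * len(learners) for _ in range(n)]
    for fold in range(v):
        train = [i for i in range(n) if folds[i] != fold]
        held_out = [i for i in range(n) if folds[i] == fold]
        for j, fit in enumerate(learners):
            # Fit on the other folds, then predict only the held-out rows.
            predict = fit([xs[i] for i in train], [ys[i] for i in train])
            for i in held_out:
                z[i][j] = predict(xs[i])
    return z


# Level-zero data (toy example): y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0, 11.0]
Z = level_one_data(xs, ys, [fit_mean, fit_line], v=3)
```

Each row of Z holds predictions made by models that never saw that row during training, which is what makes the level-one data suitable as input to a metalearner.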
Loss function, objective function: A loss function is a function that maps an event or
values of one or more variables onto a real number intuitively representing some cost
associated with the event. An optimization problem seeks to minimize a loss function.
Metalearner: A supervised machine learning algorithm that is used to learn the optimal
combination of the base learners. This can also be an optimization method such as
nonnegative least-squares (NNLS), COBYLA, or L-BFGS-B for finding the optimal
linear combination of the base learners.
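To illustrate the NNLS case, the sketch below finds nonnegative weights for two columns of level-one predictions. It uses plain cyclic coordinate descent as a stand-in for a dedicated NNLS solver; the function name nnls_coordinate_descent, the sweep count, and the toy numbers are assumptions, not part of any particular library.

```python
def nnls_coordinate_descent(z, y, sweeps=200):
    """Minimize sum_i (y_i - sum_j w_j * z[i][j])**2 subject to w_j >= 0.

    z is the level-one matrix (rows = observations, columns = base learners).
    """
    n, p = len(z), len(z[0])
    w = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            num = 0.0
            den = 0.0
            for i in range(n):
                # Fit of all other learners, with learner j left out.
                partial = sum(w[k] * z[i][k] for k in range(p) if k != j)
                num += z[i][j] * (y[i] - partial)
                den += z[i][j] ** 2
            # Unconstrained one-dimensional optimum, clipped at zero.
            w[j] = max(0.0, num / den) if den > 0 else 0.0
    return w


# Toy level-one data: the outcome is exactly 0.25 * learner 0 + 0.75 * learner 1.
Z = [[1.0, 4.0], [2.0, 1.0], [3.0, 3.0], [4.0, 0.0]]
y = [0.25 * a + 0.75 * b for a, b in Z]
weights = nnls_coordinate_descent(Z, y)
```

When the unconstrained least-squares solution already has nonnegative coefficients, as here, the constraint is inactive and the weights match it; otherwise the offending weights are held at zero.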
Online (or sequential) learning: Online learning, as opposed to batch learning, involves
using a stream of data for training examples. In online methods, the model fit is updated,
or learned, incrementally.
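The incremental update can be sketched with stochastic gradient descent on a linear model under squared-error loss, where each example is seen once and then discarded. The stream generator, the learning rate, and the function names below are hypothetical choices for illustration.

```python
def sgd_update(weights, x, y, learning_rate=0.2):
    """One stochastic gradient step for squared-error loss on a linear model."""
    prediction = sum(w * xi for w, xi in zip(weights, x))
    error = prediction - y
    return [w - learning_rate * error * xi for w, xi in zip(weights, x)]


def stream(n_examples=5000):
    # Hypothetical data stream: y = 1 + 2 * x1, with a constant 1.0 as the bias input.
    for step in range(n_examples):
        x1 = (step % 10) / 10.0
        yield [1.0, x1], 1.0 + 2.0 * x1


weights = [0.0, 0.0]
for x, y in stream():
    # The model fit is updated one example at a time; no batch is stored.
    weights = sgd_update(weights, x, y)
```

On this noiseless stream the weights approach the intercept 1 and slope 2 used to generate the data.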
Online Super Learner (OSL): An online implementation of the Super Learner algorithm
that uses stochastic gradient descent for incremental learning.
Oracle selector: The estimator, among all possible weighted combinations of the base
prediction functions, which minimizes risk under the true data-generating distribution.
Rank loss: The rank loss is a name for the quantity 1 − AUC, where AUC is the area under
the ROC curve.
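A small sketch of the rank loss for binary labels follows. It computes AUC as the fraction of (positive, negative) pairs in which the positive example receives the higher score, counting ties as one half; the function names and the toy scores are assumptions.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparisons (ties count as 1/2)."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def rank_loss(labels, scores):
    return 1.0 - auc(labels, scores)


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
loss = rank_loss(labels, scores)  # 3 of 4 pairs are ordered correctly, so loss = 0.25
```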
Squared-error loss: The squared-error loss of an estimator measures the average of the
squares of the errors, that is, the differences between the estimator and what is estimated.
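As a short illustration, the average squared error over a set of predictions can be computed as below. The function name and the numbers are hypothetical.

```python
def mean_squared_error(predictions, targets):
    """Average of the squared differences between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


predictions = [2.0, 4.0, 7.0]
targets = [1.0, 4.0, 5.0]
mse = mean_squared_error(predictions, targets)  # (1 + 0 + 4) / 3
```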
Stacking, stacked generalization, stacked regression: Stacking is a broad class of
algorithms that involves training a second-level metalearner to ensemble a group of
base learners. For prediction, the Super Learner algorithm is equivalent to generalized
stacking.
Subsemble: Subsemble is a general subset ensemble prediction method which partitions
the full dataset into subsets of observations, fits a specified underlying algorithm on
each subset, and uses a unique form of V-fold cross-validation to output a prediction
function that combines the subset-specific fits. An oracle result provides a theoretical
performance guarantee for Subsemble.
Super Learner (SL): Super Learner is an ensemble algorithm that takes as input a library of
supervised learning algorithms and a metalearning algorithm. SL uses cross-validation
to data-adaptively select the best way to combine the algorithms. It is general since
it can be applied to any loss function L(ψ) (and thus to the corresponding risk, the
expected value of L(ψ)), or to any risk function R(ψ). It is optimal in the sense of
asymptotic equivalence with the oracle selector, as implied by the oracle inequality.
V-fold cross-validation: Another name for k-fold cross-validation. In k-fold
cross-validation, the data is partitioned into k folds, and then a model is trained using the
observations from k − 1 folds. Next, the model is evaluated on the held-out fold. This is
repeated k times, and the estimates are averaged over the k folds.
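The procedure can be sketched in a few lines. The example assumes a round-robin partition into folds, a toy learner that always predicts the training mean, and squared-error loss; all function names are hypothetical.

```python
def v_fold_indices(n, v):
    """Split indices 0..n-1 into v folds of near-equal size (round-robin)."""
    return [list(range(fold, n, v)) for fold in range(v)]


def cross_validated_risk(xs, ys, fit, loss, v):
    """Average, over the v folds, of the mean loss on each held-out fold."""
    fold_risks = []
    for held_out in v_fold_indices(len(xs), v):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        # Train on the other v - 1 folds, then evaluate on the held-out fold.
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        risk = sum(loss(predict(xs[i]), ys[i]) for i in held_out) / len(held_out)
        fold_risks.append(risk)
    return sum(fold_risks) / v


def fit_mean(xs, ys):
    # Toy learner: ignore the features and predict the training mean.
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0, 4.0]
risk = cross_validated_risk(xs, ys, fit_mean, lambda p, t: (p - t) ** 2, v=2)
```

With v = 2 the folds are {0, 2} and {1, 3}; each held-out fold has a mean squared error of 2, so the averaged estimate is 2.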
Vowpal Wabbit (VW): An open source, out-of-core, online machine learning library
written in C++.
