Ensemble learner: A machine learning algorithm that uses the input from multiple base
learners to inform its predictions.
H2O: An open source machine learning library with a distributed, Java-based back-end.
Level-one data: The independent predictions generated from validation (typically V -fold
cross-validation) of the base learners. This data is the input to the metalearner. This is
also called the set of cross-validated predicted values.
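For concreteness, here is a minimal base-R sketch of how level-one data might be generated. The two base learners (an ordinary linear model and a grand-mean predictor) and the simulated data are illustrative placeholders, not the learner library discussed in the chapter.

```r
# Minimal sketch: generating level-one data with V-fold cross-validation.
# The base learners (a linear model and a grand-mean model) are illustrative.
set.seed(1)
n <- 200
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
Y <- 2 * X$x1 - X$x2 + rnorm(n)

V <- 5
folds <- sample(rep(1:V, length.out = n))   # fold assignment for each observation

# Columns of Z hold the cross-validated (held-out) predictions of each base learner.
Z <- matrix(NA_real_, nrow = n, ncol = 2,
            dimnames = list(NULL, c("lm", "mean")))
for (v in 1:V) {
  train <- folds != v
  fit_lm <- lm(Y ~ x1 + x2, data = cbind(X, Y = Y)[train, ])
  Z[!train, "lm"]   <- predict(fit_lm, newdata = X[!train, ])
  Z[!train, "mean"] <- mean(Y[train])
}
head(Z)   # the level-one data passed to the metalearner
```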
Level-zero data: The original training dataset that is used to train the base learners.
Loss function, objective function: A loss function is a function that maps an event or
values of one or more variables onto a real number intuitively representing some cost
associated with the event. An optimization problem seeks to minimize a loss function.
Metalearner: A supervised machine learning algorithm that is used to learn the optimal
combination of the base learners. This can also be an optimization method such as
nonnegative least-squares (NNLS), COBYLA, or L-BFGS-B for finding the optimal
linear combination of the base learners.
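Continuing the level-one data sketch above, a metalearning step could look like the following. This assumes the nnls package (a Lawson-Hanson NNLS implementation) is installed; normalizing the weights to sum to one follows the usual convention for an NNLS metalearner.

```r
# Minimal sketch of an NNLS metalearner, using the level-one data Z and the
# outcome Y from the sketch above. Assumes the 'nnls' package is available.
library(nnls)

fit_meta <- nnls(A = Z, b = Y)   # minimize ||Z w - Y||^2 subject to w >= 0
w <- fit_meta$x
w <- w / sum(w)                  # normalize so the weights sum to one
w

# Ensemble prediction: a convex combination of the base learner predictions.
ensemble_pred <- as.vector(Z %*% w)
```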
Online (or sequential) learning: Online learning, as opposed to batch learning, involves learning from a stream of training examples. In online methods, the model fit is updated, or learned, incrementally as new data arrive.
Online Super Learner (OSL): An online implementation of the Super Learner algo-
rithm that uses stochastic gradient descent for incremental learning.
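The incremental update can be illustrated with a small R sketch of stochastic gradient descent for a linear model under squared-error loss. This is illustrative only; it is not the chapter's Vowpal Wabbit-based OSL implementation, and the learning rate and simulated stream are arbitrary choices.

```r
# Minimal sketch of incremental learning with stochastic gradient descent (SGD)
# for a linear model under squared-error loss.
sgd_update <- function(w, x, y, lr = 0.01) {
  pred <- sum(w * x)
  grad <- (pred - y) * x        # gradient of 0.5 * (pred - y)^2 with respect to w
  w - lr * grad
}

set.seed(1)
w <- c(0, 0, 0)                 # intercept plus two coefficients
for (t in 1:5000) {             # simulated data stream
  x <- c(1, rnorm(2))           # leading 1 is the intercept term
  y <- 2 * x[2] - x[3] + rnorm(1, sd = 0.1)
  w <- sgd_update(w, x, y)
}
round(w, 2)                     # should approach c(0, 2, -1)
```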
Oracle selector: The estimator, among all possible weighted combinations of the base
prediction functions, which minimizes risk under the true data-generating distribution.
Rank loss: The rank loss is a name for the quantity 1 − AUC, where AUC is the area under the ROC curve.
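As a sketch, the rank loss can be computed in base R from the Mann-Whitney rank-sum form of the AUC; the simulated labels and scores below are illustrative.

```r
# Minimal sketch: rank loss as 1 - AUC, with AUC computed from the
# Mann-Whitney rank-sum statistic (base R, illustrative data).
set.seed(1)
y     <- rbinom(100, 1, 0.4)     # binary labels
score <- y + rnorm(100)          # predicted scores (higher = more positive)

n_pos <- sum(y == 1)
n_neg <- sum(y == 0)
r <- rank(score)                 # midranks handle ties
auc <- (sum(r[y == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
rank_loss <- 1 - auc
c(auc = auc, rank_loss = rank_loss)
```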
Squared-error loss: The squared-error loss of an estimator measures the average of the squares of the errors, that is, the average squared difference between the estimated values and what is being estimated.
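In the loss/risk notation used elsewhere in this glossary, the squared-error loss and its risk can be written as follows (a standard formulation, stated here for clarity; O = (X, Y) denotes an observation from the true distribution P_0):

```latex
% Squared-error loss for a prediction function \psi and its risk under P_0:
L(\psi)(O) = \bigl(Y - \psi(X)\bigr)^2,
\qquad
R_0(\psi) = E_0\, L(\psi)(O) = E_0 \bigl(Y - \psi(X)\bigr)^2 .
```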
Stacking, stacked generalization, stacked regression: Stacking is a broad class of
algorithms that involves training a second-level metalearner to ensemble a group of
base learners. For prediction, the Super Learner algorithm is equivalent to generalized
stacking.
Subsemble: Subsemble is a general subset ensemble prediction method which partitions
the full dataset into subsets of observations, fits a specified underlying algorithm on
each subset, and uses a unique form of V -fold cross-validation to output a prediction
function that combines the subset-specific fits. An oracle result provides a theoretical
performance guarantee for Subsemble.
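A highly simplified R sketch of the partition-and-combine idea follows. The simple average used to combine the subset-specific fits stands in for Subsemble's cross-validated metalearning step, and the data and underlying learner are illustrative, not the subsemble package itself.

```r
# Minimal sketch of the Subsemble idea: partition the data into disjoint subsets,
# fit the same underlying learner on each subset, and combine the subset-specific
# fits. A simple average replaces Subsemble's cross-validated combination step.
set.seed(1)
n <- 300
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- 1 + dat$x1 - 2 * dat$x2 + rnorm(n)

J <- 3
subset_id <- sample(rep(1:J, length.out = n))   # disjoint subsets of observations
fits <- lapply(1:J, function(j) lm(y ~ x1 + x2, data = dat[subset_id == j, ]))

# Combine the subset-specific fits by averaging their predictions.
P <- sapply(fits, function(f) predict(f, newdata = dat))
combined_pred <- rowMeans(P)
mean((dat$y - combined_pred)^2)                 # in-sample squared error of the combined fit
```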
Super Learner (SL): Super Learner is an ensemble algorithm that takes as input a library of supervised learning algorithms and a metalearning algorithm. SL uses cross-validation to data-adaptively select the best way to combine the algorithms. It is general since it can be applied to any loss function L(ψ) or L_η(ψ) (and thus the corresponding risk R_0(ψ) = E_0 L(ψ)), or any risk function R_{P_0}(ψ). It is optimal in the sense of asymptotic equivalence with the oracle selector, as implied by the oracle inequality.
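As a usage illustration, a call to the SuperLearner R package might look like the following. The wrapper names, method name, and cvControl argument are given from memory and should be checked against the package documentation; the simulated data are illustrative.

```r
# Minimal usage sketch of the Super Learner with the SuperLearner R package.
library(SuperLearner)

set.seed(1)
n <- 500
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
p <- plogis(X$x1 - X$x2)          # P(Y = 1 | X)
Y <- rbinom(n, 1, p)

sl <- SuperLearner(Y = Y, X = X, family = binomial(),
                   SL.library = c("SL.glm", "SL.mean"),   # base learner wrappers
                   method = "method.NNLS",                # NNLS metalearner
                   cvControl = list(V = 5))               # 5-fold cross-validation
sl$coef               # metalearner weights on the base learners
head(sl$SL.predict)   # ensemble predictions on the training data
```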
V-fold cross-validation: Another name for k-fold cross-validation. In k-fold cross-validation, the data is partitioned into k folds, and then a model is trained using the observations from k − 1 folds. Next, the model is evaluated on the held-out fold. This is repeated k times and the estimates are averaged over the k folds.
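A base-R sketch of estimating a single model's risk with V-fold cross-validation (the data, fold scheme, and model are illustrative):

```r
# Minimal sketch of V-fold cross-validation for estimating the risk of one model.
set.seed(1)
n <- 200
dat <- data.frame(x = rnorm(n))
dat$y <- sin(dat$x) + rnorm(n, sd = 0.3)

V <- 10
folds <- sample(rep(1:V, length.out = n))
cv_mse <- numeric(V)
for (v in 1:V) {
  fit <- lm(y ~ x, data = dat[folds != v, ])          # train on V - 1 folds
  pred <- predict(fit, newdata = dat[folds == v, ])   # evaluate on the held-out fold
  cv_mse[v] <- mean((dat$y[folds == v] - pred)^2)
}
mean(cv_mse)    # cross-validated estimate of the squared-error risk
```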
Vowpal Wabbit (VW): An open source, out-of-core, online machine learning library
written in C++.