Ensemble learner: A machine learning algorithm that uses the input from multiple base
learners to inform its predictions.
H2O: An open source machine learning library with a distributed, Java-based back-end.
Level-one data: The independent predictions generated from validation (typically V -fold
cross-validation) of the base learners. This data is the input to the metalearner. This is
also called the set of cross-validated predicted values.
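For concreteness, here is a minimal base-R sketch of how level-one data might be generated. The two base learners (an ordinary linear model and a grand-mean predictor) and the simulated data are illustrative placeholders, not the learner library discussed in the chapter.

```r
# Minimal sketch: generating level-one data with V-fold cross-validation.
# The base learners (a linear model and a grand-mean model) are illustrative.
set.seed(1)
n <- 200
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
Y <- 2 * X$x1 - X$x2 + rnorm(n)

V <- 5
folds <- sample(rep(1:V, length.out = n))   # fold assignment for each observation

# Columns of Z hold the cross-validated (held-out) predictions of each base learner.
Z <- matrix(NA_real_, nrow = n, ncol = 2,
            dimnames = list(NULL, c("lm", "mean")))
for (v in 1:V) {
  train <- folds != v
  fit_lm <- lm(Y ~ x1 + x2, data = cbind(X, Y = Y)[train, ])
  Z[!train, "lm"]   <- predict(fit_lm, newdata = X[!train, ])
  Z[!train, "mean"] <- mean(Y[train])
}
head(Z)   # the level-one data passed to the metalearner
```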
Level-zero data: The original training dataset that is used to train the base learners.
Loss function, objective function: A loss function is a function that maps an event or
values of one or more variables onto a real number intuitively representing some cost
associated with the event. An optimization problem seeks to minimize a loss function.
Metalearner: A supervised machine learning algorithm that is used to learn the optimal
combination of the base learners. This can also be an optimization method such as
nonnegative least-squares (NNLS), COBYLA, or L-BFGS-B for finding the optimal
linear combination of the base learners.
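Continuing the level-one data sketch above, a metalearning step could look like the following. This assumes the nnls package (a Lawson-Hanson NNLS implementation) is installed; normalizing the weights to sum to one follows the usual convention for an NNLS metalearner.

```r
# Minimal sketch of an NNLS metalearner, using the level-one data Z and the
# outcome Y from the sketch above. Assumes the 'nnls' package is available.
library(nnls)

fit_meta <- nnls(A = Z, b = Y)   # minimize ||Z w - Y||^2 subject to w >= 0
w <- fit_meta$x
w <- w / sum(w)                  # normalize so the weights sum to one
w

# Ensemble prediction: a convex combination of the base learner predictions.
ensemble_pred <- as.vector(Z %*% w)
```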
Online (or sequential) learning: Online learning, as opposed to batch learning, involves learning from a stream of training examples. In online methods, the model fit is updated, or learned, incrementally as new data arrive.
Online Super Learner (OSL): An online implementation of the Super Learner algo-
rithm that uses stochastic gradient descent for incremental learning.
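The incremental update can be illustrated with a small R sketch of stochastic gradient descent for a linear model under squared-error loss. This is illustrative only; it is not the chapter's Vowpal Wabbit-based OSL implementation, and the learning rate and simulated stream are arbitrary choices.

```r
# Minimal sketch of incremental learning with stochastic gradient descent (SGD)
# for a linear model under squared-error loss.
sgd_update <- function(w, x, y, lr = 0.01) {
  pred <- sum(w * x)
  grad <- (pred - y) * x        # gradient of 0.5 * (pred - y)^2 with respect to w
  w - lr * grad
}

set.seed(1)
w <- c(0, 0, 0)                 # intercept plus two coefficients
for (t in 1:5000) {             # simulated data stream
  x <- c(1, rnorm(2))           # leading 1 is the intercept term
  y <- 2 * x[2] - x[3] + rnorm(1, sd = 0.1)
  w <- sgd_update(w, x, y)
}
round(w, 2)                     # should approach c(0, 2, -1)
```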
Oracle selector: The estimator, among all possible weighted combinations of the base
prediction functions, which minimizes risk under the true data-generating distribution.
Rank loss: The rank loss is a name for the quantity 1 − AUC, where AUC is the area under the ROC curve.
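As a sketch, the rank loss can be computed in base R from the Mann-Whitney rank-sum form of the AUC; the simulated labels and scores below are illustrative.

```r
# Minimal sketch: rank loss as 1 - AUC, with AUC computed from the
# Mann-Whitney rank-sum statistic (base R, illustrative data).
set.seed(1)
y     <- rbinom(100, 1, 0.4)     # binary labels
score <- y + rnorm(100)          # predicted scores (higher = more positive)

n_pos <- sum(y == 1)
n_neg <- sum(y == 0)
r <- rank(score)                 # midranks handle ties
auc <- (sum(r[y == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
rank_loss <- 1 - auc
c(auc = auc, rank_loss = rank_loss)
```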
Squared-error loss: The squared-error loss of an estimator measures the average of the squares of the errors, that is, the average squared difference between the estimated values and what is being estimated.
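In the loss/risk notation used elsewhere in this glossary, the squared-error loss and its risk can be written as follows (a standard formulation, stated here for clarity; O = (X, Y) denotes an observation from the true distribution P_0):

```latex
% Squared-error loss for a prediction function \psi and its risk under P_0:
L(\psi)(O) = \bigl(Y - \psi(X)\bigr)^2,
\qquad
R_0(\psi) = E_0\, L(\psi)(O) = E_0 \bigl(Y - \psi(X)\bigr)^2 .
```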
Stacking, stacked generalization, stacked regression: Stacking is a broad class of
algorithms that involves training a second-level metalearner to ensemble a group of
base learners. For prediction, the Super Learner algorithm is equivalent to generalized
stacking.
Subsemble: Subsemble is a general subset ensemble prediction method which partitions
the full dataset into subsets of observations, fits a specified underlying algorithm on
each subset, and uses a unique form of V -fold cross-validation to output a prediction
function that combines the subset-specific fits. An oracle result provides a theoretical
performance guarantee for Subsemble.
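A highly simplified R sketch of the partition-and-combine idea follows. The simple average used to combine the subset-specific fits stands in for Subsemble's cross-validated metalearning step, and the data and underlying learner are illustrative, not the subsemble package itself.

```r
# Minimal sketch of the Subsemble idea: partition the data into disjoint subsets,
# fit the same underlying learner on each subset, and combine the subset-specific
# fits. A simple average replaces Subsemble's cross-validated combination step.
set.seed(1)
n <- 300
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- 1 + dat$x1 - 2 * dat$x2 + rnorm(n)

J <- 3
subset_id <- sample(rep(1:J, length.out = n))   # disjoint subsets of observations
fits <- lapply(1:J, function(j) lm(y ~ x1 + x2, data = dat[subset_id == j, ]))

# Combine the subset-specific fits by averaging their predictions.
P <- sapply(fits, function(f) predict(f, newdata = dat))
combined_pred <- rowMeans(P)
mean((dat$y - combined_pred)^2)                 # in-sample squared error of the combined fit
```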
Super Learner (SL): Super Learner is an ensemble algorithm that takes as input a library of supervised learning algorithms and a metalearning algorithm. SL uses cross-validation to data-adaptively select the best way to combine the algorithms. It is general since it can be applied to any loss function L(ψ) or L_η(ψ) (and thus the corresponding risk R_0(ψ) = E_0 L(ψ)), or any risk function R_{P_0}(ψ). It is optimal in the sense of asymptotic equivalence with the oracle selector, as implied by the oracle inequality.
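As a usage illustration, a call to the SuperLearner R package might look like the following. The wrapper names, method name, and cvControl argument are given from memory and should be checked against the package documentation; the simulated data are illustrative.

```r
# Minimal usage sketch of the Super Learner with the SuperLearner R package.
library(SuperLearner)

set.seed(1)
n <- 500
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
p <- plogis(X$x1 - X$x2)          # P(Y = 1 | X)
Y <- rbinom(n, 1, p)

sl <- SuperLearner(Y = Y, X = X, family = binomial(),
                   SL.library = c("SL.glm", "SL.mean"),   # base learner wrappers
                   method = "method.NNLS",                # NNLS metalearner
                   cvControl = list(V = 5))               # 5-fold cross-validation
sl$coef               # metalearner weights on the base learners
head(sl$SL.predict)   # ensemble predictions on the training data
```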
V-fold cross-validation: Another name for k-fold cross-validation. In k-fold cross-validation, the data is partitioned into k folds, and then a model is trained using the observations from k − 1 folds. Next, the model is evaluated on the held-out fold. This is repeated k times and the estimates are averaged over the k folds.
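A base-R sketch of estimating a single model's risk with V-fold cross-validation (the data, fold scheme, and model are illustrative):

```r
# Minimal sketch of V-fold cross-validation for estimating the risk of one model.
set.seed(1)
n <- 200
dat <- data.frame(x = rnorm(n))
dat$y <- sin(dat$x) + rnorm(n, sd = 0.3)

V <- 10
folds <- sample(rep(1:V, length.out = n))
cv_mse <- numeric(V)
for (v in 1:V) {
  fit <- lm(y ~ x, data = dat[folds != v, ])          # train on V - 1 folds
  pred <- predict(fit, newdata = dat[folds == v, ])   # evaluate on the held-out fold
  cv_mse[v] <- mean((dat$y[folds == v] - pred)^2)
}
mean(cv_mse)    # cross-validated estimate of the squared-error risk
```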
Vowpal Wabbit (VW): An open source, out-of-core, online machine learning library
written in C++.