References

1. Abadi M, Agarwal A, Barham P, et al. TensorFlow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint 2016; arXiv:1603.04467.

2. Abe N, Zadrozny B, Langford J. Outlier detection by active learning. Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining New York, NY: ACM Press; 2006;767–772.

3. Adriaans P, Zantinge D. Data mining Harlow: Addison-Wesley; 1996.

4. Agrawal R, Imielinski T, Swami A. Database mining: A performance perspective. IEEE Transactions on Knowledge and Data Engineering. 1993a;5(6):914–925.

5. Agrawal R, Imielinski T, Swami A. Mining association rules between sets of items in large databases. In: Buneman P, Jajodia S, eds. Proceedings of the ACM SIGMOD international conference on management of data, Washington, DC. New York, NY: ACM Press; 1993b;207–216.

6. Agrawal R, Srikant R. Fast algorithms for mining association rules in large databases. In: Bocca J, Jarke M, Zaniolo C, eds. Proceedings of the international conference on very large data bases, Santiago, Chile. San Francisco, CA: Morgan Kaufmann; 1994;478–499.

7. Aha D. Tolerating noisy, irrelevant, and novel attributes in instance-based learning algorithms. International Journal of Man-Machine Studies. 1992;36(2):267–287.

8. Almuallim H, Dietterich TG. Learning with many irrelevant features. Proceedings of the ninth national conference on artificial intelligence, Anaheim, CA Menlo Park, CA: AAAI Press; 1991;547–552.

9. Almuallim H, Dietterich TG. Efficient algorithms for identifying relevant features. Proceedings of the ninth Canadian conference on artificial intelligence, Vancouver, BC San Francisco, CA: Morgan Kaufmann; 1992;38–45.

10. Andrews S, Tsochantaridis I, Hofmann T. Support vector machines for multiple-instance learning. Proceedings of the conference on neural information processing systems, Vancouver, Canada Cambridge, MA: MIT Press; 2003;561–568.

11. Ankerst M, Breunig MM, Kriegel H-P, Sander J. OPTICS: Ordering points to identify the clustering structure. Proceedings of the ACM SIGMOD international conference on management of data New York, NY: ACM Press; 1999;49–60.

12. Arthur D, Vassilvitskii S. K-means++: The advantages of careful seeding. Proceedings of the eighteenth annual ACM-SIAM symposium on discrete algorithms New Orleans, Louisiana Philadelphia, PA: Society for Industrial and Applied Mathematics; 2007;1027–1035.

13. Asmis E. Epicurus’ scientific method Ithaca, NY: Cornell University Press; 1984.

14. Asuncion A, Newman DJ. UCI machine learning repository Irvine, CA: University of California, School of Information and Computer Science; 2007; <http://www.ics.uci.edu/~mlearn/MLRepository.html>.

15. Atkeson CG, Schaal SA, Moore AW. Locally weighted learning. AI Review. 1997;11:11–71.

16. Auer P, Ortner R. A boosting approach to multiple instance learning. Proceedings of the European conference on machine learning, Pisa, Italy Berlin: Springer-Verlag; 2004;63–74.

17. Baldi P, Hornik K. Neural networks and principal component analysis: Learning from examples without local minima. Neural Networks. 1989;2(1):53–58.

18. Barnett V, Lewis T. Outliers in statistical data West Sussex: John Wiley and Sons; 1994.

19. Bay SD. Nearest neighbor classification from multiple feature subsets. Intelligent Data Analysis. 1999;3(3):191–209.

20. Bay SD, Schwabacher M. Near linear time detection of distance-based outliers and applications to security. Proceedings of the workshop on data mining for counter terrorism and security, San Francisco Philadelphia, PA: Society for Industrial and Applied Mathematics; 2003.

21. Bayes T. An essay towards solving a problem in the doctrine of chances. Philosophical Transactions of the Royal Society of London. 1763;53:370–418.

22. Beck JR, Schultz EK. The use of ROC curves in test performance evaluation. Archives of Pathology and Laboratory Medicine. 1986;110:13–20.

23. Belhumeur PN, Hespanha JP, Kriegman DJ. Eigenfaces vs Fisherfaces: Recognition using class specific linear projection. IEEE Transactions on Pattern Analysis and Machine Intelligence. 1997;19(7):711–720.

24. Bengio Y. Learning deep architectures for AI. Foundations and Trends® in Machine Learning. 2009;2(1):1–127.

25. Bengio Y. Practical recommendations for gradient-based training of deep architectures. Neural networks: Tricks of the trade Heidelberg: Springer Berlin Heidelberg; 2012;437–478.

26. Bengio Y, Ducharme R, Vincent P, Janvin C. A neural probabilistic language model. Journal of Machine Learning Research. 2003;3:1137–1155.

27. Bengio Y, Simard P, Frasconi P. Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks. 1994;5(2):157–166.

28. Bergadano F, Gunetti D. Inductive logic programming: From machine learning to software engineering Cambridge, MA: MIT Press; 1996.

29. Bergstra J, Bengio Y. Random search for hyper-parameter optimization. The Journal of Machine Learning Research. 2012;13(1):281–305.

30. Bergstra J, Breuleux O, Bastien F, et al. Theano: A CPU and GPU math expression compiler. Proceedings of the python for scientific computing conference (SciPy). Vol. 4 Austin, TX; 2010;3 June 30–July 3.

31. Berry MJA, Linoff G. Data mining techniques for marketing, sales, and customer support New York, NY: John Wiley; 1997.

32. Besag JE. On the statistical analysis of dirty pictures. Journal of the Royal Statistical Society, Series B. 1986;48(3):259–302.

33. Beygelzimer A, Kakade S, Langford J. Cover trees for nearest neighbor. Proceedings of the 23rd international conference on machine learning New York, NY: ACM Press; 2006;97–104.

34. Bifet A, Holmes G, Kirkby R, Pfahringer B. MOA: Massive online analysis. Journal of Machine Learning Research. 2010;11:1601–1604.

35. Bigus JP. Data mining with neural networks New York, NY: McGraw Hill; 1996.

36. Bishop CM. Neural networks for pattern recognition New York, NY: Oxford University Press; 1995.

37. Bishop CM. Pattern recognition and machine learning New York, NY: Springer Verlag; 2006.

38. Bishop CM, Spiegelhalter D, Winn J. VIBES: A variational inference engine for Bayesian networks. Advances in neural information processing systems Cambridge, MA: MIT Press; 2002;777–784.

39. Blei DM, Lafferty JD. Dynamic topic models. Proceedings of the 23rd international conference on machine learning New York: ACM Press; 2006;113–120.

40. Blei DM, Ng AY, Jordan MI. Latent Dirichlet allocation. The Journal of Machine Learning Research. 2003;3:993–1022.

41. BLI (Bureau of Labour Information). Collective bargaining review (November) Ottawa, ON: Labour Canada, Bureau of Labour Information; 1988.

42. Blockeel H, Page D, Srinivasan A. Multi-instance tree learning. Proceedings of the 22nd international conference on machine learning, Bonn, Germany New York, NY: ACM Press; 2005;57–64.

43. Blum A, Mitchell T. Combining labeled and unlabeled data with co-training. Proceedings of the eleventh annual conference on computational learning theory, Madison, WI San Francisco, CA: Morgan Kaufmann; 1998;92–100.

44. Bottou L. Stochastic gradient descent tricks. In: Montavon G, Orr GB, Muller K-R, eds. Neural networks: Tricks of the trade. 2nd ed. Vol. 7700 LNCS. Heidelberg: Springer; 2012;421–436.

45. Bouckaert RR. Bayesian belief networks: From construction to inference PhD Dissertation The Netherlands: Computer Science Department, University of Utrecht; 1995.

46. Bouckaert RR. Bayesian network classifiers in Weka New Zealand: Department of Computer Science, University of Waikato; 2004; Working Paper 14/2004.

47. Bouckaert RR. DensiTree: Making sense of sets of phylogenetic trees. Bioinformatics. 2010;26(10):1372–1373.

48. Bourlard H, Kamp Y. Auto-association by multilayer perceptrons and singular value decomposition. Biological Cybernetics. 1988;59:291–294.

49. Brachman RJ, Levesque HJ, eds. Readings in knowledge representation. San Francisco, CA: Morgan Kaufmann; 1985.

50. Brants T, Franz A. Web 1T 5-gram Version 1 LDC2006T13 DVD Philadelphia, PA: Linguistic Data Consortium; 2006.

51. Brefeld U, Scheffer T. Co-EM support vector learning. In: Greiner R, Schuurmans D, eds. Proceedings of the twenty-first international conference on machine learning, Banff, Alberta, Canada. New York: ACM Press; 2004;121–128.

52. Breiman L. Stacked regressions. Machine Learning. 1996a;24(1):49–64.

53. Breiman L. Bagging predictors. Machine Learning. 1996b;24(2):123–140.

54. Breiman L. [Bias, variance, and] Arcing classifiers Technical Report 460 Berkeley, CA: Department of Statistics, University of California; 1996c.

55. Breiman L. Pasting small votes for classification in large databases and online. Machine Learning. 1999;36(1–2):85–103.

56. Breiman L. Random forests. Machine Learning. 2001;45(1):5–32.

57. Breiman L, Friedman JH, Olshen RA, Stone CJ. Classification and regression trees Monterey, CA: Wadsworth; 1984.

58. Bridle JS. Probabilistic interpretation of feedforward classification network outputs, with relationships to statistical pattern recognition. Neurocomputing Berlin: Springer Berlin Heidelberg; 1990;227–236.

59. Brin S, Motwani R, Ullman JD, Tsur S. Dynamic itemset counting and implication rules for market basket data. ACM SIGMOD Record. 1997;26(2):255–264.

60. Brin S, Page L. The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems. 1998;30(1–7):107–117.

61. Brodley CE, Friedl MA. Identifying and eliminating mislabeled training instances. Proceedings of the thirteenth national conference on artificial intelligence, Portland, OR Menlo Park, CA: AAAI Press; 1996;799–805.

62. Bromley J, Guyon I, LeCun Y, Säckinger E, Shah R. Signature verification using a “Siamese” time delay neural network. Advances in neural information processing systems Burlington, MA: Morgan Kaufmann; 1994;737–744.

63. Brownston L, Farrell R, Kant E, Martin N. Programming expert systems in OPS5 Reading, MA: Addison-Wesley; 1985.

64. Buntine W. Learning classification trees. Statistics and Computing. 1992;2(2):63–73.

65. Buntine W. Variational extensions to EM and multinomial PCA. Machine Learning: ECML 2002 Berlin: Springer Berlin Heidelberg; 2002;23–34.

66. Buntine WL. Operations for learning with graphical models. Journal of Artificial Intelligence Research. 1994;2:159–225.

67. Burge C, Karlin S. Prediction of complete gene structures in human genomic DNA. Journal of Molecular Biology. 1997;268(1):78–94.

68. Burges CJC. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery. 1998;2(2):121–167.

69. Cabena P, Hadjinian P, Stadler R, Verhees J, Zanasi A. Discovering data mining: From concept to implementation Upper Saddle River, NJ: Prentice Hall; 1998.

70. Califf ME, Mooney RJ. Relational learning of pattern-match rules for information extraction. Proceedings of the sixteenth national conference on artificial intelligence, Orlando, FL Menlo Park, CA: AAAI Press; 1999;328–334.

71. Cardie C. Using decision trees to improve case-based learning. In: Utgoff P, ed. Proceedings of the tenth international conference on machine learning, Amherst, MA. San Francisco, CA: Morgan Kaufmann; 1993;25–32.

72. Cavnar WB, Trenkle JM. N-Gram-based text categorization. Proceedings of the third symposium on document analysis and information retrieval Las Vegas, NV: UNLV Publications/Reprographics; 1994;161–175.

73. Ceglar A, Roddick JF. Association mining. ACM Computing Surveys. 2006;38(2).

74. Cendrowska J. PRISM: An algorithm for inducing modular rules. International Journal of Man-Machine Studies. 1987;27(4):349–370.

75. Chakrabarti S. Mining the web: Discovering knowledge from hypertext data San Francisco, CA: Morgan Kaufmann; 2003.

76. Chang C-C, Lin C-J. LIBSVM: A library for support vector machines 2001; Software available at <http://www.csie.ntu.edu.tw/~cjlin/libsvm>.

77. Cheeseman P, Stutz J. Bayesian classification (AutoClass): Theory and results. In: Fayyad UM, Piatetsky-Shapiro G, Smyth P, Uthurusamy R, eds. Advances in knowledge discovery and data mining. Menlo Park, CA: AAAI Press; 1995;153–180.

78. Chen J, Chaudhari NS. Capturing long-term dependencies for protein secondary structure prediction. International Symposium on Neural Networks Berlin: Springer Berlin Heidelberg; 2004;494–500.

79. Chen MS, Han J, Yu PS. Data mining: An overview from a database perspective. IEEE Transactions on Knowledge and Data Engineering. 1996;8(6):866–883.

80. Chen Y, Bi J, Wang JZ. MILES: Multiple-instance learning via embedded instance selection. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2006;28(12):1931–1947.

81. Cherkauer KJ, Shavlik JW. Growing simpler decision trees to facilitate knowledge discovery. In: Simoudis E, Han JW, Fayyad U, eds. Proceedings of the second international conference on knowledge discovery and data mining, Portland, OR. Menlo Park, CA: AAAI Press; 1996;315–318.

82. Chevaleyre Y, Zucker J-D. Solving multiple-instance and multiple-part learning problems with decision trees and rule sets: Application to the mutagenesis problem. Proceedings of the biennial conference of the Canadian society for computational studies of intelligence, Ottawa, Canada Berlin: Springer-Verlag; 2001;204–214.

83. Cho K, Chen X. Classifying and visualizing motion capture sequences using deep neural networks. IEEE international conference on computer vision theory and applications (VISAPP). Vol. 2 Setúbal: SciTePress; 2014;122–130.

84. Cho K, van Merrienboer B, Gulcehre C, et al. Learning phrase representations using RNN encoder–decoder for statistical machine translation. Proceedings of the conference on empirical methods in natural language processing (EMNLP) 2014; arXiv preprint arXiv:1406.1078.

85. Chollet F. Keras: Theano-based deep learning library 2015; Code: <https://github.com/fchollet/keras>; Documentation: <http://keras.io>.

86. Chung J, Gulcehre C, Cho K, Bengio Y. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint 2014; arXiv:1412.3555.

87. Ciresan DC, Meier U, Gambardella LM, Schmidhuber J. Deep, big, simple neural nets for handwritten digit recognition. Neural Computation. 2010;22(12):3207–3220.

88. Ciresan DC, Meier U, Masci J, Gambardella LM, Schmidhuber J. Flexible, high performance convolutional neural networks for image classification. Proceedings of the international joint conference on artificial intelligence (IJCAI). Vol. 22, no. 1 2011;1237.

89. Ciresan D, Meier U, Schmidhuber J. Multi-column deep neural networks for image classification. Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) 2012;3642–3649.

90. Cleary JG, Trigg LE. K*: An instance-based learner using an entropic distance measure. In: Prieditis A, Russell S, eds. Proceedings of the twelfth international conference on machine learning, Tahoe City, CA. San Francisco, CA: Morgan Kaufmann; 1995;108–114.

91. Cohen J. A coefficient of agreement for nominal scales. Educational and Psychological Measurement. 1960;20:37–46.

92. Cohen WW. Fast effective rule induction. In: Prieditis A, Russell S, eds. Proceedings of the twelfth international conference on machine learning, Tahoe City, CA. San Francisco, CA: Morgan Kaufmann; 1995;115–123.

93. Collobert R, Kavukcuoglu K, Farabet C. Torch7: A Matlab-like environment for machine learning. BigLearn, NIPS workshop (No. EPFL-CONF-192376) 2011.

94. Collobert R, Weston J. A unified architecture for natural language processing: Deep neural networks with multitask learning. Proceedings of the 25th international conference on machine learning New York, NY: ACM Press; 2008, July;160–167.

95. Cooper GF, Herskovits E. A Bayesian method for the induction of probabilistic networks from data. Machine Learning. 1992;9(4):309–347.

96. Cortes C, Vapnik V. Support-vector networks. Machine Learning. 1995;20(3):273–297.

97. Cover TM, Hart PE. Nearest neighbor pattern classification. IEEE Transactions on Information Theory. 1967;IT-13:21–27.

98. Cristianini N, Shawe-Taylor J. An introduction to support vector machines and other kernel-based learning methods Cambridge: Cambridge University Press; 2000.

99. Cybenko G. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems. 1989;2(4):303–314.

100. Cypher A, ed. Watch what I do: Programming by demonstration. Cambridge, MA: MIT Press; 1993.

101. Dasgupta S. Performance guarantees for hierarchical clustering. In: Kivinen J, Sloan RH, eds. Proceedings of the fifteenth annual conference on computational learning theory, Sydney, Australia. Berlin: Springer-Verlag; 2002;351–363.

102. Dasu T, Koutsofios E, Wright J. Zen and the art of data mining. Proceedings of the KDD workshop on data mining for business applications, Philadelphia, PA 2006;37–43.

103. Datta S, Kargupta H, Sivakumar K. Homeland defense, privacy-sensitive data mining, and random value distortion. Proceedings of the workshop on data mining for counter terrorism and security, San Francisco Philadelphia, PA: Society for Industrial and Applied Mathematics; 2003;27–33.

104. Day WHE, Edelsbrunner H. Efficient algorithms for agglomerative hierarchical clustering methods. Journal of Classification. 1984;1(1):7–24.

105. de Raedt L. Logical and relational learning New York, NY: Springer-Verlag; 2008.

106. Decoste D, Schölkopf B. Training invariant support vector machines. Machine Learning. 2002;46(1–3):161–190.

107. Deerwester SC, Dumais ST, Landauer TK, Furnas GW, Harshman RA. Indexing by latent semantic analysis. Journal of the American Society for Information Science. 1990;41(6):391–407.

108. Demiroz G, Guvenir A. Classification by voting feature intervals. In: van Someren M, Widmer G, eds. Proceedings of the ninth European conference on machine learning, Prague, Czech Republic. Berlin: Springer-Verlag; 1997;85–92.

109. Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B. 1977;39(1):1–38.

110. Devroye L, Györfi L, Lugosi G. A probabilistic theory of pattern recognition New York, NY: Springer-Verlag; 1996.

111. Dhar V, Stein R. Seven methods for transforming corporate data into business intelligence Upper Saddle River, NJ: Prentice Hall; 1997.

112. Diederich J, Kindermann J, Leopold E, Paass G. Authorship attribution with support vector machines. Applied Intelligence. 2003;19(1):109–123.

113. Dietterich TG. An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization. Machine Learning. 2000;40(2):139–158.

114. Dietterich TG, Bakiri G. Solving multiclass learning problems via error-correcting output codes. Journal of Artificial Intelligence Research. 1995;2:263–286.

115. Dietterich TG, Kong EB. Error-correcting output coding corrects bias and variance. Proceedings of the twelfth international conference on machine learning, Tahoe City, CA San Francisco, CA: Morgan Kaufmann; 1995;313–321.

116. Dietterich TG, Lathrop RH, Lozano-Perez T. Solving the multiple-instance problem with axis-parallel rectangles. Artificial Intelligence Journal. 1997;89(1–2):31–71.

117. Domingos P. Knowledge acquisition from examples via multiple models. In: Fisher DH, ed. Proceedings of the fourteenth international conference on machine learning, Nashville, TN. San Francisco, CA: Morgan Kaufmann; 1997;98–106.

118. Domingos P. MetaCost: A general method for making classifiers cost-sensitive. In: Fayyad UM, Chaudhuri S, Madigan D, eds. Proceedings of the fifth international conference on knowledge discovery and data mining San Diego, CA. New York, NY: ACM Press; 1999;155–164.

119. Domingos P, Hulten G. Mining high-speed data streams. International conference on knowledge discovery and data mining New York, NY: ACM Press; 2000;71–80.

120. Domingos P, Lowd D. Markov logic: An interface layer for AI San Rafael, CA: Morgan and Claypool; 2009.

121. Domingos P, Pazzani M. Beyond independence: Conditions for the optimality of the simple Bayesian classifier. Machine Learning. 1997;29:103–130.

122. Dong L, Frank E, Kramer S. Ensembles of balanced nested dichotomies for multi-class problems. Proceedings of the ninth European conference on principles and practice of knowledge discovery in databases, Porto, Portugal Berlin: Springer-Verlag; 2005;84–95.

123. Dony RD, Haykin S. Image segmentation using a mixture of principal components representation. IEE Proceedings—Vision, Image and Signal Processing. 1997;144(2):73–80.

124. Dougherty J, Kohavi R, Sahami M. Supervised and unsupervised discretization of continuous features. In: Prieditis A, Russell S, eds. Proceedings of the twelfth international conference on machine learning, Tahoe City, CA. San Francisco, CA: Morgan Kaufmann; 1995;194–202.

125. Drucker H. Improving regressors using boosting techniques. In: Fisher DH, ed. Proceedings of the fourteenth international conference on machine learning, Nashville, TN. San Francisco, CA: Morgan Kaufmann; 1997;107–115.

126. Drummond C, Holte RC. Explicitly representing expected cost: An alternative to ROC representation. In: Ramakrishnan R, Stolfo S, Bayardo R, Parsa I, eds. Proceedings of the sixth international conference on knowledge discovery and data mining Boston, MA. New York, NY: ACM Press; 2000;198–207.

127. Duda RO, Hart PE. Pattern classification and scene analysis New York, NY: John Wiley; 1973.

128. Duda RO, Hart PE, Stork DG. Pattern classification 2nd ed. New York, NY: John Wiley; 2001.

129. Dumais ST, Platt J, Heckerman D, Sahami M. Inductive learning algorithms and representations for text categorization. Proceedings of the ACM seventh international conference on information and knowledge management, Bethesda, MD New York, NY: ACM Press; 1998;148–155.

130. Dzeroski S, Zenko B. Is combining classifiers with stacking better than selecting the best one? Machine Learning. 2004;54:255–273.

131. Edwards D. Introduction to graphical modeling New York, NY: Springer Science and Business Media; 2012.

132. Efron B, Tibshirani R. An introduction to the bootstrap London: Chapman and Hall; 1993.

133. Egan JP. Signal detection theory and ROC analysis New York, NY: Academic Press, Series in Cognition and Perception; 1975.

134. Epanechnikov VA. Non-parametric estimation of a multivariate probability density. Theory of Probability and its Applications. 1969;14:153–158.

135. Ester M, Kriegel H-P, Sander J, Xu X. A density-based algorithm for discovering clusters in large spatial databases with noise. Proceedings of the second international conference on knowledge discovery and data mining (KDD-96) Portland, OR: AAAI Press; 1996;226–231.

136. Fan R-E, Chang K-W, Hsieh C-J, Wang X-R, Lin C-J. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research. 2008;9:1871–1874.

137. Fayyad UM, Irani KB. Multi-interval discretization of continuous-valued attributes for classification learning. Proceedings of the thirteenth international joint conference on artificial intelligence, Chambery, France San Francisco, CA: Morgan Kaufmann; 1993;1022–1027.

138. Fayyad UM, Piatetsky-Shapiro G, Smyth P, Uthurusamy R, eds. Advances in knowledge discovery and data mining. Menlo Park, CA: AAAI Press/MIT Press; 1996.

139. Fayyad UM, Smyth P. From massive datasets to science catalogs: Applications and challenges. Proceedings of the workshop on massive datasets Washington, DC: NRC, Committee on Applied and Theoretical Statistics; 1995;129–141.

140. Finkel JR, Grenager T, Manning C. Incorporating non-local information into information extraction systems by Gibbs sampling. Proceedings of the 43rd annual meeting on association for computational linguistics Stroudsburg: Association for Computational Linguistics; 2005;363–370.

141. Fisher D. Knowledge acquisition via incremental conceptual clustering. Machine Learning. 1987;2(2):139–172.

142. Fisher RA. The use of multiple measurements in taxonomic problems. Annals of Eugenics. 1936;7(Part II):179–188; Reprinted in Contributions to mathematical statistics New York, NY: John Wiley; 1950.

143. Fix E, Hodges JL. Discriminatory analysis; non-parametric discrimination: Consistency properties Technical Report 21-49-004(4) Randolph Field, TX: USAF School of Aviation Medicine; 1951.

144. Flach PA, Lachiche N. Confirmation-guided discovery of first-order rules with Tertius. Machine Learning. 1999;42:61–95.

145. Fletcher R. Practical methods of optimization 2nd ed. New York, NY: John Wiley; 1987.

146. Foulds J, Frank E. Revisiting multiple-instance learning via embedded instance selection. Proceedings of the Australasian joint conference on artificial intelligence, Auckland, New Zealand Berlin: Springer-Verlag; 2008;300–310.

147. Foulds J, Frank E. A review of multi-instance learning assumptions. Knowledge Engineering Review. 2010a;25(1):1–25.

148. Foulds J, Frank E. Speeding up and boosting diverse density learning. Proceedings of the 13th international conference on discovery science New York, NY: Springer; 2010b;102–116.

149. Fradkin D, Madigan D. Experiments with random projections for machine learning. In: Getoor L, Senator TE, Domingos P, Faloutsos C, eds. Proceedings of the ninth international conference on knowledge discovery and data mining, Washington, D.C. New York, NY: ACM Press; 2003;517–522.

150. Frank E. Pruning decision trees and lists PhD Dissertation New Zealand: Department of Computer Science, University of Waikato; 2000.

151. Frank E, Hall M. A simple approach to ordinal classification. In: de Raedt L, Flach PA, eds. Proceedings of the twelfth European conference on machine learning Freiburg, Germany. Berlin: Springer-Verlag; 2001;145–156.

152. Frank E, Hall M, Pfahringer B. Locally weighted Naïve Bayes. In: Kjærulff U, Meek C, eds. Proceedings of the nineteenth conference on uncertainty in artificial intelligence, Acapulco, Mexico. San Francisco, CA: Morgan Kaufmann; 2003;249–256.

153. Frank E, Holmes G, Kirkby R, Hall M. Racing committees for large datasets. In: Lange S, Satoh K, Smith CH, eds. Proceedings of the fifth international conference on discovery science, Lübeck, Germany. Berlin: Springer-Verlag; 2002;153–164.

154. Frank E, Kramer S. Ensembles of nested dichotomies for multi-class problems. Proceedings of the twenty-first international conference on machine learning, Banff, Alberta, Canada New York, NY: ACM Press; 2004;305–312.

155. Frank E, Paynter GW, Witten IH, Gutwin C, Nevill-Manning CG. Domain-specific key phrase extraction. Proceedings of the sixteenth international joint conference on artificial intelligence, Stockholm, Sweden San Francisco, CA: Morgan Kaufmann; 1999;668–673.

156. Frank E, Wang Y, Inglis S, Holmes G, Witten IH. Using model trees for classification. Machine Learning. 1998;32(1):63–76.

157. Frank E, Witten IH. Generating accurate rule sets without global optimization. In: Shavlik J, ed. Proceedings of the fifteenth international conference on machine learning, Madison, WI. San Francisco, CA: Morgan Kaufmann; 1998;144–151.

158. Frank E, Witten IH. Making better use of global discretization. In: Bratko I, Dzeroski S, eds. Proceedings of the sixteenth international conference on machine learning, Bled, Slovenia. San Francisco, CA: Morgan Kaufmann; 1999;115–123.

159. Frank E, Xu X. Applying propositional learning algorithms to multi-instance data Technical Report 06/03 New Zealand: Department of Computer Science, University of Waikato; 2003.

160. Franz A, Brants T. “All Our N-gram are Belong to You” Google Research Blog 2006; Retrieved 2015-09-14.

161. Freitag D. Machine learning for information extraction in informal domains. Machine Learning. 2000;39(2/3):169–202.

162. Freund Y, Mason L. The alternating decision tree learning algorithm. In: Bratko I, Dzeroski S, eds. Proceedings of the sixteenth international conference on machine learning, Bled, Slovenia. San Francisco, CA: Morgan Kaufmann; 1999;124–133.

163. Freund Y, Schapire RE. Experiments with a new boosting algorithm. In: Saitta L, ed. Proceedings of the thirteenth international conference on machine learning, Bari, Italy. San Francisco, CA: Morgan Kaufmann; 1996;148–156.

164. Freund Y, Schapire RE. Large margin classification using the perceptron algorithm. Machine Learning. 1999;37(3):277–296.

165. Frey BJ. Graphical models for machine learning and digital communication Cambridge, MA: MIT Press; 1998.

166. Friedman JH. Another approach to polychotomous classification Technical report Stanford, CA: Department of Statistics, Stanford University; 1996.

167. Friedman JH. Greedy function approximation: A gradient boosting machine. Annals of Statistics. 2001;29(5):1189–1232.

168. Friedman JH, Bentley JL, Finkel RA. An algorithm for finding best matches in logarithmic expected time. ACM Transactions on Mathematical Software. 1977;3(3):209–226.

169. Friedman JH, Hastie T, Tibshirani R. Additive logistic regression: A statistical view of boosting. Annals of Statistics. 2000;28(2):337–374.

170. Friedman N, Geiger D, Goldszmidt M. Bayesian network classifiers. Machine Learning. 1997;29(2):131–163.

171. Fukushima K. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics. 1980;36(4):193–202.

172. Fulton T, Kasif S, Salzberg S. Efficient algorithms for finding multiway splits for decision trees. In: Prieditis A, Russell S, eds. Proceedings of the twelfth international conference on machine learning, Tahoe City, CA. San Francisco, CA: Morgan Kaufmann; 1995;244–251.

173. Fürnkranz J. Round robin classification. Journal of Machine Learning Research. 2002;2:721–747.

174. Fürnkranz J. Round robin ensembles. Intelligent Data Analysis. 2003;7(5):385–403.

175. Fürnkranz J, Flach PA. ROC ‘n’ rule learning: Towards a better understanding of covering algorithms. Machine Learning. 2005;58(1):39–77.

176. Fürnkranz J, Widmer G. Incremental reduced-error pruning. In: Hirsh H, Cohen W, eds. Proceedings of the eleventh international conference on machine learning, New Brunswick, NJ. San Francisco, CA: Morgan Kaufmann; 1994;70–77.

177. Gaines BR, Compton P. Induction of ripple-down rules applied to modeling large data bases. Journal of Intelligent Information Systems. 1995;5(3):211–228.

178. Gama J. Functional trees. Machine Learning. 2004;55(3):219–250.

179. Gärtner T, Flach PA, Kowalczyk A, Smola AJ. Multi-instance kernels. Proceedings of the international conference on machine learning, Sydney, Australia San Francisco, CA: Morgan Kaufmann; 2002;179–186.

180. Gelman A, Carlin JB, Stern HS, Rubin DB. Bayesian data analysis. Vol. 2 London: Chapman and Hall/CRC; 2014.

181. Geman S, Geman D. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence. 1984;6(6):721–741.

182. Genkin A, Lewis DD, Madigan D. Large-scale Bayesian logistic regression for text categorization. Technometrics. 2007;49(3):291–304.

183. Gennari JH, Langley P, Fisher D. Models of incremental concept formation. Artificial Intelligence. 1990;40:11–61.

184. Gers FA, Schmidhuber J, Cummins F. Learning to forget: Continual prediction with LSTM. Neural Computation. 2000;12(10):2451–2471.

185. Ghahramani Z, Beal MJ. Variational inference for Bayesian mixtures of factor analysers. Advances in Neural Information Processing Systems. 1999;12:449–455.

186. Ghahramani Z, Beal MJ. Propagation algorithms for variational Bayesian learning. Advances in Neural Information Processing Systems. 2001;13:507–513.

187. Ghahramani Z, Hinton GE. The EM algorithm for mixtures of factor analyzers Technical Report CRG-TR-96-1 Toronto: University of Toronto; 1996.

188. Ghani R. Combining labeled and unlabeled data for multiclass text categorization. In: Sammut C, Hoffmann A, eds. Proceedings of the nineteenth international conference on machine learning, Sydney, Australia. San Francisco, CA: Morgan Kaufmann; 2002;187–194.

189. Gilad-Bachrach R, Navot A, Tishby N. Margin based feature selection: Theory and algorithms. In: Greiner R, Schuurmans D, eds. Proceedings of the twenty-first international conference on machine learning, Banff, Alberta, Canada. New York, NY: ACM Press; 2004;337–344.

190. Gilks WR. Markov chain Monte Carlo New York, NY: John Wiley and Sons, Ltd.; 2005.

191. Giraud-Carrier C. FLARE: Induction with prior knowledge. In: Nealon J, Hunt J, eds. Research and development in expert systems XIII. Cambridge: SGES Publications; 1996;11–24.

192. Glorot X, Bengio Y. Understanding the difficulty of training deep feedforward neural networks. AISTATS. 2010;9:249–256.

193. Glorot X, Bordes A, Bengio Y. Deep sparse rectifier neural networks. AISTATS. 2011;15:315–323.

194. Gluck M, Corter J. Information, uncertainty and the utility of categories. Proceedings of the annual conference of the cognitive science society, Irvine, CA Hillsdale, NJ: Lawrence Erlbaum; 1985;283–287.

195. Goldberg DE. Genetic algorithms in search, optimization and machine learning Reading, MA: Addison-Wesley; 1989.

196. Good IJ. The population frequencies of species and the estimation of population parameters. Biometrika. 1953;40(3–4):237–264.

197. Good P. Permutation tests: A practical guide to resampling methods for testing hypotheses New York, NY: Springer-Verlag; 1994.

198. Goodfellow I, Bengio Y, Courville A. Deep learning Cambridge, MA: MIT Press; 2016.

199. Graves A. Supervised sequence labelling Berlin: Springer Berlin Heidelberg; 2012.

200. Graves A, Schmidhuber J. Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Networks. 2005;18(5):602–610.

201. Graves A, Liwicki M, Fernández S, Bertolami R, Bunke H, Schmidhuber J. A novel connectionist system for unconstrained handwriting recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2009;31(5):855–868.

202. Graves A, Mohamed A-R, Hinton G. Speech recognition with deep recurrent neural networks. IEEE international conference on acoustics, speech and signal processing (ICASSP) 2013;6645–6649.

203. Green P, Yandell B. Semi-parametric generalized linear models. Proceedings of the 2nd international GLIM conference, Lancaster, Lecture Notes in Statistics No. 32 New York, NY: Springer-Verlag; 1985;44–55.

204. Greff K, Srivastava RK, Koutník J, Steunebrink BR, Schmidhuber J. LSTM: A search space odyssey. arXiv preprint 2015; arXiv:1503.04069.

205. Griffiths TL, Steyvers M. Finding scientific topics. Proceedings of the National Academy of Sciences. 2004;101(Suppl. 1):5228–5235.

206. Grossman D, Domingos P. Learning Bayesian network classifiers by maximizing conditional likelihood. In: Greiner R, Schuurmans D, eds. Proceedings of the twenty-first international conference on machine learning, Banff, Alberta, Canada. New York, NY: ACM Press; 2004;361–368.

207. Groth R. Data mining: A hands-on approach for business professionals Upper Saddle River, NJ: Prentice Hall; 1998.

208. Guo Y, Greiner R. Discriminative model selection for belief net structures Edmonton, AB: Department of Computing Science, TR04-22, University of Alberta; 2004.

209. Gütlein M, Frank E, Hall M, Karwath A. Large-scale attribute selection using wrappers. Proceedings of the IEEE symposium on computational intelligence and data mining Washington, DC: IEEE Computer Society; 2009;332–339.

210. Guyon I, Weston J, Barnhill S, Vapnik V. Gene selection for cancer classification using support vector machines. Machine Learning. 2002;46(1–3):389–422.

211. Hall M. Correlation-based feature selection for discrete and numeric class machine learning. In: Langley P, ed. Proceedings of the seventeenth international conference on machine learning, Stanford, CA. San Francisco, CA: Morgan Kaufmann; 2000;359–366.

212. Hall M, Frank E. Combining Naïve Bayes and decision tables. Proceedings of the 21st Florida artificial intelligence research society conference Miami, FL: AAAI Press; 2008;318–319.

213. Hall M, Holmes G, Frank E. Generating rule sets from model trees. In: Foo NY, ed. Proceedings of the twelfth Australian joint conference on artificial intelligence, Sydney, Australia. Berlin: Springer-Verlag; 1999;1–12.

214. Han J, Kamber M, Pei J. Data mining: Concepts and techniques 3rd ed. San Francisco, CA: Morgan Kaufmann; 2011.

215. Han J, Pei J, Yin Y. Mining frequent patterns without candidate generation. Proceedings of the ACM-SIGMOD international conference on management of data, Dallas, TX 2000;1–12.

216. Han J, Pei J, Yin Y, Mao R. Mining frequent patterns without candidate generation: A frequent-pattern tree approach. Data Mining and Knowledge Discovery. 2004;8(1):53–87.

217. Hand DJ. Classifier technology and the illusion of progress. Statistical Science. 2006;21(1):1–14.

218. Hand DJ, Mannila H, Smyth P. Principles of data mining Cambridge, MA: MIT Press; 2001.

219. Hartigan JA. Clustering algorithms New York, NY: John Wiley; 1975.

220. Hastie T, Tibshirani R. Classification by pairwise coupling. Annals of Statistics. 1998;26(2):451–471.

221. Hastie T, Tibshirani R, Friedman J. The elements of statistical learning 2nd ed. New York, NY: Springer-Verlag; 2009.

222. Hastings WK. Monte Carlo sampling methods using Markov chains and their applications. Biometrika. 1970;57(1):97–109.

223. Havaei M, Davy A, Warde-Farley D, et al. Brain tumor segmentation with deep neural networks. Medical Image Analysis. 2016.

224. Haykin S. Neural networks: A comprehensive foundation Upper Saddle River, NJ: Prentice Hall; 1994.

225. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) 2016;770–778.

226. Heckerman D, Geiger D, Chickering DM. Learning Bayesian networks: The combination of knowledge and statistical data. Machine Learning. 1995;20(3):197–243.

227. Hempstalk K, Frank E. Discriminating against new classes: One-class versus multi-class classification. Proceedings of the twenty-first Australasian joint conference on artificial intelligence, Auckland, New Zealand New York, NY: Springer; 2008;225–236.

228. Hempstalk K, Frank E, Witten IH. One-class classification by combining density and class probability estimation. Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, Antwerp, Belgium Berlin: Springer-Verlag; 2008;505–519.

229. Hinton GE. Training products of experts by minimizing contrastive divergence. Neural Computation. 2002;14(8):1771–1800.

230. Hinton GE, Salakhutdinov R. Reducing the dimensionality of data with neural networks. Science. 2006;313(5786):504–507.

231. Hinton GE, Sejnowski TJ. Optimal perceptual inference. Proceedings of the IEEE conference on computer vision and pattern recognition, Washington, DC 1983;448–453.

232. Ho TK. The random subspace method for constructing decision forests. IEEE Transactions on Pattern Analysis and Machine Intelligence. 1998;20(8):832–844.

233. Hochbaum DS, Shmoys DB. A best possible heuristic for the k-center problem. Mathematics of Operations Research. 1985;10(2):180–184.

234. Hochreiter S. Untersuchungen zu dynamischen neuronalen Netzen Diploma thesis Munich: Institut für Informatik, Technische Universität München; 1991 (Advisor: J. Schmidhuber).

235. Hochreiter S, Bengio Y, Frasconi P, Schmidhuber J. Gradient flow in recurrent nets: The difficulty of learning long-term dependencies. In: Kremer SC, Kolen JF, eds. A field guide to dynamical recurrent neural networks. Piscataway, NJ: IEEE Press; 2001;179–206.

236. Hochreiter S, Schmidhuber J. Long short-term memory. Neural Computation. 1997;9(8):1735–1780.

237. Hofmann T. Probabilistic latent semantic indexing. Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval New York, NY: ACM Press; 1999, August;50–57.

238. Holmes G, Nevill-Manning CG. Feature selection via the discovery of simple classification rules. In: Lasker GE, Liu X, eds. Proceedings of the international symposium on intelligent data analysis. Baden-Baden: International Institute for Advanced Studies in Systems Research and Cybernetics; 1995;75–79.

239. Holmes G, Pfahringer B, Kirkby R, Frank E, Hall M. Multiclass alternating decision trees. In: Elomaa T, Mannila H, Toivonen H, eds. Proceedings of the thirteenth European conference on machine learning, Helsinki, Finland. Berlin: Springer-Verlag; 2002;161–172.

240. Holte RC. Very simple classification rules perform well on most commonly used datasets. Machine Learning. 1993;11:63–91.

241. Hornik K. Approximation capabilities of multilayer feedforward networks. Neural Networks. 1991;4(2):251–257.

242. Hosmer Jr DW, Lemeshow S. Applied logistic regression New York, NY: John Wiley and Sons; 2004.

243. Hsu CW, Chang CC, Lin CJ. A practical guide to support vector classification Taipei: Department of Computer Science, National Taiwan University; 2003.

244. Huang C, Darwiche A. Inference in belief networks: A procedural guide. International Journal of Approximate Reasoning. 1996;15(3):225–263.

245. Huffman SB. Learning information extraction patterns from examples. In: Wertmer S, Riloff E, Scheler G, eds. Connectionist, statistical, and symbolic approaches to learning for natural language processing. Berlin: Springer Verlag; 1996;246–260.

246. Hyvärinen A, Oja E. Independent component analysis: Algorithms and applications. Neural Networks. 2000;13(4):411–430.

247. Ihaka R, Gentleman R. R: A language for data analysis and graphics. Journal of Computational and Graphical Statistics. 1996;5(3):299–314.

248. Ilin A, Raiko T. Practical approaches to principal component analysis in the presence of missing values. The Journal of Machine Learning Research. 2010;11:1957–2000.

249. International Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature. 2001;409(6822):860–921.

250. Ioffe S, Szegedy C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint 2015; arXiv:1502.03167.

251. Ivakhnenko AG, Lapa VG. Cybernetic predicting devices New York, NY: CCM Information Corporation; 1965.

252. Jabbour K, Riveros JFV, Landsbergen D, Meyer W. ALFA: Automated load forecasting assistant. IEEE Transactions on Power Systems. 1988;3(3):908–914.

253. Jia Y, Shelhamer E, Donahue J, et al. Caffe: Convolutional architecture for fast feature embedding. Proceedings of the ACM international conference on multimedia New York, NY: ACM Press; 2014;675–678.

254. Jiang L, Zhang H. Weightily averaged one-dependence estimators. Proceedings of the 9th Biennial Pacific Rim international conference on artificial intelligence Berlin: Springer-Verlag; 2006;970–974.

255. John GH. Robust decision trees: Removing outliers from databases. In: Fayyad UM, Uthurusamy R, eds. Proceedings of the first international conference on knowledge discovery and data mining, Montreal, Canada. Menlo Park, CA: AAAI Press; 1995;174–179.

256. John GH. Enhancements to the data mining process PhD Dissertation Stanford, CA: Computer Science Department, Stanford University; 1997.

257. John GH, Kohavi R, Pfleger P. Irrelevant features and the subset selection problem. In: Hirsh H, Cohen W, eds. Proceedings of the eleventh international conference on machine learning, New Brunswick, NJ. San Francisco, CA: Morgan Kaufmann; 1994;121–129.

258. John GH, Langley P. Estimating continuous distributions in Bayesian classifiers. In: Besnard P, Hanks S, eds. Proceedings of the eleventh conference on uncertainty in artificial intelligence, Montreal, Canada. San Francisco, CA: Morgan Kaufmann; 1995;338–345.

259. Johns MV. An empirical Bayes approach to nonparametric two-way classification. In: Solomon H, ed. Studies in item analysis and prediction. Palo Alto, CA: Stanford University Press; 1961;221–232.

260. Jones MC, Marron JS, Sheather SJ. A brief survey of bandwidth selection for density estimation. Journal of the American Statistical Association. 1996;91(433):401–407.

261. Jordan MI, Ghahramani Z, Jaakkola TS, Saul LK. An introduction to variational methods for graphical models The Netherlands: Springer; 1998;105–161.

262. Jordan MI, Ghahramani Z, Jaakkola TS, Saul LK. An introduction to variational methods for graphical models. Machine Learning. 1999;37(2):183–233.

263. Kass R, Wasserman L. A reference Bayesian test for nested hypotheses and its relationship to the Schwarz criterion. Journal of the American Statistical Association. 1995;90:928–934.

264. Keerthi SS, Shevade SK, Bhattacharyya C, Murthy KRK. Improvements to Platt’s SMO algorithm for SVM classifier design. Neural Computation. 2001;13(3):637–649.

265. Kerber R. Chimerge: Discretization of numeric attributes. In: Swartout W, ed. Proceedings of the tenth national conference on artificial intelligence, San Jose, CA. Menlo Park, CA: AAAI Press; 1992;123–128.

266. Kibler D, Aha DW. Learning representative exemplars of concepts: An initial case study. In: Langley P, ed. Proceedings of the fourth machine learning workshop, Irvine, CA. San Francisco, CA: Morgan Kaufmann; 1987;24–30.

267. Kimball R, Ross M. The data warehouse toolkit 2nd ed. New York, NY: John Wiley; 2002.

268. Kira K, Rendell L. A practical approach to feature selection. In: Sleeman D, Edwards P, eds. Proceedings of the ninth international workshop on machine learning, Aberdeen, Scotland. San Francisco, CA: Morgan Kaufmann; 1992;249–258.

269. Kirkby R. Improving hoeffding trees PhD Dissertation New Zealand: Department of Computer Science, University of Waikato; 2007.

270. Kittler J. Feature set search algorithms. In: Chen CH, ed. Pattern recognition and signal processing. The Netherlands: Sijthoff an Noordhoff; 1978.

271. Kivinen J, Smola AJ, Williamson RC. Online learning with kernels. IEEE Transactions on Signal Processing. 2002;52:2165–2176.

272. Kleinberg J. Authoritative sources in a hyperlinked environment. Proceedings of the ACM-SIAM symposium on discrete algorithms 1998; Extended version published in Journal of the ACM. 1999;46:604–632.

273. Koestler A. The act of creation London: Hutchinson; 1964.

274. Kohavi R. A study of cross-validation and bootstrap for accuracy estimation and model selection. Proceedings of the fourteenth international joint conference on artificial intelligence, Montreal, Canada San Francisco, CA: Morgan Kaufmann; 1995a;1137–1143.

275. Kohavi R. The power of decision tables. In: Lavrac N, Wrobel S, eds. Proceedings of the eighth European conference on machine learning, Iráklion, Crete, Greece. Berlin: Springer-Verlag; 1995b;174–189.

276. Kohavi R. Scaling up the accuracy of Naïve Bayes classifiers: A decision-tree hybrid. In: Simoudis E, Han JW, Fayyad U, eds. Proceedings of the second international conference on knowledge discovery and data mining, Portland, OR. Menlo Park, CA: AAAI Press; 1996;202–207.

277. Kohavi R, John GH. Wrappers for feature subset selection. Artificial Intelligence. 1997;97(1–2):273–324.

278. Kohavi R, Kunz C. Option decision trees with majority votes. In: Fisher D, ed. Proceedings of the fourteenth international conference on machine learning, Nashville, TN. San Francisco, CA: Morgan Kaufmann; 1997;161–191.

279. Kohavi R, Provost F, eds. Machine learning: Special issue on applications of machine learning and the knowledge discovery process. Machine Learning. 1998;30(2/3):127–274.

280. Kohavi R, Sahami M. Error-based and entropy-based discretization of continuous features. In: Simoudis E, Han JW, Fayyad U, eds. Proceedings of the second international conference on knowledge discovery and data mining, Portland, OR. Menlo Park, CA: AAAI Press; 1996;114–119.

281. Koller D, Friedman N. Probabilistic graphical models: Principles and techniques Cambridge, MA: MIT Press; 2009.

282. Komarek P, Moore A. A dynamic adaptation of AD-trees for efficient machine learning on large data sets. In: Langley P, ed. Proceedings of the seventeenth international conference on machine learning, Stanford, CA. San Francisco, CA: Morgan Kaufmann; 2000;495–502.

283. Kononenko I. On biases in estimating multi-valued attributes. Proceedings of the fourteenth international joint conference on artificial intelligence, Montreal, Canada San Francisco, CA: Morgan Kaufmann; 1995;1034–1040.

284. Koppel M, Schler J. Authorship verification as a one-class classification problem. In: Greiner R, Schuurmans D, eds. Proceedings of the twenty-first international conference on machine learning, Banff, Alberta, Canada. New York, NY: ACM Press; 2004;489–495.

285. Kristjansson T, Culotta A, Viola P, McCallum A. Interactive information extraction with constrained conditional random fields. AAAI. 2004, July;4:412–418.

286. Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. Advances in neural information processing systems (NIPS 2012) 2012.

287. Krogel M-A, Wrobel S. Feature selection for propositionalization. Proceedings of the international conference on discovery science, Lübeck, Germany Berlin: Springer-Verlag; 2002;430–434.

288. Kschischang FR, Frey BJ, Loeliger HA. Factor graphs and the sum-product algorithm. IEEE Transactions on Information Theory. 2001;47(2):498–519.

289. Kubat M, Holte RC, Matwin S. Machine learning for the detection of oil spills in satellite radar images. Machine Learning. 1998;30:195–215.

290. Kulp D, Haussler D, Rees MG, Eeckman FH. A generalized hidden Markov model for the recognition of human genes in DNA. Proceedings of the international conference on intelligent systems for molecular biology, St. Louis 1996;134–142.

291. Kuncheva LI, Rodriguez JJ. An experimental study on rotation forest ensembles. Proceedings of the seventh international workshop on multiple classifier systems, Prague, Czech Republic Berlin/Heidelberg: Springer; 2007;459–468.

292. Kushmerick N, Weld DS, Doorenbos R. Wrapper induction for information extraction. Proceedings of the fifteenth international joint conference on artificial intelligence, Nagoya, Japan San Francisco, CA: Morgan Kaufmann; 1997;729–735.

293. Lafferty J, McCallum A, Pereira F. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. Proceedings of the international conference on machine learning (ICML) 2001;282–289.

294. Laguna M, Marti R. Scatter search: Methodology and implementations in C Boston, MA: Kluwer Academic Press; 2003.

295. Landwehr N, Hall M, Frank E. Logistic model trees. Machine Learning. 2005;59(1–2):161–205.

296. Langley P. Elements of machine learning San Francisco, CA: Morgan Kaufmann; 1996.

297. Langley P, Iba W, Thompson K. An analysis of Bayesian classifiers. In: Swartout W, ed. Proceedings of the tenth national conference on artificial intelligence, San Jose, CA. Menlo Park, CA: AAAI Press; 1992;223–228.

298. Langley P, Sage S. Induction of selective Bayesian classifiers. In: de Mantaras RL, Poole D, eds. Proceedings of the tenth conference on uncertainty in artificial intelligence, Seattle, WA. San Francisco, CA: Morgan Kaufmann; 1994;399–406.

299. Langley P, Sage S. Scaling to domains with irrelevant features. In: Greiner R, ed. Computational learning theory and natural learning systems. Vol. 4. Cambridge, MA: MIT Press; 1997.

300. Langley P, Simon HA. Applications of machine learning and rule induction. Communications of the ACM. 1995;38(11):55–64.

301. Larochelle H, Bengio Y. Classification using discriminative restricted Boltzmann machines. Proceedings of the 25th international conference on machine learning (ICML) New York, NY: ACM Press; 2008;536–543.

302. Lauritzen SL, Spiegelhalter DJ. Local computations with probabilities on graphical structures and their application to expert systems. Journal of the Royal Statistical Society Series B (Methodological). 1988;50:157–224.

303. Lavrac N, Motoda H, Fawcett T, Holte R, Langley P, Adriaans P, eds. Special issue on lessons learned from data mining applications and collaborative problem solving. Machine Learning. 2004;57(1/2):83–113.

304. Lawrence N, Seeger M, Herbrich R. Fast sparse Gaussian process methods: The informative vector machine. Proceedings of the 16th annual conference on neural information processing systems 2003;609–616.

305. Lawson CL, Hanson RJ. Solving least squares problems Philadelphia, PA: SIAM Publications; 1995.

306. le Cessie S, van Houwelingen JC. Ridge estimators in logistic regression. Applied Statistics. 1992;41(1):191–201.

307. Le QV, Jaitly N, Hinton GE. A simple way to initialize recurrent networks of rectified linear units. arXiv preprint 2015; arXiv:1504.00941.

308. LeCun Y, Bengio Y, Hinton GE. Deep learning. Nature. 2015;521(7553):436–444.

309. LeCun Y, Bottou L, Bengio Y, Haffner P. Gradient-based learning applied to document recognition. Proceedings of the IEEE. 1998;86(11):2278–2324.

310. LeCun Y, Bottou L, Orr GB, Müller KR. Efficient BackProp. Neural Networks: Tricks of the Trade Berlin: Springer Berlin Heidelberg; 1998;9–50.

311. Li M, Vitanyi PMB. Inductive reasoning and Kolmogorov complexity. Journal of Computer and System Sciences. 1992;44:343–384.

312. Lichman M. UCI Machine Learning Repository Irvine, CA: University of California, School of Information and Computer Science; 2013; <http://archive.ics.uci.edu/ml>.

313. Lieberman H, ed. Your wish is my command: Programming by example. San Francisco, CA: Morgan Kaufmann; 2001.

314. Littlestone N. Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. Machine Learning. 1988;2(4):285–318.

315. Littlestone N. Mistake bounds and logarithmic linear-threshold learning algorithms PhD Dissertation Santa Cruz, CA: University of California; 1989.

316. Liu B. Web data mining: Exploring hyperlinks, contents, and usage data New York, NY: Springer Verlag; 2009.

317. Liu B, Hsu W, Ma YM. Integrating classification and association rule mining. Proceedings of the fourth international conference on knowledge discovery and data mining (KDD-98) New York, NY: AAAI Press; 1998;80–86.

318. Liu H, Setiono R. A probabilistic approach to feature selection: A filter solution. In: Saitta L, ed. Proceedings of the thirteenth international conference on machine learning, Bari, Italy. San Francisco, CA: Morgan Kaufmann; 1996;319–327.

319. Liu H, Setiono R. Feature selection via discretization. IEEE Transactions on Knowledge and Data Engineering. 1997;9(4):642–645.

320. Lowe DG. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision. 2004;60(2):91–110.

321. Luan J. Data mining and its applications in higher education. New Directions for Institutional Research. 2002;113:17–36.

322. Lunn D, Spiegelhalter D, Thomas A, Best N. The BUGS project: Evolution, critique and future directions (with discussion). Statistics in Medicine. 2009;28:3049–3082.

323. Lunn DJ, Thomas A, Best N, Spiegelhalter D. WinBUGS—a Bayesian modelling framework: Concepts, structure, and extensibility. Statistics and Computing. 2000;10:325–337.

324. Mann T. Library research models: A guide to classification, cataloging, and computers New York, NY: Oxford University Press; 1993.

325. Marill T, Green DM. On the effectiveness of receptors in recognition systems. IEEE Transactions on Information Theory. 1963;9(1):11–17.

326. Maron O. Learning from ambiguity PhD Dissertation Cambridge, MA: Massachusetts Institute of Technology; 1998.

327. Maron O, Lozano-Pérez T. A framework for multiple-instance learning. Proceedings of the conference on neural information processing systems, Denver, CO Cambridge, MA: MIT Press; 1997;570–576.

328. Martin B. Instance-based learning: Nearest neighbour with generalisation MSc Thesis New Zealand: Department of Computer Science, University of Waikato; 1995.

329. McCallum AK. Mallet: A machine learning for language toolkit 2002; <http://mallet.cs.umass.edu>.

330. McCallum A, Nigam K. A comparison of event models for Naïve Bayes text classification. Proceedings of the AAAI-98 workshop on learning for text categorization, Madison, WI Menlo Park, CA: AAAI Press; 1998;41–48.

331. McCallum A, Pal C, Druck G, Wang X. Multi-conditional learning: Generative/discriminative training for clustering and classification. Proceedings of the national conference on artificial intelligence (AAAI). Vol. 21, no. 1 Menlo Park, CA: AAAI Press; 2006;433.

332. McCullagh P. Regression models for ordinal data. Journal of the Royal Statistical Society Series B (Methodological). 1980;42:109–142.

333. McCullagh P, Nelder JA. Generalized linear models. Vol. 37 Boca Raton, FL: CRC Press; 1989.

334. Medelyan O, Witten IH. Domain independent automatic keyphrase indexing with small training sets. Journal of the American Society for Information Science and Technology. 2008;59:1026–1040.

335. Mehta M, Agrawal R, Rissanen J. SLIQ: A fast scalable classifier for data mining. In: Apers P, Bouzeghoub M, Gardarin G, eds. Proceedings of the fifth international conference on extending database technology, Avignon, France. New York, NY: Springer-Verlag; 1996.

336. Melville P, Mooney RJ. Creating diversity in ensembles using artificial data. Information Fusion. 2005;6(1):99–111.

337. Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, Teller E. Equations of state calculations by fast computing machines. Journal of Chemical Physics. 1953;21(6):1087–1092.

338. Michalski RS, Chilausky RL. Learning by being told and learning from examples: An experimental comparison of the two methods of knowledge acquisition in the context of developing an expert system for soybean disease diagnosis. International Journal of Policy Analysis and Information Systems. 1980;4(2):125–161.

339. Michie D. Problems of computer-aided concept formation. In: Quinlan JR, ed. Applications of expert systems. Vol. 2. Wokingham: Addison-Wesley; 1989;310–333.

340. Mikolov T, Chen K, Corrado G, Dean J. Efficient estimation of word representations in vector space. arXiv preprint 2013a; arXiv:1301.3781.

341. Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J. Distributed representations of words and phrases and their compositionality. Advances in Neural Information Processing Systems. 2013b;26:3111–3119.

342. Minka T. Old and new matrix algebra useful for statistics MIT Media Lab note 2000.

343. Minka TP. Expectation propagation for approximate Bayesian inference. Proceedings of the seventeenth conference on uncertainty in artificial intelligence San Francisco, CA: Morgan Kaufmann Publishers Inc; 2001;362–369.

344. Minsky M, Papert S. Perceptrons Cambridge, MA: MIT Press; 1969.

345. Mitchell TM. Machine Learning New York, NY: McGraw Hill; 1997.

346. Mitchell TM, Caruana R, Freitag D, McDermott J, Zabowski D. Experience with a learning personal assistant. Communications of the ACM. 1994;37(7):81–91.

347. Moore AW. Efficient memory-based learning for robot control PhD Dissertation Computer Laboratory, University of Cambridge, UK 1991.

348. Moore AW. The anchors hierarchy: Using the triangle inequality to survive high-dimensional data. In: Boutilier C, Goldszmidt M, eds. Proceedings of the sixteenth conference on uncertainty in artificial intelligence, Stanford, CA. San Francisco, CA: Morgan Kaufmann; 2000;397–405.

349. Moore AW, Lee MS. Efficient algorithms for minimizing cross validation error. In: Cohen WW, Hirsh H, eds. Proceedings of the eleventh international conference on machine learning, New Brunswick, NJ. San Francisco, CA: Morgan Kaufmann; 1994;190–198.

350. Moore AW, Pelleg D. Cached sufficient statistics for efficient machine learning with large datasets. Journal of Artificial Intelligence Research. 1998;8:67–91.

351. Moore AW, Pelleg D. X-means: Extending k-means with efficient estimation of the number of clusters. In: Langley P, ed. Proceedings of the seventeenth international conference on machine learning, Stanford, CA. San Francisco, CA: Morgan Kaufmann; 2000;727–734.

352. Morin, F., & Bengio, Y. (2005). Hierarchical probabilistic neural network language model. In Proceedings of the international workshop on artificial intelligence and statistics (pp. 246–252).

353. Murphy KP. Dynamic Bayesian networks: Representation, inference and learning Doctoral dissertation Berkeley, CA: University of California; 2002.

354. Murphy KP. Machine learning: A probabilistic perspective Cambridge, MA: MIT Press; 2012.

355. Mutter S, Hall M, Frank E. Using classification to evaluate the output of confidence-based association rule mining. Proceedings of the seventeenth Australian joint conference on artificial intelligence, Cairns, Australia Berlin: Springer; 2004;538–549.

356. Nadeau C, Bengio Y. Inference for the generalization error. Machine Learning. 2003;52(3):239–281.

357. Nahm, U.Y., & Mooney, R.J. (2000). Using information extraction to aid the discovery of prediction rules from texts. Proceedings of the Workshop on Text Mining at the Sixth International Conference on Knowledge Discovery and Data Mining (pp. 51–58). Boston, MA. Workshop proceedings at: http://www.cs.cmu.edu/~dunja/WshKDD2000.html.

358. Neal RM. Connectionist learning of belief networks. Artificial Intelligence. 1992;56(1):71–113.

359. Neal RM, Hinton GE. A view of the EM algorithm that justifies incremental, sparse, and other variants. Learning in graphical models Netherlands: Springer; 1998;355–368.

360. Nelder J, Wedderburn R. Generalized linear models. Journal of the Royal Statistical Society Series A. 1972;135(3):370–384.

361. Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., & Ng, A.Y. (2011). Reading digits in natural images with unsupervised feature learning. In NIPS workshop on deep learning and unsupervised feature learning (Vol. 2011, p. 4). Granada, Spain.

362. Niculescu-Mizil A, Caruana R. Predicting good probabilities with supervised learning. Proceedings of the 22nd international conference on machine learning, Bonn, Germany New York, NY: ACM Press; 2005;625–632.

363. Nie NH, Hull CH, Jenkins JG, Steinbrenner K, Bent DH. Statistical package for the social sciences New York, NY: McGraw Hill; 1970.

364. Nigam K, Ghani R. Analyzing the effectiveness and applicability of co-training. Proceedings of the ninth international conference on information and knowledge management, McLean, VA New York, NY: ACM Press; 2000;86–93.

365. Nigam K, McCallum AK, Thrun S, Mitchell TM. Text classification from labeled and unlabeled documents using EM. Machine Learning. 2000;39(2/3):103–134.

366. Nilsson NJ. Learning machines New York, NY: McGraw Hill; 1965.

367. Nisbet R, Elder J, Miner G. Handbook of statistical analysis and data mining applications New York, NY: Academic Press; 2009.

368. Oates T, Jensen D. The effects of training set size on decision tree complexity. Proceedings of the fourteenth international conference on machine learning, Nashville, TN San Francisco, CA: Morgan Kaufmann; 1997;254–262.

369. Ohm, P. (2009). Broken promises of privacy: Responding to the surprising failure of anonymization. University of Colorado Law Legal Studies Research Paper No. 09-12, August.

370. Omohundro SM. Efficient algorithms with neural network behavior. Journal of Complex Systems. 1987;1(2):273–347.

371. Pascanu, R., Mikolov, T., & Bengio, Y. (2013). On the difficulty of training recurrent neural networks. In Proceedings of the 30th International Conference on Machine Learning (ICML), pp. 1310–1318.

372. Paynter GW. Automating iterative tasks with programming by demonstration PhD Dissertation Department of Computer Science, University of Waikato, New Zealand 2000.

373. Pearson R. Mining Imperfect Data Philadelphia, PA: Society for Industrial and Applied Mathematics; 2005.

374. Pedregosa F, Varoquaux G, Gramfort A, et al. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research. 2011;12:2825–2830.

375. Pei J, Han J, Mortazavi-Asl B, et al. Mining sequential patterns by pattern-growth: The PrefixSpan approach. IEEE Transactions on Knowledge and Data Engineering. 2004;16(11):1424–1440.

376. Petersen KB, Pedersen MS. The matrix cookbook Technical University of Denmark 2012; Version Nov. 2012.

377. Piatetsky-Shapiro G, Frawley WJ, eds. Knowledge discovery in databases. Menlo Park, CA: AAAI Press/MIT Press; 1991.

378. Platt J. Fast training of support vector machines using sequential minimal optimization. In: Schölkopf B, Burges C, Smola A, eds. Advances in kernel methods: Support vector learning. Cambridge, MA: MIT Press; 1998.

379. Platt J. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. Advances in Large Margin Classifiers. 1999;10(3):61–74.

380. Power DJ. What is the true story about data mining, beer and diapers? DSS News. 2002;3 <http://www.dssresources.com/newsletters/66.php>.

381. Provost F, Fawcett T. Analysis and visualization of classifier performance: Comparison under imprecise class and cost distributions. In: Heckerman D, Mannila H, Pregibon D, Uthurusamy R, eds. Proceedings of the third international conference on knowledge discovery and data mining, Huntington Beach, CA. Menlo Park, CA: AAAI Press; 1997;43–48.

382. Pyle D. Data preparation for data mining San Francisco, CA: Morgan Kaufmann; 1999.

383. Quinlan JR. Induction of decision trees. Machine Learning. 1986;1(1):81–106.

384. Quinlan JR. Learning with continuous classes. In: Adams N, Sterling L, eds. Proceedings of the fifth Australian joint conference on artificial intelligence, Hobart, Tasmania. Singapore: World Scientific; 1992;343–348.

385. Quinlan JR. C4.5: Programs for machine learning San Francisco, CA: Morgan Kaufmann; 1993.

386. Quinlan JR. Improved use of continuous attributes in C4.5. Journal of Artificial Intelligence Research. 1996;4:77–90.

387. Rabiner LR. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE. 1989;77(2):257–286.

388. Rabiner LR, Juang BH. An introduction to hidden Markov models. ASSP Magazine, IEEE. 1986;3(1):4–16.

389. Ramon, J., & de Raedt, L. (2000). Multi instance neural networks. Proceedings of the ICML workshop on attribute-value and relational learning (pp. 53–60). Stanford, CA.

390. Ray S, Craven M. Supervised learning versus multiple instance learning: An empirical comparison. Proceedings of the International Conference on Machine Learning, Bonn, Germany New York, NY: ACM Press; 2005;697–704.

391. Read J, Pfahringer B, Holmes G, Frank E. Classifier chains for multi-label classification. Proceedings of the 13th European conference on principles and practice of knowledge discovery in databases and 20th European conference on machine learning, Bled, Slovenia Berlin: Springer-Verlag; 2009;254–269.

392. Rennie JDM, Shih L, Teevan J, Karger DR. Tackling the poor assumptions of Naïve Bayes text classifiers. In: Fawcett T, Mishra N, eds. Proceedings of the twentieth international conference on machine learning, Washington, DC. Menlo Park, CA: AAAI Press; 2003;616–623.

393. Ricci F, Aha DW. Error-correcting output codes for local learners. In: Nedellec C, Rouveird C, eds. Proceedings of the European conference on machine learning, Chemnitz, Germany. Berlin: Springer-Verlag; 1998;280–291.

394. Richards D, Compton P. Taking up the situated cognition challenge with ripple-down rules. International Journal of Human-Computer Studies. 1998;49(6):895–926.

395. Richardson M, Domingos P. Markov logic networks. Machine Learning. 2006;62(1–2):107–136.

396. Rifkin R, Klautau A. In defense of one-vs-all classification. Journal of Machine Learning Research. 2004;5:101–141.

397. Ripley BD. Pattern recognition and neural networks Cambridge: Cambridge University Press; 1996.

398. Rissanen J. The minimum description length principle. In: Kotz S, Johnson NL, eds. Encyclopedia of statistical sciences. Vol. 5. New York, NY: John Wiley; 1985;523–527.

399. Robbins H, Monro S. A stochastic approximation method. The Annals of Mathematical Statistics. 1951;22:400–407.

400. Rodriguez JJ, Kuncheva LI, Alonso CJ. Rotation forest: A new classifier ensemble method. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2006;28(10):1619–1630.

401. Rojas R. Neural networks: A systematic introduction Berlin: Springer; 1996.

402. Rousseeuw PJ, Leroy AM. Robust regression and outlier detection New York, NY: John Wiley; 1987.

403. Roweis S. EM algorithms for PCA and SPCA. Advances in Neural Information Processing Systems. 1998;10:626–632.

404. Rumelhart DE, Hinton GE, Williams RJ. Learning internal representations by error propagation. Parallel Distributed Processing. 1986;1:318–362.

405. Russakovsky O, Deng J, Su H, et al. ImageNet large scale visual recognition challenge. International Journal of Computer Vision. 2015;115(3):211–252.

406. Russell S, Norvig P. Artificial intelligence: A modern approach 3rd ed. Upper Saddle River, NJ: Prentice Hall; 2009.

407. Sahami M, Dumais S, Heckerman D, Horvitz E. A Bayesian approach to filtering junk e-mail. Proceedings of the AAAI-98 Workshop on Learning for Text Categorization, Madison, WI Menlo Park, CA: AAAI Press; 1998;55–62.

408. Saitta L, Neri F. Learning in the “real world”. Machine Learning. 1998;30(2/3):133–163.

409. Salakhutdinov R, Hinton GE. Deep Boltzmann machines. International Conference on Artificial Intelligence and Statistics. 2009;9:448–455.

410. Salakhutdinov R, Hinton GE. An efficient learning procedure for deep Boltzmann machines. Neural Computation. 2012;24(8):1967–2006.

411. Salakhutdinov R, Roweis S, Ghahramani Z. Optimization with EM and expectation-conjugate-gradient. ICML. 2003;20:672–679.

412. Salzberg S. A nearest hyperrectangle learning method. Machine Learning. 1991;6(3):251–276.

413. Schapire RE, Freund Y, Bartlett P, Lee WS. Boosting the margin: A new explanation for the effectiveness of voting methods. In: Fisher DH, ed. Proceedings of the fourteenth international conference on machine learning, Nashville, TN. San Francisco, CA: Morgan Kaufmann; 1997;322–330.

414. Scheffer T. Finding association rules that trade support optimally against confidence. In: de Raedt L, Siebes A, eds. Proceedings of the fifth European conference on principles of data mining and knowledge discovery, Freiburg, Germany. Berlin: Springer-Verlag; 2001;424–435.

415. Schmidhuber J. Deep learning in neural networks: An overview. Neural Networks. 2015;61:85–117.

416. Schölkopf B, Bartlett P, Smola AJ, Williamson R. Shrinking the tube: A new support vector regression algorithm. Advances in Neural Information Processing Systems. Vol. 11 Cambridge, MA: MIT Press; 1999;330–336.

417. Schölkopf B, Smola AJ. Learning with kernels: Support vector machines, regularization, optimization, and beyond Cambridge, MA: MIT Press; 2002.

418. Schölkopf B, Williamson R, Smola A, Shawe-Taylor J, Platt J. Support vector method for novelty detection. Advances in Neural Information Processing Systems. 12 MIT Press 2000;582–588.

419. Schuster M, Paliwal KK. Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing. 1997;45(11):2673–2681.

420. Sebastiani F. Machine learning in automated text categorization. ACM Computing Surveys. 2002;34(1):1–47.

421. Seewald AK. How to make stacking better and faster while also taking care of an unknown weakness. Proceedings of the Nineteenth International Conference on Machine Learning, Sydney, Australia San Francisco, CA: Morgan Kaufmann; 2002;554–561.

422. Seewald AK, Fürnkranz J. An evaluation of grading classifiers. In: Hoffmann F, Hand DJ, Adams NM, Fisher DH, Guimarães G, eds. Proceedings of the fourth international conference on advances in intelligent data analysis, Cascais, Portugal. Berlin: Springer-Verlag; 2001;115–124.

423. Sha, F., & Pereira, F. (2003). Shallow parsing with conditional random fields. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology Volume 1 (pp. 134–141). Association for Computational Linguistics.

424. Shafer R, Agrawal R, Mehta M. SPRINT: A scalable parallel classifier for data mining. In: Vijayaraman TM, Buchmann AP, Mohan C, Sarda NL, eds. Proceedings of the twenty-second international conference on very large databases, Mumbai (Bombay), India. San Francisco, CA: Morgan Kaufmann; 1996;544–555.

425. Shalev-Shwartz S, Singer Y, Srebro N. Pegasos: Primal estimated sub-gradient solver for SVM. Proceedings of the 24th international conference on Machine Learning New York, NY: ACM Press; 2007;807–814.

426. Shawe-Taylor J, Cristianini N. Kernel methods for pattern analysis Cambridge: Cambridge University Press; 2004.

427. Shearer C. The CRISP-DM model: The new blueprint for data mining. Journal of Data Warehousing. 2000;5:13–22.

428. Simard, P.Y., Steinkraus, D., & Platt, J.C. (2003). Best practices for convolutional neural networks applied to visual document analysis. In Proceedings of 7th International Conference on Document Analysis and Recognition (ICDAR), vol. 3, pp. 958–962.

429. Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. In the proceedings of ICLR 2015. arXiv preprint arXiv:1409.1556.

430. Slonim N, Friedman N, Tishby N. Unsupervised document classification using sequential information maximization. Proceedings of the 25th international ACM SIGIR conference on research and development in information retrieval New York, NY: ACM Press; 2002;129–136.

431. Smola AJ, Schölkopf B. A tutorial on support vector regression. Statistics and Computing. 2004;14(3):199–222.

432. Smolensky P. Information processing in dynamical systems: Foundations of harmony theory. In: Rumelhart DE, McClelland JL, the PDP Research Group, eds. Parallel distributed processing: Explorations in the microstructure of cognition. Vol. 1. Cambridge, MA: MIT Press; 1986;194–281.

433. Snoek J, Larochelle H, Adams RP. Practical Bayesian optimization of machine learning algorithms. Advances in Neural Information Processing Systems. 2012;25:2951–2959.

434. Soderland S, Fisher D, Aseltine J, Lehnert W. Crystal: Inducing a conceptual dictionary. Proceedings of the fourteenth international joint conference on artificial intelligence, Montreal, Canada Menlo Park, CA: AAAI Press; 1995;1314–1319.

435. Spiegelhalter, D., Thomas, A., Best, N., & Lunn, D. (2003). WinBUGS user manual.

436. Srikant R, Agrawal R. Mining sequential patterns: Generalizations and performance improvements. In: Apers P, Bouzeghoub M, Gardarin G, eds. Proceedings of the fifth international conference on extending database technology, Avignon, France. Lecture Notes in Computer Science, Vol. 1057. London: Springer-Verlag; 1996;3–17.

437. Srivastava N, Hinton GE, Krizhevsky A, Sutskever I, Salakhutdinov R. Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research. 2014;15(1):1929–1958.

438. Stevens SS. On the theory of scales of measurement. Science. 1946;103:677–680.

439. Stone P, Veloso M. Multiagent systems: A survey from a machine learning perspective. Autonomous Robots. 2000;8(3):345–383.

440. Stout QF. Unimodal regression via prefix isotonic regression. Computational Statistics and Data Analysis. 2008;53:289–297.

441. Su J, Zhang H, Ling CX, Matwin S. Discriminative parameter learning for Bayesian networks. Proceedings of the 25th International Conference on Machine Learning Helsinki: ACM Press; 2008;1016–1023.

442. Sugiyama M. Dimensionality reduction of multimodal labeled data by local Fisher discriminant analysis. The Journal of Machine Learning Research. 2007;8:1027–1061.

443. Sun, Y., Chen, Y., Wang, X., & Tang, X. (2014). Deep learning face representation by joint identification-verification. In Advances in Neural Information Processing Systems (pp. 1988–1996).

444. Sutskever, I., Vinyals, O., & Le, Q.V. (2014). Sequence to sequence learning with neural networks. In Advances in neural information processing systems (pp. 3104–3112).

445. Sutton, C., & McCallum, A. (2004). Collective segmentation and labeling of distant entities in information extraction. University of Massachusetts Amherst, Dept. of Computer Science Technical Report TR-04-49.

446. Sutton, C., & McCallum, A. (2006). An introduction to conditional random fields for relational learning. Introduction to statistical relational learning, 93–128.

447. Swets J. Measuring the accuracy of diagnostic systems. Science. 1988;240:1285–1293.

448. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., … Rabinovich, A., (2015). Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1–9.

449. Taigman, Y., Yang, M., Ranzato, M.A., & Wolf, L. (2014). Deepface: Closing the gap to human-level performance in face verification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1701–1708.

450. Teh, Y.W., Newman, D., & Welling, M. (2006). A collapsed variational Bayesian inference algorithm for latent Dirichlet allocation. In Advances in neural information processing systems, pp. 1353–1360.

451. Theano Development Team, Al-Rfou, R., Alain, G., Almahairi, A., Angermueller, C., Bahdanau, D., Belopolsky, A. (2016). Theano: A Python framework for fast computation of mathematical expressions. arXiv preprint arXiv:1605.02688.

452. Tibshirani R. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society Series B (Methodological). 1996;58(1):267–288.

453. Ting KM. An instance-weighting method to induce cost-sensitive trees. IEEE Transactions on Knowledge and Data Engineering. 2002;14(3):659–665.

454. Ting KM, Witten IH. Stacked generalization: When does it work? Proceedings of the fifteenth international joint conference on artificial intelligence, Nagoya, Japan San Francisco, CA: Morgan Kaufmann; 1997a;866–871.

455. Ting KM, Witten IH. Stacking bagged and dagged models. In: Fisher DH, ed. Proceedings of the fourteenth international conference on machine learning, Nashville, TN. San Francisco, CA: Morgan Kaufmann; 1997b;367–375.

456. Tipping ME. Sparse Bayesian learning and the relevance vector machine. The Journal of Machine Learning Research. 2001;1:211–244.

457. Tipping ME, Bishop CM. Mixtures of probabilistic principal component analyzers. Neural Computation. 1999a;11(2):443–482.

458. Tipping ME, Bishop CM. Probabilistic principal component analysis. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 1999b;61(3):611–622.

459. Turk M, Pentland A. Eigenfaces for recognition. Journal of Cognitive Neuroscience. 1991;3(1):71–86.

460. Turney PD. Learning to extract key phrases from text Technical Report ERB-1057 Ottawa, Canada: Institute for Information Technology, National Research Council of Canada; 1999.

461. U.S. House of Representatives Subcommittee on Aviation. (2002). Hearing on aviation security with a focus on passenger profiling, February 27, 2002. <http://www.house.gov/transportation/aviation/02-27-02/02-27-02memo.html>.

462. Utgoff PE. Incremental induction of decision trees. Machine Learning. 1989;4(2):161–186.

463. Utgoff PE, Berkman NC, Clouse JA. Decision tree induction based on efficient tree restructuring. Machine Learning. 1997;29(1):5–44.

464. Vafaie H, DeJong K. Genetic algorithms as a tool for feature selection in machine learning. Proceedings of the international conference on tools with artificial intelligence Arlington, VA: IEEE Computer Society Press; 1992;200–203.

465. van Rijsbergen CA. Information retrieval. London: Butterworths; 1979.

466. Vapnik V. The nature of statistical learning theory 2nd ed. New York, NY: Springer-Verlag; 1999.

467. Venables WN, Ripley BD. S Programming New York, NY: Springer; 2000.

468. Venables WN, Ripley BD. Modern Applied Statistics with S 4th ed. New York, NY: Springer; 2002.

469. Venter JC, et al. The sequence of the human genome. Science. 2001;291(5507):1304–1351.

470. Vincent P, Larochelle H, Lajoie I, Bengio Y, Manzagol PA. Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. The Journal of Machine Learning Research. 2010;11:3371–3408.

471. Vitter JS. Random sampling with a reservoir. ACM Transactions on Mathematical Software. 1985;11(1):37–57.

472. Wang, J., Han, J., & Pei, J. (2003). CLOSET+: Searching for the best strategies for mining frequent closed itemsets. Proceedings of the International Conference on Knowledge Discovery and Data Mining (KDD’03), Washington, DC.

473. Wang J, Zucker J-D. Solving the multiple-instance problem: A lazy learning approach. Proceedings of the international conference on machine learning, Stanford, CA San Francisco, CA: Morgan Kaufmann; 2000;1119–1125.

474. Wang Y, Witten IH. Induction of model trees for predicting continuous classes. In: van Someren M, Widmer G, eds. Proceedings of the poster papers of the European conference on machine learning. Prague: University of Economics, Faculty of Informatics and Statistics; 1997;128–137.

475. Wang Y, Witten IH. Modeling for optimal probability prediction. In: Sammut C, Hoffmann A, eds. Proceedings of the nineteenth international conference on machine learning, Sydney, Australia. San Francisco, CA: Morgan Kaufmann; 2002;650–657.

476. Webb GI. Decision tree grafting from the all-tests-but-one partition. Proceedings of the sixteenth international joint conference on artificial intelligence San Francisco, CA: Morgan Kaufmann; 1999;702–707.

477. Webb GI. MultiBoosting: A technique for combining boosting and wagging. Machine Learning. 2000;40(2):159–196.

478. Webb GI, Boughton J, Wang Z. Not so naïve Bayes: Aggregating one-dependence estimators. Machine Learning. 2005;58(1):5–24.

479. Webb GI, Boughton JR, Zheng F, Ting KM, Salem H. Learning by extrapolation from marginal to full-multivariate probability distributions: decreasingly naive Bayesian classification. Machine Learning. 2012;86(2):233–272.

480. Wegener I. The complexity of Boolean functions New York, NY: John Wiley and Sons; 1987.

481. Weidmann N, Frank E, Pfahringer B. A two-level learning method for generalized multi-instance problems. Proceedings of the European conference on machine learning, Cavtat, Croatia Berlin: Springer-Verlag; 2003;468–479.

482. Weiser, M. (1996). Open house. Review, the web magazine of the Interactive Telecommunications Program of New York University.

483. Weiser M, Brown JS. The coming age of calm technology. In: Denning PJ, Metcalfe RM, eds. Beyond calculation: The next fifty years. New York, NY: Copernicus; 1997;75–86.

484. Weiss SM, Indurkhya N. Predictive data mining: A practical guide San Francisco, CA: Morgan Kaufmann; 1998.

485. Welling, M., Rosen-Zvi, M., & Hinton, G.E. (2004). Exponential family harmoniums with an application to information retrieval. In Advances in neural information processing systems (pp. 1481–1488).

486. Werbos P. Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences PhD thesis Harvard University 1974.

487. Wettschereck D, Dietterich TG. An experimental comparison of the nearest-neighbor and nearest-hyperrectangle algorithms. Machine Learning. 1995;19(1):5–28.

488. Wild CJ, Seber GAF. Introduction to probability and statistics New Zealand: Department of Statistics, University of Auckland; 1995.

489. Williams CK, Rasmussen CE. Gaussian processes for machine learning Cambridge, MA: MIT Press; 2006.

490. Winn JM, Bishop CM. Variational message passing. Journal of Machine Learning Research. 2005;6:661–694.

491. Winston PH. Artificial intelligence Reading, MA: Addison-Wesley; 1992.

492. Witten IH. Text mining. In: Singh MP, ed. Practical handbook of internet computing. Boca Raton, FL: CRC Press; 2004; 14-1–14-22.

493. Witten IH, Bell TC. The zero-frequency problem: Estimating the probabilities of novel events in adaptive text compression. IEEE Transactions on Information Theory. 1991;37(4):1085–1094.

494. Witten IH, Bray Z, Mahoui M, Teahan W. Text mining: A new frontier for lossless compression. In: Storer JA, Cohn M, eds. Proceedings of the data compression conference, Snowbird, UT. Los Alamitos, CA: IEEE Press; 1999a;198–207.

495. Witten IH, Moffat A, Bell TC. Managing gigabytes: Compressing and indexing documents and images 2nd ed. San Francisco, CA: Morgan Kaufmann; 1999b.

496. Wolpert DH. Stacked generalization. Neural Networks. 1992;5:241–259.

497. Wu X, Kumar V, eds. The top ten algorithms in data mining. London: Chapman and Hall; 2009.

498. Wu X, Kumar V, Quinlan JR, et al. Top 10 algorithms in data mining. Knowledge and Information Systems. 2008;14(1):1–37.

499. Xu B, Wang N, Chen T, Li M. Empirical evaluation of rectified activations in convolutional network. arXiv preprint 2015; arXiv:1505.00853.

500. Xu X, Frank E. Logistic regression and boosting for labeled bags of instances. Proceedings of the 8th Pacific-Asia conference on knowledge discovery and data mining, Sydney, Australia Berlin: Springer-Verlag; 2004;272–281.

501. Yan X, Han J. gSpan: Graph-based substructure pattern mining. Proceedings of the IEEE international conference on data mining (ICDM ’02) Washington, DC: IEEE Computer Society; 2002.

502. Yan, X., & Han, J. (2003). CloseGraph: Mining closed frequent graph patterns. Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.

503. Yan, X., Han, J., & Afshar, R. (2003). CloSpan: Mining closed sequential patterns in large datasets. Proceedings of the SIAM International Conference on Data Mining (SDM’03), San Francisco, CA.

504. Yang, Y., Guan, X., & You, J. (2002). CLOPE: A fast and effective clustering algorithm for transactional data. Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 682–687.

505. Yang Y, Webb GI. Proportional k-interval discretization for Naïve Bayes classifiers. In: de Raedt L, Flach P, eds. Proceedings of the Twelfth European Conference on Machine Learning, Freiburg, Germany. Berlin: Springer-Verlag; 2001;564–575.

506. Yu, D., Eversole, A., Seltzer, M., Yao, K., Huang, Z., Guenter, B., Droppo, J. (2014). An introduction to computational networks and the computational network toolkit. Tech. Rep. MSR-TR-2014-112, Microsoft Research, Code: http://codebox/cntk.

507. Yurcik W, Barlow J, Zhou Y, et al. Scalable data management alternatives to support data mining heterogeneous logs for computer network security. Proceedings of the workshop on data mining for counter terrorism and security, San Francisco, CA Philadelphia, PA: Society for Industrial and Applied Mathematics; 2003.

508. Zadrozny B, Elkan C. Transforming classifier scores into accurate multiclass probability estimates. Proceedings of the eighth ACM international conference on knowledge discovery and data mining, Edmonton, Alberta, Canada New York, NY: ACM Press; 2002;694–699.

509. Zaki, M.J., Parthasarathy, S., Ogihara, M., & Li, W. (1997). New algorithms for fast discovery of association rules. Proceedings Knowledge Discovery in Databases (pp. 283–286).

510. Zbontar, J., & LeCun, Y. (2015). Computing the stereo matching cost with a convolutional neural network. Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1592–1599).

511. Zeiler MD, Fergus R. Visualizing and understanding convolutional networks. Proceedings of ECCV 2014 New York, NY: Springer International Publishing; 2014;818–833.

512. Zhang H, Jiang L, Su J. Hidden Naïve Bayes. Proceedings of the 20th national conference on artificial intelligence Menlo Park, CA: AAAI Press; 2005;919–924.

513. Zhang T. Solving large scale linear prediction problems using stochastic gradient descent algorithms. Proceedings of the 21st international conference on machine learning Omni Press 2004;919–926.

514. Zhang T, Ramakrishnan R, Livny M. BIRCH: An efficient data clustering method for very large databases. Proceedings of the ACM SIGMOD international conference on management of data, Montreal, Quebec, Canada New York, NY: ACM Press; 1996;103–114.

515. Zheng F, Webb G. Efficient lazy elimination for averaged one-dependence estimators. Proceedings of the 23rd international conference on machine learning New York, NY: ACM Press; 2006;1113–1120.

516. Zheng Z, Webb G. Lazy learning of Bayesian rules. Machine Learning. 2000;41(1):53–84.

517. Zhou Z-H, Zhang M-L. Solving multi-instance problems with classifier ensemble based on constructive clustering. Knowledge and Information Systems. 2007;11(2):155–170.

518. Zhu J, Hastie T. Kernel logistic regression and the import vector machine. Journal of Computational and Graphical Statistics. 2005;14(1):185–205.

519. Zou H, Hastie T. Regularization and Variable Selection via the Elastic Net. Journal of the Royal Statistical Society, Series B. 2005;67:301–320.
