The following are the references for all the citations throughout the book:
Anguita, D., Ghio, A., Oneto, L., Parra, X., and Reyes-Ortiz, J. L. (2013). A Public Domain Dataset for Human Activity Recognition Using Smartphones. 21st European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, ESANN 2013. Bruges (Belgium), 24-26 April 2013.
Bengio, Y. (2012). Practical Recommendations for Gradient-Based Training of Deep Architectures. In Neural Networks: Tricks of the Trade (pp. 437-478). Springer Berlin Heidelberg. (Also on the arXiv: http://arxiv.org/pdf/1206.5533.pdf)
Bengio, Y., Courville, A., and Vincent, P. (2013). Representation Learning: A Review and New Perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8), 1798-1828.
Bergmeir, C., and Benítez, J. M. (2012). Neural Networks in R Using the Stuttgart Neural Network Simulator: RSNNS. Journal of Statistical Software, 46(7), 1-26.
Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer.
Goodfellow, I. J., Warde-Farley, D., Mirza, M., Courville, A., and Bengio, Y. (2013). Maxout Networks. arXiv preprint arXiv:1302.4389.
Hastie, T., Tibshirani, R., and Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Second Edition. Springer.
Hinton, G. E., Osindero, S., and Teh, Y. W. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18(7), 1527-1554.
Kuhn, M. (2008). Building Predictive Models in R Using the caret Package. Journal of Statistical Software, 28(5), 1-26.
Kuhn, M., and Johnson, K. (2013). Applied Predictive Modeling. New York: Springer.
Lichman, M. (2013). UCI Machine Learning Repository (http://archive.ics.uci.edu/ml). Irvine, CA: University of California, School of Information and Computer Science.
Murphy, K. P. (2012). Machine Learning: A Probabilistic Perspective. MIT Press.
Nair, V., and Hinton, G. E. (2010). Rectified Linear Units Improve Restricted Boltzmann Machines. In Proceedings of the 27th International Conference on Machine Learning (ICML-10) (pp. 807-814).
Riedmiller, M., and Braun, H. (1993). A Direct Adaptive Method for Faster Backpropagation Learning: The RPROP Algorithm. In IEEE International Conference on Neural Networks, 1993.
Schmidhuber, J. (2015). Deep Learning in Neural Networks: An Overview. Neural Networks, 61, 85-117.
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. (2014). Dropout: A Simple Way to Prevent Neural Networks from Overfitting. The Journal of Machine Learning Research, 15(1), 1929-1958.
Venables, W. N., and Ripley, B. D. (2002). Modern Applied Statistics with S. Fourth Edition. Springer.
Vincent, P., Larochelle, H., Bengio, Y., and Manzagol, P. A. (2008). Extracting and Composing Robust Features with Denoising Autoencoders. In Proceedings of the 25th International Conference on Machine Learning (pp. 1096-1103). ACM.
Zeiler, M. D. (2012). ADADELTA: An Adaptive Learning Rate Method. arXiv preprint arXiv:1212.5701.