
ILSVRC 2016 Results. 106

A. Alahi, R. Ortiz, and P. Vandergheynst. FREAK: Fast retina keypoint. In IEEE Conference on Computer Vision and Pattern Recognition, pages 510–517, 2012. DOI: 10.1109/cvpr.2012.6247715. 14

Ark Anderson, Kyle Shaffer, Artem Yankov, Court D. Corley, and Nathan O. Hodas. Beyond fine tuning: A modular approach to learning on small data. arXiv preprint arXiv:1611.01714, 2016. 72

Relja Arandjelovic, Petr Gronat, Akihiko Torii, Tomas Pajdla, and Josef Sivic. NetVLAD: CNN architecture for weakly supervised place recognition. In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5297–5307, 2016. DOI: 10.1109/cvpr.2016.572. 63, 64

Pablo Arbeláez, Jordi Pont-Tuset, Jonathan T. Barron, Ferran Marques, and Jitendra Malik. Multiscale combinatorial grouping. In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, pages 328–335, 2014. DOI: 10.1109/cvpr.2014.49. 140

Hossein Azizpour, Ali Sharif Razavian, Josephine Sullivan, Atsuto Maki, and Stefan Carlsson. Factors of transferability for a generic convnet representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(9):1790–1802, 2016. DOI: 10.1109/tpami.2015.2500224. 72

David Balduzzi, Marcus Frean, Lennox Leary, J. P. Lewis, Kurt Wan-Duo Ma, and Brian McWilliams. The shattered gradients problem: If ResNets are the answer, then what is the question? arXiv preprint arXiv:1702.08591, 2017. 95, 98

Herbert Bay, Andreas Ess, Tinne Tuytelaars, and Luc Van Gool. Speeded-up robust features (SURF). Computer Vision and Image Understanding, 110(3):346–359, 2008. DOI: 10.1016/j.cviu.2007.09.014. 7, 11, 14, 19, 29

Atilim Gunes Baydin, Barak A. Pearlmutter, Alexey Andreyevich Radul, and Jeffrey Mark Siskind. Automatic differentiation in machine learning: A survey. arXiv preprint arXiv:1502.05767, 2015. 90

N. Bayramoglu and A. Alatan. Shape index SIFT: Range image recognition using local features. In 20th International Conference on Pattern Recognition, pages 352–355, 2010. DOI: 10.1109/icpr.2010.95. 13

Yoshua Bengio, Pascal Lamblin, Dan Popovici, Hugo Larochelle, et al. Greedy layer-wise training of deep networks. Advances in Neural Information Processing Systems, 19:153, 2007. 70

Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. Object detectors emerge in deep scene CNNs. In International Conference on Learning Representations, 2015. 94, 97

Leo Breiman. Random forests. Machine Learning, 45(1):5–32, 2001. DOI: 10.1023/A:1010933404324. 7, 11, 22, 26, 29

M. Brown and D. Lowe. Invariant features from interest point groups. In Proc. of the British Machine Vision Conference, pages 23.1–23.10, 2002. DOI: 10.5244/c.16.23. 20

M. Calonder, V. Lepetit, C. Strecha, and P. Fua. BRIEF: Binary robust independent elementary features. In 11th European Conference on Computer Vision, pages 778–792, 2010. DOI: 10.1007/978-3-642-15561-1_56. 14

Jean-Pierre Changeux and Paul Ricoeur. What Makes Us Think?: A Neuroscientist and a Philosopher Argue About Ethics, Human Nature, and the Brain. Princeton University Press, 2002. 40

Ken Chatfield, Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Return of the devil in the details: Delving deep into convolutional nets. arXiv preprint arXiv:1405.3531, 2014. DOI: 10.5244/c.28.6. 123

Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, and Alan L. Yuille. Semantic image segmentation with deep convolutional nets and fully connected CRFs. arXiv preprint arXiv:1412.7062, 2014. 50, 133

Kyunghyun Cho, Bart Van Merriënboer, Dzmitry Bahdanau, and Yoshua Bengio. On the properties of neural machine translation: Encoder-decoder approaches. arXiv preprint arXiv:1409.1259, 2014. DOI: 10.3115/v1/w14-4012. 38

Sumit Chopra, Raia Hadsell, and Yann LeCun. Learning a similarity metric discriminatively, with application to face verification. In Computer Vision and Pattern Recognition, (CVPR). IEEE Computer Society Conference on, volume 1, pages 539–546, 2005. DOI: 10.1109/cvpr.2005.202. 67

Dan C. Ciresan, Ueli Meier, Jonathan Masci, Luca M. Gambardella, and Jürgen Schmidhuber. High-performance neural networks for visual object classification. arXiv preprint arXiv:1102.0183, 2011. 102

Corinna Cortes and Vladimir Vapnik. Support-vector networks. Machine Learning, 20(3):273–297, 1995. DOI: 10.1007/bf00994018. 7, 11, 22, 29

Koby Crammer and Yoram Singer. On the algorithmic implementation of multiclass kernel-based vector machines. Journal of Machine Learning Research, 2:265–292, 2001. 66

Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in Neural Information Processing Systems, pages 3844–3852, 2016. 170

Li Deng. A tutorial survey of architectures, algorithms, and applications for deep learning. APSIPA Transactions on Signal and Information Processing, 3:e2, 2014. DOI: 10.1017/at-sip.2014.4. 117

Jeff Donahue, Lisa Anne Hendricks, Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan, Kate Saenko, and Trevor Darrell. Long-term recurrent convolutional networks for visual recognition and description. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2625–2634, 2015. DOI: 10.1109/cvpr.2015.7298878. 150, 155

John Duchi, Elad Hazan, and Yoram Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12:2121–2159, 2011. 83

Vincent Dumoulin and Francesco Visin. A guide to convolution arithmetic for deep learning. arXiv preprint arXiv:1603.07285, 2016. 60

Pedro F. Felzenszwalb, Ross B. Girshick, David McAllester, and Deva Ramanan. Object detection with discriminatively trained part-based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9):1627–1645, 2010. DOI: 10.1109/tpami.2009.167. 121

R. A. Fisher. The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7(2):179–188, 1936. DOI: 10.1111/j.1469-1809.1936.tb02137.x. 22

Yoav Freund and Robert E. Schapire. A decision-theoretic generalization of online learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119–139, 1997. DOI: 10.1006/jcss.1997.1504. 22

Jerome H. Friedman. Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29(5):1189–1232, 2001. 22

Kunihiko Fukushima and Sei Miyake. Neocognitron: A self-organizing neural network model for a mechanism of visual pattern recognition. In Competition and Cooperation in Neural Nets, pages 267–285. Springer, 1982. DOI: 10.1007/978-3-642-46466-9_18. 43

Ross Girshick. Fast R-CNN. In Proc. of the IEEE International Conference on Computer Vision, pages 1440–1448, 2015. DOI: 10.1109/iccv.2015.169. 60

Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. Region-based convolutional networks for accurate object detection and segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(1):142–158, 2016. DOI: 10.1109/tpami.2015.2437384. 120

Xavier Glorot and Yoshua Bengio. Understanding the difficulty of training deep feedforward neural networks. In AISTATS, 9:249–256, 2010. 70

Ian Goodfellow. NIPS 2016 tutorial: Generative adversarial networks. arXiv preprint arXiv:1701.00160, 2016. 142

Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2672–2680, 2014. 141, 142, 149

Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016. 54

Alex Graves and Jürgen Schmidhuber. Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Networks, 18(5):602–610, 2005. DOI: 10.1016/j.neunet.2005.06.042. 38

Alex Graves, Greg Wayne, and Ivo Danihelka. Neural Turing machines. arXiv preprint arXiv:1410.5401, 2014. 38

Alex Graves et al. Supervised Sequence Labelling with Recurrent Neural Networks, volume 385. Springer, 2012. DOI: 10.1007/978-3-642-24797-2. 31

Saurabh Gupta, Pablo Arbelaez, and Jitendra Malik. Perceptual organization and recognition of indoor scenes from RGB-D images. In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, pages 564–571, 2013. DOI: 10.1109/cvpr.2013.79. 140

Saurabh Gupta, Ross Girshick, Pablo Arbeláez, and Jitendra Malik. Learning rich features from RGB-D images for object detection and segmentation. In European Conference on Computer Vision, pages 345–360. Springer, 2014. DOI: 10.1007/978-3-319-10584-0_23. 139, 141

Richard H. R. Hahnloser, Rahul Sarpeshkar, Misha A. Mahowald, Rodney J. Douglas, and H. Sebastian Seung. Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit. Nature, 405(6789):947, 2000. DOI: 10.1038/35016072. 55

Munawar Hayat, Salman H. Khan, Mohammed Bennamoun, and Senjian An. A spatial layout and scale invariant feature representation for indoor scene classification. IEEE Transactions on Image Processing, 25(10):4829–4841, 2016. DOI: 10.1109/tip.2016.2599292. 96

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Spatial pyramid pooling in deep convolutional networks for visual recognition. In European Conference on Computer Vision, pages 346–361. Springer, 2014. DOI: 10.1007/978-3-319-10578-9_23. 62

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In Proc. of the IEEE International Conference on Computer Vision, pages 1026–1034, 2015a. DOI: 10.1109/iccv.2015.123. 71

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(9):1904–1916, 2015b. DOI: 10.1109/tpami.2015.2389824. 61, 125

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016a. DOI: 10.1109/cvpr.2016.90. 77, 106, 170

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Identity mappings in deep residual networks. In European Conference on Computer Vision, pages 630–645. Springer, 2016b. DOI: 10.1007/978-3-319-46493-0_38. 108, 111

Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. Mask R-CNN. arXiv preprint arXiv:1703.06870, 2017. DOI: 10.1109/iccv.2017.322. 60, 140, 141

Geoffrey E. Hinton, Simon Osindero, and Yee-Whye Teh. A fast learning algorithm for deep belief nets. Neural Computation, 18(7):1527–1554, 2006. DOI: 10.1162/neco.2006.18.7.1527. 70

Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997. DOI: 10.1162/neco.1997.9.8.1735. 38

Gao Huang, Zhuang Liu, Kilian Q. Weinberger, and Laurens van der Maaten. Densely connected convolutional networks. arXiv preprint arXiv:1608.06993, 2016a. DOI: 10.1109/cvpr.2017.243. 114, 115

Gao Huang, Yu Sun, Zhuang Liu, Daniel Sedra, and Kilian Q. Weinberger. Deep networks with stochastic depth. In European Conference on Computer Vision, pages 646–661, 2016b. DOI: 10.1007/978-3-319-46493-0_39. 170

David H. Hubel and Torsten N. Wiesel. Receptive fields of single neurones in the cat’s striate cortex. The Journal of Physiology, 148(3):574–591, 1959. DOI: 10.1113/jphysiol.1959.sp006308. 43

Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167, 2015. 76, 77

Max Jaderberg, Karen Simonyan, Andrew Zisserman, et al. Spatial transformer networks. In Advances in Neural Information Processing Systems, pages 2017–2025, 2015. 63

Anil K. Jain, Jianchang Mao, and K. Moidin Mohiuddin. Artificial neural networks: A tutorial. Computer, 29(3):31–44, 1996. DOI: 10.1109/2.485891. 39

Katarzyna Janocha and Wojciech Marian Czarnecki. On loss functions for deep neural networks in classification. arXiv preprint arXiv:1702.05659, 2017. DOI: 10.4467/20838476si.16.004.6185. 68

Hervé Jégou, Matthijs Douze, Cordelia Schmid, and Patrick Pérez. Aggregating local descriptors into a compact image representation. In Computer Vision and Pattern Recognition (CVPR), IEEE Conference on, pages 3304–3311, 2010. DOI: 10.1109/cvpr.2010.5540039. 63

Andrej Karpathy, George Toderici, Sanketh Shetty, Thomas Leung, Rahul Sukthankar, and Li Fei-Fei. Large-scale video classification with convolutional neural networks. In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1725–1732, 2014. DOI: 10.1109/cvpr.2014.223. 150, 152

Salman H. Khan. Feature learning and structured prediction for scene understanding. Ph.D. Thesis, University of Western Australia, 2016. 135

Salman H. Khan, Mohammed Bennamoun, Ferdous Sohel, and Roberto Togneri. Automatic shadow detection and removal from a single image. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(3):431–446, 2016a. DOI: 10.1109/tpami.2015.2462355. 141

Salman H. Khan, Munawar Hayat, Mohammed Bennamoun, Roberto Togneri, and Ferdous A. Sohel. A discriminative representation of convolutional features for indoor scene recognition. IEEE Transactions on Image Processing, 25(7):3372–3383, 2016b. DOI: 10.1109/tip.2016.2567076. 72, 95

Salman H. Khan, Munawar Hayat, Mohammed Bennamoun, Ferdous A. Sohel, and Roberto Togneri. Cost-sensitive learning of deep feature representations from imbalanced data. IEEE Transactions on Neural Networks and Learning Systems, 2017a. DOI: 10.1109/tnnls.2017.2732482. 169

Salman H. Khan, Munawar Hayat, and Fatih Porikli. Scene categorization with spectral features. In Proc. of the IEEE International Conference on Computer Vision, pages 5638–5648, 2017b. DOI: 10.1109/iccv.2017.601. 94

Salman H. Khan, Xuming He, Fatih Porikli, Mohammed Bennamoun, Ferdous Sohel, and Roberto Togneri. Learning deep structured network for weakly supervised change detection. In Proc. of the International Joint Conference on Artificial Intelligence (IJCAI), pages 1–7, 2017c. DOI: 10.24963/ijcai.2017/279. 141

Salman Hameed Khan, Mohammed Bennamoun, Ferdous Sohel, and Roberto Togneri. Automatic feature learning for robust shadow detection. In Computer Vision and Pattern Recognition (CVPR), IEEE Conference on, pages 1939–1946, 2014. DOI: 10.1109/cvpr.2014.249. 93

Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014. 85, 86

Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907, 2016. 170

Philipp Krähenbühl and Vladlen Koltun. Efficient inference in fully connected CRFs with Gaussian edge potentials. In Advances in Neural Information Processing Systems 24, pages 109–117, 2011. 132, 135

Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105, 2012. DOI: 10.1145/3065386. 45, 74, 102, 117, 123, 140, 150, 162

Gustav Larsson, Michael Maire, and Gregory Shakhnarovich. FractalNet: Ultra-deep neural networks without residuals. arXiv preprint arXiv:1605.07648, 2016. 112, 113

Svetlana Lazebnik, Cordelia Schmid, and Jean Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In Computer Vision and Pattern Recognition, IEEE Computer Society Conference on, 2:2169–2178, 2006. DOI: 10.1109/cvpr.2006.68. 61

Yann LeCun, Bernhard Boser, John S. Denker, Donnie Henderson, Richard E. Howard, Wayne Hubbard, and Lawrence D. Jackel. Backpropagation applied to handwritten zip code recognition. Neural Computation, 1(4):541–551, 1989. DOI: 10.1162/neco.1989.1.4.541. 43

Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proc. of the IEEE, 86(11):2278–2324, 1998. DOI: 10.1109/5.726791. 101, 102

Christian Ledig, Lucas Theis, Ferenc Huszár, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang, et al. Photorealistic single image super-resolution using a generative adversarial network. arXiv preprint arXiv:1609.04802, 2016. DOI: 10.1109/cvpr.2017.19. 141, 145, 147, 148

Stefan Leutenegger, Margarita Chli, and Roland Y. Siegwart. BRISK: Binary robust invariant scalable keypoints. In Proc. of the International Conference on Computer Vision, pages 2548–2555, 2011. DOI: 10.1109/iccv.2011.6126542. 14

Li-Jia Li, Richard Socher, and Li Fei-Fei. Towards total scene understanding: Classification, annotation and segmentation in an automatic framework. In Computer Vision and Pattern Recognition, (CVPR). IEEE Conference on, pages 2036–2043, 2009. DOI: 10.1109/cvpr.2009.5206718. 135

Min Lin, Qiang Chen, and Shuicheng Yan. Network in network. arXiv preprint arXiv:1312.4400, 2013. 56, 103

Jonathan Long, Evan Shelhamer, and Trevor Darrell. Fully convolutional networks for semantic segmentation. In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3431–3440, 2015. DOI: 10.1109/cvpr.2015.7298965. 127, 128, 130

David G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91–110, 2004. DOI: 10.1023/b:visi.0000029664.99615.94. 7, 11, 14, 16, 17, 19, 29

A. Mahendran and A. Vedaldi. Understanding deep image representations by inverting them. In Computer Vision and Pattern Recognition, (CVPR). IEEE Computer Society Conference on, pages 5188–5196, 2015. DOI: 10.1109/cvpr.2015.7299155. 97, 99, 170

A. Mahendran and A. Vedaldi. Visualizing deep convolutional neural networks using natural pre-images. International Journal of Computer Vision, 120(3):233–255, 2016. DOI: 10.1007/s11263-016-0911-8. 170

Michael Mathieu, Mikael Henaff, and Yann LeCun. Fast training of convolutional networks through FFTs. In International Conference on Learning Representations (ICLR2014), 2014. 162

Warren S. McCulloch and Walter Pitts. A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4):115–133, 1943. DOI: 10.1007/bf02478259. 40

Dmytro Mishkin and Jiri Matas. All you need is a good INIT. arXiv preprint arXiv:1511.06422, 2015. 71

N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 886–893, 2005. DOI: 10.1109/CVPR.2005.177. 7, 11, 14, 15, 29

Yurii Nesterov. A method for unconstrained convex minimization problem with the rate of convergence O(1/k^2). Doklady AN SSSR, 269:543–547, 1983. 82

Hyeonwoo Noh, Seunghoon Hong, and Bohyung Han. Learning deconvolution network for semantic segmentation. In Proc. of the IEEE International Conference on Computer Vision, pages 1520–1528, 2015. DOI: 10.1109/iccv.2015.178. 130

Nicolas Papernot, Patrick McDaniel, Ian Goodfellow, Somesh Jha, Z. Berkay Celik, and Ananthram Swami. Practical black-box attacks against machine learning. In Proc. of the ACM on Asia Conference on Computer and Communications Security, (ASIA CCS’17), pages 506–519, 2017. DOI: 10.1145/3052973.3053009. 170

Razvan Pascanu, Yann N. Dauphin, Surya Ganguli, and Yoshua Bengio. On the saddle point problem for non-convex optimization. arXiv preprint arXiv:1405.4604, 2014. 81

Charles R. Qi, Hao Su, Kaichun Mo, and Leonidas J. Guibas. PointNet: Deep learning on point sets for 3D classification and segmentation. arXiv preprint arXiv:1612.00593, 2016. DOI: 10.1109/cvpr.2017.16. 117, 118, 119

J. R. Quinlan. Induction of decision trees. Machine Learning, 1(1):81–106, 1986. DOI: 10.1007/bf00116251. 7, 11, 22, 26, 29

Alec Radford, Luke Metz, and Soumith Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434, 2015. 141, 145, 149

H. Rahmani, A. Mahmood, D. Q. Huynh, and A. Mian. HOPC: Histogram of oriented principal components of 3D pointclouds for action recognition. In 13th European Conference on Computer Vision, pages 742–757, 2014. DOI: 10.1007/978-3-319-10605-2_48. 13

Hossein Rahmani and Mohammed Bennamoun. Learning action recognition model from depth and skeleton videos. In The IEEE International Conference on Computer Vision (ICCV), 2017. DOI: 10.1109/iccv.2017.621. 150

Hossein Rahmani and Ajmal Mian. 3D action recognition from novel viewpoints. In Computer Vision and Pattern Recognition, (CVPR). IEEE Computer Society Conference on, pages 1506–1515, 2016. DOI: 10.1109/cvpr.2016.167. 74, 150

Hossein Rahmani, Ajmal Mian, and Mubarak Shah. Learning a deep model for human action recognition from novel viewpoints. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017. DOI: 10.1109/tpami.2017.2691768. 74, 150

Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems, pages 91–99, 2015. DOI: 10.1109/tpami.2016.2577031. 61, 123

Ethan Rublee, Vincent Rabaud, Kurt Konolige, and Gary Bradski. ORB: An efficient alternative to SIFT or SURF. In Proc. of the International Conference on Computer Vision, pages 2564–2571, 2011. DOI: 10.1109/iccv.2011.6126544. 14

Sebastian Ruder. An overview of gradient descent optimization algorithms. arXiv preprint arXiv:1609.04747, 2016. 81

David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning internal representations by error propagation. Technical report, DTIC Document, 1985. DOI: 10.1016/b978-1-4832-1446-7.50035-2. 34

Andrew M. Saxe, James L. McClelland, and Surya Ganguli. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks. arXiv preprint arXiv:1312.6120, 2013. 70

Florian Schroff, Dmitry Kalenichenko, and James Philbin. FaceNet: A unified embedding for face recognition and clustering. In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, pages 815–823, 2015. DOI: 10.1109/cvpr.2015.7298682. 68

Ali Sharif Razavian, Hossein Azizpour, Josephine Sullivan, and Stefan Carlsson. CNN features off-the-shelf: An astounding baseline for recognition. In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 806–813, 2014. DOI: 10.1109/cvprw.2014.131. 72

J. Shotton, A. Fitzgibbon, M. Cook, T. Sharp, M. Finocchio, R. Moore, A. Kipman, and A. Blake. Real-time human pose recognition in parts from single depth images. In Computer Vision and Pattern Recognition, (CVPR). IEEE Computer Society Conference on, pages 1297–1304, 2011. DOI: 10.1145/2398356.2398381. 26

Ashish Shrivastava, Tomas Pfister, Oncel Tuzel, Josh Susskind, Wenda Wang, and Russ Webb. Learning from simulated and unsupervised images through adversarial training. arXiv preprint arXiv:1612.07828, 2016. DOI: 10.1109/cvpr.2017.241. 74

Karen Simonyan and Andrew Zisserman. Two-stream convolutional networks for action recognition in videos. In Proc. of the 27th International Conference on Neural Information Processing Systems—Volume 1, (NIPS’14), pages 568–576, 2014a. 150, 152, 153

Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014b. 50, 70, 104, 123

Fisher Yu, Yinda Zhang, Shuran Song, Ari Seff, and Jianxiong Xiao. Construction of a large-scale image dataset using deep learning with humans in the loop. arXiv preprint arXiv:1506.03365, 2015. 146, 147

Shuran Song and Jianxiong Xiao. Deep sliding shapes for amodal 3D object detection in RGB-D images. In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, pages 808–816, 2016. DOI: 10.1109/cvpr.2016.94. 136

Shuran Song, Samuel P. Lichtenberg, and Jianxiong Xiao. SUN RGB-D: A RGB-D scene understanding benchmark suite. In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, pages 567–576, 2015. DOI: 10.1109/cvpr.2015.7298655. 136

Jost Tobias Springenberg. Unsupervised and semi-supervised learning with categorical generative adversarial networks. arXiv preprint arXiv:1511.06390, 2015. 146

Nitish Srivastava, Geoffrey E. Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(1):1929–1958, 2014. 75, 79, 102

Rupesh Kumar Srivastava, Klaus Greff, and Jürgen Schmidhuber. Highway networks. arXiv preprint arXiv:1505.00387, 2015. 108

Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1–9, 2015. DOI: 10.1109/cvpr.2015.7298594. 105, 106, 107

Yichuan Tang. Deep learning using linear support vector machines. arXiv preprint arXiv:1306.0239, 2013. 67

Tijmen Tieleman and Geoffrey Hinton. Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning, 4(2), 2012. 85

Jasper R. R. Uijlings, Koen E. A. Van De Sande, Theo Gevers, and Arnold W. M. Smeulders. Selective search for object recognition. International Journal of Computer Vision, 104(2):154–171, 2013. DOI: 10.1007/s11263-013-0620-5. 61, 120, 122

Aaron van den Oord, Nal Kalchbrenner, Lasse Espeholt, Oriol Vinyals, Alex Graves, et al. Conditional image generation with PixelCNN decoders. In Advances in Neural Information Processing Systems, pages 4790–4798, 2016. 141

Li Wan, Matthew Zeiler, Sixin Zhang, Yann LeCun, and Rob Fergus. Regularization of neural networks using DropConnect. In Proc. of the 30th International Conference on Machine Learning (ICML’13), pages 1058–1066, 2013. 75

Heng Wang, A. Kläser, C. Schmid, and Cheng-Lin Liu. Action recognition by dense trajectories. In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, (CVPR’11), pages 3169–3176, 2011a. DOI: 10.1109/cvpr.2011.5995407. 154

Zhenhua Wang, Bin Fan, and Fuchao Wu. Local intensity order pattern for feature description. In Proc. of the International Conference on Computer Vision, pages 603–610, 2011b. DOI: 10.1109/iccv.2011.6126294. 14

Jason Weston, Chris Watkins, et al. Support vector machines for multi-class pattern recognition. In ESANN, 99:219–224, 1999. 67

Bernard Widrow, Marcian E. Hoff, et al. Adaptive switching circuits. In IRE WESCON Convention Record, 4:96–104, New York, 1960. DOI: 10.21236/ad0241531. 33

Jason Yosinski, Jeff Clune, Thomas Fuchs, and Hod Lipson. Understanding neural networks through deep visualization. In ICML Workshop on Deep Learning, Citeseer, 2015. 95, 98

Fisher Yu and Vladlen Koltun. Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122, 2015. 50, 141

Sergey Zagoruyko and Nikos Komodakis. Learning to compare image patches via convolutional neural networks. In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4353–4361, 2015. DOI: 10.1109/cvpr.2015.7299064. 141

Matthew D. Zeiler. Adadelta: An adaptive learning rate method. arXiv preprint arXiv:1212.5701, 2012. 84

Matthew D. Zeiler and Rob Fergus. Visualizing and understanding convolutional networks. In European Conference on Computer Vision, pages 818–833, Springer, 2014. DOI: 10.1007/978-3-319-10590-1_53. 44, 94, 95, 97, 170

Yinda Zhang, Mingru Bai, Pushmeet Kohli, Shahram Izadi, and Jianxiong Xiao. DeepContext: Context-encoding neural pathways for 3D holistic scene understanding. arXiv preprint arXiv:1603.04922, 2016. DOI: 10.1109/iccv.2017.135. 135, 139

Hang Zhao, Orazio Gallo, Iuri Frosio, and Jan Kautz. Loss functions for neural networks for image processing. arXiv preprint arXiv:1511.08861, 2015. 67, 68

Hengshuang Zhao, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, and Jiaya Jia. Pyramid scene parsing network. In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1–7, 2017. DOI: 10.1109/cvpr.2017.660. 141

Shuai Zheng, Sadeep Jayasumana, Bernardino Romera-Paredes, Vibhav Vineet, Zhizhong Su, Dalong Du, Chang Huang, and Philip H. S. Torr. Conditional random fields as recurrent neural networks. In Proc. of the IEEE International Conference on Computer Vision, pages 1529–1537, 2015. DOI: 10.1109/iccv.2015.179. 141

C. Lawrence Zitnick and Piotr Dollár. Edge boxes: Locating object proposals from edges. In European Conference on Computer Vision, pages 391–405, Springer, 2014. DOI: 10.1007/978-3-319-10602-1_26. 61, 132
