Vaibhav Verdhan

Computer Vision Using Deep Learning

Neural Network Architectures with Python and Keras

1st ed.

Vaibhav Verdhan

Limerick, Ireland

ISBN 978-1-4842-6615-1
e-ISBN 978-1-4842-6616-8

https://doi.org/10.1007/978-1-4842-6616-8

Apress standard

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Distributed to the book trade worldwide by Springer Science+Business Media New York, 1 New York Plaza, New York, NY 10014. Phone 1-800-SPRINGER, fax (201) 348-4505, e-mail [email protected], or visit www.springeronline.com. Apress Media, LLC is a California LLC and the sole member (owner) is Springer Science + Business Media Finance Inc (SSBM Finance Inc). SSBM Finance Inc is a Delaware corporation.

To Yashi, Pakhi and Rudra

Foreword

Computer Vision, not too long ago the exclusive purview of science fiction, is quickly becoming commonplace across industries, if not in society at large. The progress in the field toward emulating human vision, that most prized of human senses, is nothing short of astonishing. It was only in 1957 that Russell Kirsch scanned the world’s first digital photograph, a black-and-white image of his son¹. By the late 1980s, the work of Sirovich and Kirby² helped establish face recognition as a viable technology for biometric applications. Facebook made the technology ubiquitous, notwithstanding privacy concerns and legal challenges³, when, in 2010, it incorporated face recognition into its social media platform.

The capabilities of Deep Learning vision systems to interpret and extract information from images permeate all aspects of society. Only the most skeptical among us doubt a not too distant future in which self-driving cars outnumber those driven by humans, or in which computer-aided diagnosis (CADx) of medical images becomes an ordinary service supplied by medical providers. Computer vision applications already control access to our mobile devices and can outperform human inspectors in the tedious but critical task of inspecting for defects in all types of manufacturing processes. That is how I met Vaibhav, or V, as he is known to his friends and colleagues: we collaborated on methods to improve existing computer vision systems that ensure defect-free products critical for human vision. The circular history is not lost on us. We teach computers how to see; they help manufacture products vital to improving and caring for human vision.

In this book, V takes a practical and convenient approach to the subject. The abundant use of case studies is facilitated by ready-to-use Python code and links to datasets and other tools. The practitioner’s learning experience is enhanced by access to the resources needed to work step by step through each case study. The book organizes the subject into three parts. In chapters 1 through 4, V describes the nature of Neural Networks and demystifies how they learn. Along the way, he points out different architectures and their historical significance. The practitioner gets to experience, with all required resources in hand, the elegant simplicity of LeNet, the improved efficiency of AlexNet, and the popular VGG Net. In chapters 5 through 7, the practitioner builds simple yet powerful computer vision applications, such as training systems to detect objects and recognize human faces. When progressing to video analytics, we encounter the nagging problem of vanishing and exploding gradients and learn how to overcome it using skip connections in the ResNet architecture. Finally, in chapter 8, we review the complete model development process, starting with a correctly defined business problem and systematically advancing until the model is deployed and maintained in a production environment.

We are only now starting to see the dramatic increase in the complexity and impact of tasks performed by computer systems that match, and often exceed, what until recently would have been considered exclusively human vision capabilities. Those aspiring to make this technology their ally, grow more adept at incorporating vision systems into their practice, and become more skillful practitioners will gain greatly from the tools, techniques, and methods presented in this book.

David O. Ramos

Jacksonville, FL

16 December 2020

Introduction

Innovation distinguishes between a leader and a follower.

—Steve Jobs

How good is your driving? Can you drive better than an autonomous driving system? Or do you think an algorithm will perform better than a specialist in classifying medical images? These can be tricky questions. But artificial intelligence has outperformed doctors in detecting lung cancer and breast cancer by analyzing images! Ouch!

Nature has been very kind to grant us the powers of sight, taste, smell, touch, and hearing. Of these senses, the power of sight allows us to appreciate the beauty of the world, enjoy its colors, recognize the faces of our family and loved ones, and above all relish this beautiful world and life. With time, humans amplified the power of the brain and made path-breaking inventions and discoveries. The wheel or airplane, printing press or clock, light bulb or personal computer – innovations have changed the way we live, work, travel, decide, and progress. These innovations make life simpler, easier, safer, and far more enjoyable.

Data science and Deep Learning are allowing us to innovate further. Using Deep Learning, we are able to replicate the power of vision given by nature. Computers are being trained to perform the same tasks that a human being performs. It can be detecting colors, shapes, or sizes, distinguishing among a cat, a dog, and a horse, or driving on a road – the use cases are many. The solutions apply across sectors such as retail, manufacturing, BFSI (banking, financial services, and insurance), agriculture, security, transport, pharmaceuticals, and so on.

This book is an attempt to explain the concepts of Deep Learning and Neural Networks for computer vision problems. We examine convolutional Neural Networks in detail, along with their various components and attributes. We explore Neural Network architectures such as LeNet, AlexNet, VGG, R-CNN, Fast R-CNN, Faster R-CNN, SSD, YOLO, ResNet, Inception, DeepFace, and FaceNet. We also develop pragmatic solutions for the use cases of binary image classification, multiclass image classification, object detection, face recognition, and video analytics. We use Python and Keras for the solutions. All the code and datasets are checked into the GitHub repo for quick access. In the final chapter, we study all the steps in a Deep Learning project – right from defining the business problem to deployment. We also deal with the major errors and issues faced while developing the solutions. Throughout the book, we provide tips and tricks for training better algorithms, reducing the training time, monitoring the results, and improving the solution. We also share prominent research papers and datasets that you should use to gain further knowledge.

The book is suitable for researchers and students who want to explore and implement computer vision solutions using Deep Learning. It is recommended for professionals who intend to explore cutting-edge technology, grasp the advanced concepts, develop a thorough understanding of Deep Learning architectures, and learn best practices and solutions to common computer vision challenges. It is also directed toward business leaders who wish to implement Deep Learning solutions in their business and gain confidence when they communicate with their teams and clientele. Above all, it is for the curious person who wants to explore how Deep Learning algorithms solve computer vision problems and would like to try Python.

I would like to thank Apress, Aaron, Jessica, and Vishwesh for believing in me and giving me the chance to work on this subject. And a special word of thanks to my family – Yashi, Pakhi, and Rudra – for their excellent support, without which it would have been impossible to complete this work.

Vaibhav Verdhan, November 2020, Limerick

Acknowledgments

I would like to express my thanks to the following people. It is the results of their hard work and passion that are advancing this field:

Ross Girshick

Jeff Donahue

Trevor Darrell

Jitendra Malik

Shaoqing Ren

Kaiming He

Jian Sun

Christian Szegedy

Wei Liu

Yangqing Jia

Pierre Sermanet

Scott Reed

Dragomir Anguelov

Dumitru Erhan

Vincent Vanhoucke

Andrew Rabinovich

Sergey Ioffe

Jonathon Shlens

Xiangyu Zhang

Omkar M. Parkhi

Andrea Vedaldi

Andrew Zisserman

Yaniv Taigman

Ming Yang

Marc’Aurelio Ranzato

Lior Wolf

Yann LeCun

Leon Bottou

Yoshua Bengio

Patrick Haffner

Sefik Ilkin Serengil

Table of Contents

Chapter 1: Introduction to Computer Vision and Deep Learning

1.1 Technical requirements

1.2 Image Processing using OpenCV

1.2.1 Color detection using OpenCV

1.3 Shape detection using OpenCV

1.3.1 Face detection using OpenCV

1.4 Fundamentals of Deep Learning

1.4.1 The motivation behind Neural Network

1.4.2 Layers in a Neural Network

1.4.3 Neuron

1.4.4 Hyperparameters

1.4.5 Connections and weight of ANN

1.4.6 Bias term

1.4.7 Activation functions

1.4.8 Learning rate

1.4.9 Backpropagation

1.4.10 Overfitting

1.4.11 Gradient descent

1.4.12 Loss functions

1.5 How Deep Learning works?

1.6 Summary

1.6.1 Further readings

Chapter 2: Nuts and Bolts of Deep Learning for Computer Vision

2.1 Technical requirements

2.2 Deep Learning using TensorFlow and Keras

2.3 What is a tensor?

2.3.1 What is a Convolutional Neural Network?

2.3.2 What is convolution?

2.3.3 What is a Pooling Layer?

2.3.4 What is a Fully Connected Layer?

2.4 Developing a DL solution using CNN

2.5 Summary

2.5.1 Further readings

Chapter 3: Image Classification Using LeNet

3.1 Technical requirements

3.2 Deep Learning architectures

3.3 LeNet architecture

3.4 LeNet-1 architecture

3.5 LeNet-4 architecture

3.6 LeNet-5 architecture

3.7 Boosted LeNet-4 architecture

3.8 Creating image classification models using LeNet

3.9 MNIST classification using LeNet

3.10 German traffic sign identification using LeNet

3.11 Summary

3.11.1 Further readings

Chapter 4: VGGNet and AlexNet Networks

4.1 Technical requirements

4.2 AlexNet and VGG Neural Networks

4.3 What is AlexNet Neural Network?

4.4 What is VGG Neural Network?

4.5 VGG16 architecture

4.6 Difference between VGG16 and VGG19

4.7 Developing solutions using AlexNet and VGG

4.8 Working on CIFAR-10 using AlexNet

4.9 Working on CIFAR-10 using VGG

4.10 Comparing AlexNet and VGG

4.11 Working with CIFAR-100

4.12 Summary

4.12.1 Further readings

Chapter 5: Object Detection Using Deep Learning

5.1 Technical requirements

5.2 Object Detection

5.2.1 Object classification vs. object localization vs. object detection

5.2.2 Use cases of Object Detection

5.3 Object Detection methods

5.4 Deep Learning frameworks for Object Detection

5.4.1 Sliding window approach for Object Detection

5.5 Bounding box approach

5.6 Intersection over Union (IoU)

5.7 Non-max suppression

5.8 Anchor boxes

5.9 Deep Learning architectures

5.9.1 Region-based CNN (R-CNN)

5.10 Fast R-CNN

5.11 Faster R-CNN

5.12 You Only Look Once (YOLO)

5.12.1 Salient features of YOLO

5.12.2 Loss function in YOLO

5.12.3 YOLO architecture

5.13 Single Shot MultiBox Detector (SSD)

5.14 Transfer Learning

5.15 Python implementation

5.16 Summary

5.16.1 Further readings

Chapter 6: Face Recognition and Gesture Recognition

6.1 Technical toolkit

6.2 Face recognition

6.2.1 Applications of face recognition

6.2.2 Process of face recognition

6.2.3 DeepFace solution by Facebook

6.2.4 FaceNet for face recognition

6.2.5 Python implementation using FaceNet

6.2.6 Python solution for gesture recognition

6.3 Summary

6.3.1 Further readings

Chapter 7: Video Analytics Using Deep Learning

7.1 Technical toolkit

7.2 Video processing

7.3 Use cases of video analytics

7.4 Vanishing gradient and exploding gradient problem

7.5 ResNet architecture

7.5.1 ResNet and skip connection

7.5.2 Inception network

7.5.3 GoogLeNet architecture

7.5.4 Improvements in Inception v2

7.6 Video analytics

7.7 Python solution using ResNet and Inception v3

7.8 Summary

7.8.1 Further readings

Chapter 8: End-to-End Model Development

8.1 Technical requirements

8.2 Deep Learning project requirements

8.3 Deep Learning project process

8.4 Business problem definition

8.4.1 Face detection for surveillance

8.4.2 Source data or data discovery phase

8.5 Data ingestion or data management

8.6 Data preparation and augmentation

8.6.1 Image augmentation

8.7 Deep Learning modeling process

8.7.1 Transfer learning

8.7.2 Common mistakes/challenges and boosting performance

8.8 Model deployment and maintenance

8.9 Summary

8.9.1 Further readings

References

Major activation functions and layers used in CNN

Google Colab

Index

About the Author

Vaibhav Verdhan

is a seasoned data science professional with rich experience spanning across geographies and domains. He is a hands-on technical expert and has led multiple engagements in machine learning and artificial intelligence. He is a leading industry expert, is a regular speaker at conferences and meetups, and mentors students and professionals. Currently, he resides in Ireland and is working as a Principal Data Scientist.

About the Technical Reviewer

Vishwesh Ravi Shrimali

graduated from BITS Pilani in 2018, where he studied mechanical engineering. Since then, he has worked with Big Vision LLC on Deep Learning and computer vision and was involved in creating official OpenCV AI courses. Currently, he is working at Mercedes-Benz Research and Development India Pvt. Ltd. He has a keen interest in programming and AI and has applied that interest in mechanical engineering projects. He has also written multiple blog posts on OpenCV and Deep Learning for Learn OpenCV, a leading blog on computer vision, and coauthored Machine Learning for OpenCV 4 (second edition), published by Packt. When he is not writing blog posts or working on projects, he likes to go on long walks or play his acoustic guitar.
