Vaibhav Verdhan

Computer Vision Using Deep Learning

Neural Network Architectures with Python and Keras

1st ed.
Vaibhav Verdhan
Limerick, Ireland
ISBN 978-1-4842-6615-1
e-ISBN 978-1-4842-6616-8
Apress standard
© Vaibhav Verdhan 2021
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Distributed to the book trade worldwide by Springer Science+Business Media New York, 1 NY Plaza, New York, NY 10014. Phone 1-800-SPRINGER, fax (201) 348-4505, e-mail [email protected], or visit www.springeronline.com. Apress Media, LLC is a California LLC and the sole member (owner) is Springer Science + Business Media Finance Inc (SSBM Finance Inc). SSBM Finance Inc is a Delaware corporation.

To Yashi, Pakhi and Rudra

Foreword

Computer Vision, not too long ago the exclusive purview of science fiction, is quickly becoming commonplace across industries, if not in society at large. The progress in the field to emulate human vision, that most prized of human senses, is nothing short of astonishing. It was only in 1957 that Russell Kirsch scanned the world’s first photograph, a black-and-white image of his son [1]. By the late 1980s, the work of Sirovich and Kirby [2] helped establish face recognition as a viable technology for biometric applications. Facebook made the technology ubiquitous, notwithstanding privacy concerns and legal challenges [3], when, in 2010, it incorporated face recognition into its social media platform.

The capabilities of Deep Learning vision systems to interpret and extract information from images permeate all aspects of society. Only the most skeptical among us doubt a not too distant future with self-driving cars outnumbering those driven by their human counterparts or computer-aided diagnosis (CADx) of medical images becoming an ordinary service supplied by medical providers. Computer vision applications already control access to our mobile devices and can outperform human inspectors in the tedious but critical task of inspecting for defects in all types of manufacturing processes. That is how I met Vaibhav, or V, as he is known to his friends and colleagues: collaborating on methods to improve existing computer vision systems to ensure defect-free products critical for human vision. The circularity of this history is not lost on me. We teach computers how to see; they help manufacture products vital to improving and caring for human vision.

In this book, V takes a practical and convenient approach to the subject. The abundant use of case studies is facilitated by ready-to-use Python code and links to datasets and other tools. The practitioner’s learning experience is enhanced by access to the resources needed to work in a step-by-step fashion through each case study. The book organizes the subject into three parts. In chapters 1 through 4, V describes the nature of Neural Networks and demystifies how they learn. Along the way, he points out different architectures and their historical significance. The practitioner gets to experience, with all required resources in hand, the elegant simplicity of LeNet, the improved efficiency of AlexNet, and the popular VGG Net. In chapters 5 through 7, the practitioner builds simple yet powerful computer vision applications, such as training systems to detect objects and recognize human faces. When progressing into video analytics, we encounter the nagging problem of vanishing and exploding gradients and learn how to overcome it using skip connections in the ResNet architecture. Finally, in chapter 8, we review the complete model development process, starting with a correctly defined business problem and systematically advancing until the model is deployed and maintained in a production environment.

We are only now starting to see the dramatic increase in complexity and impact of tasks performed by computer systems that match, and often exceed, what until recently would have been considered exclusively human vision capabilities. Those aspiring to make this technology their ally, grow more adept at incorporating vision systems into their practice, and become more skillful practitioners will gain greatly from the tools, techniques, and methods presented in this book.

David O. Ramos

Jacksonville, FL

16 December 2020

Introduction

Innovation distinguishes between a leader and a follower.

—Steve Jobs

How good is your driving? Can you drive better than an autonomous driving system? Or do you think an algorithm will perform better than a specialist in classifying medical images? These can be tricky questions. But artificial intelligence has outperformed doctors in detecting lung cancer and breast cancer by analyzing images! Ouch!

Nature has been very kind to grant us the powers of sight, taste, smell, touch, and hearing. Of these senses, the power of sight allows us to appreciate the beauty of the world, enjoy its colors, recognize the faces of our family and loved ones, and above all relish this beautiful world and life. With time, humans amplified the power of the brain and made path-breaking inventions and discoveries. The wheel and the airplane, the printing press and the clock, the light bulb and the personal computer: innovations have changed the way we live, work, travel, decide, and progress. These innovations make life simpler, easier, and far more enjoyable and safe.

Data science and Deep Learning are allowing us to extend these innovations further. Using Deep Learning, we are able to replicate the power of vision given by nature. Computers are being trained to perform the same tasks done by a human being. It can be detecting color, shape, or size; distinguishing among a cat, a dog, and a horse; or driving on a road. The use cases are many. The solutions are applicable across sectors such as retail, manufacturing, banking, financial services, and insurance (BFSI), agriculture, security, transport, pharmaceuticals, and so on.

This book is an attempt to explain the concepts of Deep Learning and Neural Networks for computer vision problems. We examine convolutional Neural Networks in detail, along with their various components and attributes. We explore Neural Network architectures such as LeNet, AlexNet, VGG, R-CNN, Fast R-CNN, Faster R-CNN, SSD, YOLO, ResNet, Inception, DeepFace, and FaceNet. We also develop pragmatic solutions to tackle use cases of binary image classification, multiclass image classification, object detection, face recognition, and video analytics. We use Python and Keras for the solutions. All the code and datasets are checked into the GitHub repo for quick access. In the final chapter, we study all the steps in a Deep Learning project, from defining the business problem to deployment. We also deal with the major errors and issues faced while developing the solutions. Throughout the book, we provide tips and tricks for training better algorithms, reducing the training time, monitoring the results, and improving the solution. We also share prominent research papers and datasets that you can use to gain further knowledge.

The book is suitable for researchers and students who want to explore and implement computer vision solutions using Deep Learning. It is recommended for professionals who intend to explore cutting-edge technology, grasp the advanced concepts, develop a thorough understanding of Deep Learning architectures, and learn the best practices and solutions to common computer vision challenges. It is also directed toward business leaders who wish to implement Deep Learning solutions in their business and gain confidence when they communicate with their teams and clientele. Above all, it is for the curious person who wants to explore how Deep Learning algorithms solve computer vision problems and would like to try Python.

I would like to thank Apress, Aaron, Jessica, and Vishwesh for believing in me and giving me the chance to work on this subject. And a special word of thanks to my family, Yashi, Pakhi, and Rudra, for the excellent support without which it would have been impossible to complete this work.

Vaibhav Verdhan, November 2020, Limerick

Acknowledgments

I would like to express my thanks to the following people. It is their hard work and passion that are advancing this field:

Ross Girshick

Jeff Donahue

Trevor Darrell

Jitendra Malik

Shaoqing Ren

Kaiming He

Jian Sun

Christian Szegedy

Wei Liu

Yangqing Jia

Pierre Sermanet

Scott Reed

Dragomir Anguelov

Dumitru Erhan

Vincent Vanhoucke

Andrew Rabinovich

Sergey Ioffe

Jonathon Shlens

Xiangyu Zhang

Omkar M. Parkhi

Andrea Vedaldi

Andrew Zisserman

Yaniv Taigman

Ming Yang

Marc’Aurelio Ranzato

Lior Wolf

Yann LeCun

Leon Bottou

Yoshua Bengio

Patrick Haffner

Sefik Ilkin Serengil

About the Author
Vaibhav Verdhan is a seasoned data science professional with rich experience spanning geographies and domains. He is a hands-on technical expert and has led multiple engagements in machine learning and artificial intelligence. He is a leading industry expert, is a regular speaker at conferences and meetups, and mentors students and professionals. Currently, he resides in Ireland and is working as a Principal Data Scientist.
 
About the Technical Reviewer
Vishwesh Ravi Shrimali graduated from BITS Pilani in 2018, where he studied mechanical engineering. Since then, he has worked with Big Vision LLC on Deep Learning and computer vision and was involved in creating official OpenCV AI courses. Currently, he is working at Mercedes-Benz Research and Development India Pvt. Ltd. He has a keen interest in programming and AI and has applied that interest in mechanical engineering projects. He has also written multiple blogs on OpenCV and Deep Learning on Learn OpenCV, a leading blog on computer vision. He has also coauthored Machine Learning for OpenCV 4 (second edition) by Packt. When he is not writing blogs or working on projects, he likes to go on long walks or play his acoustic guitar.

 