To Yashi, Pakhi and Rudra
Computer Vision, not too long ago the exclusive purview of science fiction, is quickly becoming commonplace across industries, if not in society at large. The progress the field has made in emulating human vision, that most prized of human senses, is nothing short of astonishing. It was only in 1957 that Russell Kirsch scanned the world's first digital photograph, a black and white image of his son[1]. By the late 1980s, the work of Sirovich and Kirby[2] had helped establish face recognition as a viable technology for biometric applications. Facebook made the technology ubiquitous when, in 2010, it incorporated face recognition into its social media platform, notwithstanding privacy concerns and legal challenges[3].
The capabilities of Deep Learning vision systems to interpret and extract information from images permeate all aspects of society. Only the most skeptical among us doubt a not-too-distant future in which self-driving cars outnumber those driven by humans, or in which computer-aided diagnosis (CADx) of medical images becomes an ordinary service supplied by medical providers. Computer vision applications already control access to our mobile devices and can outperform human inspectors in the tedious but critical task of inspecting for defects in all types of manufacturing processes. That is how I met Vaibhav, or V, as he is known to his friends and colleagues: we were collaborating on methods to improve existing computer vision systems that ensure defect-free products critical for human vision. The circularity of this history is not lost on me. We teach computers how to see; they help manufacture products vital to improving and caring for human vision.
In this book, V takes a practical and convenient approach to the subject. The abundant use of case studies is supported by ready-to-use Python code and links to datasets and other tools. The practitioner's learning experience is enhanced by access to the resources needed to work step by step through each case study. The book organizes the subject into three parts. In chapters 1 through 4, V describes the nature of Neural Networks and demystifies how they learn. Along the way, he points out different architectures and their historical significance. The practitioner gets to experience, with all required resources in hand, the elegant simplicity of LeNet, the improved efficiency of AlexNet, and the popular VGG Net. In chapters 5 through 7, the practitioner builds simple yet powerful computer vision applications, such as systems trained to detect objects and recognize human faces. When progressing to video analytics, we encounter the nagging problem of vanishing and exploding gradients and learn how to overcome it using the skip connections of the ResNet architecture. Finally, in chapter 8, we review the complete model development process, starting with a correctly defined business problem and systematically advancing until the model is deployed and maintained in a production environment.
We are only now starting to see the dramatic increase in the complexity and impact of tasks performed by computer systems that match, and often exceed, what until recently would have been considered exclusively human vision capabilities. Those aspiring to make this technology their ally, to grow more adept at incorporating vision systems into their practice, and to become more skillful practitioners will gain greatly from the tools, techniques, and methods presented in this book.
David O. Ramos
Jacksonville, FL
16 December 2020
Innovation distinguishes between a leader and a follower.
—Steve Jobs
How good is your driving? Do you drive better than an autonomous driving system would? Or do you think an algorithm will perform better than a specialist in classifying medical images? It can be a tricky question. But artificial intelligence has outperformed doctors in detecting lung cancer and breast cancer by analyzing images! Ouch!
Nature has been kind enough to grant us the powers of sight, taste, smell, touch, and hearing. Of these senses, sight allows us to appreciate the beauty of the world, enjoy its colors, recognize the faces of our family and loved ones, and above all relish this beautiful world and life. Over time, humans amplified the power of the brain and made path-breaking inventions and discoveries. The wheel or the airplane, the printing press or the clock, the light bulb or the personal computer: these innovations have changed the way we live, work, travel, decide, and progress. They make life simpler, easier, far more enjoyable, and safer.
Data science and Deep Learning are allowing us to extend this list of innovations further. Using Deep Learning, we are able to replicate the power of vision given to us by nature. Computers are being trained to perform the same tasks a human being performs. These tasks range from detecting colors, shapes, or sizes, to distinguishing between a cat, a dog, and a horse, to driving on a road; the use cases are many. The solutions are applicable across sectors such as retail, manufacturing, BFSI, agriculture, security, transport, pharmaceuticals, and so on.
This book is an attempt to explain the concepts of Deep Learning and Neural Networks for computer vision problems. We examine convolutional Neural Networks in detail, along with their various components and attributes. We explore various Neural Network architectures, such as LeNet, AlexNet, VGG, R-CNN, Fast R-CNN, Faster R-CNN, SSD, YOLO, ResNet, Inception, DeepFace, and FaceNet, in detail. We also develop pragmatic solutions to tackle use cases of binary image classification, multiclass image classification, object detection, face recognition, and video analytics. We use Python and Keras for the solutions. All the code and datasets are checked into the GitHub repo for quick access. In the final chapter, we study all the steps in a Deep Learning project, from defining the business problem to deployment. We also deal with the major errors and issues faced while developing the solutions. Throughout the book, we provide tips and tricks for training better algorithms, reducing the training time, monitoring the results, and improving the solution. We also share prominent research papers and datasets that you can use to gain further knowledge.
The book is suitable for researchers and students who want to explore and implement computer vision solutions using Deep Learning. It is highly useful, and hence recommended, for professionals who intend to explore cutting-edge technology, grasp the advanced concepts, develop a thorough understanding of Deep Learning architectures, and learn the best practices and solutions to common computer vision challenges. It is also directed toward business leaders who wish to implement Deep Learning solutions in their business and gain confidence when they communicate with their teams and clientele. Above all, it is for the curious person who wants to explore how Deep Learning algorithms solve computer vision problems and would like to try Python.
I would like to thank Apress, Aaron, Jessica, and Vishwesh for believing in me and giving me the chance to work on this subject. And a special word of thanks to my family, Yashi, Pakhi, and Rudra, for their excellent support, without which it would have been impossible to complete this work.
Vaibhav Verdhan, November 2020, Limerick
I would like to express my thanks to the following people. It is their hard work and passion that are advancing this field:
Ross Girshick
Jeff Donahue
Trevor Darrell
Jitendra Malik
Shaoqing Ren
Kaiming He
Jian Sun
Christian Szegedy
Wei Liu
Yangqing Jia
Pierre Sermanet
Scott Reed
Dragomir Anguelov
Dumitru Erhan
Vincent Vanhoucke
Andrew Rabinovich
Sergey Ioffe
Jonathon Shlens
Xiangyu Zhang
Omkar M. Parkhi
Andrea Vedaldi
Andrew Zisserman
Yaniv Taigman
Ming Yang
Marc’Aurelio Ranzato
Lior Wolf
Yann LeCun
Léon Bottou
Yoshua Bengio
Patrick Haffner
Sefik Ilkin Serengil
graduated from BITS Pilani in 2018, where he studied mechanical engineering. Since then, he has worked with Big Vision LLC on Deep Learning and computer vision and was involved in creating official OpenCV AI courses. He currently works at Mercedes-Benz Research and Development India Pvt. Ltd. He has a keen interest in programming and AI and has applied that interest to mechanical engineering projects. He has also written multiple blogs on OpenCV and Deep Learning for Learn OpenCV, a leading blog on computer vision, and coauthored Machine Learning for OpenCV 4 (second edition), published by Packt. When he is not writing blogs or working on projects, he likes to go on long walks or play his acoustic guitar.