front matter

preface

Two years ago, I decided to write a book to teach deep learning for computer vision from an intuitive perspective. My goal was to develop a comprehensive resource that takes learners from knowing only the basics of machine learning to building advanced deep learning algorithms that they can apply to solve complex computer vision problems.

The problem: In short, as of this moment, there are no books out there that teach deep learning for computer vision the way I wanted to learn about it. As a beginner machine learning engineer, I wanted to read one book that would take me from point A to point Z. I planned to specialize in building modern computer vision applications, and I wished that I had a single resource that would teach me everything I needed to do two things: 1) use neural networks to build an end-to-end computer vision application, and 2) be comfortable reading and implementing research papers to stay up-to-date with the latest industry advancements.

I found myself jumping between online courses, blogs, papers, and YouTube videos to create a comprehensive curriculum for myself. It was challenging to comprehend what was happening under the hood at a deeper level: not just a basic understanding, but how the concepts and theories make sense mathematically. It was impossible to find one comprehensive resource that (horizontally) covered the most important topics that I needed to learn to work on complex computer vision applications while also diving deep enough (vertically) to help me understand the math that makes the magic work.

As a beginner, I searched but couldn’t find anything to meet these needs. So now I’ve written it. My goal has been to write a book that not only teaches the content I wanted when I was starting out, but also levels up your ability to learn on your own.

My solution is a comprehensive book that dives deep both horizontally and vertically:

  • Horizontally--This book explains most topics that an engineer needs to learn to build production-ready computer vision applications, from neural networks and how they work to the different types of neural network architectures and how to train, evaluate, and tune the network.

  • Vertically--The book dives a level or two deeper than the code and explains intuitively (and gently) how the math works under the hood, to empower you to be comfortable reading and implementing research papers or even inventing your own techniques.

At the time of writing, I believe this is the only deep learning for vision systems resource that is taught this way. Whether you are looking for a job as a computer vision engineer, want to gain a deeper understanding of advanced neural network algorithms in computer vision, or want to build your own product or startup, I wrote this book with you in mind. I hope you enjoy it.

acknowledgments

This book was a lot of work. No, make that really a lot of work! But I hope you will find it valuable. There are quite a few people I’d like to thank for helping me along the way.

I would like to thank the people at Manning who made this book possible: publisher Marjan Bace and everyone on the editorial and production teams, including Jennifer Stout, Tiffany Taylor, Lori Weidert, Katie Tennant, and many others who worked behind the scenes.

Many thanks go to the technical peer reviewers led by Alain Couniot--Al Krinker, Albert Choy, Alessandro Campeis, Bojan Djurkovic, Burhan ul haq, David Fombella Pombal, Ishan Khurana, Ita Cirovic Donev, Jason Coleman, Juan Gabriel Bono, Juan José Durillo Barrionuevo, Michele Adduci, Millad Dagdoni, Peter Hraber, Richard Vaughan, Rohit Agarwal, Tony Holdroyd, Tymoteusz Wolodzko, and Will Fuger--and the active readers who contributed their feedback in the book forums. Their contributions included catching typos, code errors and technical mistakes, as well as making valuable topic suggestions. Each pass through the review process and each piece of feedback implemented through the forum topics shaped and molded the final version of this book.

Finally, thank you to the entire Synapse Technology team. You’ve created something that’s incredibly cool. Thank you to Simanta Guatam, Aleksandr Patsekin, Jay Patel, and others for answering my questions and brainstorming ideas for the book.

about this book

Who should read this book

If you know the basic machine learning framework, can hack around in Python, and want to learn how to build and train advanced, production-ready neural networks to solve complex computer vision problems, I wrote this book for you. The book was written for anyone with intermediate Python experience and basic machine learning understanding who wishes to explore training deep neural networks and learn to apply deep learning to solve computer vision problems.

When I started writing the book, my primary goal was as follows: “I want to write a book to grow readers’ skills, not teach them content.” To achieve this goal, I had to keep an eye on two main tenets:

  1. Teach you how to learn. I don’t want to read a book that just goes through a set of scientific facts. I can get that on the internet for free. If I read a book, I want to finish it having grown my skillset so I can study the topic further. I want to learn how to think about the presented solutions and come up with my own.

  2. Go very deep. If I’m successful in satisfying the first tenet, that makes this one easy. If you learn how to learn new concepts, that allows me to dive deep without worrying that you might fall behind. This book doesn’t avoid the math part of the learning, because understanding the mathematical equations will empower you with the best skill in the AI world: the ability to read research papers, compare innovations, and make the right decisions about implementing new concepts in your own problems. But I promise to introduce only the mathematical concepts you need, and to present them in a way that doesn’t interrupt your flow, so you can follow the concepts without the math if you prefer.

How this book is organized: A roadmap

This book is structured into three parts. The first part explains deep learning in detail as a foundation for the remaining topics. I strongly recommend that you not skip this section, because it dives deep into neural network components and definitions and explains all the notions required to understand how neural networks work under the hood. After reading part 1, you can jump directly to topics of interest in the remaining chapters. Part 2 explains deep learning techniques for solving object classification and detection problems, and part 3 explains deep learning techniques for generating images and visual embeddings. In several chapters, practical projects implement the topics discussed.

About the code

All of this book’s code examples use open source frameworks that are free to download. We will be using Python, TensorFlow, Keras, and OpenCV. Appendix A walks you through the complete setup. I also recommend that you have access to a GPU if you want to run the book projects on your machine, because chapters 6-10 contain more complex projects that train deep networks and will take a long time on a regular CPU. Another option is to use a cloud environment like Google Colab, which is free, or other paid options.
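As a quick sanity check before starting the projects, a snippet like the following prints the installed versions and lists any GPUs that TensorFlow can see. This is a minimal sketch that assumes the TensorFlow 2.x setup described in appendix A (with Keras bundled as tensorflow.keras and OpenCV importable as cv2); it is not one of the book’s listings.

import tensorflow as tf          # TensorFlow 2.x
from tensorflow import keras     # Keras bundled with TensorFlow
import cv2                       # OpenCV

print("TensorFlow version:", tf.__version__)
print("Keras version:", keras.__version__)
print("OpenCV version:", cv2.__version__)

# List the GPUs visible to TensorFlow; an empty list means training will fall back to the CPU.
gpus = tf.config.list_physical_devices("GPU")
print("GPUs available:", gpus if gpus else "none (CPU only)")

If the GPU list comes back empty, the larger projects in chapters 6-10 will still run, just much more slowly; that is when a free cloud environment such as Google Colab becomes especially useful.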

Examples of source code occur both in numbered listings and in line with normal text. In both cases, source code is formatted in a fixed-width font like this to separate it from ordinary text. Sometimes code is also in bold to highlight code that has changed from previous steps in the chapter, such as when a new feature adds to an existing line of code.

In many cases, the original source code has been reformatted; we’ve added line breaks and reworked indentation to accommodate the available page space in the book. In rare cases, even this was not enough, and listings include line-continuation markers (➥). Additionally, comments in the source code have often been removed from the listings when the code is described in the text. Code annotations accompany many of the listings, highlighting important concepts.

The code for the examples in this book is available for download from the Manning website at www.manning.com/books/deep-learning-for-vision-systems and from GitHub at https://github.com/moelgendy/deep_learning_for_vision_systems.

liveBook discussion forum

Purchase of Deep Learning for Vision Systems includes free access to a private web forum run by Manning Publications where you can make comments about the book, ask technical questions, and receive help from the author and from other users. To access the forum, go to https://livebook.manning.com/#!/book/deep-learning-for-vision-systems/discussion. You can also learn more about Manning’s forums and the rules of conduct at https://livebook.manning.com/#!/discussion.

Manning’s commitment to our readers is to provide a venue where a meaningful dialogue between individual readers and between readers and the author can take place. It is not a commitment to any specific amount of participation on the part of the author, whose contribution to the forum remains voluntary (and unpaid). We suggest you try asking the author some challenging questions lest his interest stray! The forum and the archives of previous discussions will be accessible from the publisher’s website as long as the book is in print.

about the author

Mohamed Elgendy is the vice president of engineering at Rakuten, where he leads the development of its AI platform and products. Previously, he served as head of engineering at Synapse Technology, building proprietary computer vision applications to detect threats at security checkpoints worldwide. At Amazon, Mohamed built and managed the central AI team that serves as a deep learning think tank for Amazon engineering teams like AWS and Amazon Go. He also developed the deep learning for computer vision curriculum at Amazon’s Machine Learning University. Mohamed regularly speaks at AI conferences like Amazon’s DevCon, O’Reilly’s AI conference, and Google’s I/O.

about the cover illustration

The figure on the cover of Deep Learning for Vision Systems depicts Ibn al-Haytham, an Arab mathematician, astronomer, and physicist who is often referred to as “the father of modern optics” due to his significant contributions to the principles of optics and visual perception. The illustration is modified from the frontispiece of a seventeenth-century edition of Johannes Hevelius’s work Selenographia.

In his book Kitab al-Manazir (Book of Optics), Ibn al-Haytham was the first to explain that vision occurs when light reflects from an object and then passes to one’s eyes. He was also the first to demonstrate that vision occurs in the brain, rather than in the eyes--and many of these concepts are at the heart of modern vision systems. You will see the correlation when you read chapter 1 of this book.

Ibn al-Haytham has been a great inspiration for me as I work and innovate in this field. By honoring his memory on the cover of this book, I hope to remind fellow practitioners that our work can live on and inspire others for thousands of years.
