Learning to Classify and Localize Objects

So far, we have studied a range of algorithms and approaches for solving real-world problems with the help of computer vision. In recent years, as devices such as Graphics Processing Units (GPUs) have made considerable computational power available, many algorithms have emerged that use this power to achieve state-of-the-art results in computer vision tasks. Usually, these algorithms are based on neural networks, which enable the creator of the algorithm to extract a lot of meaningful information from data.

However, in contrast to the classical approaches, this information is often quite hard to interpret. From that point of view, you might say that we are getting closer to artificial intelligence: we give a computer an approach, and it figures out how to do the rest. So that none of this appears too mysterious, let's learn about deep learning models in this chapter.

As you have already seen, the classical problems in computer vision include object detection and localization. Here, we will look at how to classify and localize objects with the help of deep learning models.

The goal of this chapter is to learn important deep learning concepts such as transfer learning and how to apply them to build your own object classifier and localizer. Specifically, we will cover the following topics:

  • Preparing a large dataset for training a deep learning model
  • Understanding Convolutional Neural Networks (CNNs)
  • Classifying and localizing with CNNs 
  • Learning about transfer learning
  • Implementing activation functions
  • Understanding backpropagation

We will start by preparing a dataset for training. Then, we will go on to understand how to use a pretrained model for creating a new classifier. Once you have understood how it is done, we will move forward and build more complex architectures that will perform localization.

During these steps, we will use the Oxford-IIIT-Pet dataset. Finally, we will run an app that uses our trained localizer network for inference. Although the network will be trained using only the bounding boxes of pets' heads, you will see that it also localizes human heads well. This shows the generalization power of our model.

Learning about these concepts of deep learning and seeing them in action will be very useful in the future when you make your own applications using deep learning models or when you start to work on completely new deep learning architectures.
