Automated Image Caption Generator

In the previous chapters, we looked at several case studies that applied transfer learning to problems in computer vision and natural language processing (NLP). However, each of those problems was confined to its own domain. In this chapter, we will build an intelligent system that combines these two popular domains: computer vision and NLP. More specifically, we will couple an object-recognition system with a sequence-generation model of the kind used in machine translation to build an automated image caption generator.

The idea of image captioning is not new. Images in diverse media, such as books, papers, or social media, often need a proper text description that gives them meaning and context. What makes this task hard is that an image caption is free-flowing natural language consisting of one or more sentences. Because of this unstructured output, image captioning is not a traditional image-classification problem: instead of predicting one label from a fixed set of classes, the system must generate a sequence of words.
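To make the contrast with classification concrete, the following sketch shows how a single caption expands into several prediction targets. It is a minimal illustration, not the chapter's own preprocessing: the whitespace tokenizer, the function name caption_to_training_pairs, and the boundary tokens startseq and endseq are all assumptions made for this example.

```python
# A minimal sketch, assuming whitespace tokenization and the hypothetical
# boundary tokens "startseq"/"endseq"; the chapter's preprocessing may differ.


def caption_to_training_pairs(caption):
    """Split one caption into (partial caption, next word) pairs.

    An image-classification sample has exactly one target label, whereas a
    caption yields one prediction target per word, which is why captioning
    is framed as sequence generation rather than classification.
    """
    tokens = ["startseq"] + caption.lower().split() + ["endseq"]
    pairs = []
    for i in range(1, len(tokens)):
        # Input: every token seen so far; target: the token that follows it.
        pairs.append((tokens[:i], tokens[i]))
    return pairs


if __name__ == "__main__":
    for prefix, target in caption_to_training_pairs("A dog runs on the grass"):
        print(" ".join(prefix), "->", target)
```

Each pair asks a model to predict one word given the words before it (and, in a full captioning system, the image features), so a six-word caption plus its end token yields seven targets rather than a single label.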

Image captioning can be tackled by combining pretrained models that are experts in the computer vision domain, such as Visual Geometry Group (VGG) networks or Inception, with sequence models, such as recurrent neural networks (RNNs) and, in particular, Long Short-Term Memory (LSTM) networks, which generate the sequence of words that forms the caption. In this chapter, we will explore an interesting approach to building an automated image captioning or scene-recognition system.
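The division of labor between the two kinds of model can be sketched as a greedy decoding loop: the vision model supplies a fixed feature vector for the image, and the sequence model repeatedly predicts the next word until it emits an end token. In this minimal sketch, toy_model is a hypothetical lookup table standing in for a trained LSTM decoder, the feature vector is a placeholder rather than real VGG or Inception output, and greedy selection is only one decoding strategy (beam search is a common alternative).

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical decoder interface: given image features and the caption so far,
# return a probability for each candidate next word.
NextWordModel = Callable[[Sequence[float], List[str]], Dict[str, float]]


def greedy_caption(image_features: Sequence[float],
                   next_word_probs: NextWordModel,
                   max_len: int = 20,
                   start: str = "startseq",
                   end: str = "endseq") -> str:
    """Generate a caption one word at a time, always taking the most likely word."""
    words = [start]
    for _ in range(max_len):
        probs = next_word_probs(image_features, words)
        best = max(probs, key=probs.get)  # greedy choice: highest probability
        if best == end:
            break
        words.append(best)
    return " ".join(words[1:])  # drop the start token


def toy_model(image_features: Sequence[float], words: List[str]) -> Dict[str, float]:
    """Toy lookup table standing in for a CNN encoder + LSTM decoder.

    It ignores image_features; a real decoder would condition on them.
    """
    script = {
        "startseq": {"a": 0.9, "dog": 0.1},
        "a": {"dog": 0.7, "cat": 0.3},
        "dog": {"runs": 0.6, "endseq": 0.4},
        "runs": {"endseq": 0.8, "fast": 0.2},
    }
    return script.get(words[-1], {"endseq": 1.0})


if __name__ == "__main__":
    features = [0.12, 0.55, 0.31]  # placeholder for a VGG/Inception feature vector
    print(greedy_caption(features, toy_model))
```

Keeping the image encoder and the word decoder behind separate interfaces mirrors the transfer learning idea in the text: the pretrained vision model can be reused unchanged while only the sequence model learns to produce captions.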

We will cover the following major aspects of building this system, which is powered by deep learning and transfer learning:

  • Understanding image captioning
  • Formulating our objective
  • Understanding the data
  • Approach to automated image captioning
  • Image feature extraction with transfer learning
  • Building a vocabulary for our captions
  • Building an image caption dataset generator
  • Building our image language encoder-decoder deep learning model
  • Training our image captioning deep learning model
  • Automated image captioning in action

We will cover essential concepts from both computer vision and NLP to build our automated image caption generator. We will dive deep into a suitable deep learning architecture coupled with transfer learning and implement this system on top of a popular and easily available image dataset. We will also show how to build and test our automated image caption generator on new photos and scenes. The code for this chapter is available for quick reference in the Chapter 11 folder of the GitHub repository at https://github.com/dipanjanS/hands-on-transfer-learning-with-python, which you can refer to as needed to follow along with the chapter. We will also post some bonus examples there.
