Automated Image Caption Generator

In the previous chapters, we looked at several case studies that applied transfer learning to problems in computer vision and in natural language processing (NLP). Each of those problems, however, sat within a single domain. In this chapter, we will build an intelligent system that combines the two: an object-recognition system coupled with machine translation, which together form an automated image-caption generator.

The idea of image captioning is not new. Images in media such as books, papers, or social media usually need a text caption to give them meaning and context. What makes the task hard is that a caption is free-flowing natural language consisting of one or more sentences. Because this output is unstructured text rather than a label chosen from a fixed set, image captioning is not a traditional image-classification problem.

Image captioning can be solved by combining pretrained models that are experts in the computer vision domain, such as the Visual Geometry Group (VGG) models or Inception, with sequence models, such as recurrent neural networks (RNNs) or Long Short-Term Memory (LSTM) networks, which generate the sequence of words that forms the caption. In this chapter, we will explore an interesting approach to building an automated image captioning, or scene-recognition, system.
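To make the data flow concrete, here is a toy sketch of how an image encoder and a word-by-word decoder fit together. It is not the model built in this chapter: `toy_encoder` and `toy_next_word` are hypothetical stand-ins for a pretrained CNN and a trained sequence model, and the `<START>`/`<END>` tokens and all function names are assumptions made for illustration.

```python
from typing import Callable, List, Sequence

START, END = "<START>", "<END>"  # assumed marker tokens for caption boundaries


def greedy_caption(
    image_features: Sequence[float],
    next_word: Callable[[Sequence[float], List[str]], str],
    max_len: int = 10,
) -> str:
    """Build a caption one word at a time, conditioned on the image features."""
    words = [START]
    for _ in range(max_len):
        # The decoder sees the image features plus every word generated so far.
        word = next_word(image_features, words)
        if word == END:
            break
        words.append(word)
    return " ".join(words[1:])  # drop the START marker


def toy_encoder(pixels: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for a pretrained CNN: maps pixels to a short feature vector."""
    mean = sum(pixels) / len(pixels)
    return [mean, max(pixels) - min(pixels)]


def toy_next_word(features: Sequence[float], words_so_far: List[str]) -> str:
    """Hypothetical stand-in for a trained sequence model: follows a fixed script."""
    adjective = "bright" if features[0] > 0.5 else "dark"
    script = ["a", adjective, "scene", END]
    position = len(words_so_far) - 1  # number of words emitted after START
    return script[position] if position < len(script) else END


features = toy_encoder([0.9, 0.8, 0.7, 0.6])
caption = greedy_caption(features, toy_next_word)
print(caption)  # prints "a bright scene"
```

In a real system, the encoder would be a network such as VGG or Inception with its classification layer removed, and the next-word function would be a trained RNN or LSTM that returns the most probable word from the vocabulary.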

We will cover the following major aspects of building this system, which is powered by deep learning and transfer learning:

Understanding image captioning
Formulating our objective
Understanding the data
Approach to automated image captioning
Image feature extraction with transfer learning
Building a vocabulary for our captions
Building an image caption dataset generator
Building our image language encoder-decoder deep learning model
Training our image captioning deep learning model
Automated image captioning in action

We will cover essential concepts from both computer vision and NLP to build our automated image caption generator. We will dive deep into a suitable deep learning architecture coupled with transfer learning to implement this system on top of a popular and easily available image dataset. We will also showcase how to build and test our automated image caption generator on new photos and scenes. The code for this chapter is available in the Chapter 11 folder of the GitHub repository at https://github.com/dipanjanS/hands-on-transfer-learning-with-python, which you can refer to as you follow along with the chapter. We will also be posting some bonus examples there.
