Formulating our objective

The main objective of our real-world case study is image captioning or scene recognition. This is a supervised learning problem to an extent, but not a traditional classification problem. Here, we will be working on an image dataset, known as Flickr8K, with samples of images or scenes and corresponding natural language captions describing them. The idea is to build a system that can learn from these images and start captioning images automatically.

As I mentioned earlier, a traditional image classification system typically classifies or categorizes images into predefined classes. We have already built such a system in previous chapters. However, the output from an image captioning system is generally a sequence of words forming a textual description in natural language; this makes it more difficult than a traditional supervised classification system.

The nature of our model training will still be supervised since we will have to build it based on training image data and their corresponding caption descriptions. However, the approach toward building our model will be slightly different. We will be leveraging concepts from transfer learning and deep learning to build this system as usual. To be more specific, we will be using a combination of Deep Convolutional Neural Networks (DCNNs) and sequential models.

Table of Contents for Formulating our objective

Create new playlist

Sign In

Sign Up

Table of Contents for
Formulating our objective