Automated image captioning in action!

Evaluating on our test dataset was a good way to measure the model's performance, but how do we start using the model in the real world to caption completely new photos? This is where we need some knowledge of building an end-to-end system that takes any image as input and gives us a free-text, natural-language caption as output.

Here are the major components and functions for our automated caption generator:

  • Caption model and metadata initializer
  • Image feature extraction model initializer
  • Transfer learning-based feature extractor
  • Caption generator

To make this generic, we built a class that makes use of several utility functions we mentioned in the previous sections:

import numpy as np 
from keras.preprocessing import image 
from keras.applications.vgg16 import preprocess_input as preprocess_vgg16_input 
from keras.applications import vgg16 
from keras.models import Model 
 
class CaptionGenerator: 
     
    def __init__(self, image_locations=[], 
                 word_to_index_map=None, index_to_word_map=None, 
                 max_caption_size=None, caption_model=None, 
                 beam_size=1): 
        self.image_locs = image_locations 
        self.captions = [] 
        self.image_feats = [] 
        self.word2index = word_to_index_map 
        self.index2word = index_to_word_map 
        self.max_caption_size = max_caption_size 
        self.vision_model = None 
        self.caption_model = caption_model 
        self.beam_size = beam_size 
     
    def process_image2arr(self, path, img_dims=(224, 224)): 
        # load and resize the image, convert it to a batch tensor, 
        # and apply the standard VGG-16 preprocessing 
        img = image.load_img(path, target_size=img_dims) 
        img_arr = image.img_to_array(img) 
        img_arr = np.expand_dims(img_arr, axis=0) 
        img_arr = preprocess_vgg16_input(img_arr) 
        return img_arr 
     
    def initialize_model(self): 
        # build the VGG-16 feature extractor: drop the final prediction layer 
        # and use the penultimate fully connected layer's outputs as features 
        vgg_model = vgg16.VGG16(include_top=True, weights='imagenet', 
                                input_shape=(224, 224, 3)) 
        vgg_model.layers.pop() 
        output = vgg_model.layers[-1].output 
        vgg_model = Model(vgg_model.input, output) 
        vgg_model.trainable = False 
        self.vision_model = vgg_model 
         
    def process_images(self): 
        # extract a dense feature vector for every image using the vision model 
        if self.image_locs: 
            image_feats = [self.vision_model.predict(self.process_image2arr(path=img_path)) 
                           for img_path in self.image_locs] 
            image_feats = [np.reshape(img_feat, img_feat.shape[1]) 
                           for img_feat in image_feats] 
            self.image_feats = image_feats 
        else: 
            print('No images specified') 
     
    def generate_captions(self): 
        # run the trained caption model on each extracted image feature vector 
        captions = [generate_image_caption(model=self.caption_model, 
                                           word_to_index_map=self.word2index, 
                                           index_to_word_map=self.index2word, 
                                           image_features=img_feat, 
                                           max_caption_size=self.max_caption_size, 
                                           beam_size=self.beam_size)[0] 
                    for img_feat in self.image_feats] 
        self.captions = captions 
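With the class in place, the typical calling sequence is short. The following is a minimal sketch, assuming the trained caption_model, the word_to_index and index_to_word vocabulary maps, and max_caption_size are already available from the earlier training sections (the variable names and the sample image filenames here are placeholders):

# hypothetical image paths for completely new photos 
sample_images = ['sample_img1.jpg', 'sample_img2.jpg'] 
 
cg = CaptionGenerator(image_locations=sample_images, 
                      word_to_index_map=word_to_index, 
                      index_to_word_map=index_to_word, 
                      max_caption_size=max_caption_size, 
                      caption_model=caption_model, 
                      beam_size=3) 
cg.initialize_model()     # load the pre-trained VGG-16 feature extractor 
cg.process_images()       # extract image features via transfer learning 
cg.generate_captions()    # generate a caption for each feature vector 
 
for img, caption in zip(sample_images, cg.captions): 
    print(img, '->', caption) 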

Now that our caption generator has been implemented, it is time to put it into action! To test it, we downloaded several completely new images that are not present in the Flickr8K dataset, choosing photos from Flickr with licenses that permit commercial use so that we can reproduce them in this book. We'll show some demonstrations in the next section.
