Captioning sample images from outdoor scenes

We took several images from Flickr covering a wide variety of outdoor scenes and used both of our image captioning models to generate captions for each image, as depicted in the following code:

# load dependencies and image files
import glob
import matplotlib.pyplot as plt
from keras.preprocessing import image

outdoor1_files = glob.glob('real_test/outdoor1/*')

# initialize caption generators and generate captions
cg1 = CaptionGenerator(image_locations=outdoor1_files, word_to_index_map=word_to_index,
                       index_to_word_map=index_to_word, max_caption_size=max_caption_size,
                       caption_model=model1, beam_size=3)
cg2 = CaptionGenerator(image_locations=outdoor1_files, word_to_index_map=word_to_index,
                       index_to_word_map=index_to_word, max_caption_size=max_caption_size,
                       caption_model=model2, beam_size=3)
cg1.initialize_model()
cg1.process_images()
cg1.generate_captions()
cg2.initialize_model()
cg2.process_images()
cg2.generate_captions()

model30ep_captions_outdoor1 = cg1.captions
model50ep_captions_outdoor1 = cg2.captions

# plot images and their captions from both models
fig = plt.figure(figsize=(13, 11))
plt.suptitle('Automated Image Captioning: Outdoor Scenes 1', verticalalignment='top', size=15)
columns = 2
rows = 3
for i in range(1, columns*rows + 1):
    fig.add_subplot(rows, columns, i)
    image_name = outdoor1_files[i-1]
    img = image.load_img(image_name)
    plt.imshow(img, aspect='auto')
    modelep30_caption_text = 'Caption(ep30): ' + model30ep_captions_outdoor1[i-1]
    modelep50_caption_text = 'Caption(ep50): ' + model50ep_captions_outdoor1[i-1]
    plt.xlabel(modelep30_caption_text + '\n' + modelep50_caption_text, size=11, wrap=True)
fig.tight_layout()
plt.subplots_adjust(top=0.955)
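The `beam_size=3` argument above controls how `CaptionGenerator` decodes each caption: instead of greedily picking the single most likely word at every step, it keeps the three highest-scoring partial captions alive. The internals of `generate_captions()` aren't shown here, but the idea can be sketched with a minimal, self-contained beam search. Note that the `beam_search` function below and its log-probability-table input format are hypothetical, for illustration only; a real captioner would score each step with the trained model instead:

```python
def beam_search(step_scores, beam_size=3, start=0, end=3):
    """Decode a likely token sequence from per-step log-probability tables.

    step_scores[t][prev][tok] is the (illustrative) log-probability of
    emitting token `tok` after token `prev` at step t. In a real caption
    generator, these scores would come from the trained model.
    """
    beams = [([start], 0.0)]  # (token sequence, cumulative log-prob)
    for table in step_scores:
        candidates = []
        for seq, score in beams:
            if seq[-1] == end:  # finished captions carry over unchanged
                candidates.append((seq, score))
                continue
            for tok, logp in enumerate(table[seq[-1]]):
                candidates.append((seq + [tok], score + logp))
        # keep only the top `beam_size` partial captions
        beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:beam_size]
    return max(beams, key=lambda b: b[1])[0]
```

With `beam_size=1` this reduces to greedy decoding; larger beams trade extra computation for a better chance of recovering a high-probability caption that a greedy step would have discarded.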

The output of the preceding code is as follows:

Based on the preceding images, you can see that the models have correctly identified each scene. They are not perfect, however: the model misses the dogs in the second image of the second row, beside the group of people it identifies correctly. Our model also makes some color-identification mistakes, such as calling a green ball a red ball. Overall, though, the generated captions are definitely applicable to the source images.

The following images have been captioned from more diverse outdoor scenes, based on popular outdoor activities. We'll focus on several different activities to see how well our model performs across different types of scenes, rather than on a single specific scene:

In the preceding images, we focused on a wide variety of outdoor activities, including dirt biking, skiing, surfing, kayaking, and rock climbing. The generated captions are relevant to each scene and describe them quite well. In several cases, our models get very specific and even describe what each person is wearing. However, as mentioned before, the model misidentifies colors in several scenarios, which could perhaps be improved with more data as well as training on higher-resolution images.
