Understanding the data

Let's take a look at the data we will be using to build our model. To keep things simple, we will be using the Flickr8K dataset, which consists of images obtained from Flickr, a popular image-sharing website. To download the dataset, fill in the request form at https://forms.illinois.edu/sec/1713398, provided by the Department of Computer Science, University of Illinois; you should then receive the download link by email.

For details pertaining to each image, you can refer to their website, http://nlp.cs.illinois.edu/HockenmaierGroup/8k-pictures.html, which lists each image, its source, and five text-based captions for it. In general, any sample image would have several captions similar to the following:

You can clearly see the image and its corresponding captions. All the captions try to describe the same image or scene, but each may focus on specific and different aspects of it, which makes this a hard task to automate. We also recommend that readers check out the paper, Framing Image Description as a Ranking Task: Data, Models and Evaluation Metrics, Micah Hodosh et al., IJCAI 2015 (https://pdfs.semanticscholar.org/f126/ec304cdad464f6248ac7f73a186ca26db526.pdf).

You will obtain two files when you click on the download link:

  • Flickr8k_Dataset.zip: A 1 GB ZIP archive of all the raw images
  • Flickr8k_text.zip: A 3 MB ZIP archive of all natural-language textual descriptions for photographs, which are the captions
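Once both archives are downloaded, they need to be unpacked before use. The following is a minimal sketch using Python's standard zipfile module. The extract_archive helper and the flickr8k target directory are hypothetical names chosen for illustration, and the sketch assumes both archives sit in the current working directory:

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, dest_dir):
    """Extract a ZIP archive into dest_dir and return its member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        return archive.namelist()


if __name__ == "__main__":
    # Assumption: both archives were saved to the current directory.
    for name in ("Flickr8k_Dataset.zip", "Flickr8k_text.zip"):
        if Path(name).exists():
            members = extract_archive(name, "flickr8k")
            print(f"{name}: extracted {len(members)} entries")
        else:
            print(f"{name} not found; download it first")
```

Adjust the paths if you saved the archives elsewhere; the images archive is large, so extraction may take a little while.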

The text archive contains the Flickr_8k.trainImages.txt, Flickr_8k.devImages.txt, and Flickr_8k.testImages.txt files, which list the filenames of 6,000, 1,000, and 1,000 images, respectively. We will be combining the train and dev images to build a training dataset of 7,000 images and use the test dataset of 1,000 images for evaluation. Each image has five different yet similar captions, which are available in the Flickr8k.token.txt file.
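To make the split concrete, here is a minimal sketch of how the filename lists and captions could be loaded. It assumes that each split file holds one image filename per line and that each line of Flickr8k.token.txt has the form image_name.jpg#n, a tab, and then the caption; verify this against your copy of the files. The helper names (load_image_ids, load_captions, build_splits) are hypothetical and used only for illustration:

```python
from collections import defaultdict
from pathlib import Path


def load_image_ids(path):
    """Read one image filename per line, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def load_captions(path):
    """Map each image filename to its list of captions.

    Assumed line format: '<image>.jpg#<n><TAB><caption>'.
    """
    captions = defaultdict(list)
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if "\t" not in line:
                continue  # skip lines that do not match the assumed format
            key, caption = line.split("\t", 1)
            image_name = key.split("#", 1)[0]  # drop the '#<n>' caption index
            captions[image_name].append(caption.strip())
    return dict(captions)


def build_splits(text_dir):
    """Return (train + dev, test) filename lists from the text directory."""
    text_dir = Path(text_dir)
    train = load_image_ids(text_dir / "Flickr_8k.trainImages.txt")
    dev = load_image_ids(text_dir / "Flickr_8k.devImages.txt")
    test = load_image_ids(text_dir / "Flickr_8k.testImages.txt")
    return train + dev, test
```

With the real files in place, build_splits should return lists of 7,000 and 1,000 filenames, and looking up any of those filenames in the dictionary returned by load_captions should give its five captions.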
