Input pipeline with RandomShuffleQueue

If you have read Chapter 9, Cruise Control - Automation, you will know that we can use TextLineReader in TensorFlow to read a text file line by line and use each line to load an image directly in TensorFlow. However, things are more complex here, because each line only contains the folder location and the label, and we only want a subset of the frames in each folder. For example, if a video has 30 frames and we only want 10 frames for training, we will choose a random starting position between 0 and 20 and select 10 consecutive frames from that point. Therefore, in this chapter, we will use another mechanism: we sample the video frames in pure Python, put the selected frame paths into a RandomShuffleQueue for training, and use tf.train.batch_join to spread the pre-processing work across multiple threads.
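The window arithmetic above can be sketched in plain Python (the 30/10 numbers are just the example from the paragraph):

```python
import random

# Pick a random window of num_frames consecutive frames out of max_frames.
# With 30 frames and 10 wanted, the start index falls in 0..20 inclusive,
# so the whole window always stays inside the clip.
max_frames, num_frames = 30, 10
start_index = random.randint(0, max_frames - num_frames)
window = list(range(start_index, start_index + num_frames))
print(start_index, window[0], window[-1])
```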

First, create a new Python file named utils.py in the root folder and add the following code:

def lines_from_file(filename, repeat=False):
    with open(filename) as handle:
        while True:
            try:
                line = next(handle)
                yield line.strip()
            except StopIteration:
                if repeat:
                    handle.seek(0)
                else:
                    # under PEP 479 (Python 3.7+), re-raising StopIteration
                    # inside a generator raises RuntimeError, so return instead
                    return
 
if __name__ == "__main__": 
    data_reader = lines_from_file("/home/ubuntu/datasets/ucf101/train.txt", repeat=True) 
 
    for i in range(15): 
        print(next(data_reader)) 

In this code, we create a generator function named lines_from_file that reads a text file line by line. We also add a repeat parameter so that the generator can start reading from the beginning again when it reaches the end of the file.

We have added a main section so you can try to run it to see how the generator works:

python utils.py 
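If you want to try the generator without the UCF101 file list, here is a self-contained demo (the same lines_from_file body, fed a throwaway temporary file with made-up lines) that shows the wrap-around behavior of repeat:

```python
import os
import tempfile

def lines_from_file(filename, repeat=False):
    # same generator as in utils.py
    with open(filename) as handle:
        while True:
            try:
                line = next(handle)
                yield line.strip()
            except StopIteration:
                if repeat:
                    handle.seek(0)
                else:
                    return

# write three fake dataset lines to a temporary file
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("folder_a 0 30\nfolder_b 1 45\nfolder_c 2 60\n")

reader = lines_from_file(tmp.name, repeat=True)
first_five = [next(reader) for _ in range(5)]
print(first_five)  # the fourth item wraps back to the first line
os.unlink(tmp.name)
```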

Now, create a new Python file named datasets.py in the root folder and add the following code:

import tensorflow as tf
import cv2
import os
import random

from tensorflow.python.ops import data_flow_ops
from utils import lines_from_file

def sample_videos(data_reader, root_folder, num_samples, num_frames):
    image_paths = list()
    labels = list()
    while True:
        if len(labels) >= num_samples:
            break
        line = next(data_reader)
        video_folder, label, max_frames = line.strip().split(" ")
        max_frames = int(max_frames)
        label = int(label)
        if max_frames > num_frames:
            start_index = random.randint(0, max_frames - num_frames)
            frame_paths = list()
            for index in range(start_index, start_index + num_frames):
                frame_path = os.path.join(root_folder, video_folder,
                                          "%04d.jpg" % index)
                frame_paths.append(frame_path)
            image_paths.append(frame_paths)
            labels.append(label)
    return image_paths, labels

if __name__ == "__main__":
    num_frames = 5
    root_folder = "/home/ubuntu/datasets/ucf101/train/"
    data_reader = lines_from_file("/home/ubuntu/datasets/ucf101/train.txt",
                                  repeat=True)
    image_paths, labels = sample_videos(data_reader,
                                        root_folder=root_folder,
                                        num_samples=3,
                                        num_frames=num_frames)
    print("image_paths", image_paths)
    print("labels", labels)

The sample_videos function is easy to understand. It receives the generator object from the lines_from_file function and uses the next function to read as many lines as it needs; each line contains the video folder, the label, and the total number of frames. You can see that we use the random.randint method to randomize the starting frame position.
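The parsing step assumes each train.txt line follows a "folder label frame_count" layout; the concrete values below are made up for illustration:

```python
import os

# A hypothetical train.txt line: video folder, integer label, total frames.
line = "v_ApplyEyeMakeup_g08_c01 0 120"
video_folder, label, max_frames = line.strip().split(" ")
label, max_frames = int(label), int(max_frames)

# This is how sample_videos turns a frame index into an image path.
frame_path = os.path.join("/home/ubuntu/datasets/ucf101/train/",
                          video_folder, "%04d.jpg" % 7)
print(video_folder, label, max_frames, frame_path)
```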

You can run the main section to see how sample_videos works with the following command:

python datasets.py

Up to this point, we have read the dataset text file into the image_paths and labels variables, which are Python lists. In the training routine later, we will use TensorFlow's built-in RandomShuffleQueue and enqueue image_paths and labels into it.
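As a mental model (this is an illustration only, not the TensorFlow API), a RandomShuffleQueue behaves like a buffer from which each dequeue removes a randomly chosen element, subject to a min_after_dequeue lower bound:

```python
import random

class ToyRandomShuffleQueue:
    """Pure-Python illustration of RandomShuffleQueue semantics (not TF code)."""

    def __init__(self, min_after_dequeue):
        self.min_after_dequeue = min_after_dequeue
        self.buffer = []

    def enqueue_many(self, items):
        self.buffer.extend(items)

    def dequeue(self):
        # TF blocks until the buffer is large enough; here we just check it
        if len(self.buffer) <= self.min_after_dequeue:
            raise RuntimeError("buffer below min_after_dequeue")
        return self.buffer.pop(random.randrange(len(self.buffer)))

q = ToyRandomShuffleQueue(min_after_dequeue=1)
q.enqueue_many([(["0001.jpg", "0002.jpg"], 0),
                (["0005.jpg", "0006.jpg"], 1),
                (["0009.jpg", "0010.jpg"], 2)])
frame_paths, label = q.dequeue()  # a random (paths, label) pair leaves the queue
```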

Now, we need to create a method that will be used in the training routine to get data from RandomShuffleQueue, perform pre-processing in multiple threads, and send the data to the batch_join function to create a mini-batch for training.

In the datasets.py file, add the following code:

def input_pipeline(input_queue, batch_size=32, num_threads=8,
                   image_size=112):
    frames_and_labels = []
    for _ in range(num_threads):
        frame_paths, label = input_queue.dequeue()
        frames = []
        for filename in tf.unstack(frame_paths):
            file_contents = tf.read_file(filename)
            image = tf.image.decode_jpeg(file_contents)
            image = _aspect_preserving_resize(image, image_size)
            image = tf.image.resize_image_with_crop_or_pad(image,
                                                           image_size,
                                                           image_size)
            image = tf.image.per_image_standardization(image)
            image.set_shape((image_size, image_size, 3))
            frames.append(image)
        frames_and_labels.append([frames, label])

    frames_batch, labels_batch = tf.train.batch_join(
        frames_and_labels, batch_size=batch_size,
        capacity=4 * num_threads * batch_size,
    )
    return frames_batch, labels_batch

In this code, we prepare a list named frames_and_labels and use a for loop with num_threads iterations. This is a very convenient way of adding multi-threading support to the pre-processing. In each iteration, we call the dequeue method of input_queue to get a frame_paths list and a label. From the sample_videos function in the previous section, we know that frame_paths is a list of selected video frame paths. Therefore, we use another for loop to iterate over the frames. For each frame, we read, resize, and standardize the image. This part is similar to the code in Chapter 9, Cruise Control - Automation. At the end of the input pipeline, we pass frames_and_labels to tf.train.batch_join along with the batch_size parameter. The returned frames_batch and labels_batch will be used in the training routine later.
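The effect of batch_join can be mimicked in plain Python: several workers feed one shared queue, and a batch is simply whichever batch_size items come out first. This is an analogue of the idea, not the TensorFlow call:

```python
import queue
import threading

def preprocess_worker(worker_id, sink):
    # stand-in for one copy of the dequeue/decode/resize pipeline
    for item in range(3):
        sink.put((worker_id, item))

def toy_batch_join(num_threads, batch_size):
    sink = queue.Queue()
    workers = [threading.Thread(target=preprocess_worker, args=(i, sink))
               for i in range(num_threads)]
    for w in workers:
        w.start()
    # first-come, first-batched: whichever worker produces first fills the batch
    batch = [sink.get() for _ in range(batch_size)]
    for w in workers:
        w.join()
    return batch

batch = toy_batch_join(num_threads=4, batch_size=8)
```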

Finally, you should add the following code, which contains the _aspect_preserving_resize function:

def _smallest_size_at_least(height, width, smallest_side):
    smallest_side = tf.convert_to_tensor(smallest_side, dtype=tf.int32)
    height = tf.to_float(height)
    width = tf.to_float(width)
    smallest_side = tf.to_float(smallest_side)
    scale = tf.cond(tf.greater(height, width),
                    lambda: smallest_side / width,
                    lambda: smallest_side / height)
    new_height = tf.to_int32(height * scale)
    new_width = tf.to_int32(width * scale)
    return new_height, new_width

def _aspect_preserving_resize(image, smallest_side):
    smallest_side = tf.convert_to_tensor(smallest_side, dtype=tf.int32)
    shape = tf.shape(image)
    height = shape[0]
    width = shape[1]
    new_height, new_width = _smallest_size_at_least(height, width,
                                                    smallest_side)
    image = tf.expand_dims(image, 0)
    resized_image = tf.image.resize_bilinear(image, [new_height, new_width],
                                             align_corners=False)
    resized_image = tf.squeeze(resized_image)
    resized_image.set_shape([None, None, 3])
    return resized_image

This code is the same as what you used in Chapter 9, Cruise Control - Automation.
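As a sanity check, the scale computation can be replayed in plain Python for a hypothetical 240x320 frame and smallest_side=112:

```python
# Mirror of _smallest_size_at_least for height=240, width=320, smallest_side=112.
height, width, smallest_side = 240, 320, 112
scale = smallest_side / width if height > width else smallest_side / height
new_height, new_width = int(height * scale), int(width * scale)
print(new_height, new_width)  # 112 149: the short side becomes 112; the
# crop-or-pad step in input_pipeline then cuts the long side down to 112 too
```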

In the next section, we will create the deep neural network architecture that we will use to perform video action recognition with 101 categories.
