Pre-processing the video for training

UCF101 contains 13,320 video clips at a fixed frame rate of 25 FPS and a fixed resolution of 320 x 240. All the clips are stored in AVI format, which is not convenient to use directly in TensorFlow. Therefore, in this section, we will extract frames from all the videos into JPEG files. We will extract frames at a fixed rate of only 4 FPS, which reduces the input size of the network.
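To see how a 4 FPS target is reached from a 25 FPS source, here is a minimal sketch of the frame-skipping decision that the conversion code in this section applies while decoding. The numbers are the defaults from this section; the simulated 50-frame clip is a hypothetical example:

```python
source_fps, target_fps = 25, 4
frame_to_skip = int(source_fps / float(target_fps))  # 6

# Simulate the per-frame keep/skip decision over a 2-second clip
# (50 decoded frames at 25 FPS).
last_frame = -1
kept = []
for frame_index in range(2 * source_fps):
    if last_frame < 0 or frame_index > last_frame + frame_to_skip:
        last_frame = frame_index
        kept.append(frame_index)

print(kept)  # [0, 7, 14, 21, 28, 35, 42, 49]
```

Eight frames are kept out of two seconds of video, which matches the target of 4 FPS while decoding only one frame in roughly every seven.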

Before we start implementing the code, we need to install the av library by following the instructions at https://mikeboers.github.io/PyAV/installation.html.

First, create a Python package named scripts in the root folder. Then, create a new Python file at scripts/convert_ucf101.py. In the newly created file, add the following code to import the required packages and define some parameters:

 import av 
 import os 
 import random 
 import tensorflow as tf 
 from tqdm import tqdm 
 
 FLAGS = tf.app.flags.FLAGS 
 tf.app.flags.DEFINE_string( 
    'dataset_dir', '/mnt/DATA02/Dataset/UCF101', 
    'The folder that contains the extracted content of UCF101.rar' 
 ) 
 
 tf.app.flags.DEFINE_string( 
    'train_test_list_dir', 
    '/mnt/DATA02/Dataset/UCF101/ucfTrainTestlist', 
    'The folder that contains the extracted content of UCF101TrainTestSplits-RecognitionTask.zip' 
 ) 
 
 tf.app.flags.DEFINE_string( 
    'target_dir', '/home/ubuntu/datasets/ucf101', 
    'The location where all the images will be stored' 
 ) 
 
 tf.app.flags.DEFINE_integer( 
    'fps', 4, 
    'Framerate to export' 
 ) 
 
 def ensure_folder_exists(folder_path): 
    if not os.path.exists(folder_path): 
        os.mkdir(folder_path) 
    return folder_path 

In the preceding code, dataset_dir and train_test_list_dir are the locations of the folders containing the extracted content of UCF101.rar and UCF101TrainTestSplits-RecognitionTask.zip respectively. target_dir is the folder that all the training images will be stored in. ensure_folder_exists is a utility function that creates a folder if it doesn't exist.

Next, let's define the main function of the Python code:

 def main(_): 
    if not FLAGS.dataset_dir: 
        raise ValueError("You must supply the dataset directory with --dataset_dir") 
    ensure_folder_exists(FLAGS.target_dir) 
 
    convert_data(["trainlist01.txt", "trainlist02.txt", "trainlist03.txt"], training=True) 
    convert_data(["testlist01.txt", "testlist02.txt", "testlist03.txt"], training=False) 
 
 if __name__ == "__main__": 
    tf.app.run() 

In the main function, we create the target_dir folder and call the convert_data function, which we will create shortly. The convert_data function takes a list of train/test text files from the dataset and a Boolean named training that indicates whether the text files are used for the training process.

Here are some lines from one of the text files:

ApplyEyeMakeup/v_ApplyEyeMakeup_g08_c01.avi 1
ApplyEyeMakeup/v_ApplyEyeMakeup_g08_c02.avi 1
ApplyEyeMakeup/v_ApplyEyeMakeup_g08_c03.avi 1

Each line of the text file contains the path to the video file and the correct label. In this case, we have three video paths from the ApplyEyeMakeup category, which is the first category in the dataset.
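The split of such a line can be sketched as follows, using one of the example lines above as inline data. Note that only the train lists contain the trailing label; the test lists (testlist0*.txt) contain just the path, which is why the convert_data function below branches on the training flag:

```python
# Parse one line of a UCF101 train list: "<class>/<video>.avi <label>".
# The label stored in these list files is 1-based.
line = "ApplyEyeMakeup/v_ApplyEyeMakeup_g08_c01.avi 1"

filename, label = line.strip().split(" ")
class_folder, video_name = filename.split("/")

print(class_folder)  # ApplyEyeMakeup
print(video_name)    # v_ApplyEyeMakeup_g08_c01.avi
print(int(label))    # 1
```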

The main idea here is that we read each line of the text files, extract video frames in a JPEG format, and save the location of the extracted files with the corresponding label for further training. Here is the code for the convert_data function:

 def convert_data(list_files, training=False): 
    lines = [] 
    for txt in list_files: 
        lines += [line.strip() for line in 
                  open(os.path.join(FLAGS.train_test_list_dir, txt))] 
    output_name = "train" if training else "test" 
    random.shuffle(lines) 
 
    target_dir = ensure_folder_exists( 
        os.path.join(FLAGS.target_dir, output_name)) 
    class_index_file = os.path.join(FLAGS.train_test_list_dir, 
                                    "classInd.txt") 
    class_index = {line.split(" ")[1].strip(): int(line.split(" ")[0]) - 1 
                   for line in open(class_index_file)} 
 
    with open(os.path.join(FLAGS.target_dir, output_name + ".txt"), 
              "w") as f: 
        for line in tqdm(lines): 
            if training: 
                filename, _ = line.strip().split(" ") 
            else: 
                filename = line.strip() 
            class_folder, video_name = filename.split("/") 
            label = class_index[class_folder] 
            video_name = video_name.replace(".avi", "") 
            target_class_folder = ensure_folder_exists( 
                os.path.join(target_dir, class_folder)) 
            target_folder = ensure_folder_exists( 
                os.path.join(target_class_folder, video_name)) 
 
            container = av.open(os.path.join(FLAGS.dataset_dir, filename)) 
            frame_to_skip = int(25.0 / FLAGS.fps) 
            last_frame = -1 
            frame_index = 0 
            for frame in container.decode(video=0): 
                if last_frame < 0 or frame.index > last_frame + frame_to_skip: 
                    last_frame = frame.index 
                    image = frame.to_image() 
                    target_file = os.path.join(target_folder, 
                                               "%04d.jpg" % frame_index) 
                    image.save(target_file) 
                    frame_index += 1 
            f.write("{} {} {}\n".format("%s/%s" % (class_folder, video_name), 
                                        label, frame_index)) 
 
    if training: 
        with open(os.path.join(FLAGS.target_dir, "label.txt"), "w") as f: 
            for class_name in sorted(class_index, key=class_index.get): 
                f.write("%s\n" % class_name) 

The preceding code is straightforward. We load the video path from the text files and use the av library to open the AVI files. Then, we use FLAGS.fps to control how many frames per second need to be extracted. You can run the scripts/convert_ucf101.py file using the following command:

python scripts/convert_ucf101.py

The whole process takes about 30 minutes to convert all the video clips. At the end, the target_dir folder will contain the following files:

label.txt  test  test.txt  train  train.txt

In the train.txt file, the lines will look like this:

Punch/v_Punch_g25_c03 70 43
Haircut/v_Haircut_g20_c01 33 36
BrushingTeeth/v_BrushingTeeth_g25_c02 19 33
Nunchucks/v_Nunchucks_g03_c04 55 36
BoxingSpeedBag/v_BoxingSpeedBag_g16_c04 17 21

This format can be understood as follows:

<Folder location of the video> <Label> <Number of frames in the folder>  

One thing you must remember is that the labels in train.txt and test.txt go from 0 to 100, whereas the labels in UCF101 go from 1 to 101. This is because the sparse_softmax_cross_entropy function in TensorFlow needs class labels to start from 0.
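A minimal sketch of how a training pipeline might parse these lines, using two of the example lines above as inline data (the variable and tuple layout here are illustrative, not part of the conversion script):

```python
# Each line of train.txt / test.txt: "<frame folder> <0-based label> <frame count>".
lines = [
    "Punch/v_Punch_g25_c03 70 43",
    "Haircut/v_Haircut_g20_c01 33 36",
]

samples = []
for line in lines:
    folder, label, num_frames = line.strip().split(" ")
    samples.append((folder, int(label), int(num_frames)))

print(samples[0])  # ('Punch/v_Punch_g25_c03', 70, 43)
```

Because the labels written to these files are already 0-based, they can be fed straight into sparse_softmax_cross_entropy without any further offset.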
