Preparing the main script 

The main script will be responsible for the complete logic of the app. It will process a video stream and use an object-detection deep convolutional neural network combined with the tracking algorithm that we will prepare later in this chapter.

The algorithm is used to track objects from frame to frame. It will also be responsible for illustrating results. The script will accept arguments and have some intrinsic constants, which are defined in the following initialization steps of the script:

  1. As with any other script, we start by importing all the required modules:
import argparse

import cv2
import numpy as np

from classes import CLASSES_90
from sort import Sort

We will use argparse as we want our script to accept arguments. We store the object classes in a separate file in order not to contaminate our script. Finally, we import our Sort tracker, which we will build later in the chapter.

  2. Next, we create and parse arguments:
parser = argparse.ArgumentParser()
parser.add_argument("-i", "--input", default="demo.mkv",
                    help="Video path, stream URI, or camera ID")
parser.add_argument("-t", "--threshold", type=float, default=0.3,
                    help="Minimum score to consider")
parser.add_argument("-m", "--mode", choices=["detection", "tracking"],
                    default="tracking",
                    help="Either detection or tracking mode")

args = parser.parse_args()

Our first argument is the input, which can be a path to a video, the ID of a camera (0 for the default camera), or a video stream Uniform Resource Identifier (URI). For example, you will be able to connect the app to a remote IP camera using the Real-Time Streaming Protocol (RTSP).

The networks that we will use will predict the bounding boxes of objects. Each bounding box will have a score, which will specify how probable it is that the bounding box contains an object of a certain type.

The next parameter is threshold, which specifies the minimum value of the score. If the score of a detection is below threshold, we discard that detection. The last parameter is mode, which specifies the mode in which we want to run the script. If we run it in detection mode, the algorithm will stop after detecting objects and will not proceed with tracking; the detection results will be drawn on the frame.
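To make the role of threshold concrete, here is a minimal sketch of score-based filtering. The detection array layout below (one row per box: class ID, score, box coordinates) is an assumption for illustration; the real network output format may differ:

```python
import numpy as np

# Hypothetical detections: each row is [class_id, score, x1, y1, x2, y2].
detections = np.array([
    [1, 0.92, 10, 20, 50, 80],   # confident detection
    [3, 0.15, 30, 40, 70, 90],   # low-score box, likely noise
    [1, 0.45,  5,  5, 25, 60],
])

threshold = 0.3
# Keep only rows whose score (column 1) reaches the threshold.
kept = detections[detections[:, 1] >= threshold]
print(len(kept))  # 2 detections survive the 0.3 threshold
```

Raising the threshold trades missed objects for fewer false positives, which is why it is exposed as a command-line argument rather than hard-coded.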

  3. OpenCV accepts the ID of a camera as an integer, but argparse always delivers arguments as strings. Hence, if the input argument consists only of digits, we convert it to an integer:
if args.input.isdigit():
    args.input = int(args.input)
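A quick sketch of why this conversion works: str.isdigit lets us distinguish purely numeric camera IDs from file paths or stream URIs (the helper name and sample values here are illustrative, not part of the script):

```python
def normalize_input(value):
    """Convert a purely numeric string (a camera ID) to int; leave paths and URIs alone."""
    return int(value) if value.isdigit() else value

print(normalize_input("0"))         # -> 0 (int, the default camera)
print(normalize_input("demo.mkv"))  # -> 'demo.mkv' (path left unchanged)
```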
  4. Next, we define the required constants:
TRACKED_CLASSES = ["car", "person"]
BOX_COLOR = (23, 230, 210)
TEXT_COLOR = (255, 255, 255)
INPUT_SIZE = (300, 300)

In this app, we will track cars and people. We will illustrate bounding boxes in a yellowish color and write text in white. We'll also define the standard input size of the Single Shot Detector (SSD) model that we are going to use for detection. 
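As a sketch of how TRACKED_CLASSES might be used later, we can keep only detections whose class name is one we track. The class-ID mapping below is a made-up stand-in for the real CLASSES_90 list imported from classes.py:

```python
TRACKED_CLASSES = ["car", "person"]

# Stand-in for the real CLASSES_90 mapping from classes.py.
CLASS_NAMES = {1: "person", 3: "car", 18: "dog"}

# Hypothetical (class_id, score) pairs produced by the detector.
detections = [(1, 0.9), (18, 0.8), (3, 0.7)]

# Keep only detections whose class name is in the tracked set.
tracked = [(cid, score) for cid, score in detections
           if CLASS_NAMES.get(cid) in TRACKED_CLASSES]
print(tracked)  # the "dog" detection is filtered out
```

Keeping the tracked classes in a small list makes it easy to extend the app to other object types without touching the detection or tracking code.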
