Using a pre-trained YOLO model for object detection

The following are the steps that you must follow to be able to use the pre-trained model:

Clone this repository: go to https://github.com/allanzelener/YAD2K/, right-click on clone or download, and select the path where you want to download the ZIP. Then unzip the compressed file to YAD2K-master folder.

Download the weights and cfg file from https://pjreddie.com/darknet/yolo/ by clicking on the yellow links on the page, marked by red boxes here:

Save the yolov2.cfg and the yolov2.weights files downloaded inside the YAD2K-master folder.
Go inside the YAD2K-master folder, open a command prompt (you need to have Python3 installed and in path) and run the following command:

python yad2k.py yolov2.cfg yolov2.weights yolo/yolo.h5

If executed successfully, it will create two files inside the YAD2K-master/model_data folder, namely, yolo.h5 and yolo.anchors.

Now go to the folder from where you want to run your code. Create a folder named yolo here and copy the four files (coco_classes, pascal_classes, yolo.h5, yolo.anchors) from the YAD2K-master/model_data folder to the yolo folder you created.
Copy the yad2k folder from the YAD2K-master folder to the current path. Your current path should have these two folders, yad2k and yolo, now.
Create a new folder named images in the current path and put your input images here.
Create another new empty folder named output in the current path. The YOLO model will save the output images (with objects detected) here.
Create a .py script in the current path and copy-paste the following code and run (or run in from Jupyter Notebook cell from the current path).

Double-check that the folder structure is exactly as shown in the following screenshot, with the required files present, before running the code:

Let's first load all the required libraries, as shown in this code block:

# for jupyter notebook uncomment the following line of code
#% matplotlib inline
import os
import matplotlib.pylab as pylab
import scipy.io
import scipy.misc
import numpy as np
from PIL import Image
from keras import backend as K
from keras.models import load_model
# The following functions from the yad2k library will be used
# Note: it assumed that you have the yad2k folder in your current path, otherwise it will not work!
from yad2k.models.keras_yolo import yolo_head, yolo_eval
import colorsys
import imghdr
import random
from PIL import Image, ImageDraw, ImageFont

Now implement a few functions to read the classes and anchor files, generate the color of the boxes, and scale the boxes predicted by YOLO:

def read_classes(classes_path):

    with open(classes_path) as f:
        class_names = f.readlines()
    class_names = [c.strip() for c in class_names]
    return class_names

def read_anchors(anchors_path):

    with open(anchors_path) as f:
        anchors = f.readline()
        anchors = [float(x) for x in anchors.split(',')]
        anchors = np.array(anchors).reshape(-1, 2)
    return anchors

def generate_colors(class_names):

    hsv_tuples = [(x / len(class_names), 1., 1.) for x in range(len(class_names))]
    colors = list(map(lambda x: colorsys.hsv_to_rgb(*x), hsv_tuples))
    colors = list(map(lambda x: (int(x[0] * 255), int(x[1] * 255), int(x[2] * 255)), colors))
    random.seed(10101) # Fixed seed for consistent colors across runs.
    random.shuffle(colors) # Shuffle colors to decorrelate adjacent classes.
    random.seed(None) # Reset seed to default.
    return colors

def scale_boxes(boxes, image_shape):

    """ scales the predicted boxes in order to be drawable on the image"""
    height = image_shape[0]
    width = image_shape[1]
    image_dims = K.stack([height, width, height, width])
    image_dims = K.reshape(image_dims, [1, 4])
    boxes = boxes * image_dims
    return boxes

In the following code snippet, we'll implement a couple of functions to preprocess an image and draw the boxes obtained from YOLO to detect the objects present in the image:

def preprocess_image(img_path, model_image_size):

    image_type = imghdr.what(img_path)
    image = Image.open(img_path)
    resized_image = image.resize(tuple(reversed(model_image_size)), Image.BICUBIC)
    image_data = np.array(resized_image, dtype='float32')
    image_data /= 255.
    image_data = np.expand_dims(image_data, 0) # Add batch dimension.
    return image, image_data

def draw_boxes(image, out_scores, out_boxes, out_classes, class_names, colors):    

    font = ImageFont.truetype(font='font/FiraMono-Medium.otf',size=np.floor(3e-2 * image.size[1] + 0.5).astype('int32'))
    thickness = (image.size[0] + image.size[1]) // 300

    for i, c in reversed(list(enumerate(out_classes))):
        predicted_class = class_names[c]
        box = out_boxes[i]
        score = out_scores[i]
        label = '{} {:.2f}'.format(predicted_class, score)
        draw = ImageDraw.Draw(image)
        label_size = draw.textsize(label, font)
        top, left, bottom, right = box
        top = max(0, np.floor(top + 0.5).astype('int32'))
        left = max(0, np.floor(left + 0.5).astype('int32'))
        bottom = min(image.size[1], np.floor(bottom + 0.5).astype('int32'))
        right = min(image.size[0], np.floor(right + 0.5).astype('int32'))
        print(label, (left, top), (right, bottom))
        if top - label_size[1] >= 0:
            text_origin = np.array([left, top - label_size[1]])
        else:
            text_origin = np.array([left, top + 1])
        # My kingdom for a good redistributable image drawing library.
        for i in range(thickness):
            draw.rectangle([left + i, top + i, right - i, bottom - i], outline=colors[c])
        draw.rectangle([tuple(text_origin), tuple(text_origin + label_size)], fill=colors[c])
        draw.text(text_origin, label, fill=(0, 0, 0), font=font)
        del draw

Let's now load the input image, the classes file, and the anchors using our functions, then load the YOLO pretrained model and print the model summary using the next code block:

# provide the name of the image that you saved in the images folder to be fed through the network
input_image_name = "giraffe_zebra.jpg"
input_image = Image.open("images/" + input_image_name)
width, height = input_image.size
width = np.array(width, dtype=float)
height = np.array(height, dtype=float)
image_shape = (height, width)
#Loading the classes and the anchor boxes that are copied to the yolo folder
class_names = read_classes("yolo/coco_classes.txt")
anchors = read_anchors("yolo/yolo_anchors.txt")
#Load the pretrained model 
yolo_model = load_model("yolo/yolo.h5")
#Print the summery of the model
yolo_model.summary()
#__________________________________________________________________________________________________
#Layer (type) Output Shape Param # Connected to 
#==================================================================================================
#input_1 (InputLayer) (None, 608, 608, 3) 0 
#__________________________________________________________________________________________________
#conv2d_1 (Conv2D) (None, 608, 608, 32) 864 input_1[0][0] 
#__________________________________________________________________________________________________
#batch_normalization_1 (BatchNor (None, 608, 608, 32) 128 conv2d_1[0][0] 
#__________________________________________________________________________________________________
#leaky_re_lu_1 (LeakyReLU) (None, 608, 608, 32) 0 batch_normalization_1[0][0] 
#__________________________________________________________________________________________________
#max_pooling2d_1 (MaxPooling2D) (None, 304, 304, 32) 0 leaky_re_lu_1[0][0] 
#__________________________________________________________________________________________________
#conv2d_2 (Conv2D) (None, 304, 304, 64) 18432 max_pooling2d_1[0][0] 
#__________________________________________________________________________________________________
#batch_normalization_2 (BatchNor (None, 304, 304, 64) 256 conv2d_2[0][0] 
#__________________________________________________________________________________________________
#leaky_re_lu_2 (LeakyReLU) (None, 304, 304, 64) 0 batch_normalization_2[0][0] 
#__________________________________________________________________________________________________
#max_pooling2d_2 (MaxPooling2D) (None, 152, 152, 64) 0 leaky_re_lu_2[0][0] 
#__________________________________________________________________________________________________
#... ... ...
#__________________________________________________________________________________________________
#concatenate_1 (Concatenate) (None, 19, 19, 1280) 0 space_to_depth_x2[0][0] 
# leaky_re_lu_20[0][0] 
#________________________________________________________________________________________________ 
#batch_normalization_22 (BatchNo (None, 19, 19, 1024) 4096 conv2d_22[0][0] 
#__________________________________________________________________________________________________
#leaky_re_lu_22 (LeakyReLU) (None, 19, 19, 1024) 0 batch_normalization_22[0][0] 
#__________________________________________________________________________________________________
#conv2d_23 (Conv2D) (None, 19, 19, 425) 435625 leaky_re_lu_22[0][0] 
#==================================================================================================
#Total params: 50,983,561
#Trainable params: 50,962,889
#Non-trainable params: 20,672
__________________________________________________________________________________________________

Finally, the next code block extracts the bounding boxes from the YOLO prediction output and draws the boxes around the objects found with right labels, scores, and colors:

# convert final layer features to bounding box parameters
yolo_outputs = yolo_head(yolo_model.output, anchors, len(class_names))
#Now yolo_eval function selects the best boxes using filtering and non-max suppression techniques.
# If you want to dive in more to see how this works, refer keras_yolo.py file in yad2k/models
boxes, scores, classes = yolo_eval(yolo_outputs, image_shape)
# Initiate a session
sess = K.get_session()
#Preprocess the input image before feeding into the convolutional network
image, image_data = preprocess_image("images/" + input_image_name, model_image_size = (608, 608))
#Run the session
out_scores, out_boxes, out_classes = sess.run([scores, boxes, classes],feed_dict={yolo_model.input:image_data,K.learning_phase(): 0})
#Print the results
print('Found {} boxes for {}'.format(len(out_boxes), input_image_name))
#Found 5 boxes for giraffe_zebra.jpg
#zebra 0.83 (16, 325) (126, 477)
#giraffe 0.89 (56, 175) (272, 457)
#zebra 0.91 (370, 326) (583, 472)
#giraffe 0.94 (388, 119) (554, 415)
#giraffe 0.95 (205, 111) (388, 463)
#Produce the colors for the bounding boxes
colors = generate_colors(class_names)
#Draw the bounding boxes
draw_boxes(image, out_scores, out_boxes, out_classes, class_names, colors)
#Apply the predicted bounding boxes to the image and save it
image.save(os.path.join("output", input_image_name), quality=90)
output_image = scipy.misc.imread(os.path.join("output", input_image_name))
pylab.imshow(output_image)
pylab.axis('off')
pylab.show()

The following image shows the output obtained by running the preceding code. The objects, giraffes and zebras, marked with bounding boxes were predicted using the YOLO model. The numbers above each bounding box is the probability score from the YOLO model:

Likewise, let's try using the following photo as input:

We'll get the following objects (cars, bus, person, umbrella) detected instead:

Table of Contents for Using a pre-trained YOLO model for object detection

Create new playlist

Sign In

Sign Up

Table of Contents for
Using a pre-trained YOLO model for object detection