Classifying images using SVM and HOG

Histogram of Oriented Gradients (HOG) is an algorithm that describes an image using a vector of floating-point values computed from the oriented gradients in that image. The HOG algorithm is very popular and worth reading about in detail to understand how it is implemented in OpenCV. For the purposes of this book, and especially this section, we'll just mention that the number of floating-point values in the descriptor is always the same when it is extracted from images of exactly the same size using the same HOG parameters. To see why this matters, recall that the descriptors extracted from an image using the feature detection algorithms we learned about in the previous chapter can have different numbers of elements. The HOG algorithm, on the other hand, always produces a vector of the same length, as long as the parameters stay unchanged across a set of images of the same size.
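
To make "oriented gradients" more concrete, here is a simplified, standard-library-only sketch of one HOG building block: the gradient-orientation histogram of a single cell. It is not OpenCV's implementation. The function name cellHistogram, the row-major grayscale input, the clamped-border [-1, 0, 1] derivatives, the 9 unsigned bins of 20 degrees, and the hard voting are assumptions made for illustration; the full algorithm also interpolates votes and normalizes the histograms over blocks of cells.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Simplified orientation histogram of ONE HOG cell (illustration only).
// Assumptions: grayscale pixels stored row-major, centered [-1, 0, 1]
// derivatives with clamped borders, unsigned orientations in [0, 180)
// degrees split into 9 bins of 20 degrees, and hard voting by magnitude.
std::array<double, 9> cellHistogram(const std::vector<double>& img,
                                    int width, int height) {
    assert(width > 0 && height > 0);
    assert(img.size() == static_cast<std::size_t>(width) * height);
    const double pi = std::acos(-1.0);
    std::array<double, 9> hist{};  // all bins start at zero

    // Read a pixel, clamping coordinates that fall outside the cell.
    auto at = [&](int x, int y) {
        x = std::max(0, std::min(x, width - 1));
        y = std::max(0, std::min(y, height - 1));
        return img[static_cast<std::size_t>(y) * width + x];
    };

    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const double gx = at(x + 1, y) - at(x - 1, y);   // horizontal gradient
            const double gy = at(x, y + 1) - at(x, y - 1);   // vertical gradient
            const double magnitude = std::hypot(gx, gy);
            double angle = std::atan2(gy, gx) * 180.0 / pi;  // [-180, 180]
            if (angle < 0.0) angle += 180.0;                 // unsigned orientation
            if (angle >= 180.0) angle -= 180.0;              // keep it in [0, 180)
            const int bin = std::min(static_cast<int>(angle / 20.0), 8);
            hist[bin] += magnitude;                          // vote weighted by magnitude
        }
    }
    return hist;
}
```

For example, in an 8 by 8 cell whose intensity increases from left to right, every gradient points at 0 degrees, so the whole gradient magnitude lands in the first bin.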

This makes the HOG algorithm ideal for use in conjunction with SVM, to train a model that can classify images. Let's see how it's done with an example. Imagine we have a set of images of a traffic sign in one folder, and images of anything but that specific traffic sign in another folder. The following picture depicts the images in our sample dataset, separated by a black line:

Using images similar to the preceding samples, we're going to train an SVM model to classify whether or not an image shows the traffic sign we're looking for. Let's start:

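Create a HOGDescriptor object. HOGDescriptor, OpenCV's implementation of the HOG algorithm, is a special type of descriptor algorithm that relies on a given window size, block size, and various other parameters; for the sake of simplicity, we'll leave all of them at their default values except the window size. The HOG window size in our example is 128 by 128 pixels, which is set as seen here (the snippets in this section assume that the opencv2/opencv.hpp header is included and that the cv, cv::ml, and std namespaces are in use):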

HOGDescriptor hog; // defaults: 16x16 blocks, 8x8 block stride, 8x8 cells, 9 bins
hog.winSize = Size(128, 128);

Sample images should have the same size as the window size; otherwise, we'll use the resize function to resize them to the HOG window size later on. This guarantees that the descriptor has the same size every time the HOG algorithm is used.

As we just mentioned, the length of the descriptor vector extracted using HOGDescriptor is constant if the image size is constant. Assuming that the image has the same size as winSize, you can get the descriptor length using the following code:

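vector<float> tempDesc;
// Only the image size matters here, so a blank winSize image is enough
Mat blank(hog.winSize, CV_8UC3, Scalar::all(0));
hog.compute(blank, tempDesc);
int descriptorSize = static_cast<int>(tempDesc.size());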

We'll use descriptorSize later on when we read the sample images.
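
Why the length stays fixed can be seen from how the descriptor is laid out: blocks slide across the window with a fixed stride, each block is split into cells, and each cell contributes a histogram of nbins values. The following hypothetical helper, hogDescriptorLength (not an OpenCV function), computes that length from the parameters, assuming the window holds a whole number of block strides and each block a whole number of cells. With HOGDescriptor's defaults (a 16 by 16 pixel block size, an 8 by 8 block stride, 8 by 8 cells, and 9 bins) and our 128 by 128 window, that is 15 × 15 blocks × 4 cells × 9 bins = 8,100 values. HOGDescriptor also provides getDescriptorSize(), which returns this length without computing a descriptor.

```cpp
#include <cassert>

// Hypothetical helper (not part of OpenCV): length of a HOG descriptor for
// one window, given the window, block, block stride, and cell sizes in pixels
// and the number of orientation bins per cell.
int hogDescriptorLength(int winW, int winH,
                        int blockW, int blockH,
                        int strideW, int strideH,
                        int cellW, int cellH,
                        int nbins) {
    // The window must hold whole strides of blocks, and blocks whole cells.
    assert((winW - blockW) % strideW == 0 && (winH - blockH) % strideH == 0);
    assert(blockW % cellW == 0 && blockH % cellH == 0);

    const int blocksX = (winW - blockW) / strideW + 1;  // block positions per row
    const int blocksY = (winH - blockH) / strideH + 1;  // block positions per column
    const int cellsPerBlock = (blockW / cellW) * (blockH / cellH);
    return blocksX * blocksY * cellsPerBlock * nbins;   // one histogram per cell
}
```

Because none of these factors depends on the pixel values, any image resized to winSize yields a descriptor of exactly this length.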

Assuming the images of the traffic sign are inside a folder called pos (for positive) and the rest inside a folder called neg (for negative), we can use the glob function to get the list of image files in those folders, as seen here:

// "pos" and "neg" are relative to the working directory
vector<String> posFiles;
glob("pos", posFiles);

vector<String> negFiles;
glob("neg", negFiles);

Create buffers to store the HOG descriptors of the positive and negative sample images (from the pos and neg folders). We also need an additional buffer for the labels (or responses), as seen in the following example:

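int scount = static_cast<int>(posFiles.size() + negFiles.size());

// One row of descriptorSize floats per sample image
Mat samples(scount,
            descriptorSize,
            CV_32F);

// One integer label (response) per sample
Mat responses(scount,
              1,
              CV_32S);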

We need to use the HOGDescriptor class to extract the HOG descriptors from positive images and store them in samples, as seen here:

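for(int i=0; i<posFiles.size(); i++)
{
    Mat image = imread(posFiles.at(i));
    // An unreadable file would leave row i of samples and responses unfilled
    if(image.empty())
        CV_Error(Error::StsError, "Cannot read " + posFiles.at(i));
    vector<float> descriptors;
    // Resize to the HOG window so that every descriptor has descriptorSize values
    if((image.cols != hog.winSize.width)
            ||
            (image.rows != hog.winSize.height))
    {
        resize(image, image, hog.winSize);
    }
    hog.compute(image, descriptors);
    // Copy the descriptor into row i of samples
    Mat(1, descriptorSize, CV_32F, descriptors.data())
            .copyTo(samples.row(i));
    responses.at<int>(i) = +1; // positive
}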

Note that we used +1 as the label (response) of the positive samples. We'll need a different number, such as -1, when we label the negative samples.

After the positive samples, we add the negative samples and their responses to the designated buffers:

for(int i=0; i<negFiles.size(); i++)
{
    Mat image = imread(negFiles.at(i));
    // An unreadable file would leave its row of samples and responses unfilled
    if(image.empty())
        CV_Error(Error::StsError, "Cannot read " + negFiles.at(i));
    vector<float> descriptors;
    if((image.cols != hog.winSize.width)
            ||
            (image.rows != hog.winSize.height))
    {
        resize(image, image, hog.winSize);
    }
    hog.compute(image, descriptors);
    // Negative samples are stored after all of the positive ones
    Mat(1, descriptorSize, CV_32F, descriptors.data())
            .copyTo(samples.row(i + posFiles.size()));
    responses.at<int>(i + posFiles.size()) = -1; // negative
}
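
The loops above fill samples and responses by index, so every row counted in scount must actually be written; an image that cannot be loaded must not leave a row with undefined contents behind. An alternative is to append one row per successfully loaded image instead of preallocating. The following standard-library sketch shows that bookkeeping; the names Descriptor, TrainingSet, and appendSamples are hypothetical, and std::nullopt stands in for an image that imread failed to load. With OpenCV itself, cv::Mat::push_back can append rows to samples and responses in the same way, in which case both would start out empty instead of being allocated with scount rows.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Stand-in for a HOG descriptor (hypothetical type for illustration).
using Descriptor = std::vector<float>;

// Row-major training buffers, one row per sample (like ROW_SAMPLE).
struct TrainingSet {
    std::vector<float> samples;  // rows of `cols` floats, stored back to back
    std::vector<int> responses;  // one label per row
    std::size_t cols = 0;        // descriptor length, fixed by the first row
};

// Appends every available descriptor with the given label. Missing inputs
// (std::nullopt) are skipped without leaving an unfilled row behind.
void appendSamples(TrainingSet& set,
                   const std::vector<std::optional<Descriptor>>& descriptors,
                   int label) {
    for (const auto& d : descriptors) {
        if (!d) continue;                     // e.g. the image could not be read
        if (set.cols == 0) set.cols = d->size();
        assert(d->size() == set.cols);        // HOG guarantees equal lengths
        set.samples.insert(set.samples.end(), d->begin(), d->end());
        set.responses.push_back(label);
    }
}
```

Calling appendSamples first with the positive descriptors and +1, then with the negative ones and -1, reproduces the layout used above: positive rows first, followed by negative rows.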

Similar to the example from the previous section, we need to form a TrainData object using samples and responses to be used with the train function. Here's how it's done:

Ptr<TrainData> tdata = TrainData::create(samples, 
                                         ROW_SAMPLE, 
                                         responses);

Now, we need to train the SVM model as seen in the following example code:

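Ptr<SVM> svm = SVM::create();
svm->setType(SVM::C_SVC);      // C-support vector classification
svm->setKernel(SVM::LINEAR);   // linear kernel, no feature-space mapping
// Stop after 10000 iterations or once the required accuracy (1e-6) is reached
svm->setTermCriteria(
            TermCriteria(TermCriteria::MAX_ITER +
                         TermCriteria::EPS,
                         10000,
                         1e-6));

svm->train(tdata);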
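
A two-class SVM with a linear kernel, like the one trained above, is equivalent to a weight vector w and a bias b, and it predicts +1 or -1 from the sign of w·x + b, where x is a HOG descriptor. To illustrate that idea without OpenCV, here is a minimal sketch assuming plain std::vector<double> features. It swaps in the Pegasos stochastic sub-gradient method, which is not the solver OpenCV uses internally, and the names LinearSvm and trainPegasos are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical linear SVM model: w holds one weight per feature, and its
// last element is the bias b (learned as the weight of a constant feature 1).
struct LinearSvm {
    std::vector<double> w;

    // Decision value f(x) = w . x + b
    double decision(const std::vector<double>& x) const {
        assert(x.size() + 1 == w.size());
        double s = w.back();
        for (std::size_t j = 0; j < x.size(); ++j) s += w[j] * x[j];
        return s;
    }

    // Same +1/-1 labels as the traffic sign example.
    int predict(const std::vector<double>& x) const {
        return decision(x) >= 0.0 ? +1 : -1;
    }
};

// Pegasos: minimizes lambda/2 * |w|^2 + mean hinge loss by stochastic
// sub-gradient steps. Samples are visited cyclically to keep it deterministic.
LinearSvm trainPegasos(const std::vector<std::vector<double>>& X,
                       const std::vector<int>& y,
                       double lambda, int iterations) {
    assert(!X.empty() && X.size() == y.size() && lambda > 0.0);
    const std::size_t dim = X[0].size();
    LinearSvm svm;
    svm.w.assign(dim + 1, 0.0);
    for (int t = 1; t <= iterations; ++t) {
        const std::size_t i = static_cast<std::size_t>(t - 1) % X.size();
        const double eta = 1.0 / (lambda * t);            // decreasing step size
        const double margin = y[i] * svm.decision(X[i]);  // uses the current w
        for (double& wj : svm.w) wj *= (1.0 - eta * lambda);  // regularization step
        if (margin < 1.0) {                               // hinge loss is active
            for (std::size_t j = 0; j < dim; ++j) svm.w[j] += eta * y[i] * X[i][j];
            svm.w[dim] += eta * y[i];                     // bias feature is 1
        }
    }
    return svm;
}
```

Here lambda plays the role of the C parameter of C_SVC in reverse: for m training samples it corresponds roughly to 1/(C·m), so a larger lambda favors a smaller w (a wider margin) over fitting every training sample.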

After the training is completed, the SVM model is ready to classify images using the predict method of the SVM class, provided that the images are first resized to the HOG window size (in this case, 128 by 128 pixels). Here is how:

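Mat image = imread("image.jpg");
if(image.empty())
    CV_Error(Error::StsError, "Cannot read image.jpg");

// Resize to the HOG window size, exactly as during training
if((image.cols != hog.winSize.width)
        ||
        (image.rows != hog.winSize.height))
{
    resize(image, image, hog.winSize);
}

vector<float> descs;
hog.compute(image, descs);
// For classification, predict returns the class label as a float
int result = static_cast<int>(svm->predict(descs));
if(result == +1)
{
    cout << "Image contains a traffic sign." << endl;
}
else if(result == -1)
{
    cout << "Image does not contain a traffic sign." << endl;
}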

In the preceding code, we simply read an image and resize it to the HOG window size. Then we use the compute method of the HOGDescriptor class, just like when we trained the model. This time, however, we use the predict method to find the label of this new image. If the result equals +1, which is the label we assigned to traffic sign images when we trained the SVM model, then the image shows the traffic sign; otherwise, it doesn't.

The accuracy of the result depends heavily on the quantity and quality of the data you used to train your SVM model. This is, in fact, the case for every machine learning algorithm: in general, the more representative and correctly labeled training samples you provide, the more accurate your model becomes.

This method of classification assumes that the input image has the same characteristics as the training images. That is, if the image contains a traffic sign, it should be cropped similarly to the images we used to train the model. For instance, if you use an image that contains the traffic sign we're looking for but also much more, the result will probably be incorrect.

As the amount of data in your training set increases, it will take more time to train your model. So, it's important to avoid retraining your model every time you want to use it. The SVM class allows you to save and load SVM models using the save and load methods. Here is how you can save a trained SVM model for later use and to avoid retraining it:

svm->save("trained_svm_model.xml");

The file will be saved using the provided filename, and its extension determines the format (XML, YAML, or another format supported by OpenCV's FileStorage). Later, using the static load function, you can create an SVM object that contains the exact parameters and the trained model. Here's an example:

Ptr<SVM> svm = SVM::load("trained_svm_model.xml");

Try using the SVM class along with HOGDescriptor to train models that can detect and classify more types of objects, using images of each object stored in a different folder.
