ANNs in OpenCV

Unsurprisingly, ANNs reside in the ml module of OpenCV.

Let's examine a dummy example, as a gentle introduction to ANNs:

import cv2
import numpy as np

ann = cv2.ml.ANN_MLP_create()
ann.setLayerSizes(np.array([9, 5, 9], dtype=np.int32))
ann.setTrainMethod(cv2.ml.ANN_MLP_BACKPROP)

ann.train(np.array([[1.2, 1.3, 1.9, 2.2, 2.3, 2.9, 3.0, 3.2, 3.3]], dtype=np.float32),
  cv2.ml.ROW_SAMPLE,
  np.array([[0, 0, 0, 0, 0, 1, 0, 0, 0]], dtype=np.float32))

print(ann.predict(np.array([[1.4, 1.5, 1.2, 2., 2.5, 2.8, 3., 3.1, 3.8]], dtype=np.float32)))

First, we create an ANN:

ann = cv2.ml.ANN_MLP_create()

You may wonder about the MLP acronym in the function name; it stands for multilayer perceptron. By now, you should know what a perceptron is.

After creating the network, we need to set its topology:

ann.setLayerSizes(np.array([9, 5, 9], dtype=np.int32))
ann.setTrainMethod(cv2.ml.ANN_MLP_BACKPROP)

The layer sizes are defined by the NumPy array that is passed into the setLayerSizes method. The first element sets the size of the input layer, the last element sets the size of the output layer, and all intermediary elements define the size of the hidden layers.
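For example, to add a second hidden layer, we would simply pass a longer array; the sizes here are arbitrary and purely illustrative:

ann.setLayerSizes(np.array([9, 15, 9, 9], dtype=np.int32))  # 9 inputs, two hidden layers (15 and 9 neurons), 9 outputs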

We then set the train method to be backpropagation. There are two choices here: BACKPROP and RPROP.

Both BACKPROP and RPROP are backpropagation algorithms; in simple terms, they adjust the network's weights based on the errors it makes during training.
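Each method exposes its own tuning parameters through dedicated setters; the values below are illustrative starting points to experiment with, not recommendations:

# backpropagation parameters (used with ANN_MLP_BACKPROP)
ann.setBackpropWeightScale(0.1)    # the learning rate
ann.setBackpropMomentumScale(0.1)  # the momentum term
# resilient backpropagation parameters (used with ANN_MLP_RPROP)
ann.setRpropDW0(0.1)               # the initial update value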

These two algorithms work in the context of supervised learning, which is what we are using in this example. How can we tell? Look at the next statement:

ann.train(np.array([[1.2, 1.3, 1.9, 2.2, 2.3, 2.9, 3.0, 3.2, 3.3]], dtype=np.float32),
  cv2.ml.ROW_SAMPLE,
  np.array([[0, 0, 0, 0, 0, 1, 0, 0, 0]], dtype=np.float32))

You should notice a number of details here. The method looks extremely similar to the train() method of the support vector machine, and it takes the same three parameters: samples, layout, and responses.

This reveals the following information:

  • First, ANN, like SVM, is an OpenCV StatModel (statistical model); train and predict are the methods inherited from the base StatModel class.
  • Second, a statistical model trained on samples alone would be learning in an unsupervised fashion; by also providing a layout and labelled responses, we place ourselves in a supervised learning context (see the sketch right after this list).
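As an aside, the same supervised call can be expressed by bundling the three parameters into a TrainData object, which is how the StatModel interface represents a labelled training set; this is a minimal sketch of the equivalent call:

data = cv2.ml.TrainData_create(
  np.array([[1.2, 1.3, 1.9, 2.2, 2.3, 2.9, 3.0, 3.2, 3.3]], dtype=np.float32),
  cv2.ml.ROW_SAMPLE,
  np.array([[0, 0, 0, 0, 0, 1, 0, 0, 0]], dtype=np.float32))
ann.train(data)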

As we're using an ANN in a supervised learning context, we can specify the type of backpropagation algorithm we're going to use: BACKPROP or RPROP.

So what is backpropagation? It is a two-phase algorithm. In the first phase, a forward pass runs the input through the network and measures the error of the resulting prediction; in the second phase, a backward pass propagates that error from the output layer back toward the input layer, updating the neuron weights along the way so as to reduce it.
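OpenCV implements all of this internally, but a toy NumPy sketch of the two phases may make the idea concrete. Note that this is an illustration of the algorithm, not OpenCV's actual implementation:

import numpy as np

rng = np.random.default_rng(0)
x = rng.random((1, 9)).astype(np.float32)            # one input sample
target = np.float32([[0, 0, 0, 0, 0, 1, 0, 0, 0]])   # desired output
w1 = rng.standard_normal((9, 5)).astype(np.float32)  # input -> hidden weights
w2 = rng.standard_normal((5, 9)).astype(np.float32)  # hidden -> output weights

def sigmoid(z):
  return 1.0 / (1.0 + np.exp(-z))

for step in range(100):
  # phase 1: forward pass, compute the prediction and its error
  h = sigmoid(x @ w1)
  y = sigmoid(h @ w2)
  error = y - target
  # phase 2: backward pass, propagate the error from the output layer
  # toward the input layer, updating the weights along the way
  d_out = error * y * (1 - y)
  d_hid = (d_out @ w2.T) * h * (1 - h)
  w2 -= 0.5 * h.T @ d_out
  w1 -= 0.5 * x.T @ d_hid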

Let's train the ANN. As we specified an input layer of size 9, each training sample must contain 9 inputs, and each response must contain 9 outputs to match the size of the output layer:

ann.train(np.array([[1.2, 1.3, 1.9, 2.2, 2.3, 2.9, 3.0, 3.2, 3.3]], dtype=np.float32),
  cv2.ml.ROW_SAMPLE,
  np.array([[0, 0, 0, 0, 0, 1, 0, 0, 0]], dtype=np.float32))

The response is simply an array of zeros, with a 1 in the position of the class we want to associate the input with. In our preceding example, we indicated that the input array corresponds to class 5 of classes 0 to 8 (classes are zero-indexed).
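If you're generating such responses programmatically, the rows of an identity matrix give you the one-hot vectors for free; class_count and class_index are hypothetical names used only for this illustration:

class_count = 9
class_index = 5
response = np.eye(class_count, dtype=np.float32)[class_index]  # a 1 at position 5, zeros elsewhere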

Lastly, we perform classification:

print(ann.predict(np.array([[1.4, 1.5, 1.2, 2., 2.5, 2.8, 3., 3.1, 3.8]], dtype=np.float32)))

This will yield the following result:

(5.0, array([[-0.06419383, -0.13360272, -0.1681568 , -0.18708915,  0.0970564 ,
  0.89237726,  0.05093023,  0.17537238,  0.13388439]], dtype=float32))

This means that the provided input was classified as belonging to class 5. This is only a dummy example and the classification is pretty meaningless; however, the network behaved correctly. In this code, we only provided one training record for class 5, so the network classified a new input as belonging to class 5.

As you may have guessed, the output of a prediction is a tuple: the first value is the predicted class, and the second is an array containing the output layer's raw response for each class. These values are not true probabilities (notice that some of them are negative); the predicted class is simply the one with the highest response.
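We could recover the class ourselves with a simple argmax over that array:

_, outputs = ann.predict(np.array([[1.4, 1.5, 1.2, 2., 2.5, 2.8, 3., 3.1, 3.8]], dtype=np.float32))
print(np.argmax(outputs))  # prints 5, the index of the strongest response

Let's move on to a slightly more useful example: animal classification.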

ANN-imal classification

Picking up from where we left off, let's illustrate a very simple example of an ANN that attempts to classify animals based on three statistics: their weight, length, and number of teeth. My intent is to describe a mock real-life scenario to improve our understanding of ANNs before we start applying them to computer vision and, specifically, to OpenCV:

import cv2
import numpy as np
from random import randint

animals_net = cv2.ml.ANN_MLP_create()
animals_net.setTrainMethod(cv2.ml.ANN_MLP_RPROP)
animals_net.setActivationFunction(cv2.ml.ANN_MLP_SIGMOID_SYM)
animals_net.setLayerSizes(np.array([3, 8, 4], dtype=np.int32))
animals_net.setTermCriteria((cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1))

"""Input arrays
weight, length, teeth
"""

"""Output arrays
dog, condor, dolphin and dragon
"""

def dog_sample():
  return [randint(5, 20), 1, randint(38, 42)]

def dog_class():
  return [1, 0, 0, 0]

def condor_sample():
  return [randint(3, 13), 3, 0]

def condor_class():
  return [0, 1, 0, 0]

def dolphin_sample():
  return [randint(30, 190), randint(5, 15), randint(80, 100)]

def dolphin_class():
  return [0, 0, 1, 0]

def dragon_sample():
  return [randint(1200, 1800), randint(15, 40), randint(110, 180)]

def dragon_class():
  return [0, 0, 0, 1]

SAMPLES = 5000
for x in range(0, SAMPLES):
  print "Samples %d/%d" % (x, SAMPLES)
  animals_net.train(np.array([dog_sample()], dtype=np.float32), cv2.ml.ROW_SAMPLE, np.array([dog_class()], dtype=np.float32))
  animals_net.train(np.array([condor_sample()], dtype=np.float32), cv2.ml.ROW_SAMPLE, np.array([condor_class()], dtype=np.float32))
  animals_net.train(np.array([dolphin_sample()], dtype=np.float32), cv2.ml.ROW_SAMPLE, np.array([dolphin_class()], dtype=np.float32))
  animals_net.train(np.array([dragon_sample()], dtype=np.float32), cv2.ml.ROW_SAMPLE, np.array([dragon_class()], dtype=np.float32))

print(animals_net.predict(np.array([dog_sample()], dtype=np.float32)))
print(animals_net.predict(np.array([condor_sample()], dtype=np.float32)))
print(animals_net.predict(np.array([dragon_sample()], dtype=np.float32)))

There are quite a few differences between this example and the dummy one, so let's examine them in order.

First, we have the usual imports. Then, we import randint, because we want to generate some random training data:

import cv2
import numpy as np
from random import randint

Then, we create the ANN. This time, we specify the train method to be resilient backpropagation (RPROP, an improved variant of backpropagation) and the activation function to be a symmetric sigmoid:

animals_net = cv2.ml.ANN_MLP_create()
animals_net.setTrainMethod(cv2.ml.ANN_MLP_RPROP)
animals_net.setActivationFunction(cv2.ml.ANN_MLP_SIGMOID_SYM)
animals_net.setLayerSizes(np.array([3, 8, 4], dtype=np.int32))
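Incidentally, setActivationFunction() accepts two optional parameters that control the slope and amplitude of the sigmoid; the values below are purely illustrative, not tuned recommendations:

animals_net.setActivationFunction(cv2.ml.ANN_MLP_SIGMOID_SYM, 1.0, 1.0)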

Also, we specify the termination criteria, similarly to the way we did with the CAMShift algorithm in the previous chapter. The criteria are a tuple of (type, maximum iteration count, epsilon): here, each call to train() runs at most 10 iterations, or stops earlier once the change between iterations falls below 1:

animals_net.setTermCriteria((cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1))

Now we need some data. We're not so much interested in representing the animals accurately as in producing a bunch of records to use as training data. So, we define four sample creation functions and four classification functions that will help us train the network:

"""Input arrays
weight, length, teeth
"""

"""Output arrays
dog, condor, dolphin and dragon
"""

def dog_sample():
  return [randint(5, 20), 1, randint(38, 42)]

def dog_class():
  return [1, 0, 0, 0]

def condor_sample():
  return [randint(3, 13), 3, 0]

def condor_class():
  return [0, 1, 0, 0]

def dolphin_sample():
  return [randint(30, 190), randint(5, 15), randint(80, 100)]

def dolphin_class():
  return [0, 0, 1, 0]

def dragon_sample():
  return [randint(1200, 1800), randint(15, 40), randint(110, 180)]

def dragon_class():
  return [0, 0, 0, 1]

Let's proceed with the creation of our fake animal data; we'll create 5,000 samples per class:

SAMPLES = 5000
for x in range(0, SAMPLES):
  print "Samples %d/%d" % (x, SAMPLES)
  animals_net.train(np.array([dog_sample()], dtype=np.float32), cv2.ml.ROW_SAMPLE, np.array([dog_class()], dtype=np.float32))
animals_net.train(np.array([condor_sample()], dtype=np.float32), cv2.ml.ROW_SAMPLE, np.array([condor_class()], dtype=np.float32))
  animals_net.train(np.array([dolphin_sample()], dtype=np.float32), cv2.ml.ROW_SAMPLE, np.array([dolphin_class()], dtype=np.float32))
        animals_net.train(np.array([dragon_sample()], dtype=np.float32),        cv2.ml.ROW_SAMPLE, np.array([dragon_class()], dtype=np.float32))

In the end, we print the predictions for a fresh sample of three of the classes, which yields the following output:

(1.0, array([[ 1.49817729,  1.60551953, -1.56444871, -0.04313202]], dtype=float32))
(1.0, array([[ 1.49817729,  1.60551953, -1.56444871, -0.04313202]], dtype=float32))
(3.0, array([[-1.54576635, -1.68725526,  1.6469276 ,  2.23223686]], dtype=float32))

From these results, we deduce the following:

  • The network got two out of three samples correct, which is not perfect, but it serves as a good example of the importance of all the elements involved in building and training an ANN. The size of the input layer matters a great deal for separating the different classes; in our case, we have only three statistics per animal, and the features overlap considerably between classes.
  • The size of the hidden layer needs to be tested. You will find that increasing the number of neurons improves accuracy up to a point, after which the network overfits, unless you compensate with an enormous amount of training data. Definitely avoid having too few records, or feeding in many identical records, as the ANN won't learn much from them.

Training epochs

Another important concept in training ANNs is the idea of epochs. A training epoch is one complete pass through the entire training dataset, after which the network is typically evaluated to see how its accuracy is progressing. Most ANNs train over several epochs; some of the most common examples, such as networks that classify handwritten digits, iterate over the training data several hundred times.

I personally suggest that you spend a lot of time experimenting with ANNs and the number of epochs until you reach convergence, which means that further iterations no longer improve (at least not noticeably) the accuracy of the results.

The preceding example can be modified as follows to leverage epochs:

def record(sample, classification):
  return (np.array([sample], dtype=np.float32), np.array([classification], dtype=np.float32))

records = []
RECORDS = 5000
for x in range(0, RECORDS):
  records.append(record(dog_sample(), dog_class()))
  records.append(record(condor_sample(), condor_class()))
  records.append(record(dolphin_sample(), dolphin_class()))
  records.append(record(dragon_sample(), dragon_class()))

EPOCHS = 5
for e in range(0, EPOCHS):
  print "Epoch %d:" % e
  for t, c in records:
    animals_net.train(t, cv2.ml.ROW_SAMPLE, c)

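One caveat is worth knowing when structuring the training in this way: in OpenCV's ml module, the three-argument train(samples, layout, responses) call trains the model from scratch every time it is invoked, and ANN_MLP_UPDATE_WEIGHTS is a flag of train(), not a training method. To have each call refine the already-learned weights instead of reinitializing them, you can wrap each batch in a TrainData object and pass that flag; here is a minimal sketch of the pattern:

first = True
for e in range(0, EPOCHS):
  print("Epoch %d:" % e)
  for t, c in records:
    data = cv2.ml.TrainData_create(t, cv2.ml.ROW_SAMPLE, c)
    if first:
      animals_net.train(data)  # the first call trains from scratch
      first = False
    else:
      animals_net.train(data, cv2.ml.ANN_MLP_UPDATE_WEIGHTS)  # later calls update the weights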
Then, do some tests, starting with the dog class:

dog_results = 0
for x in range(0, 100):
  clas = int(animals_net.predict(np.array([dog_sample()], dtype=np.float32))[0])
  print "class: %d" % clas
  if (clas) == 0:
    dog_results += 1

Repeat the same test for each of the remaining classes.
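One way to generalize the loop is the following sketch, where samplers and results are hypothetical helpers introduced purely for illustration; they pair each class index with its sample generator and tally the correct predictions:

samplers = [dog_sample, condor_sample, dolphin_sample, dragon_sample]
results = [0, 0, 0, 0]
for c, sampler in enumerate(samplers):
  for x in range(0, 100):
    clas = int(animals_net.predict(np.array([sampler()], dtype=np.float32))[0])
    if clas == c:
      results[c] += 1
dog_results, condor_results, dolphin_results, dragon_results = results

Then, output the results: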

print "Dog accuracy: %f" % (dog_results)
print "condor accuracy: %f" % (condor_results)
print "dolphin accuracy: %f" % (dolphin_results)
print "dragon accuracy: %f" % (dragon_results)

Finally, we obtain the following results:

Dog accuracy: 100.000000%
condor accuracy: 0.000000%
dolphin accuracy: 0.000000%
dragon accuracy: 92.000000%

Given that we're only playing with toy data, and given the small amount of training data and the limited number of training iterations, these results teach us quite a lot. We can diagnose the ANN as overfitting towards certain classes, so it's important to improve the quality, balance, and quantity of the data you feed into the training process.

All that said, it's time for a real-life example: handwritten digit recognition.
