The world of Machine Learning is vast and mostly unexplored, and ANNs are but one of the many concepts related to Machine Learning, which is itself one of the many subdisciplines of Artificial Intelligence. For the purpose of this chapter, we will only explore ANNs in the context of OpenCV; the chapter is by no means an exhaustive treatise on the subject of Artificial Intelligence.
Ultimately, we're interested in seeing ANNs work in the real world. So let's go ahead and make it happen.
One of the most popular resources on the Web for the training of classifiers dealing with OCR and handwritten character recognition is the MNIST database, publicly available at http://yann.lecun.com/exdb/mnist/.
This particular database is a freely available resource to kick-start the creation of a program that utilizes ANNs to recognize handwritten digits.
It is always possible to build your own training data. It will take a little bit of patience, but it's fairly easy: collect a vast number of handwritten digits and create images containing a single digit each, making sure all the images are the same size and in grayscale.
After this, you will have to create a mechanism that keeps a training sample in sync with the expected classification.
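One way to keep samples and classifications in sync is to store them as pairs and validate them as they are built. The following is a minimal sketch for Python 3 with NumPy; the build_training_set name and its checks are illustrative assumptions, not part of OpenCV or of the code used later in this chapter.

```python
import numpy as np

def build_training_set(images, labels, size=(28, 28)):
    """Pair each image with its label, rejecting mismatched input."""
    if len(images) != len(labels):
        raise ValueError("need exactly one label per image")
    pairs = []
    for image, label in zip(images, labels):
        image = np.asarray(image)
        if image.shape != size:
            raise ValueError("expected %r, got %r" % (size, image.shape))
        if not 0 <= label <= 9:
            raise ValueError("label must be a digit from 0 to 9")
        # Flatten to one row of float32 values and keep the label next to it.
        pairs.append((image.astype(np.float32).ravel(), int(label)))
    return pairs

# Two stand-in "images": an all-black one labeled 0, an all-white one labeled 1.
samples = build_training_set([np.zeros((28, 28)), np.ones((28, 28))], [0, 1])
print(len(samples), samples[1][0].shape, samples[1][1])  # 2 (784,) 1
```

Because every sample travels with its label, shuffling or truncating the list cannot put the two out of step.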
Let's take a look at the individual layers in the network:
Since we're going to utilize the MNIST database, the input layer will have a size of 784 input nodes: that's because MNIST samples are 28x28 pixel images, which means 784 pixels.
As we have seen, there's no hard-and-fast rule for the size of the hidden layer. I've found, through several attempts, that 50 to 60 nodes yield the best result while not necessitating an inordinate amount of training data.
You can increase the size of the hidden layer with the amount of data, but beyond a certain point, there will be no advantage to that; you will also have to be prepared for your network to take hours to train (the more hidden neurons, the longer it takes to train the network).
We will initially use the entire set of the training data from MNIST, which consists of 60,000 handwritten images, half of which were written by US government employees, and the other half by high-school students. That's a lot of data, so we won't need more than one epoch to achieve an acceptably high accuracy on detection.
From there on, it is up to you to train the network iteratively on the same training data. My suggestion is that you use an accuracy test and find the epoch at which the accuracy "peaks". By doing so, you will have a precise measurement of the highest accuracy achieved by your network given its current configuration.
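The search for the accuracy "peak" can be sketched as a small loop. This is only an illustration for Python 3: find_peak_epoch, its patience parameter, and the two callbacks are hypothetical names, and the accuracy values below are made up to exercise the loop.

```python
def find_peak_epoch(train_one_epoch, evaluate, max_epochs=30, patience=3):
    """Return (best_epoch, best_accuracy), stopping once accuracy stops improving.

    train_one_epoch and evaluate are hypothetical callbacks: the first runs one
    pass over the training data, the second returns an accuracy in [0, 1].
    """
    best_epoch, best_accuracy = 0, 0.0
    for epoch in range(1, max_epochs + 1):
        train_one_epoch()
        accuracy = evaluate()
        if accuracy > best_accuracy:
            best_epoch, best_accuracy = epoch, accuracy
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs in a row
    return best_epoch, best_accuracy

# Stand-in accuracy curve that rises, peaks at epoch 4, then declines.
curve = iter([0.80, 0.88, 0.91, 0.93, 0.92, 0.92, 0.90, 0.89])
print(find_peak_epoch(lambda: None, lambda: next(curve)))  # (4, 0.93)
```

In practice, evaluate would measure accuracy on data the network has not been trained on, so that the peak reflects generalization rather than memorization.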
We will use a sigmoid activation function, Resilient Back Propagation (RPROP), and extend the termination criteria for each calculation to 20 iterations, instead of the 10 we used for every other operation in this book that involved cv2.TermCriteria.
Important notes on train data and ANNs libraries
Exploring the Internet for sources, I found an amazing article by Michael Nielsen at http://neuralnetworksanddeeplearning.com/chap1.html, which illustrates how to write an ANN library from scratch. The code for this library, which is the source code for his book, Neural Networks and Deep Learning, is freely available on GitHub at https://github.com/mnielsen/neural-networks-and-deep-learning.
In the data folder, you will find a pickle file, signifying data that has been saved to disk through the popular Python library, cPickle, which makes loading and saving the Python data a trivial task.
This pickle file is a cPickle library-serialized version of the MNIST data and, as it is so useful and ready to work with, I strongly suggest you use that. Nothing stops you from loading the original MNIST dataset yourself, but the process of deserializing the training data is quite tedious and, strictly speaking, outside the remit of this book.
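The code in this chapter uses cPickle, which exists only in Python 2. If you are working in Python 3, a loader along the following lines should work; this is a sketch under that assumption, load_mnist_pickle is a name chosen for illustration, and the three-way split it returns mirrors the way the file is unpacked later in the chapter.

```python
import gzip
import pickle

def load_mnist_pickle(path):
    """Load a gzip-compressed pickle holding (training, validation, test) data."""
    with gzip.open(path, "rb") as handle:
        # encoding="latin1" lets Python 3 read pickles written by Python 2.
        return pickle.load(handle, encoding="latin1")

# Typical use, assuming the file has been placed in ./data:
# training, validation, test = load_mnist_pickle("./data/mnist.pkl.gz")
```

The with statement closes the file even if unpickling fails, which the explicit close() call in a hand-written loader would not guarantee.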
Second, I would like to point out that OpenCV is not the only Python library that allows you to use ANNs, not by any stretch of the imagination. The Web is full of alternatives that I strongly encourage you to try out, most notably PyBrain, a library called Lasagne (which, as an Italian, I find exceptionally attractive), and many custom-written implementations, such as Michael Nielsen's aforementioned implementation.
Setting up an ANN in OpenCV is not difficult, but you will almost definitely find yourself training your network countless times, in search of that elusive percentage point that boosts the accuracy of your results.
To automate this as much as possible, we will build a mini-library that wraps OpenCV's native implementation of ANNs and lets us rerun and retrain the network easily.
Here's an example of a wrapper library:
import cv2
import cPickle
import numpy as np
import gzip

def load_data():
    mnist = gzip.open('./data/mnist.pkl.gz', 'rb')
    training_data, classification_data, test_data = cPickle.load(mnist)
    mnist.close()
    return (training_data, classification_data, test_data)

def wrap_data():
    tr_d, va_d, te_d = load_data()
    training_inputs = [np.reshape(x, (784, 1)) for x in tr_d[0]]
    training_results = [vectorized_result(y) for y in tr_d[1]]
    training_data = zip(training_inputs, training_results)
    validation_inputs = [np.reshape(x, (784, 1)) for x in va_d[0]]
    validation_data = zip(validation_inputs, va_d[1])
    test_inputs = [np.reshape(x, (784, 1)) for x in te_d[0]]
    test_data = zip(test_inputs, te_d[1])
    return (training_data, validation_data, test_data)

def vectorized_result(j):
    e = np.zeros((10, 1))
    e[j] = 1.0
    return e

def create_ANN(hidden = 20):
    ann = cv2.ml.ANN_MLP_create()
    ann.setLayerSizes(np.array([784, hidden, 10]))
    ann.setTrainMethod(cv2.ml.ANN_MLP_RPROP)
    ann.setActivationFunction(cv2.ml.ANN_MLP_SIGMOID_SYM)
    ann.setTermCriteria(( cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 20, 1 ))
    return ann

def train(ann, samples = 10000, epochs = 1):
    tr, val, test = wrap_data()
    for x in xrange(epochs):
        counter = 0
        for img in tr:
            if (counter > samples):
                break
            if (counter % 1000 == 0):
                print "Epoch %d: Trained %d/%d" % (x, counter, samples)
            counter += 1
            data, digit = img
            ann.train(np.array([data.ravel()], dtype=np.float32), cv2.ml.ROW_SAMPLE, np.array([digit.ravel()], dtype=np.float32))
        print "Epoch %d complete" % x
    return ann, test

def test(ann, test_data):
    sample = np.array(test_data[0][0].ravel(), dtype=np.float32).reshape(28, 28)
    cv2.imshow("sample", sample)
    cv2.waitKey()
    print ann.predict(np.array([test_data[0][0].ravel()], dtype=np.float32))

def predict(ann, sample):
    resized = sample.copy()
    rows, cols = resized.shape
    if (rows != 28 or cols != 28) and rows * cols > 0:
        resized = cv2.resize(resized, (28, 28), interpolation = cv2.INTER_CUBIC)
    return ann.predict(np.array([resized.ravel()], dtype=np.float32))
Let's examine it in order. First, the load_data, wrap_data, and vectorized_result functions are included in Michael Nielsen's code for loading the pickle file.
It's a relatively straightforward loading of a pickle file. Most notably, though, the loaded data has been split into training and test data. Both are arrays containing two-element tuples: the first element is the data itself; the second is the expected classification. So we can use the training data to train the ANN and the test data to evaluate its accuracy.
The vectorized_result function is a very clever function that, given an expected classification, creates a 10-element array of zeros, setting a single 1 for the expected result. This 10-element array, you may have guessed, will be used as a classification for the output layer.
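To see what this encoding looks like, here is the same function from the listing run on its own for the digit 3 (Python 3 syntax; the target variable name is only for illustration). The position of the single 1 is the class, so np.argmax recovers the digit.

```python
import numpy as np

def vectorized_result(j):
    """Return a 10x1 column of zeros with a 1.0 in position j."""
    e = np.zeros((10, 1))
    e[j] = 1.0
    return e

target = vectorized_result(3)
print(target.shape)             # (10, 1)
print(target.ravel().tolist())  # [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(int(np.argmax(target)))   # 3
```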
The first ANN-related function is create_ANN:
def create_ANN(hidden = 20):
    ann = cv2.ml.ANN_MLP_create()
    ann.setLayerSizes(np.array([784, hidden, 10]))
    ann.setTrainMethod(cv2.ml.ANN_MLP_RPROP)
    ann.setActivationFunction(cv2.ml.ANN_MLP_SIGMOID_SYM)
    ann.setTermCriteria(( cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 20, 1 ))
    return ann
This function creates an ANN specifically geared towards handwritten digit recognition with MNIST, by specifying layer sizes as illustrated in the Initial parameters section.
We now need a training function:
def train(ann, samples = 10000, epochs = 1):
    tr, val, test = wrap_data()
    for x in xrange(epochs):
        counter = 0
        for img in tr:
            if (counter > samples):
                break
            if (counter % 1000 == 0):
                print "Epoch %d: Trained %d/%d" % (x, counter, samples)
            counter += 1
            data, digit = img
            ann.train(np.array([data.ravel()], dtype=np.float32), cv2.ml.ROW_SAMPLE, np.array([digit.ravel()], dtype=np.float32))
        print "Epoch %d complete" % x
    return ann, test
Again, this is quite simple: given a number of samples and training epochs, we load the data, and then iterate through the samples once per epoch.
The important section of this function is the deconstruction of the single training record into the training data and an expected classification, which is then passed into the ANN.
To do so, we utilize the numpy array function, ravel(), which takes an array of any shape and "flattens" it into a single-row array. So, for example, consider this array:
data = [[ 1, 2, 3], [4, 5, 6], [7, 8, 9]]
The preceding array, once "raveled", becomes the following array:
[1, 2, 3, 4, 5, 6, 7, 8, 9]
This is the format in which OpenCV's ANN expects to receive data in its train() method.
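The flattening step can be tried in isolation with NumPy alone. The snippet below (Python 3; the variable names are chosen for illustration) ravels the small array from the example and then shows how wrapping the result in a list produces the one-row-per-sample layout that the training call in our wrapper builds for each image.

```python
import numpy as np

data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
flat = data.ravel()
print(flat.tolist())  # [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Wrapping the flattened vector in a list gives a 2-D array with a single row.
row_sample = np.array([flat], dtype=np.float32)
print(row_sample.shape)  # (1, 9)

# A 28x28 image becomes a single row of 784 values in the same way.
image = np.zeros((28, 28), dtype=np.float32)
print(np.array([image.ravel()], dtype=np.float32).shape)  # (1, 784)
```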
Finally, we return both the network and the test data. We could have just returned the network, but having the test data at hand for accuracy checking is quite useful.
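An accuracy check over the test data can be as simple as counting correct predictions. The following sketch (Python 3) assumes a hypothetical predict_digit callable that maps a sample to a digit; the accuracy function and the stand-in data are illustrative and not part of the wrapper library.

```python
def accuracy(predict_digit, test_data):
    """Fraction of (sample, expected_digit) pairs that predict_digit gets right."""
    test_data = list(test_data)
    if not test_data:
        return 0.0
    correct = sum(1 for sample, expected in test_data
                  if predict_digit(sample) == expected)
    return correct / len(test_data)

# Stand-in classifier and data, for illustration only.
fake_data = [("a", 1), ("b", 2), ("c", 3), ("d", 4)]
fake_predict = {"a": 1, "b": 2, "c": 0, "d": 4}.get
print(accuracy(fake_predict, fake_data))  # 0.75
```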
The last function we need is a predict() function to wrap ANN's own predict() method:
def predict(ann, sample):
    resized = sample.copy()
    rows, cols = resized.shape
    if (rows != 28 or cols != 28) and rows * cols > 0:
        resized = cv2.resize(resized, (28, 28), interpolation = cv2.INTER_CUBIC)
    return ann.predict(np.array([resized.ravel()], dtype=np.float32))
This function takes an ANN and a sample image; it performs a minimum of "sanitization" by making sure the shape of the data is as expected and resizing it if it's not, and then ravels it for a successful prediction.
The file I created also contains a test function, which verifies that the network works and displays the sample provided for classification.
This whole chapter has been an introductory journey leading us to this point. In fact, many of the techniques we're going to use are from previous chapters, so in a way the entire book has led us to this point. So let's put all our knowledge to good use.
Let's take an initial look at the file, and then decompose it for a better understanding:
import cv2
import numpy as np
import digits_ann as ANN

def inside(r1, r2):
    x1,y1,w1,h1 = r1
    x2,y2,w2,h2 = r2
    if (x1 > x2) and (y1 > y2) and (x1+w1 < x2+w2) and (y1+h1 < y2 + h2):
        return True
    else:
        return False

def wrap_digit(rect):
    x, y, w, h = rect
    padding = 5
    hcenter = x + w/2
    vcenter = y + h/2
    if (h > w):
        w = h
        x = hcenter - (w/2)
    else:
        h = w
        y = vcenter - (h/2)
    return (x-padding, y-padding, w+padding, h+padding)

ann, test_data = ANN.train(ANN.create_ANN(58), 20000)
font = cv2.FONT_HERSHEY_SIMPLEX

path = "./images/numbers.jpg"
img = cv2.imread(path, cv2.IMREAD_UNCHANGED)
bw = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
bw = cv2.GaussianBlur(bw, (7,7), 0)
ret, thbw = cv2.threshold(bw, 127, 255, cv2.THRESH_BINARY_INV)
thbw = cv2.erode(thbw, np.ones((2,2), np.uint8), iterations = 2)
image, cntrs, hier = cv2.findContours(thbw.copy(), cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)

rectangles = []

for c in cntrs:
    r = x,y,w,h = cv2.boundingRect(c)
    a = cv2.contourArea(c)
    b = (img.shape[0]-3) * (img.shape[1] - 3)

    is_inside = False
    for q in rectangles:
        if inside(r, q):
            is_inside = True
            break
    if not is_inside:
        if not a == b:
            rectangles.append(r)

for r in rectangles:
    x,y,w,h = wrap_digit(r)
    cv2.rectangle(img, (x,y), (x+w, y+h), (0, 255, 0), 2)
    roi = thbw[y:y+h, x:x+w]

    try:
        digit_class = int(ANN.predict(ann, roi.copy())[0])
    except:
        continue
    cv2.putText(img, "%d" % digit_class, (x, y-1), font, 1, (0, 255, 0))

cv2.imshow("thbw", thbw)
cv2.imshow("contours", img)
cv2.imwrite("sample.jpg", img)
cv2.waitKey()
After the usual initial imports, we import the mini-library we created, which is stored in digits_ann.py.
I find it good practice to define functions at the top of the file, so let's examine those. The inside() function determines whether a rectangle is entirely contained in another rectangle:
def inside(r1, r2):
    x1,y1,w1,h1 = r1
    x2,y2,w2,h2 = r2
    if (x1 > x2) and (y1 > y2) and (x1+w1 < x2+w2) and (y1+h1 < y2 + h2):
        return True
    else:
        return False
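A few sample calls make the behavior of the containment test concrete. The version below is the same comparison written as a single boolean expression (Python 3); the outer rectangle and the test rectangles are made-up values. Note that the comparisons are strict, so a rectangle that merely touches an edge of the other does not count as inside.

```python
def inside(r1, r2):
    """True when rectangle r1 = (x, y, w, h) lies strictly within r2."""
    x1, y1, w1, h1 = r1
    x2, y2, w2, h2 = r2
    return x1 > x2 and y1 > y2 and x1 + w1 < x2 + w2 and y1 + h1 < y2 + h2

outer = (10, 10, 100, 100)
print(inside((20, 20, 30, 30), outer))  # True: fully enclosed
print(inside((10, 20, 30, 30), outer))  # False: shares the left edge
print(inside((90, 90, 30, 30), outer))  # False: spills past the right and bottom
```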
The wrap_digit() function takes a rectangle that surrounds a digit, turns it into a square, and centers it on the digit itself, with 5-pixel padding to make sure the digit is entirely contained in it:
def wrap_digit(rect):
    x, y, w, h = rect
    padding = 5
    hcenter = x + w/2
    vcenter = y + h/2
    if (h > w):
        w = h
        x = hcenter - (w/2)
    else:
        h = w
        y = vcenter - (h/2)
    return (x-padding, y-padding, w+padding, h+padding)
The point of this function will become clearer later on; let's not dwell on it too much at the moment.
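In the meantime, a worked example shows what the function returns. The sketch below restates it for Python 3, where integer division must be written as // to keep the coordinates whole numbers; turning padding into a parameter is an illustrative change, and the rectangles are made-up values.

```python
def wrap_digit(rect, padding=5):
    """Expand (x, y, w, h) into a square centered on the same point."""
    x, y, w, h = rect
    hcenter = x + w // 2
    vcenter = y + h // 2
    if h > w:
        w = h
        x = hcenter - w // 2
    else:
        h = w
        y = vcenter - h // 2
    return (x - padding, y - padding, w + padding, h + padding)

print(wrap_digit((50, 20, 10, 40)))  # (30, 15, 45, 45): a tall, narrow "1"
print(wrap_digit((10, 10, 30, 20)))  # (5, 0, 35, 35): a wide rectangle
```

Two details are worth keeping in mind when reusing this function. The origin moves by padding but the side grows by padding only once, so the margin ends up on the top and left sides; adding 2 * padding to the size would pad all four sides. Also, for digits close to the image border, x - padding or y - padding can become negative, and a negative index in a NumPy slice counts from the opposite end rather than clipping at zero.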
Now, let's create the network. We will use 58 hidden nodes, and train over 20,000 samples:
ann, test_data = ANN.train(ANN.create_ANN(58), 20000)
This is good enough for a preliminary test to keep the training time down to a minute or two (depending on the processing power of your machine). The ideal is to use the full set of training data (50,000), and iterate through it several times, until some convergence is reached (as we discussed earlier, the accuracy "peak"). You would do this by calling the following function:
ann, test_data = ANN.train(ANN.create_ANN(100), 50000, 30)
We can now prepare the data to test. To do that, we're going to load an image, and clean up a little:
path = "./images/numbers.jpg"
img = cv2.imread(path, cv2.IMREAD_UNCHANGED)
bw = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
bw = cv2.GaussianBlur(bw, (7,7), 0)
Now that we have a grayscale smoothed image, we can apply a threshold and some morphology operations to make sure the numbers are properly standing out from the background and relatively cleaned up for irregularities, which might throw the prediction operation off:
ret, thbw = cv2.threshold(bw, 127, 255, cv2.THRESH_BINARY_INV)
thbw = cv2.erode(thbw, np.ones((2,2), np.uint8), iterations = 2)
After the morphology operation, we need to identify and separate each number in the picture. To do this, we first identify the contours in the image:
image, cntrs, hier = cv2.findContours(thbw.copy(), cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
Then, we iterate through the contours, and discard all the rectangles that are entirely contained in other rectangles; we only append to the list of good rectangles the ones that are not contained in other rectangles and are also not as wide as the image itself. In some of the tests, findContours yielded the entire image as a contour itself, which meant no other rectangle passed the inside test:
rectangles = []

for c in cntrs:
    r = x,y,w,h = cv2.boundingRect(c)
    a = cv2.contourArea(c)
    b = (img.shape[0]-3) * (img.shape[1] - 3)

    is_inside = False
    for q in rectangles:
        if inside(r, q):
            is_inside = True
            break
    if not is_inside:
        if not a == b:
            rectangles.append(r)
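The loop above compares each rectangle only with the ones already accepted, so its result depends on the order in which the contours arrive. As a hedged alternative, the following sketch (Python 3) compares every rectangle with all the others; outermost is a hypothetical helper name and the boxes are made-up values.

```python
def inside(r1, r2):
    """True when rectangle r1 = (x, y, w, h) lies strictly within r2."""
    x1, y1, w1, h1 = r1
    x2, y2, w2, h2 = r2
    return x1 > x2 and y1 > y2 and x1 + w1 < x2 + w2 and y1 + h1 < y2 + h2

def outermost(rects):
    """Keep only rectangles not strictly contained in any other rectangle."""
    return [r for r in rects
            if not any(inside(r, other) for other in rects if other != r)]

boxes = [
    (12, 12, 6, 6),    # inner contour, such as the loop of a 9, listed first
    (10, 10, 20, 40),  # first digit
    (50, 10, 20, 40),  # second digit
]
print(outermost(boxes))  # [(10, 10, 20, 40), (50, 10, 20, 40)]
```

The inner rectangle is dropped even though it comes before the digit that contains it, which the accepted-list approach would not do. The cost is a quadratic number of comparisons, which is negligible for the handful of contours found in a typical image of digits.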
Now that we have a list of good rectangles, we can iterate through them and define a region of interest for each of the rectangles we identified:
for r in rectangles:
    x,y,w,h = wrap_digit(r)
This is where the wrap_digit() function we defined at the beginning of the program comes into play: we need to pass a square region of interest to the predictor function; if we simply resized a rectangle into a square, we'd ruin our test data.
You may wonder why; think of the number one. A rectangle surrounding the number one would be very narrow, especially if it has been drawn without too much of a lean to either side. If you simply resized it to a square, you would "fatten" the number one in such a way that nearly the entire square would turn black, rendering the prediction impossible. Instead, we want to create a square around the identified number, which is exactly what wrap_digit() does.
This approach is quick-and-dirty; it allows us to draw a square around a number and simultaneously pass that square as a region of interest for the prediction. A purer approach would be to take the original rectangle and "center" it into a square numpy array with rows and columns equal to the larger of the two dimensions of the original rectangle. The reason for this is that, as you will notice, some of the squares will include tiny bits of adjacent numbers, which can throw the prediction off. With a square created from the np.zeros() function, no impurities will be accidentally dragged in:
    cv2.rectangle(img, (x,y), (x+w, y+h), (0, 255, 0), 2)
    roi = thbw[y:y+h, x:x+w]

    try:
        digit_class = int(ANN.predict(ann, roi.copy())[0])
    except:
        continue
    cv2.putText(img, "%d" % digit_class, (x, y-1), font, 1, (0, 255, 0))
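The "purer" approach described above, centering the original rectangle in a zero-filled square, could be sketched as follows. This is an illustration for Python 3 with NumPy only; center_in_square is a hypothetical helper that the program in this chapter does not use, and the narrow array stands in for the region cut out around a thin digit.

```python
import numpy as np

def center_in_square(roi):
    """Copy a 2-D region into the middle of a zero-filled square array."""
    rows, cols = roi.shape
    side = max(rows, cols)
    square = np.zeros((side, side), dtype=roi.dtype)
    top = (side - rows) // 2
    left = (side - cols) // 2
    square[top:top + rows, left:left + cols] = roi
    return square

narrow = np.full((6, 2), 255, dtype=np.uint8)  # stand-in for a thin "1"
square = center_in_square(narrow)
print(square.shape)        # (6, 6)
print(square[0].tolist())  # [0, 0, 255, 255, 0, 0]
```

Because the square is built from zeros rather than sliced out of the thresholded image, nothing from neighboring digits can leak into it, and the digit keeps its original proportions when the square is later resized to 28x28.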
Once the prediction for the square region is complete, we draw it on the original image:
cv2.imshow("thbw", thbw)
cv2.imshow("contours", img)
cv2.imwrite("sample.jpg", img)
cv2.waitKey()