Handwritten digit recognition with ANNs

The world of Machine Learning is vast and mostly unexplored, and ANNs are but one of the many concepts related to Machine Learning, which is one of the many subdisciplines of Artificial Intelligence. For the purpose of this chapter, we will only be exploring the concept of ANNs in the context of OpenCV. It is by no means an exhaustive treatise on the subject of Artificial Intelligence.

Ultimately, we're interested in seeing ANNs work in the real world. So let's go ahead and make it happen.

MNIST – the handwritten digit database

One of the most popular resources on the Web for the training of classifiers dealing with OCR and handwritten character recognition is the MNIST database, publicly available at http://yann.lecun.com/exdb/mnist/.

This particular database is a freely available resource to kick-start the creation of a program that utilizes ANNs to recognize handwritten digits.

Customized training data

It is always possible to build your own training data. It will take a little bit of patience but it's fairly easy; collect a vast number of handwritten digits and create images containing a single digit, making sure all the images are the same size and in grayscale.

After this, you will have to create a mechanism that keeps a training sample in sync with the expected classification.
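One simple way to keep things in sync is to hold samples and labels in two parallel arrays and only ever reorder them with the same permutation. The following is a minimal sketch of that idea; the function name shuffle_in_sync, the variable names, and the tiny 2x2 stand-in "images" are assumptions made for illustration, not part of MNIST or OpenCV:

```python
import numpy as np

def shuffle_in_sync(samples, labels, seed=0):
    """Shuffle samples and labels with one shared permutation."""
    samples = np.asarray(samples)
    labels = np.asarray(labels)
    if len(samples) != len(labels):
        raise ValueError("each sample needs exactly one label")
    order = np.random.RandomState(seed).permutation(len(samples))
    return samples[order], labels[order]

# Hypothetical data: three tiny "images", each filled with its own label
# so that the pairing is easy to check by eye.
samples = np.array([np.full((2, 2), d, dtype=np.uint8) for d in (3, 7, 9)])
labels = np.array([3, 7, 9])

shuffled_samples, shuffled_labels = shuffle_in_sync(samples, labels)

# Each pair is (image, expected classification), still matched up.
pairs = list(zip(shuffled_samples, shuffled_labels))
```

Storing the pairs as tuples, as in the last line, is the same layout the MNIST loader used later in this chapter produces.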

The initial parameters

Let's take a look at the individual layers in the network:

  • Input layer
  • Hidden layer
  • Output layer

The input layer

Since we're going to utilize the MNIST database, the input layer will have a size of 784 input nodes: that's because MNIST samples are 28x28 pixel images, which means 784 pixels.

The hidden layer

As we have seen, there's no hard-and-fast rule for the size of the hidden layer. I've found, through several attempts, that 50 to 60 nodes yield the best result while not necessitating an inordinate amount of training data.

You can increase the size of the hidden layer with the amount of data, but beyond a certain point, there will be no advantage to that; you will also have to be prepared for your network to take hours to train (the more hidden neurons, the longer it takes to train the network).
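To get a feel for why a larger hidden layer takes longer to train, we can count the values the network has to adjust. This is a rough sketch that assumes a plain fully connected network with one bias per node; count_parameters is a hypothetical helper, and OpenCV's internal bookkeeping may differ in detail:

```python
def count_parameters(layer_sizes):
    """Count weights and biases in a fully connected network."""
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out  # one weight per connection between layers
        total += n_out         # one bias per receiving node
    return total

# Compare a few hidden layer sizes for a 784-input, 10-output network.
for hidden in (20, 58, 100):
    print(hidden, count_parameters([784, hidden, 10]))
```

The count grows linearly with the number of hidden nodes, and every one of those values is updated on each training pass.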

The output layer

The output layer will have a size of 10. This should not be a surprise as we want to classify 10 digits (0-9).

Training epochs

We will initially use the entire set of training data from MNIST, which consists of 60,000 handwritten images, half of which were written by US government employees, and the other half by high-school students. That's a lot of data, so we won't need more than one epoch to achieve an acceptably high recognition accuracy.

From there on, it is up to you to train the network iteratively on the same train data, and my suggestion is that you use an accuracy test, and find the epoch at which the accuracy "peaks". By doing so, you will have a precise measurement of the highest possible accuracy achieved by your network given its current configuration.
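The search for the "peak" can be sketched in a few lines. The helpers accuracy and peak_epoch below are hypothetical names, and the history values are made-up numbers used only to show the mechanics; in practice you would fill the list with the accuracy measured on held-out data after each epoch:

```python
import numpy as np

def accuracy(predicted, expected):
    """Fraction of predictions that match the expected labels."""
    predicted = np.asarray(predicted)
    expected = np.asarray(expected)
    return float(np.mean(predicted == expected))

def peak_epoch(accuracies):
    """Return (epoch_index, accuracy) for the best epoch, first one on ties."""
    best = int(np.argmax(accuracies))
    return best, accuracies[best]

# Made-up accuracies, one per epoch, purely for illustration.
history = [0.81, 0.88, 0.91, 0.93, 0.92, 0.92]
best_epoch, best_accuracy = peak_epoch(history)
print("Accuracy peaks at epoch %d (%.2f)" % (best_epoch, best_accuracy))
```

Training past the peak epoch costs time without improving the measurement, which is why it is worth recording the accuracy after every epoch.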

Other parameters

We will use a sigmoid activation function and Resilient Back Propagation (RPROP), and extend the termination criteria for each calculation to 20 iterations, instead of the 10 we used for every other operation in this book that involved cv2.TermCriteria.

Note

Important notes on train data and ANNs libraries

Exploring the Internet for sources, I found an amazing article by Michael Nielsen at http://neuralnetworksanddeeplearning.com/chap1.html, which illustrates how to write an ANN library from scratch, and the code for this library is freely available on GitHub at https://github.com/mnielsen/neural-networks-and-deep-learning; this is the source code for a book, Neural Networks and Deep Learning, by Michael Nielsen.

In the data folder, you will find a pickle file, signifying data that has been saved to disk through the popular Python library, cPickle, which makes loading and saving the Python data a trivial task.

This pickle file is a cPickle-serialized version of the MNIST data and, as it is so useful and ready to work with, I strongly suggest you use it. Nothing stops you from loading the original MNIST dataset yourself, but the process of deserializing the training data is quite tedious and, strictly speaking, outside the remit of this book.

I would also like to point out that OpenCV is not the only Python library that allows you to use ANNs, not by any stretch of the imagination. The Web is full of alternatives that I strongly encourage you to try out, most notably PyBrain, a library called Lasagne (whose name, as an Italian, I find exceptionally attractive), and many custom-written implementations, such as the aforementioned Michael Nielsen's implementation.

Enough introductory details, though. Let's get going.

Mini-libraries

Setting up an ANN in OpenCV is not difficult, but you will almost definitely find yourself training your network countless times, in search of that elusive percentage point that boosts the accuracy of your results.

To automate this as much as possible, we will build a mini-library that wraps OpenCV's native implementation of ANNs and lets us rerun and retrain the network easily.

Here's an example of a wrapper library:

import cv2
import cPickle
import numpy as np
import gzip

def load_data():
  mnist = gzip.open('./data/mnist.pkl.gz', 'rb')
  training_data, classification_data, test_data = cPickle.load(mnist)
  mnist.close()
  return (training_data, classification_data, test_data)

def wrap_data():
  tr_d, va_d, te_d = load_data()
  training_inputs = [np.reshape(x, (784, 1)) for x in tr_d[0]]
  training_results = [vectorized_result(y) for y in tr_d[1]]
  training_data = zip(training_inputs, training_results)
  validation_inputs = [np.reshape(x, (784, 1)) for x in va_d[0]]
  validation_data = zip(validation_inputs, va_d[1])
  test_inputs = [np.reshape(x, (784, 1)) for x in te_d[0]]
  test_data = zip(test_inputs, te_d[1])
  return (training_data, validation_data, test_data)

def vectorized_result(j):
  e = np.zeros((10, 1))
  e[j] = 1.0
  return e

def create_ANN(hidden = 20):
  ann = cv2.ml.ANN_MLP_create()
  ann.setLayerSizes(np.array([784, hidden, 10]))
  ann.setTrainMethod(cv2.ml.ANN_MLP_RPROP)
  ann.setActivationFunction(cv2.ml.ANN_MLP_SIGMOID_SYM)
  ann.setTermCriteria(( cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 20, 1 ))
  return ann

def train(ann, samples = 10000, epochs = 1):
  tr, val, test = wrap_data()
  
  
  for x in xrange(epochs):
    counter = 0
    for img in tr:
      
      if (counter > samples):
        break
      if (counter % 1000 == 0):
        print "Epoch %d: Trained %d/%d" % (x, counter, samples)
      counter += 1
      data, digit = img
      ann.train(np.array([data.ravel()], dtype=np.float32), cv2.ml.ROW_SAMPLE, np.array([digit.ravel()], dtype=np.float32))
    print "Epoch %d complete" % x
  return ann, test
  
def test(ann, test_data):
  sample = np.array(test_data[0][0].ravel(), dtype=np.float32).reshape(28, 28)
  cv2.imshow("sample", sample)
  cv2.waitKey()
  print ann.predict(np.array([test_data[0][0].ravel()], dtype=np.float32))

def predict(ann, sample):
  resized = sample.copy()
  rows, cols = resized.shape
  if (rows != 28 or cols != 28) and rows * cols > 0:
    resized = cv2.resize(resized, (28, 28), interpolation = cv2.INTER_CUBIC)
  return ann.predict(np.array([resized.ravel()], dtype=np.float32))

Let's examine it in order. First, the load_data, wrap_data, and vectorized_result functions are included in Michael Nielsen's code for loading the pickle file.

It's a relatively straightforward loading of a pickle file. Most notably, though, the loaded data has already been split into training, validation, and test data. After wrap_data() has run, each set is a collection of two-element tuples: the first element is the data itself; the second is the expected classification. So we can use the training data to train the ANN and the test data to evaluate its accuracy.

The vectorized_result function is a very clever function that—given an expected classification—creates a 10-element array of zeros, setting a single 1 for the expected result. This 10-element array, you may have guessed, will be used as a classification for the output layer.
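Here is a small sketch of that encoding in action. vectorized_result is copied from the listing; decode is a hypothetical helper added only to show how such a vector maps back to a digit:

```python
import numpy as np

def vectorized_result(j):
    # Same idea as in the listing: a 10x1 column of zeros with a
    # single 1.0 at the index of the expected digit.
    e = np.zeros((10, 1))
    e[j] = 1.0
    return e

def decode(vector):
    """Hypothetical inverse: the index of the largest entry."""
    return int(np.argmax(vector))

encoded = vectorized_result(3)
print(encoded.ravel())   # ten values, with the 1.0 in position 3
print(decode(encoded))   # 3
```

The same argmax idea applies to the ten values of the output layer: the position of the largest value is the digit the network considers most likely.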

The first ANN-related function is create_ANN:

def create_ANN(hidden = 20):
  ann = cv2.ml.ANN_MLP_create()
  ann.setLayerSizes(np.array([784, hidden, 10]))
  ann.setTrainMethod(cv2.ml.ANN_MLP_RPROP)
  ann.setActivationFunction(cv2.ml.ANN_MLP_SIGMOID_SYM)
  ann.setTermCriteria(( cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 20, 1 ))
  return ann

This function creates an ANN specifically geared towards handwritten digit recognition with MNIST, by specifying the layer sizes described in The initial parameters section.

We now need a training function:

def train(ann, samples = 10000, epochs = 1):
  tr, val, test = wrap_data()
  
  
  for x in xrange(epochs):
    counter = 0
    for img in tr:
      
      if (counter > samples):
        break
      if (counter % 1000 == 0):
        print "Epoch %d: Trained %d/%d" % (x, counter, samples)
      counter += 1
      data, digit = img
      ann.train(np.array([data.ravel()], dtype=np.float32), cv2.ml.ROW_SAMPLE, np.array([digit.ravel()], dtype=np.float32))
    print "Epoch %d complete" % x
  return ann, test

Again, this is quite simple: given a number of samples and training epochs, we load the data and then iterate through the samples once per epoch.

The important section of this function is the deconstruction of the single training record into the train data and an expected classification, which is then passed into the ANN.

To do so, we utilize the numpy array function, ravel(), which takes an array of any shape and "flattens" it into a single-row array. So, for example, consider this array:

data = [[ 1, 2, 3], [4, 5, 6], [7, 8, 9]]

The preceding array, once "raveled", becomes the following array:

      [1, 2, 3, 4, 5, 6, 7, 8, 9] 

This is the format in which the train() method of OpenCV's ANN expects its data: one flattened sample per row.
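The following NumPy sketch repeats the flattening shown above and adds the extra step that the train() function in our listing performs, namely wrapping the flattened sample in a one-row float32 array. The all-zeros sample is just a stand-in with the shape of an MNIST record:

```python
import numpy as np

data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
flat = data.ravel()                        # shape (9,): one flat sequence
row = np.array([flat], dtype=np.float32)   # shape (1, 9): a single row

# An MNIST-sized stand-in: a 784x1 column becomes one row of 784 values.
sample = np.zeros((784, 1))
sample_row = np.array([sample.ravel()], dtype=np.float32)

print(flat)
print(row.shape, sample_row.shape)
```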

Finally, we return both the network and the test data. We could have just returned the network, but having the test data at hand for accuracy checking is quite useful.

The last function we need is a predict() function to wrap ANN's own predict() method:

def predict(ann, sample):
  resized = sample.copy()
  rows, cols = resized.shape
  if (rows != 28 or cols != 28) and rows * cols > 0:
    resized = cv2.resize(resized, (28, 28), interpolation = cv2.INTER_CUBIC)
  return ann.predict(np.array([resized.ravel()], dtype=np.float32))

This function takes an ANN and a sample image; it performs a minimum of "sanitization" by checking that the shape of the data is as expected and resizing it to 28x28 if it is not, and then ravels it for a successful prediction.

The file I created also contains a test function to verify that the network works and it displays the sample provided for classification.

The main file

This whole chapter has been an introductory journey leading us to this point. In fact, many of the techniques we're going to use come from previous chapters, so in a way the entire book has been building up to it. So let's put all our knowledge to good use.

Let's take an initial look at the file, and then decompose it for a better understanding:

import cv2
import numpy as np
import digits_ann as ANN

def inside(r1, r2):
  x1,y1,w1,h1 = r1
  x2,y2,w2,h2 = r2
  if (x1 > x2) and (y1 > y2) and (x1+w1 < x2+w2) and (y1+h1 < y2 + h2):
    return True
  else:
    return False

def wrap_digit(rect):
  x, y, w, h = rect
  padding = 5
  hcenter = x + w/2
  vcenter = y + h/2
  if (h > w):
    w = h
    x = hcenter - (w/2)
  else:
    h = w
    y = vcenter - (h/2)
  return (x-padding, y-padding, w+padding, h+padding)

ann, test_data = ANN.train(ANN.create_ANN(58), 20000)
font = cv2.FONT_HERSHEY_SIMPLEX

path = "./images/numbers.jpg"
img = cv2.imread(path, cv2.IMREAD_UNCHANGED)
bw = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
bw = cv2.GaussianBlur(bw, (7,7), 0)
ret, thbw = cv2.threshold(bw, 127, 255, cv2.THRESH_BINARY_INV)
thbw = cv2.erode(thbw, np.ones((2,2), np.uint8), iterations = 2)
image, cntrs, hier = cv2.findContours(thbw.copy(), cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)

rectangles = []

for c in cntrs:
  r = x,y,w,h = cv2.boundingRect(c)
  a = cv2.contourArea(c)
  b = (img.shape[0]-3) * (img.shape[1] - 3)
  
  is_inside = False
  for q in rectangles:
    if inside(r, q):
      is_inside = True
      break
  if not is_inside:
    if not a == b:
      rectangles.append(r)

for r in rectangles:
  x,y,w,h = wrap_digit(r) 
  cv2.rectangle(img, (x,y), (x+w, y+h), (0, 255, 0), 2)
  roi = thbw[y:y+h, x:x+w]
  
  try:
    digit_class = int(ANN.predict(ann, roi.copy())[0])
  except:
    continue
  cv2.putText(img, "%d" % digit_class, (x, y-1), font, 1, (0, 255, 0))

cv2.imshow("thbw", thbw)
cv2.imshow("contours", img)
cv2.imwrite("sample.jpg", img)
cv2.waitKey()

After the initial usual imports, we import the mini-library we created, which is stored in digits_ann.py.

I find it good practice to define functions at the top of the file, so let's examine those. The inside() function determines whether a rectangle is entirely contained in another rectangle:

def inside(r1, r2):
  x1,y1,w1,h1 = r1
  x2,y2,w2,h2 = r2
  if (x1 > x2) and (y1 > y2) and (x1+w1 < x2+w2) and (y1+h1 < y2 + h2):
    return True
  else:
    return False
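A quick check with a few made-up rectangles shows how the comparison behaves. The function body is the same test as in the listing, written as a single expression; the rectangle values are hypothetical:

```python
def inside(r1, r2):
    # Rectangles are (x, y, width, height); r1 must lie strictly within r2.
    x1, y1, w1, h1 = r1
    x2, y2, w2, h2 = r2
    return (x1 > x2) and (y1 > y2) and (x1 + w1 < x2 + w2) and (y1 + h1 < y2 + h2)

outer = (10, 10, 100, 100)
inner = (20, 20, 30, 30)
touching = (10, 20, 30, 30)     # shares its left edge with outer

print(inside(inner, outer))     # True
print(inside(touching, outer))  # False: the comparison is strict
print(inside(outer, inner))     # False
```

Because the inequalities are strict, a rectangle that shares an edge with another one does not count as being inside it.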

The wrap_digit() function takes a rectangle that surrounds a digit, turns it into a square, and centers it on the digit itself, with 5-pixel padding to make sure the digit is entirely contained in it:

def wrap_digit(rect):
  x, y, w, h = rect
  padding = 5
  hcenter = x + w/2
  vcenter = y + h/2
  if (h > w):
    w = h
    x = hcenter - (w/2)
  else:
    h = w
    y = vcenter - (h/2)
  return (x-padding, y-padding, w+padding, h+padding)

The point of this function will become clearer later on; let's not dwell on it too much at the moment.

Now, let's create the network. We will use 58 hidden nodes, and train over 20,000 samples:

ann, test_data = ANN.train(ANN.create_ANN(58), 20000)

This is good enough for a preliminary test and keeps the training time down to a minute or two (depending on the processing power of your machine). The ideal is to use the full set of training data in the pickle file (50,000 samples; the remaining 10,000 MNIST training images are set aside as validation data) and iterate through it several times until some convergence is reached (the accuracy "peak" we discussed earlier). You would do this by calling the following function:

ann, test_data = ANN.train(ANN.create_ANN(100), 50000, 30)

We can now prepare the data to test. To do that, we're going to load an image and clean it up a little:

path = "./images/numbers.jpg"
img = cv2.imread(path, cv2.IMREAD_UNCHANGED)
bw = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
bw = cv2.GaussianBlur(bw, (7,7), 0)

Now that we have a smoothed grayscale image, we can apply a threshold and a morphology operation to make sure the numbers stand out properly from the background and are relatively free of irregularities that might throw the prediction off:

ret, thbw = cv2.threshold(bw, 127, 255, cv2.THRESH_BINARY_INV)
thbw = cv2.erode(thbw, np.ones((2,2), np.uint8), iterations = 2)

Note

Note the threshold flag, which is for an inverse binary threshold: as the samples of the MNIST database are white on black (and not black on white), we turn the image into a black background with white numbers.
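To see what the inverse binary threshold does to pixel values, here is a NumPy stand-in for the rule it applies: pixels brighter than the threshold become 0 and all others become the maximum value. threshold_binary_inv is a hypothetical helper written for illustration, not the OpenCV function itself, and the 3x3 patch is made up:

```python
import numpy as np

def threshold_binary_inv(gray, thresh=127, maxval=255):
    """NumPy stand-in for an inverse binary threshold.

    Pixels brighter than thresh become 0, all others become maxval.
    """
    return np.where(gray > thresh, 0, maxval).astype(np.uint8)

# Made-up 3x3 patch: a dark vertical stroke (low values) on light paper.
patch = np.array([[250, 30, 245],
                  [240, 20, 250],
                  [235, 25, 240]], dtype=np.uint8)

inverted = threshold_binary_inv(patch)
print(inverted)   # the stroke is now 255 (white) on a 0 (black) background
```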

After the morphology operation, we need to identify and separate each number in the picture. To do this, we first identify the contours in the image:

image, cntrs, hier = cv2.findContours(thbw.copy(), cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)

Then, we iterate through the contours, and discard all the rectangles that are entirely contained in other rectangles; we only append to the list of good rectangles the ones that are not contained in other rectangles and are also not as wide as the image itself. In some of the tests, findContours yielded the entire image as a contour itself, which meant no other rectangle passed the inside test:

rectangles = []

for c in cntrs:
  r = x,y,w,h = cv2.boundingRect(c)
  a = cv2.contourArea(c)
  b = (img.shape[0]-3) * (img.shape[1] - 3)
  
  is_inside = False
  for q in rectangles:
    if inside(r, q):
      is_inside = True
      break
  if not is_inside:
    if not a == b:
      rectangles.append(r)
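The filtering logic can be tried out on its own with a few made-up rectangles. The sketch below mirrors the loop above but leaves out the area check; keep_outer is a hypothetical name for that loop:

```python
def inside(r1, r2):
    # Rectangles are (x, y, width, height); r1 must lie strictly within r2.
    x1, y1, w1, h1 = r1
    x2, y2, w2, h2 = r2
    return (x1 > x2) and (y1 > y2) and (x1 + w1 < x2 + w2) and (y1 + h1 < y2 + h2)

def keep_outer(rects):
    """Keep rectangles that are not inside an already accepted rectangle."""
    kept = []
    for r in rects:
        if not any(inside(r, q) for q in kept):
            kept.append(r)
    return kept

outer = (10, 10, 100, 100)
inner = (20, 20, 30, 30)
other = (200, 10, 40, 80)

outer_first = keep_outer([outer, inner, other])   # inner is dropped
inner_first = keep_outer([inner, outer, other])   # inner survives
print(outer_first)
print(inner_first)
```

Note that a rectangle is only compared against the ones accepted before it, so the result depends on the order in which the contours arrive: an inner rectangle that shows up before its enclosing one is kept.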

Now that we have a list of good rectangles, we can iterate through them and define a region of interest for each of the rectangles we identified:

for r in rectangles:
  x,y,w,h = wrap_digit(r) 

This is where the wrap_digit() function we defined at the beginning of the program comes into play: we need to pass a square region of interest to the predictor function; if we simply resized a rectangle into a square, we'd ruin our test data.

You may wonder why. Think of the number one: a rectangle surrounding it would be very narrow, especially if it has been drawn without too much of a lean to either side. If you simply resized it to a square, you would "fatten" the number one in such a way that the digit would fill nearly the entire square, rendering the prediction impossible. Instead, we want to create a square around the identified number, which is exactly what wrap_digit() does.
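A worked example makes the effect visible. The function below uses the same arithmetic as wrap_digit() in the listing, with // in place of / so that the integer division also behaves the same way on Python 3; the bounding box values are made up:

```python
def wrap_digit(rect):
    # Same arithmetic as wrap_digit() in the listing.
    x, y, w, h = rect
    padding = 5
    hcenter = x + w // 2
    vcenter = y + h // 2
    if h > w:
        # Tall rectangle: widen it to a square around its horizontal center.
        w = h
        x = hcenter - (w // 2)
    else:
        # Wide rectangle: make it taller around its vertical center.
        h = w
        y = vcenter - (h // 2)
    return (x - padding, y - padding, w + padding, h + padding)

narrow_one = (50, 20, 10, 30)   # hypothetical box: 10 wide, 30 tall
square = wrap_digit(narrow_one)
print(square)                   # (35, 15, 35, 35)
```

The 10x30 box becomes a 35x35 square, so the digit keeps its proportions when the region is later resized to 28x28. As written, the square starts 5 pixels further up and to the left but is only 5 pixels larger, so the extra margin ends up on the top and left sides.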

This approach is quick-and-dirty; it allows us to draw a square around a number and simultaneously pass that square as a region of interest for the prediction. Its drawback is that some of the squares will include tiny bits of adjacent numbers, which can throw the prediction off. A purer approach would be to take the original rectangle and "center" it in a square numpy array whose rows and columns both equal the larger of the two dimensions of the original rectangle. With a square created by np.zeros(), no impurities will be accidentally dragged in. Here, we stick with the quick approach, drawing the square and extracting the region of interest:

  cv2.rectangle(img, (x,y), (x+w, y+h), (0, 255, 0), 2)
  roi = thbw[y:y+h, x:x+w]
  
  try:
    digit_class = int(ANN.predict(ann, roi.copy())[0])
  except:
    continue
  cv2.putText(img, "%d" % digit_class, (x, y-1), font, 1, (0, 255, 0))
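For reference, the purer approach mentioned earlier, centering the original rectangle in a square created with np.zeros(), could look something like the following. This is only a sketch under the assumption that the region is a 2-D grayscale array; center_in_square is a hypothetical helper and is not used by the program in this chapter:

```python
import numpy as np

def center_in_square(roi):
    """Paste a 2-D region into the middle of a black square canvas."""
    rows, cols = roi.shape
    side = max(rows, cols)
    canvas = np.zeros((side, side), dtype=roi.dtype)
    top = (side - rows) // 2
    left = (side - cols) // 2
    canvas[top:top + rows, left:left + cols] = roi
    return canvas

# Made-up tight crop of a "1": 6 rows tall, 2 columns wide, all white.
narrow = np.full((6, 2), 255, dtype=np.uint8)
square = center_in_square(narrow)
print(square)
```

Because the canvas starts out as all zeros, nothing from neighboring digits can leak into the region; the resulting square could then be passed to predict() in place of the padded region of interest.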

Once the prediction for each square region has been drawn on the original image, we display the thresholded image and the annotated result, and save the latter to disk:

cv2.imshow("thbw", thbw)
cv2.imshow("contours", img)
cv2.imwrite("sample.jpg", img)
cv2.waitKey()

And that's it! The final result will look similar to this:
