How to do it...

Import the following packages:

import numpy as np 
import neurolab as nl

Specify the input file:

in_file = 'words.data'

Load 20 data points to build the neural network-based system:

# Number of datapoints to load from the input file 
num_of_datapoints = 20

Define the distinct characters to recognize:

original_labels = 'omandig' 
# Number of distinct characters 
num_of_charect = len(original_labels)

Use 90% of the data for training the neural network and the remaining 10% for testing:

train_param = int(0.9 * num_of_datapoints) 
test_param = num_of_datapoints - train_param
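As a quick check of the arithmetic above, the following sketch computes the split sizes for two dataset sizes. The helper name split_sizes is hypothetical and not part of the recipe; it only illustrates that int() truncates, so the test share can end up slightly larger than 10%.

```python
def split_sizes(num_of_datapoints, train_fraction=0.9):
    """Return (train, test) counts; int() truncates toward zero."""
    train_param = int(train_fraction * num_of_datapoints)
    test_param = num_of_datapoints - train_param
    return train_param, test_param

print(split_sizes(20))  # (18, 2)
print(split_sizes(25))  # (22, 3)
```

With the 20 data points used here, 18 go to training and 2 to testing.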

Define the dataset extraction parameters, that is, the slice of fields in each line that holds the character's feature values:

s_index = 6 
e_index = -1
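To see what these two indices select, here is a minimal sketch on a made-up line. The field layout below (an ID, the letter, four other fields, then the feature values, with a trailing tab before the newline) is an assumption for illustration only; check the actual layout of words.data before relying on it.

```python
s_index = 6
e_index = -1

# Hypothetical tab-separated record: id, letter, four other fields,
# then five feature values and a trailing tab before the newline.
sample_line = "1\to\t2\t1\t1\t0\t0\t1\t1\t0\t1\t\n"

list_of_values = sample_line.split('\t')
letter = list_of_values[1]

# Fields 6 up to, but not including, the last one (which is just '\n' here)
pixels = [float(x) for x in list_of_values[s_index:e_index]]

print(letter)  # o
print(pixels)  # [0.0, 1.0, 1.0, 0.0, 1.0]
```

Using -1 as the end index drops the final field, which in this made-up layout contains only the newline and could not be converted to a float.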

Build the dataset:

information = [] 
labels = [] 
with open(in_file, 'r') as f: 
  for line in f.readlines(): 
    # Split the line tabwise 
    list_of_values = line.split('\t')

Add a check that skips any line whose label is not one of the characters of interest:

    if list_of_values[1] not in original_labels: 
      continue

Extract the label, one-hot encode it, and append it to the main list:

    label = np.zeros((num_of_charect , 1)) 
    label[original_labels.index(list_of_values[1])] = 1 
    labels.append(label)

Extract the character's feature vector and add it to the main list:

    extract_char = np.array([float(x) for x in list_of_values[s_index:e_index]])
    information.append(extract_char)

Exit the loop once the required dataset has been loaded:

    if len(information) >= num_of_datapoints: 
      break

Convert information and labels to NumPy arrays:

information = np.array(information) 
labels = np.array(labels).reshape(num_of_datapoints, num_of_charect)
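The reshape is needed because each label was created as a column vector of shape (num_of_charect, 1), so stacking the labels yields a three-dimensional array. A minimal sketch with three made-up labels (the letters chosen here are illustrative and are not read from the data file):

```python
import numpy as np

original_labels = 'omandig'
num_of_charect = len(original_labels)

labels = []
for letter in 'oma':  # illustrative letters only
    label = np.zeros((num_of_charect, 1))
    label[original_labels.index(letter)] = 1
    labels.append(label)

stacked = np.array(labels)
print(stacked.shape)  # (3, 7, 1)

# Drop the trailing axis so that each row is one one-hot label
flat = stacked.reshape(len(labels), num_of_charect)
print(flat.shape)        # (3, 7)
print(flat[1].tolist())  # [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

This sketch reshapes with len(labels) rather than a fixed count, which avoids a shape error if the file yields fewer matching lines than requested.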

Extract the number of dimensions:

num_dimension = len(information[0])

Create a feedforward neural network with two hidden layers of 128 and 16 neurons, and train it using gradient descent:

neural_net = nl.net.newff([[0, 1] for _ in range(num_dimension)], [128, 16, num_of_charect])
neural_net.trainf = nl.train.train_gd 
error = neural_net.train(information[:train_param,:], labels[:train_param,:], epochs=10000, show=100, goal=0.01)

Predict the output for the test input:

p_output = neural_net.sim(information[train_param:, :])
print("\nTesting on unknown data:")
for i in range(test_param):
  # The test samples start at row train_param, so offset the label index
  print("\nOriginal: " + original_labels[np.argmax(labels[train_param + i])])
  print("Predicted: " + original_labels[np.argmax(p_output[i])])
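Decoding a prediction amounts to taking the index of the largest output value and looking it up in original_labels. The sketch below uses made-up output values, not results from the trained network, purely to show the lookup:

```python
import numpy as np

original_labels = 'omandig'

# Made-up outputs for two test samples; one score per character
p_output = np.array([
    [0.1, 0.7, 0.0, 0.1, 0.0, 0.1, 0.0],
    [0.0, 0.1, 0.1, 0.0, 0.2, 0.0, 0.6],
])

# np.argmax returns the position of the highest score in each row
predicted = [original_labels[np.argmax(row)] for row in p_output]
print(predicted)  # ['m', 'g']
```

The same lookup applied to a one-hot label row recovers the original character, which is why the true labels and the network outputs can be compared directly.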

The result obtained when optical_character_recognition.py is executed is shown in the following screenshot:
