Deep neural networks

The neural networks we used in Chapter 8, Beating CAPTCHAs with Neural Networks, have some fantastic theoretical properties. For example, only a single hidden layer is needed to learn any mapping (although the size of that hidden layer may need to be very large). Neural networks were a very active area of research in the 1970s and 1980s, but then fell out of favor, particularly in comparison to other classification algorithms such as support vector machines. One of the main issues was that the computational power needed to run many neural networks exceeded that of other algorithms, and exceeded what many people had access to.

Another issue was training the networks. While the backpropagation algorithm has been known for some time, it has issues with larger networks, requiring a very large amount of training before the weights settle.

Each of these issues has been addressed in recent times, leading to a resurgence in popularity of neural networks. Computational power is now much more easily available than 30 years ago, and advances in algorithms for training mean that we can now readily use that power.

Intuition

The aspect that differentiates deep neural networks from the more basic neural network we saw in Chapter 8, Beating CAPTCHAs with Neural Networks, is size. A neural network is considered deep when it has two or more hidden layers. In practice, a deep neural network is often much larger, both in the number of nodes in each layer and also the number of layers. While some of the research of the mid-2000s focused on very large numbers of layers, smarter algorithms are reducing the actual number of layers needed.

A neural network takes very basic features as inputs; in the case of computer vision, these are simple pixel values. Then, as that data is combined and pushed through the network, these basic features combine into more complex features. Sometimes, these features have little meaning to humans, but they represent the aspects of the sample that the computer looks for in order to make its classification.

Implementation

Implementing these deep neural networks can be quite challenging due to their size. A bad implementation will take significantly longer to run than a good one, and may not even run at all due to memory usage.

A basic implementation of a neural network might start by creating a node class and collecting a set of these into a layer class. Each node is then connected to a node in the next layer using an instance of an Edge class. This type of implementation, a class-based one, is good to show how networks work, but is too inefficient for larger networks.

Neural networks are, at their core, simply mathematical expressions on matrices. The weights of the connections between one layer and the next can be represented as a matrix of values, where the rows represent nodes in the first layer and the columns represent nodes in the second layer (the transpose of this matrix is sometimes used too). Each value is the weight of the edge between a node in one layer and a node in the next. A network can then be defined as a set of these weight matrices. In addition to the nodes, we add a bias term to each layer, which is basically a node that is always on and is connected to each neuron in the next layer.
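
As an illustration, here is a minimal NumPy sketch (not part of this chapter's code; the layer sizes and values are arbitrary) of how the forward pass through a single layer reduces to a matrix multiplication, followed by adding the bias and applying a nonlinearity:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

inputs = np.array([0.2, 0.5, 0.1])   # activations of the three nodes in the first layer
weights = np.random.randn(3, 4)      # weight matrix: 3 input nodes to 4 output nodes
bias = np.random.randn(4)            # one bias value per node in the second layer

# Activations of the second layer: matrix multiplication, add bias, apply nonlinearity
outputs = sigmoid(np.dot(inputs, weights) + bias)
print(outputs.shape)                 # (4,)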

This insight allows us to use mathematical operations to build, train, and use neural networks, as opposed to creating a class-based implementation. This is a big advantage, as many libraries of highly optimized numerical code have been written that we can use to perform these computations as efficiently as possible.

The PyBrain library that we used in Chapter 8, Beating CAPTCHAs with Neural Networks, does contain a simple convolutional layer for a neural network. However, it doesn't offer some of the features that we need for this application. For larger and more customized networks, we need a library that gives us a bit more power. For this reason, we will be using the Lasagne and nolearn libraries. These libraries are built on the Theano library, which is a useful tool for working with mathematical expressions.

In this chapter, we will start by implementing a basic neural network with Lasagne to introduce the concepts. We will then use nolearn to replicate our experiment from Chapter 8, Beating CAPTCHAs with Neural Networks, on predicting which letter is in an image. Finally, we will use a much more complex convolutional neural network to perform image classification on the CIFAR dataset, which will also include running it on GPUs rather than CPUs to improve performance.

An introduction to Theano

Theano is a library that allows you to build mathematical expressions and run them. While this may not immediately seem very different from what we normally do when we write a program, in Theano, we define the function we want to perform, not the way in which it is computed. This allows Theano to optimize the evaluation of the expression and also to perform lazy computation: expressions are only actually computed when they are needed, not when they are defined.

Many programmers don't use this type of programming day-to-day, but most of them interact with a related system that does. Relational databases, specifically SQL-based ones, use this concept, known as the declarative paradigm. While a programmer might define a SELECT query on a database with a WHERE clause, the database interprets that and creates an optimized query plan based on a number of factors, such as whether the WHERE clause is on a primary key and the format in which the data is stored. The programmer defines what they want and the system determines how to do it.

Note

You can install Theano using pip: pip3 install Theano.

Using Theano, we can define many types of functions working on scalars, arrays, and matrices, as well as other mathematical expressions. For instance, we can create a function that computes the length of the hypotenuse of a right-angled triangle:

import theano
from theano import tensor as T

First, we define the two inputs, a and b. These are simple numerical values, so we define them as scalars:

a = T.dscalar()
b = T.dscalar()

Then, we define the output, c. This is an expression based on the values of a and b:

c = T.sqrt(a ** 2 + b ** 2)

Note that c isn't a function or a value here; it is simply an expression, given a and b. Note also that a and b don't have actual values yet: this is an algebraic expression, not a concrete computation. In order to compute on this, we define a function:

f = theano.function([a,b], c)

This basically tells Theano to create a function that takes values for a and b as inputs, and returns c as an output, computed on the values given. For example, f(3, 4) returns 5.

While this simple example may not seem much more powerful than what we can already do with Python, we can now reuse the expression c in other parts of our code and build larger expressions on top of it. In addition, while we defined c before the function was defined, no actual computation was done until we called the function.
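
As a small sketch of this reuse (the scaling factor here is purely illustrative), we can build a larger expression on top of c and compile it into a second function; nothing is computed until either compiled function is called:

scaled = c * 2.0
g = theano.function([a, b], scaled)
print(f(3, 4))    # 5.0
print(g(3, 4))    # 10.0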

An introduction to Lasagne

Theano isn't a library to build neural networks. In a similar way, NumPy isn't a library to perform machine learning; it just does the heavy lifting and is generally used from another library. Lasagne is such a library, designed specifically around building neural networks, using Theano to perform the computation.

Lasagne implements a number of modern types of neural network layers, along with the building blocks needed to construct them.

These include the following:

  • Network-in-network layers: These are small neural networks that are easier to interpret than traditional neural network layers.
  • Dropout layers: These randomly drop units during training, preventing overfitting, which is a major problem in neural networks.
  • Noise layers: These introduce noise into the neurons, again addressing the overfitting problem (a short construction sketch for the dropout and noise layers follows this list).
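
As a minimal construction sketch (not taken from this chapter's experiments; the layer sizes and parameter values are assumptions used only for illustration), a dropout layer and a noise layer can be inserted between other layers like this:

import lasagne

input_layer = lasagne.layers.InputLayer(shape=(None, 100))
dropped = lasagne.layers.DropoutLayer(input_layer, p=0.5)        # randomly drops units during training
noisy = lasagne.layers.GaussianNoiseLayer(dropped, sigma=0.1)    # adds Gaussian noise to the activations
output = lasagne.layers.DenseLayer(noisy, num_units=10,
                                   nonlinearity=lasagne.nonlinearities.softmax)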

In this chapter, we will use convolution layers, which are organized to mimic the way in which human vision works. They use small collections of connected neurons that each analyze only a segment of the input values (in this case, a segment of an image). This allows the network to deal with standard alterations of the input; in vision-based experiments, an example of an alteration handled well by convolution layers is translating the image.

In contrast, a traditional neural network is often heavily connected—all neurons from one layer connect to all neurons in the next layer.

Convolutional layers are implemented in the lasagne.layers.Conv1DLayer and lasagne.layers.Conv2DLayer classes.
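
As an illustrative sketch only (the filter count, filter size, and input shape here are assumptions, not the values used later in this chapter), a two-dimensional convolution layer is constructed by giving it an incoming layer, a number of filters, and a filter size:

import lasagne

# A batch of single-channel 32x32 images: (batch size, channels, rows, columns)
image_input = lasagne.layers.InputLayer(shape=(None, 1, 32, 32))
conv = lasagne.layers.Conv2DLayer(image_input, num_filters=16, filter_size=(3, 3),
                                  nonlinearity=lasagne.nonlinearities.rectify)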

Note

At the time of writing, Lasagne hasn't had a formal release and is not available on pip. You can install it from GitHub. In a new folder, download the source code repository using the following:

git clone https://github.com/Lasagne/Lasagne.git

From within the created Lasagne folder, you can then install the library using the following:

sudo python3 setup.py install

See http://lasagne.readthedocs.org/en/latest/user/installation.html for installation instructions.

Neural networks that use convolutional layers (generally just called Convolutional Neural Networks) also typically use pooling layers, which take the maximum output over a certain region of their input. This reduces noise caused by small variations in the image, and reduces (or down-samples) the amount of information. This has the added benefit of reducing the amount of work needed in later layers.

Lasagne also implements these pooling layers, for example in the lasagne.layers.MaxPool2DLayer class. Together with the convolution layers, we have all the tools needed to build a convolutional neural network.
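
Continuing the illustrative convolution sketch shown earlier (again, the sizes are assumptions used only for illustration), a max pooling layer is stacked on top of the convolution layer and down-samples its output by taking the maximum over each 2x2 region:

pooled = lasagne.layers.MaxPool2DLayer(conv, pool_size=(2, 2))
# For 32x32 inputs and 3x3 filters, the convolution output is 30x30 per filter,
# so the pooled output works out to 15x15 per filter.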

Building a neural network in Lasagne is easier than building it using just Theano. To show the principles, we will implement a basic network to learn the Iris dataset, which we saw in Chapter 1, Getting Started with Data Mining. The Iris dataset is great for testing new algorithms, even complex ones such as deep neural networks.

First, open a new IPython Notebook. We will come back to the Notebook in which we loaded the CIFAR dataset later in the chapter.

First, we load the dataset:

import numpy as np
from sklearn.datasets import load_iris
iris = load_iris()
X = iris.data.astype(np.float32)
y_true = iris.target.astype(np.int32)

Due to the way Lasagne works, we need to be a bit more explicit about the data types. This is why we converted the features to float32 and the classes to int32 (the classes are stored as int64 in the original dataset).

We then split into training and testing datasets:

from sklearn.cross_validation import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y_true, random_state=14)

Next, we build our network by creating the different layers. Our dataset contains four input variables and three output classes. This gives us the size of the first and last layers, but not the layers in between. Playing around with this figure will give different results, and it is worth trying different values to see what happens.

We start by creating an input layer, which has the same number of nodes as the dataset has features. We specify a batch size of 10, which allows Lasagne to do some optimizations during training:

import lasagne
input_layer = lasagne.layers.InputLayer(shape=(10, X.shape[1]))

Next, we create our hidden layer. This layer takes its input from the input layer (specified as the first argument), has 12 nodes, and uses the sigmoid nonlinearity, which we saw in Chapter 8, Beating CAPTCHAs with Neural Networks:

hidden_layer = lasagne.layers.DenseLayer(input_layer, num_units=12, nonlinearity=lasagne.nonlinearities.sigmoid)

Next, we have our output layer, which takes its input from the hidden layer, has three nodes (the same as the number of classes), and uses the softmax nonlinearity. Softmax is most typically used in the final layer of a neural network:

output_layer = lasagne.layers.DenseLayer(hidden_layer, num_units=3,
                                    nonlinearity=lasagne.nonlinearities.softmax)

In Lasagne's usage, this output layer is our network. When we feed a sample into the network, Lasagne looks at the output layer and recursively obtains the layer that feeds into it (the first argument to its constructor). This continues until we reach an input layer, which has no incoming layer and simply applies the sample to itself. The activations of the neurons in the input layer are then fed into the layer that requested them (in our case, the hidden_layer), and propagated forward all the way to the output layer.
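
As a small sketch of this recursion (assuming, for illustration, that each Lasagne layer keeps a reference to the layer feeding into it in its input_layer attribute), we can walk from the output layer back to the input layer:

layer = output_layer
while hasattr(layer, 'input_layer'):
    print(type(layer).__name__)      # DenseLayer, then DenseLayer
    layer = layer.input_layer
print(type(layer).__name__)          # InputLayer, which has no incoming layer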

In order to train our network, we now need to define some training functions, which are Theano-based functions. In order to do this, we need to define a Theano expression and a function for the training. We start by creating variables for the input samples, the output given by the network, and the actual output:

import theano.tensor as T
net_input = T.matrix('net_input')
net_output = output_layer.get_output(net_input)
true_output = T.ivector('true_output')

We can now define our loss function, which tells the training function how to improve the network—it attempts to train the network to minimize the loss according to this function. The loss we will use is the categorical cross entropy, a metric on categorical data such as ours. This is a function of the output given by the network and the actual output we expected:

loss = T.mean(T.nnet.categorical_crossentropy(net_output, true_output))
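
To make the loss more concrete, here is a small NumPy sketch (separate from the training code; the probabilities are made up for illustration) of what the categorical cross entropy measures: the mean negative log-probability that the network assigned to each sample's true class:

import numpy as np

predicted_probs = np.array([[0.7, 0.2, 0.1],    # three samples, three classes
                            [0.1, 0.8, 0.1],
                            [0.3, 0.3, 0.4]])
true_classes = np.array([0, 1, 2])
loss_value = -np.mean(np.log(predicted_probs[np.arange(3), true_classes]))
print(loss_value)    # approximately 0.499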

Next, we define the function that will change the weights in our network. In order to do this, we obtain all of the parameters from the network and create a function (using a helper function provided by Lasagne) that adjusts the weights to minimize our loss:

all_params = lasagne.layers.get_all_params(output_layer)
updates = lasagne.updates.sgd(loss, all_params, learning_rate=0.1)
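
Conceptually (this is a hedged sketch of the idea, not the library's implementation), the sgd helper builds one update rule per parameter of the form new value = old value minus the learning rate times the gradient. Something equivalent could be written by hand with Theano's gradient function:

gradients = T.grad(loss, all_params)
manual_updates = [(param, param - 0.1 * gradient)
                  for param, gradient in zip(all_params, gradients)]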

Finally, we create Theano-based functions that perform this training and also obtain the output of the network for testing purposes:

import theano
train = theano.function([net_input, true_output], loss, updates=updates)
get_output = theano.function([net_input], net_output)

We can then call our train function on our training data to perform one iteration of training the network. This involves taking each sample, computing its predicted class, comparing those predictions to the expected classes, and updating the weights to minimize the loss function. We then perform this 1,000 times, incrementally training our network over those iterations:

for n in range(1000):
    train(X_train, y_train)

Next, we can evaluate by computing the F-score on the outputs. First, we obtain those outputs:

y_output = get_output(X_test)

Note

Note that get_output is a Theano function we obtained from our neural network, which is why we didn't need to add our network as a parameter to this line of code.

This result, y_output, is the activation of each of the neurons in the final output layer. The actual prediction itself is created by finding which neuron has the highest activation:

import numpy as np
y_pred = np.argmax(y_output, axis=1)

Now, y_pred is an array of class predictions, like we are used to in classification tasks. We can now compute the F-score using these predictions:

from sklearn.metrics import f1_score
print(f1_score(y_test, y_pred))

The result is impressively perfect: 1.0! This means all the classifications were correct on the test data, a great result (although Iris is a simple dataset).

As we can see, while it is possible to develop and train a network using just Lasagne, it can be a little awkward. To address this, we will be using nolearn, a package that wraps this process in code that is conveniently compatible with the scikit-learn API.

Implementing neural networks with nolearn

The nolearn package provides wrappers for Lasagne. We lose some of the fine-grained control that comes with building a neural network by hand in Lasagne, but the code is much more readable and much easier to manage.

The nolearn package implements the normal sorts of complex neural networks you are likely to want to build. If you want more control than nolearn gives you, you can revert to using Lasagne, but at the cost of having to manage a bit more of the training and building process.

To get started with nolearn, we are going to reimplement the example we used in Chapter 8, Beating CAPTCHAs with Neural Networks, to predict which letter was represented in an image. We will recreate the dense neural network we used in Chapter 8, Beating CAPTCHAs with Neural Networks. To start with, we need to enter our dataset building code again in our notebook. For a description of what this code does, refer to Chapter 8, Beating CAPTCHAs with Neural Networks:

import numpy as np
from PIL import Image, ImageDraw, ImageFont
from skimage.transform import resize
from skimage import transform as tf
from skimage.measure import label, regionprops
from sklearn.utils import check_random_state
from sklearn.preprocessing import OneHotEncoder
from sklearn.cross_validation import train_test_split

def create_captcha(text, shear=0, size=(100, 24)):
    im = Image.new("L", size, "black")
    draw = ImageDraw.Draw(im)
    font = ImageFont.truetype(r"Coval.otf", 22)
    draw.text((2, 2), text, fill=1, font=font)
    image = np.array(im)
    affine_tf = tf.AffineTransform(shear=shear)
    image = tf.warp(image, affine_tf)
    return image / image.max()

def segment_image(image):
    labeled_image = label(image > 0)
    subimages = []
    for region in regionprops(labeled_image):
        start_x, start_y, end_x, end_y = region.bbox
        subimages.append(image[start_x:end_x,start_y:end_y])
    if len(subimages) == 0:
        return [image,]
    return subimages

random_state = check_random_state(14)
letters = list("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
shear_values = np.arange(0, 0.5, 0.05)

def generate_sample(random_state=None):
    random_state = check_random_state(random_state)
    letter = random_state.choice(letters)
    shear = random_state.choice(shear_values)
    return create_captcha(letter, shear=shear, size=(20, 20)), letters.index(letter)
dataset, targets = zip(*(generate_sample(random_state) for i in range(3000)))
dataset = np.array(dataset, dtype='float')
targets =  np.array(targets)

onehot = OneHotEncoder()
y = onehot.fit_transform(targets.reshape(targets.shape[0],1))
y = y.todense().astype(np.float32)

dataset = np.array([resize(segment_image(sample)[0], (20, 20)) for sample in dataset])
X = dataset.reshape((dataset.shape[0], dataset.shape[1] * dataset.shape[2]))
X = X / X.max()
X = X.astype(np.float32)

X_train, X_test, y_train, y_test = \
    train_test_split(X, y, train_size=0.9, random_state=14)

A neural network is a collection of layers. Implementing one in nolearn is a case of organizing what those layers will look like, much as it was with PyBrain. The neural network we used in Chapter 8, Beating CAPTCHAs with Neural Networks, used fully connected dense layers. These are available in nolearn (through Lasagne), meaning we can replicate our basic network structure here. First, we create the layers, consisting of an input layer, our dense hidden layer, and our dense output layer:

from lasagne import layers
# Give the list its own name so that it doesn't shadow the imported layers module
nn_layers = [
        ('input', layers.InputLayer),
        ('hidden', layers.DenseLayer),
        ('output', layers.DenseLayer),
        ]

We then import some requirements, which we will explain as we use them:

from lasagne import updates
from nolearn.lasagne import NeuralNet
from lasagne.nonlinearities import sigmoid, softmax

Next we define the neural network, which is represented as a scikit-learn-compatible estimator:

net1 = NeuralNet(layers=nn_layers,

Note that we haven't closed off the parenthesis—this is deliberate. At this point, we enter the parameters for the neural network, starting with the size of each layer:

    input_shape=X.shape,
    hidden_num_units=100,
    output_num_units=26,

The parameters here are matched to the layers by name. In other words, the input_shape parameter applies to the layer we registered under the name input, hidden_num_units to the layer named hidden, and so on, working much the same way as setting parameters in pipelines.

Next, we define the nonlinearities. Again, we will use sigmoid for the hidden layer and softmax for the output layer:

    hidden_nonlinearity=sigmoid,
    output_nonlinearity=softmax,

Next, we add bias nodes to the hidden layer. Bias nodes are nodes that are always turned on, and they are important for training a network, as they allow the activations of neurons to be shifted to suit the problem more closely. As an oversimplified example, if our prediction is always off by 4, we can add a bias of -4 to remove this error. Our bias nodes allow for this, and the training of the weights dictates the amount of bias that is used.

The biases are given as a set of weights, meaning that the array needs to be the same size as the layer the bias is attached to:

    hidden_b=np.zeros((100,), dtype=np.float32),

Next, we define how the network will train. The nolearn package doesn't have the exact same training mechanism as we used in Chapter 8, Beating CAPTCHAs with Neural Networks, as it doesn't have a way to decay weights. However, it does have momentum, which we will use, along with a high learning rate and low momentum value:

    update=updates.momentum,
    update_learning_rate=0.9,
    update_momentum=0.1,

Next, we define the problem as a regression problem. This may seem odd, as we are performing a classification task. However, the outputs are real-valued, and optimizing them as a regression problem appears to do much better in training than trying to optimize on classification:

    regression=True,

Finally, we set the maximum number of epochs for training at 1,000, which is a good fit between good training and not taking a long time to train (for this dataset; other datasets may require more or less training):

    max_epochs=1000,

We can now close off the parenthesis for the neural network constructor:

)

Next, we train the network on our training dataset:

net1.fit(X_train, y_train)

Now we can evaluate the trained network. To do this, we get the output of our network and, as with the Iris example, we need to perform an argmax to get the actual classification by choosing the highest activation:

from sklearn.metrics import f1_score

y_pred = net1.predict(X_test)
y_pred = y_pred.argmax(axis=1)
assert len(y_pred) == len(X_test)
if len(y_test.shape) > 1:
    y_test = y_test.argmax(axis=1)
print(f1_score(y_test, y_pred))

The results are equally impressive—another perfect score on my machine. However, your results may vary as the nolearn package has some randomness that can't be directly controlled at this stage.
