Image classification

In this section, we will discuss how to implement some of the neural network structures with the deeplearning4j library. Let's start.

Deeplearning4j

As we discussed in Chapter 2, Java Libraries and Platforms for Machine Learning, deeplearning4j is an open source, distributed deep learning project in Java and Scala. Deeplearning4j relies on Spark and Hadoop for MapReduce, trains models in parallel, and iteratively averages the parameters they produce in a central model. A detailed library summary is presented in Chapter 2, Java Libraries and Platforms for Machine Learning.

Getting DL4J

The most convenient way to get deeplearning4j is through the Maven repository:

  1. Start a new Eclipse project and pick Maven Project, as shown in the following screenshot:
    (Screenshot: selecting Maven Project in the Eclipse new project wizard)
  2. Open the pom.xml file and add the following dependencies under the <dependencies> section. Note that the ${dl4j.version} property must also be defined in the <properties> section of pom.xml and set to the DL4J release you want to use:
    <dependency>
        <groupId>org.deeplearning4j</groupId>
        <artifactId>deeplearning4j-nlp</artifactId>
        <version>${dl4j.version}</version>
    </dependency>
    
    <dependency>
        <groupId>org.deeplearning4j</groupId>
        <artifactId>deeplearning4j-core</artifactId>
        <version>${dl4j.version}</version>
    </dependency>
  3. Finally, right-click on the project, select Maven, and pick Update Project.

MNIST dataset

One of the most famous datasets is the MNIST dataset, which consists of handwritten digits, as shown in the following image. The dataset comprises 60,000 training and 10,000 testing images:

(Image: sample handwritten digits from the MNIST dataset)

The dataset is commonly used in image recognition problems to benchmark algorithms. The worst recorded error rate is 12%, with no preprocessing, using a one-layer neural network. As of 2016, the lowest error rate is only 0.21%, using the DropConnect neural network, followed by a deep convolutional network at 0.23% and a deep feedforward network at 0.35%.

Now, let's see how to load the dataset.

Loading the data

Deeplearning4j provides the MNIST dataset loader out of the box. The loader is initialized as a DataSetIterator. Let's first import the DataSetIterator class and all the supported datasets that are part of the impl package, for example, Iris, MNIST, and others:

import org.deeplearning4j.datasets.iterator.DataSetIterator;
import org.deeplearning4j.datasets.iterator.impl.*;

Next, we'll define some constants: the images consist of 28 x 28 pixels, there are 10 target classes, and there are 60,000 training samples. We then initialize a new MnistDataSetIterator that will download the dataset and its labels. The parameters are the iteration batch size, the total number of examples, and whether the dataset should be binarized:

int numRows = 28;
int numColumns = 28;
int outputNum = 10;
int numSamples = 60000;
int batchSize = 100;
DataSetIterator iter = new MnistDataSetIterator(batchSize, numSamples, true);
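
As a quick sanity check, we can, for example, pull the first batch from the iterator and print the dimensions of the feature matrix and the labels. This step is optional and not part of the training flow; the shapes in the comments assume the batch size of 100 defined above, and DataSet comes from the org.nd4j.linalg.dataset package:

DataSet firstBatch = iter.next();
System.out.println(java.util.Arrays.toString(
    firstBatch.getFeatureMatrix().shape())); // [100, 784]
System.out.println(java.util.Arrays.toString(
    firstBatch.getLabels().shape()));        // [100, 10]
iter.reset(); // rewind the iterator so training starts from the first batch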

Having an already-implemented data importer is really convenient, but it won't work with your own data. Let's take a quick look at how it is implemented and what needs to be modified to support your dataset. If you're eager to start implementing neural networks, you can safely skip the rest of this section and return to it when you need to import your own data.

Note

To load custom data, you'll need to implement two classes: a DataSetIterator that holds all the information about the dataset, and a BaseDataFetcher that actually pulls the data from a file, a database, or the web. Sample implementations are available on GitHub at https://github.com/deeplearning4j/deeplearning4j/tree/master/deeplearning4j-core/src/main/java/org/deeplearning4j/datasets/iterator/impl.

Another option is to use the Canova library, which is developed by the same authors, at http://deeplearning4j.org/canovadoc/.
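
As a rough sketch of how these two classes fit together, consider the following outline. It assumes the 0.4.x API used in this chapter, so the exact base-class members may differ in other DL4J versions, and MyDataFetcher and MyDataSetIterator are hypothetical names:

import org.deeplearning4j.datasets.fetchers.BaseDataFetcher;
import org.deeplearning4j.datasets.iterator.BaseDatasetIterator;

// MyDataFetcher.java: pulls the raw data and converts it to DataSet objects
public class MyDataFetcher extends BaseDataFetcher {
    public MyDataFetcher() {
        totalExamples = 1000; // how many examples the source provides
        inputColumns = 784;   // number of input features per example
        numOutcomes = 10;     // number of target classes
    }

    @Override
    public void fetch(int numExamples) {
        // Read numExamples records from a file, database, or the web,
        // convert them to DataSet objects, and expose them through the
        // curr field (for example, via initializeCurrFromList(...)).
    }
}

// MyDataSetIterator.java: wires the fetcher to the generic base iterator
public class MyDataSetIterator extends BaseDatasetIterator {
    public MyDataSetIterator(int batch, int numExamples) {
        super(batch, numExamples, new MyDataFetcher());
    }
}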

Building models

In this section, we'll discuss how to build an actual neural network model. We'll start with a basic single-layer neural network to establish a benchmark and discuss the basic operations. Later, we'll improve this initial result with a DBN and a Multilayer Convolutional Network.

Building a single-layer regression model

Let's start by building a single-layer regression model based on the softmax activation function. As we have a single layer, the input to the neural network will be all the image pixels, that is, 28 x 28 = 784 neurons. The number of output neurons is 10, one for each digit. The two layers are fully connected, as shown in the following diagram:

(Diagram: a single-layer network with 784 input neurons fully connected to 10 softmax output neurons)

A neural network is defined through a NeuralNetConfiguration.Builder object, as follows:

MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()

We will define the parameters for gradient search in order to perform iterations with the conjugate gradient optimization algorithm. The momentum parameter determines how fast the optimization algorithm converges to a local optimum: the higher the momentum, the faster the training, but a higher speed can lower the model's accuracy, as follows:

.seed(seed)
.gradientNormalization(GradientNormalization.ClipElementWiseAbsoluteValue)
   .gradientNormalizationThreshold(1.0)
   .iterations(iterations)
   .momentum(0.5)
   .momentumAfter(Collections.singletonMap(3, 0.9))
   .optimizationAlgo(OptimizationAlgorithm.CONJUGATE_GRADIENT)

Next, we will specify that the network will have one layer and define the error function (NEGATIVELOGLIKELIHOOD), the internal perceptron activation function (softmax), and the number of input and output neurons, which correspond to the total number of image pixels and the number of target variables:

.list(1)
.layer(0, new OutputLayer.Builder(LossFunction.NEGATIVELOGLIKELIHOOD)
    .activation("softmax")
    .nIn(numRows*numColumns).nOut(outputNum)
    .build())

Finally, we will set the network to pretrain, disable backpropagation, and actually build the untrained network structure:

   .pretrain(true).backprop(false)
   .build();

Once the network structure is defined, we can use it to initialize a new MultiLayerNetwork, as follows:

MultiLayerNetwork model = new MultiLayerNetwork(conf);
model.init();

Next, we will attach a training listener by calling the setListeners method. The ScoreIterationListener reports the training score every listenerFreq iterations, as follows:

model.setListeners(Collections.singletonList((IterationListener) new ScoreIterationListener(listenerFreq)));

We then call the fit() method with the training iterator to trigger end-to-end network training:

model.fit(iter); 
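
A single fit() call makes one pass over the training iterator. If you want to train for several epochs, one option is to reset the iterator and call fit() again in a loop; numEpochs below is just an illustrative variable, not something defined earlier:

int numEpochs = 5; // illustrative value
for (int epoch = 0; epoch < numEpochs; epoch++) {
    iter.reset();    // rewind the MNIST iterator to the first batch
    model.fit(iter); // one complete pass over the training data
}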

To evaluate the model, we will initialize a new Evaluation object that will store batch results:

Evaluation eval = new Evaluation(outputNum);

We can then iterate over the test dataset in batches, in order to keep memory consumption at a reasonable level, and store the results in the eval object:

DataSetIterator testIter = new MnistDataSetIterator(100,10000);
while(testIter.hasNext()) {
    DataSet testMnist = testIter.next();
    INDArray predict2 = model.output(testMnist.getFeatureMatrix());
    eval.eval(testMnist.getLabels(), predict2);
}

Finally, we can get the results by calling the stats() function:

log.info(eval.stats());

A basic one-layer model achieves the following performance:

Accuracy:  0.8945 
Precision: 0.8985
Recall:    0.8922
F1 Score:  0.8953

Getting 89.45% accuracy, that is, a 10.55% error rate, on the MNIST dataset is quite poor. We'll improve on this by moving from a simple one-layer network to a moderately sophisticated deep belief network using Restricted Boltzmann Machines, and then to a Multilayer Convolutional Network.

Building a deep belief network

In this section, we'll build a deep belief network based on Restricted Boltzmann Machines, as shown in the following diagram. The network consists of four layers: the first layer reduces the 784 inputs to 500 neurons, then to 250, followed by 200, and finally to the last 10 target values:

(Diagram: a deep belief network with layers of 500, 250, 200, and 10 neurons)

As the code is the same as in the previous example, let's take a look at how to configure such a network:

MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()

We define the gradient optimization algorithm, as shown in the following code:

    .seed(seed)
    .gradientNormalization(
    GradientNormalization.ClipElementWiseAbsoluteValue)
    .gradientNormalizationThreshold(1.0)
    .iterations(iterations)
    .momentum(0.5)
    .momentumAfter(Collections.singletonMap(3, 0.9))
    .optimizationAlgo(OptimizationAlgorithm.CONJUGATE_GRADIENT)

We will also specify that our network will have four layers:

   .list(4)

The input to the first layer will be 784 neurons and the output will be 500 neurons. We'll use the root mean squared error cross entropy as the loss function, and the Xavier algorithm to initialize the weights, which automatically determines the scale of initialization based on the number of input and output neurons, as follows:

.layer(0, new RBM.Builder()
.nIn(numRows*numColumns)
.nOut(500)         
.weightInit(WeightInit.XAVIER)
.lossFunction(LossFunction.RMSE_XENT)
.visibleUnit(RBM.VisibleUnit.BINARY)
.hiddenUnit(RBM.HiddenUnit.BINARY)
.build())

The next two layers will have the same parameters, except the number of input and output neurons:

.layer(1, new RBM.Builder()
.nIn(500)
.nOut(250)
.weightInit(WeightInit.XAVIER)
.lossFunction(LossFunction.RMSE_XENT)
.visibleUnit(RBM.VisibleUnit.BINARY)
.hiddenUnit(RBM.HiddenUnit.BINARY)
.build())
.layer(2, new RBM.Builder()
.nIn(250)
.nOut(200)
.weightInit(WeightInit.XAVIER)
.lossFunction(LossFunction.RMSE_XENT)
.visibleUnit(RBM.VisibleUnit.BINARY)
.hiddenUnit(RBM.HiddenUnit.BINARY)
.build())

The last layer maps the 200 neurons to the 10 output values, where we'll use the softmax activation function, as follows:

.layer(3, new OutputLayer.Builder()
.nIn(200)
.nOut(outputNum)
.lossFunction(LossFunction.NEGATIVELOGLIKELIHOOD)
.activation("softmax")
.build())
.pretrain(true).backprop(false)
.build();

The rest of the training and evaluation is the same as in the single-layer network example. Note that training a deep network might take significantly more time than a single-layer network. The accuracy should be around 93%.
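
For completeness, a minimal sketch of that reuse, following the same calls as in the single-layer example:

MultiLayerNetwork model = new MultiLayerNetwork(conf);
model.init();
model.setListeners(Collections.singletonList(
    (IterationListener) new ScoreIterationListener(listenerFreq)));
model.fit(iter);
// evaluate with the same Evaluation loop over MnistDataSetIterator as before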

Now let's take a look at another deep network.

Building a Multilayer Convolutional Network

In the final example, we'll discuss how to build a convolutional network, as shown in the following diagram. The network will consist of seven layers: first, we'll repeat two pairs of convolutional and subsampling layers with max pooling. The last subsampling layer is then connected to a densely connected feedforward neural network, comprising 120 neurons, 84 neurons, and 10 neurons in its three layers, respectively. Such a network effectively forms the complete image recognition pipeline, where the first four layers correspond to feature extraction and the last three layers correspond to the learning model:

(Diagram: a seven-layer convolutional network with two convolution-subsampling pairs followed by three dense layers)

Network configuration is initialized as we did earlier:

MultiLayerConfiguration.Builder conf = new NeuralNetConfiguration.Builder()

We will specify the gradient descent algorithm and its parameters, as follows:

.seed(seed)
.iterations(iterations)
.activation("sigmoid")
.weightInit(WeightInit.DISTRIBUTION)
.dist(new NormalDistribution(0.0, 0.01))
.learningRate(1e-3)
.learningRateScoreBasedDecayRate(1e-1)
.optimizationAlgo(
OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)

We will also specify the seven network layers, as follows:

.list(7)

The input to the first convolutional layer is the complete image, while the output is six feature maps. The convolutional layer will apply a 5 x 5 filter, moving it over the image with a stride of 1 x 1:

.layer(0, new ConvolutionLayer.Builder(
    new int[]{5, 5}, new int[]{1, 1})
    .name("cnn1")
    .nIn(numRows*numColumns)
    .nOut(6)
    .build())

The second layer is a subsampling layer that will take the maximum value of each 2 x 2 region, moving over the input with a stride of 2 x 2:

.layer(1, new SubsamplingLayer.Builder(
    SubsamplingLayer.PoolingType.MAX,
    new int[]{2, 2}, new int[]{2, 2})
    .name("maxpool1")
    .build())

The next two layers will repeat the previous two layers:

.layer(2, new ConvolutionLayer.Builder(new int[]{5, 5}, new int[]{1, 1})
    .name("cnn2")
    .nOut(16)
    .biasInit(1)
    .build())
.layer(3, new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.MAX, new int[]{2, 2}, new int[]{2, 2})
    .name("maxpool2")
    .build())

Now we will wire the output of the subsampling layer into a dense feedforward network, consisting of 120 neurons, and then through another layer, into 84 neurons, as follows:

.layer(4, new DenseLayer.Builder()
    .name("ffn1")
    .nOut(120)
    .build())
.layer(5, new DenseLayer.Builder()
    .name("ffn2")
    .nOut(84)
    .build())

The final layer connects 84 neurons with 10 output neurons:

.layer(6, new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
    .name("output")
    .nOut(outputNum)
    .activation("softmax") // radial basis function required
    .build())
.backprop(true)
.pretrain(false)
.cnnInputSize(numRows,numColumns,1);

To train this structure, we can reuse the code that we developed in the previous two examples. Again, the training might take some time. The network accuracy should be around 98%.
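
One small difference compared to the previous examples: conf here is still a MultiLayerConfiguration.Builder, because the chain above ends with cnnInputSize() rather than build(), so an additional build() call is needed before initializing the network. A minimal sketch, reusing the variables from the earlier examples:

MultiLayerNetwork model = new MultiLayerNetwork(conf.build());
model.init();
model.setListeners(Collections.singletonList(
    (IterationListener) new ScoreIterationListener(listenerFreq)));
model.fit(iter);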

Note

As model training relies heavily on linear algebra, it can be sped up by an order of magnitude by using a Graphics Processing Unit (GPU). As the GPU backend was undergoing a rewrite at the time of writing, please check the latest documentation at http://deeplearning4j.org/documentation.

As we saw in the different examples, increasingly complex neural networks allow us to extract relevant features automatically, thus completely avoiding traditional image processing. However, the price we pay for this is increased processing time and the need for many more learning examples to make the approach effective.
