Case study

Several benchmarks exist for image classification. For this case study we will use the MNIST image database. When we used MNIST in Chapter 3, Unsupervised Machine Learning Techniques, with clustering and outlier detection techniques, each pixel was treated as a feature. With deep learning techniques, in addition to learning from the raw pixel values as in the previous experiments, we will also be learning new features from the structure of the training dataset. The deep learning algorithms will be trained on 60,000 images and tested on a separate 10,000-image test set.

Tools and software

In this chapter, we introduce the open-source Java framework for deep learning called DeepLearning4J (DL4J). DL4J has libraries implementing a host of deep learning techniques and they can be used on distributed CPUs and GPUs.

DeepLearning4J: https://deeplearning4j.org/index.html

We will illustrate the use of some DL4J libraries in learning from the MNIST training images and apply the learned models to classify the images in the test set.

Business problem

Image classification is a particularly attractive test-bed to evaluate deep learning networks. We have previously encountered the MNIST database, which consists of greyscale images of handwritten digits. This time, we will show how both unsupervised and supervised deep learning techniques can be used to learn from the same dataset. The MNIST dataset has 28-by-28 pixel images in a single channel. These images are categorized into 10 labels representing the digits 0 to 9. The goal is to train on 60,000 data points and test our deep learning classification algorithm on the remaining 10,000 images.

Machine learning mapping

This includes supervised and unsupervised methods applied to a classification problem in which there are 10 possible output classes. Some techniques use an initial pre-training stage, which is unsupervised in nature, as we have seen in the preceding sections.

Data sampling and transformation

The dataset is available at:

https://yann.lecun.com/exdb/mnist

In the experiments in this case study, the MNIST dataset has been standardized such that pixel values in the range 0 to 255 have been normalized to values from 0.0 to 1.0. The exception is in the experiment using stacked RBMs, where the training and test data have been binarized, that is, set to 1 if the standardized value is greater than or equal to 0.3 and 0 otherwise. Each of the 10 classes is equally represented in both the training set and the test set. In addition, examples are shuffled using a random number generator seed supplied by the user.
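The following is a minimal sketch of the preprocessing described above, scaling raw greyscale values in the range 0 to 255 down to [0.0, 1.0] and optionally binarizing them with the 0.3 threshold used for the stacked RBM experiment. The class and method names are illustrative and not part of the chapter's code:

// Illustrative preprocessing sketch, not from the chapter's code
public final class MnistPreprocessing {

    /** Scale a raw pixel value in [0, 255] to [0.0, 1.0]. */
    public static double normalize(int rawPixel) {
        return rawPixel / 255.0;
    }

    /** Binarize a normalized pixel: 1 if greater than or equal to the threshold, else 0. */
    public static int binarize(double normalizedPixel, double threshold) {
        return normalizedPixel >= threshold ? 1 : 0;
    }

    public static void main(String[] args) {
        int raw = 180;                          // example raw greyscale value
        double normalized = normalize(raw);     // 0.7059
        int binary = binarize(normalized, 0.3); // 1, since 0.7059 >= 0.3
        System.out.printf("raw=%d normalized=%.4f binary=%d%n", raw, normalized, binary);
    }
}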

Feature analysis

The input data features are the greyscale values of the pixels in each image. This is the raw data and we will be using the deep learning algorithms to learn higher-level features out of the raw pixel values. The dataset has been prepared such that there are an equal number of examples of each class in both the training and the test sets.

Models, results, and evaluation

We will perform a series of experiments, starting with a simple MLP and moving on to Convolutional Networks, Variational Autoencoders, Stacked RBMs, and DBNs. For each type of network, we will walk through the important parts of the code that highlight the network structure or specialized tuning, list the parameters needed to reproduce the experiments, and present the results.

Basic data handling

The following snippet of code shows:

How to generically read data from a CSV with a structure enforced by delimiters.

How to iterate the data and get records.

How to shuffle data in memory and create training/testing or validation sets:

RecordReader recordReader = new CSVRecordReader(numLinesToSkip, delimiter);
recordReader.initialize(new FileSplit(new ClassPathResource(fileName).getFile()));
DataSetIterator iterator = new RecordReaderDataSetIterator(recordReader,batchSize,labelIndex,numClasses);
DataSet allData = iterator.next();
allData.shuffle();
SplitTestAndTrain testAndTrain = allData.splitTestAndTrain(trainPercent); 
DataSet trainingData = testAndTrain.getTrain();
DataSet testData = testAndTrain.getTest();

DL4J has a specific MNIST wrapper for handling the data that we have used, as shown in the following snippet:

DataSetIterator mnistTrain = new MnistDataSetIterator(batchSize, true, randomSeed);
DataSetIterator mnistTest = new MnistDataSetIterator(batchSize, false, randomSeed);
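Before training, it can help to pull a single mini-batch from the iterator and inspect its dimensions. The following sketch is illustrative only (it is not part of the chapter's experiments) and the shape comments assume the default flattened 28 x 28 representation:

DataSet batch = mnistTrain.next();        //one mini-batch of training examples
INDArray features = batch.getFeatures();  //shape [batchSize, 784]: flattened 28 x 28 pixels
INDArray labels = batch.getLabels();      //shape [batchSize, 10]: one-hot encoded digit labels
System.out.println(java.util.Arrays.toString(features.shape()));
mnistTrain.reset();                       //rewind the iterator before training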

Multi-layer perceptron

In the first experiment, we will use a basic multi-layer perceptron with an input layer, one hidden layer, and an output layer. A detailed list of parameters that are used in the code is given here:

Parameters used for MLP

Parameter | Variable | Value
Number of iterations | m | 1
Learning rate | rate | 0.0015
Momentum | momentum | 0.98
L2 regularization | regularization | 0.005
Number of rows in input | numRows | 28
Number of columns in input | numColumns | 28
Layer 0 output size, Layer 1 input size | outputLayer0, inputLayer1 | 500
Layer 1 output size, Layer 2 input size | outputLayer1, inputLayer2 | 300
Layer 2 output size, Layer 3 input size | outputLayer2, inputLayer3 | 100
Layer 3 output size | outputNum | 10

Code for MLP

In the listing that follows, we can see how we first configure the MLP by passing in the hyperparameters using the Builder pattern.

MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
    .seed(randomSeed)
    .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT) //use SGD
    .iterations(m) //number of iterations
    .activation(Activation.RELU) //activation function
    .weightInit(WeightInit.XAVIER) //weight initialization
    .learningRate(rate) //specify the learning rate
    .updater(Updater.NESTEROVS).momentum(momentum) //momentum
    .regularization(true).l2(rate * regularization) //L2 regularization
    .list()
    .layer(0, new DenseLayer.Builder() //create the first hidden layer
        .nIn(numRows * numColumns)
        .nOut(firstOutput)
        .build())
    .layer(1, new DenseLayer.Builder() //create the second hidden layer
        .nIn(secondInput)
        .nOut(secondOutput)
        .build())
    .layer(2, new OutputLayer.Builder(LossFunction.NEGATIVELOGLIKELIHOOD) //create the output layer
        .activation(Activation.SOFTMAX)
        .nIn(thirdInput)
        .nOut(numberOfOutputClasses)
        .build())
    .pretrain(false).backprop(true) //use backpropagation to adjust weights
    .build();

Training, evaluating, and testing the MLP are shown in the following snippet. Notice the code that initializes the visualization backend, which enables you to monitor model training in your browser, in particular the model score (the training error after each iteration) and the updates to the parameters:

MultiLayerNetwork model = new MultiLayerNetwork(conf);
model.init();
//Initialize the user interface backend
UIServer uiServer = UIServer.getInstance();
//Configure where the network information (gradients, activations, score vs. time, etc.) is to be stored,
//then add the StatsListener to collect this information from the network as it trains
StatsStorage statsStorage = new InMemoryStatsStorage(); //Alternative: new FileStatsStorage(File)
int listenerFrequency = 1;
//Print the score every 5 iterations and collect stats for the UI
model.setListeners(new ScoreIterationListener(5), new StatsListener(statsStorage, listenerFrequency));
//Attach the StatsStorage instance to the UI so its contents can be visualized
uiServer.attach(statsStorage);
log.info("Train model....");
for (int i = 0; i < numEpochs; i++) {
    log.info("Epoch " + i);
    model.fit(mnistTrain);
}
log.info("Evaluate model....");
Evaluation eval = new Evaluation(numberOfOutputClasses);
while (mnistTest.hasNext()) {
    DataSet next = mnistTest.next();
    INDArray output = model.output(next.getFeatureMatrix()); //get the network's prediction
    eval.eval(next.getLabels(), output); //check the prediction against the true class
}
log.info(eval.stats());

The following plot shows the training error against the training iteration for the MLP model. This curve should decrease as training progresses:


Figure 41: Training error versus the number of training iterations for the MLP model.

In the following figure, we see the distribution of parameters in Layer 0 of the MLP as well as the distribution of updates to the parameters. These histograms should have an approximately Gaussian (Normal) shape, which indicates good convergence. For more on how to use charts to tune your model, see the DL4J Visualization page (https://deeplearning4j.org/visualization):


Figure 42: Histograms showing the Layer 0 parameter and update distributions.

Convolutional Network

In the second experiment, we configure a Convolutional Network (ConvNet) using the built-in MultiLayerConfiguration. As can be seen from the following code snippet, the architecture consists of two convolution layers with 5-by-5 filters, each followed by a max pooling (subsampling) layer, then a fully connected dense layer with ReLU activation, and a final output layer with softmax activation. The optimization algorithm used is Stochastic Gradient Descent and the loss function is Negative Log Likelihood.
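Before looking at the code, it is worth tracing how the feature-map sizes shrink through the network. With 5-by-5 filters, stride 1, and no padding (the DL4J default), and 2-by-2 max pooling with stride 2, each spatial dimension follows out = (in - kernel) / stride + 1. The following sketch is purely illustrative and is not part of the chapter's code:

//Illustrative only: trace feature-map sizes through the ConvNet described above,
//assuming 5x5 convolutions with stride 1 and no padding, and 2x2 max pooling with stride 2
public class ConvNetShapeCheck {
    //output size along one dimension: (in - kernel) / stride + 1
    static int outSize(int in, int kernel, int stride) {
        return (in - kernel) / stride + 1;
    }

    public static void main(String[] args) {
        int size = 28;                      //MNIST input: 28 x 28, 1 channel
        size = outSize(size, 5, 1);         //convolution layer 0: 24 x 24, 20 feature maps
        size = outSize(size, 2, 2);         //max pooling layer 1: 12 x 12
        size = outSize(size, 5, 1);         //convolution layer 2: 8 x 8, 50 feature maps
        size = outSize(size, 2, 2);         //max pooling layer 3: 4 x 4
        int denseInputs = size * size * 50; //4 * 4 * 50 = 800 inputs feed the dense layer
        System.out.println("Dense layer receives " + denseInputs + " inputs");
    }
}

The 800 values produced by the final pooling layer are what the fully connected dense layer consumes; in the configuration below, setInputType(InputType.convolutionalFlat(...)) lets DL4J infer these input sizes automatically.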

The various configuration parameters (or hyper-parameters) for the ConvNet are given in the table.

Parameters used for ConvNet

Parameter | Variable | Value
Seed | seed | 123
Input size | numRows, numColumns | 28, 28
Number of epochs | numEpochs | 10
Number of iterations | iterations | 1
L2 regularization | regularization | 0.005
Learning rate | learningRate | 0.1
Momentum | momentum | 0.9
Convolution filter size | xsize, ysize | 5, 5
Convolution layer stride size | x, y | 1, 1
Number of input channels | numChannels | 1
Subsampling layer stride size | sx, sy | 2, 2
Layer 0 output size | nOut0 | 20
Layer 2 output size | nOut1 | 50
Layer 4 output size | nOut2 | 500
Layer 5 output size | outputNum | 10

Code for CNN

As you can see, configuring multi-layer neural networks with the DL4J API is similar whether you are building MLPs or CNNs. Algorithm-specific configuration is simply done in the definition of each layer.

MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
    .seed(seed)
    .iterations(iterations)
    .regularization(true).l2(regularization)
    .learningRate(learningRate)
    .weightInit(WeightInit.XAVIER)
    .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
    .updater(Updater.NESTEROVS).momentum(momentum)
    .list()
    .layer(0, new ConvolutionLayer.Builder(xsize, ysize) //first convolution layer, 5x5 filters
        .nIn(numChannels)
        .stride(x, y)
        .nOut(nOut0)
        .activation(Activation.IDENTITY)
        .build())
    .layer(1, new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.MAX) //max pooling
        .kernelSize(width, height)
        .stride(sx, sy)
        .build())
    .layer(2, new ConvolutionLayer.Builder(xsize, ysize) //second convolution layer
        .stride(x, y)
        .nOut(nOut1)
        .activation(Activation.IDENTITY)
        .build())
    .layer(3, new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.MAX) //max pooling
        .kernelSize(width, height)
        .stride(sx, sy)
        .build())
    .layer(4, new DenseLayer.Builder() //fully connected layer
        .activation(Activation.RELU)
        .nOut(nOut2)
        .build())
    .layer(5, new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD) //output layer
        .nOut(outputNum)
        .activation(Activation.SOFTMAX)
        .build())
    .setInputType(InputType.convolutionalFlat(numRows, numColumns, 1))
    .backprop(true).pretrain(false).build();

Variational Autoencoder

In the third experiment, we configure a Variational Autoencoder as the classifier.

Parameters used for the Variational Autoencoder

The parameters used to configure the VAE are shown in the table.

Parameter | Variable | Value
Seed for RNG | rngSeed | 12345
Number of iterations | iterations | 1
Learning rate | learningRate | 0.001
RMS decay | rmsDecay | 0.95
L2 regularization | regularization | 0.0001
Output layer size | outputNum | 10
VAE encoder layer sizes | vaeEncoder1, vaeEncoder2 | 256, 256
VAE decoder layer sizes | vaeDecoder1, vaeDecoder2 | 256, 256
Size of latent variable space | latentVarSpaceSize | 128

Code for Variational Autoencoder

We configure two encoder layers and two decoder layers, and reconstruct the input using a Bernoulli distribution.

MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
    .seed(rngSeed)
    .iterations(iterations)
    .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
    .learningRate(learningRate)
    .updater(Updater.RMSPROP).rmsDecay(rmsDecay)
    .weightInit(WeightInit.XAVIER)
    .regularization(true).l2(regularization)
    .list()
    .layer(0, new VariationalAutoencoder.Builder()
        .activation(Activation.LEAKYRELU)
        .encoderLayerSizes(vaeEncoder1, vaeEncoder2) //2 encoder layers
        .decoderLayerSizes(vaeDecoder1, vaeDecoder2) //2 decoder layers
        .pzxActivationFunction("identity") //p(z|data) activation function
        .reconstructionDistribution(new BernoulliReconstructionDistribution(Activation.SIGMOID.getActivationFunction())) //Bernoulli distribution for p(data|z) (binary or 0 to 1 data only)
        .nIn(numRows * numColumns) //input size
        .nOut(latentVarSpaceSize) //size of the latent variable space: p(z|x)
        .build())
    .layer(1, new OutputLayer.Builder(LossFunction.NEGATIVELOGLIKELIHOOD)
        .activation(Activation.SOFTMAX)
        .nIn(latentVarSpaceSize).nOut(outputNum).build())
    .pretrain(true).backprop(true).build();
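Because the VAE layer is pre-trained in an unsupervised fashion (note pretrain(true) above), it can also be used on its own to encode images into the latent space. The following sketch is modeled on the DL4J variational autoencoder examples for the pre-1.0 API used in this chapter; it assumes the configuration has been wrapped in a MultiLayerNetwork named model and fitted as in the earlier snippets, and is illustrative rather than part of this case study's code:

//Obtain the trained VAE layer and use it to map test images into the latent space
org.deeplearning4j.nn.layers.variational.VariationalAutoencoder vae =
        (org.deeplearning4j.nn.layers.variational.VariationalAutoencoder) model.getLayer(0);
DataSet testBatch = mnistTest.next();
INDArray latentValues = vae.activate(testBatch.getFeatureMatrix(), false); //shape [batchSize, latentVarSpaceSize]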

DBN

The parameters used in DBN are shown in the following table:

Parameter | Variable | Value
Input data size | numRows, numColumns | 28, 28
Seed for RNG | seed | 123
Number of training iterations | iterations | 1
Momentum | momentum | 0.5
Layer 0 (input) | numRows * numColumns | 28 * 28
Layer 0 (output) | nOut0 | 500
Layer 1 (input, output) | nIn1, nOut1 | 500, 250
Layer 2 (input, output) | nIn2, nOut2 | 250, 200
Layer 3 (input, output) | nIn3, outputNum | 200, 10

Configuring the DBN with the DL4J API follows the same pattern as the previous networks: a stack of RBM layers that are pre-trained in an unsupervised fashion, topped by a softmax output layer. The code for the configuration of the network is shown here:

MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
    .seed(seed)
    .gradientNormalization(GradientNormalization.ClipElementWiseAbsoluteValue)
    .gradientNormalizationThreshold(1.0)
    .iterations(iterations)
    .updater(Updater.NESTEROVS)
    .momentum(momentum)
    .optimizationAlgo(OptimizationAlgorithm.CONJUGATE_GRADIENT)
    .list()
    .layer(0, new RBM.Builder().nIn(numRows * numColumns).nOut(nOut0) //first RBM layer
        .weightInit(WeightInit.XAVIER).lossFunction(LossFunction.KL_DIVERGENCE)
        .visibleUnit(RBM.VisibleUnit.BINARY)
        .hiddenUnit(RBM.HiddenUnit.BINARY)
        .build())
    .layer(1, new RBM.Builder().nIn(nIn1).nOut(nOut1) //second RBM layer
        .weightInit(WeightInit.XAVIER).lossFunction(LossFunction.KL_DIVERGENCE)
        .visibleUnit(RBM.VisibleUnit.BINARY)
        .hiddenUnit(RBM.HiddenUnit.BINARY)
        .build())
    .layer(2, new RBM.Builder().nIn(nIn2).nOut(nOut2) //third RBM layer
        .weightInit(WeightInit.XAVIER).lossFunction(LossFunction.KL_DIVERGENCE)
        .visibleUnit(RBM.VisibleUnit.BINARY)
        .hiddenUnit(RBM.HiddenUnit.BINARY)
        .build())
    .layer(3, new OutputLayer.Builder().nIn(nIn3).nOut(outputNum) //softmax output layer
        .weightInit(WeightInit.XAVIER).activation(Activation.SOFTMAX)
        .build())
    .pretrain(true).backprop(true)
    .build();
MultiLayerNetwork model = new MultiLayerNetwork(conf);
model.init();
model.setListeners(new ScoreIterationListener(listenerFreq));
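Training and evaluation of the DBN then follow the same pattern as the MLP snippet shown earlier. A minimal sketch, assuming the same mnistTrain, mnistTest, and numEpochs variables and the binarized data described in the data sampling section:

for (int i = 0; i < numEpochs; i++) {
    //with pretrain(true) and backprop(true) in the configuration, fit() performs the
    //unsupervised layer-wise pre-training of the RBM stack followed by fine-tuning
    model.fit(mnistTrain);
}
Evaluation eval = new Evaluation(outputNum);
while (mnistTest.hasNext()) {
    DataSet next = mnistTest.next();
    eval.eval(next.getLabels(), model.output(next.getFeatureMatrix()));
}
log.info(eval.stats());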

Parameter search using Arbiter

DeepLearning4J provides a framework, Arbiter, for tuning hyper-parameters that takes the burden of hand-tuning away from the modeler: instead, you specify the parameter space to search. In the following code snippet, the configuration is specified using a MultiLayerSpace instead of a MultiLayerConfiguration object, and the ranges of the hyper-parameters to be tuned are specified by means of ParameterSpace objects from the Arbiter package:

ParameterSpace<Double> learningRateHyperparam = new ContinuousParameterSpace(0.0001, 0.1); //values generated uniformly at random between 0.0001 and 0.1 (inclusive)
ParameterSpace<Integer> layerSizeHyperparam = new IntegerParameterSpace(16, 256); //integer values generated uniformly at random between 16 and 256 (inclusive)
MultiLayerSpace hyperparameterSpace = new MultiLayerSpace.Builder()
    //These next few options: fixed values for all models
    .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
    .iterations(1)
    .regularization(true)
    .l2(0.0001)
    //Learning rate: this is something we want to test different values for
    .learningRate(learningRateHyperparam)
    .addLayer(new DenseLayerSpace.Builder()
        //Fixed values for this layer:
        .nIn(784) //fixed input: 28x28 = 784 pixels for MNIST
        .activation("relu")
        //One hyperparameter to infer: layer size
        .nOut(layerSizeHyperparam)
        .build())
    .addLayer(new OutputLayerSpace.Builder()
        //nIn: set to the same hyperparameter as the nOut of the previous layer
        .nIn(layerSizeHyperparam)
        //The remaining hyperparameters: fixed for the output layer
        .nOut(10)
        .activation("softmax")
        .lossFunction(LossFunctions.LossFunction.MCXENT)
        .build())
    .pretrain(false).backprop(true).build();

Results and analysis

The results of evaluating the performance of the four networks on the test data are given in the following table:

Metric | MLP | ConvNet | VAE | DBN
Accuracy | 0.9807 | 0.9893 | 0.9743 | 0.7506
Precision | 0.9806 | 0.9893 | 0.9742 | 0.7498
Recall | 0.9805 | 0.9891 | 0.9741 | 0.7454
F1 score | 0.9806 | 0.9892 | 0.9741 | 0.7476
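The discussion that follows refers to error rates rather than accuracies; the error rate is simply 1 - accuracy, as in this small illustrative calculation:

//Error rate = 1 - accuracy, computed from the accuracies in the table above
double[] accuracy = {0.9807, 0.9893, 0.9743, 0.7506};
String[] names = {"MLP", "ConvNet", "VAE", "DBN"};
for (int i = 0; i < accuracy.length; i++) {
    System.out.printf("%s error rate: %.2f%%%n", names[i], (1 - accuracy[i]) * 100);
}
//Prints: MLP 1.93%, ConvNet 1.07%, VAE 2.57%, DBN 24.94%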

The goal of the experiments was not to match the benchmark results for each of the neural network architectures, but to provide complete implementations in code, with detailed parameters, for readers to explore.

Tuning the hyper-parameters of deep learning networks is quite a challenge. Although Arbiter and online resources such as the DL4J Gitter channel (https://gitter.im/deeplearning4j/deeplearning4j) help, the time and cost of running a hyper-parameter search remain high compared to other classification techniques, including SVMs.

The benchmark results on the MNIST dataset, with links to the corresponding papers, are available on the MNIST database page referenced earlier (https://yann.lecun.com/exdb/mnist).

As seen from the benchmark results, a linear 1-layer NN gets an error rate of 12%, and adding more layers reduces it to about 2%. This shows the non-linear nature of the data and the need for a more complex model to fit the patterns.

Compared to the benchmark results for plain neural networks, which range from a 2.5% to a 1.6% error rate, our MLP result of roughly a 2% error rate is very much in line.

Most of the benchmark results show Convolutional Network architectures having error rates in the range of 1.1% to 0.5%, and our ConvNet, with an error rate of just under 1.1%, falls within that range.

Our results for the DBN fall far short of the benchmarks, with an error rate of about 25%. There is no reason to doubt that further tuning could improve its performance, bringing the error rate into the range of 3-5%.
