Chapter 8. Deep Learning

In this chapter, we will focus on neural networks, often referred to as Deep Learning Networks (DLNs). This type of network is characterized as a multiple-layer neural network. Each of these layers is trained on the output of the previous layer, potentially identifying features and sub-features of the dataset. A feature hierarchy is created in this manner.

DLNs typically work with unstructured and unlabeled data, which constitute the vast bulk of data found in the world today. A DLN takes this unstructured data, identifies features, and tries to reconstruct the original input. This approach is illustrated with Restricted Boltzmann Machines (RBMs) in the Restricted Boltzmann Machines section and with autoencoders in the Deep autoencoders section. An autoencoder takes a dataset and effectively compresses it. It then decompresses the result to reconstruct the original dataset.

DLNs can also be used for predictive analysis. The last step of a DLN uses an activation function to generate output represented by one of several categories. When given new data, the model attempts to classify the input based on the previously trained model.

An important DLN task is ensuring that the model is accurate and minimizes error. As with simple neural networks, weights and biases are used at each layer. As weight values are adjusted, errors can be introduced. A common technique for adjusting weights is gradient descent. The gradient can be thought of as the slope of the error with respect to a weight: the weight is moved in the direction that decreases the error. It is an optimization technique that speeds up the learning process.
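To make this concrete, here is a minimal sketch of gradient descent applied to a single weight. The error function, its derivative, and the learning rate are illustrative assumptions, not DL4J code:

// Illustrative error function E(w) = (w - 3)^2, which is minimized at w = 3.
// Its derivative (the gradient) is dE/dw = 2 * (w - 3).
double weight = 0.0;           // arbitrary starting point
double learningRate = 0.1;     // step size; a hyperparameter

for (int i = 0; i < 50; i++) {
    double gradient = 2 * (weight - 3);   // slope of the error at the current weight
    weight -= learningRate * gradient;    // step downhill to reduce the error
}
System.out.println("Weight after training: " + weight); // approaches 3.0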

Later in the chapter, we will examine Convolutional Neural Networks (CNNs) and briefly discuss Recurrent Neural Networks (RNNs). Convolutional networks mimic the visual cortex in that each neuron can interact with and make decisions based on a region of information. Recurrent networks process information based not only on the output of the previous layer but also on calculations performed earlier in the sequence.

There are several libraries that support deep learning, including these:

  • DL4J: The Deep Learning for Java library, on which the examples in this chapter are based
  • ND4J: A lower-level library that is used by other projects, including DL4J
  • Encog: Perhaps not as well supported as DL4J, but it does provide support for deep learning

The examples used in this chapter are all based on the Deep Learning for Java (DL4J) (http://deeplearning4j.org) API with support from ND4J. This library provides good support for many of the algorithms associated with deep learning. The next section explains the basic tasks common to many deep learning algorithms, such as loading data, training a model, and testing the model.

Deeplearning4j architecture

In this section, we will discuss the architecture of DL4J and address several of the common tasks performed when using the API. Building a DLN typically starts with the creation of a MultiLayerConfiguration instance, which defines the network, or model. The network is composed of multiple layers. Hyperparameters are variables used to configure the network; they affect such things as the learning speed, the activation function to use for a layer, and how weights are to be initialized.

As with neural networks, the basic DLN process consists of:

  • Acquiring and manipulating data
  • Configuring and building a model
  • Training the model
  • Testing the model

We will investigate each of these tasks in the next sections.

Note

The code examples in this section are not intended to be entered and executed as shown. Instead, they are snippets taken from the models that we will develop later in the chapter.

Acquiring and manipulating data

The DL4J API has a number of techniques for acquiring data. We will focus on the specific techniques used in our examples. The dataset used by a DL4J project is often modified using either binarization or normalization. Binarization converts data to ones and zeroes. Normalization converts data to a value between 0 and 1.
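As a minimal sketch, normalization can be performed with ND4J's NormalizerMinMaxScaler class, which scales each feature into the 0 to 1 range. Here, dataset is assumed to be a DataSet instance loaded as shown later in this section:

NormalizerMinMaxScaler normalizer = new NormalizerMinMaxScaler(0, 1); 
normalizer.fit(dataset);        // collects the minimum and maximum of each feature 
normalizer.transform(dataset);  // scales every feature value into the 0 to 1 range 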

Data fed to a DLN must be transformed into a set of numbers, referred to as a vector. Such a vector is a one-column matrix with a variable number of rows. The process of creating a vector is called vectorization.
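For example, a vector holding four arbitrary feature values can be created directly with ND4J as a four-row, one-column matrix. This is a minimal illustration; our examples will create vectors by reading data files instead:

INDArray vector = Nd4j.create(new double[]{0.5, 1.0, 0.25, 0.75}, new int[]{4, 1}); 
System.out.println(vector); 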

Canova (http://deeplearning4j.org/canova.html) is a DL4J library that supports vectorization. It works with many different types of datasets. It has since been merged into DataVec (http://deeplearning4j.org/datavec), a vectorization and Extract, Transform, and Load (ETL) library.

In this section, we will focus on how to read in CSV data.

Reading in a CSV file

The CSVRecordReader class, found in DL4J's DataVec library, is useful for reading CSV data. It has three overloaded constructors. The one we will demonstrate is passed two arguments. The first is the number of lines to skip when first reading a file, and the second is a string holding the delimiter used to parse the text.

In the following code, we create a new instance of the class, where we do not skip any lines and use only a comma for a delimiter:

RecordReader recordReader = new CSVRecordReader(0, ","); 

The class implements the RecordReader interface. It has an initialize method that is passed an instance of the FileSplit class. One of FileSplit's constructors takes a File object that references a dataset. The FileSplit class assists in splitting the data for training and testing. In this example, we initialize the reader for a file called car.txt, which we will use in the Preparing the data section:

recordReader.initialize(new FileSplit(new File("car.txt"))); 

To process the data, we need an iterator, such as the DataSetIterator instance shown next. This class possesses a multitude of overloaded constructors. In the following example, the first argument is the RecordReader instance. It is followed by three arguments. The first is the batch size, which is the number of records to retrieve at a time. The next is the index of the attribute that holds the class label, which is the last attribute of our records. The last argument is the number of classes represented by the dataset:

DataSetIterator iterator =  
    new RecordReaderDataSetIterator(recordReader, 1728, 6, 4); 

The record's last attribute holds a class value when the dataset is used for classification. This is precisely how we will use it later. The number-of-classes parameter is only used for classification.

In the next code sequence, we will split the dataset into two sets: one for training and one for testing. The next method returns the next dataset from the source; the size of this dataset depends on the batch size used earlier. The shuffle method randomizes the input, while the splitTestAndTrain method returns an instance of the SplitTestAndTrain class, which we use to get the training and testing datasets. The splitTestAndTrain method's argument specifies the fraction of the data to be used for training.

DataSet dataset = iterator.next(); 
dataset.shuffle(); 
SplitTestAndTrain testAndTrain = dataset.splitTestAndTrain(0.65); 
DataSet trainingData = testAndTrain.getTrain(); 
DataSet testData = testAndTrain.getTest(); 

We can then use these datasets with a model.

Configuring and building a model

Frequently, DL4J uses the MultiLayerConfiguration class to define the configuration of the model and the MultiLayerNetwork class to represent a model. These classes provide a flexible way of building models.

In the following example, we will demonstrate the use of these classes. Several builder methods are used in a fluent style to create the MultiLayerConfiguration instance. We will provide more details about these methods shortly. However, notice that two layers are defined for this model:

MultiLayerConfiguration conf =  
    new NeuralNetConfiguration.Builder() 
        .iterations(1000) 
        .activation("relu") 
        .weightInit(WeightInit.XAVIER) 
        .learningRate(0.4) 
        .list() 
        .layer(0, new DenseLayer.Builder() 
                .nIn(6).nOut(3) 
                .build()) 
        .layer(1, new OutputLayer 
                .Builder(LossFunctions.LossFunction 
                        .NEGATIVELOGLIKELIHOOD) 
                .activation("softmax") 
                .nIn(3).nOut(4).build()) 
        .backprop(true).pretrain(false) 
        .build(); 

The nIn and nOut methods specify the number of inputs and outputs for a layer. Note that the nOut value of one layer must match the nIn value of the next, which is 3 in this example.

Using hyperparameters in DL4J

Builder classes are common in DL4J. In the previous example, the NeuralNetConfiguration.Builder class is used. The methods used here are but a few of the many that are available. In the following table, we describe several of them:

Method          Usage

iterations      Controls the number of optimization iterations performed
activation      Specifies the activation function to use
weightInit      Used to initialize the model's initial weights
learningRate    Controls the speed at which the model learns
list            Creates an instance of the NeuralNetConfiguration.ListBuilder class so that we can add layers
layer           Creates a new layer
backprop        When set to true, enables backpropagation
pretrain        When set to true, pretrains the model
build           Performs the actual build process

Let's examine more closely how a layer is created. In the example, the list method returns a NeuralNetConfiguration.ListBuilder instance. Its layer method takes two arguments. The first is the number of the layer, using a zero-based numbering scheme. The second is the Layer instance.

Two different layers are used here, created with two different builders: a DenseLayer.Builder and an OutputLayer.Builder instance. There are several types of layers available in DL4J. The argument of a builder's constructor may be a loss function, as is the case with the output layer; loss functions are explained next.

During training, the network's guess is compared against the known correct answer, called the ground truth. The difference between the two is the error, which is used to update the network through the modification of weights and biases. The loss function, also called an objective or cost function, measures this difference.
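To illustrate what a loss function computes, the following sketch calculates the mean squared error, the first function in the list that follows, using made-up guesses and ground truth values:

double[] guesses = {0.9, 0.2, 0.8}; 
double[] groundTruth = {1.0, 0.0, 1.0}; 

double sumOfSquares = 0.0; 
for (int i = 0; i < guesses.length; i++) { 
    double error = guesses[i] - groundTruth[i]; 
    sumOfSquares += error * error; 
} 
// MSE is the average of the squared differences 
double mse = sumOfSquares / guesses.length; 
System.out.println("MSE: " + mse); // approximately 0.03 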

There are several loss functions supported by DL4J:

  • MSE: Mean squared error, used in linear regression
  • EXPLL: Exponential log likelihood, used in Poisson regression
  • XENT: Cross entropy, used in binary classification
  • MCXENT: Multiclass cross entropy
  • RMSE_XENT: RMSE cross entropy
  • SQUARED_LOSS: Squared loss
  • RECONSTRUCTION_CROSSENTROPY: Reconstruction cross entropy
  • NEGATIVELOGLIKELIHOOD: Negative log likelihood
  • CUSTOM: A user-defined loss function

The remaining methods used with the builder instance set the activation function and the number of inputs and outputs for the layer; the build method then creates the layer.

Each layer of a multi-layer network requires the following:

  • Input: Usually in the form of an input vector
  • Weights: Also called coefficients
  • Bias: Used to ensure that at least some nodes in a layer are activated
  • Activation function: Determines whether a node fires

The activation function determines whether a node fires. There are many different types of activation functions, each of which can address a particular type of problem. DL4J supports several, including relu (rectified linear), tanh, sigmoid, softmax, hardtanh, leakyrelu, maxout, softsign, and softplus.
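To tie these elements together, the following sketch computes the output of a single neuron as relu(w · x + b). The inputs, weights, and bias are arbitrary illustrative values, not part of any DL4J model:

double[] inputs  = {0.5, -1.0, 2.0};   // the input vector 
double[] weights = {0.4, 0.6, 0.3};    // one weight per input 
double bias = 0.05;                    // lets the node activate even for small inputs 

double sum = bias; 
for (int i = 0; i < inputs.length; i++) { 
    sum += weights[i] * inputs[i]; 
} 
// relu passes positive sums through and clamps negative sums to zero 
double output = Math.max(0.0, sum); 
System.out.println("Neuron output: " + output); // approximately 0.25 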

Instantiating the network model

Next, a MultiLayerNetwork instance is created using the defined configuration. The model is initialized, and its listeners are set. The ScoreIterationListener instance will display information as the model trains, which we will see shortly. Its constructor's argument specifies how often that information should be displayed:

MultiLayerNetwork model = new MultiLayerNetwork(conf); 
model.init(); 
model.setListeners(new ScoreIterationListener(100)); 

We are now ready to train the model.

Training a model

This is actually a fairly simple step. The fit method performs the training:

model.fit(trainingData); 

When executed, the output will be generated using any listeners associated with the model, as in the preceding case, where a ScoreIterationListener instance is used.

Another way the fit method can be used is while iterating through a dataset, as shown next. In this example, a sequence of datasets is used. This is part of an autoencoder, where the output is intended to match the input, as explained in the Deep autoencoders section. The dataset used as the argument to the fit method holds both the input and the expected output. In this case, they are the same, as provided by the getFeatureMatrix method:

while (iterator.hasNext()) { 
    DataSet dataSet = iterator.next(); 
    model.fit(new DataSet(dataSet.getFeatureMatrix(), 
            dataSet.getFeatureMatrix())); 
} 

For larger datasets, it is necessary to pretrain the model several times to get accurate results. This is often performed in parallel to reduce training time. This option is set with the configuration builder's pretrain method.

Testing a model

The evaluation of a model is performed using the Evaluation class and the test dataset. An Evaluation instance is created using an argument specifying the number of classes. The test data is fed into the model using the output method. The eval method then compares the output of the model against the test data's classes to generate statistics:

Evaluation evaluation = new Evaluation(4); 
INDArray output = model.output(testData.getFeatureMatrix()); 
evaluation.eval(testData.getLabels(), output); 
out.println(evaluation.stats()); 

The output will look similar to the following:

==========================Scores===================================
Accuracy: 0.9273
Precision: 0.854
Recall: 0.8323
F1 Score: 0.843

These statistics are detailed here:

  • Accuracy: This is a measure of how often the model returned the correct answer.
  • Precision: This is the probability that a positive prediction is correct; it is calculated by dividing the number of true positives by the sum of true positives and false positives.
  • Recall: This is the probability that an actual positive example is classified correctly; it is calculated by dividing the number of true positives by the sum of true positives and false negatives.
  • F1 Score: This is a measure of the overall quality of the network's results. It is the harmonic mean of precision and recall, calculated as 2 × (precision × recall) / (precision + recall). A sketch computing these measures follows this list.
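As a minimal sketch, these measures can be computed from counts of true positives, false positives, and false negatives. The counts below are made up for illustration:

int truePositives = 70; 
int falsePositives = 12; 
int falseNegatives = 14; 

double precision = (double) truePositives / (truePositives + falsePositives); 
double recall = (double) truePositives / (truePositives + falseNegatives); 
// F1 is the harmonic mean of precision and recall 
double f1 = 2 * precision * recall / (precision + recall); 

System.out.println("Precision: " + precision); // approximately 0.854 
System.out.println("Recall: " + recall);       // approximately 0.833 
System.out.println("F1 Score: " + f1);         // approximately 0.843 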

We will use the Evaluation class to determine the quality of our model. A measure called f1 is used, whose values range from 0 to 1, where 1 represents the best quality.
