Understanding dynamic neural networks

Dynamic neural networks differ from static networks in that they continue learning after the training phase. They can adjust their structure independently of external modification. A feedforward neural network (FNN) is one of the earliest and simplest of these networks. As its name implies, this type of network only feeds information forward and does not form any cycles. FNNs formed the foundation for much of the later work in dynamic ANNs. In this section, we will show in-depth examples of two types of dynamic networks: MLP networks and SOMs.

Multilayer perceptron networks

An MLP network is an FNN with multiple layers. The network uses supervised learning with backpropagation, in which error feedback is sent back to the earlier layers to assist in the learning process. Some of the neurons use a nonlinear activation function, mimicking biological neurons. Every node in one layer is fully connected to the nodes of the following layer.

We will use a dataset called dermatology.arff, which can be downloaded from http://repository.seasr.org/Datasets/UCI/arff/. This dataset contains 366 instances used to diagnose erythemato-squamous diseases. It uses 34 attributes to classify the disease into one of six categories. The following is a sample instance:

2,2,0,3,0,0,0,0,1,0,0,0,0,0,0,3,2,0,0,0,0,0,0,0,0,0,0,3,0,0,0,1,0,55,2

The last field represents the disease category. This dataset has been partitioned into two files: dermatologyTrainingSet.arff and dermatologyTestingSet.arff. The training set uses the first 80% (292 instances) of the original set and ends with line 456. The testing set is the last 20% (74 instances) and starts with line 457 of the original set (lines 457-530).
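
If you want to reproduce this split yourself, the following is a minimal sketch, assuming the original dermatology.arff is in the working directory. It uses Weka's Instances and ArffSaver (weka.core.converters.ArffSaver) classes and should be wrapped in a try block, as in the examples that follow:

Instances allInstances = new Instances(new FileReader("dermatology.arff"));
// First 80% (292 instances) for training
int trainSize = (int) (allInstances.numInstances() * 0.8);
Instances train = new Instances(allInstances, 0, trainSize);
// Remaining 20% (74 instances) for testing
Instances test = new Instances(allInstances, trainSize,
        allInstances.numInstances() - trainSize);

ArffSaver saver = new ArffSaver();
saver.setInstances(train);
saver.setFile(new File("dermatologyTrainingSet.arff"));
saver.writeBatch();

saver = new ArffSaver();
saver.setInstances(test);
saver.setFile(new File("dermatologyTestingSet.arff"));
saver.writeBatch();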

Building the model

Before we can make any predictions, it is necessary to train the model on a representative set of data. We will use the Weka class MultilayerPerceptron for training and, eventually, to make predictions. First, we declare strings for the training and testing filenames, along with the corresponding FileReader instances for them. The Instances objects are then created, and the last field is specified as the field to use for classification:

String trainingFileName = "dermatologyTrainingSet.arff"; 
String testingFileName = "dermatologyTestingSet.arff"; 
 
try (FileReader trainingReader = new FileReader(trainingFileName); 
        FileReader testingReader =  
            new FileReader(testingFileName)) { 
    Instances trainingInstances = new Instances(trainingReader); 
    trainingInstances.setClassIndex( 
        trainingInstances.numAttributes() - 1); 
    Instances testingInstances = new Instances(testingReader); 
    testingInstances.setClassIndex( 
        testingInstances.numAttributes() - 1); 
    ... 
} catch (Exception ex) { 
    // Handle exceptions 
} 

An instance of the MultilayerPerceptron class is then created:

MultilayerPerceptron mlp = new MultilayerPerceptron(); 

There are several model parameters that we can set, as shown here:

Parameter      Method           Description
Learning rate  setLearningRate  Affects the training speed
Momentum       setMomentum      Affects the training speed
Training time  setTrainingTime  The number of training epochs used to train the model
Hidden layers  setHiddenLayers  The number of hidden layers and perceptrons to use

As mentioned previously, the learning rate affects the speed at which the model is trained. A larger value can increase the training speed. If the learning rate is too small, the training may take too long. If the learning rate is too large, the model may step past a local minimum and diverge. That is, if the step is too large, we might skip over a meaningful value. You can think of this as a graph where a small dip in the plot along the Y axis is missed because we incremented our X value too much.

Momentum also affects the training speed by effectively increasing the rate of learning. It is used in addition to the learning rate to add momentum to the search for the optimal value. When the search is stuck in a local minimum, the momentum term helps it escape the minimum in its quest for a global minimum.

When the model is learning, it performs operations iteratively. The term epoch refers to one such iteration over the training data, and the training time parameter sets the number of epochs. Hopefully, the total error encountered with each epoch will decrease to a point where further epochs are not useful. It is ideal to avoid too many epochs.

A neural network will have one or more hidden layers. Each of these layers will have a specific number of perceptrons. The setHiddenLayers method specifies the number of layers and perceptrons using a string. For example, 3,5 would specify two hidden layers with three and five perceptrons per layer, respectively.

For this example, we will use the following values:

mlp.setLearningRate(0.1); 
mlp.setMomentum(0.2); 
mlp.setTrainingTime(2000); 
mlp.setHiddenLayers("3"); 

The buildClassifier method uses the training data to build the model:

mlp.buildClassifier(trainingInstances); 

Evaluating the model

The next step is to evaluate the model. The Evaluation class is used for this purpose. Its constructor takes the training set as input and the evaluateModel method performs the actual evaluation. The following code illustrates this using the testing dataset:

Evaluation evaluation = new Evaluation(trainingInstances); 
evaluation.evaluateModel(mlp, testingInstances); 

One simple way of displaying the results of the evaluation is using the toSummaryString method:

System.out.println(evaluation.toSummaryString()); 

This will display the following output:

Correctly Classified Instances 73 98.6486 %
Incorrectly Classified Instances 1 1.3514 %
Kappa statistic 0.9824
Mean absolute error 0.0177
Root mean squared error 0.076 
Relative absolute error 6.6173 %
Root relative squared error 20.7173 %
Coverage of cases (0.95 level) 98.6486 %
Mean rel. region size (0.95 level) 18.018 %
Total Number of Instances 74

Frequently, it will be necessary to experiment with these parameters to get the best results; in particular, varying the number of perceptrons in the hidden layer will change the accuracy of the model.

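One way to carry out this experimentation is to train and evaluate a separate model for each configuration and compare their accuracies. A minimal sketch, using arbitrarily chosen hidden-layer settings, might look like this:

for (String hiddenLayers : new String[]{"2", "3", "5", "10"}) {
    MultilayerPerceptron candidate = new MultilayerPerceptron();
    candidate.setLearningRate(0.1);
    candidate.setMomentum(0.2);
    candidate.setTrainingTime(2000);
    candidate.setHiddenLayers(hiddenLayers);
    candidate.buildClassifier(trainingInstances);

    Evaluation eval = new Evaluation(trainingInstances);
    eval.evaluateModel(candidate, testingInstances);
    // pctCorrect returns the percentage of correctly classified instances
    out.println(hiddenLayers + " perceptrons: "
            + eval.pctCorrect() + "% correct");
}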

Predicting other values

Once we have a trained model, we can use it to evaluate other data. In the previous testing dataset there was one instance that was misclassified. In the following code sequence, this instance is identified and the predicted and actual results are displayed.

Each instance of the testing dataset is used as input to the classifyInstance method. This method tries to predict the correct result. This result is compared to the last field of the instance that contains the actual value. For mismatches, the predicted and actual values are displayed:

for (int i = 0; i < testingInstances.numInstances(); i++) { 
    double result = mlp.classifyInstance( 
        testingInstances.instance(i)); 
    if (result != testingInstances 
            .instance(i) 
            .value(testingInstances.numAttributes() - 1)) { 
        out.println("Classify result: " + result 
                + " Correct: " + testingInstances.instance(i) 
                .value(testingInstances.numAttributes() - 1)); 
        ... 
    } 
} 

For the testing set we get the following output:

Classify result: 1.0 Correct: 3.0

We can get the likelihood of the prediction being correct using the MultilayerPerceptron class' distributionForInstance method. Place the following code into the previous loop. It captures the incorrect instance, which is easier than instantiating an instance based on the 34 attributes used by the dataset. The distributionForInstance method takes this instance and returns an array of doubles with one element per class, giving the probability that the instance belongs to that class. Here we print the probabilities of the first two classes:

Instance incorrectInstance = testingInstances.instance(i); 
incorrectInstance.setDataset(trainingInstances); 
double[] distribution = mlp.distributionForInstance(incorrectInstance); 
out.println("Probability of being positive: " + distribution[0]); 
out.println("Probability of being negative: " + distribution[1]); 

The output for this instance is as follows:

Probability of class 0: 0.00350515156929017
Probability of class 1: 0.9683660500711128

This can provide a more quantitative feel for the reliability of the prediction. In this case, the model assigned a probability of almost 0.97 to class 1, its incorrect prediction, so the misclassification was made with high confidence.

Saving and retrieving the model

We can also save and retrieve a model for later use. To save the model, build the model and then use the SerializationHelper class' static method write, as shown in the following code snippet. The first argument is the name of the file to hold the model:

SerializationHelper.write("mlpModel", mlp); 

To retrieve the model, use the corresponding read method as shown here:

mlp = (MultilayerPerceptron)SerializationHelper.read("mlpModel"); 

Next, we will learn how to use another useful neural network approach, SOMs.

Learning vector quantization

Learning Vector Quantization (LVQ) is another special type of dynamic ANN. SOMs, which we will discuss in a moment, are closely related to LVQ networks. This type of network implements a competitive algorithm in which the winning neuron is the one that has its weights adjusted. These types of networks are used in many different applications and are considered to be more natural and intuitive than some other ANNs. In particular, LVQ is effective for the classification of text-based data.

The basic algorithm begins by setting the number of neurons, the initial weight for each neuron, the rate at which the neurons learn, and a list of input vectors. In this context, a vector is similar to a vector in physics and represents the values provided to the input layer neurons. As the network is trained, each vector is used as input, a winning neuron is selected, and the weights of the winning neuron are updated. This model is iterative and will continue to run until a solution is found.
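
To make the update step concrete, the following is a minimal, illustrative sketch of the core LVQ weight update; it is not the Weka implementation, and all of the names used here are hypothetical:

// Illustrative LVQ update step: weights[n] is the weight vector of neuron n,
// and neuronClasses[n] is the class label assigned to that neuron.
void lvqUpdate(double[][] weights, int[] neuronClasses,
        double[] input, int inputClass, double learningRate) {
    // Select the winning neuron: the one whose weight vector is closest to the input
    int winner = 0;
    double bestDistance = Double.MAX_VALUE;
    for (int n = 0; n < weights.length; n++) {
        double distance = 0;
        for (int i = 0; i < input.length; i++) {
            double d = input[i] - weights[n][i];
            distance += d * d;
        }
        if (distance < bestDistance) {
            bestDistance = distance;
            winner = n;
        }
    }
    // Pull the winner toward the input if the classes match, push it away otherwise
    double sign = (neuronClasses[winner] == inputClass) ? 1.0 : -1.0;
    for (int i = 0; i < input.length; i++) {
        weights[winner][i] += sign * learningRate * (input[i] - weights[winner][i]);
    }
}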

Self-Organizing Maps

A SOM is a technique that takes multidimensional data and reduces it to one or two dimensions. This compression technique is called vector quantization. The technique usually involves a visual component that allows a human to better see how the data has been categorized. A SOM learns without supervision.

The SOM is good for finding clusters, which is not to be confused with classification. With classification we are interested in finding the best fit for a data instance among predefined categories. With clustering we are interested in grouping instances where the categories are unknown.

A SOM uses a lattice of neurons, usually a two-dimensional array or a hexagonal grid, in which each neuron is assigned a weight vector. The input sources are connected to each of these neurons. The technique then adjusts the weights assigned to each lattice member through several iterations until the best fit is found. When finished, the lattice members will have grouped the input dataset into categories. The SOM results can be viewed to identify these categories and to map new input to one of them.
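
As a rough illustration of this weight adjustment (this is not the Weka SelfOrganizingMap implementation used below, and all names here are hypothetical), one training step over a two-dimensional lattice might look like this:

// Illustrative SOM training step: lattice[row][col] is the weight vector of one neuron.
void somUpdate(double[][][] lattice, double[] input,
        double learningRate, double neighborhoodRadius) {
    // Find the best-matching unit (BMU): the neuron whose weights are closest to the input
    int bestRow = 0, bestCol = 0;
    double bestDistance = Double.MAX_VALUE;
    for (int r = 0; r < lattice.length; r++) {
        for (int c = 0; c < lattice[r].length; c++) {
            double distance = 0;
            for (int i = 0; i < input.length; i++) {
                double d = input[i] - lattice[r][c][i];
                distance += d * d;
            }
            if (distance < bestDistance) {
                bestDistance = distance;
                bestRow = r;
                bestCol = c;
            }
        }
    }
    // Pull the BMU and its lattice neighbours toward the input; the pull weakens
    // as the lattice distance from the BMU grows
    for (int r = 0; r < lattice.length; r++) {
        for (int c = 0; c < lattice[r].length; c++) {
            double latticeDistance = Math.hypot(r - bestRow, c - bestCol);
            double influence = Math.exp(-(latticeDistance * latticeDistance)
                    / (2 * neighborhoodRadius * neighborhoodRadius));
            for (int i = 0; i < input.length; i++) {
                lattice[r][c][i] += learningRate * influence
                        * (input[i] - lattice[r][c][i]);
            }
        }
    }
}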

Using a SOM

We will use Weka to demonstrate a SOM. However, SOM support does not come installed with standard Weka. Instead, we will need to download a set of Weka classification algorithms from https://sourceforge.net/projects/wekaclassalgos/files/ and the actual SOM class from http://www.cis.hut.fi/research/som_pak/. The classification algorithms include support for LVQ. More details about the classification algorithms can be found at http://wekaclassalgos.sourceforge.net/.

To use the SOM class, called SelfOrganizingMap, the source code needs to be in your project. The Javadoc for this class is found at http://jsalatas.ictpro.gr/weka/doc/SelfOrganizingMap/.

We start with the creation of an instance of the SelfOrganizingMap class. This is followed by code to read in data and create an Instances object to hold the data. In this example, we will use the iris.arff file, which can be found in the Weka data directory. Notice that once the Instances object is created we do not specify the class index as we did with previous Weka examples since SOM uses unsupervised learning:

SelfOrganizingMap som = new SelfOrganizingMap(); 
String trainingFileName = "iris.arff"; 
try (FileReader trainingReader =  
        new FileReader(trainingFileName)) { 
    Instances trainingInstances = new Instances(trainingReader); 
    ... 
} catch (IOException ex) { 
    // Handle exceptions 
} catch (Exception ex) {
    // Handle exceptions
}

The buildClusterer method will execute the SOM algorithm using the training dataset:

    som.buildClusterer(trainingInstances); 

Displaying the SOM results

We can now display the results of the operation as follows:

    out.println(som.toString()); 

The iris dataset uses five attributes: sepallength, sepalwidth, petallength, petalwidth, and class. The first four attributes are numeric and the fifth has three possible values: Iris-setosa, Iris-versicolor, and Iris-virginica. The first part of the abbreviated output that follows identifies four clusters and the number of instances in each cluster. This is followed by statistics for each of the attributes:

Self Organized Map
==================
Number of clusters: 4
Cluster
Attribute 0 1 2 3
(50) (42) (29) (29)
==============================================
sepallength
value 5.0036 6.2365 5.5823 6.9513
min 4.3 5.6 4.9 6.2
max 5.8 7 6.3 7.9
mean 5.006 6.25 5.5828 6.9586
std. dev. 0.3525 0.3536 0.3675 0.5046
...
class
value 0 1.5048 1.0787 2
min 0 1 1 2
max 0 2 2 2
mean 0 1.4524 1.069 2
std. dev. 0 0.5038 0.2579 0

These statistics can provide insight into the dataset. If we are interested in determining which dataset instances are found in a cluster, we can use the getClusterInstances method, which returns an array that groups the instances by cluster. As shown next, this method is used to list the instances by cluster:

Instances[] clusters = som.getClusterInstances(); 
int index = 0; 
for (Instances instances : clusters) { 
    out.println("-------Custer " + index); 
    for (Instance instance : instances) { 
        out.println(instance); 
    } 
    out.println(); 
    index++; 
} 

As we can see with the abbreviated output of this sequence, different iris classes are grouped into the different clusters:

-------Cluster 0
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
...
5.3,3.7,1.5,0.2,Iris-setosa
5,3.3,1.4,0.2,Iris-setosa
-------Cluster 1
7,3.2,4.7,1.4,Iris-versicolor
6.4,3.2,4.5,1.5,Iris-versicolor
6.9,3.1,4.9,1.5,Iris-versicolor
...
6.5,3,5.2,2,Iris-virginica
5.9,3,5.1,1.8,Iris-virginica

-------Cluster 2
5.5,2.3,4,1.3,Iris-versicolor
5.7,2.8,4.5,1.3,Iris-versicolor
4.9,2.4,3.3,1,Iris-versicolor
...
4.9,2.5,4.5,1.7,Iris-virginica
6,2.2,5,1.5,Iris-virginica

-------Cluster 3
6.3,3.3,6,2.5,Iris-virginica
7.1,3,5.9,2.1,Iris-virginica
6.5,3,5.8,2.2,Iris-virginica
...

The cluster results can also be displayed visually using the Weka GUI interface; for example, the Weka Workbench can be used to analyze and visualize the results of the SOM analysis.


An individual section of the resulting graph can then be selected, customized, and analyzed.


Note that before you can use the SOM class, the WekaPackageManager must be used to add the SOM package to Weka. This process is discussed at https://weka.wikispaces.com/How+do+I+use+the+package+manager%3F.

If a new instance needs to be mapped to a cluster, the distributionForInstance method can be used, as illustrated in the Predicting other values section.
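
A minimal sketch of this mapping, assuming the SelfOrganizingMap class exposes distributionForInstance in the same way, might look like the following; here the first training instance simply stands in for the new data:

Instance sample = trainingInstances.instance(0);
double[] membership = som.distributionForInstance(sample);
// The cluster with the highest membership value is the one the instance maps to
int bestCluster = 0;
for (int c = 1; c < membership.length; c++) {
    if (membership[c] > membership[bestCluster]) {
        bestCluster = c;
    }
}
out.println("Instance assigned to cluster " + bestCluster);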
