Dynamic neural networks differ from static networks in that they continue learning after the initial training phase: they can adjust their structure independently of external modification. A feedforward neural network (FNN) is one of the earliest and simplest network architectures. As its name implies, it only feeds information forward and does not form any cycles. This type of network formed the foundation for much of the later work in dynamic ANNs. In this section, we will show in-depth examples of two types of dynamic networks: MLP networks and SOMs.
An MLP network is an FNN with multiple layers. The network uses supervised learning with backpropagation, in which the error is propagated back to earlier layers to adjust their weights. Some of the neurons use a nonlinear activation function, mimicking biological neurons. Every node in one layer is fully connected to the nodes of the following layer.
We will use a dataset called dermatology.arff, which can be downloaded from http://repository.seasr.org/Datasets/UCI/arff/. This dataset contains 366 instances used to diagnose erythemato-squamous diseases. It uses 34 attributes to classify the disease into one of six different categories. The following is a sample instance:
2,2,0,3,0,0,0,0,1,0,0,0,0,0,0,3,2,0,0,0,0,0,0,0,0,0,0,3,0,0,0,1,0,55,2
The last field represents the disease category. This dataset has been partitioned into two files: dermatologyTrainingSet.arff and dermatologyTestingSet.arff. The training set contains the first 80% (292 instances) of the original set and ends with line 456 of the original file. The testing set contains the last 20% (74 instances), lines 457 through 530 of the original file.
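The 80/20 partition described above can be sketched with a short helper. This ArffSplit class is purely illustrative (it is not part of Weka or this book's code); it splits the data rows of a file, in their original order, into training and testing portions:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ArffSplit {
    // Split the data rows (the lines after the @data header) into a
    // training portion and a testing portion, preserving the original order
    static List<List<String>> split(List<String> dataLines, double trainFraction) {
        int cut = (int) (dataLines.size() * trainFraction);
        return Arrays.asList(
            new ArrayList<>(dataLines.subList(0, cut)),
            new ArrayList<>(dataLines.subList(cut, dataLines.size())));
    }

    public static void main(String[] args) {
        // 366 placeholder rows stand in for the dermatology instances
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 366; i++) {
            rows.add("row " + i);
        }
        List<List<String>> parts = split(rows, 0.8);
        // Prints 292 and 74, matching the partition described above
        System.out.println(parts.get(0).size() + " " + parts.get(1).size());
    }
}
```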
Before we can make any predictions, we must train the model on a representative set of data. We will use the Weka class MultilayerPerceptron for training and, eventually, to make predictions. First, we declare strings for the training and testing filenames and the corresponding FileReader instances for them. The Instances objects are created and the last field is specified as the field to use for classification:
String trainingFileName = "dermatologyTrainingSet.arff";
String testingFileName = "dermatologyTestingSet.arff";
try (FileReader trainingReader = new FileReader(trainingFileName);
        FileReader testingReader = new FileReader(testingFileName)) {
    Instances trainingInstances = new Instances(trainingReader);
    trainingInstances.setClassIndex(trainingInstances.numAttributes() - 1);
    Instances testingInstances = new Instances(testingReader);
    testingInstances.setClassIndex(testingInstances.numAttributes() - 1);
    ...
} catch (Exception ex) {
    // Handle exceptions
}
An instance of the MultilayerPerceptron class is then created:
MultilayerPerceptron mlp = new MultilayerPerceptron();
There are several model parameters that we can set, as shown here:

Parameter     | Method          | Description
--------------|-----------------|------------------------------------------------------
Learning rate | setLearningRate | Affects the training speed
Momentum      | setMomentum     | Affects the training speed
Training time | setTrainingTime | The number of training epochs used to train the model
Hidden layers | setHiddenLayers | The number of hidden layers and perceptrons to use
As mentioned previously, the learning rate affects the speed at which the model is trained. A larger value can increase the training speed, while a learning rate that is too small can make training take too long. If the learning rate is too large, the model may step past a local minimum and diverge; that is, if each step is too large, we might skip over a meaningful value. You can think of this as a graph where a small dip in the plot along the Y axis is missed because the X value was incremented too much.
Momentum also affects the training speed, effectively increasing the rate of learning. It is used in addition to the learning rate to add momentum to the search for the optimal value. When the search falls into a local minimum, the momentum helps it escape in its quest for a global minimum.
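To make the interplay concrete, the following sketch shows how a learning rate and a momentum term typically combine in a gradient-descent weight update. This is an illustrative formula with made-up gradient values, not Weka's internal implementation:

```java
public class MomentumDemo {
    // One gradient-descent update with momentum:
    //   delta = -learningRate * gradient + momentum * previousDelta
    static double step(double learningRate, double momentum,
                       double gradient, double previousDelta) {
        return -learningRate * gradient + momentum * previousDelta;
    }

    public static void main(String[] args) {
        double learningRate = 0.1;  // the same values used later in this example
        double momentum = 0.2;
        double weight = 0.5;        // a single, made-up weight
        double previousDelta = 0.0;
        // Made-up error gradients for four successive updates
        double[] gradients = {0.8, 0.6, 0.4, 0.2};
        for (double gradient : gradients) {
            double delta = step(learningRate, momentum, gradient, previousDelta);
            weight += delta;
            previousDelta = delta;
            System.out.println("delta = " + delta + ", weight = " + weight);
        }
    }
}
```

Because each update carries a fraction of the previous update, consecutive steps in the same direction accumulate, which is what helps the search roll out of shallow local minima.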
The model learns iteratively, and the term epoch refers to one of these training iterations. Ideally, the total error encountered with each epoch decreases to a point where further epochs are no longer useful. It is best to avoid too many epochs, as they waste time and can lead to overfitting.
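The idea of stopping once additional epochs no longer reduce the error can be sketched as follows. This is an illustrative early-stopping check using simulated error values, not how MultilayerPerceptron decides internally:

```java
public class EpochLoop {
    // Return the epoch at which the error improvement between
    // consecutive epochs first falls below a tolerance
    static int epochsNeeded(double[] errorPerEpoch, double tolerance) {
        for (int epoch = 1; epoch < errorPerEpoch.length; epoch++) {
            if (errorPerEpoch[epoch - 1] - errorPerEpoch[epoch] < tolerance) {
                return epoch;
            }
        }
        return errorPerEpoch.length;
    }

    public static void main(String[] args) {
        // Simulated training errors: big improvements early, then a plateau
        double[] errors = {1.0, 0.5, 0.3, 0.29, 0.289};
        System.out.println(epochsNeeded(errors, 0.05)); // prints 3
    }
}
```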
A neural network will have one or more hidden layers, each with a specific number of perceptrons. The setHiddenLayers method specifies the number of layers and perceptrons using a string. For example, "3,5" would specify two hidden layers with three and five perceptrons per layer, respectively.
For this example, we will use the following values:
mlp.setLearningRate(0.1);
mlp.setMomentum(0.2);
mlp.setTrainingTime(2000);
mlp.setHiddenLayers("3");
The buildClassifier method uses the training data to build the model:
mlp.buildClassifier(trainingInstances);
The next step is to evaluate the model. The Evaluation class is used for this purpose. Its constructor takes the training set as input, and the evaluateModel method performs the actual evaluation. The following code illustrates this using the testing dataset:
Evaluation evaluation = new Evaluation(trainingInstances);
evaluation.evaluateModel(mlp, testingInstances);
One simple way of displaying the results of the evaluation is to use the toSummaryString method:
System.out.println(evaluation.toSummaryString());
This will display the following output:
Correctly Classified Instances          73               98.6486 %
Incorrectly Classified Instances         1                1.3514 %
Kappa statistic                          0.9824
Mean absolute error                      0.0177
Root mean squared error                  0.076
Relative absolute error                  6.6173 %
Root relative squared error             20.7173 %
Coverage of cases (0.95 level)          98.6486 %
Mean rel. region size (0.95 level)      18.018  %
Total Number of Instances               74
Frequently, it will be necessary to experiment with these parameters, such as the number of perceptrons in the hidden layer, to get the best results.
Once we have a trained model, we can use it to evaluate other data. In the previous testing dataset, one instance failed. In the following code sequence, this instance is identified and the predicted and actual results are displayed.
Each instance of the testing dataset is used as input to the classifyInstance method, which tries to predict the correct result. This result is compared to the last field of the instance, which contains the actual value. For mismatches, the predicted and actual values are displayed:
for (int i = 0; i < testingInstances.numInstances(); i++) {
    double result = mlp.classifyInstance(testingInstances.instance(i));
    if (result != testingInstances.instance(i)
            .value(testingInstances.numAttributes() - 1)) {
        out.println("Classify result: " + result
            + " Correct: " + testingInstances.instance(i)
                .value(testingInstances.numAttributes() - 1));
        ...
    }
}
For the testing set we get the following output:
Classify result: 1.0 Correct: 3.0
We can get the likelihood of the prediction being correct using the MultilayerPerceptron class's distributionForInstance method. Place the following code into the previous loop. It captures the incorrect instance, which is easier than instantiating an instance based on the 34 attributes used by the dataset. The distributionForInstance method takes this instance and returns an array of doubles holding the predicted probability for each class value; here we display the first two entries:
Instance incorrectInstance = testingInstances.instance(i);
incorrectInstance.setDataset(trainingInstances);
double[] distribution = mlp.distributionForInstance(incorrectInstance);
out.println("Probability of being positive: " + distribution[0]);
out.println("Probability of being negative: " + distribution[1]);
The output for this instance is as follows:
Probability of being positive: 0.00350515156929017
Probability of being negative: 0.9683660500711128
This can provide a more quantitative feel for the reliability of the prediction.
We can also save and retrieve a model for later use. To save the model, build it and then use the SerializationHelper class's static write method, as shown in the following code snippet. The first argument is the name of the file to hold the model:
SerializationHelper.write("mlpModel", mlp);
To retrieve the model, use the corresponding read method as shown here:
mlp = (MultilayerPerceptron)SerializationHelper.read("mlpModel");
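Under the hood, models saved this way rely on standard Java object serialization. The following stdlib-only sketch shows the equivalent write and read steps, using a string as a stand-in for a trained model; the file name is arbitrary:

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

public class ModelPersistence {
    // Write any Serializable object to a file
    static void write(String filename, Object model) throws IOException {
        try (ObjectOutputStream oos =
                 new ObjectOutputStream(new FileOutputStream(filename))) {
            oos.writeObject(model);
        }
    }

    // Read the object back; the caller casts it to the expected type
    static Object read(String filename)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois =
                 new ObjectInputStream(new FileInputStream(filename))) {
            return ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        write("demo.ser", "stand-in for a trained model");
        String restored = (String) read("demo.ser");
        System.out.println(restored);
    }
}
```

As with SerializationHelper.read, the retrieved object must be cast back to its concrete type before use.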
Next, we will learn how to use another useful neural network approach, SOMs.
Learning Vector Quantization (LVQ) is another special type of dynamic ANN. SOMs, which we will discuss in a moment, are closely related to LVQ networks. This type of network implements a competitive algorithm in which the winning neuron has its weights updated. These networks are used in many different applications and are considered more natural and intuitive than some other ANNs. In particular, LVQ is effective for the classification of text-based data.
The basic algorithm begins by setting the number of neurons, the weight for each neuron, how fast the neurons can learn, and a list of input vectors. In this context, a vector is similar to a vector in physics and represents the values presented to the input layer neurons. As the network is trained, a vector is used as input, a winning neuron is selected, and the winning neuron's weights are updated. The process is iterative and continues until a solution is found.
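The winner selection and weight update described above can be sketched in a few lines. This is an illustrative competitive-learning step, not the implementation used by the Weka classification algorithms:

```java
import java.util.Arrays;

public class CompetitiveStep {
    // Index of the neuron whose weight vector is closest
    // (by squared Euclidean distance) to the input vector
    static int winner(double[][] weights, double[] input) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int i = 0; i < weights.length; i++) {
            double dist = 0;
            for (int j = 0; j < input.length; j++) {
                double d = weights[i][j] - input[j];
                dist += d * d;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = i;
            }
        }
        return best;
    }

    // Move the winning neuron's weights toward the input
    static void update(double[] winnerWeights, double[] input,
                       double learningRate) {
        for (int j = 0; j < input.length; j++) {
            winnerWeights[j] += learningRate * (input[j] - winnerWeights[j]);
        }
    }

    public static void main(String[] args) {
        double[][] weights = {{0.0, 0.0}, {1.0, 1.0}};
        double[] input = {0.9, 0.8};
        int w = winner(weights, input);
        // Neuron 1 wins and its weights move halfway toward the input
        update(weights[w], input, 0.5);
        System.out.println(w + " " + Arrays.toString(weights[w]));
    }
}
```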
A SOM is a technique that takes multidimensional data and reduces it to one or two dimensions. This compression technique is called vector quantization. The technique usually involves a visual component that allows a human to better see how the data has been categorized. A SOM learns without supervision.
The SOM is good for finding clusters, which is not to be confused with classification. With classification we are interested in finding the best fit for a data instance among predefined categories. With clustering we are interested in grouping instances where the categories are unknown.
A SOM uses a lattice of neurons, usually a two-dimensional array or a hexagonal grid, representing neurons that are assigned weights. The input sources are connected to each of these neurons. The technique then adjusts the weights assigned to each lattice member through several iterations until the best fit is found. When finished, the lattice members will have grouped the input dataset into categories. The SOM results can be viewed to identify categories and map new input to one of the identified categories.
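What distinguishes a SOM from plain competitive learning is that neurons near the winner on the lattice are also updated, scaled by a neighborhood function of lattice distance. A Gaussian is a common choice; the following is an illustrative sketch, not the SelfOrganizingMap implementation:

```java
public class SomNeighborhood {
    // Gaussian neighborhood: neurons close to the winner on the lattice
    // receive larger updates than distant ones
    static double influence(int winnerRow, int winnerCol,
                            int row, int col, double radius) {
        double latticeDistSq = (winnerRow - row) * (winnerRow - row)
                             + (winnerCol - col) * (winnerCol - col);
        return Math.exp(-latticeDistSq / (2 * radius * radius));
    }

    public static void main(String[] args) {
        // The winner itself gets full influence
        System.out.println(influence(1, 1, 1, 1, 1.0)); // prints 1.0
        // An adjacent lattice neighbor gets less
        System.out.println(influence(1, 1, 1, 2, 1.0));
        // A distant neuron gets almost none
        System.out.println(influence(1, 1, 3, 3, 1.0));
    }
}
```

In practice the radius is also shrunk over the course of training, so early epochs organize the whole lattice while later epochs fine-tune individual neurons.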
We will use Weka to demonstrate SOMs. However, SOM support does not come installed with standard Weka. Instead, we will need to download a set of Weka classification algorithms from https://sourceforge.net/projects/wekaclassalgos/files/ and the actual SOM class from http://www.cis.hut.fi/research/som_pak/. The classification algorithms include support for LVQ. More details about the classification algorithms can be found at http://wekaclassalgos.sourceforge.net/.
To use the SOM class, called SelfOrganizingMap, the source code needs to be in your project. The Javadoc for this class is found at http://jsalatas.ictpro.gr/weka/doc/SelfOrganizingMap/.
We start by creating an instance of the SelfOrganizingMap class. This is followed by code to read in the data and create an Instances object to hold it. In this example, we will use the iris.arff file, which can be found in the Weka data directory. Notice that once the Instances object is created, we do not specify the class index as we did in previous Weka examples, since SOM uses unsupervised learning:
SelfOrganizingMap som = new SelfOrganizingMap();
String trainingFileName = "iris.arff";
try (FileReader trainingReader = new FileReader(trainingFileName)) {
    Instances trainingInstances = new Instances(trainingReader);
    ...
} catch (IOException ex) {
    // Handle exceptions
} catch (Exception ex) {
    // Handle exceptions
}
The buildClusterer method will execute the SOM algorithm using the training dataset:
som.buildClusterer(trainingInstances);
We can now display the results of the operation as follows:
out.println(som.toString());
The iris dataset uses five attributes: sepallength, sepalwidth, petallength, petalwidth, and class. The first four attributes are numeric, and the fifth has three possible values: Iris-setosa, Iris-versicolor, and Iris-virginica. The first part of the abbreviated output that follows identifies four clusters and the number of instances in each cluster. This is followed by statistics for each of the attributes:
Self Organized Map
==================

Number of clusters: 4

                      Cluster
Attribute         0        1        2        3
               (50)     (42)     (29)     (29)
==============================================
sepallength
  value      5.0036   6.2365   5.5823   6.9513
  min           4.3      5.6      4.9      6.2
  max           5.8        7      6.3      7.9
  mean        5.006     6.25   5.5828   6.9586
  std. dev.  0.3525   0.3536   0.3675   0.5046
...
class
  value           0   1.5048   1.0787        2
  min             0        1        1        2
  max             0        2        2        2
  mean            0   1.4524    1.069        2
  std. dev.       0   0.5038   0.2579        0
These statistics can provide insight into the dataset. If we are interested in determining which dataset instances are found in a cluster, we can use the getClusterInstances method to return an array that groups the instances by cluster. As shown next, this method is used to list the instances by cluster:
Instances[] clusters = som.getClusterInstances();
int index = 0;
for (Instances instances : clusters) {
    out.println("-------Cluster " + index);
    for (Instance instance : instances) {
        out.println(instance);
    }
    out.println();
    index++;
}
As we can see from the abbreviated output of this sequence, different iris classes are grouped into different clusters:
-------Cluster 0
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
...
5.3,3.7,1.5,0.2,Iris-setosa
5,3.3,1.4,0.2,Iris-setosa
-------Cluster 1
7,3.2,4.7,1.4,Iris-versicolor
6.4,3.2,4.5,1.5,Iris-versicolor
6.9,3.1,4.9,1.5,Iris-versicolor
...
6.5,3,5.2,2,Iris-virginica
5.9,3,5.1,1.8,Iris-virginica
-------Cluster 2
5.5,2.3,4,1.3,Iris-versicolor
5.7,2.8,4.5,1.3,Iris-versicolor
4.9,2.4,3.3,1,Iris-versicolor
...
4.9,2.5,4.5,1.7,Iris-virginica
6,2.2,5,1.5,Iris-virginica
-------Cluster 3
6.3,3.3,6,2.5,Iris-virginica
7.1,3,5.9,2.1,Iris-virginica
6.5,3,5.8,2.2,Iris-virginica
...
The cluster results can be displayed visually using the Weka GUI interface. In the following screenshot, we have used the Weka Workbench to analyze and visualize the result of the SOM analysis:
An individual section of the graph can be selected, customized, and analyzed as follows:
However, before you can use the SOM class, the WekaPackageManager must be used to add the SOM package to Weka. This process is discussed at https://weka.wikispaces.com/How+do+I+use+the+package+manager%3F.
If a new instance needs to be mapped to a cluster, the distributionForInstance method can be used, as illustrated in the Predicting other values section.