Understanding static neural networks

Static neural networks are ANNs that undergo a training or learning phase and then do not change when they are used. They differ from dynamic neural networks, which learn constantly and may undergo structural changes after the initial training period. Static neural networks are useful when a model's results need to be reproducible and predictable. We will look at dynamic neural networks in a moment, but we will begin by creating our own basic static neural network.

A basic Java example

Before we examine various libraries and tools available for constructing neural networks, we will implement our own basic neural network using standard Java libraries. The next example is an adaptation of work done by Jeff Heaton (http://www.informit.com/articles/article.aspx?p=30596). We will construct a feed-forward backpropagation neural network and train it to recognize the XOR operator pattern. Here is the basic truth table for XOR:

X    Y    Result
0    0    0
0    1    1
1    0    1
1    1    0
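
As a quick check of the truth table, the following minimal sketch reproduces it with Java's built-in bitwise XOR operator (^). The class name XorTruthTable and the xor helper are illustrative names only and are not part of the network we build in this section.

```java
// Prints the XOR truth table using Java's bitwise XOR operator (^).
// XorTruthTable and xor are hypothetical names used only for this illustration.
final class XorTruthTable {

    // Returns x XOR y; intended for inputs that are 0 or 1.
    static int xor(int x, int y) {
        return x ^ y;
    }

    public static void main(String[] args) {
        System.out.println("X Y Result");
        for (int x = 0; x <= 1; x++) {
            for (int y = 0; y <= 1; y++) {
                System.out.println(x + " " + y + " " + xor(x, y));
            }
        }
    }
}
```

The network we train below has to learn this mapping from examples rather than compute it directly.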

This network needs only two input neurons and one output neuron corresponding to the X and Y input and the result. The number of input and output neurons needed for models is dependent upon the problem at hand. The number of hidden neurons is often the sum of the number of input and output neurons, but the exact number may need to be changed as training progresses.

Next, we demonstrate how to create and train the network. We first provide the network with an input and observe the output. The output is compared to the expected output, and the weight matrix, called resultsMatrix in our code, is adjusted; the weightChanges array stores the most recent adjustment applied to each weight. Each adjustment is intended to move the subsequent output closer to the expected output. This process is repeated until we are satisfied that the network produces results close enough to the expected output. In this example, we present the input and output as arrays of doubles where each input or output neuron is an element of the array.

Note

The input and output are sometimes referred to as patterns.

First, we will create a SampleNeuralNetwork class to implement the network. Begin by adding the following instance variables to the class. We will discuss and demonstrate their purposes later in this section:

   double errors; 
   int inputNeurons; 
   int outputNeurons; 
   int hiddenNeurons; 
   int totalNeurons; 
   int weights; 
   double learningRate; 
   double outputResults[]; 
   double resultsMatrix[]; 
   double lastErrors[]; 
   double changes[]; 
   double thresholds[]; 
   double weightChanges[]; 
   double allThresholds[]; 
   double threshChanges[]; 
   double momentum; 
   double errorChanges[]; 

Next, let's take a look at our constructor. We have five parameters: the number of inputs to our network, the number of neurons in the hidden layer, the number of output neurons, the learning rate, and the momentum. The learningRate is a parameter that specifies the magnitude of changes in weight and bias during the training process. The momentum parameter specifies what fraction of the previous weight change is added to the current weight change. It helps the network avoid getting stuck at local minima or saddle points. A high momentum increases the speed of convergence in a system, but can lead to an unstable system if it is too high. Both the momentum and learning rate should be values between 0 and 1:

public SampleNeuralNetwork(int inputCount, 
         int hiddenCount, 
         int outputCount, 
         double learnRate, 
         double momentum) { 
   ...
} 
 

Within our constructor we initialize the instance variables. Notice that totalNeurons is set to the sum of all input, hidden, and output neurons. This sum is then used to size several of the arrays. Also notice that the weights variable is calculated by finding the product of the number of inputs and hidden neurons, the product of the hidden neurons and the outputs, and adding these two products together. This value is then used to create new arrays of length weights:

     learningRate = learnRate; 
     // "this." is required because the parameter shadows the field
     this.momentum = momentum; 
 
     inputNeurons = inputCount; 
     hiddenNeurons = hiddenCount; 
     outputNeurons = outputCount; 
     totalNeurons = inputCount + hiddenCount + outputCount; 
     weights = (inputCount * hiddenCount)  
        + (hiddenCount * outputCount); 
 
     outputResults    = new double[totalNeurons]; 
     resultsMatrix   = new double[weights]; 
     weightChanges = new double[weights]; 
     thresholds = new double[totalNeurons]; 
     errorChanges = new double[totalNeurons]; 
     lastErrors    = new double[totalNeurons]; 
     allThresholds = new double[totalNeurons]; 
     changes = new double[weights]; 
     threshChanges = new double[totalNeurons]; 
     reset(); 
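
To make the array sizes concrete, here is a minimal sketch of the same arithmetic for the 2-3-1 network used later in this section. The class NetworkSizing and its method names are hypothetical helpers introduced only for illustration; the formulas are the ones used in the constructor.

```java
// Reproduces the sizing arithmetic from the constructor.
// NetworkSizing and its methods are hypothetical names for this illustration.
final class NetworkSizing {

    // One slot per neuron, across all three layers.
    static int totalNeurons(int inputCount, int hiddenCount, int outputCount) {
        return inputCount + hiddenCount + outputCount;
    }

    // One weight per input-to-hidden connection plus one per hidden-to-output connection.
    static int weightCount(int inputCount, int hiddenCount, int outputCount) {
        return (inputCount * hiddenCount) + (hiddenCount * outputCount);
    }

    public static void main(String[] args) {
        // 2 inputs, 3 hidden neurons, 1 output
        System.out.println("totalNeurons = " + totalNeurons(2, 3, 1)); // prints totalNeurons = 6
        System.out.println("weights = " + weightCount(2, 3, 1));       // prints weights = 9
    }
}
```

Arrays indexed by neuron therefore have six elements, and arrays indexed by connection have nine.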
   

Notice that we call the reset method at the end of the constructor. This method resets the network to begin training with a random weight matrix. It initializes the thresholds and resultsMatrix arrays to random values; because Math.random() returns a value from 0 up to, but not including, 1, each initial value lies between -0.5 and 0.5. It also ensures that all arrays used for tracking changes are set back to zero. Starting from random values means that each run can produce different results:

public void reset() { 
   int loc; 
   for (loc = 0; loc < totalNeurons; loc++) { 
         thresholds[loc] = 0.5 - (Math.random()); 
         threshChanges[loc] = 0; 
         allThresholds[loc] = 0; 
   } 
   for (loc = 0; loc < resultsMatrix.length; loc++) { 
         resultsMatrix[loc] = 0.5 - (Math.random()); 
         weightChanges[loc] = 0; 
         changes[loc] = 0; 
   } 
} 

We also need a method called threshold. This is the activation function of each neuron: it takes the weighted sum of a neuron's inputs plus its threshold (bias) value and squashes it into the range between 0 and 1 using the sigmoid function. Large positive sums produce outputs near 1, large negative sums produce outputs near 0, and a sum of 0 produces 0.5. This method will be used in subsequent methods to calculate the output of individual neurons:

public double threshold(double sum) { 
   return 1.0 / (1 + Math.exp(-1.0 * sum)); 
} 
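
The threshold method computes 1 / (1 + e^-sum), which is the logistic sigmoid. The short sketch below evaluates it at a few points and also shows the identity out * (1 - out) for its derivative, which is the factor that appears in the error calculations later in this section. The class SigmoidDemo and its method names are hypothetical and used only for illustration.

```java
// Evaluates the logistic sigmoid used by the threshold method at a few points.
// SigmoidDemo and its method names are hypothetical names for this illustration.
final class SigmoidDemo {

    // Same formula as the threshold method: 1 / (1 + e^(-sum)).
    static double sigmoid(double sum) {
        return 1.0 / (1.0 + Math.exp(-sum));
    }

    // Derivative of the sigmoid, expressed in terms of its output value.
    static double derivativeFromOutput(double out) {
        return out * (1.0 - out);
    }

    public static void main(String[] args) {
        double[] sums = {-4.0, 0.0, 4.0};
        for (double sum : sums) {
            double out = sigmoid(sum);
            System.out.println("sum=" + sum + " output=" + out
                    + " slope=" + derivativeFromOutput(out));
        }
    }
}
```

The slope is largest when the output is 0.5 and shrinks toward zero as the output approaches 0 or 1, so neurons that are already saturated receive smaller corrections.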
 

Next, we will add a method to calculate the output using a given set of inputs. Both our input parameter and the data returned by the method are arrays of double values. First, we need two position variables to use in our loops, loc and pos. We also want to keep track of our position within arrays based upon the number of input and hidden neurons. The index for our hidden neurons will start after our input neurons, so its position is the same as the number of input neurons. The position of our output neurons is the sum of our input neurons and hidden neurons. We also need to initialize our outputResults array:

public double[] calcOutput(double input[]) { 
   int loc, pos; 
   final int hiddenIndex = inputNeurons; 
   final int outIndex = inputNeurons + hiddenNeurons; 
 
   for (loc = 0; loc < inputNeurons; loc++) { 
         outputResults[loc] = input[loc]; 
   } 
... 
} 

Then we calculate the outputs of the hidden layer from the values of our input neurons. Each hidden neuron starts with its threshold value and adds each input multiplied by the corresponding weight. Notice our use of the threshold method within this section. Before we can place our sum in the outputResults array, we need to pass it through the threshold method:

   int rLoc = 0; 
   for (loc = hiddenIndex; loc < outIndex; loc++) { 
         double sum = thresholds[loc]; 
         for (pos = 0; pos < inputNeurons; pos++) { 
               sum += outputResults[pos] * resultsMatrix[rLoc++]; 
         } 
         outputResults[loc] = threshold(sum); 
   } 

Now we take into account our hidden neurons. Notice the process is similar to the previous section, but we are calculating the outputs of the output layer from the hidden layer rather than the outputs of the hidden layer from the input layer. At the end, we return our result. This result is in the form of an array of doubles containing the values of each output neuron. In our example, there is only one output neuron:

 
   double result[] = new double[outputNeurons]; 
   for (loc = outIndex; loc < totalNeurons; loc++) { 
         double sum = thresholds[loc]; 
 
         for (pos = hiddenIndex; pos < outIndex; pos++) { 
               sum += outputResults[pos] * resultsMatrix[rLoc++]; 
         } 
         outputResults[loc] = threshold(sum); 
         result[loc-outIndex] = outputResults[loc]; 
   } 
 
   return result; 
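
Because rLoc is simply incremented as the loops run, the weights are stored in one flat array: first all input-to-hidden weights, grouped by hidden neuron, then all hidden-to-output weights, grouped by output neuron. The following sketch spells out that layout as index formulas. The class WeightLayout and its methods are hypothetical helpers and do not appear in SampleNeuralNetwork; all positions are zero-based within their own layer.

```java
// Index formulas for the flat weight array filled in the order used by calcOutput.
// WeightLayout and its methods are hypothetical names for this illustration.
final class WeightLayout {

    // Index of the weight from input neuron `input` to hidden neuron `hidden`.
    static int inputToHidden(int inputNeurons, int hidden, int input) {
        return hidden * inputNeurons + input;
    }

    // Index of the weight from hidden neuron `hidden` to output neuron `output`.
    // These weights start after all input-to-hidden weights.
    static int hiddenToOutput(int inputNeurons, int hiddenNeurons,
                              int output, int hidden) {
        return inputNeurons * hiddenNeurons + output * hiddenNeurons + hidden;
    }

    public static void main(String[] args) {
        // 2-3-1 network: indexes 0..5 are input-to-hidden, 6..8 are hidden-to-output.
        System.out.println(inputToHidden(2, 2, 1));     // prints 5
        System.out.println(hiddenToOutput(2, 3, 0, 0)); // prints 6
    }
}
```

This is why the error calculation later starts its index at inputNeurons * hiddenNeurons when it walks the hidden-to-output weights.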
 

It is quite likely that the output does not match the expected output, given our XOR table. To handle this, we use error calculation methods to adjust the weights of our network to produce better output. The first method we will discuss is the calcError method. This method will be called every time a set of outputs is returned by the calcOutput method. It does not return data, but rather accumulates the pending weight changes in the changes array and the pending threshold changes in the allThresholds array; the weights themselves are updated later by the train method. The method takes an array of doubles representing the ideal value for each output neuron. Notice we begin as we did in the calcOutput method and set up indexes to use throughout the method. Then we clear out any existing hidden and output layer errors:

public void calcError(double ideal[]) { 
   int loc, pos; 
   final int hiddenIndex = inputNeurons; 
   final int outputIndex = inputNeurons + hiddenNeurons; 
 
      for (loc = inputNeurons; loc < totalNeurons; loc++) { 
            lastErrors[loc] = 0; 
      } 
 

Next we calculate the difference between our expected output and our actual output. This allows us to determine how to adjust the weights for further training. To do this, we loop through our arrays containing the expected outputs, ideal, and the actual outputs, outputResults. We also adjust our errors and change in errors in this section:

 
      for (loc = outputIndex; loc < totalNeurons; loc++) { 
         lastErrors[loc] = ideal[loc - outputIndex] -  
            outputResults[loc]; 
         errors += lastErrors[loc] * lastErrors[loc]; 
         errorChanges[loc] = lastErrors[loc] * outputResults[loc]
            *(1 - outputResults[loc]); 
     } 
 
     int locx = inputNeurons * hiddenNeurons; 
     for (loc = outputIndex; loc < totalNeurons; loc++) { 
           for (pos = hiddenIndex; pos < outputIndex; pos++) { 
                 changes[locx] += errorChanges[loc] *
                       outputResults[pos]; 
                 lastErrors[pos] += resultsMatrix[locx] *
                       errorChanges[loc]; 
                 locx++; 
           } 
           allThresholds[loc] += errorChanges[loc]; 
      } 
 

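Next we calculate and store the change in errors for each hidden neuron. The lastErrors array now holds the error propagated back to each hidden neuron from the output layer. We multiply it by the slope of the sigmoid at that neuron's output and store the result in the errorChanges array: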

for (loc = hiddenIndex; loc < outputIndex; loc++) { 
      errorChanges[loc] = lastErrors[loc] *outputResults[loc] 
            * (1 - outputResults[loc]); 
}

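Finally, we use the hidden layer's errorChanges values to accumulate the pending changes for the input-to-hidden weights in the changes array and the pending threshold changes for the hidden neurons in the allThresholds array. Tracking these changes is what allows the network to improve its ability to produce correct output: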

 
   locx = 0;  
   for (loc = hiddenIndex; loc < outputIndex; loc++) { 
         for (pos = 0; pos < hiddenIndex; pos++) { 
               changes[locx] += errorChanges[loc] *  
                     outputResults[pos]; 
               lastErrors[pos] += resultsMatrix[locx] *  
                     errorChanges[loc]; 
               locx++; 
         } 
         allThresholds[loc] += errorChanges[loc]; 
   } 
} 
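
The arithmetic in calcError can be traced by hand for a single hidden neuron feeding a single output neuron. The following sketch uses the same two formulas as the method; the class DeltaDemo, its method names, and the sample values are assumptions chosen only to keep the numbers easy to follow.

```java
// Traces the two error formulas from calcError for one hidden and one output neuron.
// DeltaDemo, its methods, and the sample values are hypothetical.
final class DeltaDemo {

    // Output layer: (ideal - actual) scaled by the sigmoid slope at the actual output.
    static double outputDelta(double ideal, double actual) {
        return (ideal - actual) * actual * (1.0 - actual);
    }

    // Hidden layer: the output delta is propagated back through the connecting
    // weight, then scaled by the sigmoid slope at the hidden neuron's output.
    static double hiddenDelta(double weightToOutput, double outputDelta,
                              double hiddenOutput) {
        double propagatedError = weightToOutput * outputDelta;
        return propagatedError * hiddenOutput * (1.0 - hiddenOutput);
    }

    public static void main(String[] args) {
        double outDelta = outputDelta(1.0, 0.5);           // 0.5 * 0.5 * 0.5
        double hidDelta = hiddenDelta(0.5, outDelta, 0.5); // 0.0625 * 0.5 * 0.5
        System.out.println("output delta = " + outDelta);  // prints output delta = 0.125
        System.out.println("hidden delta = " + hidDelta);  // prints hidden delta = 0.015625
    }
}
```

With more than one output neuron, the propagated error for a hidden neuron is the sum of these weight-times-delta products over all output neurons, which is what the += on lastErrors accomplishes.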

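We have one other method used for calculating errors in our network. The getError method calculates the root mean square (RMS) error over our entire set of training data, using the squared errors accumulated by calcError. This gives us a single measure of the typical error per output for the data. The parameter len is the number of training patterns presented since the last call, and the accumulated total is reset to zero each time the method is called: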

public double getError(int len) { 
   double err = Math.sqrt(errors / (len * outputNeurons)); 
   errors = 0; 
   return err; 
} 
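
As a worked example of the formula in getError, suppose each of the four XOR patterns produced an output that was off by 0.5. The sketch below computes the RMS error for that case. The class RmsErrorDemo and the sample error values are hypothetical; the array holds one entry per pattern and output neuron, so its length plays the role of len * outputNeurons.

```java
// Computes a root mean square error the same way getError does.
// RmsErrorDemo and the sample error values are hypothetical.
final class RmsErrorDemo {

    // `errors` holds (ideal - actual) for every pattern and output neuron.
    static double rms(double[] errors) {
        double sumOfSquares = 0.0;
        for (double e : errors) {
            sumOfSquares += e * e;
        }
        return Math.sqrt(sumOfSquares / errors.length);
    }

    public static void main(String[] args) {
        // Four patterns, one output neuron, each off by 0.5 in either direction.
        double[] errors = {0.5, -0.5, 0.5, -0.5};
        System.out.println(rms(errors)); // prints 0.5
    }
}
```

Squaring the errors keeps positive and negative differences from cancelling each other out.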
 

Now that we can initialize our network, compute outputs, and calculate errors, we are ready to train our network. We accomplish this through the use of the train method. This method makes adjustments first to the weights based upon the errors calculated in the previous method, and then adjusts the thresholds:

public void train() { 
   int loc; 
   for (loc = 0; loc < resultsMatrix.length; loc++) { 
      weightChanges[loc] = (learningRate * changes[loc]) +  
         (momentum * weightChanges[loc]); 
      resultsMatrix[loc] += weightChanges[loc]; 
      changes[loc] = 0; 
   } 
   for (loc = inputNeurons; loc < totalNeurons; loc++) { 
      threshChanges[loc] = learningRate * allThresholds[loc] +  
         (momentum * threshChanges[loc]); 
      thresholds[loc] += threshChanges[loc]; 
      allThresholds[loc] = 0; 
   } 
} 
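
To see the effect of momentum in the train method, consider a single weight that receives the same accumulated change on three consecutive updates. The sketch below applies the same update rule; the class MomentumDemo, the parameter names, and the sample values are assumptions made for illustration.

```java
// Applies the update rule from the train method to a single weight.
// MomentumDemo, the parameter names, and the sample values are hypothetical.
final class MomentumDemo {

    // New change = learning rate * accumulated change + momentum * previous change.
    static double nextChange(double learningRate, double accumulatedChange,
                             double momentum, double previousChange) {
        return (learningRate * accumulatedChange) + (momentum * previousChange);
    }

    public static void main(String[] args) {
        double weight = 0.0;
        double previousChange = 0.0;
        for (int step = 1; step <= 3; step++) {
            // The same accumulated change (1.0) arrives on every step.
            previousChange = nextChange(0.5, 1.0, 0.5, previousChange);
            weight += previousChange;
            System.out.println("step " + step + ": change=" + previousChange
                    + " weight=" + weight);
        }
    }
}
```

The change grows from 0.5 to 0.75 to 0.875 even though the accumulated change is constant, which is how momentum speeds up movement in a consistent direction. It is also why a momentum value that is too high can cause the weights to overshoot.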

Finally, we can create a new class to test our neural network. Within the main method of another class, add the following code to represent the XOR problem:

double xorIN[][] ={ 
               {0.0,0.0}, 
               {1.0,0.0}, 
               {0.0,1.0}, 
               {1.0,1.0}}; 
 
double xorEXPECTED[][] = { {0.0},{1.0},{1.0},{0.0}}; 

Next we want to create our new SampleNeuralNetwork object. In the following example, we have two input neurons, three hidden neurons, one output neuron (the XOR result), a learning rate of 0.7, and a momentum of 0.9. The number of hidden neurons is often best determined by trial and error. In subsequent executions, consider adjusting the values in this constructor and examining the difference in results:

SampleNeuralNetwork network = new  
                SampleNeuralNetwork(2,3,1,0.7,0.9); 

Note

The learning rate and momentum should usually fall between zero and one.

We then repeatedly call our calcOutput, calcError, and train methods, in that order. This allows us to test our output, calculate the error rate, adjust our network weights, and then try again. Our network should display increasingly accurate results:

 
for (int runCnt=0;runCnt<10000;runCnt++) { 
   for (int loc=0;loc<xorIN.length;loc++) { 
         network.calcOutput(xorIN[loc]); 
         network.calcError(xorEXPECTED[loc]); 
         network.train(); 
   } 
   System.out.println("Trial #" + runCnt + ",Error:" +  
               network.getError(xorIN.length)); 
} 

Execute the application and notice that the error changes with each pass of the outer loop. The acceptable error will depend upon the particular network and its purpose. The following is some sample output from the preceding code. For brevity, only the first five trials and trials 994 through 999 are shown. Notice that the RMS error is initially above 0.5, but falls to about 0.014 by trial 999. Because the weights are initialized randomly, your values will differ:

Trial #0,Error:0.5338334002845255
Trial #1,Error:0.5233475199946769
Trial #2,Error:0.5229843653785426
Trial #3,Error:0.5226263062497853
Trial #4,Error:0.5226916275713371
...
Trial #994,Error:0.014457034704806316
Trial #995,Error:0.01444865096401158
Trial #996,Error:0.01444028142777395
Trial #997,Error:0.014431926056394229
Trial #998,Error:0.01442358481032747
Trial #999,Error:0.014415257650182488

In this example, we have used a small-scale problem and we were able to train our network rather quickly. In a larger-scale problem, we would start with a training set of data and then use additional datasets for further analysis. Because there are only four possible input patterns in this scenario, we will not test it with any additional data.

This example demonstrates some of the inner workings of a neural network, including details about how errors and output can be calculated. By exploring a relatively simple problem we are able to examine the mechanics of a neural network. In our next examples, however, we will use tools that hide these details from us, but allow us to conduct robust analysis.
