Defining the network architecture

We just learned how to write and use a simple neural network to warn a user when they are getting too close to an object. As you look through the code, keep in mind that most of these values are adjusted internally as part of training. When using a neural network, it is important to understand these basic principles:

  • Activation function: If you are not using sigmoid, you will also need to work out the partial derivative of your activation function in order to use gradient descent with backpropagation (see the sketch after this list).
  • # Input neurons: This not only sets the complexity of the network, but also determines the number of neurons in the hidden or middle layer.
  • # Output neurons: How many outputs, or classifications, does your network need to produce?
  • # Hidden layers/neurons: As a good rule of thumb, use the average of the input and output neurons, that is, (input + output) / 2. We will apply this rule in our next example.
  • Training method: Our neural network supports two methods of training: by minimum error, or by epoch (a set number of iterations). Our preference is to train to a minimum error, as this better quantifies how well the model fits.
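
As a quick illustration of the first and fourth points, here is a minimal sketch of the sigmoid activation, the derivative that backpropagation needs, and the hidden-layer rule of thumb. These helper methods are illustrative only and are not part of the chapter's NeuralNet class:
double Sigmoid(double x)
{
    // Squash any input into the range 0 to 1.
    return 1.0 / (1.0 + System.Math.Exp(-x));
}

double SigmoidDerivative(double sigmoidOutput)
{
    // Backpropagation uses the derivative expressed via the output: s * (1 - s).
    return sigmoidOutput * (1.0 - sigmoidOutput);
}

int SuggestHiddenNeurons(int numInputs, int numOutputs)
{
    // Rule of thumb: the hidden layer is the average of the input and output neurons.
    return (numInputs + numOutputs) / 2;
}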

The source code download for this chapter includes an asset package containing a working example of our simple neural network used as an environment or object recognizer. Jump back to Unity and perform the following steps to set up this example:

Ensure that you save your existing project or download a new ARCore template before beginning. The asset import will overwrite your existing files, so you should make a backup before continuing if you want to keep any of your earlier work.
  1. From the menu, select Assets | Import Package | Custom Package. Use the file dialog to navigate to the Code/Chapter_8 folder of the book's downloaded source code and import Chapter_8_Final.unitypackage.
  2. Open the Main scene from the Assets/ARCoreML folder.
  3. Open the Build Settings dialog and ensure that the Main scene is added to the build and is active.
  4. Connect, build, and run. When you run the app, you will see two buttons at the top of the interface: one that says Train 0 and one that says Train 1.
  5. Point your device at an area you want the NN to recognize. Ensure that ARCore is identifying plenty of blue points on the screen, and then press the Train 1 button; this signals to the network that you want it to identify this feature set.
  6. Point the device at an area that you don't want the NN to recognize and press the Train 0 button; this reinforces to the network that you do not want it to recognize this area.
  7. While staying in place, continue this process. Repeatedly point your device at the area you want recognized and press Train 1. Likewise, do this for areas you don't want recognized, but make sure you press the Train 0 button. After 10 or so training presses, you should start hearing the warning beep whenever the NN recognizes your area.
  8. Hearing the warning tones is an indicator that your NN is starting to learn. Continue to spin around in place, training the network and correcting it by pressing the appropriate button. You will likely have to do this several times (perhaps 20 to 50 times or so) before the NN reliably recognizes the area you want.
Ensure that when you are training the network, you can see plenty of blue points. If you don't see any points, you will essentially be training with null data.
  9. Finally, when your network is fully trained, you should be able to spin slowly around the room and hear when your device recognizes your region of choice.

Using our simple NN, we were able to build an object/feature recognizer that we could train to recognize specific features, places, or objects. The example is quite simple and not very robust or accurate; however, considering the limited training dataset, it does a good job of recognizing features on the fly. Open up the Environmental Scanner script, and we will take a look at how the network is configured:

  1. Scroll down to the Awake method and take a look at how the network is created:
public void Awake()
{
    int numInputs, numHiddenLayers, numOutputs;
    numInputs = 25; numHiddenLayers = 13; numOutputs = 1;
    // Build the network: 25 inputs, 13 hidden neurons, 1 output.
    net = new NeuralNet(numInputs, numHiddenLayers, numOutputs);
    dataSets = new List<DataSet>();
    normInputs = new double[numInputs];
}
  2. Note that this time we are creating an input layer of 25 neurons and an output layer of 1 neuron. If we stick to the general rule of making the hidden layer the average of the input and output, that equates to 13 [(25+1)/2=13].
  3. We removed the initial NN setup and training from Start and moved it to the bottom, in a new method called Train:
private void Train()
{
    // Run the training algorithm for 100 epochs over the collected samples.
    net.Train(dataSets, 100);
    // Consider the network trained once we have more than 10 samples.
    trained = dataSets.Count > 10;
}
  4. This time, we are using a different form of training, by epoch. We use this form of training when we are not sure what the expected error is, or when it needs to change, as in this case. Think about it: when we start training the network with a very limited dataset, our error rates will be high simply because we lack data, which means we would never be able to train the network down to a minimum error. It therefore makes more sense to run our training algorithm for a set number of iterations, or epochs, in every training cycle, as sketched below.
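As a rough, conceptual sketch of the difference between the two training methods (the chapter's NeuralNet.Train has its own implementation; trainOneSample here is a hypothetical stand-in for a forward pass plus backpropagation that returns the error for one sample):
void TrainByEpoch(List<DataSet> data, int numEpochs, System.Func<DataSet, double> trainOneSample)
{
    // Epoch training: a fixed number of passes over the data, regardless of the error.
    for (int epoch = 0; epoch < numEpochs; epoch++)
    {
        foreach (var sample in data)
        {
            trainOneSample(sample);
        }
    }
}

void TrainToMinimumError(List<DataSet> data, double minimumError, System.Func<DataSet, double> trainOneSample)
{
    // Minimum-error training: keep passing over the data until the average error
    // drops below the target (a real implementation would also cap the iterations).
    double averageError = double.MaxValue;
    while (averageError > minimumError)
    {
        double total = 0;
        foreach (var sample in data)
        {
            total += trainOneSample(sample);
        }
        averageError = total / data.Count;
    }
}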
  5. Just before Train is the TrainNetwork method, shown as follows:
public void TrainNetwork(float expected)
{
    // Remember the expected output and flag that a training cycle is needed.
    this.expected = expected;
    training = true;
}
  6. TrainNetwork is a public method that we use to signal the Environmental Scanner to initiate a training cycle with the expected outcome. This allows us to wire up event handlers on the UI buttons to call this method with the expected value: when you press Train 0, TrainNetwork is passed 0.0, and when you press Train 1, it is passed 1.0. A sketch of this wiring follows.
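As a minimal sketch of one way to wire those buttons from code (the field names, and EnvironmentalScanner as the class name behind the Environmental Scanner script, are assumptions for illustration):
using UnityEngine;
using UnityEngine.UI;

public class TrainButtonWiring : MonoBehaviour
{
    public Button trainZeroButton;        // the "Train 0" button
    public Button trainOneButton;         // the "Train 1" button
    public EnvironmentalScanner scanner;  // assumed class name for the Environmental Scanner script

    void Start()
    {
        // Train 0 reinforces "do not recognize this area".
        trainZeroButton.onClick.AddListener(() => scanner.TrainNetwork(0.0f));
        // Train 1 reinforces "recognize this area".
        trainOneButton.onClick.AddListener(() => scanner.TrainNetwork(1.0f));
    }
}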
  7. Scroll up to the Update method and look at the following section of code:
if (training)
{
    // Capture the current normalized inputs along with the expected output.
    dataSets.Add(new DataSet(normInputs, new double[] { expected }));
    training = false;
    Train();
}
  8. This is the block of code that checks the training flag. If the flag is set, it collects the normalized inputs and adds them to dataSets along with the expected outcome. We then turn the flag off and call Train.
  9. Scroll up to the following block of code to see how we normalize the training inputs:
for (int i = 0; i < normInputs.Length; i++)
{
    if (i < pointCloud.PointCount)
    {
        // Normalize the inputs into the range 0 to 1.
        normInputs[i] = inputs[i] / max;
    }
    else
    {
        // Pad any unused input neurons with 0.
        normInputs[i] = 0;
    }
}
  10. Here, we are normalizing the inputs. Each input represents the distance, or magnitude, between an identified point and the camera (the user). Normalizing means scaling or converting your data to values in the range 0 to 1; in this case, we do it by finding the maximum distance among the points and dividing every other input by it. The check in the loop that i is less than PointCount ensures that we always set a value for each input neuron, even when fewer points are available. A stand-alone sketch of the idea follows.
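As a stand-alone sketch of the same normalization idea (how the inputs array and point count are produced here is an assumption, not the chapter's exact code):
double[] NormalizeDistances(double[] inputs, int pointCount, int numInputNeurons)
{
    // Find the largest distance so every value can be scaled into the range 0 to 1.
    double max = 0;
    for (int i = 0; i < pointCount && i < inputs.Length; i++)
    {
        if (inputs[i] > max) max = inputs[i];
    }

    var normInputs = new double[numInputNeurons];
    for (int i = 0; i < normInputs.Length; i++)
    {
        if (i < pointCount && max > 0)
        {
            normInputs[i] = inputs[i] / max;  // scale into the range 0 to 1
        }
        else
        {
            normInputs[i] = 0;                // pad unused input neurons with 0
        }
    }
    return normInputs;
}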

The rest of the code is similar to what we wrote earlier and not worth going over again.
