A. Formal Neural Network Notation

To keep discussion of artificial neurons as straightforward as possible, in this book we used a shorthand notation to identify them within a network. In this appendix, we lay out a more widely used formal notation, which may be of interest if you’d like to:

  • Have a more precise way of describing neurons

  • Closely follow the backpropagation technique covered in Appendix B

Taking a look back at Figure 7.1, the neural network has a total of four layers. The first is the input layer, which can be thought of as a collection of starting blocks for each data point to enter the network. In the case of the MNIST models, for example, there are 784 such starting blocks, representing each of the pixels in a 28×28–pixel handwritten MNIST digit. No computation happens within an input layer; it simply holds the input values so that the network knows how many values it will compute on in the next layer.¹

1. For this reason, we usually don’t need a means to address a particular input neuron; they have no weights or biases.
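To make those 784 starting blocks concrete, here is a minimal NumPy sketch (the image values are random placeholders, not a real MNIST digit) of flattening a 28×28 image into the values the input layer holds:

```python
import numpy as np

# A stand-in for one 28x28 grayscale MNIST digit, with pixel
# intensities in [0, 1].
image = np.random.rand(28, 28)

# Flatten it into the 784 values the input layer holds. No computation
# happens here; the input layer simply presents these values to the
# first hidden layer.
x = image.reshape(784)

print(x.shape)  # (784,)
```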

The next two layers in the network in Figure 7.1 are hidden layers, where the bulk of a neural network's computation occurs. As we'll soon discuss, each neuron in a hidden layer mathematically transforms and combines the input values x, outputting some activation value a. Because we need a way to address specific neurons in specific layers, we'll use a superscript to denote the layer, starting at the first hidden layer, and a subscript to denote a neuron within that layer. In Figure 7.1, then, the first hidden layer contains a₁¹, a₂¹, and a₃¹. In this way, we can refer precisely to an individual neuron in a specific layer; a₂², for example, represents the second neuron in the second hidden layer.

Because Figure 7.1 is a dense network, the neuron a₁¹ receives inputs from all of the neurons in the preceding layer, namely the network inputs x₁ and x₂. Each neuron has its own bias, b, and we'll label that bias in exactly the same manner as the activation a: For example, b₂¹ is the bias for the second neuron in the first hidden layer.

The green arrows in Figure 7.1 represent the mathematical transformation that takes place during forward propagation, and each green arrow has its own individual weight associated with it. To refer to these weights directly, we employ the following notation: w¹₁,₂ is the weight in the first hidden layer (superscript) that connects neuron a₁¹ to its input x₂ in the input layer (subscript). This double-barreled subscript is necessary because the network is fully connected: Every neuron in a layer is connected to every neuron in the layer before it, and each connection carries its own weight. Let's generalize this weight notation:

  • The superscript is the number of the hidden layer containing the neuron that receives the input.

  • The first subscript is the number of the neuron receiving the input within its hidden layer.

  • The second subscript is the number of the neuron providing input from the preceding layer.

As a further example, the weights into neuron a₂² are denoted w²₂,ᵢ, where i is the number of a neuron in the preceding layer.
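A zero-indexed weight matrix makes this notation concrete. In the sketch below (the weight, bias, and input values are illustrative placeholders, and sigmoid is an assumed activation function, not a detail from Figure 7.1), row i−1, column j−1 of the first hidden layer's matrix stores w¹ᵢ,ⱼ:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Network inputs x_1 and x_2 (placeholder values).
x = np.array([0.5, -0.3])

# First hidden layer: W1[i-1, j-1] stores the weight with superscript 1
# and subscript (i, j), connecting neuron i of this layer to input x_j.
W1 = np.array([[ 0.2, -0.4],
               [ 0.7,  0.1],
               [-0.5,  0.3]])

# Biases for the three neurons of the first hidden layer.
b1 = np.array([0.1, -0.2, 0.05])

w_1_2 = W1[0, 1]            # the weight connecting neuron 1 to input x_2
a1 = sigmoid(W1 @ x + b1)   # the activations a1, a2, a3 of the first hidden layer
print(w_1_2, a1.shape)      # -0.4 (3,)
```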

At the far right of the network, we finally have the output layer. As with the hidden layers, output-layer neurons have weights and a bias, and these are labeled in the same way.
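Putting the pieces together, a full forward pass can be sketched as follows. The layer sizes, random parameter values, and sigmoid activation here are illustrative assumptions rather than details taken from Figure 7.1:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(42)

# Illustrative sizes: 2 inputs, two hidden layers of 3 neurons each,
# and a single output neuron.
sizes = [2, 3, 3, 1]

# W[l-1][i-1, j-1] holds the weight with superscript l and subscript (i, j);
# b[l-1][i-1] holds the bias of neuron i in layer l.
W = [rng.standard_normal((m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
b = [rng.standard_normal(m) for m in sizes[1:]]

a = np.array([0.5, -0.3])      # input layer: holds x_1 and x_2, no computation
for Wl, bl in zip(W, b):
    a = sigmoid(Wl @ a + bl)   # activations of each subsequent layer

print(a.shape)  # (1,)
```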
