Chapter 8. Bayesian Neural Networks

As the name suggests, artificial neural networks are statistical models built by taking inspiration from the architecture and cognitive capabilities of biological brains. Neural network models typically have a layered architecture consisting of a large number of neurons in each layer, with neurons in different layers connected to each other. The first layer is called the input layer, the last layer is called the output layer, and the layers in between are called hidden layers. Each neuron has a state that is determined by a nonlinear function of the states of all neurons connected to it. Each connection has a weight that is determined from the training data, which contains a set of input and output pairs. This kind of layered architecture of neurons and their connections is present in the neocortex region of the human brain and is considered to be responsible for higher functions such as sensory perception and language understanding.

The first computational model of a neural network was proposed by Warren McCulloch and Walter Pitts in 1943. Around the same time, the psychologist Donald Hebb proposed a hypothesis of learning based on the mechanism of excitation and adaptation of neurons, now known as Hebb's rule. The hypothesis can be summarized as "neurons that fire together, wire together." Although several researchers tried to implement computational models of neural networks, it was Frank Rosenblatt in 1958 who first created an algorithm for pattern recognition using a two-layer neural network, called the Perceptron.

Research on neural networks and their applications went through both periods of stagnation and periods of great progress between 1970 and 2010. Some of the landmarks in the history of neural networks are the invention of the backpropagation algorithm by Paul Werbos in 1975, a fast learning algorithm for training multilayer neural networks (also called deep learning networks) by Geoffrey Hinton in 2006, and the use of GPGPUs in the late 2000s to achieve the greater computational power required for processing neural networks.

Today, neural network models and their applications have again taken center stage in artificial intelligence, with applications in computer vision, speech recognition, and natural language understanding. This is the reason this book devotes one chapter specifically to this subject. The importance of Bayesian inference in neural network models will become clear when we go into detail in later sections.

Two-layer neural networks

Let us look at the formal definition of a two-layer neural network. We follow the notation and description used by David MacKay (references 1, 2, and 3 in the References section of this chapter). The input to the NN is given by a vector $x = (x_1, \ldots, x_I)$. The input values are first multiplied by a set of weights to produce a weighted linear combination and then transformed using a nonlinear function $f^{(1)}$ to produce the values of the states of the neurons in the hidden layer:

$$a_j^{(1)} = \sum_i w_{ji}^{(1)} x_i + \theta_j^{(1)}, \qquad h_j = f^{(1)}\left(a_j^{(1)}\right)$$

A similar operation is done at the second layer to produce the final output values $y_k$:

$$a_k^{(2)} = \sum_j w_{kj}^{(2)} h_j + \theta_k^{(2)}, \qquad y_k = f^{(2)}\left(a_k^{(2)}\right)$$

The function $f$ is usually taken to be either the sigmoid function $f(a) = \frac{1}{1 + e^{-a}}$ or $f(a) = \tanh(a)$. Another common choice, used for multiclass classification, is the softmax function, defined as follows:

$$y_k = \frac{e^{a_k}}{\sum_{k'} e^{a_{k'}}}$$

This is a normalized exponential function.

All these are highly nonlinear functions, exhibiting the property that the output value increases sharply as a function of the input. This nonlinearity gives neural networks more computational flexibility than standard linear or generalized linear models. Here, $\theta$ is called a bias parameter. The weights $w$ together with the biases $\theta$ form the weight vector w.

The schematic structure of the two-layer neural network is shown here:

(Figure: schematic structure of a two-layer neural network, showing the input layer, one hidden layer, and the output layer.)
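
To make the forward pass concrete, here is a minimal sketch in Python with NumPy. The layer sizes, the choice of sigmoid hidden units with a softmax output, and all variable names are illustrative assumptions rather than part of MacKay's formulation:

```python
import numpy as np

def sigmoid(a):
    # f(a) = 1 / (1 + exp(-a))
    return 1.0 / (1.0 + np.exp(-a))

def softmax(a):
    # Normalized exponential; subtracting the max improves numerical stability.
    e = np.exp(a - np.max(a))
    return e / np.sum(e)

def forward(x, W1, theta1, W2, theta2):
    """Forward pass of a two-layer network.

    x      : input vector of length I
    W1     : hidden-layer weights, shape (J, I)
    theta1 : hidden-layer biases, length J
    W2     : output-layer weights, shape (K, J)
    theta2 : output-layer biases, length K
    """
    a1 = W1 @ x + theta1   # weighted linear combination, first layer
    h = sigmoid(a1)        # hidden-layer states h_j = f(a_j)
    a2 = W2 @ h + theta2   # weighted linear combination, second layer
    y = softmax(a2)        # output values y_k (softmax for classification)
    return y

# Example with random weights: 4 inputs, 3 hidden units, 2 outputs.
rng = np.random.default_rng(0)
W1, theta1 = rng.normal(size=(3, 4)), np.zeros(3)
W2, theta2 = rng.normal(size=(2, 3)), np.zeros(2)
print(forward(np.array([0.1, 0.2, 0.3, 0.4]), W1, theta1, W2, theta2))
```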

Learning in neural networks corresponds to finding the value of the weight vector w such that, for a given dataset consisting of ground truth input and target (output) pairs $\{x^{(n)}, t^{(n)}\}$, the error of the network's predictions of the target values is minimized. For regression problems, this is achieved by minimizing the error function:

$$E_D(w) = \frac{1}{2} \sum_n \sum_k \left( y_k\left(x^{(n)}; w\right) - t_k^{(n)} \right)^2$$

For classification tasks, one uses the cross entropy instead of the squared error in neural network training; it is defined as follows:

$$G(w) = -\sum_n \sum_k \left[ t_k^{(n)} \ln y_k\left(x^{(n)}; w\right) + \left(1 - t_k^{(n)}\right) \ln\left(1 - y_k\left(x^{(n)}; w\right)\right) \right]$$
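
As a rough sketch of how these two error functions could be computed, assuming the network outputs y and the targets t are stored as NumPy arrays of the same shape (one example per row):

```python
import numpy as np

def squared_error(y, t):
    # E_D(w) = 1/2 * sum_n sum_k (y_k^(n) - t_k^(n))^2
    return 0.5 * np.sum((y - t) ** 2)

def cross_entropy(y, t, eps=1e-12):
    # G(w) = -sum_n sum_k [ t ln y + (1 - t) ln(1 - y) ]
    # eps guards against log(0) when an output saturates at 0 or 1.
    y = np.clip(y, eps, 1.0 - eps)
    return -np.sum(t * np.log(y) + (1.0 - t) * np.log(1.0 - y))
```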

To avoid overfitting, a regularization term is usually also included in the objective function. The regularization function usually takes the form $E_W(w) = \frac{1}{2} \sum_i w_i^2$, which penalizes large values of w and thereby reduces the chances of overfitting. The resulting objective function is as follows:

$$M(w) = \beta E_D(w) + \alpha E_W(w)$$

Here, $\alpha$ and $\beta$ are free parameters whose optimum values can be found from cross-validation experiments.
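
A sketch of the resulting objective, under the assumption that all weight matrices and bias vectors are kept in a Python list and that $E_D$ is the squared error defined above:

```python
import numpy as np

def weight_decay(weights):
    # E_W(w) = 1/2 * sum_i w_i^2, summed over every weight matrix and bias vector
    return 0.5 * sum(np.sum(w ** 2) for w in weights)

def objective(y, t, weights, alpha, beta):
    # M(w) = beta * E_D(w) + alpha * E_W(w), with the squared error as E_D
    e_d = 0.5 * np.sum((y - t) ** 2)
    return beta * e_d + alpha * weight_decay(weights)
```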

To minimize M(w) with respect to w, one uses the backpropagation algorithm, as described in the classic paper by Rumelhart, Hinton, and Williams (reference 3 in the References section of this chapter). In backpropagation, for each input/output pair, the value of the predicted output is computed using a forward pass from the input layer. The error, that is, the difference between the predicted output and the actual output, is then propagated backwards, and at each node the weights are readjusted so as to reduce the error.
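
The sketch below shows a single backpropagation update for the two-layer network, assuming sigmoid hidden units, linear output units, the squared-error function, and a plain gradient-descent step with an illustrative learning rate eta; it is an assumption-laden illustration, not the exact procedure of the original paper:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def backprop_step(x, t, W1, theta1, W2, theta2, eta=0.01):
    """One gradient-descent update on a single (x, t) pair."""
    # Forward pass.
    a1 = W1 @ x + theta1
    h = sigmoid(a1)
    y = W2 @ h + theta2                       # linear output units (regression)

    # Backward pass: propagate the error y - t back through the layers.
    delta2 = y - t                            # dE/da2 for squared error + linear output
    delta1 = (W2.T @ delta2) * h * (1 - h)    # dE/da1, using the sigmoid derivative

    # Gradient-descent updates of weights and biases.
    W2 -= eta * np.outer(delta2, h)
    theta2 -= eta * delta2
    W1 -= eta * np.outer(delta1, x)
    theta1 -= eta * delta1
    return W1, theta1, W2, theta2
```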
