Deep diving into ANNs

We know that in an artificial neuron, we multiply the input by weights, add a bias, and apply an activation function to produce the output. Now, we will see how this happens in a neural network setting, where neurons are arranged in layers. The number of layers in a network is equal to the number of hidden layers plus the number of output layers; we don't take the input layer into account. Consider a two-layer neural network with one input layer, one hidden layer of four neurons, and one output layer.

Let's say we have two inputs, x1 and x2, and we have to predict the output y. Since we have two inputs, the number of neurons in the input layer will be two. These inputs will be multiplied by weights, then we add the bias and propagate the resultant value to the hidden layer, where the activation function will be applied. So, first we need to initialize the weight matrix. In the real world, we don't know which inputs are really important and need to be weighted highly to compute the output. Therefore, we will randomly initialize the weights and the bias value. We can denote the weights and bias flowing between the input layer and the hidden layer as wxh and bh, respectively. What about the dimensions of the weight matrix? The dimensions of a weight matrix must be [number of neurons in the current layer * number of neurons in the next layer]. Why is that? It is a basic matrix multiplication rule: to multiply two matrices, AB, the number of columns in matrix A must be equal to the number of rows in matrix B. So, the dimension of the weight matrix wxh should be [number of neurons in the input layer * number of neurons in the hidden layer], that is, 2 x 4:

$z_1 = XW_{xh} + b_h$

That is, z1 = (input * weights) + bias. Now, this is passed to the hidden layer. In the hidden layer, we apply an activation function to z1. Let's consider the following sigmoid activation function:

$a_1 = \sigma(z_1), \quad \text{where } \sigma(z) = \frac{1}{1 + e^{-z}}$
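
To make these shapes concrete, here is a minimal NumPy sketch of this first step. The sample input x, the fixed random seed, and the zero-initialized bias are illustrative assumptions, not values from the text:

    import numpy as np

    np.random.seed(42)             # fixed seed, just for reproducibility
    x = np.random.rand(1, 2)       # one sample with two input features
    wxh = np.random.randn(2, 4)    # input-to-hidden weights, shape 2 x 4
    bh = np.zeros((1, 4))          # hidden layer bias

    z1 = np.dot(x, wxh) + bh       # (1, 2) @ (2, 4) -> (1, 4)
    a1 = 1 / (1 + np.exp(-z1))     # sigmoid activation, shape (1, 4)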

After applying the activation function, we multiply the result, a1, by a new weight matrix and add a new bias value flowing between the hidden layer and the output layer. We can denote this weight matrix and bias as why and by, respectively. The dimension of the weight matrix why will be [number of neurons in the hidden layer * number of neurons in the output layer]. Since we have four neurons in the hidden layer and one neuron in the output layer, the why matrix dimension will be 4 x 1. So, we multiply a1 by the weight matrix why, add the bias by, and pass the result to the next layer, which is the output layer:

$z_2 = a_1 W_{hy} + b_y$
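
Continuing the sketch above, this hidden-to-output step might look as follows; as before, the random initialization and zero bias are assumptions for illustration:

    why = np.random.randn(4, 1)    # hidden-to-output weights, shape 4 x 1
    by = np.zeros((1, 1))          # output layer bias

    z2 = np.dot(a1, why) + by      # (1, 4) @ (4, 1) -> (1, 1)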

Now, in the output layer, we apply a sigmoid function to z2, which will result in an output value:

$\hat{y} = \sigma(z_2)$

This whole process from the input layer to the output layer is known as forward propagation, which is shown as follows:

    import numpy as np

    def sigmoid(z):
        return 1 / (1 + np.exp(-z))    # elementwise sigmoid

    def forwardProp(x):
        z1 = np.dot(x, wxh) + bh       # input-to-hidden pre-activation
        a1 = sigmoid(z1)               # hidden layer activation
        z2 = np.dot(a1, why) + by      # hidden-to-output pre-activation
        yHat = sigmoid(z2)             # predicted output
        return yHat
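
As a quick usage sketch, reusing the x, wxh, bh, why, and by initialized earlier:

    yHat = forwardProp(x)
    print(yHat.shape)    # (1, 1): one prediction for our single sample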

Forward propagation is cool, isn't it? But how do we know whether the output generated by the neural network is correct? We must define a new function called the cost function (J), also known as the loss function, which tells us how well our neural network is performing. There are many different cost functions. We will use the mean squared error as our cost function, which can be defined as the mean of the squared difference between the actual value y and the predicted value ŷ:

$J = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$

Here, n is the number of training samples.
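
In NumPy, this can be sketched as follows (the function name cost is just an illustrative choice):

    def cost(y, yHat):
        # mean of the squared differences between actual and predicted values
        return np.mean((y - yHat) ** 2)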

Our objective is to minimize the cost function so that our neural network's predictions get better. How can we minimize the cost function? We can minimize it by changing some values in our forward propagation. Which values can we change? Obviously, we can't change the inputs and outputs; we are left with the weights and bias values. We just initialized the weight matrices and biases randomly, so they are not going to be perfect. Now, we will adjust these weight matrices (wxh and why) in such a way that our neural network gives a good result. How do we adjust them? Here comes a new technique called gradient descent.
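
As a preview, gradient descent repeatedly nudges each parameter in the direction that reduces the cost, scaled by a learning rate $\alpha$ (a hyperparameter we choose); for a weight matrix $W$, the update rule is:

$W = W - \alpha \frac{\partial J}{\partial W}$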
