Neural networks in TensorFlow

Now, we will see how to build a basic neural network using TensorFlow that predicts handwritten digits. We will use the popular MNIST dataset, which is a collection of labeled images of handwritten digits for training.

First, we must import TensorFlow and load the dataset from tensorflow.examples.tutorials.mnist:

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)

Now, we will see what we have in our data:

print("No of images in training set {}".format(mnist.train.images.shape))
print("No of labels in training set {}".format(mnist.train.labels.shape))

print("No of images in test set {}".format(mnist.test.images.shape))
print("No of labels in test set {}".format(mnist.test.labels.shape))

It will print the following:

No of images in training set (55000, 784)
No of labels in training set (55000, 10)
No of images in test set (10000, 784)
No of labels in test set (10000, 10)

We have 55,000 images in the training set, and each image is flattened into a vector of size 784 (28 x 28 pixels). We also have 10 labels, which correspond to the digits 0 to 9. Similarly, we have 10,000 images in the test set.
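
Each label is a one-hot vector of length 10, with a 1 at the index of the digit the image represents. We can confirm this by printing one of the labels (index 41 is arbitrary; it is the same image we plot next):

print(mnist.train.labels[41])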

Now, we plot an input image to see what it looks like:

import matplotlib.pyplot as plt
img1 = mnist.train.images[41].reshape(28,28)
plt.imshow(img1, cmap='Greys')
plt.show()

Let's start building our network. We will build a two-layer neural network with one input layer, one hidden layer, and one output layer that predicts a handwritten digit.

First, we define the placeholders for our input and output. As our input data shape is 784, we can define the input placeholder as:

x = tf.placeholder(tf.float32, [None, 784])

What does None imply? None specifies the number of samples (batch size) passed, which will be decided dynamically at runtime.
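
For instance, here is a quick illustrative sketch (not part of the original example) showing that the same placeholder x accepts batches of different sizes:

import numpy as np
with tf.Session() as sess:
    print(sess.run(tf.shape(x), feed_dict={x: np.zeros((100, 784), dtype=np.float32)}))  # prints [100 784]
    print(sess.run(tf.shape(x), feed_dict={x: np.zeros((32, 784), dtype=np.float32)}))   # prints [ 32 784]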

Since we have 10 classes as output, we can define the placeholder output as:

y = tf.placeholder(tf.float32, [None, 10])

Next, we initialize our hyperparameters:

learning_rate = 0.1
epochs = 10
batch_size = 100

We then define the weights and bias between the input layer and the hidden layer as w_xh and b_h, respectively. We initialize the weight matrix with values drawn randomly from a normal distribution with a standard deviation of 0.03:

w_xh = tf.Variable(tf.random_normal([784, 300], stddev=0.03), name='w_xh')
b_h = tf.Variable(tf.random_normal([300]), name='b_h')

Next, we define the weights and bias between the hidden layer and the output layer as w_hy and b_y, respectively:

w_hy = tf.Variable(tf.random_normal([300, 10], stddev=0.03), name='w_hy')
b_y = tf.Variable(tf.random_normal([10]), name='b_y')

Let's perform the forward propagation now. Recall the steps we performed in forward propagation: 

z1 = tf.add(tf.matmul(x, w_xh), b_h)
a1 = tf.nn.relu(z1)
z2 = tf.add(tf.matmul(a1, w_hy), b_y)
yhat = tf.nn.softmax(z2)
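
In equation form, the four lines above compute:

$$z_1 = x w_{xh} + b_h$$
$$a_1 = \mathrm{ReLU}(z_1)$$
$$z_2 = a_1 w_{hy} + b_y$$
$$\hat{y} = \mathrm{softmax}(z_2)$$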

We define our cost function as the cross-entropy loss. Cross-entropy loss is also known as log loss, and it can be defined as follows:

$$J = -\sum_{i} y_{i} \log \hat{y}_{i}$$

where $y_i$ is the actual value and $\hat{y}_i$ is the predicted value:

cross_entropy = tf.reduce_mean(-tf.reduce_sum(y * tf.log(yhat), reduction_indices=[1]))
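
As a side note (an alternative sketch, not the code used in this example), computing tf.log(yhat) can produce NaN values when the softmax output contains exact zeros. TensorFlow also provides tf.nn.softmax_cross_entropy_with_logits_v2, which computes the same loss directly from the pre-softmax logits z2 in a numerically stable way:

# Alternative formulation (sketch): feed the logits z2, not the softmax output
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(labels=y, logits=z2))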

Our objective is to minimize the cost function. We can do this by backpropagating through the network and performing gradient descent. With TensorFlow, we don't have to calculate the gradients manually; we can use TensorFlow's built-in gradient descent optimizer function as follows:

optimiser = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cross_entropy)
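
Under the hood, minimize() performs two steps: it computes the gradients of the loss with respect to all trainable variables and then applies the update rule w := w - learning_rate * gradient. The following equivalent two-step form is just a sketch to make that explicit; it is not needed for this example:

opt = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
grads_and_vars = opt.compute_gradients(cross_entropy)  # backpropagation: gradients of the loss
optimiser = opt.apply_gradients(grads_and_vars)        # gradient descent update step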

To evaluate our model, we will calculate the accuracy as follows:

correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(yhat, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
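
To see what these two lines compute, here is a small NumPy sketch with made-up values: tf.argmax(..., 1) returns the index of the largest entry in each row, so comparing the argmax of the one-hot labels with the argmax of the predictions gives a boolean vector of correct predictions, and its mean is the accuracy:

import numpy as np
labels = np.array([[0, 0, 1], [1, 0, 0]])               # one-hot true labels (made-up)
preds = np.array([[0.1, 0.2, 0.7], [0.3, 0.5, 0.2]])    # softmax outputs (made-up)
correct = np.argmax(labels, 1) == np.argmax(preds, 1)   # [ True False]
print(correct.mean())                                   # 0.5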

As we know, TensorFlow works by building a computation graph, so everything we have written so far will only run once we start a TensorFlow session. So, let's do that.

First, initialize the TensorFlow variables:

init_op = tf.global_variables_initializer()

Now, start the TensorFlow session and start training the model:

with tf.Session() as sess:
    sess.run(init_op)
    total_batch = int(len(mnist.train.labels) / batch_size)

    for epoch in range(epochs):
        avg_cost = 0

        for i in range(total_batch):
            batch_x, batch_y = mnist.train.next_batch(batch_size=batch_size)

            _, c = sess.run([optimiser, cross_entropy],
                            feed_dict={x: batch_x, y: batch_y})

            avg_cost += c / total_batch

        print("Epoch:", (epoch + 1), "cost =", "{:.3f}".format(avg_cost))

    print(sess.run(accuracy, feed_dict={x: mnist.test.images, y: mnist.test.labels}))
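
Finally, as an optional sketch (not part of the original example), we could use the trained model to predict a single digit. This code must run inside the same with tf.Session() as sess: block, for example right after the accuracy line above, since the trained variables live only within the session:

    # Predicted class for the first test image (hypothetical placement inside the session)
    pred = sess.run(tf.argmax(yhat, 1), feed_dict={x: mnist.test.images[:1]})
    print("Predicted digit:", pred[0])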