In the previous section, we built a very simple neural network with just an input and output layer. This simple neural network gave us an accuracy of 86%. Let's see if we can improve this accuracy further by building a neural network that is a little deeper than the previous version:
- Let's do this on a new notebook. Loading the dataset and data pre-processing will be the same as in the previous section:
import numpy as np
np.random.seed(42)
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD
# loading and pre-processing data
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train = X_train.reshape(60000, 784).astype('float32')
X_test = X_test.reshape(10000, 784).astype('float32')
X_train /= 255
X_test /= 255
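One pre-processing step from the previous section is worth restating: because we will compile with the categorical_crossentropy loss later on, the integer labels must be one-hot encoded. If your new notebook does not already carry this step over, a minimal sketch using Keras's to_categorical utility looks like this:

from keras.utils import to_categorical

# convert integer class labels (0-9) into one-hot vectors of length 10
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)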
- The design of the neural network is slightly different from the previous version. We will add a hidden layer with 64 neurons to the network, along with the input and output layers:
model = Sequential()
model.add(Dense(64, activation='relu', input_shape=(784,)))
model.add(Dense(64, activation='relu'))
model.add(Dense(10, activation='softmax'))
- Also, we will use the relu activation function for the input and hidden layers instead of the sigmoid function we used previously.
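For reference, relu simply zeroes out negative pre-activations and passes positive values through unchanged, which avoids the saturation that slows learning with sigmoid. A minimal NumPy illustration (not part of the model code):

import numpy as np

def relu(x):
    # ReLU: max(0, x) applied element-wise
    return np.maximum(0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5])))   # [0.  0.  0.  1.5]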
- We can inspect the model design and architecture as follows:
model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense_1 (Dense)              (None, 64)                50240
_________________________________________________________________
dense_2 (Dense)              (None, 64)                4160
_________________________________________________________________
dense_3 (Dense)              (None, 10)                650
=================================================================
Total params: 55,050
Trainable params: 55,050
Non-trainable params: 0
_________________________________________________________________
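The parameter counts follow directly from the layer sizes: a Dense layer with n inputs and m units has n × m weights plus m biases, giving 784 × 64 + 64 = 50,240 for the first layer, 64 × 64 + 64 = 4,160 for the hidden layer, and 64 × 10 + 10 = 650 for the output layer.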
- Next, we will configure the model to use the categorical_crossentropy cost function rather than MSE. Also, the learning rate is increased from 0.01 to 0.1:
model.compile(loss='categorical_crossentropy',
              optimizer=SGD(lr=0.1),
              metrics=['accuracy'])
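Note that in more recent Keras releases, the SGD argument is named learning_rate rather than lr; the older spelling used here matches the Keras version these examples were written against.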
- Now, we will train the model, like we did in the previous examples:
model.fit(X_train, y_train, batch_size=128, epochs=200, verbose=1,
          validation_data=(X_test, y_test))
- The model trains on 60,000 samples and validates on 10,000 samples:
Epoch 1/200
60000/60000 [==============================] - 1s - loss: 0.4785 - acc: 0.8642 - val_loss: 0.2507 - val_acc: 0.9255
Epoch 2/200
60000/60000 [==============================] - 1s - loss: 0.2245 - acc: 0.9354 - val_loss: 0.1930 - val_acc: 0.9436
...
60000/60000 [==============================] - 1s - loss: 4.8932e-04 - acc: 1.0000 - val_loss: 0.1241 - val_acc: 0.9774
<keras.callbacks.History at 0x7f3096adadd8>
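To read off the final test-set numbers programmatically rather than from the log, model.evaluate returns the loss followed by any compiled metrics:

score = model.evaluate(X_test, y_test, verbose=0)
print('Test loss:', score[0])       # should be close to the final val_loss
print('Test accuracy:', score[1])   # should be close to the final val_acc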
As you can see, the validation accuracy has risen to roughly 97.7%, a clear improvement over the 86% achieved by the model we built in the first version.
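As an aside, model.fit returns the History object printed above; capturing it in a variable lets you plot the learning curves. A minimal sketch, assuming matplotlib is installed (older Keras versions log accuracy under the keys acc/val_acc, as in the output above, while newer releases use accuracy/val_accuracy):

import matplotlib.pyplot as plt

history = model.fit(X_train, y_train, batch_size=128, epochs=200, verbose=1,
                    validation_data=(X_test, y_test))

# plot training versus validation accuracy per epoch
plt.plot(history.history['acc'], label='training accuracy')
plt.plot(history.history['val_acc'], label='validation accuracy')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.legend()
plt.show()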