Loading the MNIST dataset

The easiest way to obtain the MNIST dataset is by using Keras:

In [1]: from keras.datasets import mnist
... (X_train, y_train), (X_test, y_test) = mnist.load_data()
Out[1]: Using TensorFlow backend.
Downloading data from
https://s3.amazonaws.com/img-datasets/mnist.npz

This will download the data from Amazon's cloud storage (Amazon S3), which might take a while depending on your internet connection, and automatically split it into training and test sets.

MNIST provides its own predefined train-test split. This way, it is easier to compare the performance of different classifiers because they will all use the same data for training and the same data for testing.
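To see where this predefined split lives, here is a minimal sketch that reads it from a local copy of `mnist.npz` (the file Keras downloads). It assumes the archive stores the four arrays under the keys `x_train`, `y_train`, `x_test`, and `y_test`, which is how Keras's own loader reads it. The helper name `load_mnist_npz` is hypothetical.

```python
import numpy as np


def load_mnist_npz(path):
    """Return ((X_train, y_train), (X_test, y_test)) from a local MNIST .npz file.

    Hypothetical helper; assumes the archive stores the predefined split under
    the keys 'x_train', 'y_train', 'x_test', and 'y_test'.
    """
    with np.load(path) as f:
        # The split is fixed inside the file: we only read the four arrays back.
        X_train, y_train = f['x_train'], f['y_train']
        X_test, y_test = f['x_test'], f['y_test']
    return (X_train, y_train), (X_test, y_test)
```

Usage would look like `(X_train, y_train), (X_test, y_test) = load_mnist_npz('mnist.npz')`. Because the arrays are read straight from the file rather than shuffled and re-split, every classifier loaded this way trains and tests on exactly the same samples.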

This data comes in a format that we are already familiar with:

In [2]: X_train.shape, y_train.shape
Out[2]: ((60000, 28, 28), (60000,))
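Each sample is a 28 x 28 image, whereas many classifiers expect one row of features per sample. A common next step is therefore to flatten every image into a vector of 28 * 28 = 784 values. The following is a minimal sketch under that assumption; the helper name `flatten_images` is hypothetical, the cast to float32 is a choice rather than a requirement of MNIST, and the random array only stands in for `X_train`.

```python
import numpy as np


def flatten_images(X):
    """Turn an (n_samples, height, width) image stack into an (n_samples, height * width) matrix.

    Hypothetical helper: pixel values are cast to float32 but not rescaled.
    """
    n_samples = X.shape[0]
    # reshape keeps the sample axis and lays out each image row by row
    return X.reshape(n_samples, -1).astype(np.float32)


# Stand-in for X_train: 5 random 28 x 28 uint8 "images"
rng = np.random.default_rng(0)
X_demo = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)
X_flat = flatten_images(X_demo)
print(X_flat.shape)  # (5, 784)
```

Applied to the real data, `flatten_images(X_train)` would return an array of shape (60000, 784).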

Note that the labels are stored as integers from 0 to 9, each corresponding to the digit shown in the image:

In [3]: import numpy as np
... np.unique(y_train)
Out[3]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=uint8)
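Because the labels are plain integers, it is also easy to check how many examples each digit has, which tells us whether the classes are roughly balanced. Here is a minimal sketch using `np.unique` with `return_counts=True`; the helper name `count_per_class` and the toy label array are illustrative stand-ins for `y_train`.

```python
import numpy as np


def count_per_class(y):
    """Map each distinct label in y to the number of times it occurs (hypothetical helper)."""
    labels, counts = np.unique(y, return_counts=True)
    return {int(label): int(count) for label, count in zip(labels, counts)}


# Toy labels standing in for y_train
y_demo = np.array([0, 1, 1, 2, 2, 2, 9], dtype=np.uint8)
print(count_per_class(y_demo))  # {0: 1, 1: 2, 2: 3, 9: 1}
```

Called on the real training labels, `count_per_class(y_train)` should return ten entries whose counts add up to 60,000.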

We can have a look at some example digits:

In [4]: import matplotlib.pyplot as plt
... %matplotlib inline
In [5]: for i in range(10):
...     plt.subplot(2, 5, i + 1)  # 2 x 5 grid, one panel per digit
...     plt.imshow(X_train[i, :, :], cmap='gray')
...     plt.axis('off')

The digits look like this:

In fact, the MNIST dataset is the successor to the NIST digits dataset provided by scikit-learn that we used before (sklearn.datasets.load_digits; refer to Chapter 2, Working with Data in OpenCV). Some notable differences are as follows:

  • MNIST images are significantly larger (28 x 28 pixels) than NIST images (8 x 8 pixels) and can therefore capture finer details, such as distortions and individual differences between images of the same digit.
  • The MNIST dataset is much larger than the NIST dataset, providing 60,000 training and 10,000 test samples (as compared to a total of 5,620 NIST images).
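To get a feel for how much detail is lost at a lower resolution, we can shrink an image by averaging over blocks of pixels. Since 28 is not divisible by 8, the sketch below averages 4 x 4 blocks and produces a 7 x 7 image as a rough stand-in for the coarser 8 x 8 format. The helper `block_average`, the block size, and the synthetic stroke are all illustrative assumptions.

```python
import numpy as np


def block_average(img, block):
    """Downsample a 2-D image by averaging non-overlapping block x block tiles.

    Hypothetical helper; assumes both image sides are divisible by `block`.
    """
    h, w = img.shape
    # Split each axis into (number of tiles, tile size), then average inside each tile
    return img.reshape(h // block, block, w // block, block).mean(axis=(1, 3))


# A synthetic 28 x 28 "image": a vertical stroke two pixels wide
img = np.zeros((28, 28), dtype=np.float32)
img[4:24, 13:15] = 255.0

small = block_average(img, 4)
print(img.size, small.size)  # 784 49
```

The crisp two-pixel stroke turns into a single column of half-intensity (127.5) pixels, which is the kind of fine detail the larger MNIST images preserve.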