Preprocessing the MNIST dataset

As we learned in Chapter 4, Representing Data and Engineering Features, there are a number of preprocessing steps we might like to apply here:

  • Centering: It is important that all the digits are centered in the image. For example, take a look at the example images of the digit 1 in the preceding diagram, which all consist of an almost-vertical stroke. If the images were misaligned, the stroke could lie anywhere in the image, making it hard for the neural network to find commonalities across the training samples. Fortunately, the images in MNIST are already centered.
  • Scaling: The same is true for scaling the digits so that they all have the same size. This way, the locations of strokes, curves, and loops carry consistent meaning. Otherwise, the neural network might easily confuse eights and zeros, since both are made up of one or two closed loops. Fortunately, the images in MNIST are already scaled.
  • Representing categorical features: It is important that the target labels be one-hot encoded, so that the output layer can have 10 neurons corresponding to the 10 different classes, 0-9. This step we still have to perform ourselves.
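To make the one-hot representation concrete before we turn to scikit-learn, here is a minimal NumPy-only sketch (the label values are made up for illustration):

```python
import numpy as np

# A few example labels between 0 and 9 (made-up values for illustration)
labels = np.array([3, 0, 9])

# One-hot encode by indexing into the 10 x 10 identity matrix:
# row i of np.eye(10) is all zeros except for a 1.0 in column i
one_hot = np.eye(10, dtype=np.float32)[labels]

print(one_hot.shape)  # (3, 10)
print(one_hot[0])     # row for label 3: 1.0 in position 3, 0.0 elsewhere
```

Each label thus becomes a row vector whose only nonzero entry marks its class.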

The easiest way to transform y_train and y_test is with the OneHotEncoder from scikit-learn:

In [6]: from sklearn.preprocessing import OneHotEncoder
... enc = OneHotEncoder(sparse=False, dtype=np.float32)
... y_train_pre = enc.fit_transform(y_train.reshape(-1, 1))

Note that in scikit-learn 1.2 and later, the sparse argument has been renamed to sparse_output.

This will transform the labels of the training set from an <n_samples x 1> vector of integers 0-9 into an <n_samples x 10> matrix of floating point numbers 0.0 and 1.0. Analogously, we can transform y_test with the encoder we just fitted. Note that we call transform rather than fit_transform here, so that the test labels are encoded with the categories learned from the training set:

In [7]: y_test_pre = enc.transform(y_test.reshape(-1, 1))
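The distinction between fit_transform and transform matters whenever the test labels happen not to cover all classes. A small sketch with made-up labels shows why reusing the fitted encoder is the right call:

```python
from sklearn.preprocessing import OneHotEncoder
import numpy as np

# Made-up labels standing in for y_train and y_test
y_tr = np.array([0, 1, 2, 1]).reshape(-1, 1)
y_te = np.array([2, 0]).reshape(-1, 1)   # class 1 is missing here

enc = OneHotEncoder()                         # default output is sparse
y_tr_pre = enc.fit_transform(y_tr).toarray()  # learns categories [0, 1, 2]
y_te_pre = enc.transform(y_te).toarray()      # reuses those categories

print(y_tr_pre.shape)  # (4, 3)
print(y_te_pre.shape)  # (2, 3) -- still 3 columns, even though y_te lacks class 1
```

Had we called fit_transform on y_te instead, the encoder would have learned only two categories and produced a matrix with the wrong number of columns.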

In addition, we need to preprocess X_train and X_test so that OpenCV can work with them. Currently, X_train and X_test are 3D matrices <n_samples x 28 x 28> with integer values between 0 and 255. We instead want a 2D matrix <n_samples x n_features> with floating point numbers, where n_features is 784; in other words, we flatten each 28 x 28 image into a 784-dimensional vector:

In [8]: X_train_pre = X_train.reshape((X_train.shape[0], -1))
... X_train_pre = X_train_pre.astype(np.float32) / 255.0
... X_test_pre = X_test.reshape((X_test.shape[0], -1))
... X_test_pre = X_test_pre.astype(np.float32) / 255.0
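A quick sanity check on synthetic data (random 8-bit images standing in for the MNIST arrays) confirms that flattening and rescaling behave as intended:

```python
import numpy as np

# Random 8-bit "images" standing in for X_train: 5 samples of 28 x 28 pixels
X = np.random.randint(0, 256, size=(5, 28, 28), dtype=np.uint8)

# Flatten each image to a 784-dimensional vector, then rescale to [0, 1]
X_pre = X.reshape((X.shape[0], -1)).astype(np.float32) / 255.0

print(X_pre.shape)  # (5, 784)
print(X_pre.dtype)  # float32
```

After this step, every pixel lies in the range [0.0, 1.0], which keeps the network's inputs on a comparable scale.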

Then we are ready to train the network.
