Before we can pass the dataset to the classifier, we need to preprocess it following the best practices from Chapter 4, Representing Data and Engineering Features.
Specifically, we first center every feature (pixel) by subtracting the mean image, that is, the average value of each pixel across the whole dataset:
In [5]: n_samples, n_features = X.shape[:2]
... X -= X.mean(axis=0)
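To see what `X.mean(axis=0)` computes, here is a minimal sketch on a small random array. The array is a hypothetical stand-in for the face data (the real `X` has one row per 64 x 64 image). The mean along `axis=0` is the "mean image", with one value per pixel, and NumPy broadcasting subtracts it from every row.

```python
import numpy as np

# Hypothetical stand-in for the face data: 5 "images" of 4x4 pixels,
# each flattened into a row of 16 features.
rng = np.random.default_rng(seed=0)
X_toy = rng.random((5, 16))

# Averaging over axis=0 gives the mean value of each pixel across all images,
# i.e. the "mean image"; it has one entry per feature.
mean_image = X_toy.mean(axis=0)
print(mean_image.shape)  # (16,)

# Broadcasting subtracts the same mean image from every row.
X_centered = X_toy - mean_image

# Every feature (column) now averages to zero across the dataset.
print(np.allclose(X_centered.mean(axis=0), 0.0))  # True
```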
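We then repeat this procedure for every individual image, subtracting each image's own mean so that the feature values of every data point (that is, a row in X) are centered around zero. As a result, all images share the same mean grayscale level: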
In [6]: X -= X.mean(axis=1).reshape(n_samples, -1)
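The two steps together can be checked on toy data. The following is a sketch on a hypothetical random array, not the face data itself. It uses `keepdims=True`, which keeps the row means as a column of shape `(n_samples, 1)` and so is equivalent to the `reshape(n_samples, -1)` call above. After both steps every image has zero mean, and the per-pixel means stay at zero, because after the first step the row means themselves average to zero.

```python
import numpy as np

# Hypothetical toy data: 6 "images" with 16 pixels each.
rng = np.random.default_rng(seed=1)
n_samples, n_features = 6, 16
X_toy = rng.random((n_samples, n_features))

# Step 1: subtract the mean image (per-feature centering).
X_toy -= X_toy.mean(axis=0)

# Step 2: subtract each image's own mean (per-sample centering).
# keepdims=True gives shape (n_samples, 1), like reshape(n_samples, -1).
row_means = X_toy.mean(axis=1, keepdims=True)
X_toy -= row_means

# Every image now has a mean gray level of zero...
print(np.allclose(X_toy.mean(axis=1), 0.0))  # True
# ...and the per-feature means are still zero, since the row means
# removed in step 2 average to zero after step 1.
print(np.allclose(X_toy.mean(axis=0), 0.0))  # True
```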
The preprocessed data can be visualized using the following code:
In [7]: for p, i in enumerate(idx_rand):
...     plt.subplot(2, 4, p + 1)
...     plt.imshow(X[i, :].reshape((64, 64)), cmap='gray')
...     plt.axis('off')
This produces the following output: