The BernoulliRBM

The only implementation of a Restricted Boltzmann Machine in scikit-learn is called BernoulliRBM, because it imposes a constraint on the type of probability distribution it can learn. A Bernoulli random variable takes the value one with some probability p and zero otherwise, so its parameter p always lies between zero and one. In line with this, the scikit-learn documentation states that the model assumes the inputs are either binary values or values between zero and one, each encoding the probability that a given node is activated. This simplification is meant to allow quicker learning of feature sets. To respect the assumption, we will alter our dataset so that every pixel is hardcoded as either white or black. Every cell value will then be exactly zero or one (white or black), which matches what the model expects and should make learning more robust. We will accomplish this in two steps:

We will scale the values of the pixels to be between zero and one
We will overwrite each pixel value with one if it is over 0.5, and with zero otherwise
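To make the Bernoulli assumption concrete, here is a minimal sketch using NumPy's random generator. The probability p = 0.7 is an arbitrary, hypothetical value standing in for a node's activation probability. Each draw is either 0 or 1, and the average of many draws approaches p, which is why a value between zero and one can be read as the probability that a node is on.

```python
import numpy as np

# A Bernoulli variable is 1 with probability p and 0 otherwise.
# p = 0.7 is a hypothetical activation probability chosen for illustration.
rng = np.random.default_rng(0)
p = 0.7

# Draw 10,000 Bernoulli samples (a binomial with n=1 is a Bernoulli draw)
samples = rng.binomial(n=1, p=p, size=10_000)

# Only the values 0 and 1 can appear
print(np.unique(samples))

# The sample mean should be close to p
print(samples.mean())
```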

Let's scale the pixel values to be between 0 and 1, and then binarize them:

import numpy as np

# scale images_X to be between 0 and 1
images_X = images_X / 255.

# make pixels binary (either white or black)
images_X = (images_X > 0.5).astype(float)

# confirm that every value is now 0.0 or 1.0
np.min(images_X), np.max(images_X)
# (0.0, 1.0)
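As an alternative to the manual comparison, scikit-learn's Binarizer transformer performs the same thresholding. The sketch below uses a small hypothetical array in place of images_X so that it runs on its own. Binarizer maps values strictly greater than the threshold to 1 and everything else to 0, which matches the > 0.5 comparison above.

```python
import numpy as np
from sklearn.preprocessing import Binarizer

# Hypothetical stand-in for images_X: 3 "images" of 4 pixels with 0-255 intensities
raw = np.array([[0, 100, 128, 255],
                [30, 200, 127, 129],
                [255, 255, 0, 64]], dtype=float)

# Step 1: scale to [0, 1]
scaled = raw / 255.

# Step 2: values > 0.5 become 1, values <= 0.5 become 0
binary = Binarizer(threshold=0.5).fit_transform(scaled)

# Compare with the manual NumPy approach
manual = (scaled > 0.5).astype(float)
print(np.array_equal(binary, manual))  # True
print(binary)
```

Note that 128 / 255 is just above 0.5 and 127 / 255 is just below it, so those two mid-gray pixels land on opposite sides of the threshold.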

Let's take a look at the same digit five as we did previously, now with our newly altered pixels:

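import matplotlib.pyplot as plt
# plot the first image as a 28 x 28 grid (0 = white, 1 = black)
plt.imshow(images_X[0].reshape(28, 28), cmap=plt.cm.gray_r)
# the label of the first image
images_y[0]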

The plot is as follows:

We can see that the fuzziness of the image has disappeared and we are left with a very crisp digit to classify. Let's now try to extract features from our dataset of digits.
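As a preview of that step, the following is a minimal sketch of fitting BernoulliRBM to binary data. Because images_X is not defined in a standalone script, it uses a randomly generated 0/1 array of the same width (784 = 28 x 28 pixels) as a hypothetical stand-in. The hyperparameters (n_components=100, learning_rate=0.01, n_iter=5) are illustrative assumptions, not tuned values. The transform step returns, for each sample, the activation probability of each hidden unit, so those probabilities become the extracted features.

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM

# Hypothetical stand-in for the binarized images_X:
# 200 samples of 784 (28 x 28) pixels, each exactly 0.0 or 1.0
rng = np.random.default_rng(0)
X_binary = (rng.random((200, 784)) > 0.5).astype(float)

# Illustrative (untuned) hyperparameters
rbm = BernoulliRBM(n_components=100, learning_rate=0.01, n_iter=5, random_state=0)

# Learn the weights, then map each sample to the activation
# probabilities of the 100 hidden units
features = rbm.fit_transform(X_binary)

print(features.shape)         # (200, 100)
print(rbm.components_.shape)  # (100, 784)
```

On the real digit data, the binarized images_X would be passed in place of X_binary. Each row of components_ holds the weights connecting one hidden unit to all 784 pixels, so it can be reshaped to 28 x 28 and viewed as an image.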
