Reconstructing the data

The forward pass shows how data moves through the network (from the visible layer to the hidden layer), but it doesn't explain how the RBM is able to learn new features from our data without ground truths. That learning happens through multiple forward and backward passes between our visible and hidden layers.

In the reconstruction phase, we switch the network around: the hidden layer becomes the input layer and feeds our activation variables (a) backwards into the visible layer, using the same weights but a new set of biases. The activation variables that we calculated during the forward pass are then used to reconstruct the original input vectors. The following visualization shows how activations are fed backwards through our graph using the same weights and different biases:
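To make the two passes concrete, here is a minimal NumPy sketch of a single forward pass and reconstruction; the layer sizes, weight matrix W, and bias vectors b and c are illustrative stand-ins, not values from any trained model:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
n_visible, n_hidden = 6, 3                              # tiny illustrative sizes

v = rng.integers(0, 2, size=n_visible).astype(float)    # one binary input vector
W = rng.normal(scale=0.1, size=(n_visible, n_hidden))   # weights shared by both passes
b = np.zeros(n_visible)                                  # visible-layer biases
c = np.zeros(n_hidden)                                   # hidden-layer biases (a separate set)

# Forward pass: visible -> hidden gives the activation variables (a)
a = sigmoid(v @ W + c)

# Reconstruction pass: hidden -> visible reuses the same weights (transposed),
# but the visible-layer biases
v_reconstructed = sigmoid(a @ W.T + b)

print("original      :", v)
print("reconstruction:", np.round(v_reconstructed, 2))
```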

This becomes the network's way of evaluating itself. By passing the activations backwards through the network and obtaining an approximation of the original input, the network can adjust the weights to make the approximations closer to that input. At the beginning of training, because the weights are randomly initialized (this is standard practice), the approximations will likely be very far off. Backpropagation through the network, which occurs in the same direction as our forward pass (confusing, we know), then adjusts the weights to minimize the distance between the original input and the approximations. This process is repeated until the approximations are as close to the original input as possible. The number of times this back-and-forth process occurs is called the number of iterations.
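In practice, this back-and-forth weight adjustment is usually implemented with a procedure called contrastive divergence rather than ordinary backpropagation. The following self-contained sketch of a one-step version (CD-1) is illustrative only; the learning rate, layer sizes, and loop count are arbitrary choices, not the book's settings:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
n_visible, n_hidden, learning_rate = 6, 3, 0.1

v = rng.integers(0, 2, size=n_visible).astype(float)    # one binary input vector
W = rng.normal(scale=0.1, size=(n_visible, n_hidden))   # shared weights
b, c = np.zeros(n_visible), np.zeros(n_hidden)           # visible / hidden biases

for iteration in range(10):                              # the "number of iterations"
    # Forward (positive) phase: visible -> hidden
    h_prob = sigmoid(v @ W + c)
    h_sample = (rng.random(n_hidden) < h_prob).astype(float)

    # Reconstruction (negative) phase: hidden -> visible -> hidden again
    v_recon = sigmoid(h_sample @ W.T + b)
    h_recon = sigmoid(v_recon @ W + c)

    # Nudge the weights and biases so the reconstruction moves closer to the input
    W += learning_rate * (np.outer(v, h_prob) - np.outer(v_recon, h_recon))
    b += learning_rate * (v - v_recon)
    c += learning_rate * (h_prob - h_recon)

    error = np.mean((v - v_recon) ** 2)
    print(f"iteration {iteration}: reconstruction error {error:.4f}")
```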

The end result of this process is a network that has an alter-ego for each data point. To transform data, we simply pass it through the network, retrieve the activation variables, and call those the new features. This process is a type of generative learning that attempts to learn the probability distribution that generated the original data and exploit that knowledge to give us a new feature set for our raw data.
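In scikit-learn, this whole fit-and-transform routine is available through the BernoulliRBM class. As a rough sketch (using made-up binary data rather than the dataset we are about to introduce), extracting the hidden activations as new features might look like this:

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(100, 20)).astype(float)   # 100 made-up binary rows, 20 raw features

# n_components = number of hidden units, n_iter = number of back-and-forth iterations
rbm = BernoulliRBM(n_components=5, learning_rate=0.05, n_iter=20, random_state=0)

X_new = rbm.fit_transform(X)    # the hidden-layer activations become the new features
print(X_new.shape)              # (100, 5): one new feature per hidden unit
```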

For example, if we were given an image of a digit and asked to classify which digit (0-9) it was, the forward pass of the network asks the question: given these pixels, what digit should I expect? On the backwards pass, the network asks: given a digit, what pixels should I expect? This is called the joint probability, the simultaneous probability of x given y and y given x, and it is expressed as the shared weights between the two layers of our network.

Let's introduce our new dataset and use it to illustrate the usefulness of RBMs in feature learning.
