Restricted Boltzmann Machines

One popular method of constructing a Deep Belief Network is to compose it as a layered stack of Restricted Boltzmann Machines (RBMs). These RBMs function as auto-encoders, with each hidden layer serving as the visible layer for the next. This composition allows a fast, layer-by-layer, unsupervised training procedure. The Deep Belief Network uses the layers of RBMs for the pre-training phase and then a feedforward network for the fine-tuning phase. The first step of training is to learn a layer of features from the visible units. The next step is to take the activations of those trained features and treat them as the new visible units. We then repeat the process to learn further features in the second hidden layer, and continue in the same way for all remaining hidden layers.
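To make that procedure concrete, here is a minimal NumPy sketch (all function and variable names are illustrative, not taken from any particular library) of greedy, layer-by-layer pre-training: each RBM is trained with single-step contrastive divergence (CD-1), and its hidden activations become the visible data for the next RBM in the stack:

```python
# A toy sketch of greedy layer-wise pre-training of stacked RBMs (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, epochs=10, lr=0.1):
    """Train one RBM with single-step contrastive divergence (CD-1)."""
    n_visible = data.shape[1]
    W = rng.normal(0, 0.01, size=(n_visible, n_hidden))
    b = np.zeros(n_visible)   # visible bias
    c = np.zeros(n_hidden)    # hidden bias
    for _ in range(epochs):
        # Positive phase: hidden probabilities given the data.
        h_prob = sigmoid(data @ W + c)
        h_sample = (rng.random(h_prob.shape) < h_prob).astype(float)
        # Negative phase: reconstruct the visible units, then re-infer the hidden units.
        v_recon = sigmoid(h_sample @ W.T + b)
        h_recon = sigmoid(v_recon @ W + c)
        # Contrastive divergence updates.
        W += lr * (data.T @ h_prob - v_recon.T @ h_recon) / len(data)
        b += lr * (data - v_recon).mean(axis=0)
        c += lr * (h_prob - h_recon).mean(axis=0)
    return W, b, c

def pretrain_dbn(data, layer_sizes):
    """Stack RBMs: each layer's hidden activations become the next layer's visible data."""
    layers, visible = [], data
    for n_hidden in layer_sizes:
        W, b, c = train_rbm(visible, n_hidden)
        layers.append((W, b, c))
        visible = sigmoid(visible @ W + c)   # these activations feed the next RBM
    return layers

# Toy usage: 100 binary vectors of length 20, pre-training a 20-12-6 stack.
toy_data = (rng.random((100, 20)) < 0.3).astype(float)
dbn_layers = pretrain_dbn(toy_data, [12, 6])
```

The weights learned this way would then be used to initialize the feedforward network that is fine-tuned afterwards.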

There are two points we should note here.

First, we should explain a bit about what an auto-encoder is and does. Auto-encoders are at the heart of what is known as representation learning. Using unsupervised learning, they encode their input into a compressed vector of significant features, from which the original input can then be reconstructed.
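As a rough illustration only (a toy NumPy sketch with made-up layer sizes, not a production implementation), an auto-encoder squeezes its input through a small code and is trained, without any labels, to reproduce the input from that code:

```python
# A toy auto-encoder: compress 16 features into a 4-value code, then reconstruct
# the input from the code by minimizing reconstruction error (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = rng.random((200, 16))                    # toy data: 200 samples, 16 features

n_code = 4                                   # size of the compressed representation
W1 = rng.normal(0, 0.1, size=(16, n_code))   # encoder weights
b1 = np.zeros(n_code)
W2 = rng.normal(0, 0.1, size=(n_code, 16))   # decoder weights
b2 = np.zeros(16)

lr = 0.01
for _ in range(500):
    code = sigmoid(x @ W1 + b1)              # encode: 16 features -> 4
    x_hat = code @ W2 + b2                   # decode: 4 -> 16 (the reconstruction)
    err = x_hat - x                          # reconstruction error to minimize
    # Backpropagate the squared reconstruction error; no labels are needed.
    dW2 = code.T @ err / len(x)
    db2 = err.mean(axis=0)
    d_code = (err @ W2.T) * code * (1 - code)
    dW1 = x.T @ d_code / len(x)
    db1 = d_code.mean(axis=0)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print("mean squared reconstruction error:", np.mean(err ** 2))
```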

Second, we should note that stacking RBMs within a Deep Belief Network is only one way to approach this. Stacking layers of Rectified Linear Units (ReLUs) with dropout and then training the whole network with backpropagation has once again become state of the art. I say once again because, 30 years ago, the supervised approach was the way to go. Rather than letting the algorithm look at all the data and determine the features of interest, sometimes we as humans can actually do a better job of finding the features we want.

What I would consider the two most significant properties of Deep Belief Networks are as follows:

  • There is an efficient, layer-by-layer process for learning the top-down, generative weights. These weights determine how the variables in one layer depend on the variables in the layer above it.
  • After learning is complete, the values of the hidden variables in every layer can easily be inferred by a single, bottom-up pass that starts with an observed data vector in the bottom layer and uses the generative weights in the reverse direction (a minimal sketch of such a pass follows this list).
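Here is what that single bottom-up pass might look like in code. This is a minimal sketch in which the weight and bias arrays are random placeholders standing in for learned parameters:

```python
# A toy bottom-up pass: start from an observed data vector and infer each layer's
# activations in turn. The parameters below are placeholders, not learned values.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Placeholder (weights, hidden bias) pairs for a 20-12-6 stack.
layers = [(rng.normal(0, 0.1, (20, 12)), np.zeros(12)),
          (rng.normal(0, 0.1, (12, 6)), np.zeros(6))]

v = (rng.random(20) < 0.3).astype(float)     # an observed data vector in the bottom layer

activations = []
for W, c in layers:                          # one bottom-up sweep, layer by layer
    v = sigmoid(v @ W + c)
    activations.append(v)

print([a.shape for a in activations])        # (12,), then (6,)
```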

With that said, let's now talk about RBMs as well as Boltzmann machines in general.

A Boltzmann machine is a recurrent neural network with binary units and undirected edges between those units. For those of you who weren't paying attention in your graph theory class, undirected means the edges (or links) are bidirectional; they do not point in any specific direction. The following is a diagram of an undirected graph with undirected edges:

Boltzmann machines were one of the first neural networks capable of learning internal representations, and given enough time, they can solve difficult problems. They are, however, not good at scaling, which leads us to our next topic, RBMs.

RBMs were introduced to deal with the Boltzmann machine's inability to scale. They still have hidden layers, but the connections are restricted: each hidden unit connects only to the visible units, never to the other hidden units, and this restriction is what makes efficient learning possible. More formally, we must dive into a little bit of graph theory to explain this properly.

The neurons of an RBM must form what is known as a bipartite graph: a pair of nodes, one from each of the two groups of units (the visible layer and the hidden layer), may have a symmetric connection between them, but there can be no connections between nodes within the same group. A bipartite graph, sometimes called a bigraph, is a set of graph vertices decomposed into two disjoint sets such that no two vertices within the same set are adjacent.

Here is a good example that will help visualize this topic.

Note that there are no connections within the same set (red on the left or black on the right), but there are connections between the two sets:

More formally, an RBM is what is known as a symmetrical bipartite graph. This is because the inputs from all visible nodes are passed to all hidden nodes. We say symmetrical because each visible-to-hidden connection uses the same weight in both directions; bipartite because there are two layers; and graph because, well, it's a graph, or a collection of nodes and edges if you prefer!
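If you prefer code to pictures, here is a tiny NumPy sketch (toy sizes, illustrative names) of that structure: a single weight matrix connects every visible node to every hidden node, there are no connections within either layer, and the same matrix, transposed, is reused on the way back:

```python
# The symmetrical bipartite structure of an RBM, sketched with toy sizes.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n_visible, n_hidden = 6, 3
W = rng.normal(0, 0.1, size=(n_visible, n_hidden))   # the only connections in the model:
                                                      # visible-to-hidden, nothing within a layer
v = rng.integers(0, 2, n_visible).astype(float)       # a binary visible vector

h = sigmoid(v @ W)          # visible -> hidden uses W
v_back = sigmoid(h @ W.T)   # hidden -> visible reuses the same W, transposed (the symmetry)
```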

Imagine for a second that our RBM is presented with images of cats and dogs, and that we have two output nodes, one for each animal. On the forward pass, our RBM asks itself, "Given the pixels I am seeing, should I send a stronger weight signal for the cat or for the dog?" On the backward pass, it wonders, "Given a dog, which distribution of pixels should I expect to see?" That, my friends, was today's lesson in joint probability: the two questions together estimate the joint probability of the pixels and the activations, of X and A occurring together. In our case, this joint probability is expressed by the weights shared between the two layers, and it is an important aspect of RBMs.
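A minimal NumPy sketch of those two passes (toy sizes, illustrative names) might look like the following. Note that one and the same weight matrix, together with the two biases we will meet shortly, drives both directions, and that the backward pass produces the reconstruction we discuss next:

```python
# Forward and backward passes of an RBM, sketched with toy sizes (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n_visible, n_hidden = 8, 2                      # a handful of pixels in, two feature detectors out
W = rng.normal(0, 0.1, (n_visible, n_hidden))   # weights shared by both passes
b = np.zeros(n_visible)                         # visible bias
c = np.zeros(n_hidden)                          # hidden bias

v = rng.integers(0, 2, n_visible).astype(float)     # an input "image"

p_h_given_v = sigmoid(v @ W + c)                    # forward pass: P(hidden unit on | pixels)
h = (rng.random(n_hidden) < p_h_given_v).astype(float)
p_v_given_h = sigmoid(h @ W.T + b)                  # backward pass: P(pixel on | hidden units),
                                                    # i.e. the reconstruction of the input
```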

With today's mini lessons in joint probability and graph theory behind us, we'll now talk about reconstruction, which is an important part of what RBMs do. In the example we have been discussing, we are learning which groups of pixels tend to occur together (meaning they are on together) for a set of images. When a hidden layer node is activated by a significant enough weight (whatever the threshold is determined to be to turn it on), it represents a co-occurrence of features, in our case those of the dog or the cat. Pointy ears + round face + small eyes might be what we are looking for if the image is a cat. Big ears + a long tail + a big nose may make the image a dog. These activations represent what our RBM "thinks" the original data looks like. For all intents and purposes, we are in fact reconstructing the original data.

We should also quickly point out that an RBM has two biases instead of one. This is very important, as it is one of the things that distinguishes an RBM from other auto-encoding algorithms. The hidden bias helps our RBM produce the activations we need on the forward pass, and the visible bias helps it learn the correct reconstructions on the backward pass. The hidden bias is important because its main job is to ensure that some of the nodes fire no matter how sparse our data might be. You will see how this impacts the way a Deep Belief Network dreams a little later on.
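In the standard RBM notation (a common convention rather than anything specific to one implementation), with w_ij the weight between visible unit i and hidden unit j, the hidden bias c_j shows up in the forward-pass activations and the visible bias b_i in the backward-pass reconstructions:

$$
P(h_j = 1 \mid v) = \sigma\Big(c_j + \sum_i v_i\, w_{ij}\Big), \qquad
P(v_i = 1 \mid h) = \sigma\Big(b_i + \sum_j w_{ij}\, h_j\Big),
$$

where σ is the logistic sigmoid. Because c_j is added before the sigmoid is applied, a sufficiently large hidden bias can push a hidden unit to fire even when the input is very sparse.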
