How do auto-encoders work?

Auto-encoders are neural networks and may be shallow or deep, as with other neural networks we have discussed so far. What distinguishes auto-encoders from other forms of neural network is that auto-encoders are trained to reproduce or predict the inputs. Thus the hidden layers and neurons are not maps between an input and some other outcome, but are self (auto)-encoding.

In the more common case of neural networks, the outcome is some variable we are interested in predicting, so greater accuracy is straightforwardly better. For auto-encoders, given sufficient complexity, the network can simply learn the identity function, in which case the hidden neurons exactly mirror the raw data and no meaningful benefit results. Because the outcome used for training is the same as the inputs, the best auto-encoder is therefore not necessarily the most accurate one, but one that reveals some meaningful structure in the data, reduces noise, identifies outliers or anomalous data, or has some other useful side effect that is not directly related to accurately reproducing the inputs.

One way to use auto-encoders is to perform dimension reduction. Auto-encoders with a lower dimensionality than the raw data are called undercomplete; by using an undercomplete auto-encoder, one can force the auto-encoder to learn the most salient or prominent features of the data. These new hidden features can then be used for further analysis or work. For example, an important and common application of auto-encoders is to pre-train deep neural networks or other supervised learning models. In addition, it may be possible and of interest to directly interpret the hidden features themselves; for example, they may provide insight into the key characteristics or structures in the data.

Using an undercomplete model is effectively a way to regularize the model. However, it is also possible to train overcomplete auto-encoders where the hidden dimensionality is greater than the raw data, so long as some other form of regularization is used. We will discuss different forms of regularization in more depth in the next section.

As with regular neural networks, there are broadly two parts to an auto-encoder. First, an encoding function, f(∙), encodes the raw data, x, to the hidden neurons, H. Second, a decoding function, g(∙), decodes H back into a reconstruction of x.
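To make the two parts concrete, the following is a minimal, hand-rolled sketch of an undercomplete auto-encoder (written in Python with NumPy for illustration; the data, dimensions, and variable names are all hypothetical). Both f(∙) and g(∙) are kept linear for simplicity, and the weights are trained by plain gradient descent on the mean squared reconstruction error:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 200 samples in 5 dimensions with
# low-rank (rank-2) structure, so compression is possible.
basis = rng.normal(size=(2, 5))
x = rng.normal(size=(200, 2)) @ basis

n_in, n_hidden = 5, 2                # undercomplete: 2 hidden < 5 inputs
W_enc = rng.normal(scale=0.1, size=(n_in, n_hidden))
W_dec = rng.normal(scale=0.1, size=(n_hidden, n_in))

def f(x):                            # encoding function: x -> H
    return x @ W_enc

def g(h):                            # decoding function: H -> reconstruction of x
    return h @ W_dec

lr = 0.01
for _ in range(2000):
    h = f(x)
    err = g(h) - x                   # reconstruction error
    # Gradients of the mean squared error for the linear encoder/decoder
    grad_dec = h.T @ err / len(x)
    grad_enc = x.T @ (err @ W_dec.T) / len(x)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

mse = np.mean((g(f(x)) - x) ** 2)    # final reconstruction error
```

Because the hidden layer has only two neurons for five inputs, the network cannot learn the identity function; it is forced to learn a compressed representation of the data.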

Regularized auto-encoders

An undercomplete auto-encoder is, in a way, a form of regularized auto-encoder, where the regularization occurs through using a shallower (or in some other way lower) dimensional representation than the data. However, regularization can be accomplished through other means as well.

Penalized auto-encoders

As we have seen in Chapter 3, Preventing Overfitting, one approach is to use penalties. In general, our goal is to minimize the reconstruction error as simply as possible. If we have an objective function, F, traditionally we may optimize F(y, f(x)), where f(∙) encodes the raw data inputs to generate predicted or expected y values. For auto-encoders, we instead optimize F(x, g(f(x))), so that the machine learns the weights and functional forms of f(∙) and g(∙) that minimize the discrepancy between x and its reconstruction, g(f(x)).

If we want to use an overcomplete auto-encoder, we need to introduce some form of regularization to force the machine to learn a representation that does not simply mirror the input exactly. For example, we might add a function that penalizes complexity, so that, instead of optimizing F(x, g(f(x))), we optimize F(x, g(f(x))) + P(f(x)), where the penalty function, P, depends on the encoding of the raw inputs, f(x). Such penalties differ somewhat from those we have seen before in that they are designed to induce sparseness not of the parameters but of the latent variables, H, which are the encoded representations of the raw data. The goal is to learn a latent representation that captures the essential features of the data.
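The penalized objective can be sketched in a few lines of code (Python with NumPy here for illustration; the dimensions, weights, and the penalty weight lam are illustrative assumptions, not a prescribed recipe). The function below computes F(x, g(f(x))) + P(f(x)) for an overcomplete auto-encoder, with F a mean squared reconstruction error and P an L1 penalty on the hidden activations, H:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=(100, 4))        # hypothetical raw data

# Overcomplete: 8 hidden neurons for only 4 inputs.
W_enc = rng.normal(scale=0.5, size=(4, 8))
W_dec = rng.normal(scale=0.5, size=(8, 4))

def f(x):                            # encoding function
    return np.tanh(x @ W_enc)

def g(h):                            # decoding function
    return h @ W_dec

def objective(x, lam=0.1):
    h = f(x)                                 # latent variables, H
    recon = np.mean((g(h) - x) ** 2)         # F(x, g(f(x)))
    sparsity = lam * np.mean(np.abs(h))      # P(f(x)): L1 penalty on H
    return recon + sparsity
```

Minimizing this objective trades reconstruction accuracy against sparseness of H; with lam = 0 it collapses back to the unpenalized objective.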

Another type of penalty that can be used to provide regularization is one based on the derivative of the encoder. Whereas sparse auto-encoders use a penalty that induces sparseness of the latent variables, penalizing the derivatives results in the model learning a form of f(∙) that is relatively insensitive to minor perturbations of the raw input data, x. In other words, the penalty discourages functions where the encoding varies greatly for small changes in x, preferring regions where the gradient is relatively flat. Auto-encoders regularized in this way are known as contractive auto-encoders.
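One common instantiation of a derivative-based penalty is the squared Frobenius norm of the encoder's Jacobian, averaged over the training samples. The sketch below (Python with NumPy; the encoder size and weights are hypothetical) uses the fact that, for h = tanh(xW), the Jacobian entries are dh_j/dx_i = (1 − h_j²)·W_ij, so the penalty can be computed without explicit differentiation:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=(50, 3))                 # hypothetical raw data
W_enc = rng.normal(scale=0.5, size=(3, 4))   # hypothetical encoder weights

def f(x):                                    # encoding function
    return np.tanh(x @ W_enc)

def contractive_penalty(x):
    """Squared Frobenius norm of the encoder's Jacobian, averaged over
    samples. For h = tanh(x W), dh_j/dx_i = (1 - h_j**2) * W[i, j], so
    ||J||_F^2 per sample = sum_j (1 - h_j**2)**2 * sum_i W[i, j]**2."""
    h = f(x)                                 # shape (n_samples, n_hidden)
    dh = 1.0 - h ** 2                        # elementwise tanh derivative
    return np.mean((dh ** 2) @ (W_enc ** 2).sum(axis=0))
```

A small penalty value means the encoding changes little under small perturbations of x, which is exactly the flat-gradient behavior the regularizer prefers.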

Denoising auto-encoders

Denoising auto-encoders remove noise from, or denoise, data, and are a useful technique for learning a latent representation of raw data (Vincent, Larochelle, Bengio, and Manzagol, 2008; Bengio, Courville, and Vincent, 2013). We said the general task of an auto-encoder was to optimize F(x, g(f(x))). For a denoising auto-encoder, however, the task is to recover x from a noisy or corrupted version of it, denoted x̃. So the task becomes optimizing F(x, g(f(x̃))).

Although denoising auto-encoders are used to try to recover the true representation from corrupted or noisy data, the technique can also be used as a regularization tool. As a method of regularization, rather than starting with noisy data and attempting to recover the truth, the raw data is purposefully corrupted. This forces the auto-encoder to do more than merely learn the identity function, as the corrupted inputs, x̃, are no longer identical to the output, x. The process is shown in Figure 4.2:


Figure 4.2

The remaining choice is the function, N(∙), that adds the noise or corrupts x. Two common choices are to add noise through a stochastic process, or, on any given training iteration, to include only a random subset of the raw x inputs. In the next section, we will explore how to actually train auto-encoder models in R.
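These two choices for N(∙) can be sketched as follows (a Python/NumPy illustration; the function names and noise parameters are assumptions for this example, not fixed conventions). The first corruption function adds stochastic Gaussian noise to every input; the second keeps only a random subset of the raw inputs and zeroes out the rest (masking noise):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=(6, 4))          # hypothetical raw data

def corrupt_gaussian(x, sigma=0.3):
    """N(x): corrupt x by adding stochastic Gaussian noise."""
    return x + rng.normal(scale=sigma, size=x.shape)

def corrupt_mask(x, keep_prob=0.7):
    """N(x): keep each raw input with probability keep_prob,
    zeroing out the rest (a random subset of the inputs)."""
    mask = rng.random(x.shape) < keep_prob
    return x * mask
```

In either case, the auto-encoder is trained to reconstruct the clean x from the corrupted x̃ = N(x), so simply copying the input is no longer a solution.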
