Autoencoders explained

Autoencoders (AEs) are feedforward, non-recurrent neural networks that aim to copy their inputs to their outputs. An AE works by compressing the input into a lower-dimensional summary, often referred to as the latent space representation, and then attempting to reconstruct the output from that representation. An AE is therefore made up of three parts: an Encoder, a Latent Space Representation, and a Decoder. The following figure illustrates the application of an AE to a sample picked from the MNIST dataset:

Application of AE on MNIST dataset sample

The encoder and decoder components of an AE are fully connected feedforward networks. The number of neurons in the latent space representation is a hyperparameter that has to be set when building the AE. The number of neurons chosen for the latent space dictates how much compression is achieved when the original input image is reduced to its latent space representation. The general architecture of an AE is shown in the following figure:

General architecture of an AE

The given input first passes through the Encoder, which is a fully connected artificial neural network (ANN). The Encoder acts on the Input and reduces its dimensionality to the size specified by the hyperparameter. The Decoder is another fully connected ANN that takes this reduced Input (the latent space representation) and reconstructs the Output. The goal is for the Output to be identical to the Input. The architectures of the Encoder and the Decoder are generally mirror images of each other. Nothing mandates that the two architectures be the same, but it is common practice to build them that way; in fact, the only real requirement of an AE is that its output closely matches the given input, and everything in between can be customized by whoever is building the AE.
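To make this concrete, here is a minimal sketch of such an AE in Keras. The specific choices (a flattened 784-pixel MNIST input, a 32-unit latent space, the layer sizes, activations, and training settings) are illustrative assumptions rather than values prescribed here:

```python
from tensorflow import keras
from tensorflow.keras import layers

input_dim = 784   # flattened 28 x 28 MNIST image (illustrative choice)
latent_dim = 32   # neurons in the latent space representation (the compression hyperparameter)

# Encoder: fully connected network that compresses the input
inputs = keras.Input(shape=(input_dim,))
encoded = layers.Dense(128, activation="relu")(inputs)
encoded = layers.Dense(latent_dim, activation="relu")(encoded)

# Decoder: mirror image of the encoder that reconstructs the input
decoded = layers.Dense(128, activation="relu")(encoded)
outputs = layers.Dense(input_dim, activation="sigmoid")(decoded)

autoencoder = keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")

# Train the AE to reproduce its own input: x serves as both input and target
(x_train, _), (x_test, _) = keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, input_dim).astype("float32") / 255.0
x_test = x_test.reshape(-1, input_dim).astype("float32") / 255.0

autoencoder.fit(x_train, x_train,
                epochs=10, batch_size=256,
                validation_data=(x_test, x_test))
```

With latent_dim set to 32, each 784-pixel image is squeezed into 32 numbers; a smaller latent_dim means more aggressive compression and, typically, a blurrier reconstruction.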

Mathematically, the encoder can be represented as:

h = f(x)

where x is the input and f is the function that acts on the input to represent it as a concise summary h (the latent space representation). The decoder, on the other hand, can be represented as:

r = g(h)

While the expectation is to obtain r = x, this is not always the case, because the reconstruction is done from a compact summary representation; therefore, a certain amount of error occurs. The error e is computed from the original input x and the reconstructed output r as e = x - r.
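As a purely illustrative mapping of this notation onto code, the sketch below uses randomly initialized weight matrices for f and g (in a real AE these weights would be learned during training); the sizes and the tanh activation are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 784, 32                              # input size and latent size (illustrative)
x = rng.random(d)                           # a single flattened input image

W_enc = rng.normal(scale=0.01, size=(d, k)) # encoder weights (would normally be learned)
W_dec = rng.normal(scale=0.01, size=(k, d)) # decoder weights (would normally be learned)

def f(x):            # encoder: h = f(x), the latent space representation
    return np.tanh(x @ W_enc)

def g(h):            # decoder: r = g(h), the reconstruction
    return h @ W_dec

h = f(x)
r = g(h)
e = x - r            # reconstruction error
print(h.shape, r.shape, np.mean(e ** 2))    # latent size, output size, mean squared error
```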

The AE network then learns by minimizing the Mean Squared Error (MSE) between the input and the reconstruction, and the error is propagated back through the hidden layers so that the weights can be adjusted. When the weights of the decoder are the transposes of the encoder's weights, there are fewer parameters to learn, which makes training faster. This is made possible by the mirrored encoder and decoder architectures; with differing architectures the weights cannot simply be transposed, so the computation time increases. This is the reason for keeping the encoder and decoder architectures mirrored.
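The sketch below shows one way this weight tying could be coded (an illustrative assumption, not this chapter's implementation): a single-layer linear AE in which the decoder reuses the transpose of the encoder's weight matrix, and the one shared matrix is updated by gradient descent on the MSE:

```python
import numpy as np

rng = np.random.default_rng(42)
n, d, k = 256, 784, 32                      # samples, input size, latent size (illustrative)
x = rng.random((n, d))                      # stand-in data; real inputs would be MNIST images
W = rng.normal(scale=0.01, size=(d, k))     # one weight matrix shared by encoder and decoder

lr = 0.1
for step in range(101):
    h = x @ W                               # encoder uses W
    r = h @ W.T                             # decoder uses the transpose of W (tied weights)
    e = r - x                               # reconstruction error
    mse = np.mean(e ** 2)

    # Gradient of the MSE with respect to the single tied matrix W; the error
    # contributes through both the encoder path (x @ W) and the decoder path (h @ W.T)
    grad_W = (2.0 / (n * d)) * (x.T @ e @ W + e.T @ h)
    W -= lr * grad_W

    if step % 20 == 0:
        print(f"step {step:3d}  MSE = {mse:.4f}")
```

Because the decoder's weights are just W.T, only one matrix has to be stored and updated, roughly halving the number of trainable parameters compared with an untied encoder/decoder pair.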
