Sparse autoencoder

The autoencoder that we saw in the previous recipe worked more like an identity network: it simply reconstructs its input. The emphasis is on reconstructing the image at the pixel level, and the only constraint is the number of units in the bottleneck layer. While pixel-level reconstruction is interesting, it does not ensure that the network will learn abstract features from the dataset. We can encourage the network to learn abstract features by adding further constraints.

In sparse autoencoders, a sparsity penalty term is added to the reconstruction error, which tries to ensure that fewer units in the bottleneck layer fire at any given time. If m is the total number of input patterns, then we can define a quantity ρ_hat (you can find the mathematical details in Andrew Ng's lecture notes at https://web.stanford.edu/class/cs294a/sparseAutoencoder_2011new.pdf), which measures the net activity of each hidden unit (how often, on average, it fires). The basic idea is to constrain ρ_hat so that it equals a sparsity parameter ρ. This results in adding a sparsity regularization term to the loss function, so that the loss function is now as follows:

loss = Mean squared error + Regularization for sparsity parameter

This regularization term will penalize the network if ρ_hat deviates from ρ; one standard way to do this is to use the Kullback-Leibler (KL) divergence between ρ and ρ_hat.
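The loss described above can be sketched in plain NumPy. This is a minimal illustration, not the book's implementation: the function names, the weighting coefficient `beta`, and the choice of averaging the hidden activations over a batch to estimate ρ_hat are assumptions made for the sake of the example. The KL term is the standard divergence between two Bernoulli distributions with means ρ and ρ_hat.

```python
import numpy as np

def kl_sparsity_penalty(rho, rho_hat):
    # KL divergence between Bernoulli(rho) and Bernoulli(rho_hat),
    # summed over all hidden units. Zero when rho_hat == rho everywhere,
    # and it grows as the average activations deviate from rho.
    return np.sum(rho * np.log(rho / rho_hat)
                  + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))

def sparse_autoencoder_loss(x, x_recon, hidden_activations, rho=0.05, beta=1.0):
    # Mean squared reconstruction error over the batch.
    mse = np.mean((x - x_recon) ** 2)
    # rho_hat: average activation of each hidden unit over the batch
    # (hidden_activations has shape [batch_size, num_hidden_units]).
    rho_hat = np.mean(hidden_activations, axis=0)
    # Total loss = reconstruction error + weighted sparsity penalty.
    return mse + beta * kl_sparsity_penalty(rho, rho_hat)
```

In practice, `beta` (a hypothetical name here) controls the trade-off between faithful reconstruction and sparsity: larger values push the hidden units toward the target activity ρ at the cost of reconstruction accuracy.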
