The main limitation of a perceptron is its linearity. How can this kind of architecture be extended to remove such a constraint? The solution is simpler than it might seem: adding at least one non-linear hidden layer between input and output yields a highly non-linear combination, parameterized by a larger number of variables. The resulting architecture, called a Multilayer Perceptron (MLP) and containing (only for simplicity) a single hidden layer, is shown in the following diagram:
This is a so-called feed-forward network, meaning that the flow of information begins in the first layer, always proceeds in the same direction, and ends at the output layer. Architectures that allow partial feedback (for example, in order to implement a local memory) are called recurrent networks and will be analyzed in the next chapter.
In this case, there are two weight matrices, W and H, and two corresponding bias vectors, b and c. If there are m hidden neurons, xi ∈ ℜn × 1 (column vector), and yi ∈ ℜk × 1, the dynamics are defined by the following transformations:

hi = fh(W · xi + b), with W ∈ ℜm × n, b ∈ ℜm × 1, and hi ∈ ℜm × 1
yi = fo(H · hi + c), with H ∈ ℜk × m and c ∈ ℜk × 1
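These transformations can be sketched with NumPy. The dimensions (n = 4, m = 8, k = 3), the tanh hidden activation, and the linear output activation are illustrative assumptions, not choices made by the text:

```python
import numpy as np

# Hypothetical dimensions: n input features, m hidden neurons, k outputs
n, m, k = 4, 8, 3

rng = np.random.default_rng(0)
W = rng.normal(size=(m, n))   # input-to-hidden weight matrix
b = rng.normal(size=(m, 1))   # hidden bias vector
H = rng.normal(size=(k, m))   # hidden-to-output weight matrix
c = rng.normal(size=(k, 1))   # output bias vector

def forward(x):
    """Forward pass of a single-hidden-layer MLP on a column vector x.

    tanh is used here as an example non-linear hidden activation;
    the output is left linear, as in a regression setting.
    """
    h = np.tanh(W @ x + b)    # hidden layer: non-linear activation
    y = H @ h + c             # output layer: linear activation
    return y

x = rng.normal(size=(n, 1))
y = forward(x)
print(y.shape)  # (3, 1)
```

With real data, W, H, b, and c would of course be learned (for example, via backpropagation) rather than sampled at random.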
A fundamental condition for any MLP is that at least one hidden-layer activation function fh(•) is non-linear. It's straightforward to prove that any stack of linear layers is equivalent to a single linear transformation and, hence, such an MLP falls back into the case of a standard perceptron. Conventionally, the activation function is fixed for a given layer, but there are no limitations on how the activations of different layers can be combined. In particular, the output activation is normally chosen to meet a precise requirement (such as multi-label classification, regression, image reconstruction, and so on). That's why the first step of this analysis concerns the most common activation functions and their features.
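The collapse of stacked linear layers can be checked numerically. In this sketch (all dimensions and values are arbitrary), two bias-augmented linear layers are composed without any activation, and the result coincides with a single linear layer whose parameters are W₂W₁ and W₂b₁ + b₂:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two stacked linear "hidden" layers (no activation function)
W1, b1 = rng.normal(size=(8, 4)), rng.normal(size=(8, 1))
W2, b2 = rng.normal(size=(3, 8)), rng.normal(size=(3, 1))

x = rng.normal(size=(4, 1))

# Composing the two linear layers...
y_deep = W2 @ (W1 @ x + b1) + b2

# ...gives exactly one linear layer with W = W2 W1 and b = W2 b1 + b2
W_eq = W2 @ W1
b_eq = W2 @ b1 + b2
y_shallow = W_eq @ x + b_eq

print(np.allclose(y_deep, y_shallow))  # True
```

Inserting any non-linear fh(•) between the two layers breaks this identity, which is precisely why the hidden activation must be non-linear.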