Sigmoid and hyperbolic tangent

These two activations are very similar, but with an important difference. Let's start by defining them:

σ(x) = 1 / (1 + e^(-x))

tanh(x) = (e^x − e^(-x)) / (e^x + e^(-x))

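As a quick sanity check, here is a minimal NumPy sketch (the helper name sigmoid is ours, not from the text) that implements both definitions and verifies the identity tanh(x) = 2σ(2x) − 1, which is why the two curves have the same shape while covering different ranges:

```python
import numpy as np

def sigmoid(x):
    # sigma(x) = 1 / (1 + e^(-x)), bounded between 0 and 1
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-5.0, 5.0, 101)

# tanh is a rescaled and shifted sigmoid: tanh(x) = 2*sigmoid(2x) - 1,
# which explains the identical shape but the different range (-1, 1)
assert np.allclose(np.tanh(x), 2.0 * sigmoid(2.0 * x) - 1.0)
```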
The corresponding plots are shown in the following diagram:

[Figure: Sigmoid and hyperbolic tangent plots]

A sigmoid σ(x) is bounded between 0 and 1, with two asymptotes (σ(x) → 0 as x → -∞ and σ(x) → 1 as x → ∞). Similarly, the hyperbolic tangent (tanh) is bounded between -1 and 1, with two asymptotes corresponding to those extreme values. Analyzing the two plots, we can see that both functions are almost linear in a short range (roughly [-2, 2]) and become almost flat immediately outside it. This means that the gradient is large and approximately constant when x takes small values around 0, and drops to nearly 0 for larger absolute values.

A sigmoid is a natural representation of a probability, or of any set of weights that must be bounded between 0 and 1, so it can be a good choice for some output layers. The hyperbolic tangent, however, is completely symmetric about the origin, which is preferable for optimization purposes because its performance is normally superior; this activation function is therefore often employed in intermediate layers, whenever the inputs are expected to be small. The reason will become clear when the back-propagation algorithm is analyzed, but it is already evident that large absolute inputs lead to almost constant outputs: as the gradient is close to 0, the weight corrections can become extremely slow (a problem formally known as vanishing gradients). For this reason, in many real-world applications, the next family of activation functions is often employed instead.
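To make the vanishing-gradient effect concrete, the following sketch (a minimal illustration, not taken from the text) evaluates the analytical derivatives σ'(x) = σ(x)(1 − σ(x)) and tanh'(x) = 1 − tanh²(x) at a few points; both gradients peak at x = 0 and practically vanish for large |x|:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # d(sigma)/dx = sigma(x) * (1 - sigma(x)); maximum 0.25 at x = 0
    s = sigmoid(x)
    return s * (1.0 - s)

def tanh_grad(x):
    # d(tanh)/dx = 1 - tanh(x)^2; maximum 1.0 at x = 0
    return 1.0 - np.tanh(x) ** 2

for x in (0.0, 2.0, 5.0, 10.0):
    print(f"x={x:5.1f}  sigmoid'={sigmoid_grad(x):.6f}  tanh'={tanh_grad(x):.6f}")

# x=  0.0  sigmoid'=0.250000  tanh'=1.000000
# x=  2.0  sigmoid'=0.104994  tanh'=0.070651
# x=  5.0  sigmoid'=0.006648  tanh'=0.000182
# x= 10.0  sigmoid'=0.000045  tanh'=0.000000
```

At x = 10 the sigmoid gradient is already around 4.5 × 10⁻⁵, so any weight update flowing through such a saturated unit is effectively frozen.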
