Activation function properties

The following is a list of activation function properties that are worth considering when choosing an activation function (a short numerical sketch of some of these properties follows the list):

  • Non-linearity: If the activation function is non-linear, it can be proven that even a two-layer neural network is a universal function approximator.
  • Continuous differentiability: This property is desirable because it enables gradient-based optimization methods.
  • Value range: If the range of the activation function is bounded, gradient-based learning methods tend to be more stable and less prone to numerical errors, since activations cannot grow arbitrarily large. If the range is unbounded, training is usually more efficient, but care must be taken to avoid exploding gradients (extreme gradient values).
  • Monotonicity: If the activation function is monotonic, the error surface associated with a single-layer model is guaranteed to be convex, which makes training more effective.
  • Smooth functions with monotonic derivatives: These have been shown, in some cases, to generalize better.
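Several of these properties can be verified numerically. The following is a minimal sketch (assuming a Python environment with NumPy, neither of which is prescribed by the text) that checks the value range and monotonicity of three common activation functions and probes their gradients at the extremes of the input range:

```python
import numpy as np

# Standard activation functions and their derivatives.
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)

def tanh_grad(x):
    return 1.0 - np.tanh(x) ** 2

def relu(x):
    return np.maximum(0.0, x)

def relu_grad(x):
    # The derivative at exactly x = 0 is undefined; 0 is a common convention.
    return (x > 0).astype(float)

x = np.linspace(-10.0, 10.0, 1001)

for name, f, g in [("sigmoid", sigmoid, sigmoid_grad),
                   ("tanh", np.tanh, tanh_grad),
                   ("relu", relu, relu_grad)]:
    y = f(x)
    dy = g(x)
    # Value range: bounded outputs (sigmoid, tanh) keep activations from
    # growing arbitrarily large; ReLU is unbounded above.
    print(f"{name}: output range [{y.min():.3f}, {y.max():.3f}]")
    # Monotonicity: a non-negative derivative everywhere on the sampled
    # interval means the function is non-decreasing there.
    print(f"{name}: monotonic = {bool(np.all(dy >= 0))}")
    # Gradients at the extremes: near-zero values indicate saturation.
    print(f"{name}: gradient at x = -10, +10 -> {g(np.array([-10.0, 10.0]))}")
```

For the saturating functions (sigmoid and tanh), the near-zero gradients at the extremes illustrate the trade-off described in the value range property: bounded outputs aid numerical stability but can slow learning, while ReLU's unbounded positive range is one reason exploding gradients need attention.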

Now that we've discussed the main components used to train neural networks, it's time to learn how to deal with overfitting, a problem that regularly appears during training.
