Exploring activation functions

The thing about linear algebra is, it's linear. It is useful when the change in the output is proportional to the change in the input. The real world, however, is full of non-linear functions and equations. Solving non-linear equations is hard with a capital H. But we've got a trick: we can take a linear equation and add a non-linearity to it. This way, the function becomes non-linear!
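To make this concrete, here is a minimal sketch in Go (the weight and bias values are made up purely for illustration) of wrapping a linear function in a non-linearity:

package main

import (
	"fmt"
	"math"
)

// linear is a plain linear function of x: its output changes proportionally with its input.
func linear(w, x, b float64) float64 { return w*x + b }

// sigmoid squashes any real input into the range (0, 1), making the composed function non-linear.
func sigmoid(x float64) float64 { return 1 / (1 + math.Exp(-x)) }

func main() {
	// Illustrative weight and bias values; any non-zero weight works.
	w, b := 2.0, 1.0
	for _, x := range []float64{-2, -1, 0, 1, 2} {
		fmt.Printf("x=%+.1f  linear=%+.2f  non-linear=%.4f\n",
			x, linear(w, x, b), sigmoid(linear(w, x, b)))
	}
}

The linear output grows in lockstep with x, while the composed output does not: that is the whole trick.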

Following this view, you can think of an artificial neural network as a generalization of the models we've built in the previous chapters.

Throughout the history of artificial neural networks, the community's preference for particular activation functions has followed fashion. In the early days, the Heaviside step function was favored. Gradually, the community moved toward differentiable, continuous functions, such as sigmoid and tanh. Lately, the pendulum of fashion has swung back toward harder, seemingly discontinuous functions such as the rectified linear unit (ReLU); the key is that we've learned new tricks for differentiating such functions.

Here are some of the more popular activation functions over time:

[Figure: plots of popular activation functions, including the Heaviside step function, sigmoid, tanh, and ReLU]
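As a rough sketch, these functions can be written directly in a few lines of Go (using only the standard math package; the sample inputs are arbitrary):

package main

import (
	"fmt"
	"math"
)

// heaviside is the classic step function: 0 for negative inputs, 1 otherwise
// (conventions for the value at exactly zero vary).
func heaviside(x float64) float64 {
	if x < 0 {
		return 0
	}
	return 1
}

// sigmoid squashes inputs into the range (0, 1).
func sigmoid(x float64) float64 { return 1 / (1 + math.Exp(-x)) }

// tanh squashes inputs into the range (-1, 1).
func tanh(x float64) float64 { return math.Tanh(x) }

// relu clamps negative inputs to zero and passes positive inputs through unchanged.
func relu(x float64) float64 { return math.Max(0, x) }

func main() {
	for _, x := range []float64{-2, -0.5, 0, 0.5, 2} {
		fmt.Printf("x=%+.1f  heaviside=%.0f  sigmoid=%.3f  tanh=%+.3f  relu=%.1f\n",
			x, heaviside(x), sigmoid(x), tanh(x), relu(x))
	}
}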

One thing to note is that these functions are all nonlinear, and they all limit their output on the y axis in some way: the Heaviside function, sigmoid, and tanh are bounded both above and below, while ReLU is bounded below at zero.

The vertical ranges of the activation functions are limited, but the horizontal ranges are not. We can use biases to adjust how our activation functions look: a bias shifts the function along the x axis, changing where it activates.

It should be noted that biases can be zero, which means we can also omit them. Most of the time, even for more complex projects, this is fine, though adding biases tends to improve the accuracy of the neural network.
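As a small illustration (again just a sketch, with arbitrary bias values), shifting the bias b in sigmoid(w*x + b) moves the point where the curve crosses 0.5 along the x axis:

package main

import (
	"fmt"
	"math"
)

// sigmoid squashes inputs into the range (0, 1).
func sigmoid(x float64) float64 { return 1 / (1 + math.Exp(-x)) }

func main() {
	// With a weight of 1, the bias shifts the sigmoid horizontally:
	// the curve now crosses 0.5 at x = -b rather than at x = 0.
	for _, b := range []float64{-2, 0, 2} {
		fmt.Printf("b=%+.0f:", b)
		for _, x := range []float64{-3, -2, -1, 0, 1, 2, 3} {
			fmt.Printf(" %.2f", sigmoid(1*x+b))
		}
		fmt.Println()
	}
}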
