But which one should we use?

Each of these activation functions has its uses; however, ReLU combines the most useful properties of all of them with being cheap to compute, so it should be your default choice most of the time.
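To make the comparison concrete, here is a minimal NumPy sketch of the activations discussed so far (the function names and the leaky slope value are my own, purely for illustration). Note that ReLU is nothing more than a clamp at zero, which is why it is so cheap to compute:

```python
import numpy as np

def relu(x):
    # Clamp negatives to zero -- a single comparison per element.
    return np.maximum(0.0, x)

def leaky_relu(x, slope=0.01):
    # Like ReLU, but negative inputs keep a small, non-zero gradient.
    return np.where(x > 0, x, slope * x)

def sigmoid(x):
    # Squashes any input into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Squashes any input into the range (-1, 1), centred on zero.
    return np.tanh(x)
```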

It can be a good idea to switch to leaky ReLU if you frequently run into units whose gradients get stuck at zero. However, you can often prevent this simply by lowering the learning rate, or you can use leaky ReLU only in the earlier layers rather than throughout the network, so that the later ReLU layers keep the advantage of producing fewer (sparser) activations overall, as sketched below.
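One way that advice might look in a PyTorch model is sketched here (the layer sizes are arbitrary and chosen only for illustration): leaky ReLU in the earlier layers, where stuck units are most costly, and plain ReLU later on to preserve sparsity:

```python
import torch.nn as nn

model = nn.Sequential(
    # Earlier layers: leaky ReLU keeps a small gradient flowing
    # even when a unit's pre-activation goes negative.
    nn.Linear(64, 128),
    nn.LeakyReLU(negative_slope=0.01),
    nn.Linear(128, 128),
    nn.LeakyReLU(negative_slope=0.01),
    # Later layers: plain ReLU, keeping the sparse-activation benefit.
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Linear(64, 1),
)
```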

Sigmoid is most valuable as an output activation, particularly when the output should be interpreted as a probability. The tanh function can also be valuable, for example where we want a layer to be able to push values both up and down: its output is centred on zero, rather than biased toward positive values like ReLU and sigmoid.
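A quick numerical check illustrates both points (the numbers in the comments are approximate): sigmoid maps values into (0, 1), which is what you want for a probability, while tanh applied to zero-mean input averages out to roughly zero, unlike ReLU and sigmoid, whose outputs are always non-negative:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)      # zero-mean, unit-variance input

print(sigmoid(2.0))               # ~0.88 -- readable as a probability
print(np.tanh(x).mean())          # ~0.0  -- tanh output is zero-centred
print(np.maximum(0.0, x).mean())  # ~0.4  -- ReLU output is biased upward
print(sigmoid(x).mean())          # ~0.5  -- sigmoid output is always positive
```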

So, the short answer is: it depends on your network and the kind of output you are expecting.

It should be noted, however, that while several activation functions have been presented here, others have been proposed, such as PReLU, softmax, and Swish, and they are also worth considering depending on the task at hand. This is still an active area of research and is far from solved, so stay tuned!
