Common activation functions – rectifiers, hyperbolic tangent, and maxout

The activation function determines the mapping between inputs and a hidden layer. It defines the functional form for how a neuron is activated. For example, a linear activation function could be defined as f(x) = x, in which case the neuron's value is simply its weighted input passed through unchanged, which amounts to a linear model. A linear activation function is shown in the top panel of Figure 5.2. The problem with linear activation functions is that they do not allow any non-linear functional forms to be learned. Previously, we used the hyperbolic tangent as an activation function, that is, f(x) = tanh(x). The hyperbolic tangent can work well in some cases, but a potential limitation is that it saturates at both low and high values, as shown in the middle panel of Figure 5.2.
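As a minimal sketch (in Python with NumPy, not code from this book), the linear and hyperbolic tangent activations can be compared on a few weighted inputs; note how tanh flattens out for large magnitudes:

```python
import numpy as np

# Weighted input to a neuron, chosen to span small and large magnitudes
z = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])

linear = z          # identity activation: f(z) = z
tanh = np.tanh(z)   # hyperbolic tangent: saturates near -1 and +1

print(linear)  # [-5. -1.  0.  1.  5.]
print(tanh)    # approximately [-0.9999 -0.7616  0.      0.7616  0.9999]
```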

Perhaps the most popular activation function currently, and a good first choice (Nair, V., and Hinton, G. E., 2010), is the rectifier. There are different kinds of rectifiers, but the most common is the linear rectifier, defined by the function f(x) = max(0, x). Linear rectifiers are flat below zero and linear above it; an example is shown in the bottom panel of Figure 5.2. Despite their simplicity, linear rectifiers provide a non-linear transformation, and enough of them can be combined to approximate arbitrary non-linear functions, which is not possible with purely linear activation functions.
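A hedged sketch of the linear rectifier, again in NumPy: values below zero are clamped to zero, while positive values pass through unchanged.

```python
import numpy as np

def relu(z):
    """Linear rectifier: f(z) = max(0, z), applied element-wise."""
    return np.maximum(0.0, z)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(z))  # [0.  0.  0.  0.5 2. ]
```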

A final type of activation function we will discuss is maxout (Goodfellow, I. J., Warde-Farley, D., Mirza, M., Courville, A., and Bengio, Y., 2013). A maxout unit takes the maximum value of its inputs, although as usual this is after weighting, so it is not the case that the input variable with the highest raw value always wins. Maxout activation functions seem to work particularly well with dropout. A small sketch of a maxout unit follows.
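In this illustrative sketch (hypothetical weights and a hypothetical `maxout` helper, not from the book), the input is first projected through several weight vectors, one per "piece", and the unit's output is the maximum of those weighted sums, so the winner depends on the learned weights rather than on the raw inputs:

```python
import numpy as np

rng = np.random.default_rng(0)

def maxout(x, W, b):
    """Maxout unit: take the max over k affine pieces, z_j = w_j . x + b_j."""
    z = W @ x + b      # shape (k,): one weighted sum per piece
    return np.max(z)   # the largest weighted sum wins

x = rng.normal(size=4)        # example input vector
W = rng.normal(size=(3, 4))   # k = 3 pieces, each with its own weights
b = rng.normal(size=3)
print(maxout(x, W, b))
```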

For the purposes of this chapter, we will focus on linear rectifiers, both because they are a good default that performs well and because we have already shown the use of the hyperbolic tangent in previous chapters:

Figure 5.2: Linear (top), hyperbolic tangent (middle), and linear rectifier (bottom) activation functions
