Activation functions

Now that you know how to build a basic neural network, let's look at the purpose of some of the elements of your model. One of those elements was the Sigmoid, which is an activation function. Activation functions are sometimes also called transfer functions.

As you have learned previously, a given layer can be defined simply as weights applied to inputs, plus a bias, followed by a decision about activation. An activation function decides whether a neuron fires. We also add it to the network to help model more complex relationships between input and output. At the same time, the function needs to work with backpropagation, so that we can optimize our weights via an optimization method (that is, gradient descent). This means that the output of the function needs to be differentiable.
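To make this concrete, here is a minimal sketch of a single layer in NumPy (the library choice, layer sizes, and sample values are illustrative assumptions, not the book's actual model): the weights are applied to the inputs, a bias is added, and a sigmoid activation decides each neuron's output.

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation: squashes any real-valued input into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative layer: 3 inputs feeding 2 neurons (sizes chosen arbitrarily).
rng = np.random.default_rng(seed=0)
W = rng.normal(size=(2, 3))      # weights
b = np.zeros(2)                  # bias
x = np.array([0.5, -1.2, 3.0])   # one input sample

z = W @ x + b     # weights applied to inputs, plus a bias
a = sigmoid(z)    # the activation decides each neuron's output
print(a)          # two values, each strictly between 0 and 1
```

Because the sigmoid is smooth, we can take its derivative during backpropagation, which is why it satisfies the differentiability requirement discussed next.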

There are a few things to consider when choosing an activation function, as follows:

  • Speed: Simple activation functions are quicker to execute than more complex ones. This matters because, in deep learning, we run the model over large amounts of data, so each activation function is executed a very large number of times during training.
  • Differentiability: As we have already noted, being able to differentiate the function is essential during backpropagation. Having a gradient allows us to adjust our weights in a direction that brings the network closer to convergence. In brief, it allows us to calculate errors and improve our model by minimizing our cost function (see the sketch after this list).
  • Continuity: The function should be defined, with no gaps, across the entire range of possible inputs.
  • Monotonicity: While this property is not strictly necessary, it helps optimization, since the network tends to converge faster during gradient descent. Using non-monotonic functions is possible, but we are likely to run into longer training times overall.
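
To illustrate the differentiability and monotonicity points above, here is a small sketch (again using NumPy as an assumed tool) of the sigmoid and its derivative. The derivative is defined for every input and the function is monotonically increasing, which is what makes it straightforward to use with gradient descent.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    """Derivative of the sigmoid: sigmoid(z) * (1 - sigmoid(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)

z = np.linspace(-6.0, 6.0, 5)    # a few sample inputs across a wide range
print(sigmoid(z))        # continuous and monotonically increasing
print(sigmoid_prime(z))  # defined everywhere, so backpropagation can use it
```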