Softmax activation

Imagine if, instead of a deep neural network, we were using k logistic regressions, where each regression predicts membership in a single class. That collection of logistic regressions, one for each class, would look like this:
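
As a minimal sketch of that idea in code (the two-feature input, the weight values, and the choice of three classes below are invented purely for illustration), each class gets its own weights and its own sigmoid output:

import numpy as np

def sigmoid(z):
    # the logistic function maps any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# invented example: 2 input features, k = 3 classes
x = np.array([1.2, -0.7])             # a single input example
W = np.array([[0.5, -0.3],            # one weight vector per class
              [0.1,  0.8],
              [-0.4,  0.2]])
b = np.array([0.1, -0.2, 0.3])        # one bias per class

# each row of W defines an independent logistic regression for one class
per_class = sigmoid(W @ x + b)
print(per_class)         # three values in (0, 1), one per class
print(per_class.sum())   # nothing forces these to sum to 1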

The problem with using this group of logistic regressions is that the output of each individual logistic regression is independent of the others. Imagine a case where several of the logistic regressions in our set were uncertain of membership in their particular class, giving multiple answers around P(Y=k) = 0.5. Because these outputs won't necessarily sum to 1, we can't use them as an overall probability of class membership across the k classes.

Softmax helps us by squeezing the outputs of all these logistic regressions so that they sum to 1, which lets us use them as overall class membership probabilities.

The softmax function looks like this:

softmax(z)_j = exp(z_j) / (exp(z_1) + exp(z_2) + ... + exp(z_k))

(for j = 1 to k, where z_j is the output of the logistic regression belonging to class j, and the denominator sums over all k classes)

So then, if we apply the softmax function to the outputs of our previous set of regressions, we get a set of class probabilities that conveniently sum to 1 and can be used as the probability of class membership across the k classes. That changes our overall function to look like this:

P(Y = j | x) = exp(z_j) / (exp(z_1) + exp(z_2) + ... + exp(z_k))

The preceding function is often called multinomial logistic regression. It's sort of like a one-layer, output-only neural network. We don't use multinomial logistic regression frequently anymore; however, we most certainly use the softmax function all the time. For most multiclass classification problems in this book, we will be using softmax, so it's worth understanding.
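
To make that one-layer picture concrete, here is a rough sketch of multinomial logistic regression as a single linear layer followed by softmax; the input, weights, and biases are the same invented values as in the earlier sketch:

import numpy as np

def softmax(z):
    # exponentiate, then normalize so the outputs sum to 1
    z_exp = np.exp(z)
    return z_exp / z_exp.sum()

# the same invented example: 2 input features, k = 3 classes
x = np.array([1.2, -0.7])
W = np.array([[0.5, -0.3],
              [0.1,  0.8],
              [-0.4,  0.2]])
b = np.array([0.1, -0.2, 0.3])

class_probs = softmax(W @ x + b)   # one probability per class
print(class_probs)
print(class_probs.sum())           # these do sum to 1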

If you're like me, and you find all that math hard to read, it might be easier to look at softmax in code. So, let's do that before we move on, with the following code snippet:

import math

def softmax(z):
    # exponentiate each logit
    z_exp = [math.exp(x) for x in z]
    # normalize so that the outputs sum to 1
    sum_z_exp = sum(z_exp)
    softmax = [round(i / sum_z_exp, 3) for i in z_exp]
    return softmax

Let's quickly try an example. Imagine we had a set of logistic outputs that looked like this:

import numpy as np

z = np.array([0.9, 0.8, 0.2, 0.1, 0.5])

If we apply softmax, we can easily convert these outputs to relative class probabilities, like this:

print(softmax(z))
[0.284, 0.257, 0.141, 0.128, 0.19]
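
As a quick sanity check, these outputs now behave like class probabilities, summing to 1 up to rounding:

print(sum(softmax(z)))   # approximately 1.0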