One-hot vector

Perhaps not so coincidentally, the last layer has the shape (N, 10). N is the number of input images (which we get from x); that's fairly self-explanatory, and it means there is a clean mapping from input to output. What's not self-explanatory is the 10. Why 10? Simply put, there are 10 possible digits we want to predict: 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9:

The preceding diagram is an example result matrix. Recall that we used G.SoftMax to ensure that each row sums up to 1. Therefore, we can interpret the number in each column of a row as the probability that the input is that specific digit. To find the digit we're predicting, simply find the column with the highest probability in each row.
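To make this concrete, here is a minimal sketch in plain Go (not Gorgonia-specific; the helper name argmaxRows is my own) that finds the predicted digit for each row, assuming the softmaxed output has been materialized as a row-major []float64:

```go
package main

import "fmt"

// argmaxRows returns, for each row of a row-major (rows x cols) matrix,
// the column index holding the largest value. For our output matrix,
// that index is the predicted digit.
func argmaxRows(data []float64, rows, cols int) []int {
	preds := make([]int, rows)
	for r := 0; r < rows; r++ {
		row := data[r*cols : (r+1)*cols]
		best := 0
		for c, v := range row {
			if v > row[best] {
				best = c
			}
		}
		preds[r] = best
	}
	return preds
}

func main() {
	// Two example rows of softmaxed probabilities (each row sums to 1).
	out := []float64{
		0.05, 0.05, 0.6, 0.05, 0.05, 0.05, 0.05, 0.05, 0.025, 0.025, // predicts 2
		0.01, 0.9, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.02, // predicts 1
	}
	fmt.Println(argmaxRows(out, 2, 10)) // [2 1]
}
```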

In the previous chapter, I introduced the concept of one-hot vector encoding. To recap, it takes a slice of labels and returns a matrix in which each row contains a single 1 in the column corresponding to that row's label, and 0s everywhere else.
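As a refresher, the encoding can be sketched in a few lines of Go; the helper name oneHot and its signature here are hypothetical, not the exact function from the previous chapter:

```go
package main

import "fmt"

// oneHot takes a slice of labels and returns a row-major
// (len(labels) x classes) matrix where row i has a 1 in the column
// given by labels[i] and 0 everywhere else.
func oneHot(labels []int, classes int) []float64 {
	m := make([]float64, len(labels)*classes)
	for i, lbl := range labels {
		m[i*classes+lbl] = 1
	}
	return m
}

func main() {
	labels := []int{3, 0, 9}
	enc := oneHot(labels, 10)
	for i := range labels {
		fmt.Println(enc[i*10 : (i+1)*10])
	}
	// Output:
	// [0 0 0 1 0 0 0 0 0 0]
	// [1 0 0 0 0 0 0 0 0 0]
	// [0 0 0 0 0 0 0 0 0 1]
}
```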

Now, this is clearly a matter of encoding. Who's to say that column 0 has to represent the digit 0? We could, of course, come up with a completely scrambled encoding such as the following, and the neural network would still work:

Of course, we would not use such a scheme for encoding; it would be a massive source of programmer error. Instead, we go with the standard one-hot vector encoding.

I hope this has given you a taste of how powerful the notion of an expression graph can be. One thing we haven't touched upon yet is the execution of the graph. How do you run a graph? We'll look further into that in the next section.
