Our output layer will contain 10 neurons, one for each class an observation might belong to. This matches the one-hot encoding we imposed when we applied to_categorical() to the y vectors:
prediction = Dense(10, activation='softmax', name="output")(x)
As you can see, the activation we're using is called softmax. Let's talk about what softmax is, and why it's useful.
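As a first look, softmax can be sketched in a few lines of plain Python. This is an illustrative sketch, not Keras's own implementation: the softmax helper and the score values below are made up for the example. It exponentiates each raw score and divides by the total, so the 10 outputs are all positive and sum to 1.

```python
import math

def softmax(scores):
    """Turn a list of raw scores into probabilities that sum to 1."""
    # Subtracting the largest score keeps exp() from overflowing;
    # it does not change the resulting probabilities.
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for the 10 output neurons (made-up values).
scores = [1.0, 2.0, 0.5, 4.0, 0.0, -1.0, 3.0, 0.2, 1.5, 2.5]
probs = softmax(scores)

# The predicted class is the index of the largest probability.
predicted_class = probs.index(max(probs))
print(predicted_class, round(sum(probs), 6))
```

Because the outputs behave like a probability distribution over the classes, they can be compared directly against the one-hot y vectors.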