I'm going to use a first hidden layer with 512 neurons. That's smaller than the input vector's 784 elements, but there's no rule that says it has to be. Again, this architecture is just a starting point and isn't necessarily the best one. I'll then step the size down through the second and third hidden layers, to 256 and 128 neurons, as shown in the following code:
x = Dense(512, activation='relu', name="hidden1")(inputs)
x = Dense(256, activation='relu', name="hidden2")(x)
x = Dense(128, activation='relu', name="hidden3")(x)
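To get a feel for what stepping the layer sizes down costs, here is a rough sketch of the parameter arithmetic for these three layers. It assumes each layer is fully connected with one bias per neuron, which is the usual default for a Dense layer, and the helper name dense_param_count is just something I made up for illustration:

```python
# Layer widths: the 784-element input followed by the three hidden layers.
layer_sizes = [784, 512, 256, 128]


def dense_param_count(n_in, n_out):
    """Hypothetical helper: weights (n_in * n_out) plus one bias per output neuron."""
    return n_in * n_out + n_out


# Pair each layer's input width with its output width and count parameters.
hidden_params = [
    dense_param_count(n_in, n_out)
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]

for name, count in zip(["hidden1", "hidden2", "hidden3"], hidden_params):
    print(f"{name}: {count:,} parameters")
print(f"total: {sum(hidden_params):,} parameters")
```

Under those assumptions, most of the parameters sit in the first hidden layer, because it connects to the widest input. Shrinking or growing that layer changes the model size far more than adjusting the later ones.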