Model definition

First, we will load the Python modules; in this case, the TensorFlow package and the hyperparameters that we defined previously:

import tensorflow as tf
import hy_param

Then we define the placeholders that we will use for the input data in the model. tf.placeholder allows us to feed input data to the computational graph. We can constrain the shape of a placeholder so that it only accepts tensors of a certain shape. Note that it is common to provide None for the first dimension, which allows us to decide the size of the batch at runtime.

Master Your Craft: Batch size can often have a big impact on the performance of Deep Learning models. Explore different batch sizes in this project.  What changes?  What's your intuition?  Batch size is another tool in your Data Science toolkit!

We have also assigned names to the placeholders so that we can use them later on while building our inference code:

X = tf.placeholder("float", [None, hy_param.num_input], name="input_x")
Y = tf.placeholder("float", [None, hy_param.num_classes], name="input_y")
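
The following is a minimal sketch (illustrative only, not part of model.py) showing what the None dimension buys us: the same placeholder X accepts batches of different sizes at runtime. It assumes the placeholders defined above and uses zero-filled arrays as stand-ins for real data:

import numpy as np

# Because the first dimension of X is None, the same placeholder accepts
# batches of any size at runtime.
passthrough = tf.identity(X)

with tf.Session() as sess:
    small_batch = np.zeros((32, hy_param.num_input), dtype=np.float32)
    large_batch = np.zeros((128, hy_param.num_input), dtype=np.float32)
    print(sess.run(passthrough, feed_dict={X: small_batch}).shape)  # (32, hy_param.num_input)
    print(sess.run(passthrough, feed_dict={X: large_batch}).shape)  # (128, hy_param.num_input)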

Now we will define the variables that will hold the values for the weights and biases. tf.Variable allows us to store and update tensors in our graph. To initialize our variables with random values from a normal distribution, we will use tf.random_normal() (more details can be found at https://www.tensorflow.org/api_docs/python/tf/random_normal). The important thing to notice here is how the variable shapes map between layers: the output dimension of each layer must match the input dimension of the next:

weights = {
    'h1': tf.Variable(tf.random_normal([hy_param.num_input, hy_param.n_hidden_1])),
    'h2': tf.Variable(tf.random_normal([hy_param.n_hidden_1, hy_param.n_hidden_2])),
    'out': tf.Variable(tf.random_normal([hy_param.n_hidden_2, hy_param.num_classes]))
}
biases = {
    'b1': tf.Variable(tf.random_normal([hy_param.n_hidden_1])),
    'b2': tf.Variable(tf.random_normal([hy_param.n_hidden_2])),
    'out': tf.Variable(tf.random_normal([hy_param.num_classes]))
}
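
If you want to see the shape chaining explicitly, the following purely illustrative check (assuming the weights dictionary defined above) prints each weight matrix's shape; the second dimension of every matrix matches the first dimension of the next one:

# Illustrative only: confirm that the output dimension of each layer
# matches the input dimension of the next.
for name in ['h1', 'h2', 'out']:
    print(name, weights[name].get_shape().as_list())
# h1  -> [num_input, n_hidden_1]
# h2  -> [n_hidden_1, n_hidden_2]
# out -> [n_hidden_2, num_classes]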

Now, let's set up the operation that we defined in Equation 2.1: each layer multiplies its input by the weights and adds the biases, and the final layer produces the logits:

layer_1 = tf.add(tf.matmul(X, weights['h1']), biases['b1'])        # [None, n_hidden_1]
layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])  # [None, n_hidden_2]
logits = tf.matmul(layer_2, weights['out']) + biases['out']        # [None, num_classes]

The logits are converted into probabilities using tf.nn.softmax(). The softmax activation squashes the output of each unit to a value between 0 and 1, and the outputs sum to 1 across the classes:

prediction = tf.nn.softmax(logits, name='prediction')
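
To build intuition for what softmax does, here is a small, self-contained example (using NumPy, not part of the model code) that applies the softmax formula to a toy logit vector:

import numpy as np

# Illustrative only: softmax on a toy logit vector. The outputs lie
# between 0 and 1 and sum to 1, so they can be read as probabilities.
logits_example = np.array([2.0, 1.0, 0.1])
exp_values = np.exp(logits_example)
probabilities = exp_values / exp_values.sum()
print(probabilities)        # approximately [0.659, 0.242, 0.099]
print(probabilities.sum())  # 1.0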

Next, let's use tf.nn.softmax_cross_entropy_with_logits to define our cost function. Note that this function expects the raw logits rather than the softmax output, because it applies the softmax internally. We will optimize the loss using the Adam optimizer. Finally, we can use the built-in minimize() function, which computes the gradients and applies the update rule to each trainable parameter in our network:

loss_op = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=Y))
optimizer = tf.train.AdamOptimizer(learning_rate=hy_param.learning_rate)
train_op = optimizer.minimize(loss_op)
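
As a quick, purely illustrative sketch (not part of model.py), this is how a single training step could be run against the graph defined above; random arrays stand in for a real mini-batch of 32 examples:

import numpy as np

# Random stand-ins for a real mini-batch of inputs and one-hot labels.
batch_x = np.random.rand(32, hy_param.num_input).astype(np.float32)
batch_y = np.eye(hy_param.num_classes)[np.random.randint(0, hy_param.num_classes, 32)]

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # One optimization step: apply the Adam update and fetch the batch loss.
    _, batch_loss = sess.run([train_op, loss_op],
                             feed_dict={X: batch_x, Y: batch_y})
    print("loss:", batch_loss)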

Next, we evaluate the predictions. These operations are needed to calculate and capture the accuracy over a batch:

correct_pred = tf.equal(tf.argmax(prediction, 1), tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32), name='accuracy')
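
To see how this accuracy calculation behaves, here is a small, self-contained NumPy example (illustrative only) on a toy batch of three examples: argmax picks the most probable class per row, comparing it with the argmax of the one-hot labels gives a boolean per example, and the mean of those booleans is the accuracy:

import numpy as np

toy_prediction = np.array([[0.8, 0.1, 0.1],
                           [0.2, 0.7, 0.1],
                           [0.3, 0.3, 0.4]])
toy_labels = np.array([[1, 0, 0],
                       [0, 1, 0],
                       [0, 0, 1]])
correct = np.argmax(toy_prediction, 1) == np.argmax(toy_labels, 1)
print(correct.mean())  # 1.0, since all three examples are classified correctly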

Hurray! The heavy lifting is done. We save the model code into the model.py file. So far, we have defined a simple two-hidden-layer model architecture with 300 neurons in each hidden layer (as shown in Figure 2.3), which will learn the best weights using the Adam optimizer and predict the probabilities of the 10 classes:

Figure 2.3: An illustration of the model that we created