Implementing logistic regression using TensorFlow

This bonus section implements logistic regression with TensorFlow, again using click prediction as the example. We use 90% of the first 300,000 samples for training and the remaining 10% for testing, and assume that X_train_enc, Y_train, X_test_enc, and Y_test already hold the corresponding data.

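The code in this section uses the TensorFlow 1.x graph API (placeholders and sessions); under TensorFlow 2.x, the same functions are available in tf.compat.v1 after calling tf.compat.v1.disable_eager_execution(). First, we import TensorFlow and specify the model parameters: 20 training iterations and a learning rate of 0.001: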

>>> import tensorflow as tf
>>> # Read the column count from the sparse matrix; no dense copy is needed
>>> n_features = int(X_train_enc.shape[1])
>>> learning_rate = 0.001
>>> n_iter = 20

Then, we define placeholders and construct the model by computing the logits (the linear combination of the input and the model coefficients) and passing them through the logistic function to obtain the predicted probabilities:

>>> x = tf.placeholder(tf.float32, shape=[None, n_features])
>>> y = tf.placeholder(tf.float32, shape=[None])
>>> W = tf.Variable(tf.zeros([n_features, 1]))
>>> b = tf.Variable(tf.zeros([1]))
>>> logits = tf.add(tf.matmul(x, W), b)[:, 0]
>>> pred = tf.nn.sigmoid(logits)
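
To make the relationship between logits and predictions concrete, here is a minimal pure-Python sketch of the same computation for a single sample. The helper names (sigmoid, predict_proba) and the toy weights are hypothetical and serve only as an illustration; they are not part of the TensorFlow graph above.

```python
import math

def sigmoid(z):
    """Logistic function: maps a real-valued logit to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(sample, weights, bias):
    """Return (logit, probability) for one sample."""
    # Logit: weighted sum of the features plus the intercept
    logit = sum(w * v for w, v in zip(weights, sample)) + bias
    return logit, sigmoid(logit)

# Hypothetical toy values, not taken from the click dataset
toy_weights = [0.5, -1.0, 2.0]
toy_bias = -0.5
toy_sample = [1.0, 0.0, 1.0]  # one-hot style features

toy_logit, toy_prob = predict_proba(toy_sample, toy_weights, toy_bias)
print(toy_logit, toy_prob)
```

Because W and b are initialized to zeros in the graph, every logit starts at 0 and every initial prediction is sigmoid(0) = 0.5.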

After defining the graph for the model, we define the loss function, as well as the performance metric, the AUC:

>>> cost = tf.reduce_mean(
...     tf.nn.sigmoid_cross_entropy_with_logits(labels=y, logits=logits))
>>> auc = tf.metrics.auc(tf.cast(y, tf.int64), pred)[1]
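
The loss is computed from the logits rather than from pred because the log loss can then be evaluated in a numerically stable way. The TensorFlow documentation gives the formulation max(x, 0) - x * z + log(1 + exp(-abs(x))) for logit x and label z. The sketch below, with hypothetical function names and toy values, compares that expression against the textbook definition:

```python
import math

def sigmoid_cross_entropy(logit, label):
    """Log loss of one sample, computed directly from the raw logit."""
    # Stable form: exp() only ever sees a non-positive argument
    return (max(logit, 0.0) - logit * label
            + math.log1p(math.exp(-abs(logit))))

def naive_log_loss(logit, label):
    """Textbook log loss; breaks down when the probability rounds to 0 or 1."""
    p = 1.0 / (1.0 + math.exp(-logit))
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))

# Hypothetical toy logits and labels
toy_logits = [2.0, -1.0, 0.0]
toy_labels = [1.0, 0.0, 1.0]

losses = [sigmoid_cross_entropy(x, z)
          for x, z in zip(toy_logits, toy_labels)]
mean_cost = sum(losses) / len(losses)  # counterpart of tf.reduce_mean
print(mean_cost)
```

For moderate logits both functions agree; for a very large logit such as 1000 with label 0, the stable form returns the finite loss 1000.0, whereas the naive form fails because the probability rounds to exactly 1.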

We then define a gradient descent optimizer that searches for the best coefficients by minimizing the loss. We use Adam, a gradient descent variant that adapts the step size of each coefficient based on running averages of its gradients and squared gradients:

>>> optimizer = tf.train.AdamOptimizer(learning_rate).minimize(cost)
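
As a rough illustration of what the optimizer does on each step, the following pure-Python sketch applies the standard Adam update rule to a single scalar parameter. The function adam_minimize and the toy objective are hypothetical; the defaults for beta1, beta2, and epsilon mirror those of tf.train.AdamOptimizer, but this is a simplified sketch rather than TensorFlow's actual implementation.

```python
import math

def adam_minimize(grad_fn, theta, learning_rate=0.001, beta1=0.9,
                  beta2=0.999, epsilon=1e-8, n_steps=1000):
    """Minimize a scalar function, given its gradient, with Adam updates."""
    m, v = 0.0, 0.0  # running averages of the gradient and squared gradient
    for t in range(1, n_steps + 1):
        g = grad_fn(theta)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        # Bias correction compensates for the zero initialization of m and v
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        theta -= learning_rate * m_hat / (math.sqrt(v_hat) + epsilon)
    return theta

def toy_grad(th):
    # Gradient of the toy objective f(theta) = (theta - 3)^2
    return 2.0 * (th - 3.0)

theta_opt = adam_minimize(toy_grad, theta=0.0, learning_rate=0.1, n_steps=500)
print(theta_opt)
```

Dividing by the square root of v_hat normalizes the update, so the very first step has a magnitude close to the learning rate regardless of how large the gradient is.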

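Now, we can initialize the variables and start a TensorFlow session. The local variables are initialized as well because tf.metrics.auc stores its internal counters in them: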

>>> init_vars = tf.group(tf.global_variables_initializer(),
...                      tf.local_variables_initializer())
>>> sess = tf.Session()
>>> sess.run(init_vars)

Again, the model is trained in batches. We reuse the gen_batch function defined in the previous chapter and set the batch size to 1000:

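>>> import numpy as np
>>> batch_size = 1000
>>> # Number of training samples (defined here in case it is not in scope)
>>> n_train = X_train_enc.shape[0]
>>> indices = list(range(n_train))
>>> def gen_batch(indices):
...     # Shuffle in place so each pass visits the samples in a new order
...     np.random.shuffle(indices)
...     for batch_i in range(int(n_train / batch_size)):
...         batch_index = indices[batch_i*batch_size:
...                               (batch_i+1)*batch_size]
...         yield X_train_enc[batch_index], Y_train[batch_index]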
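
One property of this batching scheme is worth spelling out: only full batches are produced, so any remainder of the shuffled samples is skipped in that pass. The sketch below reproduces the index logic on a toy scale using only the standard library; gen_batch_indices and its seed parameter are hypothetical names introduced for the illustration.

```python
import random

def gen_batch_indices(n_samples, batch_size, seed=None):
    """Yield lists of shuffled sample indices, one list per full batch."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Integer division: a trailing partial batch is never yielded
    for batch_i in range(n_samples // batch_size):
        yield indices[batch_i * batch_size:(batch_i + 1) * batch_size]

# Toy setting: 10 samples in batches of 3
batches = list(gen_batch_indices(n_samples=10, batch_size=3, seed=42))
seen = sorted(i for batch in batches for i in batch)
print(len(batches), len(seen))
```

With 10 samples and a batch size of 3, three batches cover nine distinct samples and one sample is left out of the pass. With 270,000 training samples (90% of 300,000) and a batch size of 1000, the division is exact and nothing is skipped.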

Finally, we start the training process and print out the loss after each iteration:

>>> for i in range(1, n_iter+1):
...     avg_cost = 0.
...     for X_batch, Y_batch in gen_batch(indices):
...         _, c = sess.run([optimizer, cost],
...                         feed_dict={x: X_batch.toarray(), y: Y_batch})
...         avg_cost += c / int(n_train / batch_size)
...     print('Iteration %i, training loss: %f' % (i, avg_cost))
Iteration 1, training loss: 0.464850
Iteration 2, training loss: 0.414757
Iteration 3, training loss: 0.409064
Iteration 4, training loss: 0.405977
Iteration 5, training loss: 0.403816
Iteration 6, training loss: 0.402151
Iteration 7, training loss: 0.400824
Iteration 8, training loss: 0.399730
Iteration 9, training loss: 0.398788
Iteration 10, training loss: 0.397975
Iteration 11, training loss: 0.397248
Iteration 12, training loss: 0.396632
Iteration 13, training loss: 0.396041
Iteration 14, training loss: 0.395555
Iteration 15, training loss: 0.395057
Iteration 16, training loss: 0.394610
Iteration 17, training loss: 0.394210
Iteration 18, training loss: 0.393873
Iteration 19, training loss: 0.393489
Iteration 20, training loss: 0.393181

We then check the performance of the trained model on the testing set:

>>> auc_test = sess.run(auc,
...                     feed_dict={x: X_test_enc.toarray(), y: Y_test})
>>> print("AUC of ROC on testing set:", auc_test)
AUC of ROC on testing set: 0.7713197
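
For intuition about what this number means, the AUC equals the fraction of (positive, negative) sample pairs in which the positive sample receives the higher predicted score. Note that tf.metrics.auc approximates the area with a Riemann sum over a fixed set of thresholds (200 by default), so its result can differ slightly from the exact value. The following sketch computes the exact AUC on hypothetical toy data; roc_auc is an illustrative helper, and its quadratic running time makes it suitable for small inputs only.

```python
def roc_auc(labels, scores):
    """Exact AUC: share of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError('AUC needs at least one positive and one negative')
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5  # a tie counts as half a correct pair
    return correct / (len(pos) * len(neg))

# Hypothetical toy labels and predicted probabilities
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)
print(toy_auc)  # 0.75: three of the four pairs are ordered correctly
```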
