Implementing logistic regression using TensorFlow

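This bonus section implements logistic regression with TensorFlow, using click prediction as the example. We use 90% of the first 300,000 samples for training and the remaining 10% for testing, and assume that X_train_enc, Y_train, X_test_enc, and Y_test already hold the encoded features and labels. The code uses the TensorFlow 1.x graph API (placeholders and sessions); under TensorFlow 2.x, the same calls are available through tf.compat.v1 with eager execution disabled.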

  1. First, we import TensorFlow and specify the model parameters: 20 iterations for the training process and a learning rate of 0.001:
>>> import tensorflow as tf
>>> n_features = X_train_enc.shape[1]  # a sparse matrix exposes its shape without toarray()
>>> learning_rate = 0.001
>>> n_iter = 20
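  1. Then, we define placeholders and construct the model by computing the logits (the linear combination of the input and the model coefficients, which is the input to the logistic function) and passing them through the sigmoid to obtain the predicted probabilities: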
>>> x = tf.placeholder(tf.float32, shape=[None, n_features])
>>> y = tf.placeholder(tf.float32, shape=[None])
>>> W = tf.Variable(tf.zeros([n_features, 1]))
>>> b = tf.Variable(tf.zeros([1]))
>>> logits = tf.add(tf.matmul(x, W), b)[:, 0]
>>> pred = tf.nn.sigmoid(logits)
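To make the distinction between logits and predictions concrete, here is a minimal plain-Python sketch of the same computation for a single sample. The sample, weights, and bias are made-up values, and the helper names sigmoid and predict_proba are hypothetical, not part of the chapter's code.

```python
import math

def sigmoid(z):
    """Logistic function: maps a real-valued logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x_row, weights, bias):
    """Return (logit, probability) for one sample."""
    # The logit is the linear combination x . W + b, before any squashing
    logit = sum(xi * wi for xi, wi in zip(x_row, weights)) + bias
    # The prediction is the logistic function applied to the logit
    return logit, sigmoid(logit)

# Hypothetical one-hot encoded sample with three features (illustrative values)
x_row = [1.0, 0.0, 1.0]
weights = [0.5, -1.25, 0.25]
bias = -0.75

logit, prob = predict_proba(x_row, weights, bias)
print(logit, prob)
```

A logit of zero corresponds to a predicted probability of 0.5; positive logits push the prediction toward a click and negative logits away from it.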
  1. After defining the graph for the model, we define the loss function, as well as the performance metric, the AUC:
>>> cost = tf.reduce_mean(
...     tf.nn.sigmoid_cross_entropy_with_logits(labels=y, logits=logits))
>>> auc = tf.metrics.auc(tf.cast(y, tf.int64), pred)[1]
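The loss is computed from the logits rather than from the sigmoid output because that form can be evaluated without overflow. The sketch below uses the stable expression given in the TensorFlow documentation for sigmoid_cross_entropy_with_logits, max(z, 0) - z * y + log(1 + exp(-|z|)), and checks it against the textbook log loss. The function names and the toy labels and logits are assumptions for illustration.

```python
import math

def sigmoid_cross_entropy(label, logit):
    """Numerically stable log loss computed directly from the logit."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))

def naive_cross_entropy(label, logit):
    """Textbook log loss; breaks down for logits of large magnitude."""
    p = 1.0 / (1.0 + math.exp(-logit))
    return -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))

# Toy labels and logits (illustrative values only)
labels = [1.0, 0.0, 1.0, 0.0]
logits = [2.0, -1.0, -0.5, 0.3]

losses = [sigmoid_cross_entropy(y, z) for y, z in zip(labels, logits)]
mean_loss = sum(losses) / len(losses)  # counterpart of tf.reduce_mean
print(mean_loss)

# The stable form still works where the naive form would fail
extreme_loss = sigmoid_cross_entropy(0.0, 1000.0)
```

For moderate logits both forms agree; for a logit of 1000 with label 0, the stable form simply returns a loss of about 1000, whereas the naive form would attempt log(0).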
  1. We then define an optimizer that searches for the best coefficients by minimizing the loss. We use Adam, a variant of gradient descent that adapts the step size of each coefficient based on running averages of its gradients:
>>> optimizer = tf.train.AdamOptimizer(learning_rate).minimize(cost)
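To illustrate what "adaptive" means here, the following is a sketch of the Adam update rule as it is commonly described, applied to a one-dimensional quadratic. It is not TensorFlow's implementation, which differs in small details; the function adam_minimize and the toy objective are hypothetical, and the default hyperparameters mirror those of tf.train.AdamOptimizer.

```python
import math

def adam_minimize(grad_fn, theta, learning_rate=0.001, beta1=0.9,
                  beta2=0.999, eps=1e-8, n_steps=1000):
    """Minimize a scalar function with Adam, given its gradient function."""
    m = 0.0  # running average of gradients (first moment)
    v = 0.0  # running average of squared gradients (second moment)
    for t in range(1, n_steps + 1):
        g = grad_fn(theta)
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        # Bias correction compensates for m and v starting at zero
        m_hat = m / (1.0 - beta1 ** t)
        v_hat = v / (1.0 - beta2 ** t)
        # The step is scaled by the recent gradient magnitude
        theta -= learning_rate * m_hat / (math.sqrt(v_hat) + eps)
    return theta

# Toy objective f(theta) = (theta - 3)^2 with gradient 2 * (theta - 3)
grad = lambda theta: 2.0 * (theta - 3.0)

after_one_step = adam_minimize(grad, theta=0.0, learning_rate=0.1, n_steps=1)
after_many_steps = adam_minimize(grad, theta=0.0, learning_rate=0.1, n_steps=500)
print(after_one_step, after_many_steps)
```

Note that the very first step has a magnitude close to the learning rate regardless of how large the gradient is, because the gradient is divided by the square root of its own squared average.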

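  1. Now, we can initialize the variables and start a TensorFlow session. The local variables are initialized as well, because tf.metrics.auc keeps its running counts in them: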
>>> init_vars = tf.group(tf.global_variables_initializer(),
...                      tf.local_variables_initializer())
>>> sess = tf.Session()
>>> sess.run(init_vars)
  1. Again, the model is trained in batches. We reuse the gen_batch function defined in the previous chapter and set the batch size to 1000; n_train is the number of training samples:
>>> batch_size = 1000
>>> import numpy as np
>>> indices = list(range(n_train))
>>> def gen_batch(indices):
...     np.random.shuffle(indices)
...     for batch_i in range(int(n_train / batch_size)):
...         batch_index = indices[batch_i*batch_size:
...                               (batch_i+1)*batch_size]
...         yield X_train_enc[batch_index], Y_train[batch_index]
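One property of this batching scheme is worth seeing on a small example: because the number of batches is the sample count divided by the batch size and rounded down, any leftover samples are skipped in that epoch. The sketch below reproduces the index logic with the standard library only; gen_batch_indices and its seed parameter are hypothetical names introduced for illustration.

```python
import random

def gen_batch_indices(n_samples, batch_size, seed=None):
    """Yield shuffled index batches; an incomplete final batch is dropped."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_batches = n_samples // batch_size  # same as int(n_train / batch_size)
    for batch_i in range(n_batches):
        yield indices[batch_i * batch_size:(batch_i + 1) * batch_size]

# 10 samples with a batch size of 4 give 2 full batches; 2 samples are skipped
batches = list(gen_batch_indices(n_samples=10, batch_size=4, seed=42))
used = [i for batch in batches for i in batch]
print(len(batches), len(used))
```

With 270,000 training samples and a batch size of 1000, the division is exact, so every sample is visited once per iteration. Reshuffling on each call means the skipped samples, if any, differ from epoch to epoch.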
  1. Next, we start the training process and print out the average loss after each iteration:
>>> for i in range(1, n_iter+1):
...     avg_cost = 0.
...     for X_batch, Y_batch in gen_batch(indices):
...         _, c = sess.run([optimizer, cost],
...                         feed_dict={x: X_batch.toarray(), y: Y_batch})
...         avg_cost += c / int(n_train / batch_size)
...     print('Iteration %i, training loss: %f' % (i, avg_cost))
Iteration 1, training loss: 0.464850
Iteration 2, training loss: 0.414757
Iteration 3, training loss: 0.409064
Iteration 4, training loss: 0.405977
Iteration 5, training loss: 0.403816
Iteration 6, training loss: 0.402151
Iteration 7, training loss: 0.400824
Iteration 8, training loss: 0.399730
Iteration 9, training loss: 0.398788
Iteration 10, training loss: 0.397975
Iteration 11, training loss: 0.397248
Iteration 12, training loss: 0.396632
Iteration 13, training loss: 0.396041
Iteration 14, training loss: 0.395555
Iteration 15, training loss: 0.395057
Iteration 16, training loss: 0.394610
Iteration 17, training loss: 0.394210
Iteration 18, training loss: 0.393873
Iteration 19, training loss: 0.393489
Iteration 20, training loss: 0.393181
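The structure of this loop, shuffling, iterating over batches, updating the coefficients, and accumulating the average loss, can be sketched without TensorFlow. The example below is an assumption-laden toy: it uses plain mini-batch gradient descent instead of Adam, a single synthetic feature instead of the click data, and made-up hyperparameters.

```python
import math
import random

def sigmoid(z):
    """Logistic function, written to avoid overflow for negative z."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def log_loss(label, logit):
    """Stable sigmoid cross-entropy computed from the logit."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))

# Synthetic data (illustrative): the label is 1 when the feature is positive
rng = random.Random(0)
X = [rng.uniform(-1.0, 1.0) for _ in range(200)]
Y = [1.0 if x > 0 else 0.0 for x in X]

w, b = 0.0, 0.0
learning_rate, batch_size, n_iter = 0.5, 20, 5
n_batches = len(X) // batch_size
indices = list(range(len(X)))
epoch_losses = []

for i in range(1, n_iter + 1):
    rng.shuffle(indices)
    avg_cost = 0.0
    for batch_i in range(n_batches):
        batch = indices[batch_i * batch_size:(batch_i + 1) * batch_size]
        logits = [w * X[j] + b for j in batch]
        cost = sum(log_loss(Y[j], z) for j, z in zip(batch, logits)) / batch_size
        # Gradient of the mean log loss with respect to w and b
        grad_w = sum((sigmoid(z) - Y[j]) * X[j] for j, z in zip(batch, logits)) / batch_size
        grad_b = sum(sigmoid(z) - Y[j] for j, z in zip(batch, logits)) / batch_size
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
        avg_cost += cost / n_batches  # mean of the per-batch losses
    epoch_losses.append(avg_cost)
    print('Iteration %i, training loss: %f' % (i, avg_cost))
```

As in the TensorFlow version, the reported figure is the mean of the per-batch losses measured while the coefficients are still changing, so it slightly overstates the loss of the coefficients at the end of the iteration.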
  1. Finally, we evaluate the trained model on the testing set:
>>> auc_test = sess.run(auc,
...                     feed_dict={x: X_test_enc.toarray(), y: Y_test})
>>> print("AUC of ROC on testing set:", auc_test)
AUC of ROC on testing set: 0.7713197
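As a reminder of what this number measures, the AUC equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below computes it exactly by comparing all positive-negative pairs on made-up labels and scores; roc_auc is a hypothetical helper. tf.metrics.auc instead approximates the area using a fixed set of thresholds, so its result can differ slightly from an exact computation.

```python
def roc_auc(labels, scores):
    """Exact ROC AUC via pairwise comparison; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy labels and predicted click probabilities (illustrative values)
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.3, 0.4, 0.5, 0.1]

auc_value = roc_auc(labels, scores)  # 5 of the 6 positive-negative pairs are ordered correctly
print(auc_value)
```

The pairwise loop is quadratic in the number of samples, so it is suitable only for small illustrations; for a testing set of 30,000 samples, a sort-based or library implementation is the practical choice.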