Implementing random forest using TensorFlow

This is a bonus section where we implement a random forest with TensorFlow. Let's take a look at the following steps and see how it is done:

  1. First, we import the modules we need, as follows:
>>> import tensorflow as tf
>>> from tensorflow.contrib.tensor_forest.python import tensor_forest
>>> from tensorflow.python.ops import resources
  2. Specify the parameters of the model, including 20 iterations during the training process, 10 trees in total, and 30,000 maximal splitting nodes:
>>> n_iter = 20
>>> n_classes = 2
>>> n_features = int(X_train_enc.toarray().shape[1])
>>> n_trees = 10
>>> max_nodes = 30000
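Here, n_features is read off the width of the one-hot encoded training matrix. As a quick illustration of why encoding expands the feature count, the following self-contained sketch one-hot encodes a toy categorical column in plain numpy (this is illustrative data, not the actual X_train_enc):

```python
import numpy as np

# Toy categorical column with 3 distinct values
raw = np.array(['a', 'b', 'c', 'a'])
categories = np.unique(raw)           # array(['a', 'b', 'c'])

# Manual one-hot encoding: one column per category
X_enc = (raw[:, None] == categories[None, :]).astype(float)
print(X_enc.shape)   # (4, 3): 1 raw column became 3 encoded features
n_features = X_enc.shape[1]
print(n_features)    # 3
```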
  3. Next, we create placeholders and build the TensorFlow graph:
>>> x = tf.placeholder(tf.float32, shape=[None, n_features])
>>> y = tf.placeholder(tf.int64, shape=[None])
>>> hparams = tensor_forest.ForestHParams(num_classes=n_classes,
...     num_features=n_features, num_trees=n_trees,
...     max_nodes=max_nodes, split_after_samples=30).fill()
>>> forest_graph = tensor_forest.RandomForestGraphs(hparams)
  4. After defining the graph for the random forest model, we obtain the training graph and loss, as well as the performance metric, the AUC:
>>> train_op = forest_graph.training_graph(x, y)
>>> loss_op = forest_graph.training_loss(x, y)
>>> infer_op, _, _ = forest_graph.inference_graph(x)
>>> auc = tf.metrics.auc(tf.cast(y, tf.int64), infer_op[:, 1])[1]
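The AUC here is the area under the ROC curve, which equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. A minimal numpy sketch of this pairwise interpretation, using toy labels and scores rather than the model's outputs:

```python
import numpy as np

def roc_auc(y_true, y_score):
    # AUC of ROC equals the probability that a random positive sample
    # is scored higher than a random negative one (Mann-Whitney U).
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    # Count pairwise wins of positives over negatives; ties count as half
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

# 3 of the 4 positive-negative pairs are ranked correctly
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.2]))  # 0.75
```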
  5. Then, we initialize the variables and start a TensorFlow session:
>>> init_vars = tf.group(tf.global_variables_initializer(),
...     tf.local_variables_initializer(),
...     resources.initialize_resources(resources.shared_resources()))
>>> sess = tf.Session()
>>> sess.run(init_vars)
  6. In TensorFlow, models are usually trained in batches. That is, the training set is split into many small chunks and the model fits them chunk by chunk. Here, we set the batch size to 1000 and define a function to get randomized chunks of samples in each training iteration:
>>> batch_size = 1000
>>> import numpy as np
>>> indices = list(range(n_train))
>>> def gen_batch(indices):
...     np.random.shuffle(indices)
...     for batch_i in range(int(n_train / batch_size)):
...         batch_index = indices[batch_i*batch_size:
...                               (batch_i+1)*batch_size]
...         yield X_train_enc[batch_index], Y_train[batch_index]
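The generator above can be exercised on toy data to verify the batching logic. The following self-contained sketch substitutes small dense numpy arrays for X_train_enc and Y_train (the real ones are sparse, but fancy indexing works the same way):

```python
import numpy as np

# Toy stand-ins for the training data (hypothetical shapes)
n_train = 10
batch_size = 4
X_train = np.arange(n_train * 2).reshape(n_train, 2)
Y_train = np.arange(n_train) % 2
indices = list(range(n_train))

def gen_batch(indices):
    # Shuffles the index list in place, then yields full batches only;
    # the trailing n_train % batch_size samples are dropped this epoch.
    np.random.shuffle(indices)
    for batch_i in range(int(n_train / batch_size)):
        batch_index = indices[batch_i * batch_size:
                              (batch_i + 1) * batch_size]
        yield X_train[batch_index], Y_train[batch_index]

batches = list(gen_batch(indices))
print(len(batches))         # 2 full batches of 4; last 2 samples dropped
print(batches[0][0].shape)  # (4, 2)
```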
  7. Finally, we start the training process and check the performance after each iteration:
>>> for i in range(1, n_iter + 1):
...     for X_batch, Y_batch in gen_batch(indices):
...         _, l = sess.run([train_op, loss_op],
...                 feed_dict={x: X_batch.toarray(), y: Y_batch})
...     acc_train = sess.run(auc,
...                 feed_dict={x: X_train_enc.toarray(), y: Y_train})
...     print('Iteration %i, AUC of ROC on training set: %f' %
...           (i, acc_train))
...     acc_test = sess.run(auc,
...                 feed_dict={x: X_test_enc.toarray(), y: Y_test})
...     print("AUC of ROC on testing set:", acc_test)
Iteration 1, AUC of ROC on training set: 0.740271
AUC of ROC on testing set: 0.7418298
Iteration 2, AUC of ROC on training set: 0.745904
AUC of ROC on testing set: 0.74665743
Iteration 3, AUC of ROC on training set: 0.749690
AUC of ROC on testing set: 0.7501322
Iteration 4, AUC of ROC on training set: 0.752632
AUC of ROC on testing set: 0.7529533
Iteration 5, AUC of ROC on training set: 0.755357
AUC of ROC on testing set: 0.75560063
Iteration 6, AUC of ROC on training set: 0.757673
AUC of ROC on testing set: 0.75782216
Iteration 7, AUC of ROC on training set: 0.759688
AUC of ROC on testing set: 0.7597882
Iteration 8, AUC of ROC on training set: 0.761526
AUC of ROC on testing set: 0.76160187
Iteration 9, AUC of ROC on training set: 0.763228
AUC of ROC on testing set: 0.7632776
Iteration 10, AUC of ROC on training set: 0.764791
AUC of ROC on testing set: 0.76481616
Iteration 11, AUC of ROC on training set: 0.766269
AUC of ROC on testing set: 0.7662764
Iteration 12, AUC of ROC on training set: 0.767667
AUC of ROC on testing set: 0.76765794
Iteration 13, AUC of ROC on training set: 0.768994
AUC of ROC on testing set: 0.768983
Iteration 14, AUC of ROC on training set: 0.770247
AUC of ROC on testing set: 0.770225
Iteration 15, AUC of ROC on training set: 0.771437
AUC of ROC on testing set: 0.7714067
Iteration 16, AUC of ROC on training set: 0.772580
AUC of ROC on testing set: 0.772544
Iteration 17, AUC of ROC on training set: 0.773677
AUC of ROC on testing set: 0.7736392
Iteration 18, AUC of ROC on training set: 0.774740
AUC of ROC on testing set: 0.7746992
Iteration 19, AUC of ROC on training set: 0.775768
AUC of ROC on testing set: 0.77572197
Iteration 20, AUC of ROC on training set: 0.776747
AUC of ROC on testing set: 0.7766986

After 20 iterations, we achieve an AUC of close to 0.78 on the testing set using the TensorFlow random forest model.

Finally, you may wonder how to implement a decision tree with TensorFlow. Well, that's easy: simply use a single tree (n_trees=1), and the whole random forest is essentially a decision tree.
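This works because a random forest aggregates its member trees by averaging their class probability estimates, so with one tree the average is just that tree's own prediction. A minimal numpy sketch of the aggregation step, using hypothetical per-tree probabilities rather than real model outputs:

```python
import numpy as np

# Hypothetical class-1 probability estimates from three trees
# for three samples (rows: trees, columns: samples)
tree_probs = np.array([
    [0.9, 0.2, 0.6],
    [0.8, 0.3, 0.4],
    [0.7, 0.1, 0.5],
])

# Forest prediction: average over the trees
forest_prob = tree_probs.mean(axis=0)
print(forest_prob)  # [0.8 0.2 0.5]

# With a single tree (n_trees=1), the "forest" is just that tree
single_tree_prob = tree_probs[:1].mean(axis=0)
print(np.array_equal(single_tree_prob, tree_probs[0]))  # True
```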
