Implementing regression forest

As we saw in Chapter 6, Predicting Online Ads Click-through with Tree-Based Algorithms, random forest is an ensemble learning method that combines multiple decision trees. Each tree is trained separately, and only a random subset of the training features is considered at each node of a tree. In classification, a random forest makes its final decision by a majority vote of all the trees' decisions. In regression, a random forest regression model (also called a regression forest) uses the average of the regression results from all the decision trees as its final prediction.
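
To make the averaging rule concrete, here is a minimal from-scratch sketch of a regression forest built on scikit-learn's DecisionTreeRegressor. It assumes the standard bootstrap sampling of training rows for each tree (not spelled out above) and uses max_features="sqrt" so that each split considers only a random subset of features. The helper names fit_regression_forest and predict_regression_forest are hypothetical, and make_regression supplies synthetic stand-in data:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor


def fit_regression_forest(X, y, n_trees=10, max_features="sqrt", random_state=0):
    """Hypothetical helper: train n_trees trees, each on its own bootstrap sample.

    max_features makes every tree look at only a random subset of the
    features when choosing each split.
    """
    rng = np.random.RandomState(random_state)
    n_samples = X.shape[0]
    trees = []
    for _ in range(n_trees):
        # Bootstrap sample (assumption): draw row indices with replacement
        idx = rng.randint(0, n_samples, n_samples)
        tree = DecisionTreeRegressor(max_features=max_features,
                                     random_state=rng.randint(2**31 - 1))
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees


def predict_regression_forest(trees, X):
    """Hypothetical helper: the forest prediction is the mean of all tree predictions."""
    return np.mean([tree.predict(X) for tree in trees], axis=0)


# Synthetic data used as a stand-in for a real dataset
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=42)
trees = fit_regression_forest(X[:150], y[:150], n_trees=20)
pred = predict_regression_forest(trees, X[150:])
print(pred[:5])
```

Because every leaf stores an average of training targets, the averaged forest prediction always stays within the range of the training targets.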

Here, we'll use the regression forest class, RandomForestRegressor, from scikit-learn and apply it to our Boston house price prediction example:

>>> from sklearn.ensemble import RandomForestRegressor
>>> regressor = RandomForestRegressor(n_estimators=100,
...                                   max_depth=10, min_samples_split=3)
>>> regressor.fit(X_train, y_train)
>>> predictions = regressor.predict(X_test)
>>> print(predictions)
[ 19.34404351 20.93928947 21.66535354 19.99581433 20.873871
25.52030056 21.33196685 28.34961905 27.54088571 21.32508585]
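
scikit-learn exposes the fitted trees through the estimators_ attribute, so we can check the averaging rule directly. The sketch below uses synthetic data from make_regression as a stand-in for the Boston data (an assumption, so its numbers will differ from those above) and adds random_state=0 only for reproducibility:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for the Boston data (assumption); last 10 rows are the test set
X, y = make_regression(n_samples=300, n_features=10, noise=5.0, random_state=0)
X_train, y_train, X_test = X[:290], y[:290], X[290:]

regressor = RandomForestRegressor(n_estimators=100, max_depth=10,
                                  min_samples_split=3, random_state=0)
regressor.fit(X_train, y_train)
predictions = regressor.predict(X_test)

# One row of predictions per fitted tree, then average over the trees
per_tree = np.array([tree.predict(X_test) for tree in regressor.estimators_])
averaged = per_tree.mean(axis=0)
print(np.allclose(predictions, averaged))  # prints True
```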

As a bonus, we also implement regression forest with TensorFlow. The implementation is quite similar to that of random forest in Chapter 6, Predicting Online Ads Click-through with Tree-Based Algorithms. Note that the tensor_forest module is part of tf.contrib, so this code requires TensorFlow 1.x (tf.contrib was removed in TensorFlow 2.0). First, we import the necessary modules as follows:

>>> import tensorflow as tf
>>> from tensorflow.contrib.tensor_forest.python import tensor_forest
>>> from tensorflow.python.ops import resources

We then specify the parameters of the model: 20 iterations in the training process, 10 trees in total, and a maximum of 30,000 nodes per tree:

>>> n_iter = 20
>>> n_features = int(X_train.shape[1])
>>> n_trees = 10
>>> max_nodes = 30000

Next, we create placeholders and build the TensorFlow graph:

>>> x = tf.placeholder(tf.float32, shape=[None, n_features])
>>> y = tf.placeholder(tf.float32, shape=[None])
>>> hparams = tensor_forest.ForestHParams(num_classes=1,
...                                       regression=True, num_features=n_features,
...                                       num_trees=n_trees, max_nodes=max_nodes,
...                                       split_after_samples=30).fill()
>>> forest_graph = tensor_forest.RandomForestGraphs(hparams)

Note that we need to set num_classes to 1 and regression to True because the forest is used for regression.

After defining the graph for the regression forest model, we define the training operation, the training loss, the inference operation, and the mean squared error (MSE) cost:

>>> train_op = forest_graph.training_graph(x, y)
>>> loss_op = forest_graph.training_loss(x, y)
>>> infer_op, _, _ = forest_graph.inference_graph(x)
>>> cost = tf.losses.mean_squared_error(labels=y, predictions=infer_op[:, 0])
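
The cost tensor measures the MSE between the labels and the first column of infer_op, that is, MSE = (1/n) Σ (y_i - ŷ_i)². As a plain NumPy sketch of the same formula, with made-up numbers standing in for real model output (the array infer_output is hypothetical):

```python
import numpy as np


def mean_squared_error(labels, predictions):
    """Average of the squared differences between labels and predictions."""
    labels = np.asarray(labels, dtype=np.float64)
    predictions = np.asarray(predictions, dtype=np.float64)
    return np.mean((labels - predictions) ** 2)


# Hypothetical 2-D model output with one column; [:, 0] keeps that column,
# mirroring infer_op[:, 0] in the TensorFlow code
infer_output = np.array([[20.0], [25.0], [30.0]])
y_true = np.array([22.0, 25.0, 27.0])
print(mean_squared_error(y_true, infer_output[:, 0]))  # prints 4.333333333333333 (13 / 3)
```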

We then initialize the variables and start a TensorFlow session:

>>> init_vars = tf.group(tf.global_variables_initializer(),
...                      tf.local_variables_initializer(),
...                      resources.initialize_resources(resources.shared_resources()))
>>> sess = tf.Session()
>>> sess.run(init_vars)

Finally, we run the training process and print the training loss (the MSE) after each iteration:

>>> for i in range(1, n_iter + 1):
...     _, c = sess.run([train_op, cost], feed_dict={x: X_train, y: y_train})
...     print('Iteration %i, training loss: %f' % (i, c))
...
Iteration 1, training loss: 596.255005
Iteration 2, training loss: 51.917843
Iteration 3, training loss: 35.395966
Iteration 4, training loss: 28.848433
Iteration 5, training loss: 22.499760
Iteration 6, training loss: 18.685938
Iteration 7, training loss: 16.956488
Iteration 8, training loss: 14.832330
Iteration 9, training loss: 13.048509
Iteration 10, training loss: 12.084823
Iteration 11, training loss: 11.044588
Iteration 12, training loss: 10.433226
Iteration 13, training loss: 9.818905
Iteration 14, training loss: 8.900123
Iteration 15, training loss: 7.952868
Iteration 16, training loss: 7.417612
Iteration 17, training loss: 6.849032
Iteration 18, training loss: 6.213216
Iteration 19, training loss: 5.869020
Iteration 20, training loss: 5.467315
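
In the TensorFlow loop above, each iteration feeds the whole training set through train_op, and the 10 trees are grown further as more data is seen. RandomForestRegressor in scikit-learn fits in a single call and has no such loop. A loosely analogous way to watch the training loss as a scikit-learn forest grows is warm_start=True, sketched below on synthetic stand-in data (an assumption). This adds whole trees between fits rather than deepening existing ones, so it is a different growth mechanism, not a reproduction of the TensorFlow run:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

# Synthetic stand-in training data (assumption)
X_train, y_train = make_regression(n_samples=200, n_features=8, noise=10.0,
                                   random_state=1)

# warm_start=True keeps the trees fitted so far and only trains the new ones
regressor = RandomForestRegressor(warm_start=True, max_depth=10, random_state=1)
history = []
for n_trees in range(2, 21, 2):
    regressor.set_params(n_estimators=n_trees)
    regressor.fit(X_train, y_train)  # trains only the 2 newly added trees
    mse = mean_squared_error(y_train, regressor.predict(X_train))
    history.append(mse)
    print('Trees: %i, training loss: %f' % (n_trees, mse))
```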

After 20 iterations, we apply the trained model to the testing set as follows:

>>> pred = sess.run(infer_op, feed_dict={x: X_test})[:, 0]
>>> print(pred)
[15.446515 20.10433 21.38516 19.37373 19.593092 21.932205 22.259298 24.194878 24.095112 22.541391]