Implementing a random forest with scikit-learn

Alternatively, we can implement random forests using scikit-learn:

In [13]: from sklearn.ensemble import RandomForestClassifier
... forest = RandomForestClassifier(n_estimators=10, random_state=200)
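
If you are working through this section in isolation, the X_train, y_train, X_test, and y_test arrays from the earlier sections will not be defined yet. A minimal, purely illustrative stand-in might look like the following (the dataset parameters here are assumptions, not the book's actual data):

    from sklearn.datasets import make_blobs
    from sklearn.model_selection import train_test_split

    # Generate a toy two-class dataset (illustrative only)
    X, y = make_blobs(n_samples=100, centers=2, random_state=42)

    # Hold out 20% of the samples for testing
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)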

Here, we have a number of options to customize the ensemble (a short example follows the list):

  • n_estimators: This specifies the number of trees in the forest.
  • criterion: This specifies the node-splitting criterion. Setting criterion='gini' implements the Gini impurity, whereas setting criterion='entropy' implements information gain.
  • max_features: This specifies the number (or fraction) of features to consider at each node split.
  • max_depth: This specifies the maximum depth of each tree.
  • min_samples_split: This specifies the minimum number of samples required to split an internal node.
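
For example, a more heavily customized forest could be created as follows; the specific values below are illustrative choices, not recommendations:

    from sklearn.ensemble import RandomForestClassifier

    # 50 trees, entropy-based splits, at most 5 levels deep,
    # a random subset of sqrt(n_features) features per split,
    # and at least 4 samples required to split a node
    forest = RandomForestClassifier(n_estimators=50,
                                    criterion='entropy',
                                    max_depth=5,
                                    max_features='sqrt',
                                    min_samples_split=4,
                                    random_state=200)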

We can then fit the random forest to the data and score it like any other estimator:

In [14]: forest.fit(X_train, y_train)
... forest.score(X_test, y_test)
Out[14]: 0.83999999999999997

This gives roughly the same result as in OpenCV. We can use our helper function to plot the decision boundary:

In [15]: plot_decision_boundary(forest, X_test, y_test)
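
The plot_decision_boundary helper was defined earlier in the book. If you don't have it handy, a minimal sketch along these lines (using Matplotlib and NumPy, with an assumed grid step size) produces a comparable plot for two-dimensional data:

    import numpy as np
    import matplotlib.pyplot as plt

    def plot_decision_boundary(classifier, X_test, y_test):
        # Build a grid that spans the test data with a small margin
        h = 0.02  # assumed grid step size
        x_min, x_max = X_test[:, 0].min() - 1, X_test[:, 0].max() + 1
        y_min, y_max = X_test[:, 1].min() - 1, X_test[:, 1].max() + 1
        xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                             np.arange(y_min, y_max, h))

        # Predict a class label for every point on the grid
        zz = classifier.predict(np.c_[xx.ravel(), yy.ravel()])
        zz = zz.reshape(xx.shape)

        # Color the grid by predicted class, then overlay the test points
        plt.contourf(xx, yy, zz, cmap=plt.cm.coolwarm, alpha=0.8)
        plt.scatter(X_test[:, 0], X_test[:, 1], c=y_test, s=200)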

The resulting plot shows the decision boundary of the random forest on the test data.
