How to do it...

Execute the following steps to fit a decision tree classifier.

Import the libraries:

from sklearn.tree import DecisionTreeClassifier, export_graphviz
from sklearn import metrics

from chapter_8_utils import performance_evaluation_report

from io import StringIO
import seaborn as sns
from ipywidgets import Image
import pydotplus

Create an instance of the model, fit it to the training data, and create the prediction:

tree_classifier = DecisionTreeClassifier()
tree_classifier.fit(X_train_ohe, y_train)
y_pred = tree_classifier.predict(X_test_ohe)

Evaluate the results:

LABELS = ['No Default', 'Default']
tree_perf = performance_evaluation_report(tree_classifier, 
                                          X_test_ohe, 
                                          y_test, labels=LABELS, 
                                          show_plot=True)

Running the code results in the following plot:

Plot the simplified decision tree:

small_tree = DecisionTreeClassifier(max_depth=3)
small_tree.fit(X_train_ohe, y_train)

tree_dot = StringIO()
export_graphviz(small_tree, feature_names=X_train_ohe.columns,
                class_names=LABELS, rounded=True, 
                out_file=tree_dot,
                proportion=False, precision=2, filled=True)
tree_graph = pydotplus.graph_from_dot_data(tree_dot.getvalue()) 
Image(value=tree_graph.create_png())

Running the code results in the following plot:

For each node, we can see the split criterion (unless it is the terminal node), the value of the Gini impurity criterion, the number of observations, a list containing the number of observations per class, and the majority class.

Table of Contents for How to do it...

Create new playlist

Sign In

Sign Up

Table of Contents for
How to do it...