Decision trees in Python

We can perform the same analysis in Python. Load a number of imports that are to be used:

import pandas as pd 
import numpy as np 
from os import system 
import graphviz #pip install graphviz  
from sklearn.cross_validation import train_test_split 
from sklearn.tree import DecisionTreeClassifier 
from sklearn.metrics import accuracy_score 
from sklearn import tree

Read in the mpg data file:

carmpg = pd.read_csv("car-mpg.csv") 
carmpg.head(5)

Break up the data into factors and results:

columns = carmpg.columns 
mask = np.ones(columns.shape, dtype=bool) 
i = 0 #The specified column that you don't want to show 
mask[i] = 0 
mask[7] = 0 #maker is a string 
X = carmpg[columns[mask]] 
Y = carmpg["mpg"]

Split up the data between training and testing sets:

X_train, X_test, y_train, y_test  
= train_test_split( X, Y, test_size = 0.3,  
random_state = 100)

Create a decision tree model:

clf_gini = tree.DecisionTreeClassifier(criterion = "gini",  
random_state = 100, max_depth=3, min_samples_leaf=5)

Calculate the model fit:

clf_gini.fit(X_train, y_train) 
DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=3, 
            max_features=None, max_leaf_nodes=None, 
            min_impurity_split=1e-07, min_samples_leaf=5, 
            min_samples_split=2, min_weight_fraction_leaf=0.0, 
            presort=False, random_state=100, splitter='best')

Graph out the tree:

#I could not get this to work on a Windows machine 
#dot_data = tree.export_graphviz(clf_gini, out_file=None,  
#                         filled=True, rounded=True,   
#                         special_characters=True)   
#graph = graphviz.Source(dot_data)   
#graph

Table of Contents for Decision trees in Python

Create new playlist

Sign In

Sign Up

Table of Contents for
Decision trees in Python