How to optimize for node purity

When growing a classification tree, we also use recursive binary splitting. However, instead of evaluating the quality of a decision rule by the reduction in mean-squared error, we can use the classification error rate, which is simply the fraction of the training samples in a given (leaf) node that do not belong to the most common class.
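As a minimal sketch of this definition, the classification error rate of a single node can be computed from the class labels of the training samples that fall into it. The function name and the example labels are hypothetical and chosen only for illustration.

```python
from collections import Counter


def classification_error(labels):
    """Fraction of samples in a node that do not belong to the most common class."""
    counts = Counter(labels)
    # most_common(1) returns a list with one (label, count) pair
    majority_count = counts.most_common(1)[0][1]
    return 1 - majority_count / len(labels)


# Hypothetical leaf node: 7 samples of class 0 and 3 samples of class 1
node_labels = [0] * 7 + [1] * 3
error = classification_error(node_labels)
print(f"classification error rate: {error:.2f}")
```

For this node the majority class is 0, so the three samples of class 1 count as errors and the rate is roughly 0.3.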

However, two alternative measures, the Gini index (also called Gini impurity) and cross-entropy, are preferred because they are more sensitive to node purity than the classification error rate. Node purity refers to the extent to which a single class predominates in a node. A node that contains only samples with outcomes belonging to a single class is pure and implies successful classification for this particular region of the feature space. Both measures are calculated as follows for a classification outcome taking on K values, 0, 1, …, K-1, and for a given node, m, that represents a region, R_m, of the feature space, where p_mk is the proportion of outcomes of class k in node m:

$$\text{Gini index} = \sum_{k=0}^{K-1} p_{mk} (1 - p_{mk})$$

$$\text{Cross-entropy} = -\sum_{k=0}^{K-1} p_{mk} \log p_{mk}$$
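The following sketch illustrates what "more sensitive to node purity" can mean in practice. It compares two hypothetical splits of a parent node with 400 samples per class: both splits have the same weighted classification error rate, yet the Gini index and cross-entropy both favor the split that produces a pure child node. The class counts, the function names, and the choice of base-2 logarithm are assumptions made for illustration only.

```python
import math


def proportions(counts):
    """Class proportions p_mk for a node given its per-class sample counts."""
    total = sum(counts)
    return [c / total for c in counts]


def misclassification(counts):
    return 1 - max(proportions(counts))


def gini(counts):
    # sum over classes of p * (1 - p)
    return sum(p * (1 - p) for p in proportions(counts))


def cross_entropy(counts):
    # -sum over classes of p * log(p); terms with p == 0 contribute 0
    # Base 2 is an assumption; another base only rescales the values
    return -sum(p * math.log2(p) for p in proportions(counts) if p > 0)


def weighted_impurity(measure, children):
    """Impurity of a split: child impurities weighted by child size."""
    n = sum(sum(c) for c in children)
    return sum(sum(c) / n * measure(c) for c in children)


# Hypothetical splits of a parent node with class counts (400, 400)
split_a = [(300, 100), (100, 300)]  # two equally mixed children
split_b = [(200, 400), (200, 0)]    # second child is pure

for name, measure in [("error", misclassification),
                      ("gini", gini),
                      ("cross-entropy", cross_entropy)]:
    a = weighted_impurity(measure, split_a)
    b = weighted_impurity(measure, split_b)
    print(f"{name:>13}: split A = {a:.3f}, split B = {b:.3f}")
```

The error rate cannot distinguish the two splits, whereas both impurity measures assign a lower value to split B because one of its children contains a single class.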



Both the Gini impurity and the cross-entropy measure take on smaller values when the class proportions approach zero or one, that is, when the child nodes become pure as a result of the split. They are highest when the class proportions are even, or 0.5 in the binary case. The chart at the end of this section visualizes the values assumed by these two measures and the misclassification error rate across the [0, 1] interval of proportions.
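To make this behavior concrete, the sketch below tabulates the three measures for a binary node over a coarse grid of class proportions. It assumes the binary forms of the measures, with p as the proportion of one class, and a base-2 logarithm for cross-entropy; the function and variable names are hypothetical.

```python
import math


def binary_impurities(p):
    """Return (error rate, Gini, cross-entropy) for a binary node
    in which one class has proportion p."""
    error = min(p, 1 - p)
    gini = 2 * p * (1 - p)
    # Skip zero proportions, whose contribution to the sum is 0
    entropy = -sum(q * math.log2(q) for q in (p, 1 - p) if q > 0)
    return error, gini, entropy


# Proportions 0.0, 0.1, ..., 1.0
grid = [i / 10 for i in range(11)]
table = {p: binary_impurities(p) for p in grid}

for p, (err, g, h) in table.items():
    print(f"p={p:.1f}  error={err:.3f}  gini={g:.3f}  cross-entropy={h:.3f}")
```

All three measures are zero at the pure endpoints and peak at p = 0.5, where the two classes are evenly mixed.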

