Chapter 11. Classification Trees

In Chapter 9, Linear Regression, we discussed regression. In the previous chapter, we looked at classification using k-NN and Naïve Bayes. In this chapter, we continue with classification, this time in the context of decision trees. Decision trees predict the class (group membership) of previously unseen observations (testing or prediction datasets) by applying statistical criteria derived from the observations already seen (the training set).

Here, we will briefly examine the statistical criteria of six algorithms:

  • ID3
  • C4.5
  • C5.0
  • Classification and regression trees (CART)
  • Random forest
  • Conditional inference trees

We will also examine how to use decision trees in R, notably, how to measure the reliability of the classifications using training and test sets.
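As a rough preview of what measuring reliability with training and test sets involves, here is a minimal sketch in base R. It is not the procedure used later in the chapter: the 70/30 split, the seed, and the stand-in classifier (predict the majority class observed for each Sex in the training set) are assumptions chosen only to keep the example self-contained. Any tree model could take the place of the stand-in classifier.

```r
# Expand the built-in Titanic contingency table to one row per passenger.
tab <- as.data.frame(Titanic)  # columns: Class, Sex, Age, Survived, Freq
passengers <- tab[rep(seq_len(nrow(tab)), tab$Freq),
                  c("Class", "Sex", "Age", "Survived")]
rownames(passengers) <- NULL

# Hold out 30% of the passengers for testing (split ratio is an assumption).
set.seed(1)  # arbitrary seed, for reproducibility only
n <- nrow(passengers)
train_idx <- sample(n, size = round(0.7 * n))
train <- passengers[train_idx, ]
test <- passengers[-train_idx, ]

# Stand-in classifier (hypothetical, not a real tree algorithm):
# for each Sex, predict the most frequent outcome seen in the training set.
counts <- table(train$Sex, train$Survived)
majority <- colnames(counts)[apply(counts, 1, which.max)]
names(majority) <- rownames(counts)

# Apply the rule to the unseen test set and compare with the true classes.
predicted <- factor(majority[as.character(test$Sex)],
                    levels = levels(test$Survived))
confusion <- table(Predicted = predicted, Actual = test$Survived)
accuracy <- sum(diag(confusion)) / sum(confusion)

print(confusion)
print(accuracy)
```

The confusion matrix cross-tabulates predicted against actual classes in the test set, and the accuracy is the share of test observations on its diagonal. Because the test observations played no part in building the rule, this figure estimates how well the classifier generalizes.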

Understanding decision trees

Before we go into depth on how decision tree algorithms work, let's examine their outcome in more detail. The goal of a decision tree is to extract from the training data the succession of decisions about the attributes that best explains the class, that is, group membership.

In the following example of a conditional inference tree, we try to predict survival (there are two classes: Yes and No) in the Titanic dataset we used in the previous chapter. To avoid confusion, note that the dataset also has an attribute called Class. When discussing the outcome we want to predict (the survival of the passenger), we will use a lowercase c (class), and when discussing the Class attribute (with the values 1st, 2nd, 3rd, and Crew), we will use a capital C. The code to generate the following plot is provided at the end of the chapter, where we describe conditional inference trees:

Example of decision tree (conditional inference tree)

Decision trees have a root (here: Sex), which is the best attribute on which to split the data with respect to the outcome (here: whether the individuals survived or not). The dataset is partitioned into branches on the basis of this attribute. Each branch leads to another node, which corresponds to the next best partition for that branch. We can see that the attribute called Class is the best next split for both the Male and Female branches. The process continues until we reach the terminal nodes, where no further partitioning is required. The proportions of individuals who survived and who did not are indicated at the bottom of the plot.
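The partitions described above can be inspected by tabulating the data directly. The following base R sketch does not build a tree or run the statistical tests a conditional inference tree relies on. It only computes the survival proportions in the branches created by splitting on Sex, and then on Class within each Sex, using the Titanic table that ships with R. The variable names are illustrative.

```r
# Titanic ships with base R (datasets package) as a 4-way contingency table.
titanic <- as.data.frame(Titanic)  # columns: Class, Sex, Age, Survived, Freq

# Root split: survival counts by Sex, weighted by the Freq column.
by_sex <- xtabs(Freq ~ Sex + Survived, data = titanic)
prop_by_sex <- prop.table(by_sex, margin = 1)  # proportions within each Sex
print(round(prop_by_sex, 3))

# Second-level split: Class within each Sex branch.
by_sex_class <- xtabs(Freq ~ Sex + Class + Survived, data = titanic)
prop_by_sex_class <- prop.table(by_sex_class, margin = c(1, 2))
print(ftable(round(prop_by_sex_class, 3)))
```

Each row of the first table corresponds to one branch leaving the root, and each row of the second to one Sex and Class combination. A large difference in survival proportions between branches is what makes an attribute a good candidate for a split.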

Here, we have used only categorical attributes, but numeric attributes can also be used to predict a categorical outcome in C4.5, CART, and conditional inference trees.

We will now examine how different algorithms decide which attributes to partition the data on. We will start with an easy algorithm and continue with more complex ones.
