Chapter 6. Tree-based Methods

In this chapter, we are going to present one of the most intuitive ways to create a predictive model: using the concept of a tree. Tree-based models, also known as decision tree models, are used successfully to handle both regression and classification problems. We'll explore both scenarios in this chapter, and we'll look at a range of different algorithms for training these models. We will also learn about a number of useful properties that these models possess, such as their ability to handle missing data and the fact that they are highly interpretable.

The intuition for tree models

A decision tree is a model with a very straightforward structure that allows us to make a prediction on an output variable based on a series of rules arranged in a tree-like hierarchy. The output variable can be categorical, allowing us to use a decision tree to handle classification problems. Equally, we can use decision trees to predict a numerical output, which lets us tackle regression tasks as well.

Decision trees consist of a series of split points, often referred to as nodes. In order to make a prediction using a decision tree, we start at the top of the tree at a single node known as the root node. The root node is a decision or split point, because it places a condition on the value of one of the input features, and based on this decision we know whether to continue with the left part of the tree or with the right part of the tree. We repeat this process of choosing to go left or right at each inner node that we encounter until we reach one of the leaf nodes. These are the nodes at the base of the tree, which give us a specific value of the output to use as our prediction.

To illustrate this, let's look at a very simple decision tree in terms of two features, x1 and x2.

[Figure: an example regression tree on the features x1 and x2. The root splits on x2 at 23; its left child is a leaf with the value 2.1, and its right child splits on x1 at 46, leading to leaves with the values 1.2 and -3.7.]

Note that the tree is a recursive structure, in that the left and right parts of the tree that lie beneath a particular node are themselves trees. They are referred to as the left subtree and the right subtree respectively, and the nodes that they lead to are the left child and right child. To understand how we go about using a decision tree in practice, we can try a simple example. Suppose we want to use our tree to predict the output for an observation where the value of x1 is 96.0 and the value of x2 is 79.9. We start at the root and make a decision as to which subtree to follow. Our value of x2 is larger than 23, so we follow the right branch and come to a new node with a new condition to check. Our value of x1 is larger than 46, so we once again take the right branch and arrive at a leaf node. Thus, we output the value indicated by the leaf node, which is -3.7. This is the value that our model predicts given the pair of inputs that we specified.
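
To make this mechanical procedure concrete, here is a minimal sketch in Python (not code from this book) that represents the example tree as a simple data structure and walks it to produce a prediction. The Node class and the predict function are illustrative names chosen for this sketch; the thresholds and leaf values are the ones from our example tree.

    # A minimal illustrative sketch: each internal node stores a feature index and a
    # threshold, and each leaf stores the output value to predict.
    class Node:
        def __init__(self, feature=None, threshold=None, left=None, right=None, value=None):
            self.feature = feature      # index of the feature tested at this node (None for a leaf)
            self.threshold = threshold  # split threshold
            self.left = left            # subtree taken when the feature value is below the threshold
            self.right = right          # subtree taken otherwise
            self.value = value          # prediction stored at a leaf node

    def predict(node, x):
        """Descend from the root to a leaf, going left or right at each split."""
        while node.value is None:                 # keep descending until we reach a leaf
            if x[node.feature] < node.threshold:
                node = node.left
            else:
                node = node.right
        return node.value

    # The regression tree from the text: the root splits on x2 (index 1) at 23, and its
    # right child splits on x1 (index 0) at 46.
    tree = Node(feature=1, threshold=23,
                left=Node(value=2.1),
                right=Node(feature=0, threshold=46,
                           left=Node(value=1.2),
                           right=Node(value=-3.7)))

    print(predict(tree, [96.0, 79.9]))            # x2 = 79.9 > 23, x1 = 96.0 > 46, so -3.7

Running predict on the pair (96.0, 79.9) descends to the right twice and returns -3.7, exactly the traversal we just performed by hand.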

One way of thinking about decision trees is that they are in fact encoding a series of if-then rules leading to distinct outputs. For every leaf node, we can write a single rule (using the Boolean AND operator if necessary to join together multiple conditions) that must hold true for the tree to output that node's value. We can extract all of these if-then rules by starting at the root node and following every path down the tree that leads to a leaf node. For example, our small regression tree leads to the following three rules, one for each of its leaf nodes:

  • If (x2 < 23) Then Output 2.1
  • If (x2 > 23) AND (x1 < 46) Then Output 1.2
  • If (x2 > 23) AND (x1 > 46) Then Output -3.7

Note that we had to join together two conditions for each one of the last two rules using the AND operator, as the corresponding paths leading down to a leaf node included more than one decision node (counting the root node).
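
Because each leaf corresponds to exactly one root-to-leaf path, extracting these rules can itself be written as a short recursive walk over the tree. The sketch below continues with the illustrative Node class and tree object defined earlier; note that it labels the right branch with >= rather than >, a common tie-breaking convention that guarantees every possible input satisfies exactly one rule.

    def extract_rules(node, conditions=()):
        """Collect one if-then rule per leaf by following every root-to-leaf path."""
        if node.value is not None:                 # reached a leaf: emit the accumulated rule
            return [f"If ({' AND '.join(conditions)}) Then Output {node.value}"]
        name = f"x{node.feature + 1}"              # feature index 0 is x1, index 1 is x2
        rules = []
        rules += extract_rules(node.left, conditions + (f"{name} < {node.threshold}",))
        rules += extract_rules(node.right, conditions + (f"{name} >= {node.threshold}",))
        return rules

    for rule in extract_rules(tree):
        print(rule)
    # If (x2 < 23) Then Output 2.1
    # If (x2 >= 23 AND x1 < 46) Then Output 1.2
    # If (x2 >= 23 AND x1 >= 46) Then Output -3.7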

Another way to think about decision trees is that they partition the feature space into a series of rectangular regions in two dimensions, boxes (cuboids) in three dimensions, and hyperrectangles in higher dimensions. Remember that the number of dimensions in the feature space is just the number of features. So the feature space for our example regression tree has two dimensions and we can visualize how this space is split up into rectangular regions as follows:

[Figure: the two-dimensional feature space partitioned by the tree into three rectangular regions, one per leaf, each labeled with its output value.]

The rule-based interpretation and the space-partitioning interpretation are equivalent views of the same model. The space-partitioning interpretation in particular is very useful in helping us appreciate one important characteristic of decision trees, which is that they must have complete coverage over all possible combinations of input features. Put differently, there should be no input for which there is no path to a leaf node in the decision tree; whenever we are given values for our input features, we should always be able to return an answer. The space-partitioning view of a decision tree essentially tells us that there is no point, or region of points, in the feature space that does not belong to some partition with an assigned value. Similarly, in the if-then ruleset view of a decision tree, there is always at least one rule that applies to any combination of input feature values, and therefore we can reorganize our rules into an equivalent if-then-else structure in which the last rule becomes an else statement.
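
We can also see the partitioning view directly in code. Continuing with the same illustrative tree object, the sketch below walks every root-to-leaf path while narrowing an interval for each feature, so each leaf ends up with the axis-aligned rectangle of the feature space that it covers. Because both children of a split share the split threshold as a boundary, the resulting regions tile the entire plane, which is exactly the complete coverage described above.

    import math

    def leaf_regions(node, bounds=None):
        """Return, for each leaf, the axis-aligned rectangle of feature space that it covers."""
        if bounds is None:
            bounds = {0: (-math.inf, math.inf), 1: (-math.inf, math.inf)}  # x1 and x2 start unbounded
        if node.value is not None:                                         # a leaf: record its region
            return [(bounds, node.value)]
        lo, hi = bounds[node.feature]
        left_bounds = {**bounds, node.feature: (lo, node.threshold)}       # left branch: feature < threshold
        right_bounds = {**bounds, node.feature: (node.threshold, hi)}      # right branch: feature >= threshold
        return leaf_regions(node.left, left_bounds) + leaf_regions(node.right, right_bounds)

    for region, value in leaf_regions(tree):
        pretty = {f"x{i + 1}": interval for i, interval in region.items()}
        print(pretty, "->", value)
    # {'x1': (-inf, inf), 'x2': (-inf, 23)} -> 2.1
    # {'x1': (-inf, 46), 'x2': (23, inf)} -> 1.2
    # {'x1': (46, inf), 'x2': (23, inf)} -> -3.7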
