Decision tree

A decision tree is a technique that helps us derive rules from data. A rule-based technique is helpful in explaining how the model estimates the value of the dependent variable.

A typical decision tree looks like this:

The preceding diagram is explained as follows:

  • Root Node: This represents the entire population or sample, and it is divided into two or more subnodes.
  • Splitting: The process of dividing a node into two or more subnodes based on a certain rule.
  • Decision Node: When a subnode splits into further subnodes, it is called a decision node.
  • Leaf/Terminal Node: The final node in a decision tree is a leaf or terminal node.
  • Pruning: When we remove the subnodes of a decision node, the process is called pruning. You can think of it as the opposite of splitting.
  • Branch/Sub-Tree: A subsection of the entire tree is called a branch or a sub-tree.
  • Parent and child node: A node that is divided into subnodes is called the parent node of subnodes, whereas the subnodes are the children of the parent node.

Given an independent variable and a dependent variable, we will go through how a decision tree works using the following dataset:

var2    response
0.1     1996
0.3     839
0.44    2229
0.51    2309
0.75    815
0.78    2295
0.84    1590

In the preceding dataset, var2 is the input (independent) variable and response is the dependent variable.

In the first step of building a decision tree, we sort the observations by the input variable, from lowest to highest, and test multiple candidate splitting rules, one at a time.

For the first candidate rule, all the observations with a var2 value of less than 0.3 belong to the left node of the decision tree, and the remaining observations belong to the right node.

In a regression exercise, the predicted value of the left node is the average of the response variable for all the observations that belong to the left node. Similarly, the predicted value of the right node is the average of response for all the observations that belong to the right node.
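For example, with the rule var2 < 0.3, the left node contains only the observation with var2 = 0.1, so its predicted value is 1996; the right node contains the remaining six observations, so its predicted value is (839 + 2229 + 2309 + 815 + 2295 + 1590) / 6 = 1679.5.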

Given a predicted value for the left node and a different predicted value for the right node, the squared error of each node can be calculated as the sum of the squared differences between each observation's response and its node's predicted value. The overall error for a candidate rule is the sum of the squared errors of the left and right nodes.

The decision rule that is implemented is the rule that has the minimum squared error among all the possible rules.
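The following is a minimal sketch of this split search in plain Python. The candidate thresholds (every observed var2 value except the smallest) and the names used here are assumptions made for illustration, not the exact procedure of any particular library:

    # Sketch of the exhaustive split search described above.
    var2 = [0.1, 0.3, 0.44, 0.51, 0.75, 0.78, 0.84]
    response = [1996, 839, 2229, 2309, 815, 2295, 1590]

    def sse(values):
        # Sum of squared errors of the values around their mean
        # (the node's predicted value in a regression tree).
        if not values:
            return 0.0
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    best_rule, best_error = None, float("inf")

    # Test one rule at a time: "var2 < threshold", where the threshold is
    # every observed value except the smallest (so both nodes are non-empty).
    for threshold in sorted(set(var2))[1:]:
        left = [r for x, r in zip(var2, response) if x < threshold]
        right = [r for x, r in zip(var2, response) if x >= threshold]
        error = sse(left) + sse(right)  # overall error of this candidate rule
        if error < best_error:
            best_rule, best_error = threshold, error

    print("best rule: var2 <", best_rule, "with overall squared error", round(best_error, 1))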
