A decision tree is a tree-like graph: a sequential diagram illustrating all of the possible decision alternatives and their corresponding outcomes. Starting from the root of the tree, every internal node represents the basis on which a decision is made; each branch of a node represents how a choice leads to the next nodes; and finally, each terminal node, or leaf, represents the outcome produced.
For example, we have just made a couple of decisions that brought us to the point of learning about decision trees to solve our advertising problem:
The decision tree classifier operates in the form of a decision tree. It maps observations to class assignments (symbolized as leaf nodes) through a series of tests (represented as internal nodes) based on feature values and corresponding conditions (represented as branches). At each node, a question about the values and characteristics of a feature is asked; based on the answer, observations are split into subsets. Tests are conducted sequentially until a conclusion about the observations' target label is reached. The paths from the root to the leaves represent the decision-making process, that is, the classification rules.
The following figure shows a much simplified scenario in which we want to predict click or no click on a self-driving car ad. We manually construct a decision tree classifier that works for an available dataset. For example, if a user is interested in technology and has a car, they will tend to click the ad; for a person outside of this subset, if the person is a high-income female, then she is unlikely to click the ad. We then use the resulting tree to predict two new inputs, whose results are click and no click, respectively.
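The two rules spelled out in the text can be written as plain conditions. The following is only a minimal sketch, not a reproduction of the tree in the figure: the function name predict_click and the feature names (interested_in_tech, has_car, high_income, gender) are hypothetical, and because the text does not describe the remaining branches, the sketch returns "unknown" for users that neither rule covers.

```python
def predict_click(user):
    """Apply the two hand-written rules described in the text.

    `user` is a dict with hypothetical keys:
    interested_in_tech (bool), has_car (bool),
    high_income (bool), gender (str).
    """
    # Rule from the text: interested in technology and has a car -> click
    if user["interested_in_tech"] and user["has_car"]:
        return "click"
    # Rule from the text: outside that subset, high-income female -> no click
    if user["high_income"] and user["gender"] == "female":
        return "no click"
    # ASSUMPTION: the text does not cover the remaining branches,
    # so no label is invented for them here.
    return "unknown"


tech_car_owner = {"interested_in_tech": True, "has_car": True,
                  "high_income": False, "gender": "male"}
print(predict_click(tech_car_owner))
```

Each if statement plays the role of an internal node, and each return statement plays the role of a leaf.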
After a decision tree has been constructed, classifying a new sample is straightforward, as we just saw: starting from the root, apply the test condition and follow the matching branch until a leaf node is reached; the class label associated with that leaf is then assigned to the new sample.
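The traversal just described can be sketched in a few lines. This is an illustrative sketch under stated assumptions rather than a definitive implementation: the Node and Leaf classes, the classify function, and the toy tree are all hypothetical, and the "no click" leaves on the false branches are placeholders chosen only to make the example complete.

```python
class Leaf:
    """Terminal node holding a class label."""
    def __init__(self, label):
        self.label = label


class Node:
    """Internal node holding a test and one child per answer."""
    def __init__(self, test, if_true, if_false):
        self.test = test          # function: sample -> bool
        self.if_true = if_true    # subtree followed when the test passes
        self.if_false = if_false  # subtree followed when the test fails


def classify(node, sample):
    """Walk from the root to a leaf and return that leaf's label."""
    while isinstance(node, Node):
        node = node.if_true if node.test(sample) else node.if_false
    return node.label


# Hypothetical toy tree; the false-branch labels are assumptions.
toy_tree = Node(
    test=lambda s: s["interested_in_tech"],
    if_true=Node(
        test=lambda s: s["has_car"],
        if_true=Leaf("click"),
        if_false=Leaf("no click"),
    ),
    if_false=Leaf("no click"),
)

new_sample = {"interested_in_tech": True, "has_car": True}
print(classify(toy_tree, new_sample))
```

Storing the test as a function keeps the traversal loop independent of any particular feature, so the same classify function works for any tree built from these two classes.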
So how can we build an appropriate decision tree?