Summary

In this chapter, we started with an introduction to a typical machine learning problem, online advertising click-through prediction, and its inherent challenges, including categorical features. We then looked at tree-based algorithms, which can take in both numerical and categorical features. Next came an in-depth discussion of the decision tree algorithm: its mechanics, its different types, how to construct a tree, and two metrics (Gini impurity and entropy) that measure the effectiveness of a split at a node. After constructing a tree by hand in an example, we implemented the algorithm from scratch. We also learned how to use the decision tree package from scikit-learn and applied it to predict click-through. We went on to improve performance by adopting random forest, a feature-based bagging algorithm. The chapter ended with some ways to tune a random forest model, as well as a bonus section in which we implemented a random forest with TensorFlow.
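
As a quick refresher on the two split metrics mentioned above, the following is a minimal sketch in plain Python rather than the implementation developed in the chapter. The function names `gini_impurity`, `entropy`, and `weighted_impurity` are chosen here for illustration only, and the sketch assumes labels are given as simple Python lists.

```python
import math
from collections import Counter


def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared class fractions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def entropy(labels):
    """Entropy in bits: minus the sum of p * log2(p) over the classes present."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def weighted_impurity(groups, criterion=gini_impurity):
    """Impurity of a split: each child's impurity weighted by its share of samples."""
    total = sum(len(g) for g in groups)
    if total == 0:
        return 0.0
    return sum(len(g) / total * criterion(g) for g in groups)
```

A lower weighted impurity indicates a better split: a split that separates the classes perfectly scores 0 under either criterion, while a split that leaves each child evenly mixed between two classes scores 0.5 with Gini impurity and 1 bit with entropy.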

More practice is always good for honing skills. We recommend you complete the following exercise before going to the next chapter, where we will solve ad click-through prediction using another algorithm: logistic regression.
