Decision tree versus Naive Bayes

As stated in the preceding table, DTs are very easy to understand and debug because of the flexibility they offer for training datasets. They work with both classification and regression problems.

Whether you are trying to predict categorical or continuous values, DTs handle both tasks. Consequently, if you have tabular data, you can feed it directly to a DT and it will build a classification model without any upfront or manual intervention. In summary, DTs are simple to implement, train, and interpret. With very little data preparation, DTs can build a model with short prediction times. As mentioned earlier, they can handle both numeric and categorical data and are robust against noise and missing values. The resulting model is easy to validate using statistical tests, and, more interestingly, the constructed tree can be visualized. Overall, they provide high accuracy.
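The workflow above — feeding tabular data straight to a DT with no manual preparation, then inspecting the fitted tree — can be sketched as follows. This is a minimal sketch assuming scikit-learn is available; `DecisionTreeClassifier`, `export_text`, and the `load_iris` sample dataset are scikit-learn APIs, not part of the original text.

```python
# Minimal sketch: fit a decision tree on raw tabular data with no
# upfront preparation, then visualize the learned rules as text.
# Assumes scikit-learn is installed.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)

# No scaling or feature engineering is needed: the tree splits
# directly on raw feature thresholds.
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X, y)

# The constructed tree can be inspected/visualized as a rule listing.
rules = export_text(clf, feature_names=load_iris().feature_names)
print(rules.splitlines()[0])
print("training accuracy:", clf.score(X, y))
```

The text dump makes every split threshold readable, which is exactly the interpretability advantage the section describes.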

On the downside, however, DTs tend to overfit the training data. This means that you generally have to prune the tree to find an optimal one for better classification or regression accuracy. Moreover, duplication may occur within the same subtree, and DTs handle diagonal decision boundaries poorly, which can contribute to both overfitting and underfitting. Furthermore, DT learners can create over-complex trees that do not generalize well, which makes the overall interpretation hard. DTs can be unstable in the face of small variations in the data, and learning an optimal DT is known to be an NP-complete problem. Finally, DT learners create biased trees if some classes dominate over others.
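The pruning remedy mentioned above can be sketched as follows, assuming scikit-learn: its `ccp_alpha` parameter applies minimal cost-complexity pruning, trading a perfect training fit for a simpler tree. The dataset and the specific `ccp_alpha` value are illustrative choices, not from the original text.

```python
# Sketch: mitigate decision-tree overfitting via cost-complexity
# pruning (ccp_alpha). Assumes scikit-learn is installed.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# A fully grown tree memorizes the training data.
full = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)

# A pruned tree is simpler; subtrees whose complexity cost exceeds
# ccp_alpha are collapsed.
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=0.01).fit(X_tr, y_tr)

print("full tree leaves:   ", full.get_n_leaves())
print("pruned tree leaves: ", pruned.get_n_leaves())
print("full test accuracy:  ", full.score(X_te, y_te))
print("pruned test accuracy:", pruned.score(X_te, y_te))
```

In practice `ccp_alpha` (or a depth limit) is tuned by cross-validation rather than fixed by hand as here.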

Readers are suggested to refer to Tables 1 and 3 to get a comparative summary between Naive Bayes and DTs.

On the other hand, there is a saying about Naive Bayes: NB requires you to build the classification by hand. You can't just feed it a bunch of tabular data and have it pick the best features for classification; choosing the right features, the features that matter, is up to the user, that is, you. A DT, by contrast, will pick the best features from tabular data itself. Given this fact, you may need to combine Naive Bayes with other statistical techniques to help with feature extraction before classification, or use a DT to get better accuracy in terms of precision, recall, and F1 measure. Another positive aspect of Naive Bayes is that it answers as a continuous classifier, producing class probabilities rather than only hard labels. The downside, however, is that NB models are harder to debug and understand. Naive Bayes does quite well when the training data lacks good features and is small in volume.

In summary, if you are trying to choose between these two classifiers, it is often best to test each one on your problem. My recommendation is to build both a DT and a Naive Bayes classifier using the training data you have, compare their performance using the available performance metrics, and then decide which one best solves your problem given the nature of your dataset.
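The recommended head-to-head comparison can be sketched as follows, assuming scikit-learn: both classifiers are trained on the same split and scored on precision, recall, and F1, the metrics named above. The dataset and model choices (`GaussianNB` for continuous features) are illustrative assumptions.

```python
# Sketch: train a decision tree and a Naive Bayes classifier on the
# same data, then compare precision, recall, and F1 to pick a winner.
# Assumes scikit-learn is installed.
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

scores = {}
for name, model in [("decision tree", DecisionTreeClassifier(random_state=42)),
                    ("naive Bayes", GaussianNB())]:
    pred = model.fit(X_tr, y_tr).predict(X_te)
    p, r, f1, _ = precision_recall_fscore_support(y_te, pred, average="binary")
    scores[name] = f1
    print(f"{name}: precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Which model wins depends on the dataset, which is exactly why the comparison is worth running rather than deciding in the abstract.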
