Basic Naive Bayes classifier baseline

As per the rules of the challenge, participants had to outperform the basic Naive Bayes classifier in order to qualify for prizes. Naive Bayes assumes that features are independent of each other given the class (refer to Chapter 1, Applied Machine Learning Quick Start).
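To make the independence assumption concrete, here is a minimal sketch of a categorical Naive Bayes classifier in plain Python. It is not the organizers' implementation, which the text does not describe; the class name `CategoricalNaiveBayes`, the Laplace smoothing constant `alpha`, and the toy data are assumptions for illustration. The key step is in `log_joint`: because features are assumed conditionally independent given the class, P(x | y) factorizes into a product over features, so the per-feature log-likelihoods are simply added. Numeric features would need discretization or a per-class density (for example a Gaussian), which this sketch omits.

```python
import math
from collections import Counter


class CategoricalNaiveBayes:
    """Hypothetical minimal Naive Bayes for categorical features.

    Assumes features are conditionally independent given the class, so
    log P(y | x) is proportional to log P(y) + sum_j log P(x_j | y).
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant (an assumption)

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.class_counts = Counter(y)
        self.n = len(y)
        n_features = len(X[0])
        # counts[c][j][v] = number of class-c rows whose feature j equals v
        self.counts = {c: [Counter() for _ in range(n_features)] for c in self.classes}
        # distinct values seen per feature, used in the smoothing denominator
        self.values = [set() for _ in range(n_features)]
        for row, label in zip(X, y):
            for j, v in enumerate(row):
                self.counts[label][j][v] += 1
                self.values[j].add(v)
        return self

    def log_joint(self, row):
        """Return {class: log P(class) + sum_j log P(x_j | class)}."""
        scores = {}
        for c in self.classes:
            score = math.log(self.class_counts[c] / self.n)
            for j, v in enumerate(row):
                num = self.counts[c][j][v] + self.alpha
                den = self.class_counts[c] + self.alpha * len(self.values[j])
                score += math.log(num / den)  # independence: per-feature terms add up
            scores[c] = score
        return scores

    def predict_proba(self, row, positive=1):
        """Normalized posterior probability of the positive class."""
        scores = self.log_joint(row)
        m = max(scores.values())  # subtract the max for numerical stability
        exp = {c: math.exp(s - m) for c, s in scores.items()}
        return exp[positive] / sum(exp.values())


# Toy usage with made-up data: two categorical features, binary label.
X = [("a", "x"), ("a", "y"), ("b", "y"), ("b", "y")]
y = [1, 1, 0, 0]
model = CategoricalNaiveBayes().fit(X, y)
print(round(model.predict_proba(("a", "x")), 3))  # 0.857
```

Working in log space avoids numerical underflow when many small per-feature probabilities are multiplied, which becomes relevant once a data set has hundreds of features.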

The KDD Cup organizers ran the vanilla Naive Bayes classifier without any feature selection or hyperparameter tuning. On the large dataset, Naive Bayes achieved the following overall scores on the test set:

  • Churn problem: AUC = 0.6468
  • Appetency problem: AUC = 0.6453
  • Upselling problem: AUC = 0.7211
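All three baseline scores are AUC values (area under the ROC curve). As a reminder of what the number measures, the following sketch computes AUC as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. It uses the standard rank-based (Mann-Whitney U) formulation in plain Python; the function name `auc` is ours, and the positive class is assumed to be encoded as `1` (any other label counts as negative). An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking.

```python
def auc(labels, scores):
    """AUC via the Mann-Whitney U statistic; tied scores get average ranks."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # extend j over a run of tied scores
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(1 for _, label in pairs if label == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative example")
    rank_sum_pos = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    # U counts (positive, negative) pairs ranked correctly, ties as 1/2
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Sorting makes this O(n log n), which matters for data sets with tens of thousands of rows, where comparing every positive with every negative would be slow.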

Note that the baseline results are reported only for the large dataset. Moreover, while both the training and test datasets are available on the KDD Cup site, the true labels for the test set are not. Therefore, there is no way to know how well our models will perform on the test set. Instead, we will use only the training data and evaluate our models with cross-validation. The results will not be directly comparable with the baseline, but they will give us an idea of what a reasonable AUC score should be.
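The evaluation strategy described above can be sketched as k-fold cross-validation that reports one AUC per held-out fold. This is a minimal plain-Python illustration, not the chapter's actual setup: the helper names `stratified_folds` and `cross_validate_auc`, the `make_model` factory with its `fit`/`score` interface, and the stand-in `FirstFeatureModel` are all hypothetical. Stratification is also our assumption; with a rare positive class, a plain random split could leave a fold with few or no positives, making its AUC unreliable or undefined.

```python
import random


def auc(labels, scores):
    """Pairwise AUC: P(score_pos > score_neg), ties counted as 1/2."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l != 1]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(y, k, seed=0):
    """Split row indices into k folds, keeping class ratios roughly equal."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(set(y)):
        idx = [i for i, l in enumerate(y) if l == label]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)  # deal each class round-robin into folds
    return folds


def cross_validate_auc(X, y, make_model, k=5, seed=0):
    """Train on k-1 folds, score the held-out fold, return per-fold AUCs."""
    results = []
    for test_idx in stratified_folds(y, k, seed):
        test_set = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in test_set]
        model = make_model().fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        scores = [model.score(X[i]) for i in test_idx]
        results.append(auc([y[i] for i in test_idx], scores))
    return results


class FirstFeatureModel:
    """Hypothetical stand-in model: scores a row by its first feature."""

    def fit(self, X, y):
        return self

    def score(self, row):
        return row[0]


# Synthetic, perfectly separable data: 10 positives and 30 negatives.
y = [1 if i % 4 == 0 else 0 for i in range(40)]
X = [[1.0 if label == 1 else 0.0] for label in y]
aucs = cross_validate_auc(X, y, FirstFeatureModel, k=5)
print(sum(aucs) / len(aucs))  # 1.0
```

Reporting the per-fold values rather than only their mean also shows how much the AUC varies between folds, which helps when judging whether a difference between two models is meaningful.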
