Getting the data

On the KDD Cup web page (http://kdd.org/kdd-cup/view/kdd-cup-2009/Data), you should see a page similar to the following screenshot. First, under the Small version (230 var.) header, download orange_small_train.data.zip. Next, download the three sets of true labels associated with this training data. These files are listed under the Real binary targets (small) header:

  • orange_small_train_appetency.labels
  • orange_small_train_churn.labels
  • orange_small_train_upselling.labels

Save and unzip all of the files marked with red boxes in the screenshot.
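If you prefer to unpack the archive programmatically, the following is a minimal sketch that uses only the Java standard library. The class name KddDataSetup and its methods are hypothetical helpers written for this illustration; they are not part of Weka or of the KDD Cup distribution. The sketch also assumes that the archive and the three label files were saved into the same directory, and that the label files are named as listed on the download page.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper for illustration; not part of Weka or the KDD Cup files.
final class KddDataSetup {

    // File names assumed from the download page; adjust them if yours differ.
    static final String ARCHIVE = "orange_small_train.data.zip";
    static final String[] LABEL_FILES = {
        "orange_small_train_appetency.labels",
        "orange_small_train_churn.labels",
        "orange_small_train_upselling.labels"
    };

    /** Extracts every file in zipFile into targetDir and returns the extracted paths. */
    static List<Path> unzip(Path zipFile, Path targetDir) throws IOException {
        List<Path> extracted = new ArrayList<>();
        Path root = targetDir.toAbsolutePath().normalize();
        try (ZipInputStream zin = new ZipInputStream(Files.newInputStream(zipFile))) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                Path out = root.resolve(entry.getName()).normalize();
                // Refuse entries that would be written outside the target directory.
                if (!out.startsWith(root)) {
                    throw new IOException("Entry escapes target directory: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(out);
                } else {
                    Files.createDirectories(out.getParent());
                    // Copies the current entry only; the zip stream stays open.
                    Files.copy(zin, out, StandardCopyOption.REPLACE_EXISTING);
                    extracted.add(out);
                }
            }
        }
        return extracted;
    }

    /** Returns the names of the expected label files that are not present in dir. */
    static List<String> missingLabelFiles(Path dir) {
        List<String> missing = new ArrayList<>();
        for (String name : LABEL_FILES) {
            if (!Files.isRegularFile(dir.resolve(name))) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        // The download directory is the first argument, or the current directory.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        Path archive = dir.resolve(ARCHIVE);
        if (Files.isRegularFile(archive)) {
            System.out.println("Extracted: " + unzip(archive, dir));
        } else {
            System.out.println("Archive not found: " + archive);
        }
        System.out.println("Missing label files: " + missingLabelFiles(dir));
    }
}
```

Run the class with the download directory as its only argument. An empty list of missing label files suggests that everything needed for the following sections is in place.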

In the following sections, we will first load the data into Weka and apply basic modeling with the Naive Bayes classifier to obtain our own baseline AUC scores. Later, we will look at more advanced modeling techniques and tricks.
