The problem statement

Given the historical data, can we train a binary classifier that can predict whether a particular user will eventually buy a product based on their profile?

First, let's explore the historical labeled data set available to solve this problem:

x € ℜ^b, y € {0,1}

For a particular example, when y = 1, we call it a positive class and when y = 0, we call it a negative class.

Although the level of the positive and negative class can be chosen arbitrarily, it is good practice to define the positive class as the event of interest. If we are trying to flag the fraudulent transaction for a bank, then the positive class (that is, y = 1 ) should be the fraudulent transaction, not the other way around.

Now, let's look at the following:

The actual label, denoted by y
The predicted label, denoted by y`

Note that for our classifiers challenge, the actual value of the label found in examples is represented by y. If, in our example, someone has purchased an item, we say y =1. The predicted values are represented by y`. The input feature vector, x, has a dimension of 4. We want to determine what the probability is that a user will make a purchase, given a particular input.

So, we want to determine the probability that y = 1 is, given a particular value of feature vector x. Mathematically, we can represent this as follows:

Now, let's look at how we can process and assemble different input variables in the feature vector, x. The methodology to assemble different parts of x using the processing pipeline is discussed in more detail in the following section.

Table of Contents for The problem statement

Create new playlist

Sign In

Sign Up

Table of Contents for
The problem statement