Given the historical data, can we train a binary classifier that can predict whether a particular user will eventually buy a product based on their profile?
First, let's explore the historical labeled data set available to solve this problem:
x € ℜb, y € {0,1}
For a particular example, when y = 1, we call it a positive class and when y = 0, we call it a negative class.
Now, let's look at the following:
- The actual label, denoted by y
- The predicted label, denoted by y`
Note that for our classifiers challenge, the actual value of the label found in examples is represented by y. If, in our example, someone has purchased an item, we say y =1. The predicted values are represented by y`. The input feature vector, x, has a dimension of 4. We want to determine what the probability is that a user will make a purchase, given a particular input.
So, we want to determine the probability that y = 1 is, given a particular value of feature vector x. Mathematically, we can represent this as follows:
Now, let's look at how we can process and assemble different input variables in the feature vector, x. The methodology to assemble different parts of x using the processing pipeline is discussed in more detail in the following section.