Incremental supervised learning

This section introduces several techniques used to learn from stream data when the true label for each instance is available. In particular, we present linear, non-linear, and ensemble-based algorithms adapted to incremental learning, as well as methods required in the evaluation and validation of these models, keeping in mind that learning is constrained by limits on memory and CPU time.

Modeling techniques

The modeling techniques are divided into linear algorithms, non-linear algorithms, and ensemble methods.

Linear algorithms

The linear methods described here require little to no adaptation to handle stream data.

Online linear models with loss functions

Different loss functions such as hinge, logistic, and squared error can be used in this algorithm.

Inputs and outputs

These methods use only numeric features. The loss function l and the learning rate λ, which controls the size of each weight update, are taken as input parameters. The output is typically an updatable model that gives predictions accompanied by confidence values.

How does it work?

The basic algorithm assumes linear weight combinations similar to linear/logistic regression explained in Chapter 2, Practical Approach to Real-World Supervised Learning. The stream or online learning algorithm can be summed up as:

for (t = 1, 2, …, T) do
    x_t = receive();                     // receive the data
    ŷ_t = sgn(f(x_t, w_t));              // predict the label (sign of the linear score, for classification)
    y_t = obtainTrueLabel();             // get the true label
    loss = l(w_t, (x_t, y_t));           // calculate the loss
    if (loss > 0) then
        w_{t+1} = w_t - λ ∇_w l(w_t, (x_t, y_t));   // update the weights
    end
end
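
As a rough illustration, the loop above could be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the book's implementation: it assumes labels in {-1, +1}, a linear score f(x, w) = w · x, the hinge loss, and a plain subgradient step. The names `dot`, `hinge_loss`, `online_learn`, and `toy_stream` are hypothetical.

```python
from typing import Iterable, List, Tuple

Vector = List[float]


def dot(w: Vector, x: Vector) -> float:
    """Linear score f(x, w) = w . x (assumed model form)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def hinge_loss(w: Vector, x: Vector, y: int) -> float:
    """Hinge loss max(0, 1 - y * f(x, w)) for a label y in {-1, +1}."""
    return max(0.0, 1.0 - y * dot(w, x))


def online_learn(
    stream: Iterable[Tuple[Vector, int]],
    n_features: int,
    learning_rate: float = 0.1,
) -> Tuple[Vector, int]:
    """Process (x, y) pairs one at a time; return final weights and mistake count."""
    w = [0.0] * n_features
    mistakes = 0
    for x, y in stream:                        # receive the instance and its true label
        y_hat = 1 if dot(w, x) >= 0 else -1    # predict using the current weights only
        if y_hat != y:
            mistakes += 1
        if hinge_loss(w, x, y) > 0:            # update only when the loss is positive
            # The hinge subgradient w.r.t. w is -y * x here, so a step against it
            # adds learning_rate * y * x to the weights.
            w = [wi + learning_rate * y * xi for wi, xi in zip(w, x)]
    return w, mistakes


# Hypothetical toy stream: the first component is a constant bias feature.
toy_stream = [([1.0, 2.0], 1), ([1.0, -2.0], -1), ([1.0, 3.0], 1), ([1.0, -1.0], -1)]
weights, mistakes = online_learn(toy_stream, n_features=2)
```

Each instance is seen once and then discarded, so memory use stays constant regardless of the length of the stream, which is the constraint this section is concerned with.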

Different loss functions can be plugged in based on types of problems; some of the well-known types are shown here:

Classification:
- Hinge loss: l(w_t, (x_t, y_t)) = max(0, 1 - y_t f(x_t, w_t))
- Logistic loss: l(w_t, (x_t, y_t)) = log(1 + exp(-y_t f(x_t, w_t)))
Regression:
- Squared loss: l(w_t, (x_t, y_t)) = (y_t - f(x_t, w_t))^2
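
To make the three loss types concrete, the following sketch writes each one as a function of the true label and the model's score f(x_t, w_t). It assumes the standard textbook forms (labels in {-1, +1} for the classification losses, no 1/2 factor on the squared loss). The function names are illustrative, not part of any particular library.

```python
import math


def hinge_loss(y: float, score: float) -> float:
    """Zero once the example is classified correctly with margin >= 1."""
    return max(0.0, 1.0 - y * score)


def logistic_loss(y: float, score: float) -> float:
    """log(1 + exp(-y * score)), rearranged to avoid overflow for large |score|."""
    z = -y * score
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def squared_loss(y: float, score: float) -> float:
    """Squared residual between the target and the predicted value."""
    return (y - score) ** 2


# Compare the classification losses for a positive example at a few scores.
for score in (-1.0, 0.0, 0.5, 2.0):
    print(score, hinge_loss(1, score), round(logistic_loss(1, score), 4))
```

Unlike the hinge loss, the logistic loss never reaches exactly zero, so an update rule that fires only when the loss is positive will adjust the weights on every example.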

Stochastic Gradient Descent (SGD) can be thought of as changing the weights to minimize a loss, such as the squared loss listed above, by taking a step in the direction of the negative gradient with each example rather than over the whole dataset. With learning rate λ, the update of weights can be described as:

w_{t+1} = w_t - λ ∇_w l(w_t, (x_t, y_t))
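
As a hedged illustration of a single SGD update, the sketch below assumes a linear model f(x, w) = w · x and the squared loss (y - w · x)^2, whose gradient with respect to w is -2 (y - w · x) x. The function name `sgd_step_squared` and the toy values are hypothetical.

```python
from typing import List


def sgd_step_squared(
    w: List[float], x: List[float], y: float, learning_rate: float
) -> List[float]:
    """One SGD step on the squared loss (y - w . x)^2 for a single example."""
    prediction = sum(wi * xi for wi, xi in zip(w, x))
    residual = y - prediction
    # Stepping against the gradient -2 * residual * x adds 2 * rate * residual * x.
    return [wi + 2.0 * learning_rate * residual * xi for wi, xi in zip(w, x)]


# Repeatedly present the same example (x = 1, y = 2); the weight moves toward 2.
w = [0.0]
history = []
for _ in range(3):
    w = sgd_step_squared(w, [1.0], 2.0, learning_rate=0.25)
    history.append(w[0])
```

Because the step is proportional to the residual, an example with a very large residual moves the weights correspondingly far.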

Advantages and limitations

Online linear models have advantages and limitations similar to those of the linear models described in Chapter 2, Practical Approach to Real-World Supervised Learning:

- Interpretable to some degree, as the weight of each feature gives insight into that feature's impact
- Assume a linear relationship and additive, uncorrelated features, and hence cannot model complex non-linear real-world data
- Very susceptible to outliers in the data
- Very fast, and normally one of the first algorithms to try or to use as a baseline

Online Naïve Bayes

Bayes' theorem is applied to get predictions as the posterior probability, which for an m-dimensional input is given by: