Data scaling and standardization

Users rate items on different personal scales. One user may give almost everything a 5, while another rarely goes above a 4, so the same score does not mean the same thing for both of them. It is therefore better to normalize the data before using it in calculations, that is, to convert the ratings to a single scale so that the algorithm can correctly compare them with each other. Naturally, the predicted rating then needs to be converted back to the original scale by the inverse transformation (and, if necessary, rounded to the nearest whole number).

There are several ways to normalize data, detailed as follows:

Centering (mean-centering): From the user's ratings, subtract their average rating. This type of normalization is only relevant for non-binary matrices.
Standardization (z-score): In addition to centering, this divides the centered rating by the standard deviation of the user's ratings. After the inverse transformation, however, a predicted rating can fall outside the scale (for example, six on a five-point scale). Such situations are quite rare and are handled simply by rounding to the nearest acceptable rating.
Double standardization: The data is normalized twice: the first time by user ratings and the second time by item ratings.
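
The first two techniques, together with the inverse transformation, can be sketched in a few lines. This is a minimal illustration and not a reference implementation: the function names, the 1 to 5 rating scale, the sample ratings, and the use of the population standard deviation are all assumptions made for the example.

```python
from statistics import mean, pstdev


def user_stats(ratings):
    """Return (mean, standard deviation) of one user's known ratings."""
    # Assumption: population standard deviation; a sample estimate would also work.
    return mean(ratings), pstdev(ratings)


def center(ratings, mu):
    """Mean-centering: subtract the user's average rating."""
    return [r - mu for r in ratings]


def standardize(ratings, mu, sigma):
    """Z-score: center, then divide by the user's standard deviation."""
    if sigma == 0:
        # A user who gives the same rating everywhere has no spread to scale by.
        return [0.0 for _ in ratings]
    return [(r - mu) / sigma for r in ratings]


def restore(value, mu, sigma=1.0, lo=1, hi=5):
    """Inverse transformation back to the original scale.

    Use sigma=1.0 for mean-centered values. The result is rounded and then
    clipped to the assumed scale [lo, hi], because a z-score prediction can
    land outside it.
    """
    raw = value * sigma + mu
    return min(hi, max(lo, round(raw)))


# Hypothetical ratings: one generous user and one strict user.
generous = [5, 5, 4, 5]
strict = [2, 3, 1, 2]

g_mu, g_sigma = user_stats(generous)
s_mu, s_sigma = user_stats(strict)

print(center(generous, g_mu))
print(standardize(strict, s_mu, s_sigma))
# A predicted z-score of 2.0 for the generous user maps above 5 and is clipped.
print(restore(2.0, g_mu, g_sigma))
```

After normalization, both users' ratings are expressed as deviations from their own typical behavior, which is what makes them comparable. The clipping step in restore corresponds to the out-of-scale case mentioned for the z-score.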

The details of these normalization techniques were described in Chapter 2, Data Processing. The following section describes a well-known problem of recommender systems, the cold start problem, which arises in the early stages of a system's operation, when it doesn't yet have enough data to make predictions.
