Assessing system quality 

Testing a recommender system is a complicated process that raises many questions, mainly because the concept of quality itself is ambiguous.

In general, in machine learning problems, there are the following two main approaches to testing:

  • Offline model testing on historical data using retro tests
  • Online testing of the model with A/B tests (we run several variants and see which one gives the best result; a sketch of such a comparison follows this list)
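
As an illustration of the second approach, here is a minimal sketch that compares the click-through rates (CTR) of two recommendation variants with a two-proportion z-test. The choice of test, the scipy dependency, and all of the counts are assumptions made for illustration only; any other significance test or business metric could be used instead.

import math

from scipy.stats import norm

# Compare the click-through rates (CTR) of two A/B variants and return the
# z statistic and two-sided p-value under the null hypothesis of equal CTR.
def two_proportion_z_test(clicks_a, views_a, clicks_b, views_b):
    p_a = clicks_a / views_a
    p_b = clicks_b / views_b
    # Pooled CTR under the null hypothesis of no difference
    p_pool = (clicks_a + clicks_b) / (views_a + views_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / views_a + 1 / views_b))
    z = (p_a - p_b) / se
    p_value = 2 * norm.sf(abs(z))  # two-sided tail probability
    return z, p_value

# Hypothetical traffic split between two recommender variants
z, p_value = two_proportion_z_test(clicks_a=1200, views_a=20000,
                                   clicks_b=1315, views_b=20000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")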

Both of the preceding approaches are actively used in the development of recommender systems. The main limitation we face is that we can evaluate prediction accuracy only on items that the user has already rated. The standard approach is cross-validation with the leave-one-out and leave-p-out methods. Repeating the test multiple times and averaging the results gives a more stable quality estimate.

In the leave-one-out approach, the model is trained on all of the items the user has rated except one, and the excluded item is used for testing. This procedure is repeated for all n rated items, and the n resulting quality estimates are averaged.

The leave-p-out approach is the same, but at each step, p items are excluded.
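
The sketch below illustrates the leave-one-out procedure for a single user. The rating data and the baseline predictor are hypothetical placeholders; any recommender that can predict a rating for a held-out item could be plugged in as train_and_predict.

import numpy as np

# Leave-one-out evaluation for one user: hold out each rated item in turn,
# train on the remaining ratings, predict the held-out rating, and average
# the squared errors over all n held-out items (reported here as RMSE).
def leave_one_out_rmse(user_ratings, train_and_predict):
    squared_errors = []
    for held_out_item, true_rating in user_ratings.items():
        # Train on every rating except the one held out for testing
        train_ratings = {item: r for item, r in user_ratings.items()
                         if item != held_out_item}
        predicted = train_and_predict(train_ratings, held_out_item)
        squared_errors.append((predicted - true_rating) ** 2)
    return float(np.sqrt(np.mean(squared_errors)))

# Hypothetical ratings and a trivial baseline that predicts the mean of the
# training ratings, standing in for a real recommender model
ratings = {"item_1": 4.0, "item_2": 3.0, "item_3": 5.0, "item_4": 2.0}
baseline = lambda train, item: float(np.mean(list(train.values())))
print(leave_one_out_rmse(ratings, baseline))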

We can divide all quality metrics into the following three categories:

  • Prediction accuracy: Estimates the accuracy of the predicted rating
  • Decision support: Evaluates the relevance of the recommendations
  • Rank accuracy: Evaluates the quality of the ranking of the issued recommendations

Unfortunately, there is no single recommended metric for all occasions, and everyone who tests a recommender system selects one that fits their goals.
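
To make the three categories concrete, here is a minimal NumPy sketch that computes one representative metric from each: RMSE for prediction accuracy, precision@k for decision support, and NDCG@k for rank accuracy. The metric choices and the toy data are illustrative assumptions, not the only options within each category.

import numpy as np

# Prediction accuracy: root mean squared error of the predicted ratings
def rmse(true_ratings, predicted_ratings):
    true_ratings = np.asarray(true_ratings, dtype=float)
    predicted_ratings = np.asarray(predicted_ratings, dtype=float)
    return float(np.sqrt(np.mean((true_ratings - predicted_ratings) ** 2)))

# Decision support: share of the top-k recommendations that are relevant
def precision_at_k(recommended, relevant, k):
    top_k = recommended[:k]
    return len(set(top_k) & set(relevant)) / k

# Rank accuracy: normalized discounted cumulative gain of the top-k list
def ndcg_at_k(recommended, relevant, k):
    discounts = 1.0 / np.log2(np.arange(2, k + 2))
    gains = np.array([1.0 if item in relevant else 0.0
                      for item in recommended[:k]])
    dcg = float(np.sum(gains * discounts[:len(gains)]))
    idcg = float(np.sum(discounts[:min(len(relevant), k)]))
    return dcg / idcg if idcg > 0 else 0.0

# Hypothetical predictions and a top-5 recommendation list for one user
print(rmse([4, 3, 5, 2], [3.8, 2.5, 4.9, 2.4]))
recommended = ["a", "b", "c", "d", "e"]
relevant = {"a", "c", "f"}
print(precision_at_k(recommended, relevant, k=5))
print(ndcg_at_k(recommended, relevant, k=5))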

In the following section, we will formalize the collaborative filtering method and show the math behind it.
