Besides basic content, they try to combine many additional clues to offer more personalized choices.
They tend to favor variability over predictability to keep users excited; just imagine a recommender system whose proposals never contained any surprising items.
These traits entail a completely different set of metrics to evaluate recommender systems. If you were to try to optimize prediction accuracy solely by using the standard root mean squared error, then your system would probably not perform well in practice; it would be judged as boring. This chapter briefly introduces you to recommender systems and explains some core concepts and related techniques.
Introduction to Recommender Systems
To familiarize yourself with a recommender system, try out the freely available MovieLens project ( https://movielens.org ), which carries no commercial baggage [1]. MovieLens provides personalized movie recommendations. After creating an account, it immediately asks you to distribute three points among six possible genres of movies to produce an initial profile. Later, as you rate and tag movies, it learns more about you and offers better-suited movies. This is a typical case in recommender systems: more data allows the system to create a finer-grained profile about you that can be used to filter content more successfully. In this respect, high-quality input should result in high-quality output. Figure 8-1 shows part of the main user interface of MovieLens. Try rating a couple of movies and watch how the system adapts to your updated profile.
- Explicit data: Voluntarily given by users. For example, each time you rate a movie, you intentionally provide feedback. Collecting lots of explicit data requires system designers to find ingenious ways to entice users to respond, such as by offering incentives. Of course, if you properly rate movies, then you should get better recommendations, which is the minimal incentive for anyone to bother rating products.
- Implicit data: Collected automatically by a system while monitoring activities of users (such as clicking, purchasing, etc.). Even timing events may be valuable; for example, if a user has spent more time on a page showing the script of a particular movie than she has spent on the pages of other movies, that may indicate a higher interest in that movie. Another possibility for collecting implicit data is to trace tweets about movies to evaluate their general popularity. All in all, a multitude of sources could be combined into a single preference formula. Naturally, acquiring implicit data is easier than acquiring explicit data, as it doesn’t require active participation by users.
To always give users a chance to discover new stuff, recommendations are a mixture of personalized content and nonpersonalized content (such as overall popular items). The latter may be customized based on general demographic information (for example, age group, gender, etc.). Another consideration is that sometimes you just want to explore things without “disturbing” your profile (remember that every action is tracked, including browsing and clicking, which might influence future offerings). In MovieLens, this is possible via the MovieExplorer feature (it may change over time, as it is purely tentative at the time of this writing). At any rate, consider in your design a similar possibility to offer an “untracked” option for your customers.
A recommender system learns what a user likes and dislikes and modifies the underlying queries accordingly. The output reflects the current profile (i.e., the outcome resonates with the user's taste, assuming a personalized solution). The output categories in a recommender system typically aren't altered often; they are quite stable over time. A content-based recommender system may use the products database to display details about items, so it can encode taste in terms of attributes of items.
Filtering facilities: Try to remove uninteresting topics (according to the user profile) from a streaming data source, such as news, tweets, e-mails, etc. For example, a personalized news suggestion engine may pick only articles that could be significant to you.
Recommendation interfaces: Present a selection of items that you may like, as shown earlier in Figure 8-1. Usually, modern recommenders are hybrid systems that leverage different recommendation algorithms to produce a mix of content. For example, such a system can give you both specific items matching your taste and suggestions for items that are currently popular. Furthermore, the system may use its own initial recommendation only to trigger a dialog-based interaction, where additional feedback from a user may help narrow down the final selection.
Prediction interfaces: Attempt to accurately foresee how much you would like some products (for example, the number of stars you would assign to a movie you haven’t viewed). Of course, these values are estimates and may sometimes turn out to be inaccurate. In general, it doesn’t matter that much whether you will give 5 stars or 4.5 stars to an item. It is more important to offer things that you will find valuable.
Nowadays, most recommender systems are based on collaborative filtering techniques. Collaboration means that data from other users is leveraged to craft offerings for you. A system may correlate your taste with other users, so that you get recommendations based on what those users liked and disliked. This is the user-user type of filtering. It is also possible to correlate items regarding their ratings, which constitutes the item-item type of filtering; this is the preferred scalable model in most systems today. Finally, it is also possible to recommend items using product association rules of the form “users who bought this also bought...”
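The item-item idea can be illustrated in a few lines: items are correlated through the ratings they received, so two items whose rating vectors point in a similar direction are considered similar. A minimal sketch using cosine similarity over a made-up ratings matrix (all numbers are hypothetical):

```python
import numpy as np

# Hypothetical toy ratings matrix: rows are users, columns are items;
# 0 stands for "not rated".
ratings = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 0, 4],
    [0, 1, 5, 4],
], dtype=float)

def item_cosine_similarity(r, i, j):
    """Cosine similarity between the rating vectors of items i and j."""
    a, b = r[:, i], r[:, j]
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

# Items 0 and 1 attract similar ratings; items 0 and 3 attract mostly
# opposite ones, so their similarity should be lower.
sim_01 = item_cosine_similarity(ratings, 0, 1)
sim_03 = item_cosine_similarity(ratings, 0, 3)
```

A production system would precompute such similarities for all item pairs (often keeping only the top neighbors per item), which is exactly what makes item-item filtering scale well: items are far fewer and more stable than users.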
Context-based systems also use contextual and environmental data to fine-tune the result (for example, depending on your mood, the system may offer different music on top of your general preference model). Such data may be collected from your smartphone; the system may use location information to reduce the list only to objects that are in your vicinity.
Simple Movie Recommender Case Study
We will build a very simple mashup to recommend movies similar to a movie entered by a user. The inspiration for this example comes from reference [3]. The program will use two public services and combine their results; such an arrangement is known as a mashup.
tastedive_service.py Module in the simple_recommender Folder for Getting Similar Stuff from TasteDive
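The essence of such a module might look as follows. This sketch assumes TasteDive's public JSON endpoint ( https://tastedive.com/api/similar ) and its documented response layout ("Similar" → "Results" → "Name"); the parsing is deliberately kept separate from the network call so it can be tested without going online:

```python
import json
import urllib.parse
import urllib.request

TASTEDIVE_URL = "https://tastedive.com/api/similar"

def extract_movie_titles(response_dict):
    """Pull the similar movie titles out of a TasteDive JSON response."""
    return [item["Name"] for item in response_dict["Similar"]["Results"]]

def get_similar_movies(title, limit=5, api_key=None):
    """Query TasteDive for movies similar to `title` (performs a network call)."""
    params = {"q": title, "type": "movies", "limit": limit}
    if api_key:
        params["k"] = api_key  # TasteDive accepts an optional API key
    url = TASTEDIVE_URL + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as resp:
        return extract_movie_titles(json.load(resp))
```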
omdb_service.py Module to Help Communicate with the OMDb Service
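A matching sketch for the OMDb side, assuming OMDb's `t` (title) and `apikey` query parameters and its `imdbRating` response field (which OMDb reports as a string, using "N/A" for missing ratings):

```python
import json
import urllib.parse
import urllib.request

OMDB_URL = "http://www.omdbapi.com/"

def extract_rating(movie_dict):
    """Return the IMDb rating as a float, or 0.0 when it is missing or "N/A"."""
    rating = movie_dict.get("imdbRating", "N/A")
    return float(rating) if rating != "N/A" else 0.0

def get_movie_rating(title, api_key):
    """Fetch a movie record from OMDb and return its IMDb rating (network call)."""
    params = {"t": title, "apikey": api_key, "r": "json"}
    url = OMDB_URL + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as resp:
        return extract_rating(json.load(resp))
```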
simple_movie_recommender.py Module to Offer Similar Movies in Sorted Order
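The top-level module only needs to glue the two services together: fetch similar titles, then order them by rating. A sketch with the services passed in as plain callables (the stubs below are hypothetical stand-ins so the flow can be exercised without network access):

```python
def recommend(title, similar_fn, rating_fn, limit=5):
    """Return movies similar to `title`, ordered by rating (best first)."""
    candidates = similar_fn(title, limit)
    return sorted(candidates, key=rating_fn, reverse=True)

# Stubbed services with made-up data, mimicking the two real modules.
fake_ratings = {"A": 6.1, "B": 8.8, "C": 7.4}
result = recommend("X", lambda t, n: ["A", "B", "C"], fake_ratings.get)
```

Passing the service functions as parameters keeps the sorting logic trivially testable and lets either backend be swapped out later.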
Introduction to LensKit for Python
LensKit for Python (LKPY) is an open-source framework for conducting offline recommender experiments. It has a highly modular design that leverages the PyData ecosystem ( https://pydata.org ). LensKit enables you to quickly fire up a recommender system and play with various algorithms and evaluation strategies. It also integrates with external recommender tools to provide a common control plane including metrics and configuration. In this section we will demonstrate how easy it is to experiment with LensKit for both research and educational purposes. You can install LensKit by issuing conda install -c lenskit lenskit.
The metrics package accepts Pandas Series objects as input, so it may be combined with any external framework. There is a subtle inconsistency in the absence of a single knn wrapper package, since user_knn and item_knn are just different implementations of nearest-neighbor search (something already reflected in the corresponding class names). There are two additional wrapper classes for the implicit external framework, which are omitted here. The Fallback class is a trivial hybrid that returns the first result from a set of predictors passed as input. This is handy when a more sophisticated algorithm has difficulties computing the output, as a simpler version may provide an alternative answer. I suggest that you read reference [4] for a good overview of various approaches to implementing recommendation engines (all examples are realized in RapidMiner, with an extension for recommender systems).
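The fallback idea is easy to see in plain Python. The sketch below is an analogue of LensKit's Fallback hybrid, not its actual implementation: each predictor is asked in turn, and the first non-NaN answer wins (the two predictors are hypothetical stand-ins):

```python
import math

class SimpleFallback:
    """Plain-Python analogue of a fallback hybrid: ask each predictor in
    turn and keep the first usable (non-None, non-NaN) score."""
    def __init__(self, *predictors):
        self.predictors = predictors

    def predict(self, user, item):
        for p in self.predictors:
            score = p(user, item)
            if score is not None and not math.isnan(score):
                return score
        return float("nan")  # no predictor could score this pair

# A "sophisticated" predictor that sometimes fails, backed by a baseline.
primary = lambda u, i: 4.5 if (u, i) == (1, 10) else float("nan")
baseline = lambda u, i: 3.2  # e.g., a global-mean predictor
fb = SimpleFallback(primary, baseline)
```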
The tee function (from Python's itertools module) splits the underlying generator into two independent iterators. One is needed for batch evaluation, while the other is needed for creating the complete test data. This function will be needed later to calculate the ideal DCG for users.
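The behavior of tee can be illustrated with a small stand-in generator (the data below is made up purely for illustration):

```python
from itertools import tee

def user_ratings():
    """A stand-in for the generator of per-user test data used in evaluation."""
    yield from [("u1", 4.0), ("u2", 3.5), ("u3", 5.0)]

# tee() returns two independent iterators over the same one-shot stream:
# one can feed the batch evaluator while the other builds the full test set.
for_eval, for_truth = tee(user_ratings())
eval_part = list(for_eval)
truth_part = list(for_truth)
```

Without tee, consuming the generator once would exhaust it, leaving nothing for the second pass.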
LensKit helps you to experiment with a wide range of recommendation algorithms and evaluate them in a standardized fashion. It will be interesting to see whether LensKit will support Predictive Model Markup Language (see http://dmg.org ) in the future, as a means of exchanging recommender models. Instead of coding up the pipeline manually, this may come as an XML input formatted according to the PMML schema. PMML serialized models can be managed by GUI tools (like RapidMiner, which can even tune model parameters via generic optimizers, including the one based on a genetic algorithm), which further streamlines the whole process.
Exercise 8-1. Report Prediction Accuracy
Read about prediction accuracy metrics in LKPY’s documentation ( https://lkpy.lenskit.org/en/stable/ ). Notice that we have called our evaluator with batch.MultiEval('result', False, nprocs = 4). Change the second argument to True to turn on predictions. You will need to evaluate accuracy using metrics available in the lenskit.metrics.predict package.
This will also be a good opportunity to try the Fallback algorithm to cope with missing data.
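For orientation, the arithmetic behind the rmse metric from lenskit.metrics.predict can be sketched directly with pandas on a hypothetical predictions frame shaped like batch-prediction output (one row per user-item pair with predicted and true ratings; all numbers invented):

```python
import pandas as pd

# Hypothetical predictions frame in the shape batch prediction produces.
preds = pd.DataFrame({
    "user":       [1, 1, 2, 2],
    "item":       [10, 20, 10, 30],
    "prediction": [3.5, 4.0, 2.5, 4.5],
    "rating":     [4.0, 4.0, 2.0, 5.0],
})

def rmse(predicted, truth):
    """Root mean squared error over two aligned Series."""
    err = predicted - truth
    return float(err.pow(2).mean() ** 0.5)

accuracy = rmse(preds["prediction"], preds["rating"])
```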
Exercise 8-2. Implement a New Metric
Among the Top-N accuracy metrics, you will find two classification metrics (a.k.a. decision-support metrics): precision (P) and recall (R). The former reports how many recommended elements are relevant, while the latter reports how many relevant items are considered by a recommender system. Recall is crucial, since it speaks to whether a recommender will really be able to "surprise" you with proper offerings (i.e., avoid missing useful stuff). This may alleviate a known problem called the filter bubble, where a recommender becomes biased by the sparse input a user provides for specific items. The fact that many items have missing ratings doesn't imply that those items are worthless to a user.
Usually, we want to balance precision and recall; after all, it is trivial to maximize recall by simply returning everything. F-metrics give us an evenhanded answer. Implement the F1 metric in a similar fashion to what we did with nDCG, then rank the algorithms based on this new metric.
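As a starting point, the F1 computation itself can be sketched over plain sets of item IDs (this is only the formula, not LKPY's exact metric calling convention):

```python
def f1_score(recommended, relevant):
    """F1 = harmonic mean of precision and recall over Top-N lists."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)
    if hits == 0:
        return 0.0
    precision = hits / len(recommended)  # fraction of recommendations that hit
    recall = hits / len(relevant)        # fraction of relevant items retrieved
    return 2 * precision * recall / (precision + recall)

# Two of three recommendations are relevant, two of three relevant items
# are retrieved: P = R = 2/3, so F1 = 2/3 as well.
score = f1_score(["a", "b", "c"], ["b", "c", "d"])
```

Note how a degenerate "recommend everything" strategy gets perfect recall but poor precision, and the harmonic mean punishes exactly that imbalance.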
Summary
In an online world, abundant with information and product offerings, recommender systems may come as a savior. They can select items (news articles, books, movies, music, etc.) matching our interest (assuming a personalized version) and suggest them to us. To diversify the list, most recommenders are hybrids that mix highly customized items and generic items. Moreover, many systems also take into account the current context to narrow down the possibilities. The overall influence of a recommender system may be judged by performing an A/B test and monitoring whether our activities have been altered in a statistically significant way.
Two important concerns deserve attention:

- Privacy and confidentiality of users' preference data, since recommenders may pile up lots of facts based on explicit and implicit input. Users must be informed how this data is handled and why a particular item is recommended (tightly associated with interpretability of machine learning models), and users must be able to manage preference data (for example, you should be able to delete some facts about you from a system). Chapter 9 discusses privacy and confidentiality in depth.
- Negative biases induced by machine-based taste formation. It is a bit disturbing that there are already artists (and consultancy companies to help them) trying to produce content that appeals to leading recommender systems. On the other side, to bootstrap content-based engines, some systems mechanically preprocess and extract features from content (for example, doing signal processing on audio data to discover the rhythm, pitch, and impression of songs). We definitely wouldn't like to see items at the long tail of any distribution disappear just because they aren't recommender friendly.
References
- 1. F. Maxwell Harper and Joseph A. Konstan, "The MovieLens Datasets: History and Context," ACM Transactions on Interactive Intelligent Systems 5, no. 4, 2015; doi: https://doi.org/10.1145/2827872 .
- 2. Michael D. Ekstrand, "The LKPY Package for Recommender Systems Experiments: Next-Generation Tools and Lessons Learned from the LensKit Project," Computer Science Faculty Publications and Presentations 147, Boise State University, presented at the REVEAL 2018 Workshop on Offline Evaluation for Recommender Systems, Oct. 7, 2018; doi: https://doi.org/10.18122/cs_facpubs/147/boisestate ; arXiv: https://arxiv.org/abs/1809.03125 .
- 3. Brad Miller, Paul Resnick, Lauren Murphy, Jeffrey Elkner, Peter Wentworth, Allen B. Downey, Chris Meyers, and Dario Mitchell, Foundations of Python Programming, Runestone Interactive, https://fopp.umsi.education/runestone/static/fopp/index.html .
- 4. Vijay Kotu and Bala Deshpande, Data Science: Concepts and Practice, 2nd Edition, Morgan Kaufmann Publishers, 2018.