A. Kulkarni et al., Applied Recommender Systems with Python, https://doi.org/10.1007/978-1-4842-8954-9_1

1. Introduction to Recommendation Systems

Akshay Kulkarni¹, Adarsha Shivananda², Anoosh Kulkarni³ and V Adithya Krishnan⁴

(1) Bangalore, Karnataka, India
(2) Hosanagara tq, Shimoga dt, Karnataka, India
(3) Bangalore, India
(4) Navi Mumbai, India

In today’s world, customers are faced with multiple choices for every decision. Let’s assume that a person is looking for a book to read without any specific idea of what they want. There’s a wide range of possibilities for how their search might pan out. They might waste a lot of time browsing the Internet and trawling through various sites hoping to strike gold. They might look for recommendations from other people.

But if there was a site or app that could recommend books to this customer based on what they’d read previously, that would save time that would otherwise be spent searching for books of interest on various sites. In short, our main goal is to recommend things based on the user’s interests. And that’s what recommendation engines do.

A recommendation engine, also known as a recommender system or a recommendation system, is one of the most widely used machine learning applications; for example, it is used by companies like Amazon, Netflix, Google, and Goodreads.

This chapter explains recommendation systems and presents various recommendation engine algorithms and the fundamentals of creating them in Python 3.8 or greater using a Jupyter notebook.

What Are Recommendation Engines?

In the past, people generally purchased products recommended to them by their friends or the people they trust. This is how people used to make purchasing decisions when there was doubt about a product. But since the advent of the Internet, we are so used to ordering online and streaming music and movies that we are constantly creating data in the back end. A recommendation engine uses that data and different algorithms to recommend the most relevant items to users. It initially captures the past behavior of a user, and then it recommends items for future purchase or use.

There are scenarios where there is no historical data as well. For example, when a new user visits a site, there is no history of that user. So how does the website recommend products to this user? One way is by recommending bestselling products (i.e., the products that are trending). Another possible solution is to recommend the products that bring maximum profit to the business and any new products recently added to the site.

If you can recommend a few items to a customer based on their interests, it positively impacts the user experience and leads to frequent visits. Hence, intelligent recommendation engines are built by studying the past behavior of their users to enhance revenue.

Recommendation System Types

Data on a user’s likes and dislikes of items are essential to building a recommender engine that can suggest relevant items to the user. There are two feedback mechanisms through which users provide this required data.

Explicit feedback is the data that the user explicitly provides as feedback on an item. It is usually difficult to obtain this type of feedback from users, so companies try many innovative ways to collect it. A simple like or dislike button, star ratings, and even comments and reviews as text input can capture user feedback on an item.

Implicit feedback is the data that the user implicitly or unknowingly provides through their actions. This can be in the form of pages visited, items viewed, the number of clicks, and all sorts of other activities performed on the site/platform, which can indicate their interest in certain items. This type of data is generally captured automatically through cookies and browsing history and doesn’t require any direct action from the users.

Types of Recommendation Engines

There are many different types of recommendation engines, and each of them is explored in this chapter.

Market basket analysis (association rule mining)
Content-based filtering
Collaborative-based filtering
Hybrid systems
ML clustering
ML classification
Deep learning and NLP

Market Basket Analysis (Association Rule Mining)

Retailers predominantly use market basket analysis to reveal relationships between items. It works by looking for combinations of items that are often put together, allowing retailers to identify relationships between items that people buy.

There are several terms used in association analysis that are important to understand. Association rules are widely used to analyze retail basket or transaction data. They are intended to identify strong rules in transaction data using measures of interestingness such as support, confidence, and lift.

Association rules are normally written like this: {bread} -> {butter}. This means that customers who bought bread were also likely to buy butter in the same transaction.

In the preceding example, {bread} is the antecedent and {butter} is the consequent. Both antecedents and consequents can have multiple items. In other words, {bread, milk} -> {butter, chips} is a valid rule.

Support is the relative frequency with which the items in a rule appear together in the transaction data. In many cases, you may want to seek high support to make sure it’s a worthwhile relationship. However, there may be cases where low support is useful if you are trying to find “hidden” relationships.

Confidence is a measure of the reliability of a rule. A confidence of 0.5 for the preceding rule means that 50% of the transactions containing bread and milk also included butter and chips. For a product recommendation, 50% confidence may be perfectly acceptable, but this level may not be high enough in a medical situation.

Lift is the ratio of the observed support to the support expected if the antecedent and the consequent were independent. As a rule of thumb, a lift value close to 1 means that the two are independent. Lift values greater than 1 are more “interesting” and could indicate a useful rule pattern. Figure 1-1 illustrates how support, confidence, and lift are calculated.
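The three measures can be worked through on a handful of transactions. The following is a minimal sketch rather than the book's implementation: the transactions are made-up illustration data, and the helper names support, confidence, and lift are assumptions introduced here.

```python
# Toy transactions (hypothetical data, for illustration only)
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "chips"},
    {"bread", "butter", "chips"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Support of the whole rule divided by the support of the antecedent."""
    both = support(set(antecedent) | set(consequent), transactions)
    return both / support(antecedent, transactions)

def lift(antecedent, consequent, transactions):
    """Confidence of the rule divided by the support of the consequent."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)

# Evaluate the rule {bread} -> {butter}
rule_support = support({"bread", "butter"}, transactions)
rule_confidence = confidence({"bread"}, {"butter"}, transactions)
rule_lift = lift({"bread"}, {"butter"}, transactions)
print(rule_support, rule_confidence, rule_lift)
```

Here bread and butter appear together in 3 of 5 transactions, and 3 of the 4 bread transactions also contain butter. The lift comes out above 1, which suggests that buying bread makes butter more likely than its baseline frequency alone.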

Content-Based Filtering

The content-based filtering method is a recommendation algorithm that suggests items similar to the ones the users have previously selected or shown interest in. It can recommend based on the actual content present in the item. For example, as shown in Figure 1-2, a new article is recommended based on the text present in the articles.

Let’s look at the popular example of Netflix and its recommendations to explore the workings in detail. Netflix saves all user viewing information in a vector-based format, known as the profile vector, which contains information on past viewings, liked and disliked shows, most frequently watched genres, star ratings, and so forth. Then there is another vector that stores all the information regarding the titles (movies and shows) available on the platform, known as the item vector. This vector stores information like the title, actors, genre, language, length, crew info, synopsis, and so forth.

The content-based filtering algorithm uses the concept of cosine similarity. In it, you find the cosine of the angle between two vectors, in this case the profile and item vectors. Suppose A is the profile vector and B is the item vector; then the (cosine) similarity between them is calculated as follows.

$$\text{sim}(A, B) = \cos(\theta) = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}$$

The outcome (i.e., the cosine value) always ranges between –1 and 1, and this value is calculated for multiple item vectors (movies), keeping the profile vector (user) constant. The items/movies are then ranked in descending order of similarity, and either of the two following approaches is used for recommendations.

In a top N approach, the top N movies are recommended, where N is a threshold on the number of titles recommended.
In a rating scale approach, a threshold on the similarity value is set, and all the titles above that threshold are recommended.
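The ranking step and the two selection approaches can be sketched in a few lines. This is an illustrative example only: the three-element feature vectors (mystery, comedy, action), the movie titles, and the 0.9 threshold are all assumptions made up for the sketch, not values from any real platform.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)

# Hypothetical features: [mystery, comedy, action]
profile = [1.0, 0.0, 0.5]            # profile vector (user)
items = {                            # item vectors (titles)
    "Movie A": [1.0, 0.0, 0.0],
    "Movie B": [0.0, 1.0, 0.0],
    "Movie C": [0.8, 0.1, 0.6],
}

# Score every item against the same profile, then rank in descending order
scores = {title: cosine_similarity(profile, vec) for title, vec in items.items()}
ranked = sorted(scores, key=scores.get, reverse=True)

# Top N approach: keep the N best-ranked titles
top_n = ranked[:2]

# Rating scale approach: keep every title whose similarity clears the threshold
above_threshold = [title for title in ranked if scores[title] >= 0.9]

print(ranked, top_n, above_threshold)
```

The top N approach always returns N titles, however weak the matches. The threshold approach may return few or none, which can be preferable when a poor recommendation costs more than no recommendation.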

The following are other methods popularly used in calculating the similarity.

Euclidean distance is the distance between two points measured by the length of the straight line connecting them. Hence, if you can plot the profile and items in an n-dimensional Euclidean space, the distance between them indicates their similarity: the closer an item is to the profile, the more similar it is. So, the items closest to the profile are recommended. The following is the mathematical formula for calculating Euclidean distance.

$$d(x, y) = \sqrt{(x_1 - y_1)^2 + \dots + (x_N - y_N)^2}$$

Pearson’s correlation refers to how correlated or similar two things are. The higher the correlation, the higher the similarity. Pearson’s correlation is calculated using the formula shown in Figure 1-3.
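Both measures are easy to compute by hand for small vectors. The sketch below assumes the standard sample Pearson coefficient (covariance divided by the product of the standard deviations). The rating vectors and function names are hypothetical and chosen only to make the contrast visible.

```python
import math

def euclidean_distance(x, y):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def pearson_correlation(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for constant vectors; 0 is a convention
    return cov / math.sqrt(var_x * var_y)

# Hypothetical ratings given by three users to the same four items
user_a = [5, 3, 4, 1]
user_b = [4, 2, 3, 0]  # same taste as user_a, but rates one point lower
user_c = [1, 3, 2, 5]  # opposite taste to user_a

print(euclidean_distance(user_a, user_b), pearson_correlation(user_a, user_b))
print(euclidean_distance(user_a, user_c), pearson_correlation(user_a, user_c))
```

In this example user_b is close to user_a by distance and perfectly correlated with them, while user_c is both farther away and negatively correlated. Pearson’s correlation ignores a constant offset in rating scale, which Euclidean distance does not.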

The major downside to this recommendation engine is that all suggestions fall into the same category, and it becomes somewhat monotonous. As the suggestions are based on what the user has seen or liked, we’ll never get new recommendations that the user has not explored in the past. For example, if the user has only seen mystery movies, the engine will only recommend more mystery movies.

To improve on this, you need a recommendation engine that not only gives suggestions based on the content but also on the behavior of users and on what other like-minded users are watching.

Collaborative-Based Filtering

In collaborative-based filtering recommendation engines, a user-to-user similarity is also considered, along with item similarities, to address some of the drawbacks of content-based filtering. Simply put, a collaborative filtering system recommends an item to user A based on the interests of a similar user B. Figure 1-4 shows a simple working mechanism of collaborative-based filtering.

The similarity between users can be calculated again by all the techniques mentioned earlier. A user-item matrix is created individually for each customer, which stores the user’s preference for an item. Taking the same example of Netflix’s recommendation engine, the user aspects like previously watched and liked titles, ratings provided (if any) by the user, frequently watched genres, and so on are stored and used to find similar users. Once these similar users are found, the engine recommends titles that the user has not yet watched but users with similar interests have watched and liked.

This type of filtering is quite popular because it is only based on a user’s past behavior, and no additional input is required. It’s used by many major companies, including Amazon, Netflix, and American Express.

There are two types of collaborative filtering algorithms.

In user-user collaborative filtering, you find user-user similarities and offer suggestions based on what similar users chose in the past. This algorithm is quite effective, but computing the similarity for every pair of users takes a lot of time and resources. Hence, for big customer bases, this algorithm is too expensive to use unless a proper parallelizable system is set up.
In item-item collaborative filtering, you try to find item similarities instead of similar users. An item look-alike matrix is generated for all the items that the user has previously chosen, and from this matrix, similar items are recommended. This algorithm is far less computationally expensive because the item-item look-alike matrix remains fixed over time with a fixed number of items. Hence recommendations are fetched much quicker for a new customer.
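The item-item variant can be sketched on a tiny ratings table. This is a simplified illustration under stated assumptions: the users, items, and ratings are invented, unrated items are treated as 0 when building item vectors, and candidates are scored by a similarity-weighted sum of the user's own ratings. Production systems typically refine each of these choices.

```python
import math

# Hypothetical ratings: user -> {item: rating}
ratings = {
    "alice": {"A": 5, "B": 4, "C": 1},
    "bob":   {"A": 4, "B": 5},
    "carol": {"A": 1, "C": 5, "D": 4},
    "dave":  {"B": 4, "C": 2, "D": 1},
}
users = sorted(ratings)
items = sorted({item for user_ratings in ratings.values() for item in user_ratings})

def item_vector(item):
    """Ratings for `item` across all users (0 where a user has not rated it)."""
    return [ratings[user].get(item, 0) for user in users]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Item-item look-alike matrix, computed once for all item pairs
similarity = {
    (i, j): cosine(item_vector(i), item_vector(j))
    for i in items for j in items if i != j
}

def recommend(user, n=2):
    """Rank unseen items by similarity to the items the user already rated."""
    seen = ratings[user]
    scores = {
        candidate: sum(similarity[(candidate, item)] * rating
                       for item, rating in seen.items())
        for candidate in items if candidate not in seen
    }
    return sorted(scores, key=scores.get, reverse=True)[:n]

bob_recs = recommend("bob")
print(bob_recs)
```

Because the look-alike matrix depends only on the items, it can be computed ahead of time and reused, so serving a recommendation reduces to a lookup and a weighted sum.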

One drawback of this method is that an item with no ratings can’t be recommended. Reliable recommendations can also be tough to get if a user has rated only a few items.

Hybrid Systems

So far, you have seen how content-based and collaborative-based recommendation engines work and their respective pros and cons. The hybrid recommendation system combines content-based and collaborative-based filtering methods.

Hybrid recommendation systems can overcome the drawbacks of both content-based and collaborative-based methods to form one powerful recommendation system. Both individual methods fail to perform well when there is a lack of data to learn the relation between users and items, a weakness that the hybrid approach overcomes.

Figure 1-5 shows a simple working mechanism of the hybrid recommendation system.

Hybrid recommendation engines can be implemented in multiple ways.

Generating recommendations separately by using content-based and collaborative-based methods and then combining them at the end
Adding the capabilities of the collaborative-based method to a content-based recommender engine
Adding the capabilities of the content-based method to a collaborative-based recommender engine
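The first option in the list above, combining two separately generated sets of recommendations, can be sketched as a weighted blend of scores. This is only one of many possible combination schemes. The function name hybrid_scores, the weight parameter, and the scores themselves are assumptions for illustration, and both inputs are assumed to be on the same 0 to 1 scale.

```python
def hybrid_scores(content_scores, collaborative_scores, weight=0.5):
    """Blend two score dictionaries; `weight` is the share given to content scores.

    Items missing from one recommender contribute 0 from that side.
    """
    all_items = set(content_scores) | set(collaborative_scores)
    return {
        item: weight * content_scores.get(item, 0.0)
              + (1 - weight) * collaborative_scores.get(item, 0.0)
        for item in all_items
    }

# Hypothetical outputs of the two recommenders for one user
content = {"Movie A": 0.9, "Movie B": 0.2, "Movie C": 0.6}
collaborative = {"Movie B": 0.8, "Movie C": 0.7, "Movie D": 0.5}

combined = hybrid_scores(content, collaborative, weight=0.5)
ranking = sorted(combined, key=combined.get, reverse=True)
print(ranking)
```

Movie C ranks first because both recommenders rate it fairly highly, even though neither puts it on top. Raising the weight toward 1 favors content similarity, and lowering it favors the behavior of similar users.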

Several studies compare the performance of conventional methods to that of a hybrid system, showing that hybrid recommender engines generally perform better and provide more reliable recommendations.

ML Clustering

In today’s world, AI has become an integral part of automation and technology-based solutions, and the area of recommendation systems is no different. Machine learning–based methods show great promise and are quickly becoming the norm as more and more companies adopt AI.

Machine learning methods are of two types: unsupervised and supervised. This section discusses the unsupervised learning method, which is the ML clustering–based method. The unsupervised learning technique uses ML algorithms to find hidden patterns in data to cluster them without human intervention (unlabeled data). Clustering is the grouping of similar objects into clusters. On average, an object belonging to one cluster is more similar to an object within that cluster than to an object belonging to another cluster.

In recommendation engines, clustering is used to form groups of users similar to each other, as shown in Figure 1-6. It can also cluster similar items or products. Traditionally, similarity measures like cosine similarity have been used to find similar users or items, but they have their drawbacks. When a user has not rated many items and the resulting user-item matrix is sparse, or when you need to compare many user-user pairings to find similar users, it gets computationally expensive. A clustering-based approach is generally taken to overcome these issues. If a user is found to be similar to a cluster of users, that user is added to that cluster. Within the cluster, all users share interests and tastes, and recommendations are provided to users based on them.

The following are some of the popularly used clustering algorithms.

k-means clustering
fuzzy mapping
self-organizing maps (SOM)
a hybrid of two or more techniques
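To make the idea concrete, here is a bare-bones k-means sketch that groups users by taste. It is a teaching sketch under stated assumptions, not a production implementation: each user is described by two made-up features (average rating for mystery and for comedy titles), the initial centroids are picked by hand rather than at random, and math.dist requires Python 3.8 or later.

```python
import math

def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda k: math.dist(point, centroids[k]))

def k_means(points, initial_centroids, iterations=10):
    centroids = [tuple(c) for c in initial_centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid
        labels = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[k] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids

# Hypothetical users: (average mystery rating, average comedy rating)
users = {"u1": (5, 1), "u2": (4, 2), "u3": (1, 5), "u4": (2, 4), "u5": (5, 2)}
names = list(users)

labels, centroids = k_means(
    [users[name] for name in names],
    initial_centroids=[users["u1"], users["u3"]],
)

# Group user names by the cluster they were assigned to
clusters = {}
for name, label in zip(names, labels):
    clusters.setdefault(label, []).append(name)
print(clusters)
```

The mystery fans (u1, u2, u5) land in one cluster and the comedy fans (u3, u4) in the other. A new user only needs to be compared against the two centroids rather than against every existing user.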

ML Classification

Again, clustering comes with its disadvantages. That’s where a classification-based recommendation system comes into play.

In the classification-based approach, the algorithm uses features of both items and users to predict whether a user will like a product or not. An application of the classification-based method is the buyer propensity model.

Propensity modeling predicts the chances of customers buying a particular item or performing an equivalent action. For example, propensity modeling can help predict the likelihood that a sales lead will convert to a customer based on various features. The propensity score or probability is then used to decide what action to take.
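A propensity score is simply a probability produced by a classifier, which is then compared against a cutoff to decide on an action. The sketch below uses a logistic function with hand-picked weights to show the mechanics. The feature names, weights, bias, and the 0.5 threshold are all hypothetical; a real model would learn its parameters from labeled purchase data.

```python
import math

# Hypothetical, hand-picked parameters; a trained classifier would learn these
WEIGHTS = {"visits_last_30d": 0.3, "items_in_cart": 0.8, "past_purchases": 0.5}
BIAS = -3.0

def propensity(features):
    """Logistic score in (0, 1): the estimated chance that the user buys."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

def should_recommend(features, threshold=0.5):
    """Act (for example, show the item) only when the score clears the threshold."""
    return propensity(features) >= threshold

engaged = {"visits_last_30d": 6, "items_in_cart": 2, "past_purchases": 3}
casual = {"visits_last_30d": 1, "items_in_cart": 0, "past_purchases": 0}

print(propensity(engaged), should_recommend(engaged))
print(propensity(casual), should_recommend(casual))
```

The threshold is a business decision: a lower value reaches more customers at the cost of more irrelevant recommendations.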

The following are some of the limitations of classification-based algorithms.

Collecting a combination of data about different users and items is sometimes difficult.
Classification is challenging.
Training the models in real time is a problem.

Deep Learning

Deep learning is a branch of machine learning that is often more powerful than classical ML algorithms and tends to produce better results. Of course, there are limitations that must be overcome, such as the need for huge amounts of data and the lack of explainability.

Various companies use deep neural networks (DNNs) to enhance the customer experience, especially when the data is unstructured, such as images and text.

The following are three types of deep learning–based recommender systems.

Restricted Boltzmann machine–based
Autoencoder based
Neural attention–based

Later chapters explore how machine learning and deep learning can be leveraged to build powerful recommender systems.

Now that you have a good understanding of the concepts, let’s start with a simple rule-based recommender system in this chapter before proceeding to the implementation in upcoming chapters.

Rule-Based Recommendation Systems

You build these recommendation systems with simple rules, such as popularity-based or buy again.

Popularity

A popularity-based rule is the simplest form: a product is recommended based on its popularity (most sold, most clicked, etc.). For example, a song listened to by many people is popular, so it is recommended to others without any further intelligence involved.

Let’s take a retail dataset and implement the same logic.

Fire up a Jupyter notebook and import the necessary packages.

#import necessary libraries

import pandas as pd

import numpy as np

#import viz libraries

import seaborn as sns

import matplotlib.pyplot as plt

%matplotlib inline

Let’s import the data.

Note: Refer to the data in this book’s data section. Download the dataset from the GitHub link of this book.

#import data

df = pd.read_csv('data.csv',encoding= 'unicode_escape')

df.head()

Figure 1-7 shows the output of the top 5 rows from the dataset.

# null value counts

df.isnull().sum().sort_values(ascending=False)

[Output: null-value counts per column, covering customer ID, description, country, unit price, invoice date, quantity, stock code, and invoice number.]

# drop where Description is not available

df_new = df.dropna(subset=['Description'])

df_new.describe()

Figure 1-8 shows that the quantity has some negative values, which are incorrect data, so let’s drop them using the following code.

df_new = df_new[df_new.Quantity > 0]

df_new.describe()

Figure 1-8. The output contains negative values

Figure 1-9. The output after removing the negative values

Now that we have cleaned up the data, let’s build some basic types of recommendation systems. These are not intelligent, yet they are effective in some cases. A popularity-based recommendation could be a trending song, a fast-selling item that everyone needs, a recently released movie that is gaining traction, or a news article many users have read.

Sometimes it’s important to keep it simple because that gets you the most revenue. Let’s build a popularity-based system on the data we are using.

Global Popular Items

Let’s calculate popular items worldwide and then dice them into different regions.

# popular items globally

global_popularity=df_new.pivot_table(index=['StockCode','Description'], values='Quantity', aggfunc='sum').sort_values(by='Quantity', ascending=False)

print('Top 10 popular items globally....')

global_popularity.head(10)

Figure 1-10 shows that PAPER CRAFT is the most bought item across all regions. It’s a very popular item.

Let’s visualize it.

# visualize top 10 most popular items

global_popularity.reset_index(inplace=True)

sns.barplot(y='Description', x='Quantity', data=global_popularity.head(10))

plt.title('Top 10 Most Popular Items Globally', fontsize=14)

plt.ylabel('Item')

Figure 1-11 shows the output of top 10 popular items.

Popular Items by Country

Let’s calculate popular items by country.

# popular items by country

countrywise=df_new.pivot_table(index=['Country','StockCode','Description'], values='Quantity', aggfunc='sum').reset_index()

# visualize top 10 most popular items in UK

sns.barplot(y='Description', x='Quantity', data=countrywise[countrywise['Country']=='United Kingdom'].sort_values(by='Quantity', ascending=False).head(10))

plt.title('Top 10 Most Popular Items in UK', fontsize=14)

plt.ylabel('Item')

Figure 1-12 shows that PAPER CRAFT, LITTLE BIRDIE is the most purchased item. It’s very popular only in the United Kingdom.

# visualize top 10 most popular items in Netherlands

sns.barplot(y='Description', x='Quantity', data=countrywise[countrywise['Country']=='Netherlands'].sort_values(by='Quantity', ascending=False).head(10))

plt.title('Top 10 Most Popular Items in Netherlands', fontsize=14)

plt.ylabel('Item')

Figure 1-13 shows that RABBIT NIGHT LIGHT is the most purchased item. It’s very popular in the Netherlands.

Buy Again

Now let’s discuss buy again. It’s another simple recommendation, calculated at the customer/user level. You might have seen “Watch again” on streaming platforms. It’s the same concept. When you know a customer performs a certain set of actions repeatedly, you recommend the same action next time.

This is very useful in online grocery platforms because customers come back and buy the same item again and again.

Let’s implement it.

# Let's create a function to get buy again output
from collections import Counter

def buy_again(customerid):
    # Fetch the items bought by the customer with the given customer ID
    items_bought = df_new[df_new['CustomerID'] == customerid].Description
    # Count how many times each item was purchased
    bought_again = Counter(items_bought)
    # List the items sorted by purchase count, most frequent first
    buy_again_list = [item for item, _ in bought_again.most_common()]
    # Print a header and return the recommendations
    print('Items you would like to buy again :')
    return buy_again_list

Let’s use the function on customer 17850.

buy_again(17850)

Figure 1-14 shows that the holder and the lantern are recommended to customer 17850, given that this customer often buys these items.

Summary

In this chapter, you learned about recommender systems: how they work, their applications, and the various implementation types. You also learned about implicit and explicit feedback and the differences between them. The chapter also explored market basket analysis (association rule mining), content-based and collaborative-based filtering, hybrid systems, ML clustering-based and classification-based methods, and deep learning and NLP-based recommender systems. Finally, you implemented simple rule-based recommender systems. Many other complex algorithms are explored in upcoming chapters.
