© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2023
A. Kulkarni et al., Applied Recommender Systems with Python, https://doi.org/10.1007/978-1-4842-8954-9_6

6. Hybrid Recommender Systems

Akshay Kulkarni1  , Adarsha Shivananda2, Anoosh Kulkarni3 and V Adithya Krishnan4
(1)
Bangalore, Karnataka, India
(2)
Hosanagara tq, Shimoga dt, Karnataka, India
(3)
Bangalore, India
(4)
Navi Mumbai, India
 

The previous chapters implemented recommendation engines using content-based and collaborative filtering methods. Each method has its pros and cons. Collaborative filtering suffers from the cold-start problem: when a new customer or item appears in the data, no recommendation can be made for it.

Content-based filtering tends to recommend items similar to those already purchased or liked, so the recommendations become repetitive and offer little personalization.

Figure 6-1 explains hybrid recommendation systems.

An architecture of a hybrid recommendation system: researcher data flows into content-based filtering and collaborative filtering, which are combined in the hybrid recommendation.

Figure 6-1

Hybrid recommendation systems

Reference: https://www.researchgate.net/profile/Xiangjie-Kong-2/publication/330077673/figure/fig5/AS:710433577107459@1546391972632/A-hybrid-paper-recommendation-system.png

To tackle some of these cons, this chapter introduces hybrid recommendation systems. A hybrid recommendation system combines content-based and collaborative filtering methods. This not only helps overcome the shortcomings of the individual models but also increases efficiency and gives better recommendations in most cases.
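As a conceptual illustration, the simplest form of hybridization is a weighted blend of the scores produced by the two approaches. The following toy sketch uses hypothetical score vectors and a hypothetical weight alpha; the rest of this chapter instead uses LightFM, which learns a single hybrid model directly.
import numpy as np
# Toy sketch: blend content-based and collaborative scores for one user
content_scores = np.array([0.2, 0.8, 0.5, 0.1])        # hypothetical content-based scores per item
collaborative_scores = np.array([0.6, 0.3, 0.9, 0.4])  # hypothetical collaborative-filtering scores per item
alpha = 0.5                                             # hypothetical blending weight
hybrid_scores = alpha * content_scores + (1 - alpha) * collaborative_scores
print(np.argsort(-hybrid_scores))  # item indices ranked by the hybrid score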

This chapter implements a hybrid recommendation engine that recommends products for an e-commerce company. The LightFM Python package is used for this implementation.

For more information, refer to the LightFM documentation at https://making.lyst.com/lightfm/docs/home.html.

Implementation

Let’s import all the required libraries.
import pandas as pd
import numpy as np
from scipy.sparse import coo_matrix # for constructing sparse matrices
from lightfm import LightFM # for the hybrid recommendation model
from lightfm.evaluation import auc_score # for model evaluation
import time # for timing the training runs
import sklearn
from sklearn import model_selection # for the train/test split

Data Collection

This chapter uses the same custom e-commerce dataset used in previous chapters. It can be found at github.com/apress/applied-recommender-systems-python.

The following reads the data.
#orders data
order_df = pd.read_excel('Rec_sys_data.xlsx','order')
#customers data
customer_df = pd.read_excel('Rec_sys_data.xlsx','customer')
#products data
product_df = pd.read_excel('Rec_sys_data.xlsx','product')
order_df.head()
Figure 6-2 shows the orders DataFrame.

An output file depicts the list of the first five orders data frame. It includes invoice number, stock code, quantity, invoice date, delivery date, discount percentage, ship mode, shipping cost, and customer I D.

Figure 6-2

Orders data

customer_df.head()
Figure 6-3 shows the customers DataFrame.

An output file depicts the list of the first five rows of the customers data frame. It includes customer I D, gender, age, income, zip code, and customer segment.

Figure 6-3

Customers data

product_df.head()
Figure 6-4 shows the products DataFrame.

An output file depicts the list of the first five rows of the products data frame. It includes stock code, product name, description, category, brand, and unit price.

Figure 6-4

Products data

Merge the data.
#merging all three data frames
merged_df = pd.merge(order_df,customer_df,left_on=['CustomerID'], right_on=['CustomerID'], how='left')
merged_df = pd.merge(merged_df,product_df,left_on=['StockCode'], right_on=['StockCode'], how='left')
merged_df.head()
Figure 6-5 shows the merged DataFrame that will be used.

An output file depicts the merged data frame. It includes invoice number, stock code, quantity, invoice date, delivery date, discount percentage, and ship mode followed by details of a single customer.

Figure 6-5

Merged data

Data Preparation

Before building the recommendation model, the required data must be in the proper format so that the model can take input. Let’s get the user-to-product interaction matrix and product-to-features interaction mappings.
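For intuition, the user-to-product interaction matrix is a sparse matrix with one row per user, one column per product, and the purchase quantity as the value. The following toy sketch (with made-up indices and quantities) shows the format that is built later in this section with scipy's coo_matrix.
from scipy.sparse import coo_matrix
# Toy sketch: 3 users x 4 products, values are purchase quantities (hypothetical)
rows = [0, 0, 1, 2]   # user indices
cols = [1, 3, 0, 2]   # product indices
vals = [5, 1, 2, 7]   # quantities
toy_interactions = coo_matrix((vals, (rows, cols)), shape=(3, 4))
print(toy_interactions.toarray())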

Start with getting the list of unique users and unique products. Write two functions to get the unique lists.
def unique_users(data, column):
    return np.sort(data[column].unique())
def unique_items(data, column):
    item_list = data[column].unique()
    return item_list
Create unique lists.
user_list = unique_users(order_df, "CustomerID")
item_list = unique_items(product_df, "Product Name")
user_list
The following is the output.
array([12346, 12347, 12348, ..., 18282, 18283, 18287], dtype=int64)
item_list
The following is the output.
array(['Ganma Superheroes Ordinary Life Case For Samsung Galaxy Note 5 Hard Case Cover',
       'Eye Buy Express Prescription Glasses Mens Womens Burgundy Crystal Clear Yellow Rounded Rectangular Reading Glasses Anti Glare grade',
        ...,
       'Mediven Sheer and Soft 15-20 mmHg Thigh w/ Lace Silicone Top Band CT Wheat II - Ankle 8-8.75 inches',
       Union 3" Female Ports Stainless Steel Pipe Fitting',
       'Auburn Leathercrafters Tuscany Leather Dog Collar’,
       '3 1/2"W x 32"D x 36"H Traditional Arts & Crafts Smooth Bracket, Douglas Fir'])
Let’s create a function to get the total list of unique values given three feature names from a DataFrame. It gets the total unique list for three features: Customer Segment, Age, and Gender.
def features_to_add(customer, column1,column2,column3):
    customer1 = customer[column1]
    customer2 = customer[column2]
    customer3 = customer[column3]
    return pd.concat([customer1,customer3,customer2], ignore_index = True).unique()
Call the function for these features.
feature_unique_list = features_to_add(customer_df,'Customer Segment',"Age","Gender")
feature_unique_list
The following is the output.
array(['Small Business', 'Corporate', 'Middle class', 'male', 'female',
       53, 22, 29, 36, 48, 45, 47, 23, 39, 34, 52, 51, 35, 19, 26, 37, 18,
       20, 21, 41, 31, 28, 50, 38, 30, 25, 32, 55, 43, 54, 49, 40, 33, 44,
       46, 42, 27, 24], dtype=object)

Now that we have the unique list for users, products, and features, we need to create ID mappings to convert user_id, item_id, and feature_id into integer indices because LightFM can’t read any other data types.

Let’s write a function for that.
def mapping(user_list, item_list, feature_unique_list):
    #creating empty output dicts
    user_to_index_mapping = {}
    index_to_user_mapping = {}
    # Create id mappings to convert user_id
    for user_index, user_id in enumerate(user_list):
        user_to_index_mapping[user_id] = user_index
        index_to_user_mapping[user_index] = user_id
    item_to_index_mapping = {}
    index_to_item_mapping = {}
    # Create id mappings to convert item_id
    for item_index, item_id in enumerate(item_list):
        item_to_index_mapping[item_id] = item_index
        index_to_item_mapping[item_index] = item_id
    feature_to_index_mapping = {}
    index_to_feature_mapping = {}
    # Create id mappings to convert feature_id
    for feature_index, feature_id in enumerate(feature_unique_list):
        feature_to_index_mapping[feature_id] = feature_index
        index_to_feature_mapping[feature_index] = feature_id
    return (user_to_index_mapping, index_to_user_mapping,
            item_to_index_mapping, index_to_item_mapping,
            feature_to_index_mapping, index_to_feature_mapping)
Call the function by giving user_list, item_list, and feature_unique_list as input.
(user_to_index_mapping, index_to_user_mapping,
 item_to_index_mapping, index_to_item_mapping,
 feature_to_index_mapping, index_to_feature_mapping) = mapping(user_list, item_list, feature_unique_list)
user_to_index_mapping
The following is the output.
{12346: 0,
 12347: 1,
 12348: 2,
 12350: 3,
 12352: 4,
...}
Now let’s fetch the user-to-product relationship and calculate the total quantity per user.
user_to_product = merged_df[['CustomerID','Product Name','Quantity']]
#Calculating the total quantity(sum) per customer-product
user_to_product = user_to_product.groupby(['CustomerID','Product Name']).agg({'Quantity':'sum'}).reset_index()
user_to_product.tail()
Figure 6-6 shows the user-to-product relationship data.

An output file depicts the user-to-product relationship data. It includes customer I D, product name, and quantity of a single customer.

Figure 6-6

User-to-product relationship data

Similarly, let’s get the product-to-features relationship data.
product_to_feature = merged_df[['Product Name','Customer Segment','Quantity']]
#Calculating the total quantity(sum) per customer_segment-product
product_to_feature = product_to_feature.groupby(['Product Name','Customer Segment']).agg({'Quantity':'sum'}).reset_index()
product_to_feature.head()
Figure 6-7 shows the product-to-features relationship data.

An output file depicts the relationship data of product-to-features. It includes the product name, customer segment, and quantity of a single customer.

Figure 6-7

Product-to-features relationship data

Let’s split the user-to-product relationship into train and test data.
user_to_product_train,user_to_product_test =  model_selection.train_test_split(user_to_product,test_size=0.33, random_state=42)
print("Training set size:")
print(user_to_product_train.shape)
print("Test set size:")
print(user_to_product_test.shape)
The following is the output.
Training set size:
(92729, 3)
Test set size:
(45673, 3)
Now that the data and the ID mappings are in place, to get the user-to-product and product-to-features interaction matrix, let’s first create a function that returns the interaction matrix.
def interactions(data, row, col, value, row_map, col_map):
    #converting the row with its given mappings
    row = data[row].apply(lambda x: row_map[x]).values
    #converting the col with its given mappings
    col = data[col].apply(lambda x: col_map[x]).values
    value = data[value].values
    #returning the interaction matrix
    return coo_matrix((value, (row, col)), shape = (len(row_map), len(col_map)))
Then let’s generate user_item_interaction_matrix for train and test data using the preceding function.
#for train
user_to_product_interaction_train = interactions(user_to_product_train, "CustomerID",
"Product Name", "Quantity", user_to_index_mapping, item_to_index_mapping)
#for test
user_to_product_interaction_test = interactions(user_to_product_test, "CustomerID",
"Product Name", "Quantity", user_to_index_mapping, item_to_index_mapping)
print(user_to_product_interaction_train)
The following is the output.
  (2124, 230)  10
  (1060, 268)  16
  :            :
  (64, 8)      24
  (3406, 109)   1
  (3219, 12)   12
Similarly, let’s generate the product-to-features interaction matrix.
product_to_feature_interaction = interactions(product_to_feature, "Product Name", "Customer Segment","Quantity",item_to_index_mapping, feature_to_index_mapping)

Model Building

The data is in the correct format, so let’s begin the modeling process. This chapter uses the LightFM model, which can incorporate user and item metadata to form robust hybrid recommendation models.

Let’s try multiple models and then choose the one with the best performance. These models have different hyperparameters, so this is part of the hyperparameter tuning stage of modeling.

The loss function is one of the hyperparameters to tune; the three values tried here are warp, logistic, and bpr.
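Before walking through the attempts one by one, the comparison can also be framed as a small tuning loop. The following is a minimal sketch (not part of the book's original code), assuming the train/test interaction matrices and the product-to-feature matrix built earlier in this section.
# Sketch: compare the three loss functions with a single epoch each
results = {}
for loss in ["warp", "logistic", "bpr"]:
    model = LightFM(loss=loss)
    model.fit(user_to_product_interaction_train,
              item_features=product_to_feature_interaction,
              epochs=1,
              num_threads=4)
    auc = auc_score(model,
                    test_interactions=user_to_product_interaction_test,
                    train_interactions=user_to_product_interaction_train,
                    item_features=product_to_feature_interaction,
                    num_threads=4,
                    check_intersections=False).mean()
    results[loss] = auc
    print("loss = {0}: average AUC = {1:.2f}".format(loss, auc))
print("best loss function:", max(results, key=results.get))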

Let’s start the model-building experiment.

Attempt 1 is loss = warp, epochs = 1, and num_threads = 4.
# initialising model with warp loss function
model_with_features = LightFM(loss = "warp")
start = time.time()
#===================
# fitting the model with hybrid collaborative filtering + content based (product + features)
model_with_features.fit_partial(user_to_product_interaction_train,
          user_features=None,
          item_features=product_to_feature_interaction,
          sample_weight=None,
          epochs=1,
          num_threads=4,
          verbose=False)
#===================
end = time.time()
print("time taken = {0:.{1}f} seconds".format(end - start, 2))
The following is the output.
time taken = 0.11 seconds
Calculate the area under the curve (AUC) score for validation.
start = time.time()
#===================
# Getting the AUC score using in-built function
auc_with_features = auc_score(model = model_with_features,
                        test_interactions = user_to_product_interaction_test,
                        train_interactions = user_to_product_interaction_train,
                        item_features = product_to_feature_interaction,
                        num_threads = 4, check_intersections=False)
#===================
end = time.time()
print("time taken = {0:.{1}f} seconds".format(end - start, 2))
print("average AUC without adding item-feature interaction = {0:.{1}f}".format(auc_with_features.mean(), 2))
The following is the output.
time taken = 0.24 seconds
average AUC with item-feature interaction = 0.17
Attempt 2 is loss = logistic, epochs = 1, and num_threads = 4.
# initialising model with logistic loss function
model_with_features = LightFM(loss = "logistic")
start = time.time()
#===================
# fitting the model with hybrid collaborative filtering + content based (product + features)
model_with_features.fit_partial(user_to_product_interaction_train,
          user_features=None,
          item_features=product_to_feature_interaction,
          sample_weight=None,
          epochs=1,
          num_threads=4,
          verbose=False)
#===================
end = time.time()
print("time taken = {0:.{1}f} seconds".format(end - start, 2))
The following is the output.
time taken = 0.11 seconds
Calculate the AUC score for the preceding model.
start = time.time()
#===================
# Getting the AUC score using in-built function
auc_with_features = auc_score(model = model_with_features,
                        test_interactions = user_to_product_interaction_test,
                        train_interactions = user_to_product_interaction_train,
                        item_features = product_to_feature_interaction,
                        num_threads = 4, check_intersections=False)
#===================
end = time.time()
print("time taken = {0:.{1}f} seconds".format(end - start, 2))
print("average AUC without adding item-feature interaction = {0:.{1}f}".format(auc_with_features.mean(), 2))
The following is the output.
time taken = 0.22 seconds
average AUC with item-feature interaction = 0.89
Attempt 3 is loss = bpr, epochs = 1, and num_threads = 4.
# initialising model with bpr loss function
model_with_features = LightFM(loss = "bpr")
start = time.time()
#===================
# fitting the model with hybrid collaborative filtering + content based (product + features)
model_with_features.fit_partial(user_to_product_interaction_train,
          user_features=None,
          item_features=product_to_feature_interaction,
          sample_weight=None,
          epochs=1,
          num_threads=4,
          verbose=False)
#===================
end = time.time()
print("time taken = {0:.{1}f} seconds".format(end - start, 2))
The following is the output.
time taken = 0.12 seconds

Calculate the AUC score for the preceding model.
start = time.time()
#===================
# Getting the AUC score using in-built function
auc_with_features = auc_score(model = model_with_features,
                        test_interactions = user_to_product_interaction_test,
                        train_interactions = user_to_product_interaction_train,
                        item_features = product_to_feature_interaction,
                        num_threads = 4, check_intersections=False)
#===================
end = time.time()
print("time taken = {0:.{1}f} seconds".format(end - start, 2))
print("average AUC with item-feature interaction = {0:.{1}f}".format(auc_with_features.mean(), 2))
The following is the output.
time taken = 0.22 seconds
average AUC with item-feature interaction = 0.38
Attempt 4 is loss = logistic, epochs = 10, and num_threads = 20.
# initialising model with logistic loss function
model_with_features = LightFM(loss = "logistic")
start = time.time()
#===================
# fitting the model with hybrid collaborative filtering + content based (product + features)
model_with_features.fit_partial(user_to_product_interaction_train,
          user_features=None,
          item_features=product_to_feature_interaction,
          sample_weight=None,
          epochs=10,
          num_threads=20,
          verbose=False)
#===================
end = time.time()
print("time taken = {0:.{1}f} seconds".format(end - start, 2))
The following is the output.
time taken = 0.77 seconds
Calculate the AUC score for the preceding model.
start = time.time()
#===================
# Getting the AUC score using in-built function
auc_with_features = auc_score(model = model_with_features,
                        test_interactions = user_to_product_interaction_test,
                        train_interactions = user_to_product_interaction_train,
                        item_features = product_to_feature_interaction,
                        num_threads = 4, check_intersections=False)
#===================
end = time.time()
print("time taken = {0:.{1}f} seconds".format(end - start, 2))
print("average AUC with item-feature interaction = {0:.{1}f}".format(auc_with_features.mean(), 2))
The following is the output.
time taken = 0.25 seconds
average AUC with item-feature interaction = 0.89

The logistic loss performed the best overall, with the highest AUC score of 0.89. Let's merge the train and test data and do a final training using the parameters of the logistic model.

Merge the train and test with the following function.
def train_test_merge(training_data, testing_data):
    # initialising train dict
    train_dict = {}
    for row, col, data in zip(training_data.row, training_data.col, training_data.data):
        train_dict[(row, col)] = data
    # replacing with the test set
    for row, col, data in zip(testing_data.row, testing_data.col, testing_data.data):
        train_dict[(row, col)] = max(data, train_dict.get((row, col), 0))
    # converting to the row
    row_list = []
    col_list = []
    data_list = []
    for row, col in train_dict:
        row_list.append(row)
        col_list.append(col)
        data_list.append(train_dict[(row, col)])
    # converting to np array
    row_list = np.array(row_list)
    col_list = np.array(col_list)
    data_list = np.array(data_list)
    #returning the matrix output
    return coo_matrix((data_list, (row_list, col_list)), shape = (training_data.shape[0], training_data.shape[1]))
Call the preceding function to get the final (full) data to build the final model.
user_to_product_interaction = train_test_merge(user_to_product_interaction_train, user_to_product_interaction_test)

Final Model after Combining the Train and Test Data

Let’s build the LightFM model with loss = logistic, epochs = 10, and num_threads = 20.
# retraining the final model with the combined dataset
final_model = LightFM(loss = "logistic", no_components=30)
# fitting to combined dataset
start = time.time()
#===================
#final model fitting
final_model.fit(user_to_product_interaction,
          user_features=None,
          item_features=product_to_feature_interaction,
          sample_weight=None,
          epochs=10,
          num_threads=20,
          verbose=False)
#===================
end = time.time()
print("time taken = {0:.{1}f} seconds".format(end - start, 2))
The following is the output.
time taken = 3.46 seconds

Getting Recommendations

Now that the hybrid recommendation model is ready, let’s use it to get the recommendations for a given user.

Let’s write a function for getting those recommendations given a user id as input.
def get_recommendations(model,user,items,user_to_product_interaction_matrix,user2index_map,product_to_feature_interaction_matrix):
    # getting the user index
    userindex = user2index_map.get(user, None)
    if userindex is None:
        return None
    users = userindex
    # getting products already bought
    known_positives = items[user_to_product_interaction_matrix.tocsr()[userindex].indices]
    print('User index =',users)
    # scores from model prediction
    scores = model.predict(user_ids = users, item_ids = np.arange(user_to_product_interaction_matrix.shape[1]),item_features=product_to_feature_interaction_matrix)
    #getting top items
    top_items = items[np.argsort(-scores)]
    # printing out the result
    print("User %s" % user)
    print("     Known positives:")
    for x in known_positives[:10]:
        print("                  %s" % x)
    print("     Recommended:")
    for x in top_items[:10]:
        print("                  %s" % x)

This function calculates a user's prediction score (the likelihood to buy) for every item and recommends the ten highest-scored items. The known positives (items already bought by that user) are also printed for validation.

Call the function for a sample user (CustomerID 17017) to get recommendations.
get_recommendations(final_model,17017,item_list,user_to_product_interaction,user_to_index_mapping,product_to_feature_interaction)
The following is the output.
User index = 2888
User 17017
Known positives:
Ganma Superheroes Ordinary Life Case For Samsung Galaxy Note 5 Hard Case Cover
MightySkins Skin Decal Wrap Compatible with Nintendo Sticker Protective Cover 100's of Color Options
Mediven Sheer and Soft 15-20 mmHg Thigh w/ Lace Silicone Top Band CT Wheat II - Ankle 8-8.75 inches
MightySkins Skin Decal Wrap Compatible with OtterBox Sticker Protective Cover 100's of Color Options
MightySkins Skin Decal Wrap Compatible with DJI Sticker Protective Cover 100's of Color Options
MightySkins Skin Decal Wrap Compatible with Lenovo Sticker Protective Cover 100's of Color Options
Ebe Reading Glasses Mens Womens Tortoise Bold Rectangular Full Frame Anti Glare grade ckbdp9088
Window Tint Film Chevy (back doors) DIY
Union 3" Female Ports Stainless Steel Pipe Fitting
Ebe Women Reading Glasses Reader Cheaters Anti Reflective Lenses TR90 ry2209
Recommended:
Mediven Sheer and Soft 15-20 mmHg Thigh w/ Lace Silicone Top Band CT Wheat II - Ankle 8-8.75 inches
MightySkins Skin Decal Wrap Compatible with Apple Sticker Protective Cover 100's of Color Options
MightySkins Skin Decal Wrap Compatible with DJI Sticker Protective Cover 100's of Color Options
3 1/2"W x 20"D x 20"H Funston Craftsman Smooth Bracket, Douglas Fir
MightySkins Skin Decal Wrap Compatible with HP Sticker Protective Cover 100's of Color Options
Owlpack Clear Poly Bags with Open End, 1.5 Mil, Perfect for Products, Merchandise, Goody Bags, Party Favors (4x4 inches)
Ebe Women Reading Glasses Reader Cheaters Anti Reflective Lenses TR90 ry2209
Handcrafted Ercolano Music Box Featuring "Luncheon of the Boating Party" by Renoir, Pierre Auguste - New YorkNew York
A6 Invitation Envelopes w/Peel & Press (4 3/4 x 6 1/2) - Baby Blue (1000 Qty.)
MightySkins Skin Decal Wrap Compatible with Lenovo Sticker Protective Cover 100's of Color Options

Many of the recommendations align with the known positives, which provides further validation. This hybrid recommendation engine can now generate recommendations for all other users.
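To generate recommendations at scale, the same function can simply be looped over the user list (or any subset of it). A minimal sketch, assuming the objects built earlier in this chapter:
# Sketch: recommendations for the first five customers in user_list
for customer_id in user_list[:5]:
    get_recommendations(final_model, customer_id, item_list,
                        user_to_product_interaction,
                        user_to_index_mapping,
                        product_to_feature_interaction)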

Summary

This chapter discussed hybrid recommendation engines and how they can overcome the shortcomings of other types of engines. It also showcased an implementation using LightFM.
