© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2023
A. Kulkarni et al., Applied Recommender Systems with Python, https://doi.org/10.1007/978-1-4842-8954-9_6

6. Hybrid Recommender Systems

Akshay Kulkarni1  , Adarsha Shivananda2, Anoosh Kulkarni3 and V Adithya Krishnan4
(1)
Bangalore, Karnataka, India
(2)
Hosanagara tq, Shimoga dt, Karnataka, India
(3)
Bangalore, India
(4)
Navi Mumbai, India
 

The previous chapters implemented recommendation engines using content-based and collaborative filtering methods. Each method has its pros and cons. Collaborative filtering suffers from the cold-start problem: when a new customer or item appears in the data, no recommendation can be made for it.

Content-based filtering tends to recommend items similar to those already purchased or liked, so the recommendations become repetitive and offer little personalization.

Figure 6-1 explains hybrid recommendation systems.

An architecture of a hybrid recommendation system: researcher data flows into content-based filtering and collaborative filtering, which are combined in the hybrid recommendation.

Figure 6-1

Hybrid recommendation systems

Reference: https://www.researchgate.net/profile/Xiangjie-Kong-2/publication/330077673/figure/fig5/AS:710433577107459@1546391972632/A-hybrid-paper-recommendation-system.png

To tackle some of these cons, this chapter introduces hybrid recommendation systems. A hybrid recommendation system combines content-based and collaborative filtering methods. This not only helps overcome the shortcomings of the individual models but also increases efficiency and gives better recommendations in most cases.
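As a conceptual illustration, the simplest form of hybridization is a weighted blend of the scores produced by the two approaches. The following toy sketch uses hypothetical score vectors and a hypothetical weight alpha; the rest of this chapter instead uses LightFM, which learns a single hybrid model directly.
import numpy as np
# Toy sketch: blend content-based and collaborative scores for one user
content_scores = np.array([0.2, 0.8, 0.5, 0.1])        # hypothetical content-based scores per item
collaborative_scores = np.array([0.6, 0.3, 0.9, 0.4])  # hypothetical collaborative-filtering scores per item
alpha = 0.5                                             # hypothetical blending weight
hybrid_scores = alpha * content_scores + (1 - alpha) * collaborative_scores
print(np.argsort(-hybrid_scores))  # item indices ranked by the hybrid score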

This chapter implements a hybrid recommendation engine that recommends products for an e-commerce company. The LightFM Python package is used for this implementation.

For more information, refer to the LightFM documentation at https://making.lyst.com/lightfm/docs/home.html.

Implementation

Let’s import all the required libraries.
import pandas as pd
import numpy as np
from scipy.sparse import coo_matrix # for constructing sparse matrices
from lightfm import LightFM # for the hybrid recommendation model
from lightfm.evaluation import auc_score # for model evaluation
import time # for timing the training runs
import sklearn
from sklearn import model_selection # for the train/test split

Data Collection

This chapter uses the same custom e-commerce dataset used in previous chapters. It can be found at github.com/apress/applied-recommender-systems-python.

The following reads the data.
#orders data
order_df = pd.read_excel('Rec_sys_data.xlsx','order')
#customers data
customer_df = pd.read_excel('Rec_sys_data.xlsx','customer')
#products data
product_df = pd.read_excel('Rec_sys_data.xlsx','product')
order_df.head()
Figure 6-2 shows the orders DataFrame.

An output file depicts the list of the first five orders data frame. It includes invoice number, stock code, quantity, invoice date, delivery date, discount percentage, ship mode, shipping cost, and customer I D.

Figure 6-2

Orders data

customer_df.head()
Figure 6-3 shows the customers DataFrame.

An output file depicts the list of the first five rows of the customers data frame. It includes customer I D, gender, age, income, zip code, and customer segment.

Figure 6-3

Customers data

product_df.head()
Figure 6-4 shows the products DataFrame.

An output file depicts the list of the first five rows of the products data frame. It includes stock code, product name, description, category, brand, and unit price.

Figure 6-4

Products data

Merge the data.
#merging all three data frames
merged_df = pd.merge(order_df,customer_df,left_on=['CustomerID'], right_on=['CustomerID'], how='left')
merged_df = pd.merge(merged_df,product_df,left_on=['StockCode'], right_on=['StockCode'], how='left')
merged_df.head()
Figure 6-5 shows the merged DataFrame that will be used.

An output file depicts the merged data frame. It includes invoice number, stock code, quantity, invoice date, delivery date, discount percentage, and ship mode followed by details of a single customer.

Figure 6-5

Merged data

Data Preparation

Before building the recommendation model, the required data must be in the proper format so that the model can take input. Let’s get the user-to-product interaction matrix and product-to-features interaction mappings.
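For intuition, the user-to-product interaction matrix is a sparse matrix with one row per user, one column per product, and the purchase quantity as the value. The following toy sketch (with made-up indices and quantities) shows the format that is built later in this section with scipy's coo_matrix.
from scipy.sparse import coo_matrix
# Toy sketch: 3 users x 4 products, values are purchase quantities (hypothetical)
rows = [0, 0, 1, 2]   # user indices
cols = [1, 3, 0, 2]   # product indices
vals = [5, 1, 2, 7]   # quantities
toy_interactions = coo_matrix((vals, (rows, cols)), shape=(3, 4))
print(toy_interactions.toarray())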

Start with getting the list of unique users and unique products. Write two functions to get the unique lists.
def unique_users(data, column):
    return np.sort(data[column].unique())
def unique_items(data, column):
    item_list = data[column].unique()
    return item_list
Create unique lists.
user_list = unique_users(order_df, "CustomerID")
item_list = unique_items(product_df, "Product Name")
user_list
The following is the output.
array([12346, 12347, 12348, ..., 18282, 18283, 18287], dtype=int64)
item_list
The following is the output.
array(['Ganma Superheroes Ordinary Life Case For Samsung Galaxy Note 5 Hard Case Cover',
       'Eye Buy Express Prescription Glasses Mens Womens Burgundy Crystal Clear Yellow Rounded Rectangular Reading Glasses Anti Glare grade',
        ...,
       'Mediven Sheer and Soft 15-20 mmHg Thigh w/ Lace Silicone Top Band CT Wheat II - Ankle 8-8.75 inches',
       Union 3" Female Ports Stainless Steel Pipe Fitting',
       'Auburn Leathercrafters Tuscany Leather Dog Collar’,
       '3 1/2"W x 32"D x 36"H Traditional Arts & Crafts Smooth Bracket, Douglas Fir'])
Let’s create a function to get the total list of unique values given three feature names from a DataFrame. It gets the total unique list for three features: Customer Segment, Age, and Gender.
def features_to_add(customer, column1,column2,column3):
    customer1 = customer[column1]
    customer2 = customer[column2]
    customer3 = customer[column3]
    return pd.concat([customer1,customer3,customer2], ignore_index = True).unique()
Call the function for these features.
feature_unique_list = features_to_add(customer_df,'Customer Segment',"Age","Gender")
feature_unique_list
The following is the output.
array(['Small Business', 'Corporate', 'Middle class', 'male', 'female',
       53, 22, 29, 36, 48, 45, 47, 23, 39, 34, 52, 51, 35, 19, 26, 37, 18,
       20, 21, 41, 31, 28, 50, 38, 30, 25, 32, 55, 43, 54, 49, 40, 33, 44,
       46, 42, 27, 24], dtype=object)

Now that we have the unique list for users, products, and features, we need to create ID mappings to convert user_id, item_id, and feature_id into integer indices because LightFM can’t read any other data types.

Let’s write a function for that.
def mapping(user_list, item_list, feature_unique_list):
    #creating empty output dicts
    user_to_index_mapping = {}
    index_to_user_mapping = {}
    # Create id mappings to convert user_id
    for user_index, user_id in enumerate(user_list):
        user_to_index_mapping[user_id] = user_index
        index_to_user_mapping[user_index] = user_id
    item_to_index_mapping = {}
    index_to_item_mapping = {}
    # Create id mappings to convert item_id
    for item_index, item_id in enumerate(item_list):
        item_to_index_mapping[item_id] = item_index
        index_to_item_mapping[item_index] = item_id
    feature_to_index_mapping = {}
    index_to_feature_mapping = {}
    # Create id mappings to convert feature_id
    for feature_index, feature_id in enumerate(feature_unique_list):
        feature_to_index_mapping[feature_id] = feature_index
        index_to_feature_mapping[feature_index] = feature_id
    return (user_to_index_mapping, index_to_user_mapping,
            item_to_index_mapping, index_to_item_mapping,
            feature_to_index_mapping, index_to_feature_mapping)
Call the function by giving user_list, item_list, and feature_unique_list as input.
(user_to_index_mapping, index_to_user_mapping,
 item_to_index_mapping, index_to_item_mapping,
 feature_to_index_mapping, index_to_feature_mapping) = mapping(user_list, item_list, feature_unique_list)
user_to_index_mapping
The following is the output.
{12346: 0,
 12347: 1,
 12348: 2,
 12350: 3,
 12352: 4,
...}
Now let’s fetch the user-to-product relationship and calculate the total quantity per user.
user_to_product = merged_df[['CustomerID','Product Name','Quantity']]
#Calculating the total quantity(sum) per customer-product
user_to_product = user_to_product.groupby(['CustomerID','Product Name']).agg({'Quantity':'sum'}).reset_index()
user_to_product.tail()
Figure 6-6 shows the user-to-product relationship data.

An output file depicts the user-to-product relationship data. It includes customer I D, product name, and quantity of a single customer.

Figure 6-6

User-to-product relationship data

Similarly, let’s get the product-to-features relationship data.
product_to_feature = merged_df[['Product Name','Customer Segment','Quantity']]
#Calculating the total quantity(sum) per customer_segment-product
product_to_feature = product_to_feature.groupby(['Product Name','Customer Segment']).agg({'Quantity':'sum'}).reset_index()
product_to_feature.head()
Figure 6-7 shows the product-to-features relationship data.

An output file depicts the relationship data of product-to-features. It includes the product name, customer segment, and quantity of a single customer.

Figure 6-7

Product-to-features relationship data

Let’s split the user-to-product relationship into train and test data.
user_to_product_train,user_to_product_test =  model_selection.train_test_split(user_to_product,test_size=0.33, random_state=42)
print("Training set size:")
print(user_to_product_train.shape)
print("Test set size:")
print(user_to_product_test.shape)
The following is the output.
Training set size:
(92729, 3)
Test set size:
(45673, 3)
Now that the data and the ID mappings are in place, to get the user-to-product and product-to-features interaction matrix, let’s first create a function that returns the interaction matrix.
def interactions(data, row, col, value, row_map, col_map):
    #converting the row with its given mappings
    row = data[row].apply(lambda x: row_map[x]).values
    #converting the col with its given mappings
    col = data[col].apply(lambda x: col_map[x]).values
    value = data[value].values
    #returning the interaction matrix
    return coo_matrix((value, (row, col)), shape = (len(row_map), len(col_map)))
Then let’s generate user_item_interaction_matrix for train and test data using the preceding function.
#for train
user_to_product_interaction_train = interactions(user_to_product_train, "CustomerID",
"Product Name", "Quantity", user_to_index_mapping, item_to_index_mapping)
#for test
user_to_product_interaction_test = interactions(user_to_product_test, "CustomerID",
"Product Name", "Quantity", user_to_index_mapping, item_to_index_mapping)
print(user_to_product_interaction_train)
The following is the output.
  (2124, 230)  10
  (1060, 268)  16
  :            :
  (64, 8)      24
  (3406, 109)   1
  (3219, 12)   12
Similarly, let’s generate the product-to-features interaction matrix.
product_to_feature_interaction = interactions(product_to_feature, "Product Name", "Customer Segment","Quantity",item_to_index_mapping, feature_to_index_mapping)

Model Building

The data is in the correct format, so let’s begin the modeling process. This chapter uses the LightFM model, which can incorporate user and item metadata to form robust hybrid recommendation models.

Let’s try multiple models and then choose the one with the best performance. These models have different hyperparameters, so this is part of the hyperparameter tuning stage of modeling.

The loss function is one of the hyperparameters to tune; the three values tried here are warp, logistic, and bpr.
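Before walking through the attempts one by one, the comparison can also be framed as a small tuning loop. The following is a minimal sketch (not part of the book's original code), assuming the train/test interaction matrices and the product-to-feature matrix built earlier in this section.
# Sketch: compare the three loss functions with a single epoch each
results = {}
for loss in ["warp", "logistic", "bpr"]:
    model = LightFM(loss=loss)
    model.fit(user_to_product_interaction_train,
              item_features=product_to_feature_interaction,
              epochs=1,
              num_threads=4)
    auc = auc_score(model,
                    test_interactions=user_to_product_interaction_test,
                    train_interactions=user_to_product_interaction_train,
                    item_features=product_to_feature_interaction,
                    num_threads=4,
                    check_intersections=False).mean()
    results[loss] = auc
    print("loss = {0}: average AUC = {1:.2f}".format(loss, auc))
print("best loss function:", max(results, key=results.get))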

Let’s start the model-building experiment.

Attempt 1 is loss = warp, epochs = 1, and num_threads = 4.
# initialising model with warp loss function
model_with_features = LightFM(loss = "warp")
start = time.time()
#===================
# fitting the model with hybrid collaborative filtering + content based (product + features)
model_with_features.fit_partial(user_to_product_interaction_train,
          user_features=None,
          item_features=product_to_feature_interaction,
          sample_weight=None,
          epochs=1,
          num_threads=4,
          verbose=False)
#===================
end = time.time()
print("time taken = {0:.{1}f} seconds".format(end - start, 2))
The following is the output.
time taken = 0.11 seconds
Calculate the area under the curve (AUC) score for validation.
start = time.time()
#===================
# Getting the AUC score using in-built function
auc_with_features = auc_score(model = model_with_features,
                        test_interactions = user_to_product_interaction_test,
                        train_interactions = user_to_product_interaction_train,
                        item_features = product_to_feature_interaction,
                        num_threads = 4, check_intersections=False)
#===================
end = time.time()
print("time taken = {0:.{1}f} seconds".format(end - start, 2))
print("average AUC without adding item-feature interaction = {0:.{1}f}".format(auc_with_features.mean(), 2))
The following is the output.
time taken = 0.24 seconds
average AUC with item-feature interaction = 0.17
Attempt 2 is loss = logistic, epochs = 1, and num_threads = 4.
# initialising model with logistic loss function
model_with_features = LightFM(loss = "logistic")
start = time.time()
#===================
# fitting the model with hybrid collaborative filtering + content based (product + features)
model_with_features.fit_partial(user_to_product_interaction_train,
          user_features=None,
          item_features=product_to_feature_interaction,
          sample_weight=None,
          epochs=1,
          num_threads=4,
          verbose=False)
#===================
end = time.time()
print("time taken = {0:.{1}f} seconds".format(end - start, 2))
The following is the output.
time taken = 0.11 seconds
Calculate the AUC score for the preceding model.
start = time.time()
#===================
# Getting the AUC score using in-built function
auc_with_features = auc_score(model = model_with_features,
                        test_interactions = user_to_product_interaction_test,
                        train_interactions = user_to_product_interaction_train,
                        item_features = product_to_feature_interaction,
                        num_threads = 4, check_intersections=False)
#===================
end = time.time()
print("time taken = {0:.{1}f} seconds".format(end - start, 2))
print("average AUC without adding item-feature interaction = {0:.{1}f}".format(auc_with_features.mean(), 2))
The following is the output.
time taken = 0.22 seconds
average AUC with item-feature interaction = 0.89
Attempt 3 is loss = bpr, epochs = 1, and num_threads = 4.
# initialising model with bpr loss function
model_with_features = LightFM(loss = "bpr")
start = time.time()
#===================
# fitting the model with hybrid collaborative filtering + content based (product + features)
model_with_features.fit_partial(user_to_product_interaction_train,
          user_features=None,
          item_features=product_to_feature_interaction,
          sample_weight=None,
          epochs=1,
          num_threads=4,
          verbose=False)
#===================
end = time.time()
print("time taken = {0:.{1}f} seconds".format(end - start, 2))
The following is the output.
time taken = 0.12 seconds

Calculate the AUC score for the preceding model.
start = time.time()
#===================
# Getting the AUC score using in-built function
auc_with_features = auc_score(model = model_with_features,
                        test_interactions = user_to_product_interaction_test,
                        train_interactions = user_to_product_interaction_train,
                        item_features = product_to_feature_interaction,
                        num_threads = 4, check_intersections=False)
#===================
end = time.time()
print("time taken = {0:.{1}f} seconds".format(end - start, 2))
print("average AUC with item-feature interaction = {0:.{1}f}".format(auc_with_features.mean(), 2))
The following is the output.
time taken = 0.22 seconds
average AUC with item-feature interaction = 0.38
Attempt 4 is loss = logistic, epochs = 10, and num_threads = 20.
# initialising model with logistic loss function
model_with_features = LightFM(loss = "logistic")
start = time.time()
#===================
# fitting the model with hybrid collaborative filtering + content based (product + features)
model_with_features.fit_partial(user_to_product_interaction_train,
          user_features=None,
          item_features=product_to_feature_interaction,
          sample_weight=None,
          epochs=10,
          num_threads=20,
          verbose=False)
#===================
end = time.time()
print("time taken = {0:.{1}f} seconds".format(end - start, 2))
The following is the output.
time taken = 0.77 seconds
Calculate the AUC score for the preceding model.
start = time.time()
#===================
# Getting the AUC score using in-built function
auc_with_features = auc_score(model = model_with_features,
                        test_interactions = user_to_product_interaction_test,
                        train_interactions = user_to_product_interaction_train,
                        item_features = product_to_feature_interaction,
                        num_threads = 4, check_intersections=False)
#===================
end = time.time()
print("time taken = {0:.{1}f} seconds".format(end - start, 2))
print("average AUC with item-feature interaction = {0:.{1}f}".format(auc_with_features.mean(), 2))
The following is the output.
time taken = 0.25 seconds
average AUC with item-feature interaction = 0.89

The logistic loss performed the best overall, with the highest AUC score of 0.89. Let's merge the train and test data and do a final training using the parameters of the logistic model.

Merge the train and test with the following function.
def train_test_merge(training_data, testing_data):
    # initialising train dict
    train_dict = {}
    for row, col, data in zip(training_data.row, training_data.col, training_data.data):
        train_dict[(row, col)] = data
    # replacing with the test set
    for row, col, data in zip(testing_data.row, testing_data.col, testing_data.data):
        train_dict[(row, col)] = max(data, train_dict.get((row, col), 0))
    # converting to the row
    row_list = []
    col_list = []
    data_list = []
    for row, col in train_dict:
        row_list.append(row)
        col_list.append(col)
        data_list.append(train_dict[(row, col)])
    # converting to np array
    row_list = np.array(row_list)
    col_list = np.array(col_list)
    data_list = np.array(data_list)
    #returning the matrix output
    return coo_matrix((data_list, (row_list, col_list)), shape = (training_data.shape[0], training_data.shape[1]))
Call the preceding function to get the final (full) data to build the final model.
user_to_product_interaction = train_test_merge(user_to_product_interaction_train, user_to_product_interaction_test)

Final Model after Combining the Train and Test Data

Let’s build the LightFM model with loss = logistic, epochs = 10, and num_threads = 20.
# retraining the final model with the combined dataset
final_model = LightFM(loss = "logistic", no_components=30)
# fitting to combined dataset
start = time.time()
#===================
#final model fitting
final_model.fit(user_to_product_interaction,
          user_features=None,
          item_features=product_to_feature_interaction,
          sample_weight=None,
          epochs=10,
          num_threads=20,
          verbose=False)
#===================
end = time.time()
print("time taken = {0:.{1}f} seconds".format(end - start, 2))
The following is the output.
time taken = 3.46 seconds

Getting Recommendations

Now that the hybrid recommendation model is ready, let’s use it to get the recommendations for a given user.

Let’s write a function for getting those recommendations given a user id as input.
def get_recommendations(model,user,items,user_to_product_interaction_matrix,user2index_map,product_to_feature_interaction_matrix):
    # getting the user index
    userindex = user2index_map.get(user, None)
    if userindex is None:
        return None
    users = userindex
    # getting products already bought
    known_positives = items[user_to_product_interaction_matrix.tocsr()[userindex].indices]
    print('User index =',users)
    # scores from model prediction
    scores = model.predict(user_ids = users, item_ids = np.arange(user_to_product_interaction_matrix.shape[1]),item_features=product_to_feature_interaction_matrix)
    #getting top items
    top_items = items[np.argsort(-scores)]
    # printing out the result
    print("User %s" % user)
    print("     Known positives:")
    for x in known_positives[:10]:
        print("                  %s" % x)
    print("     Recommended:")
    for x in top_items[:10]:
        print("                  %s" % x)

This function calculates a user's prediction score (the likelihood to buy) for every item and recommends the ten highest-scored items. The known positives (items already bought by that user) are also printed for validation.

Call the function for a sample user (CustomerID 17017) to get recommendations.
get_recommendations(final_model,17017,item_list,user_to_product_interaction,user_to_index_mapping,product_to_feature_interaction)
The following is the output.
User index = 2888
User 17017
Known positives:
Ganma Superheroes Ordinary Life Case For Samsung Galaxy Note 5 Hard Case Cover
MightySkins Skin Decal Wrap Compatible with Nintendo Sticker Protective Cover 100's of Color Options
Mediven Sheer and Soft 15-20 mmHg Thigh w/ Lace Silicone Top Band CT Wheat II - Ankle 8-8.75 inches
MightySkins Skin Decal Wrap Compatible with OtterBox Sticker Protective Cover 100's of Color Options
MightySkins Skin Decal Wrap Compatible with DJI Sticker Protective Cover 100's of Color Options
MightySkins Skin Decal Wrap Compatible with Lenovo Sticker Protective Cover 100's of Color Options
Ebe Reading Glasses Mens Womens Tortoise Bold Rectangular Full Frame Anti Glare grade ckbdp9088
Window Tint Film Chevy (back doors) DIY
Union 3" Female Ports Stainless Steel Pipe Fitting
Ebe Women Reading Glasses Reader Cheaters Anti Reflective Lenses TR90 ry2209
Recommended:
Mediven Sheer and Soft 15-20 mmHg Thigh w/ Lace Silicone Top Band CT Wheat II - Ankle 8-8.75 inches
MightySkins Skin Decal Wrap Compatible with Apple Sticker Protective Cover 100's of Color Options
MightySkins Skin Decal Wrap Compatible with DJI Sticker Protective Cover 100's of Color Options
3 1/2"W x 20"D x 20"H Funston Craftsman Smooth Bracket, Douglas Fir
MightySkins Skin Decal Wrap Compatible with HP Sticker Protective Cover 100's of Color Options
Owlpack Clear Poly Bags with Open End, 1.5 Mil, Perfect for Products, Merchandise, Goody Bags, Party Favors (4x4 inches)
Ebe Women Reading Glasses Reader Cheaters Anti Reflective Lenses TR90 ry2209
Handcrafted Ercolano Music Box Featuring "Luncheon of the Boating Party" by Renoir, Pierre Auguste - New YorkNew York
A6 Invitation Envelopes w/Peel & Press (4 3/4 x 6 1/2) - Baby Blue (1000 Qty.)
MightySkins Skin Decal Wrap Compatible with Lenovo Sticker Protective Cover 100's of Color Options

Many of the recommendations align with the known positives, which provides further validation. This hybrid recommendation engine can now generate recommendations for all other users.
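To generate recommendations at scale, the same function can simply be looped over the user list (or any subset of it). A minimal sketch, assuming the objects built earlier in this chapter:
# Sketch: recommendations for the first five customers in user_list
for customer_id in user_list[:5]:
    get_recommendations(final_model, customer_id, item_list,
                        user_to_product_interaction,
                        user_to_index_mapping,
                        product_to_feature_interaction)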

Summary

This chapter discussed hybrid recommendation engines and how they can overcome the shortcomings of other types of engines. It also showcased an implementation using LightFM.
