© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2023
A. Kulkarni et al., Applied Recommender Systems with Python, https://doi.org/10.1007/978-1-4842-8954-9_10

10. Graph-Based Recommender Systems

Akshay Kulkarni (1), Adarsha Shivananda (2), Anoosh Kulkarni (3), and V Adithya Krishnan (4)
(1) Bangalore, Karnataka, India
(2) Hosanagara tq, Shimoga dt, Karnataka, India
(3) Bangalore, India
(4) Navi Mumbai, India
 

The previous chapter covered deep learning-based recommender systems and explained how to implement end-to-end neural collaborative filtering. This chapter explores another recent advanced method: graph-based recommendation systems powered by knowledge graphs.

Figure 10-1 illustrates a graph-based recommendation system for movie recommendations.

A structural representation of a recommender system for movie recommendations. The structure of the network exhibits the relationship between the user and the items.

Figure 10-1

Graph-based movie recommendations

In graph-based recommendation systems, a knowledge graph represents the relationships between users and items. A knowledge graph is a network of interconnected data, enriched with semantics, that can be visualized to show how multiple entities relate to one another. The graph has three primary components: nodes, edges, and labels. A node can be any entity (an object, user, item, place, and so forth), an edge defines the relationship between two nodes, and a label describes what a node or edge represents. The underlying semantics add context to the defined relationships, enabling more complex decision-making.

Figure 10-2 shows a simple one-to-one relationship in a knowledge graph structure.

A knowledge graph of the link between A, B, and C. A, B, and C represent the subject, predicate, and object respectively.

Figure 10-2

Simple knowledge graph connection
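To make the subject-predicate-object structure concrete, the following minimal sketch stores a few such connections as Python tuples and looks up what a given subject is connected to. The entity names (`user_1`, `Movie A`, and so on), the predicates, and the helper `objects_of` are hypothetical and used only for illustration; they are not part of the chapter's dataset.

```python
from collections import namedtuple

# Each connection in a knowledge graph can be written as a
# (subject, predicate, object) triple. All names here are hypothetical.
Triple = namedtuple("Triple", ["subject", "predicate", "object"])

triples = [
    Triple("user_1", "watched", "Movie A"),
    Triple("user_2", "watched", "Movie A"),
    Triple("user_2", "watched", "Movie B"),
    Triple("Movie A", "has_genre", "Comedy"),
]

def objects_of(subject, predicate, triples):
    """Return every object linked to `subject` by an edge labeled `predicate`."""
    return [t.object for t in triples
            if t.subject == subject and t.predicate == predicate]

print(objects_of("user_2", "watched", triples))  # ['Movie A', 'Movie B']
```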

Figure 10-3 shows how data, graph structure, and semantics combine to form a knowledge graph.

A knowledge graph with deep dynamic context is obtained from the combination of data, graph, and semantics.

Figure 10-3

Knowledge graphs explained

This chapter uses Neo4j to implement the knowledge graph. Neo4j is a widely used graph database: a high-performance, scalable, and robust graph store with a user-friendly query language called Cypher.

The knowledge graph is queried to find similar users, and the purchases of those users drive the recommendations.

Implementation

The following installs and imports the required libraries.
# Installing required packages
!pip install py2neo
!pip install openpyxl --upgrade
!pip install neo4j
!pip install neo4jupyter
#Importing the required libraries
import pandas as pd
from neo4j import GraphDatabase, basic_auth
from py2neo import Graph
import re
import neo4jupyter

Before establishing the connection between Neo4j and the notebook, create a new sandbox in Neo4j at https://neo4j.com/sandbox/.

Once the sandbox is created, replace the Bolt URL and the password in the connection code with those of your own sandbox. You can find them in the connection details, as shown in Figure 10-4.

A screenshot of connection details includes the username, password, IP address, HTTP port, Bolt port, Bolt URL, and WebSocket Bolt URL.

Figure 10-4

Connection details

Let’s establish a connection between Neo4j and the Python notebook.
# Establishing the connection
# Replace the URL "bolt://44.192.55.13:7687" with the Bolt URL of your own sandbox.
# Replace the credentials "neo4j" and "butter-ohms-chairman" with your sandbox username and password.
g = Graph("bolt://44.192.55.13:7687", password = "butter-ohms-chairman")
driver = GraphDatabase.driver(
  "bolt://44.192.55.13:7687",
  auth=basic_auth("neo4j", "butter-ohms-chairman"))
def execute_transactions(transaction_execution_commands):
    # Reuse the driver created above instead of opening a new connection on every call
    # The with block closes the session once all the commands have run
    with driver.session() as session:
        for i in transaction_execution_commands:
            session.run(i)
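Because the sandbox URL and password change with every new sandbox, one option is to keep them out of the notebook and read them from environment variables. The sketch below assumes hypothetical variable names (`NEO4J_URL`, `NEO4J_USER`, `NEO4J_PASSWORD`) and a hypothetical helper `load_connection_settings`; neither is defined by Neo4j or by this chapter.

```python
import os

def load_connection_settings(env=None):
    """Collect Neo4j connection settings from environment variables.

    The variable names are assumptions made for this sketch.
    """
    env = os.environ if env is None else env
    settings = {
        "url": env.get("NEO4J_URL", "bolt://localhost:7687"),
        "user": env.get("NEO4J_USER", "neo4j"),
        "password": env.get("NEO4J_PASSWORD"),
    }
    if not settings["password"]:
        raise ValueError("Set NEO4J_PASSWORD before connecting")
    return settings

# Example with an explicit mapping instead of the real environment
example = load_connection_settings({
    "NEO4J_URL": "bolt://44.192.55.13:7687",
    "NEO4J_PASSWORD": "butter-ohms-chairman",
})
print(example["url"], example["user"])
```

The returned values could then be passed to `Graph(...)` and `GraphDatabase.driver(...)` in place of the hard-coded strings.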
Let’s import the data.
# This dataset consists of transactions, which are used to establish the relationship between customers and stocks.
df = pd.read_excel(r'Rec_sys_data.xlsx')
# A little preprocessing: store customer IDs as strings so they are easy to embed in Cypher queries.
df['CustomerID'] = df['CustomerID'].apply(str)
# The 'product' sheet contains detailed information about each stock, used to link stockcodes to their description/title.
df1 = pd.read_excel('Rec_sys_data.xlsx', sheet_name='product')
df1.head()
Figure 10-5 shows the df1 output (first five rows).

A screenshot of the df1 output. It includes stock code, product name, description, category, brand, and unit price.

Figure 10-5

The output

Let’s upload the entities into the Neo4j database.

To implement knowledge graphs in Neo4j, the DataFrame must be converted into graph form. First, customers and stocks are created as entities (the nodes of the graph) so that relationships can be built between them.
#creating a list of all unique customer IDs
customerids = df['CustomerID'].unique().tolist()
# storing all the create commands to be executed into create_customers list
create_customers = []
for i in customerids:
    # example of create statement "create (n:entity {property_key : '12345'})"
    statement = "create (c:customer{cid:"+ '"' + str(i) + '"' +"})"
    create_customers.append(statement)
# running all the queries into neo4j to create customer entities
execute_transactions(create_customers)
Once the customer nodes are created, create nodes for the stocks.
#  creating a list of all unique stockcodes
stockcodes = df['StockCode'].unique().tolist()
# storing all the create commands to be executed into the create_stockcodes list
create_stockcodes = []
for i in stockcodes:
    # example of create statement "create (m:entity {property_key : 'XYZ'})"
    statement = "create (s:stock{stockcode:"+ '"' + str(i) + '"' +"})"
    create_stockcodes.append(statement)
# running all the queries into neo4j to create stock entities
execute_transactions(create_stockcodes)
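Building each statement by string concatenation works for simple IDs, but it breaks if a value contains a double quote, and it sends one query per node. A possible alternative, sketched here under the assumption that the values are passed as Cypher parameters, is to build one `UNWIND` query per label. The helper `build_node_batch` is hypothetical; it only prepares the query text and the parameters and does not contact the database.

```python
def build_node_batch(label, key, values):
    """Return (query, parameters) that create one node per value.

    Labels and property keys cannot be passed as Cypher parameters, so they
    are checked and placed in the query text; the values travel separately
    in the parameters dictionary.
    """
    if not (label.isidentifier() and key.isidentifier()):
        raise ValueError("label and key must be simple identifiers")
    query = (
        "UNWIND $values AS value "
        "CREATE (n:" + label + " {" + key + ": value})"
    )
    return query, {"values": [str(v) for v in values]}

query, params = build_node_batch("customer", "cid", [17850, 12347])
print(query)   # UNWIND $values AS value CREATE (n:customer {cid: value})
print(params)  # {'values': ['17850', '12347']}
```

With the driver used in this chapter, the pair could be run as `session.run(query, params)`.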

Next, link each stock code to its title, which is needed to present the recommended items.

To do this, add another property key called 'title' to the existing stock entities in the Neo4j database.
# Converting stockcodes to string in both the dataframes
df['StockCode'] = df['StockCode'].astype(str)
df1['StockCode'] = df1['StockCode'].astype(str)
# Collecting every unique stockcode along with its title(s) from the product sheet
# (rows are gathered in a list because DataFrame.append was removed in pandas 2.0)
stockcodes = df['StockCode'].unique().tolist()
rows = []
for code in stockcodes:
    titles = df1[df1['StockCode']==code]['Product Name'].values
    rows.append({'StockCode': code, 'Title': titles})
df2 = pd.DataFrame(rows, columns = ['StockCode', 'Title'])
# Doing some data preprocessing such that these queries can be run in neo4j:
# the array of titles becomes a string, and runs of non-word characters
# (brackets, quotes, punctuation) are replaced by a single space
df2['Title'] = df2['Title'].apply(str)
df2['Title'] = df2['Title'].map(lambda x: re.sub(r'\W+', ' ', x))
# This query adds the 'title' property key to each stock entity in our neo4j database
# The stockcode and title are passed as query parameters rather than pasted into the query text
query = """
MATCH (s:stock {stockcode: $stockcode})
SET s.title = $title
RETURN s.stockcode, s.title
"""
for i in range(len(df2)):
    g.run(query, stockcode = str(df2['StockCode'][i]), title = str(df2['Title'][i]))

Create a relationship between customers and stocks.

Since all the transactions are in the dataset, the relationships are already known. To represent them in the graph, Cypher queries must be run in Neo4j to create a relationship for each transaction.
# Storing transaction values in a list
transaction_list = df.values.tolist()
# storing all commands to build relationship in an empty list relation
relation = []
for i in transaction_list:
    # the 9th column in df is customerID and the 2nd column is stockcode, which we embed in the statement
    statement = """MATCH (a:customer),(b:stock) WHERE a.cid = """+'"' + str(i[8])+ '"' + """ AND b.stockcode = """ + '"' + str(i[1]) + '"' + """ CREATE (a)-[:bought]->(b) """
    relation.append(statement)
execute_transactions(relation)
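A customer may buy the same stock in several transactions, in which case the loop above creates several parallel `bought` relationships between the same two nodes. The similarity queries later in the chapter count distinct stocks, so this does not change their results, but it does mean more statements to run. A minimal sketch of collapsing the transactions to unique customer-stock pairs first is shown here; the sample rows and the helper name `unique_pairs` are hypothetical.

```python
def unique_pairs(transactions):
    """Return each (customer_id, stock_code) pair once, in first-seen order."""
    seen = set()
    pairs = []
    for customer_id, stock_code in transactions:
        pair = (str(customer_id), str(stock_code))
        if pair not in seen:
            seen.add(pair)
            pairs.append(pair)
    return pairs

# Hypothetical sample: customer 17850 bought stock 85123A twice
sample = [(17850, "85123A"), (17850, "71053"), (17850, "85123A"), (12347, "71053")]
print(unique_pairs(sample))
# [('17850', '85123A'), ('17850', '71053'), ('12347', '71053')]
```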

Next, let’s find similarities between users using the relationship created.

The Jaccard similarity is the ratio of the size of the intersection of two sets to the size of their union. It ranges from 0 (no elements in common) to 1 (identical sets), or from 0% to 100% when expressed as a percentage. More similar sets have a higher value.
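As a small worked example of the definition, the sketch below computes the Jaccard similarity of two sets of stock codes in plain Python. The function name `jaccard` and the stock codes are hypothetical and chosen only for illustration.

```python
def jaccard(a, b):
    """Size of the intersection divided by size of the union, from 0.0 to 1.0."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)

customer_x = {"S1", "S2", "S3"}
customer_y = {"S2", "S3", "S4", "S5"}
# 2 shared stocks out of 5 distinct stocks overall
print(jaccard(customer_x, customer_y))  # 0.4
```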
def similar_users(id):
    # This query finds users who have bought stocks in common with the customer whose id is specified by the user
    # It then computes the Jaccard index for each of them
    # and returns the neighbors sorted by Jaccard index in descending order
    query = """
  MATCH (c1:customer)-[:bought]->(s:stock)<-[:bought]-(c2:customer)
  WHERE c1 <> c2 AND c1.cid =""" + '"' + str(id) + '"' + """
  WITH c1, c2, COUNT(DISTINCT s) as intersection
  MATCH (c:customer)-[:bought]->(s:stock)
  WHERE c in [c1, c2]
  WITH c1, c2, intersection, COUNT(DISTINCT s) as union
  WITH c1, c2, intersection, union, (intersection * 1.0 / union) as jaccard_index
  ORDER BY jaccard_index DESC, c2.cid
  WITH c1, COLLECT([c2.cid, jaccard_index, intersection, union])[0..15] as neighbors
  WHERE SIZE(neighbors) = 15   // return users with enough neighbors
  RETURN c1.cid as customer, neighbors
  """
    neighbors = pd.DataFrame([['CustomerID','JaccardIndex','Intersection','Union']])
    for i in g.run(query).data():
        # pd.concat replaces DataFrame.append, which was removed in pandas 2.0
        neighbors = pd.concat([neighbors, pd.DataFrame(i["neighbors"])])
    print(" ----------- customer's 15 nearest neighbors --------- ")
    print(neighbors)
The following is a sample output.
similar_users('12347')
Figure 10-6 shows the output of users similar to customer 12347.

A sample output of users similar to customer 12347. The customer's 15 nearest neighbors are listed with their customer ID, Jaccard index, intersection, and union.

Figure 10-6

The output

similar_users('17975')
Figure 10-7 shows the output of users similar to customer 17975.

A sample output of users similar to customer 17975. The customer's 15 nearest neighbors are listed with their customer ID, Jaccard index, intersection, and union.

Figure 10-7

The output

Now let’s recommend products based on similar users.
def recommend(id):
    # The query below is the same as in the similar_users function
    # It returns the most similar customers
    query1 = """
    MATCH (c1:customer)-[:bought]->(s:stock)<-[:bought]-(c2:customer)
    WHERE c1 <> c2 AND c1.cid =""" + '"' + str(id) + '"' + """
    WITH c1, c2, COUNT(DISTINCT s) as intersection
    MATCH (c:customer)-[:bought]->(s:stock)
    WHERE c in [c1, c2]
    WITH c1, c2, intersection, COUNT(DISTINCT s) as union
    WITH c1, c2, intersection, union, (intersection * 1.0 / union) as jaccard_index
    ORDER BY jaccard_index DESC, c2.cid
    WITH c1, COLLECT([c2.cid, jaccard_index, intersection, union])[0..15] as neighbors
    WHERE SIZE(neighbors) = 15   // return users with enough neighbors
    RETURN c1.cid as customer, neighbors
    """
    neighbors = pd.DataFrame([['CustomerID','JaccardIndex','Intersection','Union']])
    neighbors_list = {}
    for i in g.run(query1).data():
        # pd.concat replaces DataFrame.append, which was removed in pandas 2.0
        neighbors = pd.concat([neighbors, pd.DataFrame(i["neighbors"])])
        neighbors_list[i["customer"]] = i["neighbors"]
    print(neighbors_list)
    # The query returns no row if the customer is unknown or has fewer than 15 neighbors
    if id not in neighbors_list:
        print("No neighbors found for customer " + str(id))
        return
    # From the neighbors_list returned, we fetch the customer ids of those neighbors to recommend items
    nearest_neighbors = [neighbor[0] for neighbor in neighbors_list[id]]
    # The query below fetches all the items bought by the nearest neighbors
    # and removes the items which have already been bought by the target customer
    # From the filtered set of items, it counts how many of the nearest neighbors bought each item
    # and returns the items sorted by that count in descending order
    query2 = """
        // get top n recommendations for customer from their nearest neighbors
        MATCH (c1:customer),(neighbor:customer)-[:bought]->(s:stock)    // all items bought by neighbors
        WHERE c1.cid = """ + '"' + str(id) + '"' + """
          AND neighbor.cid in $nearest_neighbors
          AND not (c1)-[:bought]->(s)                    // filter for items that our user hasn't bought before
        WITH c1, s, COUNT(DISTINCT neighbor) as countnns // times bought by nearest neighbors
        ORDER BY c1.cid, countnns DESC
        RETURN c1.cid as customer, COLLECT([s.title, s.stockcode, countnns])[0..$n] as recommendations
        """
    recommendations = pd.DataFrame([['Title','StockCode','Number of times bought by neighbors']])
    for i in g.run(query2, id = id, nearest_neighbors = nearest_neighbors, n = 5).data():
        recommendations = pd.concat([recommendations, pd.DataFrame(i["recommendations"])])
    # We also print the items bought earlier by the target customer
    bought = df[df['CustomerID']==id][['CustomerID','StockCode','Quantity']].nlargest(8,'Quantity')
    print(" ---------- Top 8 StockCodes bought by customer " + str(id) + " ----------- ")
    print(bought)
    print(' -------Product Name of bought StockCodes ------ ')
    print((df1[df1.StockCode.isin(bought.StockCode)]['Product Name']).to_string())
    # Here we print the recommendations
    print("------------ Recommendations for Customer {} ------- ".format(id))
    print(recommendations.to_string())
This function gets the following information.
  • The top eight stock codes and product names bought by a particular customer

  • Recommendations for the same customer and the number of times neighbors bought the same item

The function follows these steps to produce this output.
  1. Get the most similar customers for the given customer.

  2. Fetch all the items bought by the nearest neighbors and remove the items the target customer has already bought.

  3. From the filtered set of items, count how many of the nearest neighbors bought each item, then sort the items by that count in descending order and return the top of the list.
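The three steps above can also be sketched in plain Python on a small in-memory example, which may help when reading the Cypher queries. The purchase data, the function name `recommend_in_memory`, and its parameters `k` and `n` are hypothetical; the chapter's function uses 15 neighbors and 5 recommendations and runs the equivalent logic inside Neo4j.

```python
from collections import Counter

def recommend_in_memory(purchases, target, k=2, n=3):
    """Recommend items for `target` from its k most similar customers.

    `purchases` maps a customer id to the set of items that customer bought.
    """
    target_items = purchases[target]
    # Step 1: rank the other customers by Jaccard similarity to the target
    scored = []
    for other, items in purchases.items():
        shared = len(target_items & items)
        if other != target and shared > 0:
            scored.append((shared / len(target_items | items), other))
    # Highest similarity first; ties broken by customer id
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    neighbors = [other for _, other in scored[:k]]
    # Step 2: collect the neighbors' items the target has not bought yet
    # Step 3: count how many neighbors bought each of those items
    counts = Counter()
    for neighbor in neighbors:
        counts.update(purchases[neighbor] - target_items)
    # Most frequently bought first; ties broken by item name
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return ranked[:n]

purchases = {
    "A": {"mug", "lamp"},
    "B": {"mug", "lamp", "clock"},
    "C": {"mug", "clock", "frame"},
    "D": {"vase"},
}
print(recommend_in_memory(purchases, "A"))  # [('clock', 2), ('frame', 1)]
```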

     
Now, let’s try customer 17850.
recommend('17850')
Figure 10-8 shows the recommendations output for customer 17850.

A sample output for customer 17850 with recommendations for the customer along with the data of top 8 stock codes bought by the customer and product name of bought stock codes.

Figure 10-8

The output

Next, let’s try it on customer 12347.
recommend('12347')
Figure 10-9 shows the recommendations output for customer 12347.

A sample output for customer 12347 with recommendations for the customer along with the data of the top 8 stock codes bought by the customer and the product name of bought stock codes.

Figure 10-9

The output

Summary

This chapter briefly covered knowledge graphs and how graph-based recommendation engines work. You saw an end-to-end implementation of a graph-based recommender system using a Neo4j knowledge graph: loading customers and stocks as nodes, linking them through purchases, finding similar customers with the Jaccard index, and recommending items bought by those neighbors. These techniques are relatively new and have become popular recently. Big players like Netflix and Amazon are shifting to graph-based systems for their recommendations, so the approach is relevant and worth knowing.
