Finding similar users in the dataset

One of the most important tasks in building a recommendation engine is finding users who are similar to one another, because these similarities guide the recommendations made to each user. Let's see how to build this.

How to do it…

Create a new Python file, and import the following packages:

import json
import numpy as np

from pearson_score import pearson_score

Let's define a function to find users who are similar to the input user. It takes three input arguments: the database, the input user, and the number of similar users that we are looking for. Our first step is to check whether the user is present in the database. If the user exists, we compute the Pearson correlation score between this user and every other user in the database:
```
# Finds a specified number of users who are similar to the input user
def find_similar_users(dataset, user, num_users):
    if user not in dataset:
        raise KeyError('User ' + user + ' not present in the dataset')

    # Compute Pearson scores for all the users
    scores = np.array([[x, pearson_score(dataset, user, x)] for x in dataset if user != x])
```
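The `pearson_score` module imported above is not shown in this recipe. To make the scoring step concrete, here is a minimal stand-in, assuming the dataset is a dictionary that maps each user name to a dictionary of movie ratings. The name `pearson_score_sketch` and the toy data are hypothetical, and `np.corrcoef` is used in place of whatever formula the original module implements.

```python
import numpy as np

def pearson_score_sketch(dataset, user1, user2):
    """Pearson correlation over the movies both users rated (illustrative stand-in)."""
    common = [m for m in dataset[user1] if m in dataset[user2]]
    # With fewer than two shared ratings the correlation is undefined; return 0
    if len(common) < 2:
        return 0.0
    x = np.array([dataset[user1][m] for m in common], dtype=float)
    y = np.array([dataset[user2][m] for m in common], dtype=float)
    # A constant rating vector has zero variance, so the score is undefined
    if x.std() == 0 or y.std() == 0:
        return 0.0
    return float(np.corrcoef(x, y)[0, 1])

# Hypothetical toy data, not taken from movie_ratings.json
toy = {
    'A': {'m1': 1.0, 'm2': 2.0, 'm3': 3.0},
    'B': {'m1': 2.0, 'm2': 4.0, 'm3': 6.0},
    'C': {'m1': 3.0, 'm2': 2.0, 'm3': 1.0},
}
score_ab = pearson_score_sketch(toy, 'A', 'B')  # ratings rise together
score_ac = pearson_score_sketch(toy, 'A', 'C')  # ratings move in opposite directions
```

Returning 0 for the undefined cases is a design choice made for this sketch; it simply keeps such users out of the top of the ranking.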

The next step is to sort these scores in descending order:

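    # Sort the scores based on second column; the array stores the scores
    # as strings, so convert them to float to get a numeric ordering
    scores_sorted = np.argsort(scores[:, 1].astype(float))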

    # Sort the scores in decreasing order (highest score first) 
    scored_sorted_dec = scores_sorted[::-1]
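Because `scores` holds user names alongside the scores, NumPy stores every entry as a string. The sketch below, using hypothetical names and values, shows how ordering by the string column can differ from ordering by numeric value, which is why converting the column to float before calling `np.argsort` is worth doing.

```python
import numpy as np

# Hypothetical scores; a names-plus-numbers array ends up holding strings
scores = np.array([['Ann', '0.8'], ['Bob', '-0.9'], ['Cid', '-0.2']])

# Sorting the raw strings compares characters, so '-0.2' sorts before '-0.9'
by_string = np.argsort(scores[:, 1])[::-1]

# Converting to float first gives a true numeric ordering
by_float = np.argsort(scores[:, 1].astype(float))[::-1]

names_by_string = scores[by_string, 0].tolist()
names_by_float = scores[by_float, 0].tolist()
```

The two orderings disagree on the negative scores: the string sort ranks Bob (-0.9) above Cid (-0.2), while the numeric sort ranks Cid first.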

Let's extract the top k scores, where k is num_users, and return them:

    # Extract top 'k' indices
    top_k = scored_sorted_dec[0:num_users] 

    return scores[top_k]
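As an alternative to sorting a NumPy array, the top-k selection can be sketched with plain Python tuples, which keeps the scores numeric throughout. This is not the recipe's method; the helper name `top_k_similar` and the sample pairs are hypothetical.

```python
import heapq

# Hypothetical (user, score) pairs standing in for the computed Pearson scores
pairs = [('Ann', 0.8), ('Bob', -0.9), ('Cid', -0.2), ('Dee', 0.5)]

def top_k_similar(pairs, k):
    """Return the k pairs with the highest score, best first."""
    return heapq.nlargest(k, pairs, key=lambda pair: pair[1])

top_two = top_k_similar(pairs, 2)
```

`heapq.nlargest` avoids sorting the whole list when only a few users are needed, which may matter for large datasets.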

Let's define the main function and load the input database:

if __name__ == '__main__':
    data_file = 'movie_ratings.json'

    with open(data_file, 'r') as f:
        data = json.load(f)
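The contents of movie_ratings.json are not shown here. Judging from how the function iterates over the dataset, a plausible layout is one JSON object keyed by user name, each holding that user's movie ratings. The excerpt below is a hypothetical illustration of that assumed shape; apart from John Carson, the names, titles, and ratings are invented.

```python
import json

# Hypothetical excerpt; the real movie_ratings.json may differ
raw = '''
{
    "John Carson": {"Movie A": 2.5, "Movie B": 3.5},
    "Jane Doe": {"Movie A": 3.0, "Movie C": 4.0}
}
'''
data = json.loads(raw)

# Iterating over the dictionary yields user names, as the function expects
users = sorted(data)

# Movies rated by both users are the basis for a similarity score
shared = set(data["John Carson"]) & set(data["Jane Doe"])
```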

Suppose we want to find the three users most similar to John Carson. We do this using the following steps:

    user = 'John Carson'
    print('\nUsers similar to ' + user + ':\n')
    similar_users = find_similar_users(data, user, 3)
    print('User\t\t\tSimilarity score\n')
    for item in similar_users:
        print(item[0], '\t\t', round(float(item[1]), 2))

If you run this code, you will see the following printed on your Terminal:
