Predicting the rating for the product

Let's now take this framework and use it to predict the rating for a product. We'll limit our example to three users, person X, person Y, and person Z. We'll predict the rating of a product that person X hasn't rated, but that persons Y and Z, who are very similar to X, have rated.

We'll start with our base ratings for each user, as shown in the following table:

| Customers | Snarky's Potato Chips | SoSo Smooth Lotion | Duffly Beer | BetterTap Water | XXLargeLivin' Football Jersey | Snowy Cotton Balls | Disposos' Diapers |
|-----------|-----------------------|--------------------|-------------|-----------------|-------------------------------|--------------------|-------------------|
| X         |                       | 4                  |             | 3               |                               | 4                  |                   |
| Y         |                       | 3.5                |             | 2.5             |                               | 4                  | 4                 |
| Z         |                       | 4                  |             | 3.5             |                               | 4.5                | 4.5               |

Next, we'll center the ratings:

| Customers | Snarky's Potato Chips | SoSo Smooth Lotion | Duffly Beer | BetterTap Water | XXLargeLivin' Football Jersey | Snowy Cotton Balls | Disposos' Diapers |
|-----------|-----------------------|--------------------|-------------|-----------------|-------------------------------|--------------------|-------------------|
| X         |                       | .33                |             | -.66            |                               | .33                | ?                 |
| Y         |                       | 0                  |             | -1              |                               | .5                 | .5                |
| Z         |                       | -.125              |             | -.625           |                               | .375               | .375              |
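Centering subtracts each user's mean rating from their rated items, so that one user's generous scale and another's harsh scale don't distort the comparison. The step can be sketched with NumPy as follows (a minimal sketch; the `ratings` dict and column order are just this chapter's example, with `np.nan` marking unrated products):

```python
import numpy as np

# Each user's ratings across the seven products, in column order;
# np.nan marks products the user hasn't rated.
ratings = {
    "X": [np.nan, 4.0, np.nan, 3.0, np.nan, 4.0, np.nan],
    "Y": [np.nan, 3.5, np.nan, 2.5, np.nan, 4.0, 4.0],
    "Z": [np.nan, 4.0, np.nan, 3.5, np.nan, 4.5, 4.5],
}

centered = {}
for user, row in ratings.items():
    row = np.array(row)
    # Subtract the mean of the *rated* items, then zero out the unrated ones.
    centered[user] = np.nan_to_num(row - np.nanmean(row))
    print(user, np.round(centered[user], 3))
```

Running this reproduces the centered rows in the table above, with the unrated cells set to 0.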

Now, we'd like to know what rating user X might give Disposos' Diapers. Using the ratings from users Y and Z, we can calculate this by taking the average of their ratings, weighted by each user's centered cosine similarity to X.

Let's first get that figure. Each user is represented as a vector over all seven products, using the centered ratings with 0 standing in for the unrated items:

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

user_x = [0, .33, 0, -.66, 0, .33, 0]
user_y = [0, 0, 0, -1, 0, .5, .5]

cosine_similarity(np.array(user_x).reshape(1, -1),
                  np.array(user_y).reshape(1, -1))

The preceding code results in the following output:

array([[0.83333333]])

Now, let's get that figure for user Z:

user_x = [0, .33, 0, -.66, 0, .33, 0]
user_z = [0, -.125, 0, -.625, 0, .375, .375]

cosine_similarity(np.array(user_x).reshape(1, -1),
                  np.array(user_z).reshape(1, -1))

The preceding code results in a similarity of approximately 0.7385.

So, now we have a figure for the similarity between user X and user Y (0.8333) and between user X and user Z (0.7385).

Putting it all together, we weight each user's rating by their similarity to X, and then divide by the total similarity, as follows:

(.8333 * (4) + .7385 * (4.5)) / (.8333 + .7385) = 4.23

And we can see that the expected rating of user X for Disposos' Diapers is about 4.23. (Better send a coupon!)
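The similarity-weighted average generalizes to any set of neighbors. Here is a minimal sketch (the `predict_rating` helper is our own illustration, not a library function):

```python
import numpy as np

def predict_rating(similarities, ratings):
    """Similarity-weighted average of the neighbors' ratings."""
    similarities = np.asarray(similarities, dtype=float)
    ratings = np.asarray(ratings, dtype=float)
    # Weight each neighbor's rating by its similarity, then
    # normalize by the total similarity.
    return float(np.dot(similarities, ratings) / similarities.sum())

# Sanity check: two equally similar neighbors give the plain average.
print(predict_rating([0.5, 0.5], [4, 4.5]))  # → 4.25
```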

Now, so far, we've looked only at user-to-user collaborative filtering, but there's another method we can use: item-to-item filtering, which in practice tends to outperform user-to-user filtering. Here's how the method works: rather than matching each user up with other similar users based on their past ratings, each rated item is compared against all other items to find the most similar ones, again using centered cosine similarity.

Let's take a look at how this would work.

Again we have a utility matrix; this time, we'll look at users' ratings of songs. The users are along the columns and the songs are along the rows, shown as follows:

| Entity | U1 | U2 | U3 | U4 | U5 |
|--------|----|----|----|----|----|
| S1     | 2  |    | 4  |    | 5  |
| S2     |    | 3  |    | 3  |    |
| S3     | 1  |    | 5  |    | 4  |
| S4     |    | 4  | 4  | 4  |    |
| S5     | 3  |    |    |    | 5  |

Now, suppose we would like to know the rating that user 3 will assign to song 5. Instead of looking for similar users, we'll look for songs that are similar based upon how they were rated across the users.

Let's see an example.

First, we start by centering each song row, and calculating the cosine similarity for each versus our target row, which is S5, shown as follows:

| Entity | U1    | U2 | U3   | U4 | U5   | CntrdCoSim |
|--------|-------|----|------|----|------|------------|
| S1     | -1.66 |    | .33  |    | 1.33 | .98        |
| S2     |       | 0  |      | 0  |      | 0          |
| S3     | -2.33 |    | 1.66 |    | .66  | .72        |
| S4     |       | 0  | 0    | 0  |      | 0          |
| S5     | -1    |    | ?    |    | 1    | 1          |

You can see the far right column has been calculated with the centered cosine similarity for each row versus row S5.
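That far-right column can be reproduced in code. Here is a minimal sketch (the exact columns holding S2's and S4's ratings are illustrative; since those rows center to all zeros, their placement can't affect the similarities):

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Song-by-user utility matrix; np.nan marks unrated cells.
R = np.array([
    [2,      np.nan, 4,      np.nan, 5],       # S1
    [np.nan, 3,      np.nan, 3,      np.nan],  # S2
    [1,      np.nan, 5,      np.nan, 4],       # S3
    [np.nan, 4,      4,      4,      np.nan],  # S4
    [3,      np.nan, np.nan, np.nan, 5],       # S5
])

# Center each song's row on its mean rating, then zero the unrated cells.
centered = np.nan_to_num(R - np.nanmean(R, axis=1, keepdims=True))

# Cosine similarity of every song against S5 (the last row).
sims = cosine_similarity(centered, centered[-1].reshape(1, -1)).ravel()
print(sims.round(2))  # S1 and S3 are the nearest neighbors to S5
```

Rounded to two decimals, this gives .98 for S1, .72 for S3, 0 for S2 and S4, and 1 for S5 against itself, matching the table.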

We next need to select a number, k, which is the number of nearest neighbors we'll use to rate songs for user 3. We'll use k = 2 in our simple example.

You can see that song S1 and song S3 are the most similar, so we'll use those two along with the ratings user 3 had for S1 and S3 (4 and 5, respectively).

Let's now calculate the rating:

(.98 * (4) + .72 * (5)) / (.98 + .72) = 4.42

So, based on this item-to-item collaborative filtering, we can see that user 3 is likely to rate song S5 very highly, at 4.42.
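That arithmetic can be checked directly in code (a one-line sketch using the two similarities and user 3's ratings from above):

```python
# k = 2 nearest neighbors of S5: S1 (sim .98, rated 4) and S3 (sim .72, rated 5)
sims = [0.98, 0.72]
ratings = [4, 5]

# Similarity-weighted average of the neighbor ratings.
prediction = sum(s * r for s, r in zip(sims, ratings)) / sum(sims)
print(round(prediction, 2))  # → 4.42
```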

Earlier, I said that user-to-user filtering is less effective than item-to-item filtering. Why might that be?

There's a good chance you have friends who really enjoy some of the things that you enjoy as well, but then each of you has other areas of interest that the other has absolutely no interest in.

For example, perhaps you both love Game of Thrones, but your friend also loves Norwegian death metal. You, however, would rather be dead than listen to Norwegian death metal. If you're similar in many ways—excluding the death metal—with user-to-user recommendations, you're still going to see a lot of recommendations for bands with names that include words such as flaming, axe, skull, and bludgeon. With item-to-item filtering, most likely, you would be spared those suggestions.

So far, we've looked at users and items as a single entity when making comparisons, but now let's move on to look at another method that decomposes our users and items into what might be called feature baskets.
