Chapter 11. Recommendation Systems

In our final chapter, we'll tackle one of the most ubiquitous problems in the e-commerce world: making effective product recommendations to customers. Recommendation systems, also referred to as recommender systems, often rely on the notion of similarity between objects, in an approach known as collaborative filtering. Its basic premise is that two customers can be considered similar if they have purchased many of the same products; equally, two items can be considered similar if a large number of the same customers have purchased them.

There are a number of different ways to quantify this notion of similarity, and we will present some of the commonly used alternatives. Whether we want to recommend movies, books, hotels, or restaurants, building a recommender system often involves dealing with very large data sets. Consequently, we'll introduce a few ideas and options for working with Big Data using R to wrap up our exploration of building predictive models in R.

Rating matrix

A recommendation system usually involves a set of users U = {u1, u2, …, um} that have varying preferences over a set of items I = {i1, i2, …, in}. In the general case, the number of users |U| = m differs from the number of items |I| = n. In addition, users can often express their preference by rating items on some scale. As an example, we can think of the users as restaurant patrons in a city and the items as the restaurants that they visit. Under this setup, the preferences of the users could be expressed as ratings on a five-star scale. Of course, our generalization does not require that the items be physical items or that the users be actual people; this is simply a commonly used abstraction for the recommender system problem.

As an illustration, think of a dating website in which users rate other users; here, the items being rated are the profiles of the users themselves. Let's return to our example of a restaurant recommender system and build some example data. A natural data structure that is popularly used for recommendation systems is the rating matrix. This is an m × n matrix in which the rows represent the users and the columns represent the items. Each entry e_{i,j} of the matrix holds the rating given by user i to item j. What follows is a simple example:

> oliver   <- c(1,1,2,5,7,8,9,7)
> thibault <- c(5,9,4,1,1,7,5,9)
> maria    <- c(1,4,2,5,8,6,2,8)
> pedro    <- c(2,6,7,2,6,1,8,9)
> ines     <- c(1,3,2,4,8,9,7,7)
> gertrude <- c(1,6,5,7,3,2,5,5)
> ratingMatrix <- rbind(oliver, thibault, maria, pedro, ines, gertrude)
> colnames(ratingMatrix) <- c("Berny's", "La Traviata", "El Pollo Loco",
                              "Joey's Pizza", "The Old West",
                              "Jake and Jill", "Full Moon", "Acropolis")
> ratingMatrix
         Berny's La Traviata El Pollo Loco Joey's Pizza
oliver         1           1             2            5
thibault       5           9             4            1
maria          1           4             2            5
pedro          2           6             7            2
ines           1           3             2            4
gertrude       1           6             5            7
         The Old West Jake and Jill Full Moon Acropolis
oliver              7             8         9         7
thibault            1             7         5         9
maria               8             6         2         8
pedro               6             1         8         9
ines                8             9         7         7
gertrude            3             2         5         5

Here, we have used a 10-point scale as a rating system, where 10 is the highest rating and 1 is the lowest. An alternative rating scale is a binary rating scale where 1 indicates a positive rating and 0 indicates a negative rating. This second approach would yield a binary rating matrix. How might we be able to use this rating matrix in order to inform a simple recommender system for other users?

Concretely, suppose that a new user, Silvan, has rated a few restaurants and we would like to recommend a suitable restaurant that he has not yet visited. Alternatively, we might want to propose a list of his top three restaurants, or even predict whether Silvan will like a specific restaurant that he is currently considering.

One way to think about this problem is to find users that have similar views as Silvan on the restaurants that he has already rated. Then, we could use their ratings on restaurants that Silvan has not yet rated in order to predict Silvan's rating for those restaurants. This seems promising, but we should first think about how we might quantify this notion of similarity between two users based on their item ratings.
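As a rough sketch of this idea (Silvan's partial ratings below are invented for illustration, with NA marking restaurants he has not yet rated), we could find his nearest neighbor by Euclidean distance over the restaurants they have both rated, and then borrow that neighbor's ratings for the rest:

```r
# Hypothetical partial ratings for Silvan; NA = restaurant not yet rated
silvan <- c(2, 2, 3, 4, NA, 7, NA, 8)

ratingMatrix <- rbind(
  oliver   = c(1, 1, 2, 5, 7, 8, 9, 7),
  thibault = c(5, 9, 4, 1, 1, 7, 5, 9),
  maria    = c(1, 4, 2, 5, 8, 6, 2, 8),
  pedro    = c(2, 6, 7, 2, 6, 1, 8, 9),
  ines     = c(1, 3, 2, 4, 8, 9, 7, 7),
  gertrude = c(1, 6, 5, 7, 3, 2, 5, 5))

rated <- !is.na(silvan)

# Euclidean distance from Silvan to each user, over co-rated restaurants only
dists <- apply(ratingMatrix[, rated], 1,
               function(u) sqrt(sum((u - silvan[rated])^2)))

# Use the closest user's ratings as predictions for the unrated restaurants
nearest <- names(which.min(dists))
predicted <- ratingMatrix[nearest, !rated]
```

With these invented ratings, Oliver turns out to be Silvan's closest neighbor, so we would predict Oliver's ratings for the two restaurants that Silvan has not visited. Real recommender systems typically aggregate over several neighbors rather than copying a single user, as we will see.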

Measuring user similarity

Even with a very large database of users, in a real-world recommender system it will be rare, if not vanishingly unlikely, to find two people who have rated every item in our item set with exactly the same scores. That being said, we can still say that some users are more similar than others based on how they rate different items. For example, in our restaurant rating matrix we can see that Ines and Oliver rated the first four restaurants poorly and the last four restaurants highly, so their tastes can be considered far more similar than those of a pair like Thibault and Pedro, who sometimes agree and sometimes have completely opposite views on a particular restaurant.

By representing a user as their particular row in the rating matrix, we can think of a user as a vector in an n-dimensional space, where n is the number of items. Thus, we can use different distance measures appropriate for vectors in order to measure the similarity of two different users. Note that distance is inversely related to similarity, so we can use measures of distance as measures of similarity by interpreting a large distance between two vectors as a low similarity score.

The most familiar distance metric for two vectors a and b is the Euclidean distance:

d(a, b) = √((a₁ − b₁)² + (a₂ − b₂)² + … + (aₙ − bₙ)²)

We can use R's built-in dist() function to compute all the pair-wise distances in our rating matrix as follows:

> dist(ratingMatrix, method = 'euclidean')
            oliver   thibault     maria     pedro      ines
thibault 12.529964                                        
maria     8.000000 11.000000                              
pedro    10.723805  9.899495 10.246951                    
ines      3.316625 11.224972  6.082763 10.583005          
gertrude 10.488088 10.344080  8.717798  8.062258 10.440307

The result is a lower triangular matrix because the Euclidean distance is a symmetric function. Thus, the entry for (maria, pedro) is exactly the same as for (pedro, maria) and so we only need to display one of these. Here, we can explicitly see that Ines and Oliver are the two most similar users as the distance between them is smallest. Note that we can also talk about the distances between items in terms of the similarity of the ratings they received from different users. All we have to do to compute this is to transpose the rating matrix:

> dist(t(ratingMatrix), method = 'euclidean')
                Berny's La Traviata El Pollo Loco Joey's Pizza
La Traviata    8.366600                                       
El Pollo Loco  6.708204    5.744563                           
Joey's Pizza   9.643651    9.949874      7.745967             
The Old West  13.038405   12.247449     10.535654     7.810250
Jake and Jill 12.000000   11.575837     12.449900     9.848858
Full Moon     12.369317   10.246951      8.717798     9.486833
Acropolis     14.212670    8.831761     10.723805    11.789826
              The Old West Jake and Jill Full Moon
La Traviata                                       
El Pollo Loco                                     
Joey's Pizza                                      
The Old West                                      
Jake and Jill     8.246211                        
Full Moon         8.062258      9.110434          
Acropolis         8.831761      9.273618  7.549834

As we can see, the two most dissimilar restaurants (that is to say, those with the largest distance between them) are the Acropolis and Berny's. Looking back at the rating matrix, we should easily convince ourselves why this is the case. The former restaurant has received largely positive reviews across our user base, whereas the reviews for the latter have been poor.
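We can confirm the entry for Berny's and the Acropolis directly from the definition of the Euclidean distance, using the two restaurants' rating columns:

```r
# Ratings for the two restaurants across our six users
bernys    <- c(1, 5, 1, 2, 1, 1)
acropolis <- c(7, 9, 8, 9, 7, 5)

# Square root of the summed squared rating differences
sqrt(sum((bernys - acropolis)^2))  # sqrt(202), approximately 14.21267
```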

A commonly used alternative to the Euclidean distance (or L2 norm, as it is also known) is the cosine distance. This metric is based on the cosine of the angle between two vectors. If the vectors are parallel to each other, meaning that the angle between them is 0, then the cosine distance is 0 as well. For vectors with non-negative entries, such as our ratings, the distance is largest when the two vectors are at a right angle to each other. The cosine distance is given by:

d(a, b) = 1 − (a · b) / (‖a‖ ‖b‖)

Here, the numerator of the fraction is the dot product of the two vectors and the denominator is the product of their magnitudes (their L2 norms). The cosine distance isn't available as a method in the dist() function of R's base distribution, but we can install the proxy package, which extends this function with a number of additional distance metrics, and use it to compute the cosine distances for our rating matrix:

> library("proxy")
> dist(ratingMatrix, method = 'cosine')
             oliver   thibault      maria      pedro       ines
thibault 0.28387670                                            
maria    0.12450495 0.23879093                                 
pedro    0.20947046 0.17687385 0.20854178                      
ines     0.02010805 0.22821528 0.06911870 0.20437426           
gertrude 0.22600742 0.21481973 0.19156876 0.12227138 0.22459114
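We can again check one of these entries against the definition; for instance, the cosine distance between Oliver and Ines:

```r
oliver <- c(1, 1, 2, 5, 7, 8, 9, 7)
ines   <- c(1, 3, 2, 4, 8, 9, 7, 7)

# One minus the dot product divided by the product of the two magnitudes
cosine_distance <- function(a, b) {
  1 - sum(a * b) / (sqrt(sum(a ^ 2)) * sqrt(sum(b ^ 2)))
}
cosine_distance(oliver, ines)  # approximately 0.02010805
```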

Suppose instead that our users rated restaurants on a binary scale. We can convert our rating matrix into a binary rating matrix by considering all ratings above 5 to be positive and assigning them a new score of 1. The remaining ratings are all converted to a score of 0. For two binary vectors, the Jaccard similarity is given by the cardinality of the logical intersection divided by the cardinality of the logical union. The Jaccard distance is then computed as 1 minus this:

d(a, b) = 1 − |a ∩ b| / |a ∪ b|

In a nutshell, this computes one minus the ratio of the number of positions in which both vectors have a positive rating to the total number of positions in which either vector has a positive rating. Two binary vectors with the same set of positive positions are identical and thus have a distance of 0. Using the proxy package, we can show the Jaccard distances for our restaurant patrons as follows:

> binaryRatingMatrix <- ratingMatrix > 5
> dist(binaryRatingMatrix, method = 'jaccard')
            oliver  thibault     maria     pedro      ines
thibault 0.6000000                                        
maria    0.2500000 0.5000000                              
pedro    0.5000000 0.6666667 0.6666667                    
ines     0.0000000 0.6000000 0.2500000 0.5000000          
gertrude 1.0000000 0.7500000 1.0000000 0.8333333 1.0000000
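To confirm one of these entries by hand, take Oliver and Maria: above the 5-point threshold, Oliver likes restaurants 5 through 8 while Maria likes restaurants 5, 6, and 8, so their intersection contains three restaurants and their union four:

```r
oliver <- c(1, 1, 2, 5, 7, 8, 9, 7) > 5
maria  <- c(1, 4, 2, 5, 8, 6, 2, 8) > 5

# One minus |intersection| / |union| of the positively rated positions
jaccard_distance <- function(a, b) {
  1 - sum(a & b) / sum(a | b)
}
jaccard_distance(oliver, maria)  # 1 - 3/4 = 0.25
```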

Note

The study of measurement and distance metrics is broad and there are many suitable metrics that have been applied to the recommender system setting. The definitive reference for distance metrics is the Encyclopedia of Distances, Michel Marie Deza and Elena Deza, Springer.
