Netflix – making recommendations 

No ML book is complete without mentioning recommendation engines—probably one of the most well known applications of ML. In part, this is thanks to the publicity gained when Netflix announced a $1 million competition for movie rating predictions, also known as recommendations. Add to this Amazon's commercial success in making use of it. 

The goal of recommendation engines is to predict the likelihood of someone wanting a particular product or service. In the context of Netflix, this would mean recommending movies or TV shows to its users.

One intuitive way of making recommendations is to try and mimic the real world, where a person is likely to seek recommendations from like-minded people. What constitutes likeness is dependent on the domain. For example, you are most likely to have one group of friends that you would ask for restaurant recommendations and another group of friends for movie recommendations. What determines these groups is how similar their tastes are to your own taste for that particular domain. We can replicate this using the (user-based) Collaborative Filtering (CF) algorithm. This algorithm achieves this by finding the distance between each user and then using these distances as a similarity metric to infer predictions on movies for a particular user; that is, those that are more similar will contribute more to the prediction than those that have different preferences. Let's have a look at what form the data might take from Netflix: 

User Movie Rating
0: Jo A: Monsters Inc 5
B: The Bourne Identity 2
C: The Martian 2
D: Blade Runner 1
1: Sam C: The Martian 4
D: Blade Runner 4
E: The Matrix  4
F: Inception 5
2: Chris B: The Bourne Identity 4
C: The Martian 5
D: Blade Runner 5
F: Inception 4

 

For each example, we have a user, a movie, and an assigned rating. To find the similarity between each user, we can first calculate the Euclidean distance of the shared movies between each pair of users. The Euclidean distance gives us larger values for users who are most dissimilar; we invert this by dividing 1 by this distance to give us a result, where 1 represents perfect matches and 0 means the users are most dissimilar. The following is the formula for Euclidean distance and the function used to calculate similarities between two users:

Equation for Euclidian distance and similarity 
func calcSimilarity(userRatingsA: [String:Float], userRatingsB:[String:Float]) -> Float{
var distance = userRatingsA.map( { (movieRating) -> Float in
if userRatingsB[movieRating.key] == nil{
return 0
}
let diff = movieRating.value - (userRatingsB[movieRating.key] ?? 0)
return diff * diff
}).reduce(0) { (prev, curr) -> Float in
return prev + curr
}.squareRoot()
return 1 / (1 + distance)
}

To make this more concrete, let's walk through how we can find the most similar user for Sam, who has rated the following movies: ["The Martian" : 4, "Blade Runner" : 4, "The Matrix" : 4, "Inception" : 5]. Let's now calculate the similarity between Sam and Jo and then between Sam and Chris. 

Sam and Jo 

Jo has rated the movies ["Monsters Inc." : 5, "The Bourne Identity" : 2, "The Martian" : 2, "Blade Runner" : 1]; by calculating the similarity of intersection of the two sets of ratings for each user, we get a value of 0.22.

Sam and Chris 

Similar to the previous ones, but now, by calculating the similarity using the movie ratings from Chris (["The Bourne Identity" : 4, "The Martian" : 5, "Blade Runner" : 5, "Inception" : 4]), we get a value of 0.37.

Through manual inspection, we can see that Chris is more similar to Sam than Jo is, and our similarity rating shows this by giving Chris a higher value than Jo.

To help illustrate why this works, let's project the ratings of each user onto a chart as shown in the following graph:

The preceding graph shows the users plotted in a preference space; the closer two users are in this preference space, the more similar their preferences are. Here, we are just showing two axes, but, as seen in the preceding table, this extends to multiple dimensions. 

We can now use these similarities as weights that contribute to predicting the rating a particular user would give to a particular movie. Then, using these predictions, we can recommend some movies that a user is likely to want to watch.

The preceding approach is a type of clustering algorithm that falls under unsupervised learning, a learning style where examples have no associated label and the job of the ML algorithm is to find patterns within the data. Other common unsupervised learning algorithms include the Apriori algorithm (basket analysis) and K-means.

Recommendations are applicable anytime when there is an abundance of information that can benefit from being filtered and ranked before being presented to the user. Having recommendations performed on the device offers many benefits, such as being able to incorporate the context of the user when filtering and ranking the results.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.119.122.82