Methods for recommendation

In the previous section, we described the use case of building a movie recommendation engine for the company ZHO and prepared SPSS on the Spark computing platform. In this section, as before, we need to select the analytical methods (equations) for this movie recommendation project, which again means mapping our use case to machine learning methods.

For this exercise, we will use collaborative filtering because this method is well developed and has been tested on many recommendation projects. The analytical processes and algorithms for it are also mature, and implementations are available in R as well as in MLlib.

Following the same methodology as before, once we have settled on our analytical methods or models, we need to prepare the code.

Collaborative filtering

Collaborative filtering is a method very commonly used to build recommender systems. Simply put, it produces predictions (filtering) about the interests of a user from the preferences of many other users (collaborating). The underlying assumption of this approach is as follows:

If user A has the same opinion as user B on a movie, user A is more likely to share user B's opinion on a different movie x than to share the opinion on x of a randomly chosen user.
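
To make this assumption concrete, here is a minimal sketch in plain Python. It uses a neighborhood-based variant of collaborative filtering (a similarity-weighted average), which is not the model-based approach that MLlib implements. The toy ratings, the user names, and the helper functions are all hypothetical and exist only for illustration.

```python
from math import sqrt

# Hypothetical toy data: user -> {movie: rating on a 1-5 scale}
ratings = {
    "A": {"m1": 5.0, "m2": 1.0},
    "B": {"m1": 5.0, "m2": 1.0, "x": 4.0},
    "C": {"m1": 1.0, "m2": 5.0, "x": 1.0},
}

def cosine_similarity(u, v):
    """Cosine similarity computed over the movies both users have rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[m] * v[m] for m in shared)
    norm_u = sqrt(sum(u[m] ** 2 for m in shared))
    norm_v = sqrt(sum(v[m] ** 2 for m in shared))
    return dot / (norm_u * norm_v)

def predict(user, movie):
    """Similarity-weighted average of the other users' ratings for `movie`."""
    weighted, total = 0.0, 0.0
    for other, their_ratings in ratings.items():
        if other == user or movie not in their_ratings:
            continue
        sim = cosine_similarity(ratings[user], their_ratings)
        weighted += sim * their_ratings[movie]
        total += sim
    return weighted / total if total else None

sim_ab = cosine_similarity(ratings["A"], ratings["B"])
sim_ac = cosine_similarity(ratings["A"], ratings["C"])
prediction = predict("A", "x")
print(sim_ab, sim_ac, prediction)
```

Because user A agrees with user B on the movies they have both rated, B's rating of x carries more weight in the prediction for A than C's rating does.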

Specifically, the techniques of collaborative filtering here aim to fill in the missing entries of a user-movie association matrix. MLlib currently supports model-based collaborative filtering, in which users and movies are modeled by a set of latent factors that can be used to predict missing entries.
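
As a rough illustration of how latent factors can fill in missing entries, the following plain-Python sketch predicts a rating as the dot product of a user's factor vector and a movie's factor vector. The factor values, user names, and movie names are invented for the example; in practice the factors would be learned from the data rather than written by hand.

```python
# Hypothetical latent factors with rank = 2 (values invented for illustration)
user_factors = {"alice": [1.2, 0.3], "bob": [0.2, 1.5]}
movie_factors = {"action_film": [3.0, 0.5], "drama_film": [0.4, 2.8]}

# Observed entries of the user-movie matrix; all other pairs are missing
observed = {("alice", "action_film"): 4.0, ("bob", "drama_film"): 4.5}

def predict(user, movie):
    """Predicted rating: dot product of the user and movie factor vectors."""
    return sum(u * m for u, m in zip(user_factors[user], movie_factors[movie]))

# Keep observed ratings and fill every missing entry with a prediction
filled = {}
for user in user_factors:
    for movie in movie_factors:
        key = (user, movie)
        filled[key] = observed[key] if key in observed else predict(user, movie)

for key in sorted(filled):
    print(key, round(filled[key], 2))
```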

MLlib uses the Alternating Least Squares (ALS) algorithm to learn these latent factors. Its implementation in MLlib has the following parameters:

  • numBlocks is the number of blocks used to parallelize computation (set to -1 to autoconfigure)
  • rank is the number of latent factors in the model
  • iterations is the number of iterations to run
  • lambda specifies the regularization parameter in ALS
  • implicitPrefs specifies whether to use the explicit feedback ALS variant or one adapted for implicit feedback data
  • alpha is a parameter applicable to the implicit feedback variant of ALS that governs the baseline confidence in preference observations
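
To show what the iterations and lambda parameters control, here is a deliberately simplified sketch of alternating least squares in plain Python. It is restricted to a single latent factor (rank = 1) so that each least squares step has a closed form, and it is not MLlib's implementation: it has no blocking or parallelism, and the way regularization is applied is an assumption of this sketch. The function names and toy ratings are hypothetical; the regularization argument is spelled lambda_ because lambda is a reserved word in Python.

```python
def als_rank_one(ratings, iterations=10, lambda_=0.01):
    """Rank-1 ALS on a dict {(user, movie): rating}; returns (user_f, movie_f)."""
    users = sorted({u for u, _ in ratings})
    movies = sorted({m for _, m in ratings})
    user_f = {u: 1.0 for u in users}
    movie_f = {m: 1.0 for m in movies}
    for _ in range(iterations):
        # Hold the movie factors fixed and solve for each user factor
        for u in users:
            num = sum(r * movie_f[m] for (uu, m), r in ratings.items() if uu == u)
            den = lambda_ + sum(movie_f[m] ** 2 for (uu, m) in ratings if uu == u)
            user_f[u] = num / den
        # Hold the user factors fixed and solve for each movie factor
        for m in movies:
            num = sum(r * user_f[u] for (u, mm), r in ratings.items() if mm == m)
            den = lambda_ + sum(user_f[u] ** 2 for (u, mm) in ratings if mm == m)
            movie_f[m] = num / den
    return user_f, movie_f

def squared_error(ratings, user_f, movie_f):
    """Sum of squared differences between observed and reconstructed ratings."""
    return sum((r - user_f[u] * movie_f[m]) ** 2 for (u, m), r in ratings.items())

# Hypothetical toy data; the rating of movie 20 by user 2 is missing
toy_ratings = {(1, 10): 4.0, (1, 20): 2.0, (2, 10): 2.0}
user_f, movie_f = als_rank_one(toy_ratings, iterations=20, lambda_=0.01)
error = squared_error(toy_ratings, user_f, movie_f)
missing_prediction = user_f[2] * movie_f[20]
print(error, missing_prediction)
```

More iterations give the alternating updates more time to converge, while a larger lambda_ shrinks the factors toward zero and trades training fit for less overfitting.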

    Note

    The standard approach to matrix factorization-based collaborative filtering treats the entries in the user-item matrix as explicit preferences given by the user to the item. However, it is common in many real-world use cases to only have access to implicit feedback (for example, views, clicks, purchases, likes, shares, and so on). Essentially, instead of trying to model the matrix of ratings directly, this approach treats the data as a combination of binary preferences and confidence values. The ratings are then related to the level of confidence in observed user preferences rather than the explicit ratings given to the items.

    A detailed MLlib guide to collaborative filtering can be found at http://spark.apache.org/docs/latest/mllib-collaborative-filtering.html.
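
The split into binary preferences and confidence values described in the note can be sketched as follows. The formula used here, confidence = 1 + alpha * count, is one common formulation from the implicit feedback literature; whether MLlib applies exactly this form internally is an assumption of this sketch, and the view counts and function name are hypothetical.

```python
def to_preference_and_confidence(raw_count, alpha=0.01):
    """Map a raw interaction count to a (binary preference, confidence) pair."""
    preference = 1.0 if raw_count > 0 else 0.0
    # alpha sets how quickly confidence grows with the number of observations
    confidence = 1.0 + alpha * raw_count
    return preference, confidence

# Hypothetical implicit feedback: (user, movie) -> number of views
views = {("u1", "m1"): 0, ("u1", "m2"): 3, ("u2", "m1"): 40}

transformed = {
    key: to_preference_and_confidence(count, alpha=0.01)
    for key, count in views.items()
}

for key in sorted(transformed):
    print(key, transformed[key])
```

A movie that was never viewed gets preference 0 with only the baseline confidence, while a movie viewed many times gets preference 1 with a higher confidence.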

Preparing coding

In our data, each row consists of a user, a movie, and a rating. Here, we will use the default ALS.train() method, which assumes that the ratings are explicit. We evaluate the recommendations by measuring the mean squared error (MSE) of the rating predictions. Take a look at the following code:

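from pyspark.mllib.recommendation import ALS

# `ratings` is assumed to be an RDD of (user, movie, rating) records prepared earlier

# Build the recommendation model using Alternating Least Squares
rank = 10
numIterations = 10
model = ALS.train(ratings, rank, numIterations)

# Evaluate the model on training data
# Keep only the (user, movie) pairs so that the model can predict their ratings
testdata = ratings.map(lambda p: (p[0], p[1]))
predictions = model.predictAll(testdata).map(lambda r: ((r[0], r[1]), r[2]))
# Join actual and predicted ratings on the (user, movie) key
ratesAndPreds = ratings.map(lambda r: ((r[0], r[1]), r[2])).join(predictions)
MSE = ratesAndPreds.map(lambda r: (r[1][0] - r[1][1])**2).mean()
print("Mean Squared Error = " + str(MSE))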
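
The join-and-average step in the evaluation can be checked by hand on a few records. The following plain-Python sketch mirrors that logic without Spark; the actual and predicted ratings are hypothetical values chosen so that the arithmetic is easy to follow.

```python
# Hypothetical (user, movie) -> rating pairs
actual = {(1, 10): 4.0, (1, 20): 2.0, (2, 10): 5.0}
predicted = {(1, 10): 3.5, (1, 20): 2.5, (2, 10): 4.0, (2, 20): 3.0}

# Inner join on the (user, movie) key: keep only pairs present in both
joined = {key: (actual[key], predicted[key]) for key in actual if key in predicted}

# Mean of the squared differences between actual and predicted ratings
squared_errors = [(a - p) ** 2 for a, p in joined.values()]
mse = sum(squared_errors) / len(squared_errors)
print("Mean Squared Error = " + str(mse))
```

Note that an error measured on the training data tends to be optimistic; scoring on ratings held out from training usually gives a more realistic estimate.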

If the rating matrix is derived from another source of information (that is, it is inferred from other signals such as views or clicks), you can use the trainImplicit method to get better results, as follows:

# Build the recommendation model using Alternating Least Squares based on implicit ratings
model = ALS.trainImplicit(ratings, rank, numIterations, alpha=0.01)