Predicting recommendations for movies and jokes

In this chapter, we will build recommender systems using two different data sets. To do this, we shall use the recommenderlab package, which provides not only the algorithms that generate recommendations but also the data structures needed to store sparse rating matrices efficiently. The first data set contains anonymous user ratings of jokes from the Jester Online Joke Recommender System.

The joke ratings fall on a continuous scale (-10 to +10). A number of data sets collected from the Jester system can be found at http://eigentaste.berkeley.edu/dataset/. We will use the data set labeled on the website as Dataset 2+. This data set contains ratings made by 50,692 users on 150 jokes. As is typical with a real-world application, the rating matrix is very sparse in that each user rated only a fraction of all the jokes; the minimum number of ratings made by a user is 8. We will refer to this data set as the jester data set.
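To make the idea of a sparse rating matrix concrete, the following is a minimal sketch on made-up data rather than the jester data set itself. The user names (u1 to u3), joke names (j1 to j4), and rating values are hypothetical and chosen only for illustration. It assumes the recommenderlab package is installed, and it uses the package's realRatingMatrix class, which stores only the ratings that were actually made.

```r
library(recommenderlab)

# Hypothetical toy data: 3 users x 4 jokes on the -10 to +10 scale.
# NA marks a joke that the user did not rate.
ratings <- matrix(
  c( 7.5,   NA, -3.2,   NA,
      NA,  1.0,   NA,   NA,
    -9.0,   NA,   NA,  4.4),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("u", 1:3), paste0("j", 1:4))
)

# Coerce to recommenderlab's sparse container; the NA entries are not stored.
r <- as(ratings, "realRatingMatrix")

# Total number of stored ratings, and the number of ratings made by each user.
n_stored <- nratings(r)
per_user <- rowCounts(r)

# Fraction of the user-by-joke grid that is filled in.
density <- n_stored / (nrow(r) * ncol(r))

print(n_stored)
print(per_user)
print(density)
```

On a real rating matrix, the same calls give a quick view of how sparse the data is, and the minimum of rowCounts() corresponds to the "minimum number of ratings made by a user" quoted above.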

The second data set can be found at http://grouplens.org/datasets/movielens/. This website contains data on user ratings for movies that were made on the MovieLens website at http://movielens.org. Again, there is more than one data set on the website; we will use the one labeled MovieLens 1M. This contains ratings on a five-point scale (1-5) made by 6,040 users on 3,706 movies. The minimum number of movie ratings per user is 20. We will refer to this data set as the movie data set.

Tip

These two data sets are well-known, openly available data sets, to the point that the recommenderlab package includes smaller versions of them. Readers who would like to skip the process of loading and preprocessing the data, or who would like to run the examples that follow on smaller data sets due to computational constraints, are encouraged to try them out using data(Jester5k) or data(MovieLense).
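As a rough sketch of what that looks like in practice, the snippet below loads both bundled data sets and inspects them. It assumes recommenderlab is installed. Because these are smaller versions of the data, the user counts, item counts, and minimum ratings per user that they report will differ from the figures quoted for the full data sets.

```r
library(recommenderlab)

# Load the smaller data sets that ship with recommenderlab.
data(Jester5k)
data(MovieLense)

# Printing an object shows its class, dimensions, and number of ratings.
Jester5k
MovieLense

# Users x items for each bundled data set.
dim(Jester5k)
dim(MovieLense)

# Fewest ratings made by any single user in each bundled data set.
min(rowCounts(Jester5k))
min(rowCounts(MovieLense))

# Observed range of the rating values in each bundled data set.
range(getRatings(Jester5k))
range(getRatings(MovieLense))
```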
