Recommendation system (Advanced)

In this recipe, we will implement a recommendation system. Typically, there are two ways to provide recommendations: "items frequently bought together" and "customers who bought this item also bought…". The first problem can be solved with association rules, as shown in the Association rules recipe. The second problem is trickier, and this recipe will show you how to address it with a technique called collaborative filtering.

Our task will be to implement a movie recommendation system. The dataset consists of user ratings on a scale of 1-10 for the first 87 movies in the IMDb Top 250 list. If a user provides ratings for a couple of movies, the system can say: "Users who liked these movies also liked…".

Note

The presented approach is inspired by Marina Barsky's lecture materials (http://csci.viu.ca/~barskym/).

Getting ready

The dataset is available in the source code bundle. The first file, movieRatings.arff, contains user ratings for each movie on a scale of 1-10. Each attribute corresponds to a movie, while each data line holds the ratings of one user. A missing rating is stored as 0.

The second file, user.arff, has the same attribute structure, but a single data line that corresponds to ratings of the current user. Our task is to provide recommendations to this user.

How to do it...

The plan is as follows. First, we will load the movieRatings.arff and user.arff datasets into Weka. Then, we will find the five users in movieRatings.arff whose tastes are most similar to those of our current user in user.arff. Finally, we will rank the movies that the current user has not yet rated:

import weka.core.*;
import weka.core.converters.ConverterUtils.DataSource;
import weka.core.neighboursearch.LinearNNSearch;
import java.io.*;
import java.util.*;

public class Recommender {
  public static void main(String[] args) throws Exception {

    DataSource source = new DataSource("dataset/movieRatings.arff");
    Instances dataset = source.getDataSet();
    
    source = new DataSource("dataset/user.arff");
    Instances userRating = source.getDataSet();
    Instance userData = userRating.firstInstance();

    LinearNNSearch kNN = new LinearNNSearch(dataset);
    Instances neighbors = null;
    double[] distances = null;

    try {
      neighbors = kNN.kNearestNeighbours(userData, 5);
      distances = kNN.getDistances();
    } catch (Exception e) {
      System.out.println("Neighbors could not be found.");
      return;
    }

    double[] similarities = new double[distances.length];
    for (int i = 0; i < distances.length; i++) {
      similarities[i] = 1.0 / distances[i];
    }

    Map<String, List<Integer>> recommendations = new HashMap<String, List<Integer>>();
    for (int i = 0; i < neighbors.numInstances(); i++) {
      Instance currNeighbor = neighbors.get(i);

      for (int j = 0; j < currNeighbor.numAttributes(); j++) {
        if (userData.value(j) < 1) {
          String attrName = userData.attribute(j).name();
          List<Integer> lst = new ArrayList<Integer>();
          if (recommendations.containsKey(attrName)) {
            lst = recommendations.get(attrName);
          }
          
          lst.add((int)currNeighbor.value(j));
          recommendations.put(attrName, lst);
        }
      }

    }

    List<RecommendationRecord> finalRanks = new ArrayList<RecommendationRecord>();

    Iterator<String> it = recommendations.keySet().iterator();
    while (it.hasNext()) {
      String atrName = it.next();
      double totalImpact = 0;
      double weightedSum = 0;
      List<Integer> ranks = recommendations.get(atrName);
      for (int i = 0; i < ranks.size(); i++) {
        int val = ranks.get(i);
        totalImpact += similarities[i];
        weightedSum += (double) similarities[i] * val;
      }
      RecommendationRecord rec = new RecommendationRecord();
      rec.attributeName = atrName;
      rec.score = weightedSum / totalImpact;

      finalRanks.add(rec);
    }
    Collections.sort(finalRanks);

    // print top 3 recommendations
    System.out.println(finalRanks.get(0));
    System.out.println(finalRanks.get(1));
    System.out.println(finalRanks.get(2));
  }
}

The outputs are the top three movie recommendations:

The Great Dictator (1940): 1.8935866254179377
The Lord of the Rings: The Fellowship of the Ring (2001): 1.7664942763356077
Schindler s List (1993): 1.5456824917936567

The recommendations (and their scores) can now be displayed to the user.

How it works...

First, make the following imports:

import weka.core.*;
import weka.core.converters.ConverterUtils.DataSource;
import weka.core.neighboursearch.LinearNNSearch;
import java.io.*;
import java.util.*;

Note that we have imported weka.core.neighboursearch.LinearNNSearch; it will help us find users with similar tastes.

Start a new class and import the movie ratings dataset, as well as ratings of our current user:

public class Recommender {
  public static void main(String[] args) throws Exception {

    // read learning dataset
    DataSource source = new DataSource("dataset/movieRatings.arff");
    Instances dataset = source.getDataSet();
   
    // read user data
    source = new DataSource("dataset/user.arff");
    Instances userRating = source.getDataSet();
    Instance userData = userRating.firstInstance();

Initialize a nearest-neighbor search. This is a brute force search algorithm that finds the nearest neighbors of the given instance. It is initialized with the movie ratings dataset:

    LinearNNSearch kNN = new LinearNNSearch(dataset);
    Instances neighbors = null;
    double[] distances = null;

    try {

Call the kNN.kNearestNeighbours(Instance, int) method to perform a nearest-neighbor search. We pass the userData instance of our current user and ask for the five nearest neighbors:

      neighbors = kNN.kNearestNeighbours(userData, 5);

Call kNN.getDistances() to obtain the double[] array that specifies how far away each of the neighbors is:

      distances = kNN.getDistances();
    } catch (Exception e) {
      System.out.println("Neighbors could not be found.");
      return;
    }
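To make the idea of a brute-force search concrete, here is a minimal plain-Java sketch that compares the query against every row and keeps the k closest. It is only an illustration: the class and method names are hypothetical, the rating values are made up, and it uses an unnormalized Euclidean distance, whereas Weka's own distance function may normalize attribute values first.

```java
import java.util.Arrays;
import java.util.Comparator;

// Illustrative only: a plain-Java stand-in for what a linear (brute-force)
// nearest-neighbor search does. This is not Weka's implementation.
class BruteForceNeighbours {

  // Plain Euclidean distance between two rating vectors of equal length.
  static double euclidean(double[] a, double[] b) {
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      double diff = a[i] - b[i];
      sum += diff * diff;
    }
    return Math.sqrt(sum);
  }

  // Returns the row indices of the k rows closest to the query, nearest first.
  static int[] nearest(double[][] rows, double[] query, int k) {
    Integer[] order = new Integer[rows.length];
    for (int i = 0; i < rows.length; i++) {
      order[i] = i;
    }
    // Every row is compared against the query; there is no index structure.
    Arrays.sort(order,
        Comparator.comparingDouble((Integer i) -> euclidean(rows[i], query)));
    int[] result = new int[Math.min(k, rows.length)];
    for (int i = 0; i < result.length; i++) {
      result[i] = order[i];
    }
    return result;
  }

  public static void main(String[] args) {
    double[][] ratings = {
      {9, 0, 7},  // hypothetical user 0
      {1, 8, 0},  // hypothetical user 1
      {8, 0, 6}   // hypothetical user 2
    };
    double[] current = {9, 1, 7};
    System.out.println(Arrays.toString(nearest(ratings, current, 2))); // prints [0, 2]
  }
}
```

Because every row is visited, the cost grows linearly with the number of users, which is acceptable for a small ratings file like the one used here.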

We define the similarity between users as 1/distance; that is, the bigger the distance, the smaller the similarity. We will use the similarities to weigh how much each neighbor's preferences contribute to the overall movie rating:

    double[] similarities = new double[distances.length];
    for (int i = 0; i < distances.length; i++) {
      similarities[i] = 1.0 / distances[i];
      //System.out.println(similarities[i]);
    }
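One edge case is worth keeping in mind: if a neighbor's ratings are identical to the current user's, the distance is 0, and in Java 1.0 / 0.0 evaluates to Infinity, which later turns the weighted average into NaN. The following sketch shows one possible guard. The helper class and the EPSILON constant are assumptions made for illustration; they are not part of Weka or of the recipe's code.

```java
// Illustrative helper, not part of Weka or the original recipe.
class SimilarityWeights {

  // Small constant (an assumed value) added so that a zero distance
  // does not produce an infinite similarity.
  static final double EPSILON = 1e-9;

  // Converts distances to similarities: larger distance, smaller weight.
  static double[] fromDistances(double[] distances) {
    double[] similarities = new double[distances.length];
    for (int i = 0; i < distances.length; i++) {
      similarities[i] = 1.0 / (distances[i] + EPSILON);
    }
    return similarities;
  }
}
```

With this guard, an identical neighbor receives a very large but finite weight and therefore dominates the score instead of breaking it.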

Now we are ready to rank the movies. We will use a double loop. The outer loop goes over each neighbor, while the inner loop goes over each movie. If the current user has not rated the movie, we collect the rating provided by the neighbor and store it in a HashMap<String, List<Integer>>. The String key is the movie title, and the list of integers holds the ratings given to that movie by the neighbors, in neighbor order:

    Map<String, List<Integer>> recommendations = new HashMap<String, List<Integer>>();

The first loop iterates over each neighbor:

    for (int i = 0; i < neighbors.numInstances(); i++) {
      Instance currNeighbor = neighbors.get(i);

The second loop iterates over each movie:

      for (int j = 0; j < currNeighbor.numAttributes(); j++) {

Check whether the movie has not been rated by the current user (a missing rating is stored as 0):

        if (userData.value(j) < 1) {

Retrieve the name of the movie:

          String attrName = userData.attribute(j).name();

Initialize a new integer list (if this is the first neighbor) or use an existing one:

          List<Integer> lst = new ArrayList<Integer>();
          if (recommendations.containsKey(attrName)) {
            lst = recommendations.get(attrName);
          }

Append the neighbor's rating to the list of ratings for the current movie:

          
          lst.add((int)currNeighbor.value(j));

Save the ratings in the hashmap:

          recommendations.put(attrName, lst);
        }
      }

    }
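The get-or-create pattern used in the loops above can be written more compactly with Map.computeIfAbsent, which is available from Java 8 onward. The sketch below reproduces the collection step on plain arrays so that it runs without Weka; the class name, method name, and parameters are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: the collection step on plain arrays instead of Weka instances.
class RatingCollector {

  // For every movie the user has not rated, gathers the neighbors' ratings
  // for that movie, in neighbor order.
  static Map<String, List<Integer>> collect(
      String[] titles, double[] user, double[][] neighbours) {
    Map<String, List<Integer>> recommendations = new HashMap<>();
    for (double[] neighbour : neighbours) {
      for (int j = 0; j < titles.length; j++) {
        if (user[j] < 1) { // 0 marks a movie the current user has not rated
          recommendations
              .computeIfAbsent(titles[j], key -> new ArrayList<>())
              .add((int) neighbour[j]);
        }
      }
    }
    return recommendations;
  }
}
```

Note that a neighbor's own missing rating (0) is still appended, so every list has exactly one entry per neighbor. The scoring step relies on this, because it looks up each neighbor's similarity by list position.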

We have now collected the neighbors' ratings for each of the movies. The next task is to compute the final recommendation score for each movie. We will simply sum the neighbors' ratings, weighted by similarity, so that more similar neighbors contribute more to the overall recommendation score.

We create a new list of the RecommendationRecord objects (implemented in the source code bundle), which can store movie titles and scores:

    List<RecommendationRecord> finalRanks = new ArrayList<RecommendationRecord>();

Now, we loop over the hashmap keys, that is, the movies:

    Iterator<String> it = recommendations.keySet().iterator();
    while (it.hasNext()) {
      String atrName = it.next();
      double totalImpact = 0;
      double weightedSum = 0;

For each movie, loop over the neighbors' ratings:

      List<Integer> ranks = recommendations.get(atrName);
      for (int i = 0; i < ranks.size(); i++) {

Get a neighbor's rating:

        int val = ranks.get(i);

Accumulate similarity weight:

        totalImpact += similarities[i];

Accumulate recommendation score:

        weightedSum += (double) similarities[i] * val;
      }

Create a new RecommendationRecord object, store the movie title and normalized score:

      RecommendationRecord rec = new RecommendationRecord();
      rec.attributeName = atrName;
      rec.score = weightedSum / totalImpact;

      finalRanks.add(rec);
    }
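The score computed in this loop is a similarity-weighted average of the neighbors' ratings. As a quick check on small numbers, the following standalone sketch repeats the same arithmetic; the class name and the sample values are made up for illustration.

```java
// Illustrative only: the weighted-average arithmetic from the loop above.
class WeightedScore {

  // Similarity-weighted average of the neighbors' ratings for one movie.
  static double score(int[] ratings, double[] similarities) {
    double totalImpact = 0.0;
    double weightedSum = 0.0;
    for (int i = 0; i < ratings.length; i++) {
      totalImpact += similarities[i];
      weightedSum += similarities[i] * ratings[i];
    }
    return weightedSum / totalImpact;
  }

  public static void main(String[] args) {
    // Hypothetical neighbors at distances 1, 2 and 4; the third has not rated the movie.
    int[] ratings = {8, 6, 0};
    double[] similarities = {1.0, 0.5, 0.25};
    // (1.0 * 8 + 0.5 * 6 + 0.25 * 0) / (1.0 + 0.5 + 0.25) = 11 / 1.75, roughly 6.29
    System.out.println(score(ratings, similarities));
  }
}
```

In this scheme a neighbor's missing rating counts as a 0 and pulls the average down, which may be why the scores in the sample output are low relative to the 1-10 rating scale.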

After all the recommendation scores are computed, sort the collection of RecommendationRecord objects (RecommendationRecord implements the Comparable interface):

    Collections.sort(finalRanks);
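The RecommendationRecord class itself ships with the source code bundle and is not listed here. The following is only a guess at what a minimal version might look like, inferred from how the class is used in this recipe: the attributeName and score fields, a toString() that prints "title: score", and an ordering that puts the highest score first so that index 0 holds the best recommendation.

```java
// Hypothetical reconstruction; the actual class is in the source code bundle.
class RecommendationRecord implements Comparable<RecommendationRecord> {
  String attributeName; // movie title (the ARFF attribute name)
  double score;         // similarity-weighted rating

  // Higher scores sort first, so Collections.sort() yields a descending ranking.
  @Override
  public int compareTo(RecommendationRecord other) {
    return Double.compare(other.score, this.score);
  }

  // Matches the "title: score" shape of the sample output.
  @Override
  public String toString() {
    return attributeName + ": " + score;
  }
}
```

Using Double.compare avoids the rounding pitfalls of subtracting two doubles and casting the difference to int.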

Finally, print the top three recommendations with their scores:

    // print top 3 recommendations
    System.out.println(finalRanks.get(0));
    System.out.println(finalRanks.get(1));
    System.out.println(finalRanks.get(2));
  }
}

That's it. The preceding recipe can be easily modified to recommend other items such as music, books, grocery items, and so on.
