Evaluation

You might be wondering how to make sure that the returned recommendations make sense. The only way to be truly sure of how effective recommendations are is to use A/B testing in a live system, with real users. For example, the A group receives a random item as a recommendation, while the B group receives an item recommended by our engine.

As this is not always possible (or practical), we can get an estimate with offline statistical evaluation. One way to proceed is to use k-fold cross-validation, which was introduced in Chapter 1, Applied Machine Learning Quick Start. We partition a dataset into multiple sets; some are used to train our recommendation engine, and the rest are used to test how well it predicts preferences it has not seen.

Mahout provides the RecommenderEvaluator interface, which splits a dataset into two parts. The first part (90% in the example that follows) is used to produce recommendations, while the rest of the data is compared against the estimated preference values in order to test the match. The evaluator does not accept a recommender object directly; instead, you need to supply a class that implements the RecommenderBuilder interface, which builds a recommender object for a given DataModel object that is then used for testing. Let's take a look at how this is implemented.

First, we create a class that implements the RecommenderBuilder interface. We need to implement the buildRecommender method, which will return a recommender, as follows:

public class BookRecommender implements RecommenderBuilder { 
  @Override 
  public Recommender buildRecommender(DataModel dataModel) 
      throws TasteException { 
    // Pearson correlation between users' rating vectors 
    UserSimilarity similarity =  
      new PearsonCorrelationSimilarity(dataModel); 
    // Neighborhood of users whose similarity exceeds 0.1 
    UserNeighborhood neighborhood =  
      new ThresholdUserNeighborhood(0.1, similarity, dataModel); 
    UserBasedRecommender recommender =  
      new GenericUserBasedRecommender( 
        dataModel, neighborhood, similarity); 
    return recommender; 
  } 
} 

Now that we have a class that returns a recommender object, we can initialize a RecommenderEvaluator instance. A commonly used implementation of this interface is the AverageAbsoluteDifferenceRecommenderEvaluator class, which computes the average absolute difference between the predicted and actual ratings for users. The following code shows how to put the pieces together and run a hold-out test:

First, load a data model, as follows:

DataModel dataModel = new FileDataModel( 
  new File("/path/to/dataset.csv")); 

Next, initialize an evaluator instance, as follows:

RecommenderEvaluator evaluator =  
  new AverageAbsoluteDifferenceRecommenderEvaluator(); 

Initialize the BookRecommender object, which implements the RecommenderBuilder interface, as follows:

RecommenderBuilder builder = new BookRecommender(); 

Finally, call the evaluate() method, which accepts the following parameters:

  • RecommenderBuilder: This is the object implementing the RecommenderBuilder that can build the recommender to test
  • DataModelBuilder: This indicates the DataModelBuilder to use; if null, a default DataModel implementation will be used
  • DataModel: This is the dataset that will be used for testing
  • trainingPercentage: This indicates the percentage of each user's preferences to use to produce recommendations; the rest are compared to estimated preference values in order to evaluate the performance of the recommender 
  • evaluationPercentage: This is the percentage of users to be used in the evaluation

The method is called as follows:

double result = evaluator.evaluate( 
  builder, null, dataModel, 0.9, 1.0); 
System.out.println(result);

The method returns a double, where 0 represents the best possible evaluation, meaning that the recommender perfectly matches user preferences. In general, the lower the value, the better the match.
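The average absolute difference is not the only available metric. As a minimal sketch, assuming the same BookRecommender class and dataset path used above, Mahout's RMSRecommenderEvaluator can be swapped in to score the same hold-out test with root-mean-square error instead:

// A sketch, separate from the running example: swap the evaluator 
// so the hold-out test is scored with root-mean-square error (RMSE). 
DataModel dataModel = new FileDataModel( 
  new File("/path/to/dataset.csv")); 
RecommenderEvaluator rmseEvaluator =  
  new RMSRecommenderEvaluator(); 
RecommenderBuilder builder = new BookRecommender(); 
double rmse = rmseEvaluator.evaluate( 
  builder, null, dataModel, 0.9, 1.0); 
System.out.println(rmse); 

As with the average absolute difference, a lower RMSE indicates a closer match between the predicted and actual preferences; RMSE simply penalizes large errors more heavily.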
