Content-based filtering

Content-based filtering is outside the scope of the Mahout framework, mainly because it is up to you to decide how to define similar items. To compute content-based item similarity, we need to implement our own ItemSimilarity. For instance, in our book dataset, we might define the following rule for book similarity:

  • If the genres are the same, add 0.15 to similarity
  • If the author is the same, add 0.50 to similarity

We can now implement our own similarity measure, as follows:

import org.apache.mahout.cf.taste.similarity.ItemSimilarity;

class MyItemSimilarity implements ItemSimilarity {
 ...
 @Override
 public double itemSimilarity(long itemID1, long itemID2) {
  // Look up the book metadata for both item IDs
  MyBook book1 = lookupMyBook(itemID1);
  MyBook book2 = lookupMyBook(itemID2);
  double similarity = 0.0;
  // Same genre: add 0.15
  if (book1.getGenre().equals(book2.getGenre())) {
   similarity += 0.15;
  }
  // Same author: add 0.50
  if (book1.getAuthor().equals(book2.getAuthor())) {
   similarity += 0.50;
  }
  return similarity;
 }
 ...
}

We can then pass this ItemSimilarity to a GenericItemBasedRecommender in place of LogLikelihoodSimilarity or another built-in implementation. That is all it takes to perform content-based recommendation in the Mahout framework.

What we saw here is one of the simplest forms of content-based recommendation. Another approach is to create a content-based profile for each user, based on a weighted vector of item features. The weights denote the importance of each feature to the user, and can be computed from the content vectors of the items the user has individually rated.
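The profile idea can be illustrated with a minimal, Mahout-independent sketch. Everything in it is an assumption made for illustration: the class name ContentProfileSketch, the three binary features (fantasy, crime, a hypothetical "author A"), the sample ratings, the choice of a rating-weighted average to compute the weights, and the use of cosine similarity to score candidate items against the profile. Other weighting and scoring schemes are equally valid.

```java
public class ContentProfileSketch {

    /**
     * Builds a user profile as the rating-weighted average of the feature
     * vectors of the items the user rated:
     * profile[f] = sum_i(rating_i * feature_i[f]) / sum_i(rating_i)
     */
    static double[] buildProfile(double[][] itemFeatures, double[] ratings) {
        int numFeatures = itemFeatures[0].length;
        double[] profile = new double[numFeatures];
        double ratingSum = 0.0;
        for (int i = 0; i < itemFeatures.length; i++) {
            for (int f = 0; f < numFeatures; f++) {
                profile[f] += ratings[i] * itemFeatures[i][f];
            }
            ratingSum += ratings[i];
        }
        if (ratingSum > 0.0) {
            for (int f = 0; f < numFeatures; f++) {
                profile[f] /= ratingSum;
            }
        }
        return profile;
    }

    /** Cosine similarity between two vectors; returns 0 if either is all zeros. */
    static double cosine(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Hypothetical feature layout: [fantasy, crime, written by author A]
        double[][] ratedBooks = {
            {1, 0, 1},   // a fantasy book by author A
            {0, 1, 0}    // a crime book by someone else
        };
        double[] ratings = {5.0, 1.0};   // the user liked the first, not the second

        double[] profile = buildProfile(ratedBooks, ratings);

        // Score two unseen candidate books against the profile
        double[] fantasyByA = {1, 0, 1};
        double[] crimeByOther = {0, 1, 0};
        System.out.println("fantasy by A: " + cosine(profile, fantasyByA));
        System.out.println("crime by other: " + cosine(profile, crimeByOther));
    }
}
```

With these sample ratings the profile puts most of its weight on the fantasy and author features, so the fantasy candidate scores higher than the crime candidate. In a Mahout setting, a score like this could feed a custom similarity or rescoring step, but that wiring is not shown here.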
