Introducing the More Like This component and recommendations

What we have seen so far is the ability to look at an entire index and collect statistics about its terms, so that similar terms can be aggregated when needed for a natural and flexible data navigation. We saw that combining advanced search with filters and facets produces good navigation over the data. Furthermore, we can move from searching to matching by focusing on a similarity calculation over Lucene's internal term vectors, which can really help us find interesting documents.
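To give an idea of what a vector-based similarity looks like, here is a minimal sketch of cosine similarity between raw term-frequency vectors. It assumes naive lowercase, whitespace tokenization and plain term counts; Lucene's real term vectors and scoring (analyzers, TF-IDF or BM25 weighting) are considerably richer, and the documents below are made up for illustration.

```python
import math
from collections import Counter


def term_vector(text: str) -> Counter:
    """Build a raw term-frequency vector (lowercased, whitespace-split)."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term-frequency vectors."""
    # Only terms present in both vectors contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical mini-corpus, ranked by similarity to a short query.
docs = {
    "d1": "solr search engine with facets",
    "d2": "lucene search engine library",
    "d3": "baking bread at home",
}
query = term_vector("search engine")
ranked = sorted(
    docs,
    key=lambda d: cosine_similarity(query, term_vector(docs[d])),
    reverse=True,
)
print(ranked)  # ['d2', 'd1', 'd3']
```

The shorter document d2 ranks above d1 because cosine similarity normalizes by vector length: the same two shared terms weigh more in a document with fewer terms overall.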

Starting from the same concept, Solr provides a More Like This component, which is designed to offer the user a selection of interesting, relevant documents similar to the ones returned by a search. In short, Solr can also be used to produce recommendations. As suggested in the presentation Building a real-time, Solr-powered recommendation engine, by Trey Grainger:

http://www.slideshare.net/treygrainger/building-a-real-time-solrpowered-recommendation-engine

Recommendations are not the actual results of a query as such (they may also appear among the results at some point); rather, they are selected by a specific similarity calculation.
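As a rough illustration of how the More Like This component is usually invoked, the following sketch only builds a request URL; it does not contact a server. The parameters `mlt`, `mlt.fl`, `mlt.mintf`, `mlt.mindf`, and `mlt.count` are standard More Like This component parameters, while the base URL, the core name `books`, the document id, and the field names `title` and `description` are hypothetical placeholders for your own setup.

```python
from urllib.parse import urlencode


def build_mlt_url(base_url: str, core: str, doc_id: str,
                  fields: list[str], count: int = 5) -> str:
    """Build a /select request that enables the More Like This component
    for the document matching doc_id (core and fields are assumptions)."""
    params = {
        "q": f"id:{doc_id}",           # the "seed" document
        "mlt": "true",                 # turn on the More Like This component
        "mlt.fl": ",".join(fields),    # fields used to compute similarity
        "mlt.mintf": 1,                # min term frequency in the seed doc
        "mlt.mindf": 1,                # min document frequency in the index
        "mlt.count": count,            # similar documents per result
        "wt": "json",
    }
    return f"{base_url.rstrip('/')}/{core}/select?{urlencode(params)}"


url = build_mlt_url("http://localhost:8983/solr", "books", "book-42",
                    ["title", "description"])
print(url)
```

Lowering `mlt.mintf` and `mlt.mindf` to 1 (Solr's defaults are higher) is a choice aimed at small test indexes, where interesting terms are rarely frequent enough to pass the default thresholds; on a real index you would probably tune these values instead.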

We can imagine many strategies here, ranging from a simple boosting of interesting attributes (as we saw earlier with editorial boosting, used to change the ranking of results) to approaches based on hierarchies, controlled vocabularies, and taxonomies. In the latter cases, the focus moves from an enumeration-based approach (similar to what we saw earlier with facets) to a weight-based boosting, where every level of a tree can have its own boosting value. Going further, there are other kinds of textual similarity: some are based on a snippet of text that acts as the context, and others on vector-based computations, as seen before. It is even possible to introduce unsupervised machine learning algorithms to cluster documents and dynamically discover concepts, for example using Weka, Mahout, or Carrot2.
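The weight-based boosting over a taxonomy can be sketched as follows, assuming each document carries a category path from the root of the tree and that we know the path the user is interested in. Every level where the two paths agree adds that level's weight, stopping at the first mismatch. The weights, categories, and the function name `taxonomy_boost` are hypothetical; in Solr, a value like this could feed a boost, but that wiring is not shown here.

```python
def taxonomy_boost(doc_path: list[str], interest_path: list[str],
                   level_weights: list[float]) -> float:
    """Sum the weight of each taxonomy level shared by both paths,
    stopping at the first level where they diverge."""
    boost = 0.0
    for level, (doc_node, interest_node) in enumerate(zip(doc_path, interest_path)):
        if doc_node != interest_node or level >= len(level_weights):
            break
        boost += level_weights[level]
    return boost


# Hypothetical weights: deeper, more specific levels count more.
weights = [0.5, 1.0, 2.0]
interest = ["Books", "Computing", "Search"]
catalog = {
    "solr-guide": ["Books", "Computing", "Search"],
    "java-intro": ["Books", "Computing", "Programming"],
    "cookbook": ["Books", "Cooking", "Bread"],
}
scores = {title: taxonomy_boost(path, interest, weights)
          for title, path in catalog.items()}
print(scores)  # {'solr-guide': 3.5, 'java-intro': 1.5, 'cookbook': 0.5}
```

Giving deeper levels larger weights is one possible design: sharing a specific subcategory is treated as stronger evidence of similarity than sharing only a broad top-level category.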
