Time for action – obtaining similar documents by More Like This

With MoreLikeThis, we can imagine Solr performing an internal query for every term in a document that seems to be relevant, where relevance is estimated by computing similarity from the term vectors. The documents retrieved from the collection by these lookups are included in the MoreLikeThis results list.
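To make the idea of similarity based on term vectors more concrete, here is a minimal Python sketch. It is only a simplified illustration and not the actual Lucene implementation, which selects the most interesting terms of a document and builds a query from them. The function names and the sample titles are hypothetical, and the tokenization is a plain whitespace split.

```python
import math
from collections import Counter


def term_vector(text):
    """Return a term-frequency vector (term -> count) for a whitespace-tokenized text."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def more_like_this(source, candidates):
    """Rank candidate texts by similarity to the source text, most similar first."""
    source_vec = term_vector(source)
    scored = [(cosine_similarity(source_vec, term_vector(c)), c) for c in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical titles, used only to exercise the functions.
ranked = more_like_this(
    "boy with a basket",
    ["landscape with river", "boy with a dog"],
)
for score, title in ranked:
    print(round(score, 3), title)
```

The candidate that shares more terms with the source is ranked first. A real engine would also weight each term by how rare it is in the whole index, so that common words count less.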

We can easily play with a small, simple example. Search for the term boy in the title field, and ask for the results to be sorted by descending score (sort=score+desc), from the most relevant to the least, as shown in the following query:

>> curl -X GET 'http://localhost:8983/solr/paintings/select?q=title:boy&start=0&rows=1&mlt=on&mlt.fl=artist_entity,title&mlt.mindf=1&mlt.mintf=1&mlt.minwl=3&fl=uri,title,score&sort=score+desc&omitHeader=true&wt=json'

Note that we do not need to add new handlers to solrconfig.xml, because the More Like This component is available by default and works on term vectors. It is preferable to store term vectors for the fields that will be used for calculating similarity. If term vectors are not stored, MoreLikeThis will generate the terms from the stored fields.

What just happened?

Once we have enabled the More Like This component (mlt=on), we are able to obtain results for our example query, as shown in the following screenshot:

In our example, we asked for recommendations with a minimum document frequency of 1 (mlt.mindf=1) and a minimum term frequency also equal to 1 (mlt.mintf=1), and we ignored words shorter than three characters, as we feel that they are not so significant (mlt.minwl=3).
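If we prefer to assemble the same request programmatically, the parameters discussed above can be collected in a dictionary and encoded into a query string. The following is a minimal Python sketch that uses only the standard library. The function name build_mlt_url and its keyword arguments are hypothetical, while the parameter names and values are the ones used in our example query.

```python
from urllib.parse import urlencode


def build_mlt_url(base_url, query, mlt_fields, min_df=1, min_tf=1, min_word_len=3):
    """Build a select URL with the More Like This component enabled."""
    params = {
        "q": query,
        "start": 0,
        "rows": 1,
        "mlt": "on",                      # enable the More Like This component
        "mlt.fl": ",".join(mlt_fields),   # fields used to calculate similarity
        "mlt.mindf": min_df,              # minimum document frequency
        "mlt.mintf": min_tf,              # minimum term frequency
        "mlt.minwl": min_word_len,        # minimum word length
        "fl": "uri,title,score",
        "sort": "score desc",             # encoded as score+desc
        "omitHeader": "true",
        "wt": "json",
    }
    # Keep ':' and ',' unescaped so the URL stays readable.
    return base_url + "?" + urlencode(params, safe=":,")


url = build_mlt_url(
    "http://localhost:8983/solr/paintings/select",
    "title:boy",
    ["artist_entity", "title"],
)
print(url)
```

Keeping the thresholds as keyword arguments makes it easy to experiment with stricter values and observe how the list of recommendations changes.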

We can refer to the wiki page to look for a list of parameters and use cases:

http://wiki.apache.org/solr/MoreLikeThis

If you look at the image, you will find that there are many proposed recommendations. The first one seems to be very pertinent, but there is still an odd error in the third position. This is probably due to the textual distance between the term boy and the term bee, and we could have obtained better results in our example by enabling term vectors on the fields for which we calculated the similarities (mlt.fl=artist_entity,title).

Adopting a More Like This handler

Although we are not required to define a new handler for the MoreLikeThis component, we can still do so by adding a simple configuration to the solrconfig.xml file:

<requestHandler name="/mlt" class="solr.MoreLikeThisHandler">

</requestHandler>

This could be helpful in some cases and is easy to configure. For more details, please visit:

http://wiki.apache.org/solr/MoreLikeThisHandler
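Once such a handler is registered, we can address it directly instead of enabling the component on the select handler. The following minimal Python sketch builds a request for it. It assumes that the handler is exposed under the /mlt path of the paintings core, as in the configuration above, and that it accepts the same mlt.* parameters used earlier. The function name build_mlt_handler_url is hypothetical.

```python
from urllib.parse import urlencode


def build_mlt_handler_url(core_url, query, mlt_fields):
    """Build a URL for a dedicated More Like This handler (assumed to be at /mlt)."""
    params = {
        "q": query,                      # selects the source document
        "mlt.fl": ",".join(mlt_fields),  # fields used to calculate similarity
        "mlt.mindf": 1,
        "mlt.mintf": 1,
        "wt": "json",
    }
    # Keep ':' and ',' unescaped so the URL stays readable.
    return core_url + "/mlt?" + urlencode(params, safe=":,")


handler_url = build_mlt_handler_url(
    "http://localhost:8983/solr/paintings",
    "title:boy",
    ["artist_entity", "title"],
)
print(handler_url)
```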

Pop quiz

Q1. Which components can be used for an auto-suggester?

  1. Facets component, data import handler component, and terms component
  2. More Like This component, facets component, and filters component
  3. Facets component, terms vector component, and More Like This component

There seem to exist different versions of the 'adoration of the magi' theme:

>> curl -X GET 'http://localhost:8983/solr/paintings/select?q=subject_entity:adoration+of+the+magi&fl=title,artist,comment&wt=json'

Q2. How can we discover this using only facet suggestions?

  1. We can use facet.field=subject_entity
  2. We can use facet.field="adoration of the magi"
  3. We can use facet.prefix="adoration"

Q3. How can we obtain a list of all the possible cities and museums in the index?

  1. Using facet.field=city,museum
  2. Using facet.field=city AND facet.field=museum
  3. Using facet.field=city&facet.field=museum

Q4. For what is the More Like This component designed?

  1. To provide recommendations of similar documents
  2. To provide recommendations of documents with similar terms
  3. To provide recommendations of documents with similar terms for a specific field

Q5. What is the meaning of tf-idf?

  1. It represents the ratio between the term frequency and the inverse document frequency
  2. It represents the ratio between the term frequency and the document frequency
  3. It represents the product of the inverse document frequency and the term frequency