Time for action – using the Boost options

It is worth looking closely at the results: in the last two examples, the first documents returned contain only the words real or realistic, and only once we introduce the parameter start=10 do we start seeing documents that contain surrealist. The reason for this is the difference in the ranking of the documents returned. This will be explained later, but we can also give more importance to some terms over others using the boost options:

>> curl -X GET 'http://localhost:8983/solr/paintings/select?q=abstract:(*real* AND surrealist^2)&wt=json'

Again, I have omitted the encoding for spaces, so please rewrite the relevant part of the query as *real*%20AND%20surrealist^2.
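
If you prefer not to encode the query by hand, the following is a minimal sketch of how the same URL could be built with Python's standard library. The helper name build_select_url is hypothetical and used only for illustration; the host, core name, and query are the ones from the command above. Note that this approach percent-encodes more than just the spaces (for example the parentheses and the ^ character), which is harmless because the server decodes them again.

```python
from urllib.parse import quote, urlencode

# Endpoint taken from the curl command above.
BASE_URL = "http://localhost:8983/solr/paintings/select"


def build_select_url(query, **params):
    """Return a select URL with every parameter percent-encoded.

    Hypothetical helper for illustration only.
    """
    all_params = {"q": query, **params}
    # quote_via=quote encodes a space as %20 (the default would use '+').
    return BASE_URL + "?" + urlencode(all_params, quote_via=quote)


url = build_select_url("abstract:(*real* AND surrealist^2)", wt="json")
print(url)
```

The printed URL can then be passed to curl inside single quotes, exactly as in the command above. Further parameters can be supplied as additional keyword arguments.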

What just happened?

Imagine we want to give more importance to the documents containing the term surrealist in the results. In our case, this is achieved by simply adding a boosted search condition with the AND operator. The boost is expressed by the surrealist^2 syntax, which tells the query parser to treat an occurrence of the term surrealist as twice as important as the other condition. Notice that Solr computes a score for every document behind the scenes. It is not shown by default, but we can project it explicitly by adding score to the fields list with fl, as seen before.

Understanding the basic Lucene score

A Lucene score is calculated using factors such as term frequency, inverse document frequency, and normalization of terms over the documents. It's not important to go into the details now, but you should keep some basic rules in mind:

  • A rare word contributes more to the score than a common one. For example, if a term occurs in every document, it gives us no additional information useful for retrieval.
  • Matching a term inside a short text gives a better score than matching it inside a long one.
  • If a term occurs more than once in a document, the score will be better, because the term is considered important for that document.
  • A document containing all the search terms and phrases is preferred when we are searching with many terms.
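
The rules above can be illustrated with a toy scoring function. This is only a sketch loosely inspired by classic TF-IDF scoring, not Lucene's actual formula; the function name toy_score and the four tiny documents are made up for the example.

```python
import math


def toy_score(query_terms, doc, corpus):
    """Toy TF-IDF-style score (an illustration, NOT Lucene's real formula).

    doc is a list of tokens; corpus is a list of such token lists.
    """
    n_docs = len(corpus)
    length_norm = 1.0 / math.sqrt(len(doc))  # shorter texts score higher
    score = 0.0
    for term in query_terms:
        tf = doc.count(term)  # occurrences of the term in this document
        if tf == 0:
            continue  # a missing term contributes nothing
        df = sum(1 for d in corpus if term in d)  # documents containing the term
        idf = 1.0 + math.log(n_docs / (df + 1))  # rarer terms weigh more
        score += math.sqrt(tf) * idf * length_norm
    return score


# Made-up documents, already split into tokens.
d0 = ["surrealist", "portrait", "oil"]
d1 = ["surrealist", "portrait", "painted", "on", "a", "very", "large", "canvas", "panel"]
d2 = ["surrealist", "surrealist", "portrait"]
d3 = ["surrealist", "landscape", "canvas"]
corpus = [d0, d1, d2, d3]

for name, doc in zip(["d0", "d1", "d2", "d3"], corpus):
    print(name, round(toy_score(["surrealist", "portrait"], doc, corpus), 3))
```

In this sketch the rare term oil scores higher than the common term portrait in the same document, the short d0 beats the long d1 for the same term, the repeated term in d2 beats the single occurrence in d0, and d0 (which matches both query terms) beats d3 (which matches only one).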

Given a certain score, the boost acts as a multiplier over the existing score for a term. The same mechanism can also be used in the indexing phase when needed, but we will not go into these details now.
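
As a rough illustration of the multiplier idea, the following sketch combines per-term scores with optional boosts. The per-term scores are made-up numbers and combined_score is a hypothetical helper; the real Lucene computation involves further factors, so take this only as a mental model.

```python
def combined_score(term_scores, boosts=None):
    """Sum per-term scores, multiplying each by its boost (default 1.0).

    Simplified mental model only; not how Lucene combines clauses internally.
    """
    boosts = boosts or {}
    return sum(score * boosts.get(term, 1.0) for term, score in term_scores.items())


# Made-up per-term scores for a single document.
term_scores = {"real": 0.5, "surrealist": 0.5}

plain = combined_score(term_scores)
boosted = combined_score(term_scores, {"surrealist": 2.0})  # like surrealist^2
print(plain, boosted)
```

With a boost of 2, the contribution of surrealist doubles while the contribution of the other term is unchanged, so documents that match surrealist strongly move up in the ranking.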
