Time for action – generating highlighted snippets over a term

As an example, let's define a new query that highlights the term we are searching for, using the following steps:

  1. The following query searches for documents containing an artist term similar to "salvatore":
    >> curl -X GET 'http://localhost:8983/solr/paintings/select?q=artist:salvatore~0.5&fl=uri,title,artist&rows=2&hl=true&hl.fl=abstract&hl.simple.pre=%3Cstrong%3E&hl.simple.post=%3C/strong%3E&hl.snippets=4&wt=json'
    

    What we expect to see in the results are text snippets, extracted from the actual text of the documents, so that a user can recognize the matched term, as shown in the following screenshot:

    [Screenshot: generating highlighted snippets over a term]

Clearly, the values returned by the highlighting component are not designed to be shown to users directly. They are meant for a front-end component of the application, which uses them to "decorate" the results with appropriate formatting. Generally speaking, the snippets in the highlighted results can be displayed as they are, but more often they are used, for example, to find the portion of text to be emphasized and to replace it with the formatting we need.
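As a rough sketch of what such a front-end step might look like, the following Python code finds the emphasized terms in a returned snippet and re-wraps them with different markup. The function names, the sample text, and the target markup are illustrative assumptions and are not part of Solr; the only thing taken from the query is the pair of <strong> markers.

```python
import re

# Matches the markers configured via hl.simple.pre / hl.simple.post.
HIGHLIGHT = re.compile(r"<strong>(.*?)</strong>", re.DOTALL)


def highlighted_terms(snippet):
    """Return the terms that were wrapped in <strong>...</strong>."""
    return HIGHLIGHT.findall(snippet)


def restyle_snippet(snippet, pre='<mark class="hit">', post="</mark>"):
    """Swap the <strong> markers for whatever markup the front end needs."""
    return HIGHLIGHT.sub(lambda m: pre + m.group(1) + post, snippet)


# Hypothetical snippet, shaped like one entry of the highlighting section.
sample = "a landscape by <strong>Salvator</strong> Rosa"

print(highlighted_terms(sample))
print(restyle_snippet(sample))
```

Keeping the marker pattern in one place means that a change to hl.simple.pre and hl.simple.post only needs a matching change to the regular expression.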

What just happened?

In the example, you can see that the highlighted terms in the returned snippets are surrounded by the <strong> tags we specified, even though the response format is JSON. The highlighting markup is independent of the result format.

We have reduced the number of rows returned to only 2, to make the results more readable. As you can see, there is a highlighted item for every document, so it's important to also return the unique key field (in our case, it's the URI; in a common scenario, it will be an ID field). This lets us match the highlights to the actual results without relying on their position.
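The matching by unique key could be sketched as follows in Python. This assumes the usual shape of a JSON response, with the documents under response.docs and a separate highlighting section keyed by the unique key value; the function name attach_highlights and all sample values are made up for illustration.

```python
def attach_highlights(result, key_field="uri", hl_field="abstract"):
    """Pair each document with its snippets, looked up by unique key."""
    highlighting = result.get("highlighting", {})
    merged = []
    for doc in result["response"]["docs"]:
        # A document without matches may have an empty or missing entry.
        snippets = highlighting.get(doc[key_field], {}).get(hl_field, [])
        merged.append({**doc, "snippets": snippets})
    return merged


# Hypothetical, trimmed-down response (wt=json) with made-up values.
sample = {
    "response": {"numFound": 2, "start": 0, "docs": [
        {"uri": "http://example.org/painting/1", "title": "First"},
        {"uri": "http://example.org/painting/2", "title": "Second"},
    ]},
    "highlighting": {
        # Deliberately listed in a different order than the documents.
        "http://example.org/painting/2": {
            "abstract": ["by <strong>Salvador</strong> Dali"]},
        "http://example.org/painting/1": {
            "abstract": ["<strong>Salvator</strong> Rosa was",
                         "works of <strong>Salvator</strong>"]},
    },
}

for doc in attach_highlights(sample):
    print(doc["uri"], doc["snippets"])
```

Because the lookup goes through the key, the order of entries in the highlighting section does not matter.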

The highlighting parameters used in the query are as follows:

  • hl=true: This activates the highlighting component.
  • hl.fl=abstract: This restricts the returned text snippets to the abstract field.
  • hl.simple.pre=<strong>, hl.simple.post=</strong>: This defines the tags that surround the matching term (in the URL, the characters < and > are percent-encoded as %3C and %3E).
  • hl.snippets=4: This produces at most four snippets for every document. This is important since a long text field, such as our abstract field, can yield more than one snippet; in the example, the first item in the highlighted results actually contains two snippets.
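To avoid encoding the tags by hand, the same query string can be built programmatically. The following is a minimal sketch using only Python's standard library; the host, core name, and parameter values are the ones from the curl command above, and the code only builds the URL without sending a request.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

BASE = "http://localhost:8983/solr/paintings/select"

params = {
    "q": "artist:salvatore~0.5",
    "fl": "uri,title,artist",
    "rows": 2,
    "hl": "true",                   # activate the highlighting component
    "hl.fl": "abstract",            # take snippets from the abstract field
    "hl.simple.pre": "<strong>",    # written literally, encoded by urlencode
    "hl.simple.post": "</strong>",
    "hl.snippets": 4,               # at most four snippets per document
    "wt": "json",
}

url = BASE + "?" + urlencode(params)
print(url)

# Decoding the query string gives back the literal tags.
decoded = parse_qs(urlsplit(url).query)
print(decoded["hl.simple.pre"], decoded["hl.simple.post"])
```

Note that urlencode escapes more characters than the hand-written URL does (the slash in the closing tag, for instance), but the values decode to the same text.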

Moreover, a snippet is useful because it contains not only the term to be highlighted, but also a text fragment that helps identify its context. We can also decide how much text a snippet should contain via the hl.fragsize parameter, which sets the approximate fragment size in characters.

You can find a complete list of all the usable parameters at: http://wiki.apache.org/solr/HighlightingParameters
