Highlighting results

ElasticSearch is good at finding results even in large text documents, which makes it very useful for searching large blocks of text. To improve the user experience, however, it is often necessary to show an abstract: a small portion of the text that matched the query. The highlight functionality in ElasticSearch is designed to do this job.

Getting ready

You need a working ElasticSearch cluster and an index populated with the script available in the online code.

How to do it...

For searching and highlighting the results, we need to perform the following steps:

  1. From the command line, we can execute a search with a highlight section as follows:
    curl -XGET 'http://127.0.0.1:9200/test-index/_search?from=0&size=10' -d '
    {
      "query": {"query_string": {"query": "joe"}},
      "highlight": {
        "pre_tags": ["<b>"],
        "post_tags": ["</b>"],
        "fields": {
          "parsedtext": {"order": "score"},
          "name": {"order": "score"}
        }
      }
    }'
  2. If everything is all right, the command will return the following result:
    {
      … omitted …
      "hits" : {
        "total" : 1,
        "max_score" : 0.44194174,
        "hits" : [ {
          "_index" : "test-index",
          "_type" : "test-type",
          "_id" : "1",
          "_score" : 0.44194174, "_source" : {"position": 1, "parsedtext": "Joe Testere nice guy", "name": "Joe Tester", "uuid": "11111"},
          "highlight" : {
            "name" : [ "<b>Joe</b> Tester" ],
            "parsedtext" : [ "<b>Joe</b> Testere nice guy" ]
          }
        } ]
      }
    }

As you can see, each hit in the standard result has a new field, highlight, which contains an array of fragments for every highlighted field.
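To make the shape of that field concrete, the following minimal Python sketch walks a parsed response and collects the fragments for each hit. The response dictionary is trimmed from the output above, and collect_fragments is a hypothetical helper name used only for illustration; it is not part of any ElasticSearch client.

```python
# Response trimmed from the recipe output; parts omitted there are omitted here.
response = {
    "hits": {
        "total": 1,
        "hits": [
            {
                "_id": "1",
                "highlight": {
                    "name": ["<b>Joe</b> Tester"],
                    "parsedtext": ["<b>Joe</b> Testere nice guy"],
                },
            }
        ],
    }
}


def collect_fragments(resp):
    """Map each hit ID to its {field: [fragments]} dictionary (hypothetical helper)."""
    result = {}
    for hit in resp["hits"]["hits"]:
        # Be defensive: fall back to an empty dict if a hit carries no highlight.
        result[hit["_id"]] = hit.get("highlight", {})
    return result


for doc_id, fields in collect_fragments(response).items():
    for field, fragments in sorted(fields.items()):
        print(doc_id, field, fragments)
```

Each value is a list because a single field can yield more than one fragment.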

How it works...

When the highlight parameter is passed to the search object, ElasticSearch tries to highlight the documents in the results.

The highlighting phase, which runs after the document fetch, extracts the highlights with the following steps:

  • It collects the terms available in the query
  • It initializes the highlighter with the parameters given during the query
  • It extracts the fields of interest: it tries to load them if they are stored, otherwise they are taken from the source
  • It executes the query on the single fields to detect the most relevant parts
  • It adds the found, highlighted fragments to the hit
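The steps above can be imitated with a toy highlighter. This is only an illustrative sketch, not how ElasticSearch implements highlighting: the highlight function, its regular-expression matching, and the fixed-size chunking are all assumptions made for the example, and the ordering by match count is only similar in spirit to "order": "score".

```python
import re


def highlight(text, query_terms, pre_tag="<b>", post_tag="</b>", fragment_size=100):
    """Toy highlighter: wrap query terms in tags and return the matching fragments."""
    if not query_terms:
        return []
    # The terms collected from the query, matched case-insensitively as whole words.
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(term) for term in query_terms) + r")\b",
        re.IGNORECASE,
    )
    # Split the field value into fixed-size chunks (a crude stand-in for a fragmenter).
    chunks = [text[i:i + fragment_size] for i in range(0, len(text), fragment_size)]
    fragments = []
    for chunk in chunks:
        marked, count = pattern.subn(lambda m: pre_tag + m.group(0) + post_tag, chunk)
        if count:
            fragments.append((count, marked))
    # Chunks with the most matches come first.
    fragments.sort(key=lambda pair: pair[0], reverse=True)
    return [marked for _, marked in fragments]


# Take the field values from the source and add the fragments to the hit.
hit = {"_source": {"name": "Joe Tester", "parsedtext": "Joe Testere nice guy"}}
hit["highlight"] = {
    field: highlight(value, ["joe"]) for field, value in hit["_source"].items()
}
print(hit["highlight"])
```

A real highlighter breaks fragments on sensible boundaries and scores them properly; the fixed-size chunks here can split a word in half.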

Using the highlighting functionality is very easy, but there are some important points that need attention. They are as follows:

  • The field to be highlighted must be available in one of these forms: stored, in the source, or as a stored term vector

    Tip

    The ElasticSearch highlighter first checks whether the field data is available as a term vector (this is the fastest way to execute highlighting). If the field doesn't have a term vector, it tries to load the field value from the stored fields. If the field is not stored, it finally loads the JSON source, parses it, and extracts the field value if available. Obviously, the last approach is the slowest and the most resource intensive.

  • If a special analyzer is used in search, it should also be passed to the highlighter (this is often managed automatically).
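The three forms a field can take for highlighting correspond to mapping choices. The following Python sketch builds a hypothetical mapping for test-type and prints it as JSON. The field types and the choice of which field uses which form are assumptions made for illustration, and the "string" type assumes an older release that still uses it for text fields.

```python
import json

# Hypothetical mapping for the recipe's test-type; the field settings are assumptions.
mapping = {
    "test-type": {
        "properties": {
            # Term vector with positions and offsets: the fastest form for highlighting.
            "parsedtext": {
                "type": "string",
                "term_vector": "with_positions_offsets",
            },
            # Stored field: the value is loaded from the stored fields.
            "name": {"type": "string", "store": True},
            # Neither term vector nor stored: the value is extracted from the JSON source.
            "uuid": {"type": "string"},
        }
    }
}

print(json.dumps(mapping, indent=2))
```

Term vectors and stored fields increase the index size, so they are usually enabled only for the large fields that are highlighted often.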

There are several parameters that can be passed in the highlight object to control the highlighting process, and these are as follows:

  • number_of_fragments (default 5): This parameter controls how many fragments are returned. It can be configured globally or per field.
  • fragment_size (default 100): This parameter controls the size, in characters, of each fragment. It can be configured globally or per field.
  • pre_tags/post_tags: These parameters control the lists of tags used to mark the highlighted text.
  • tags_schema="styled": This parameter selects a predefined tag schema that marks highlights with different tags in order of importance. It is a helper that avoids defining a lot of pre_tags/post_tags values.
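As a sketch of how these parameters combine, the following Python snippet builds a search body with global highlight settings and a per-field override, then prints it as JSON that could be sent with curl as in the earlier step. The numeric values are arbitrary and chosen only for illustration.

```python
import json

search_body = {
    "query": {"query_string": {"query": "joe"}},
    "highlight": {
        # Global settings, applied to every field unless overridden.
        "pre_tags": ["<b>"],
        "post_tags": ["</b>"],
        "number_of_fragments": 3,
        "fragment_size": 150,
        "fields": {
            # Per-field override: a single, shorter fragment for the long text field.
            "parsedtext": {"number_of_fragments": 1, "fragment_size": 50},
            # Empty object: the field uses the global settings.
            "name": {},
        },
    },
}

print(json.dumps(search_body, indent=2))
```

To use the predefined schema instead of explicit tags, the pre_tags and post_tags entries would be replaced with "tags_schema": "styled".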

See also

  • Refer to the Executing a search recipe in this chapter