Highlighting results

ElasticSearch does a good job of finding results even in large text documents, which makes it well suited to searching very large blocks of text. To improve the user experience, however, it is often necessary to show an abstract: a small portion of the text that matched the query. The highlight functionality in ElasticSearch is designed to do this job.

Getting ready

You need a working ElasticSearch cluster and an index populated with the script available in the online code.

How to do it...

To search and highlight the results, we need to perform the following steps:

From the command line, we can execute a search with a highlight section as follows:

curl -XGET 'http://127.0.0.1:9200/test-index/_search?from=0&size=10' -d '
{
  "query": {"query_string": {"query": "joe"}},
  "highlight": {
    "pre_tags": ["<b>"],
    "post_tags": ["</b>"],
    "fields": {
      "parsedtext": {"order": "score"},
      "name": {"order": "score"}
    }
  }
}'

If everything works correctly, the command will return the following result:

{
  … omitted …
  "hits" : {
    "total" : 1,
    "max_score" : 0.44194174,
    "hits" : [ {
      "_index" : "test-index",
      "_type" : "test-type",
      "_id" : "1",
      "_score" : 0.44194174,
      "_source" : {"position": 1, "parsedtext": "Joe Testere nice guy", "name": "Joe Tester", "uuid": "11111"},
      "highlight" : {
        "name" : [ "<b>Joe</b> Tester" ],
        "parsedtext" : [ "<b>Joe</b> Testere nice guy" ]
      }
    } ]
  }
}

As you can see, the standard result contains a new field, highlight, which holds each highlighted field with an array of fragments.
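To make the shape of the highlight field concrete, the following Python sketch reads the fragments out of a response like the one above. The response text is a trimmed copy of that result, and the helper name collect_fragments is hypothetical; it is not part of any ElasticSearch client.

```python
import json

# Trimmed copy of the result shown above (only the fields used here).
response_text = """
{
  "hits": {
    "total": 1,
    "hits": [
      {
        "_index": "test-index",
        "_id": "1",
        "_source": {"name": "Joe Tester", "parsedtext": "Joe Testere nice guy"},
        "highlight": {
          "name": ["<b>Joe</b> Tester"],
          "parsedtext": ["<b>Joe</b> Testere nice guy"]
        }
      }
    ]
  }
}
"""


def collect_fragments(response):
    """Map each hit _id to its highlight dict (field name -> list of fragments)."""
    fragments = {}
    for hit in response["hits"]["hits"]:
        # Be defensive: treat a hit without a "highlight" key as having no fragments.
        fragments[hit["_id"]] = hit.get("highlight", {})
    return fragments


response = json.loads(response_text)
fragments_by_id = collect_fragments(response)
for doc_id, fields in fragments_by_id.items():
    for field, parts in sorted(fields.items()):
        # Each field carries a list of fragments, so join them for display.
        print(doc_id, field, " ... ".join(parts))
```

Every highlighted field maps to a list, even when only one fragment is returned, so client code should always iterate over it.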

How it works...

When the highlight parameter is passed to the search object, ElasticSearch tries to highlight the resulting documents.

The highlighting phase, which runs after the document fetch, extracts the highlights in the following steps:

It collects the terms available in the query
It initializes the highlighter with the parameters given in the query
It extracts the fields of interest: it loads them from the stored fields if they are stored; otherwise, it takes them from the source
It executes the query on the single fields to detect the most relevant parts
It adds the highlighted fragments that it found to the hit
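The steps above can be imitated with a toy highlighter in Python. This is only a sketch to illustrate the flow; it is not how ElasticSearch implements highlighting. The function names toy_highlight and highlight_hit are hypothetical, matching is plain case-insensitive whole-word matching rather than analyzed terms, and only one fragment per field is produced.

```python
import re


def toy_highlight(text, query_terms, pre_tag="<b>", post_tag="</b>", fragment_size=100):
    """Return at most one fragment of text with the query terms wrapped in tags."""
    if not query_terms:
        return []
    # Build a whole-word, case-insensitive pattern from the collected terms.
    alternatives = "|".join(re.escape(term) for term in query_terms)
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)
    match = pattern.search(text)
    if match is None:
        return []
    # Cut a window of about fragment_size characters around the first match.
    start = max(0, match.start() - fragment_size // 2)
    fragment = text[start:start + fragment_size]
    # Wrap every match inside the window with the tags.
    return [pattern.sub(lambda m: pre_tag + m.group(0) + post_tag, fragment)]


def highlight_hit(source, fields, query_terms):
    """Run the toy highlighter on each requested field of a document source."""
    highlight = {}
    for field in fields:
        value = source.get(field)
        if not isinstance(value, str):
            continue  # only text values are highlighted in this sketch
        fragments = toy_highlight(value, query_terms)
        if fragments:
            # Fields without a match are left out of the result.
            highlight[field] = fragments
    return highlight


source = {"position": 1, "parsedtext": "Joe Testere nice guy",
          "name": "Joe Tester", "uuid": "11111"}
result = highlight_hit(source, ["parsedtext", "name", "uuid"], ["joe"])
print(result)
```

The sketch leaves a field out of the result when nothing in it matches, mirroring the way only name and parsedtext appear in the highlight field of the sample result.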

Using the highlighting functionality is very easy, but there are some important points that need attention:

The field to be highlighted must be available in one of these forms: stored, in the source, or in a stored term vector
Tip
The ElasticSearch highlighter first checks whether the field data is available as a term vector (this is the fastest way to execute the highlighting). If the field doesn't have a term vector, it tries to load the field value from the stored fields. If the field is not stored, it finally loads the JSON source, interprets it, and extracts the data value if available. Obviously, the last approach is the slowest and the most resource intensive.
If a special analyzer is used in the search, it should also be passed to the highlighter (this is often managed automatically).
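As an illustration of the first point, the following Python sketch builds a mapping body that stores the parsedtext field and keeps its term vector with positions and offsets. The option names (store, term_vector with the value with_positions_offsets) and the string type are assumptions based on the mapping syntax of ElasticSearch versions that still use document types such as test-type; check them against the version in use. How the body is sent to the cluster is deliberately left out.

```python
import json

# Assumed mapping syntax for an older, type-based ElasticSearch release.
mapping = {
    "test-type": {
        "properties": {
            "parsedtext": {
                "type": "string",
                # Keep a stored copy of the field value.
                "store": True,
                # Keep positions and offsets so the term vector can drive highlighting.
                "term_vector": "with_positions_offsets",
            }
        }
    }
}

mapping_body = json.dumps(mapping, indent=2)
print(mapping_body)
```

Term vectors increase the index size, so this trade-off is usually worth making only for large fields that are highlighted often.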

There are several parameters that can be passed in the highlight object to control the highlighting process, and these are as follows:

number_of_fragments (default 5): This parameter controls how many fragments are to be returned. It can be configured globally or for a field.
fragment_size (default 100): This parameter controls the number of characters that the fragments must contain. It can be configured globally or for a field.
pre_tags/post_tags: These parameters control the lists of tags used to mark the highlighted text.
tags_schema="styled": This parameter selects a tag schema that marks highlights with different tags in order of importance. It is a helper that avoids defining a lot of pre_tags/post_tags.
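The following Python sketch shows one way these parameters might be combined, with global values and a per-field override. The helper build_highlight is hypothetical; the parameter names and the query are the ones used in this recipe, and the override values (1 fragment of 50 characters for name) are arbitrary examples.

```python
import json


def build_highlight(fields, number_of_fragments=5, fragment_size=100,
                    pre_tags=None, post_tags=None, field_overrides=None):
    """Build a highlight section with global settings and per-field overrides."""
    highlight = {
        "number_of_fragments": number_of_fragments,
        "fragment_size": fragment_size,
        "fields": {},
    }
    # Tags are only written when given, so the server-side defaults apply otherwise.
    if pre_tags is not None:
        highlight["pre_tags"] = list(pre_tags)
    if post_tags is not None:
        highlight["post_tags"] = list(post_tags)
    overrides = field_overrides or {}
    for field in fields:
        # A per-field setting takes the place of the global one for that field.
        highlight["fields"][field] = dict(overrides.get(field, {}))
    return highlight


search_body = {
    "query": {"query_string": {"query": "joe"}},
    "highlight": build_highlight(
        ["parsedtext", "name"],
        pre_tags=["<b>"],
        post_tags=["</b>"],
        field_overrides={"name": {"number_of_fragments": 1, "fragment_size": 50}},
    ),
}

# Alternative that relies on the styled tag schema instead of explicit tags.
styled_highlight = {"tags_schema": "styled", "fields": {"parsedtext": {}, "name": {}}}

print(json.dumps(search_body, indent=2))
```

The printed JSON can be passed as the body of the curl command used earlier in this recipe.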
