Using the search_after functionality

Standard Elasticsearch pagination using from and size performs very poorly on large datasets because, for every query, all the results before the from value must be computed and then discarded. Scrolling doesn't have this problem, but it consumes a lot of memory because of the search contexts it keeps open, so it cannot be used for frequent user queries.

To bypass these problems, Elasticsearch 5.x provides the search_after functionality, which allows fast skipping when paginating through results.
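To make the cost difference concrete, here is a rough back-of-the-envelope sketch in Python. It assumes that every shard must return from + size sorted entries to the coordinating node for a from/size request, and only size entries for a search_after request. The function names and the shard count of 5 are illustrative assumptions, not part of Elasticsearch.

```python
def entries_with_from_size(from_: int, size: int, shards: int) -> int:
    """Sorted entries the coordinating node has to merge for a from/size page.

    Each shard cannot know which of its hits fall inside the requested page,
    so it returns its top (from + size) entries.
    """
    return shards * (from_ + size)


def entries_with_search_after(size: int, shards: int) -> int:
    """Sorted entries merged for a search_after page.

    Each shard skips directly past the given sort values and returns
    only the next `size` entries.
    """
    return shards * size


# Hypothetical numbers: page of 10 hits, starting at offset 10,000, 5 shards.
deep_page = entries_with_from_size(from_=10000, size=10, shards=5)
next_page = entries_with_search_after(size=10, shards=5)
print(deep_page, next_page)
```

The from/size cost grows with the offset, while the search_after cost depends only on the page size.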

Getting ready

You will need an up-and-running Elasticsearch installation, as described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.

To execute curl via a command line, you need to install curl for your operating system.

To correctly execute the following commands, you will need an index populated with the chapter_05/populate_query.sh script available in the online code.

How to do it...

In order to execute a search_after query, we will perform the following steps:

  1. From the command line, we can execute a search that sorts on the value we are interested in and uses the _uid of the document as the last sort parameter, so that the sort order is unique for every document, as follows:
            curl -XGET 'http://127.0.0.1:9200/test-index/test-type/_search?pretty' -d '
            {
                "size": 1,
                "query": {
                    "match_all" : {}
                },
                "sort": [
                    {"price": "asc"},
                    {"_uid": "desc"}
                ]
            }' 
    
  2. If everything works, the command will return the following:
            { 
              "took" : 52, 
              "timed_out" : false, 
              "_shards" : {...}, 
              "hits" : { 
                "total" : 3, 
                "max_score" : null, 
                "hits" : [ 
                  { 
                    "_index" : "test-index", 
                    "_type" : "test-type", 
                    "_id" : "1", 
                    "_score" : null, 
                    "_source" : {...}, 
                    "sort" : [ 
                      4.0, 
                      "test-type#1" 
                    ] 
                  } 
                ] 
              } 
            } 
    
  3. To use the search_after functionality, you need to keep track of the sort values of the last hit returned, which in this case are [4.0, "test-type#1"].
  4. To fetch the next result, repeat the same query and sort, and pass the sort values of the last record in the search_after parameter, as follows:
            curl -XGET 'http://127.0.0.1:9200/test-index/test-type/_search?pretty' -d '
            {
                "size": 1,
                "query": {
                    "match_all" : {}
                },
                "search_after": [4.0, "test-type#1"],
                "sort": [
                    {"price": "asc"},
                    {"_uid": "desc"}
                ]
            }'
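The steps above can be chained into a loop that walks through all the results. The following Python sketch is one possible way to do it, not an official client: build_body, iterate_hits, and http_search are hypothetical helper names, and the URL is the one used in the curl commands of this recipe. The search function is passed in as a parameter, so the paging logic does not depend on how the request is sent.

```python
import json
import urllib.request
from typing import Any, Callable, Dict, Iterator, List, Optional

# Same sort as in the recipe: the value first, then _uid as a tiebreaker.
SORT = [{"price": "asc"}, {"_uid": "desc"}]


def build_body(size: int, search_after: Optional[List[Any]] = None) -> Dict[str, Any]:
    """Build the request body of step 1 (no search_after) or step 4."""
    body: Dict[str, Any] = {"size": size, "query": {"match_all": {}}, "sort": SORT}
    if search_after is not None:
        body["search_after"] = search_after
    return body


def iterate_hits(
    search: Callable[[Dict[str, Any]], Dict[str, Any]], size: int = 1
) -> Iterator[Dict[str, Any]]:
    """Yield every hit, page by page, tracking the last sort values."""
    last_sort: Optional[List[Any]] = None
    while True:
        response = search(build_body(size, last_sort))
        hits = response["hits"]["hits"]
        if not hits:
            return  # an empty page means there is nothing left
        for hit in hits:
            yield hit
        # Step 3: remember the sort values of the last hit of this page.
        last_sort = hits[-1]["sort"]


def http_search(body: Dict[str, Any]) -> Dict[str, Any]:
    """Send the body to the _search endpoint (assumes a local node)."""
    request = urllib.request.Request(
        "http://127.0.0.1:9200/test-index/test-type/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With a running node, the hits can be consumed with for hit in iterate_hits(http_search, size=10). The loop stops at the first empty page, so it costs one extra request at the end.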
    

How it works...

Elasticsearch uses Lucene for indexing data. In Lucene indices, all the terms are sorted and stored in an ordered way, so it's natural for Lucene to be extremely fast in skipping to a term value. This operation is managed in the Lucene core with the skipTo method. It does not consume memory, and in the case of search_after, a query is built from the search_after values so that the Lucene search can skip quickly to the right position, which speeds up result pagination.
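The idea of skipping inside ordered data can be illustrated with a small Python analogy. This is not Lucene code: it is a simplified sketch that assumes an in-memory list of sort keys, all in ascending order, and uses a binary search from the standard bisect module to find where the next page starts. The keys other than the first one are made-up values.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

Key = Tuple[float, str]


def page_after(sorted_keys: List[Key], after: Optional[Key], size: int) -> List[Key]:
    """Return up to `size` keys that come strictly after `after`.

    Because the keys are sorted, the start position is found with a
    binary search instead of walking over all the preceding entries.
    """
    start = 0 if after is None else bisect_right(sorted_keys, after)
    return sorted_keys[start:start + size]


# Illustrative sort keys (value, uid), already in ascending order.
keys: List[Key] = [
    (4.0, "test-type#1"),
    (5.0, "test-type#2"),
    (6.0, "test-type#3"),
]

first_page = page_after(keys, None, 1)
second_page = page_after(keys, first_page[-1], 1)
print(first_page, second_page)
```

The caller only has to carry the last key from one page to the next, which mirrors the way the last sort values are passed in search_after.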

The search_after functionality was introduced in Elasticsearch 5.x, and it should be kept in mind as an important tool for improving the user experience when scrolling or paginating through search results.

See also

  • Refer to the Executing a search recipe in this chapter to learn how to structure a search with size pagination, and to the Executing a scrolling query recipe for scrolling through the results of a query