
Executing a scan query

Every time a query is executed, the results are calculated and returned to the user. ElasticSearch has no standard order for records, so paginating over a large set of values can produce inconsistent results when documents are added or deleted between requests. The scan query addresses this problem by providing a special cursor that lets you iterate over all the documents exactly once. It is often used to back up documents or to reindex them.

Getting ready

You need a working ElasticSearch cluster and an index populated with the script available in the online code.

How to do it...

To execute a scan query, perform the following steps:

From the command line, execute a search of type scan as follows:

curl -XGET 'http://127.0.0.1:9200/test-index/test-type/_search?search_type=scan&scroll=10m&size=50' -d '{"query":{"match_all":{}}}'

If everything is all right, the command will return the following result:

{
  "_scroll_id" : "c2Nhbjs1OzQ1Mzp4d1Ftcng0NlNCYUpVOXh4c0ZiYll3OzQ1Njp4d1Ftcng0NlNCYUpVOXh4c0ZiYll3OzQ1Nzp4d1Ftcng0NlNCYUpVOXh4c0ZiYll3OzQ1NDp4d1Ftcng0NlNCYUpVOXh4c0ZiYll3OzQ1NTp4d1Ftcng0NlNCYUpVOXh4c0ZiYll3OzE7dG90YWxfaGl0czozOw==",
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 0.0,
    "hits" : [ ]
  }
}

The result is composed of:

_scroll_id: This is the value to be used for scrolling records
took: This is the time required to execute the query, in milliseconds
timed_out: This indicates whether the query timed out
_shards: This gives information about the status of the shards during the query
hits: This gives the total number of matching documents; the hits array is empty in this first response because the documents are returned by the subsequent scroll calls
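As a rough illustration of how a client might read these fields, the following Python sketch parses the response shown above with the standard json module. The response text is copied from that output with the final closing brace included; the variable names are chosen only for this example.

```python
import json

# Response text copied from the first scan request shown above.
raw_response = """
{
  "_scroll_id" : "c2Nhbjs1OzQ1Mzp4d1Ftcng0NlNCYUpVOXh4c0ZiYll3OzQ1Njp4d1Ftcng0NlNCYUpVOXh4c0ZiYll3OzQ1Nzp4d1Ftcng0NlNCYUpVOXh4c0ZiYll3OzQ1NDp4d1Ftcng0NlNCYUpVOXh4c0ZiYll3OzQ1NTp4d1Ftcng0NlNCYUpVOXh4c0ZiYll3OzE7dG90YWxfaGl0czozOw==",
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 0.0,
    "hits" : [ ]
  }
}
"""

response = json.loads(raw_response)

scroll_id = response["_scroll_id"]        # value to send with the scroll call
total_hits = response["hits"]["total"]    # number of matching documents
first_batch = response["hits"]["hits"]    # empty in the first scan response
# Treat the request as healthy only if no shard failed and it did not time out.
healthy = response["_shards"]["failed"] == 0 and not response["timed_out"]

print(total_hits, len(first_batch), healthy)  # prints: 3 0 True
```

The empty hits array is expected here: the first call only opens the cursor, and the documents arrive with the scroll calls.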

With the _scroll_id value, you can use scroll to get the results:

curl -XGET 'localhost:9200/_search/scroll?scroll=10m' -d 'c2Nhbjs1OzQ2Mzp4d1Ftcng0NlNCYUpVOXh4c0ZiYll3OzQ2Njp4d1Ftcng0NlNCYUpVOXh4c0ZiYll3OzQ2Nzp4d1Ftcng0NlNCYUpVOXh4c0ZiYll3OzQ2NDp4d1Ftcng0NlNCYUpVOXh4c0ZiYll3OzQ2NTp4d1Ftcng0NlNCYUpVOXh4c0ZiYll3OzE7dG90YWxfaGl0czozOw=='

The result should be similar to the following:

{
  "_scroll_id" : "c2NhbjswOzE7dG90YWxfaGl0czozOw==",
  "took" : 20,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 0,
    "failed" : 5
  },
  "hits" : {
    "total" : 3,
    "max_score" : 0.0,
…}

How it works...

The query is interpreted in the same way as a standard search. This kind of search is designed to iterate over a large set of results, so the score and the order are not computed.

During the query phase, every shard keeps the state of the IDs in memory until the timeout expires.

Processing a scan query is done in the following two steps:

The first step executes a query and returns a scroll_id value, which is used to fetch the results.
The second step scrolls through the documents. You repeat this step, each time taking the new scroll_id value and fetching more documents.
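The two steps can be sketched in Python as a generator. This is only an illustration under stated assumptions, not a client implementation: send is a hypothetical transport callable (it is not part of any ElasticSearch library), make_fake_send is an in-memory stand-in for a cluster so that the loop can run without one, and stopping when a scroll call returns no hits is an assumption of the sketch.

```python
import json
from urllib.parse import urlencode


def scan(send, index, doc_type, query, scroll="10m", size=50):
    """Yield every hit of a scan query, one document at a time.

    `send(method, path, body)` is a hypothetical transport callable that
    returns the decoded JSON response as a dict.
    """
    # Step 1: open the scan; the response carries a scroll id but no hits.
    params = urlencode({"search_type": "scan", "scroll": scroll, "size": size})
    response = send("GET", f"/{index}/{doc_type}/_search?{params}", json.dumps(query))
    scroll_id = response["_scroll_id"]

    # Step 2: keep scrolling, always with the most recent scroll id.
    scroll_path = "/_search/scroll?" + urlencode({"scroll": scroll})
    while True:
        response = send("GET", scroll_path, scroll_id)
        hits = response["hits"]["hits"]
        if not hits:  # assumption: an empty batch means the scan is finished
            break
        yield from hits
        scroll_id = response["_scroll_id"]


def make_fake_send(docs, batch_size):
    """In-memory stand-in for a cluster, used only to exercise scan()."""
    remaining = list(docs)
    calls = []

    def send(method, path, body):
        calls.append(path)
        if "search_type=scan" in path:
            return {"_scroll_id": "id-0", "hits": {"total": len(docs), "hits": []}}
        batch = remaining[:batch_size]
        del remaining[:batch_size]
        return {"_scroll_id": f"id-{len(calls)}",
                "hits": {"total": len(docs), "hits": batch}}

    return send


docs = [{"_id": str(n)} for n in range(1, 4)]
fake_send = make_fake_send(docs, batch_size=2)
result = list(scan(fake_send, "test-index", "test-type", {"query": {"match_all": {}}}))
print([doc["_id"] for doc in result])  # prints: ['1', '2', '3']
```

Passing the transport in as a parameter keeps the iteration logic separate from the HTTP details, so the same loop could be pointed at a real cluster by supplying a function that performs the requests.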

Tip

If you need to iterate over a large set of records, use the scan query; otherwise, you may get duplicated results.

The scan query is a standard query, but there are two special parameters that are passed in the query string, which are as follows:

search_type=scan: This parameter tells ElasticSearch to execute the scan query.
scroll=(your timeout): This parameter defines how long the hits should be kept alive. The time can be expressed in seconds using the s suffix (for example, 5s, 10s, or 15s) or in minutes using the m suffix (for example, 5m or 10m). If you use a long timeout, make sure that your nodes have enough RAM to keep the scroll state alive. This parameter is mandatory and must always be provided.
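To make the two parameters concrete, here is a small Python sketch that builds the scan URL used in the curl command and converts a scroll timeout to seconds. The helper names scroll_seconds and scan_url are hypothetical, and the sketch deliberately accepts only the s and m suffixes described above.

```python
import re
from urllib.parse import urlencode


def scroll_seconds(value):
    """Convert a timeout such as '10s' or '5m' to a number of seconds."""
    match = re.fullmatch(r"(\d+)([sm])", value)
    if match is None:
        raise ValueError(f"unsupported scroll timeout: {value!r}")
    amount, unit = int(match.group(1)), match.group(2)
    return amount * 60 if unit == "m" else amount


def scan_url(base, index, doc_type, scroll, size):
    """Build the URL of a scan request; scroll is mandatory, so validate it."""
    scroll_seconds(scroll)  # raises ValueError on a malformed timeout
    params = urlencode({"search_type": "scan", "scroll": scroll, "size": size})
    return f"{base}/{index}/{doc_type}/_search?{params}"


url = scan_url("http://127.0.0.1:9200", "test-index", "test-type", "10m", 50)
print(url)
print(scroll_seconds("10m"))  # prints: 600
```

The first print reproduces the URL passed to curl in the recipe.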

Tip

The size parameter is also a bit special, as it is applied per shard: if you have size = 10 and 5 shards, each scroll returns up to 50 hits.
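The arithmetic behind this tip can be written out as a one-line Python function. The name max_hits_per_scroll is hypothetical, and the value is an upper bound, since a shard holding fewer matching documents than size contributes fewer hits.

```python
def max_hits_per_scroll(size, shards):
    """Upper bound on hits returned by one scroll call: size applies per shard."""
    return size * shards


print(max_hits_per_scroll(10, 5))  # prints: 50
print(max_hits_per_scroll(50, 5))  # prints: 250
```

The second call uses the size=50 from the curl command in this recipe together with the five shards reported in its response.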
