Executing a scroll search

Pagination with a standard query works well if the documents you are matching do not change often; otherwise, paginating over live data returns unpredictable results. To bypass this problem, Elasticsearch provides an extra parameter in the query: scroll.
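To make the problem concrete, here is a small, self-contained Java sketch that does not use Elasticsearch at all. A plain list stands in for the live index, and the hypothetical page method plays the role of a from/size query. Because every page is recomputed against the current data, deleting one document between two requests shifts the offsets and silently skips another one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory illustration, not Elasticsearch code: the "index" is a
// list of IDs, and offset pagination re-reads the live list on every request.
class OffsetPaginationDrift {

    // Returns the slice [from, from + size) of the current (live) result list.
    static List<String> page(List<String> live, int from, int size) {
        int end = Math.min(from + size, live.size());
        return from >= end ? new ArrayList<>() : new ArrayList<>(live.subList(from, end));
    }

    // Reads two pages of size 2, deleting the first document between the requests.
    static List<String> readWithConcurrentDelete() {
        List<String> live = new ArrayList<>(List.of("a", "b", "c", "d", "e"));
        List<String> seen = new ArrayList<>(page(live, 0, 2)); // a, b
        live.remove("a");                                      // data changes between pages
        seen.addAll(page(live, 2, 2));                         // d, e: "c" is never returned
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(readWithConcurrentDelete()); // [a, b, d, e]
    }
}
```

The same drift happens in the other direction when documents are inserted: a hit can then be returned twice.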

Getting ready

You need an up-and-running Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.

You also need Maven installed, or an IDE that natively supports it for Java programming, such as Eclipse or IntelliJ IDEA.

The code for this recipe is in the chapter_14/nativeclient directory, and the referenced class is ScrollQueryExample.

How to do it...

The search is done as in the Execute a standard search recipe. The main difference is the setScroll timeout, which keeps the resulting IDs of the query in memory for a defined time. The steps are the same as for a standard search, apart from the following:

We import the TimeValue class to express time in a more human-readable way:

        import org.elasticsearch.common.unit.TimeValue;

We execute the search by setting the setScroll value. We can change the code of the Execute a standard search recipe to use scroll in this way:

        SearchResponse response =  
        client.prepareSearch(index).setTypes(type).setSize(30) 
        .setQuery(query).setScroll(TimeValue.timeValueMinutes(2)) 
        .execute().actionGet();

To manage the scrolling, we need a loop that runs until no more results are returned:

        do { 
            for (SearchHit hit : response.getHits().getHits()) { 
                System.out.println("hit: " + hit.getIndex() + ":" +   
                hit.getType() + ":" + hit.getId()); 
            } 
            // ask for the next batch, renewing the scroll timeout
            response = client.prepareSearchScroll(response.getScrollId())
                .setScroll(TimeValue.timeValueMinutes(2))
                .execute().actionGet();
        } while (response.getHits().getHits().length != 0);

The loop iterates over all the results for as long as records are returned. The output will be similar to the following:

        hit: mytest:mytype:499 
        hit: mytest:mytype:531 
        hit: mytest:mytype:533 
        hit: mytest:mytype:535 
        hit: mytest:mytype:555 
        hit: mytest:mytype:559 
        hit: mytest:mytype:571 
        hit: mytest:mytype:575 
        ...truncated...
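The contract this loop depends on can be sketched without a cluster. In the following Java example, InMemoryScrollServer, Page, search, and scroll are hypothetical stand-ins, not the Elasticsearch client API: opening the scroll freezes the matching IDs, each call returns the next batch for a scroll ID, and an empty batch signals the end. Documents added while the loop runs do not appear in the results, which is the stability that plain pagination lacks.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the server side of a scroll. It is NOT the
// Elasticsearch API; it only mimics the contract the recipe's loop relies on.
class ScrollLoopSketch {

    // One batch of hits plus the scroll ID needed to fetch the next batch.
    static final class Page {
        final String scrollId;
        final List<String> hits;

        Page(String scrollId, List<String> hits) {
            this.scrollId = scrollId;
            this.hits = hits;
        }
    }

    static final class InMemoryScrollServer {
        private final List<String> index = new ArrayList<>();
        private final Map<String, List<String>> snapshots = new HashMap<>();
        private final Map<String, Integer> cursors = new HashMap<>();
        private final int size;
        private int nextScrollId = 0;

        InMemoryScrollServer(List<String> docs, int size) {
            index.addAll(docs);
            this.size = size;
        }

        // Simulates a write arriving while a scroll is open.
        void addDocument(String id) {
            index.add(id);
        }

        // Opens a scroll: the matching IDs are frozen at this moment.
        Page search() {
            String scrollId = "scroll-" + nextScrollId++;
            snapshots.put(scrollId, new ArrayList<>(index));
            cursors.put(scrollId, 0);
            return scroll(scrollId);
        }

        // Returns the next batch from the frozen snapshot and advances the cursor.
        Page scroll(String scrollId) {
            List<String> snapshot = snapshots.get(scrollId);
            int from = cursors.get(scrollId);
            int to = Math.min(from + size, snapshot.size());
            cursors.put(scrollId, to);
            return new Page(scrollId, new ArrayList<>(snapshot.subList(from, to)));
        }
    }

    // Same shape as the recipe's do/while loop: process a batch, request the next
    // one with the current scroll ID, stop when a batch comes back empty.
    static List<String> readAll(InMemoryScrollServer server) {
        List<String> seen = new ArrayList<>();
        Page response = server.search();
        do {
            seen.addAll(response.hits);                // "process your hit"
            server.addDocument("late-" + seen.size()); // concurrent write, not in the snapshot
            response = server.scroll(response.scrollId);
        } while (!response.hits.isEmpty());
        return seen;
    }

    public static void main(String[] args) {
        InMemoryScrollServer server = new InMemoryScrollServer(List.of("1", "2", "3", "4", "5"), 2);
        System.out.println(readAll(server)); // [1, 2, 3, 4, 5]
    }
}
```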

How it works...

To use scrolling, it's enough to add setScroll with a timeout to the search call.

When using scrolling, some behaviors must be considered:

The timeout defines how long the Elasticsearch server keeps the results. If you request the next batch after the timeout has expired, the server returns an error, so be careful with short timeouts.
A scroll consumes memory until it ends or its timeout expires. Setting too large a timeout without consuming the data results in a large memory overhead. Keeping many scrolls open consumes memory proportional to the number of IDs and their related data (score, order, and so on) in the results.
With scrolling, it's not possible to jump to an arbitrary page of documents, as there is no start offset. Scrolling is designed to fetch consecutive results.
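The timeout rules above can be modeled in a few lines. This is a hypothetical sketch, not how Elasticsearch implements its search contexts: time is passed in explicitly as milliseconds, every scroll request must arrive before the current deadline, and a request that arrives in time renews the deadline with the keep-alive it carries.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of server-side scroll contexts, not Elasticsearch code.
// Time is passed in explicitly (milliseconds) so the behavior is deterministic.
class ScrollKeepAliveSketch {

    private final Map<Integer, Long> expiresAt = new HashMap<>();
    private int nextId = 0;

    // Opens a context that stays alive for keepAliveMillis after "now".
    int open(long now, long keepAliveMillis) {
        int id = nextId++;
        expiresAt.put(id, now + keepAliveMillis);
        return id;
    }

    // A scroll request must arrive before the deadline; it then renews the deadline
    // with the keep-alive sent in that request (as setScroll does on every call).
    void scroll(int id, long now, long keepAliveMillis) {
        Long deadline = expiresAt.get(id);
        if (deadline == null || now > deadline) {
            expiresAt.remove(id); // an expired context is gone for good
            throw new IllegalStateException("scroll context " + id + " has expired");
        }
        expiresAt.put(id, now + keepAliveMillis);
    }

    // Contexts whose deadline has not passed; each one still holds memory.
    long openContexts(long now) {
        return expiresAt.values().stream().filter(deadline -> deadline >= now).count();
    }

    public static void main(String[] args) {
        long twoMinutes = 2 * 60 * 1000L; // same duration as TimeValue.timeValueMinutes(2)
        ScrollKeepAliveSketch server = new ScrollKeepAliveSketch();
        int id = server.open(0, twoMinutes);
        server.scroll(id, 90_000, twoMinutes);      // in time: deadline moves to 210000
        try {
            server.scroll(id, 400_000, twoMinutes); // too late: idle for more than 2 minutes
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());     // scroll context 0 has expired
        }
    }
}
```

In this model, a long keep-alive on a scroll that nobody consumes keeps its entry counted by openContexts, which mirrors the memory concern described above.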

A standard search is turned into a scroll search in this way:

SearchResponse response = client.prepareSearch(index).setTypes(type).setSize(30) 
       .setQuery(query).setScroll(TimeValue.timeValueMinutes(2)) 
       .execute().actionGet();

The response contains the same results as a standard search, plus a scroll ID, which is required to fetch the next batch of results.

To execute the scroll, you need to call the prepareSearchScroll client method with a scroll ID and a new timeout. In the example, we process all the result documents:

do { 
    for (SearchHit hit : response.getHits().getHits()) { 
        //process your hit 
    } 
    response = client.prepareSearchScroll(response.getScrollId())
        .setScroll(TimeValue.timeValueMinutes(2)).execute().actionGet();
} while (response.getHits().getHits().length != 0);

To detect the end of the scroll, we check whether the last response returned no hits.

Scroll is important in many scenarios, but in big data solutions, where the number of results is very large, it's easy to hit the timeout. In these scenarios, it is important to have a good architecture in which you fetch the results as fast as possible and, rather than processing them iteratively inside the loop, defer their manipulation to a distributed stage.
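One way to keep the scroll loop fast is to hand each batch to a worker pool and immediately request the next one. The sketch below assumes a hypothetical fetchNext supplier standing in for the prepareSearchScroll call (returning an empty list once the scroll is exhausted), and it uses a local thread pool as a simplified substitute for a truly distributed processing stage.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;
import java.util.function.Supplier;

// Sketch of "fetch fast, process elsewhere". fetchNext is a hypothetical stand-in
// for the scroll request; process is whatever per-batch work the application does.
class DeferredProcessingSketch {

    static <R> List<R> drain(Supplier<List<String>> fetchNext,
                             Function<List<String>, R> process,
                             int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        List<Future<R>> pending = new ArrayList<>();
        try {
            List<String> batch = fetchNext.get();
            while (!batch.isEmpty()) {
                final List<String> current = batch;
                pending.add(pool.submit(() -> process.apply(current))); // work leaves the loop
                batch = fetchNext.get();                                // next request right away
            }
            // Wait for the workers only after the scroll has been fully drained.
            List<R> results = new ArrayList<>();
            for (Future<R> f : pending) {
                try {
                    results.add(f.get());
                } catch (InterruptedException | ExecutionException e) {
                    throw new IllegalStateException("batch processing failed", e);
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Iterator<List<String>> batches =
            List.of(List.of("a", "b"), List.of("c"), List.<String>of()).iterator();
        System.out.println(drain(batches::next, List::size, 2)); // [2, 1]
    }
}
```

The design choice is that the time between two scroll requests no longer depends on how slow the processing is, so a short timeout is less likely to expire mid-scroll.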

In this case, the best solution is to use the search_after functionality of Elasticsearch, sorting by _uid, as described in the Using search_after functionality recipe in Chapter 5, Search.
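For comparison, the idea behind search_after can be sketched in memory. This is a hypothetical illustration, not the Elasticsearch API: the results are sorted on a unique key (the recipe sorts on _uid), and each request passes the last key of the previous page instead of a scroll ID, so nothing has to be kept alive on the server between requests.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical in-memory illustration of search_after: pages are defined by
// "keys strictly after the last key seen", not by an offset or a scroll context.
class SearchAfterSketch {

    // Returns up to size keys strictly greater than searchAfter (null = first page).
    static List<String> page(NavigableSet<String> sortedKeys, String searchAfter, int size) {
        Iterable<String> tail = searchAfter == null
            ? sortedKeys
            : sortedKeys.tailSet(searchAfter, false);
        List<String> out = new ArrayList<>();
        for (String key : tail) {
            if (out.size() == size) {
                break;
            }
            out.add(key);
        }
        return out;
    }

    // Walks every page; the only state carried between calls is the last key.
    static List<String> readAll(NavigableSet<String> sortedKeys, int size) {
        List<String> all = new ArrayList<>();
        List<String> batch = page(sortedKeys, null, size);
        while (!batch.isEmpty()) {
            all.addAll(batch);
            String after = batch.get(batch.size() - 1);
            batch = page(sortedKeys, after, size);
        }
        return all;
    }

    public static void main(String[] args) {
        NavigableSet<String> keys = new TreeSet<>(List.of("doc#1", "doc#2", "doc#3", "doc#4", "doc#5"));
        System.out.println(readAll(keys, 2)); // [doc#1, doc#2, doc#3, doc#4, doc#5]
    }
}
```

Unlike the scroll sketch, this walk reads live data: a key inserted after the current position would be picked up by a later page, so a unique, stable sort key matters.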
