In most cases, the top results returned to the user should be what they are looking for: the most relevant documents and the ones we want to show. However, there are use cases where this is not enough. Sometimes we want to get all the results; in the worst case, we want to retrieve every document stored in the collection and do something with it. When you request pages deep in the result set, you will see performance start to suffer. This is because Solr needs to build the results list for each request and discard the first N documents to get to the requested page. Of course, there are better ways to handle such cases, and Solr provides one of them, cursor-based paging, which we will discuss in this recipe.
Let's begin with the following index structure (add it to the fields section of your schema.xml file):
<field name="id" type="string" indexed="true" stored="true" required="true" />
<field name="title" type="text_general" indexed="true" stored="true" />
<add>
  <doc>
    <field name="id">1</field>
    <field name="title">Solr 4.0 cookbook</field>
  </doc>
  <doc>
    <field name="id">2</field>
    <field name="title">Solr 3.1 cookbook</field>
  </doc>
  <doc>
    <field name="id">3</field>
    <field name="title">ElasticSearch Server</field>
  </doc>
  <doc>
    <field name="id">4</field>
    <field name="title">Mastering Elasticsearch</field>
  </doc>
  <doc>
    <field name="id">5</field>
    <field name="title">Elasticsearch Server Second Edition</field>
  </doc>
</add>
q=*:*&rows=2&sort=score+desc,id+asc&cursorMark=*
The results returned by Solr are as follows:
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
    <lst name="params">
      <str name="q">*:*</str>
      <str name="cursorMark">*</str>
      <str name="sort">score desc,id asc</str>
      <str name="rows">2</str>
    </lst>
  </lst>
  <result name="response" numFound="5" start="0">
    <doc>
      <str name="id">1</str>
      <str name="title">Solr 4.0 cookbook</str>
      <long name="_version_">1475631480903303168</long>
    </doc>
    <doc>
      <str name="id">2</str>
      <str name="title">Solr 3.1 cookbook</str>
      <long name="_version_">1475631480954683392</long>
    </doc>
  </result>
  <str name="nextCursorMark">AoIIP4AAACEy</str>
</response>
Of course, we got the documents we wanted, but they are not the only thing we are interested in. We should also look at the value of nextCursorMark returned by Solr along with the results. In our case, its value is AoIIP4AAACEy, and we will use this value in the next query, which will give us the next page of results.
q=*:*&rows=2&sort=score+desc,id+asc&cursorMark=AoIIP4AAACEy
The results returned by Solr are as follows:
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">3</int>
    <lst name="params">
      <str name="q">*:*</str>
      <str name="cursorMark">AoIIP4AAACEy</str>
      <str name="sort">score desc,id asc</str>
      <str name="rows">2</str>
    </lst>
  </lst>
  <result name="response" numFound="5" start="0">
    <doc>
      <str name="id">3</str>
      <str name="title">ElasticSearch Server</str>
      <long name="_version_">1475631480954683393</long>
    </doc>
    <doc>
      <str name="id">4</str>
      <str name="title">Mastering Elasticsearch</str>
      <long name="_version_">1475631480955731968</long>
    </doc>
  </result>
  <str name="nextCursorMark">AoIIP4AAACE0</str>
</response>
We got the next two results and a new value of the nextCursorMark property, which we use in the third query.
q=*:*&rows=2&sort=score+desc,id+asc&cursorMark=AoIIP4AAACE0
The results are as follows:
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
    <lst name="params">
      <str name="q">*:*</str>
      <str name="cursorMark">AoIIP4AAACE0</str>
      <str name="sort">score desc,id asc</str>
      <str name="rows">2</str>
    </lst>
  </lst>
  <result name="response" numFound="5" start="0">
    <doc>
      <str name="id">5</str>
      <str name="title">Elasticsearch Server Second Edition</str>
      <long name="_version_">1475631480956780544</long>
    </doc>
  </result>
  <str name="nextCursorMark">AoIIP4AAACE1</str>
</response>
Now let's take a look at how it works.
I'll skip discussing the index structure and the data itself, because it is very simple and really doesn't matter in this recipe. They are just here so that we are able to query Solr and get results back.
Before we go into the details of how the cursor method works, you need to remember that Solr is almost stateless when it comes to querying. Of course, there are some caches, but Solr still creates the result set from scratch for almost every request. When sending a query with start=0 and rows=10, Solr needs to sort all the documents matching the query and return the top 10. Now imagine that we pass start=1000000 and rows=10. Solr needs to sort all the documents, discard the first 1,000,000, and return the ones at positions 1,000,001 to 1,000,010. This doesn't sound too efficient, and it isn't. The cursor paging method allows you to overcome this by giving Solr the query state information in an encoded value provided by the cursorMark parameter. The drawback of this approach is that pages must be fetched one after another; we cannot jump straight to an arbitrary page.
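To make the difference more concrete, here is a small pure-Python sketch. It does not talk to Solr and it is not how Solr is implemented internally; it only models the idea on the five documents from this recipe, sorted by id alone for simplicity. The function names page_by_offset and page_by_cursor are made up for this illustration.

```python
# Toy "index": (id, title) pairs. The ids are unique, like Solr's uniqueKey field.
DOCS = [
    ("1", "Solr 4.0 cookbook"),
    ("2", "Solr 3.1 cookbook"),
    ("3", "ElasticSearch Server"),
    ("4", "Mastering Elasticsearch"),
    ("5", "Elasticsearch Server Second Edition"),
]


def page_by_offset(docs, start, rows):
    """Offset paging: order everything, then throw away the first `start` hits."""
    ordered = sorted(docs, key=lambda d: d[0])
    return ordered[start:start + rows]


def page_by_cursor(docs, after_id, rows):
    """Cursor paging: only documents sorting after the last one seen are considered.

    `after_id` plays the role of the decoded cursor mark (None means first page).
    Returns the page and the id to pass in for the next page.
    """
    remaining = [d for d in docs if after_id is None or d[0] > after_id]
    page = sorted(remaining, key=lambda d: d[0])[:rows]
    next_after = page[-1][0] if page else after_id
    return page, next_after


if __name__ == "__main__":
    print(page_by_offset(DOCS, 2, 2))
    first_page, mark = page_by_cursor(DOCS, None, 2)
    second_page, mark = page_by_cursor(DOCS, mark, 2)
    print(second_page)
```

Both calls return the same second page, but the cursor variant never has to count and discard the documents that were already delivered. The price is that the caller must remember the mark from the previous page.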
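Let's start with our first query. We said that we want all documents to be matched (q=*:*), we want Solr to return two documents on a single page of results (rows=2), and we want the results to be sorted by score in descending order, with ties broken by id in ascending order (sort=score+desc,id+asc). The id field is not there by accident: cursor paging requires the sort to include the uniqueKey field so that the order of documents is unambiguous. Finally, we have the cursorMark parameter. Because this is the first page of results, we pass * as its value.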
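If you build such queries from code, the parameters described above can be assembled as in the following sketch. It uses only Python's standard library; the function name build_cursor_query is an assumption made for this example, and the resulting string still has to be appended to the select URL of your own Solr core or collection.

```python
from urllib.parse import urlencode


def build_cursor_query(cursor_mark="*", rows=2):
    """Return the query string for one page of cursor-based results."""
    params = {
        "q": "*:*",                    # match all documents
        "rows": rows,                  # page size
        "sort": "score desc,id asc",   # must include the uniqueKey field (id)
        "cursorMark": cursor_mark,     # "*" asks for the first page
    }
    # Keep *, : and , readable as in the recipe; everything else that needs
    # escaping (for example + or = inside a cursor mark) is percent-encoded.
    return urlencode(params, safe="*:,")


if __name__ == "__main__":
    print(build_cursor_query())
    print(build_cursor_query("AoIIP4AAACEy"))
```

Letting urlencode do the escaping matters here, because cursor mark values are opaque encoded strings and may contain characters that are not safe in a URL.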
As you can see, in addition to the standard results, Solr returned one additional thing: the nextCursorMark property. We take the value of this property and use it as the value of the cursorMark parameter in the next query. This is needed to get to the next page of results. In our case, the value of the cursorMark parameter should be set to AoIIP4AAACEy. Of course, you can expect the value of the nextCursorMark property to be different after each page of results.
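Reading the nextCursorMark property from the XML response can be sketched as follows, again with the standard library only. The sample below is an abbreviated copy of the first response from this recipe (the header and the XML declaration are left out), and the helper names next_cursor_mark and document_ids are made up for the example.

```python
import xml.etree.ElementTree as ET

# Abbreviated version of the first response shown in this recipe.
SAMPLE_RESPONSE = """
<response>
  <result name="response" numFound="5" start="0">
    <doc><str name="id">1</str><str name="title">Solr 4.0 cookbook</str></doc>
    <doc><str name="id">2</str><str name="title">Solr 3.1 cookbook</str></doc>
  </result>
  <str name="nextCursorMark">AoIIP4AAACEy</str>
</response>
"""


def next_cursor_mark(xml_text):
    """Return the nextCursorMark value, or None if the response has none."""
    root = ET.fromstring(xml_text.strip())
    node = root.find("str[@name='nextCursorMark']")
    return node.text if node is not None else None


def document_ids(xml_text):
    """Return the id of every document on the page, in result order."""
    root = ET.fromstring(xml_text.strip())
    return [n.text for n in root.findall("result/doc/str[@name='id']")]


if __name__ == "__main__":
    print(document_ids(SAMPLE_RESPONSE), next_cursor_mark(SAMPLE_RESPONSE))
```

The value should be treated as opaque: we pass it back to Solr exactly as received and do not try to interpret it.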
As you can see, the second query is almost the same as the first one, with one change: the value of the cursorMark parameter. We set the value of this parameter to the one returned by the nextCursorMark property, just as I described. And as you can see, Solr returned the second page of results. We did exactly the same for the third query, but of course we set the value of the cursorMark parameter to the one returned by the nextCursorMark property in the second page of results (which was AoIIP4AAACE0 in our case). When there are no more results to fetch, Solr returns a nextCursorMark equal to the cursorMark that was sent, which is the signal to stop paging.
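Putting the three queries together, the whole procedure is a loop that feeds each nextCursorMark back in as the next cursorMark and stops once Solr returns the same mark it was given. The sketch below shows that loop under a few assumptions: fetch_page is a hypothetical callable that takes a cursor mark and returns a pair of (documents, nextCursorMark), and fake_fetch is an in-memory stand-in that replays the three pages from this recipe. The fourth, empty page is not shown in the recipe and is added here only to illustrate the stop condition.

```python
def iterate_cursor(fetch_page):
    """Yield every document by following cursor marks until they stop changing."""
    cursor_mark = "*"  # "*" requests the first page
    while True:
        docs, next_mark = fetch_page(cursor_mark)
        yield from docs
        if next_mark == cursor_mark:
            break  # same mark returned: nothing left to fetch
        cursor_mark = next_mark


# In-memory stand-in for Solr, keyed by the cursorMark that was sent.
# The last entry is an illustrative assumption, not output from the recipe.
PAGES = {
    "*": (["1", "2"], "AoIIP4AAACEy"),
    "AoIIP4AAACEy": (["3", "4"], "AoIIP4AAACE0"),
    "AoIIP4AAACE0": (["5"], "AoIIP4AAACE1"),
    "AoIIP4AAACE1": ([], "AoIIP4AAACE1"),
}


def fake_fetch(cursor_mark):
    """Return (document ids, nextCursorMark) for the given cursor mark."""
    return PAGES[cursor_mark]


if __name__ == "__main__":
    print(list(iterate_cursor(fake_fetch)))
```

In real use, fetch_page would send the query to Solr and parse the response; keeping it as a parameter means the loop itself stays independent of the HTTP client and response format you choose.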