Counting matched results

Often you need only the number of matched results, not the results themselves.

Counting comes up in many scenarios, such as the following:

  • Returning the number of something (how many posts a blog has, how many comments a post has)
  • Checking whether any items exist (are there posts? are there comments?)

Getting ready

You will need an up-and-running Elasticsearch installation as used in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.

To execute curl via a command line, you need to install curl for your operating system.

To correctly execute the following commands, you will need an index populated with the chapter_05/populate_query.sh script available in the online code.

How to do it...

In order to execute a counting query, we will perform the following steps:

  1. From the command line, we will execute a count query, as follows:
        curl -XGET 'http://127.0.0.1:9200/test-index/test-type/_count?pretty' -d '{"query":{"match_all":{}}}'
    
  2. The result returned by Elasticsearch, if everything works, should be as follows:
        {
          "count" : 3,
          "_shards" : {
            "total" : 5,
            "successful" : 5,
            "failed" : 0
          }
        }
    

The result is composed of the count result (a long type) and the shard status at the time of the query.
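
As a rough illustration of the "are there posts?" check from the introduction, the following shell sketch reads the count out of a response like the one above. The helper name extract_count is hypothetical, the response is pasted in as a string rather than fetched from a server, and the grep-based matching is an assumption that only holds for this simple response shape; a real JSON parser would be more robust.

```shell
#!/bin/sh
# extract_count is a hypothetical helper, not part of Elasticsearch or curl.
# It text-matches the number after "count"; it is not a real JSON parser.
extract_count() {
  printf '%s\n' "$1" \
    | grep -o '"count"[[:space:]]*:[[:space:]]*[0-9][0-9]*' \
    | grep -o '[0-9][0-9]*$'
}

# Response body copied from step 2 (normally this would come from curl).
response='{ "count" : 3, "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 } }'

count=$(extract_count "$response")

# Existence check: is there at least one matching document?
if [ "$count" -gt 0 ]; then
  echo "found $count matching documents"
else
  echo "no matching documents"
fi
```

In a script, it would also be reasonable to inspect the failed value under _shards before trusting the count, since a failed shard may mean the number is incomplete.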

How it works...

The query is interpreted in the same way as for searching. The count action is distributed to all the shards, where it is executed as a low-level Lucene count call. Every shard that is hit returns its own count, and these counts are aggregated and returned to the user.
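
The aggregation step can be pictured with a small shell sketch. The per-shard numbers below are invented for illustration (they are not taken from a real cluster), and sum_shard_counts is a hypothetical name; Elasticsearch performs this step internally.

```shell
#!/bin/sh
# Hypothetical illustration: add up the counts reported by each shard.
sum_shard_counts() {
  total=0
  for shard_count in "$@"; do
    total=$((total + shard_count))
  done
  echo "$total"
}

# Five shards with made-up per-shard counts.
sum_shard_counts 1 0 2 0 0
```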

Note

In Elasticsearch, counting is faster than searching. If the hits themselves are not required, it's good practice to use the count API because it requires fewer resources.

The HTTP method to execute a count is GET (POST also works), and the REST endpoints are as follows:

http://<server>/_count

http://<server>/<index_name(s)>/_count

http://<server>/<index_name(s)>/<type_name(s)>/_count

Multiple indices and types are separated by commas. If an index or a type is given, the count is limited to it. An alias can be used as an index name.
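
The three endpoint forms can be sketched as a small shell function that only builds the URL string. The function name count_url and the index name other-index are hypothetical and used purely for illustration; the sketch deliberately handles only the three forms listed above.

```shell
#!/bin/sh
# count_url is a hypothetical helper that only builds the URL string.
# Usage: count_url <server> [index_names] [type_names]
# Multiple index or type names are passed as one comma-separated argument.
count_url() {
  server=$1
  indices=${2:-}
  types=${3:-}
  # This sketch covers only the three listed forms, so types need an index.
  if [ -n "$types" ] && [ -z "$indices" ]; then
    echo "type names given without index names" >&2
    return 1
  fi
  url="http://$server"
  if [ -n "$indices" ]; then url="$url/$indices"; fi
  if [ -n "$types" ]; then url="$url/$types"; fi
  echo "$url/_count"
}

count_url 127.0.0.1:9200
count_url 127.0.0.1:9200 test-index
count_url 127.0.0.1:9200 test-index,other-index test-type
```

The resulting string can then be passed to curl in place of a hand-written URL.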

Typically, a body is used to express a query, but for simple queries, the q query argument can be used instead, as in the following example:

    curl -XGET 'http://127.0.0.1:9200/test-index/test-type/_count?q=uuid:11111'

There's more...

In previous versions of Elasticsearch, the count API call (the _count REST endpoint) was implemented as a custom action; in Elasticsearch 5.x, that custom action has been removed. Internally, the count API is now implemented as a standard search with size set to 0.

Setting size to 0 not only speeds up the search but also reduces network traffic, because no hits are returned.

You can use this approach to execute aggregations (we will see them in Chapter 8, Aggregations) without returning hits.

The previous query can also be executed as follows:

    curl -XGET 'http://127.0.0.1:9200/test-index/test-type/_search?pretty&size=0' -d '{"query":{"match_all":{}}}'

The result returned by Elasticsearch, if everything works, should be as follows:

{ 
  "took" : 32, 
  "timed_out" : false, 
  "_shards" : { 
    "total" : 5, 
    "successful" : 5, 
    "failed" : 0 
  }, 
  "hits" : { 
    "total" : 3, 
    "max_score" : 0.0, 
    "hits" : [ ] 
  } 
} 

The count result (a long type) is available in hits.total.

See also

  • The Executing a search recipe in this chapter on using size to paginate
  • Chapter 8, Aggregations on how to use the aggregations