Searching for results is the main activity of a search engine, and aggregations are important because they often help to augment the results.
Aggregations are executed alongside the search by performing analytics on the matched results.
You need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.
You also need the Python packages installed in the Creating a client recipe of this chapter.
The code of this recipe can be found in the chapter_16/aggregation.py file.
To extend a query with the aggregations part, you need to define an aggregation section, as we have already seen in Chapter 8, Aggregations. In the case of the official Elasticsearch client, you can add the aggregation DSL to the search dictionary to provide aggregations. We will perform the following steps:
import elasticsearch
from pprint import pprint

from utils import create_and_add_mapping, populate

es = elasticsearch.Elasticsearch()
index_name = "my_index"
type_name = "my_type"

if es.indices.exists(index_name):
    es.indices.delete(index_name)

create_and_add_mapping(es, index_name, type_name)
populate(es, index_name, type_name)
Execute a search with a terms aggregation:

results = es.search(index_name, type_name, {
    "query": {"match_all": {}},
    "aggs": {
        "pterms": {"terms": {"field": "parsedtext", "size": 10}}
    }
})
pprint(results)

Execute a search with a date histogram aggregation, then delete the index:

results = es.search(index_name, type_name, {
    "query": {"match_all": {}},
    "aggs": {
        "date_histo": {"date_histogram": {"field": "date", "interval": "month"}}
    }
})
pprint(results)

es.indices.delete(index_name)
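Because the aggs section is a dictionary of named aggregations, more than one aggregation can be sent in a single request. The following is a minimal sketch of that idea; build_search_body is a hypothetical helper written for this illustration (it is not part of the Elasticsearch client or of the recipe's utils module), and the final call is left as a comment because it needs a running cluster.

```python
def build_search_body(query, **aggregations):
    """Return a search body with a query and any number of named aggregations.

    Hypothetical helper for illustration only.
    """
    body = {"query": query}
    if aggregations:
        # Each keyword argument becomes one named entry in the "aggs" section.
        body["aggs"] = dict(aggregations)
    return body


body = build_search_body(
    {"match_all": {}},
    pterms={"terms": {"field": "parsedtext", "size": 10}},
    date_histo={"date_histogram": {"field": "date", "interval": "month"}},
)

# With a running cluster, the body would be passed as in the recipe:
# results = es.search(index_name, type_name, body)
```

Each aggregation then appears under its own name in the aggregations part of the response.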
As described in Chapter 8, Aggregations, the aggregations are calculated during the search in a distributed way. When you send a query to Elasticsearch with the aggregations defined, it adds an additional step in the query processing, allowing aggregation computation.
In the preceding example, there are two kinds of aggregations: the terms aggregation and the date histogram aggregation.
The first one counts terms. It is often seen on sites that provide facet filtering on the results by terms such as producers, geographic locations, and so on:
results = es.search(index_name, type_name, {
    "query": {"match_all": {}},
    "aggs": {
        "pterms": {"terms": {"field": "parsedtext", "size": 10}}
    }
})
The terms aggregation requires a field to count on. By default, 10 buckets are returned for the field. This value can be changed by defining the size parameter.
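To make the shape of the result concrete, the sketch below reads the buckets of a terms aggregation from a response dictionary. The terms_counts helper is hypothetical, and sample_terms_response is a hand-written dictionary in the shape of a search response; its keys and counts are invented for illustration and are not output from the recipe's index.

```python
def terms_counts(response, agg_name):
    """Map each bucket key to its doc_count for a terms aggregation.

    Hypothetical helper; returns an empty dict if the aggregation is absent.
    """
    buckets = response.get("aggregations", {}).get(agg_name, {}).get("buckets", [])
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


# Hand-written sample; the terms and counts are made up.
sample_terms_response = {
    "aggregations": {
        "pterms": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {"key": "bill", "doc_count": 2},
                {"key": "joe", "doc_count": 1},
            ],
        }
    }
}

counts = terms_counts(sample_terms_response, "pterms")
print(counts)  # {'bill': 2, 'joe': 1}
```

With the size parameter set to 10, the buckets list holds at most 10 entries.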
The second kind of aggregation calculated is the date histogram, which groups hits into buckets based on a datetime field. This aggregation requires at least two parameters: the datetime field to be used as the source and the interval to be used for computation:
results = es.search(index_name, type_name, {
    "query": {"match_all": {}},
    "aggs": {
        "date_histo": {"date_histogram": {"field": "date", "interval": "month"}}
    }
})
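Each date histogram bucket carries a key (the start of the interval in epoch milliseconds), a key_as_string, and a doc_count. The sketch below turns those buckets into per-month counts. The monthly_counts helper is hypothetical, and sample_histo_response is a hand-written dictionary with invented counts, not output from the recipe's index.

```python
from datetime import datetime, timezone


def monthly_counts(response, agg_name):
    """Return (YYYY-MM, doc_count) pairs for a date histogram aggregation.

    Hypothetical helper for illustration only.
    """
    pairs = []
    for bucket in response["aggregations"][agg_name]["buckets"]:
        # "key" is the bucket start time in milliseconds since the epoch (UTC).
        start = datetime.fromtimestamp(bucket["key"] / 1000, tz=timezone.utc)
        pairs.append((start.strftime("%Y-%m"), bucket["doc_count"]))
    return pairs


# Hand-written sample; the counts are made up.
sample_histo_response = {
    "aggregations": {
        "date_histo": {
            "buckets": [
                {"key_as_string": "2013-01-01T00:00:00.000Z",
                 "key": 1356998400000, "doc_count": 2},
                {"key_as_string": "2013-02-01T00:00:00.000Z",
                 "key": 1359676800000, "doc_count": 1},
            ]
        }
    }
}

monthly = monthly_counts(sample_histo_response, "date_histo")
print(monthly)  # [('2013-01', 2), ('2013-02', 1)]
```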
The search results are standard search responses that we have already seen in Chapter 8, Aggregations.