Executing a search with aggregations

Searching for results is the main activity of a search engine, and aggregations are important because they often help to augment those results.

Aggregations are executed along with the search by computing analytics over the documents that match the query.

Getting ready

You need an up-and-running Elasticsearch installation, as described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.

You also need the Python packages installed in the Creating a client recipe of this chapter.

The code of this recipe can be found in the chapter_16/aggregation.py file.

How to do it…

To extend a query with aggregations, you need to define an aggregation section, as we have already seen in Chapter 8, Aggregations. With the official Elasticsearch client, you add the aggregation DSL under the aggs key of the search dictionary. We will perform the following steps:

  1. We initialize the client and populate the index:
            import elasticsearch
            from pprint import pprint

            # Helpers that create the mapping and load the sample documents
            from utils import create_and_add_mapping, populate

            es = elasticsearch.Elasticsearch()
            index_name = "my_index"
            type_name = "my_type"

            # Start from a clean index
            if es.indices.exists(index=index_name):
                es.indices.delete(index=index_name)

            create_and_add_mapping(es, index_name, type_name)
            populate(es, index_name, type_name)
    
  2. We can execute a search with a terms aggregation:
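            # Keyword arguments avoid depending on the positional order,
            # which differs between client versions
            results = es.search(index=index_name, doc_type=type_name,
                body={
                    "query": {"match_all": {}},
                    "aggs": {
                        "pterms": {"terms": {"field": "parsedtext", "size": 10}}
                    }
                })
            pprint(results)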
    
  3. We can execute a search with a date histogram aggregation:
            results = es.search(index=index_name, doc_type=type_name,
                body={
                    "query": {"match_all": {}},
                    "aggs": {
                        "date_histo": {"date_histogram": {"field": "date",
                                                          "interval": "month"}}
                    }
                })
            pprint(results)

            # Remove the test index when done
            es.indices.delete(index=index_name)
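
The search dictionaries used in the steps above share the same shape, so they can be assembled by a small helper. The following is a minimal sketch: build_search_body is a hypothetical name that is not part of the Elasticsearch client, and it only builds a plain dictionary without contacting a server:

```python
def build_search_body(aggs, query=None):
    """Return a search dictionary with a query and an aggs section.

    Hypothetical helper for illustration; it is not part of the
    Elasticsearch client.
    """
    if query is None:
        # Same default query used in the recipe steps
        query = {"match_all": {}}
    return {"query": query, "aggs": dict(aggs)}


# The two aggregation sections used in this recipe
terms_body = build_search_body(
    {"pterms": {"terms": {"field": "parsedtext", "size": 10}}})
histo_body = build_search_body(
    {"date_histo": {"date_histogram": {"field": "date", "interval": "month"}}})
```

Either dictionary can then be passed as the body of the search call shown in the steps.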
    

How it works…

As described in Chapter 8, Aggregations, the aggregations are calculated during the search in a distributed way. When you send a query to Elasticsearch with aggregations defined, it adds a step to the query processing in which the aggregations are computed.
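
To give an intuition for the distributed computation, the following sketch counts terms separately on each shard and then merges the partial counts, in the spirit of a terms aggregation. It is a simplified illustration in plain Python, not Elasticsearch's actual implementation; the shard contents and the function names are made up for the example:

```python
from collections import Counter


def shard_term_counts(docs, field):
    """Count, on one shard, how many documents contain each term."""
    counts = Counter()
    for doc in docs:
        # A term is counted once per document, like doc_count
        counts.update(set(doc.get(field, [])))
    return counts


def merge_counts(partials, size=10):
    """Merge per-shard counts and keep the most frequent terms."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    # Sort by descending count, then by term, for a stable order
    ordered = sorted(total.items(), key=lambda kv: (-kv[1], kv[0]))
    return ordered[:size]


# Hypothetical documents spread over two shards
shard_1 = [{"parsedtext": ["bill", "joe"]}, {"parsedtext": ["joe"]}]
shard_2 = [{"parsedtext": ["joe", "nice"]}, {"parsedtext": ["bill"]}]

partials = [shard_term_counts(s, "parsedtext") for s in (shard_1, shard_2)]
top_terms = merge_counts(partials, size=2)
print(top_terms)
```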

The steps of this recipe use two kinds of aggregations: the terms aggregation and the date histogram aggregation.

The first one counts terms. It is often seen on sites that provide facet filtering on the results, for example by producer, geographic location, and so on:

results = es.search(index=index_name, doc_type=type_name,
    body={
        "query": {"match_all": {}},
        "aggs": {
            "pterms": {"terms": {"field": "parsedtext", "size": 10}}
        }
    })

The terms aggregation requires a field to count on. By default, it returns 10 buckets for the field. This value can be changed by defining the size parameter.
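
The buckets of a terms aggregation are returned under the aggregations key of the response, keyed by the aggregation name. The following sketch reads them from a hand-written response fragment; the terms and counts are made-up sample data, and terms_buckets is a hypothetical helper:

```python
# Hypothetical response fragment with the shape of a terms aggregation result
sample_results = {
    "aggregations": {
        "pterms": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {"key": "bill", "doc_count": 3},
                {"key": "joe", "doc_count": 2},
            ],
        }
    }
}


def terms_buckets(results, agg_name):
    """Return (term, doc_count) pairs for a terms aggregation."""
    buckets = results["aggregations"][agg_name]["buckets"]
    return [(b["key"], b["doc_count"]) for b in buckets]


pairs = terms_buckets(sample_results, "pterms")
for term, count in pairs:
    print(f"{term}: {count}")
```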

The second kind of aggregation calculated is the date histogram, which counts the hits in buckets based on a datetime field. This aggregation requires at least two parameters: the datetime field to be used as the source and the interval to be used for computation:

results = es.search(index=index_name, doc_type=type_name,
    body={
        "query": {"match_all": {}},
        "aggs": {
            "date_histo": {"date_histogram": {"field": "date", "interval": "month"}}
        }
    })
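
Each date histogram bucket carries the start of its interval as key (milliseconds since the epoch) and as key_as_string, together with a doc_count. The following sketch converts such buckets into a dictionary of counts per month; the response fragment is made-up sample data, and histogram_counts is a hypothetical helper:

```python
from datetime import datetime, timezone

# Hypothetical response fragment with the shape of a date histogram result
sample_results = {
    "aggregations": {
        "date_histo": {
            "buckets": [
                {"key_as_string": "2013-01-01T00:00:00.000Z",
                 "key": 1356998400000, "doc_count": 4},
                {"key_as_string": "2013-02-01T00:00:00.000Z",
                 "key": 1359676800000, "doc_count": 1},
            ]
        }
    }
}


def histogram_counts(results, agg_name):
    """Map each bucket start (UTC datetime) to its document count."""
    counts = {}
    for bucket in results["aggregations"][agg_name]["buckets"]:
        # "key" is the bucket start in milliseconds since the epoch
        start = datetime.fromtimestamp(bucket["key"] / 1000, tz=timezone.utc)
        counts[start] = bucket["doc_count"]
    return counts


monthly = histogram_counts(sample_results, "date_histo")
for start, count in sorted(monthly.items()):
    print(start.strftime("%Y-%m"), count)
```

Note that Elasticsearch 7.2 and later deprecate the interval parameter in favor of calendar_interval and fixed_interval, so the request may need adjusting on recent versions.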

The search results are standard search responses that we have already seen in Chapter 8, Aggregations.

See also

  • The Executing the termsAggregation recipe in Chapter 8, Aggregations, on aggregating terms values and the Executing the date histogram aggregation recipe in Chapter 8, Aggregations, on computing the histogram aggregation on datetime fields