Executing significant terms aggregation

This kind of aggregation is an evolution of the terms aggregation from the previous recipe. It covers scenarios such as:

Suggesting relevant terms related to the current query text
Discovering relations between terms
Discovering common patterns in text

In these scenarios, the result cannot be as simple as that of the previous terms aggregation; it must be computed as the variance in term frequency between a foreground set (generally the documents matched by the query) and a background set (a large bulk of data).

Getting ready

You need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.

To execute curl via the command line, you need to install curl for your operating system.

To correctly execute the following commands, you need an index populated with the chapter_08/populate_aggregations.sh script, available in the online code.

How to do it...

To execute a significant terms aggregation, we will perform the following steps:

We want to find the tags that are significant for the documents matching a given set of tags. The REST call should be as follows:

        curl -XGET 'http://127.0.0.1:9200/test-index/test-type/_search?pretty=true&size=0' -d '{
            "query" : {
                "terms" : {"tag" : [ "ullam", "in", "ex" ]}
            },
            "aggs" : {
                "significant_tags" : {
                    "significant_terms" : {
                        "field" : "tag"
                    }
                }
            }
        }'
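
The same request can also be issued from a script. The following is a minimal sketch using only the Python standard library; the function names build_search_request and run_search are hypothetical, and the URL simply mirrors the curl command above. The Content-Type header is an addition not present in the original command, included on the assumption that the server accepts it.

```python
import json
import urllib.request

# Endpoint copied from the curl command; adjust host, index, and type as needed.
SEARCH_URL = "http://127.0.0.1:9200/test-index/test-type/_search?size=0"


def build_search_request(tags, field="tag"):
    """Build (but do not send) the significant terms search request."""
    body = {
        # Foreground set: documents carrying at least one of the given tags.
        "query": {"terms": {field: list(tags)}},
        "aggs": {
            "significant_tags": {"significant_terms": {"field": field}}
        },
    }
    return urllib.request.Request(
        SEARCH_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="GET",
    )


def run_search(tags):
    """Send the request; needs a running, populated Elasticsearch node."""
    with urllib.request.urlopen(build_search_request(tags)) as response:
        result = json.load(response)
    return result["aggregations"]["significant_tags"]["buckets"]
```

Calling run_search(["ullam", "in", "ex"]) against the populated index should return the bucket list of the aggregation.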

The result returned by Elasticsearch, if everything is okay, should be as follows:

        {
          "took" : 6,
          "timed_out" : false,
          "_shards" : { ...truncated... },
          "hits" : { ...truncated... },
          "aggregations" : {
            "significant_tags" : {
              "doc_count" : 45,
              "buckets" : [
                {
                  "key" : "ullam",
                  "doc_count" : 17,
                  "score" : 8.017283950617283,
                  "bg_count" : 17
                },
                {
                  "key" : "in",
                  "doc_count" : 15,
                  "score" : 7.0740740740740735,
                  "bg_count" : 15
                },
                {
                  "key" : "ex",
                  "doc_count" : 14,
                  "score" : 6.602469135802469,
                  "bg_count" : 14
                },
                {
                  "key" : "vitae",
                  "doc_count" : 3,
                  "score" : 0.674074074074074,
                  "bg_count" : 6
                },
                {
                  "key" : "necessitatibus",
                  "doc_count" : 3,
                  "score" : 0.3373737373737374,
                  "bg_count" : 11
                }
              ]
            }
          }
        }

The aggregation result is composed of several buckets, each with the following fields:

key: the term used to populate the bucket
doc_count: the number of documents matched by the query that contain the key term
score: the significance score for this bucket
bg_count: the number of background documents that contain the key term
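
To see how these fields relate, the scores in the response can be recomputed by hand. The sketch below assumes the default JLH heuristic of the significant terms aggregation (absolute change in frequency multiplied by relative change) and a background set of 1,000 documents. The background size does not appear in the output shown here; it is inferred from the reported scores and should be treated as an assumption. The foreground size, 45, is the doc_count reported at the top of the aggregation.

```python
import math


def jlh_score(doc_count, subset_size, bg_count, superset_size):
    """JLH-style score: (fg% - bg%) * (fg% / bg%)."""
    fg = doc_count / subset_size      # frequency in the foreground set
    bg = bg_count / superset_size     # frequency in the background set
    if bg == 0.0 or fg <= bg:
        # Simplification: terms that are not more frequent in the
        # foreground than in the background get no score.
        return 0.0
    return (fg - bg) * (fg / bg)


SUBSET_SIZE = 45       # top-level doc_count of the aggregation
SUPERSET_SIZE = 1000   # ASSUMPTION: inferred from the scores, not in the output

# (key, doc_count, bg_count, score reported by Elasticsearch)
BUCKETS = [
    ("ullam", 17, 17, 8.017283950617283),
    ("in", 15, 15, 7.0740740740740735),
    ("ex", 14, 14, 6.602469135802469),
    ("vitae", 3, 6, 0.674074074074074),
    ("necessitatibus", 3, 11, 0.3373737373737374),
]

for key, doc_count, bg_count, reported in BUCKETS:
    computed = jlh_score(doc_count, SUBSET_SIZE, bg_count, SUPERSET_SIZE)
    print(f"{key:15} computed={computed:.6f} reported={reported:.6f}")
```

For example, vitae occurs in 3 of the 45 foreground documents but in only 6 of the assumed 1,000 background documents, which is what gives it a higher score than necessitatibus (3 of 45 against 11 of 1,000).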

How it works...

The aggregation is executed much like the previous ones. Internally, two terms aggregations are computed: one over the documents matched by the query or parent aggregation, and one over all the documents in the index. The two result sets are then scored against each other to compute the significant terms.
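
The two-pass idea described above can be sketched on a toy corpus. This is an illustration only, not the Elasticsearch implementation: the documents, the significant_terms function, and the matches predicate are all hypothetical, and the scoring function is a simplified JLH-style formula.

```python
from collections import Counter

# Hypothetical toy corpus; each document has a multi-valued "tag" field.
DOCS = [
    {"tag": ["ullam", "vitae"]},
    {"tag": ["ullam", "common"]},
    {"tag": ["in", "common"]},
    {"tag": ["common"]},
    {"tag": ["common", "other"]},
    {"tag": ["other"]},
]


def score(fg_count, fg_size, bg_count, bg_size):
    """Simplified JLH-style score: absolute change times relative change."""
    fg, bg = fg_count / fg_size, bg_count / bg_size
    return 0.0 if fg <= bg else (fg - bg) * (fg / bg)


def significant_terms(docs, matches, field="tag"):
    # Pass 1: term counts over the whole corpus (background set).
    background = Counter(t for d in docs for t in set(d[field]))
    # Pass 2: term counts over the matching documents (foreground set).
    fg_docs = [d for d in docs if matches(d)]
    foreground = Counter(t for d in fg_docs for t in set(d[field]))
    # Score each foreground term against its background frequency.
    scored = [
        (term, count, background[term],
         score(count, len(fg_docs), background[term], len(docs)))
        for term, count in foreground.items()
    ]
    # This sketch drops terms with a zero score and sorts by score.
    return sorted((s for s in scored if s[3] > 0),
                  key=lambda s: (-s[3], s[0]))


RESULT = significant_terms(DOCS, lambda d: "ullam" in d["tag"])
for term, doc_count, bg_count, value in RESULT:
    print(term, doc_count, bg_count, round(value, 3))
```

In this toy corpus, the term common appears in one of the two foreground documents but in four of the six documents overall, so it is not more frequent in the foreground and is dropped, while vitae is kept because it is rare in the background.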

Because term fields often have a large cardinality and a significance score must be computed for every candidate term, this kind of aggregation is very CPU-intensive.

The significant terms aggregation returns the terms that are evaluated as significant for the current query.

To compare the results of the significant terms aggregation with those of the plain terms aggregation, we can execute the same query with a terms aggregation, as follows:

curl -XGET 'http://127.0.0.1:9200/test-index/test-type/_search?pretty=true&size=0' -d '{
    "query" : {
        "terms" : {"tag" : [ "ullam", "in", "ex" ]}
    },
    "aggs" : {
        "tags" : {
            "terms" : {
                "field" : "tag"
            }
        }
    }
}'

The returned results will be as follows:

{
  ...truncated...
  "aggregations" : {
    "tags" : {
      "doc_count_error_upper_bound" : 2,
      "sum_other_doc_count" : 96,
      "buckets" : [
        {"key" : "ullam", "doc_count" : 17},
        {"key" : "in", "doc_count" : 15},
        {"key" : "ex", "doc_count" : 14},
        {"key" : "necessitatibus", "doc_count" : 3},
        {"key" : "vitae", "doc_count" : 3},
        {"key" : "architecto", "doc_count" : 2},
        {"key" : "debitis", "doc_count" : 2},
        {"key" : "dicta", "doc_count" : 2},
        {"key" : "error", "doc_count" : 2},
        {"key" : "excepturi", "doc_count" : 2}
      ]
    }
  }
}
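
The difference between the two outputs shows up in the ordering of the buckets. The short sketch below uses only the values from the two responses above; the tie-break on key for equal doc_count values is an assumption chosen to match the order shown in the terms output.

```python
# Buckets copied from the significant terms response above.
BUCKETS = [
    {"key": "ullam", "doc_count": 17, "score": 8.017283950617283, "bg_count": 17},
    {"key": "in", "doc_count": 15, "score": 7.0740740740740735, "bg_count": 15},
    {"key": "ex", "doc_count": 14, "score": 6.602469135802469, "bg_count": 14},
    {"key": "vitae", "doc_count": 3, "score": 0.674074074074074, "bg_count": 6},
    {"key": "necessitatibus", "doc_count": 3, "score": 0.3373737373737374, "bg_count": 11},
]

# Plain terms ordering: highest doc_count first (ties broken by key here).
by_doc_count = [b["key"] for b in
                sorted(BUCKETS, key=lambda b: (-b["doc_count"], b["key"]))]

# Significant terms ordering: highest score first.
by_score = [b["key"] for b in sorted(BUCKETS, key=lambda b: -b["score"])]

print("terms order:      ", by_doc_count)
print("significant order:", by_score)
```

Both vitae and necessitatibus match 3 documents, so the plain terms aggregation cannot tell them apart by count. The significant terms aggregation ranks vitae higher because 3 of its 6 occurrences in the index fall inside the query results, against 3 of 11 for necessitatibus.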
