Executing geo distance aggregations

In addition to the standard field types covered by the previous aggregations, Elasticsearch can aggregate on a GeoPoint field with the geo distance aggregation. It is an evolution of the previously discussed range aggregation, built to work on geo locations.

Getting ready

You need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.

To execute curl via the command line, you need to install curl for your operating system.

To correctly execute the following command, you need an index populated with the script (chapter_08/populate_aggregations.sh) available in the online code.

How to do it...

For executing geo distance aggregations, we will perform the following steps:

  1. Using the position field available in the documents, we want to group the documents into five ranges, based on their distance from a given point:
    • Less than 10 kilometers
    • From 10 kilometers to 20
    • From 20 kilometers to 50
    • From 50 kilometers to 100
    • Above 100 kilometers
  2. To achieve this goal, we create a geo distance aggregation with code similar to the following:
            curl -XGET 'http://127.0.0.1:9200/test-index/test-type/_search?pretty&size=0' -d '{
                "query" : {
                    "match_all" : {}
                },
                "aggs" : {
                    "position" : {
                        "geo_distance" : {
                            "field" : "position",
                            "unit" : "km",
                            "origin" : {
                                "lat" : 83.76,
                                "lon" : -81.20
                            },
                            "ranges" : [
                                { "to" : 10 },
                                { "from" : 10, "to" : 20 },
                                { "from" : 20, "to" : 50 },
                                { "from" : 50, "to" : 100 },
                                { "from" : 100 }
                            ]
                        }
                    }
                }
            }'
    
  3. The result returned by Elasticsearch, if everything is okay, should be as follows:
            { 
              "took" : 177, 
              "timed_out" : false, 
              "_shards" : {...truncated...}, 
              "hits" : {...truncated...}, 
              "aggregations" : { 
                "position" : { 
                  "buckets" : [ { 
                    "key" : "*-10.0", 
                    "from" : 0.0, 
                    "to" : 10.0, 
                    "doc_count" : 0 
                  }, { 
                    "key" : "10.0-20.0", 
                    "from" : 10.0, 
                    "to" : 20.0, 
                    "doc_count" : 0 
                  }, { 
                    "key" : "20.0-50.0", 
                    "from" : 20.0, 
                    "to" : 50.0, 
                    "doc_count" : 0 
                  }, { 
                    "key" : "50.0-100.0", 
                    "from" : 50.0, 
                    "to" : 100.0, 
                    "doc_count" : 0 
                  }, { 
                    "key" : "100.0-*", 
                    "from" : 100.0, 
                    "doc_count" : 1000 
                  } ] 
                } 
              } 
            } 
    

How it works...

The geo distance aggregation is an extension of the range aggregation that works on geo locations. It works only if the field is mapped as a geo_point.

The field can contain a single geo point or multiple geo points.

The aggregation requires at least the following three parameters:

  • field: the field of the geo point to work on
  • origin: the geo point to be used for computing the distances
  • ranges: a list of ranges used to collect documents based on their distance from the origin
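
As a minimal sketch of how these three parameters fit together, the following Python helper assembles the same search body that the recipe sends with curl. The function name geo_distance_agg and its argument names are hypothetical and chosen only for illustration; the JSON keys are the ones used in the request shown earlier.

```python
import json


def geo_distance_agg(name, field, origin, ranges):
    """Build the "aggs" section for one geo distance aggregation.

    Hypothetical helper: only the JSON keys come from the recipe.
    """
    if not ranges:
        raise ValueError("at least one range is required")
    for r in ranges:
        if "from" not in r and "to" not in r:
            raise ValueError("each range needs a 'from', a 'to', or both")
    return {
        name: {
            "geo_distance": {
                "field": field,          # the geo_point field to work on
                "origin": origin,        # the point distances are measured from
                "ranges": list(ranges),  # the distance buckets
            }
        }
    }


body = {
    "query": {"match_all": {}},
    "aggs": geo_distance_agg(
        "position",
        "position",
        {"lat": 83.76, "lon": -81.20},
        [
            {"to": 10},
            {"from": 10, "to": 20},
            {"from": 20, "to": 50},
            {"from": 50, "to": 100},
            {"from": 100},
        ],
    ),
}

# The printed JSON can be passed to curl with -d, as in the recipe.
print(json.dumps(body, indent=2))
```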

The GeoPoint can be defined in one of the following accepted formats:

  • latitude and longitude as properties, for example: {"lat": 83.76, "lon": -81.20}
  • longitude and latitude as an array, for example: [-81.20, 83.76]
  • latitude and longitude as a string, for example: "83.76, -81.20"
  • geohash, for example: fnyk80
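
To make the differences between these formats concrete, here is a small Python sketch that converts each of them to a (lat, lon) tuple. It is only an illustration of the formats, not the parser Elasticsearch uses; the function names to_lat_lon and decode_geohash are hypothetical. Note the coordinate order: the array form puts the longitude first, while the string form puts the latitude first.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def decode_geohash(geohash):
    """Return the (lat, lon) center of the cell described by a geohash."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    is_lon = True  # bits alternate between lon and lat, starting with lon
    for char in geohash.lower():
        value = BASE32.index(char)
        for shift in range(4, -1, -1):  # 5 bits per character, high bit first
            bit = (value >> shift) & 1
            rng = lon_range if is_lon else lat_range
            mid = (rng[0] + rng[1]) / 2
            if bit:
                rng[0] = mid  # keep the upper half
            else:
                rng[1] = mid  # keep the lower half
            is_lon = not is_lon
    return (sum(lat_range) / 2, sum(lon_range) / 2)


def to_lat_lon(point):
    """Normalize any of the four formats to a (lat, lon) tuple."""
    if isinstance(point, dict):
        return (float(point["lat"]), float(point["lon"]))
    if isinstance(point, (list, tuple)):
        lon, lat = point  # array form is [lon, lat]
        return (float(lat), float(lon))
    if isinstance(point, str):
        if "," in point:
            lat, lon = point.split(",")  # string form is "lat, lon"
            return (float(lat), float(lon))
        return decode_geohash(point)
    raise TypeError("unsupported geo point format: %r" % (point,))


for example in ({"lat": 83.76, "lon": -81.20}, [-81.20, 83.76],
                "83.76, -81.20", "fnyk80"):
    print(example, "->", to_lat_lon(example))
```

The geohash decodes to the center of a small cell, so it only approximates the exact coordinates given by the other three forms.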

Each range is defined as a pair of from/to values. If one of them is missing, the range is unbounded on that side.

The range values are expressed in meters by default, but the unit property lets you choose one of the following units:

  • mi or miles
  • in or inch
  • yd or yards
  • km or kilometers
  • m or meters
  • cm or centimeters
  • mm or millimeters
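
When ranges come from sources that use different units, it can help to convert them before building the request. The following Python sketch does this with standard conversion factors, keyed by the short unit names from the list above. The names METERS_PER_UNIT and convert_distance are hypothetical; this is client-side arithmetic, not an Elasticsearch API.

```python
# Meters per unit, keyed by the short unit names listed above.
METERS_PER_UNIT = {
    "mi": 1609.344,
    "in": 0.0254,
    "yd": 0.9144,
    "km": 1000.0,
    "m": 1.0,
    "cm": 0.01,
    "mm": 0.001,
}


def convert_distance(value, from_unit, to_unit):
    """Convert a distance between two of the supported units."""
    return value * METERS_PER_UNIT[from_unit] / METERS_PER_UNIT[to_unit]


# The recipe's upper bounds, expressed in other units.
for km in (10, 20, 50, 100):
    print(km, "km =", convert_distance(km, "km", "m"), "m =",
          round(convert_distance(km, "km", "mi"), 2), "mi")
```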

It's also possible to set how the distance is computed with the distance_type parameter. Valid values for this parameter are as follows:

  • arc, which uses the Arc Length formula. It is the most precise. (See http://en.wikipedia.org/wiki/Arc_length for more details on the arc length algorithm.)
  • sloppy_arc (default), which is a faster implementation of the arc length formula, but less precise.
  • plane, which uses the plane distance formula. It is the fastest and least CPU intensive, but it's also the least precise.
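
The trade-off between precision and speed can be illustrated with a short Python sketch. It compares a great-circle distance (computed here with the haversine formula) with a flat-plane approximation. These functions are illustrative assumptions and are not the code Elasticsearch runs for arc, sloppy_arc, or plane; the Earth radius constant is also an assumed mean value.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius


def arc_distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance using the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against rounding errors near antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def plane_distance_km(lat1, lon1, lat2, lon2):
    """Flat-plane (equirectangular) approximation: cheaper, less precise."""
    x = math.radians(lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2))
    y = math.radians(lat2 - lat1)
    return EARTH_RADIUS_KM * math.hypot(x, y)


origin = (83.76, -81.20)  # the origin used in the recipe
for target in ((83.80, -81.00), (80.00, -60.00), (40.00, 0.00)):
    arc = arc_distance_km(*origin, *target)
    plane = plane_distance_km(*origin, *target)
    print(target, "arc=%.1f km" % arc, "plane=%.1f km" % plane)
```

For nearby points the two results are almost identical; the gap grows with distance, which is why the plane formula suits small search areas best.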

As with the range aggregation, each range is evaluated independently, so overlapping ranges are allowed.
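
The following Python sketch models this behavior on the client side: it checks a single distance against every range independently and returns the keys of all matching buckets. It assumes that from is inclusive and to is exclusive, as in the range aggregation, and that a missing bound means unbounded. The function names are hypothetical.

```python
def range_key(r):
    """Format a range the way bucket keys appear in the response."""
    lower = "*" if r.get("from") is None else float(r["from"])
    upper = "*" if r.get("to") is None else float(r["to"])
    return "%s-%s" % (lower, upper)


def matching_ranges(distance, ranges):
    """Return the key of every range that contains the distance."""
    matches = []
    for r in ranges:
        lower = r.get("from")
        upper = r.get("to")
        if lower is not None and distance < lower:
            continue  # below an inclusive lower bound
        if upper is not None and distance >= upper:
            continue  # at or above an exclusive upper bound
        matches.append(range_key(r))
    return matches


recipe_ranges = [
    {"to": 10},
    {"from": 10, "to": 20},
    {"from": 20, "to": 50},
    {"from": 50, "to": 100},
    {"from": 100},
]
print(matching_ranges(10, recipe_ranges))   # a boundary value lands in one bucket

overlapping = [{"to": 10}, {"to": 20}, {"from": 5}]
print(matching_ranges(7, overlapping))      # one distance, several buckets
```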

When the results are returned, each bucket of this aggregation contains the following fields:

  • from/to defines the analyzed range
  • key defines the string representation of the range
  • doc_count defines the number of documents in the bucket that matches the range
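
As a brief illustration of reading these fields, the Python sketch below parses a response shaped like the one shown earlier (reduced here to its aggregations part) and maps each bucket key to its document count. The helper name bucket_counts is hypothetical.

```python
import json

# A trimmed copy of the response from the recipe: only the aggregations part.
response_text = """
{
  "aggregations": {
    "position": {
      "buckets": [
        {"key": "*-10.0", "from": 0.0, "to": 10.0, "doc_count": 0},
        {"key": "10.0-20.0", "from": 10.0, "to": 20.0, "doc_count": 0},
        {"key": "20.0-50.0", "from": 20.0, "to": 50.0, "doc_count": 0},
        {"key": "50.0-100.0", "from": 50.0, "to": 100.0, "doc_count": 0},
        {"key": "100.0-*", "from": 100.0, "doc_count": 1000}
      ]
    }
  }
}
"""


def bucket_counts(response, agg_name):
    """Map each bucket key to its doc_count for the named aggregation."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


response = json.loads(response_text)
counts = bucket_counts(response, "position")
for key, count in counts.items():
    print(key, count)

# An unbounded range has no "to" field, so read it with .get().
last = response["aggregations"]["position"]["buckets"][-1]
print(last.get("to"))
```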

See also
