Although filtering or scoring results by geolocation is useful, it is often more useful to be able to present information to the user on a map. A search may return far too many results to display each geo-point individually, but geo-aggregations can be used to cluster geo-points into more manageable buckets.
Three aggregations work with fields of type geo_point:
geo_distance
Groups documents into concentric circles around a central point.
geohash_grid
Groups documents by geohash cell, for display on a map.
geo_bounds
Returns the lat/lon coordinates of a bounding box that would encompass all of the geo-points. This is useful for choosing the correct zoom level when displaying a map.
The geo_distance agg is useful for searches such as “find all pizza restaurants within 1km of me.” The search results should, indeed, be limited to the 1km radius specified by the user, but we can also tell the user that “another result was found within 2km”:
GET /attractions/restaurant/_search
{
  "query": {
    "filtered": {
      "query": {
        "match": {
          "name": "pizza"
        }
      },
      "filter": {
        "geo_bounding_box": {
          "location": {
            "top_left": {
              "lat": 40.8,
              "lon": -74.1
            },
            "bottom_right": {
              "lat": 40.4,
              "lon": -73.7
            }
          }
        }
      }
    }
  },
  "aggs": {
    "per_ring": {
      "geo_distance": {
        "field": "location",
        "unit": "km",
        "origin": {
          "lat": 40.712,
          "lon": -73.988
        },
        "ranges": [
          { "from": 0, "to": 1 },
          { "from": 1, "to": 2 }
        ]
      }
    }
  },
  "post_filter": {
    "geo_distance": {
      "distance": "1km",
      "location": {
        "lat": 40.712,
        "lon": -73.988
      }
    }
  }
}
The main query looks for restaurants with pizza in the name.
The bounding box filters these results down to just those in the greater New York area.
The geo_distance agg counts the number of results within 1km of the user, and between 1km and 2km from the user.
Finally, the post_filter reduces the search results to just those restaurants within 1km of the user.
The response from the preceding request is as follows:
"hits": {
  "total": 1,
  "max_score": 0.15342641,
  "hits": [
    {
      "_index": "attractions",
      "_type": "restaurant",
      "_id": "3",
      "_score": 0.15342641,
      "_source": {
        "name": "Mini Munchies Pizza",
        "location": [-73.983, 40.719]
      }
    }
  ]
},
"aggregations": {
  "per_ring": {
    "buckets": [
      { "key": "*-1.0", "from": 0, "to": 1, "doc_count": 1 },
      { "key": "1.0-2.0", "from": 1, "to": 2, "doc_count": 1 }
    ]
  }
}
The post_filter has reduced the search hits to just the single pizza restaurant within 1km of the user.
The aggregation includes the search result plus the other pizza restaurant within 2km of the user.
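To make the ring counts concrete, the following is a minimal Python sketch of what a distance-ring aggregation computes. It is not how Elasticsearch implements geo_distance: it assumes a spherical Earth with a radius of 6,371km (haversine distance) and half-open ranges (from inclusive, to exclusive). The first point is the restaurant from the response above; the second point and the helper names haversine_km and count_per_ring are made up for illustration.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km, assuming a spherical Earth (radius 6371 km)."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

def count_per_ring(origin, points, ranges):
    """Count the points whose distance d from origin satisfies lo <= d < hi, per ring."""
    counts = [0] * len(ranges)
    for lat, lon in points:
        d = haversine_km(origin[0], origin[1], lat, lon)
        for i, (lo, hi) in enumerate(ranges):
            if lo <= d < hi:
                counts[i] += 1
    return counts

origin = (40.712, -73.988)                  # the user's position from the request
points = [
    (40.719, -73.983),                      # "Mini Munchies Pizza" from the response
    (40.722, -73.975),                      # hypothetical second restaurant
]
rings = [(0, 1), (1, 2)]                    # km, same rings as the request

print(count_per_ring(origin, points, rings))  # [1, 1]
```

The first restaurant is roughly 0.9km from the origin and the hypothetical one roughly 1.6km, so each ring receives one document, mirroring the doc_count values in the response.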
In this example, we have counted the number of restaurants that fall into each concentric ring. Of course, we could nest subaggregations under the per_ring aggregation to calculate the average price per ring, the maximum popularity, and more.
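As a sketch of what such nesting might look like, the Python snippet below builds the aggregation part of a request body with an avg subaggregation under each ring. The price field is hypothetical (the example documents shown here contain only name and location), and the helper name rings_with_avg and the avg_price label are illustrative choices.

```python
import json

def rings_with_avg(field, origin, ranges, metric_field):
    """Build a geo_distance aggregation with an avg subaggregation per ring."""
    return {
        "per_ring": {
            "geo_distance": {
                "field": field,
                "unit": "km",
                "origin": origin,
                "ranges": [{"from": lo, "to": hi} for lo, hi in ranges],
            },
            # Subaggregations sit under "aggs", next to the bucket definition,
            # and are computed once for every bucket (ring).
            "aggs": {"avg_price": {"avg": {"field": metric_field}}},
        }
    }

aggs = rings_with_avg(
    "location",
    {"lat": 40.712, "lon": -73.988},
    [(0, 1), (1, 2)],
    "price",  # hypothetical numeric field, not part of the example documents
)
print(json.dumps({"aggs": aggs}, indent=2))
```

Each bucket in the response would then carry an avg_price entry alongside its doc_count, provided the documents actually have a numeric price field.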
The number of results returned by a query may be far too many to display each geo-point individually on a map. The geohash_grid aggregation buckets nearby geo-points together by calculating the geohash for each point, at the level of precision that you define.
The result is a grid of cells—one cell per geohash—that can be displayed on a map. By changing the precision of the geohash, you can summarize information across the whole world, by country, or by city block.
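To illustrate how precision controls the cell a point falls into, here is a minimal Python sketch of the standard geohash encoding (longitude and latitude bits interleaved, five bits per base-32 character). It is written for illustration only and says nothing about how Elasticsearch computes geohashes internally; the function name geohash_encode is our own.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lat, lon, precision):
    """Encode a point as a geohash string with `precision` characters."""
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    chars, bits, value, use_lon = [], 0, 0, True
    while len(chars) < precision:
        # Alternate between halving the longitude and the latitude range.
        rng, coord = (lon_rng, lon) if use_lon else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = (value << 1) | 1
            rng[0] = mid
        else:
            value <<= 1
            rng[1] = mid
        use_lon = not use_lon
        bits += 1
        if bits == 5:                       # five bits make one base-32 character
            chars.append(BASE32[value])
            bits, value = 0, 0
    return "".join(chars)

# The restaurant at lat 40.719, lon -73.983, at increasing precision
for precision in (1, 3, 5):
    print(precision, geohash_encode(40.719, -73.983, precision))
# 1 d
# 3 dr5
# 5 dr5rs
```

Because each additional character only subdivides the previous cell, a shorter geohash is always a prefix of a longer one for the same point, which is why lowering the precision merges neighboring cells into larger buckets.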
The aggregation is sparse—it returns only cells that contain documents. If your geohashes are too precise and too many buckets are generated, it will return, by default, the 10,000 most populous cells—those containing the most documents. However, it still needs to generate all the buckets in order to figure out which are the most populous 10,000. You need to control the number of buckets generated by doing the following:
Limit the result with a geo_bounding_box filter.
Choose an appropriate precision for the size of your bounding box.
GET /attractions/restaurant/_search?search_type=count
{
  "query": {
    "filtered": {
      "filter": {
        "geo_bounding_box": {
          "location": {
            "top_left": {
              "lat": 40.8,
              "lon": -74.1
            },
            "bottom_right": {
              "lat": 40.4,
              "lon": -73.7
            }
          }
        }
      }
    }
  },
  "aggs": {
    "new_york": {
      "geohash_grid": {
        "field": "location",
        "precision": 5
      }
    }
  }
}
The bounding box limits the scope of the search to the greater New York area.
Geohashes of precision 5 are approximately 5km x 5km.
Geohashes with precision 5 measure about 25km² each, so 10,000 cells at this precision would cover 250,000km². The bounding box that we specified measures approximately 44km x 33km, or about 1,452km², so we are well within safe limits; we definitely won’t create too many buckets in memory.
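The same back-of-the-envelope check can be written as a small Python helper. This is only a rough estimate under the approximations used above (cells treated as 5km x 5km squares); the function name estimate_cells and the extra row and column that allow for a box not aligned to the grid are our own assumptions.

```python
import math

def estimate_cells(width_km, height_km, cell_width_km, cell_height_km):
    """Rough upper bound on the number of grid cells touched by a bounding box.

    One extra column and row are added because the box is unlikely to be
    aligned with the cell boundaries.
    """
    cols = math.ceil(width_km / cell_width_km) + 1
    rows = math.ceil(height_km / cell_height_km) + 1
    return cols * rows

box_area_km2 = 44 * 33                      # 1452, as in the text
cells = estimate_cells(33, 44, 5, 5)
print(box_area_km2, cells)                  # 1452 80
print(cells < 10_000)                       # True
```

Even with the generous rounding, the estimate is about 80 cells, far below the 10,000-bucket default mentioned earlier.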
The response from the preceding request looks like this:
...
"aggregations": {
  "new_york": {
    "buckets": [
      {
        "key": "dr5rs",
        "doc_count": 2
      },
      {
        "key": "dr5re",
        "doc_count": 1
      }
    ]
  }
}
...
Again, we didn’t specify any subaggregations, so all we got back was the document count. We could have asked for popular restaurant types, average price, or other details.
To plot these buckets on a map, you need a library that understands how to convert a geohash into the equivalent bounding box or central point. Libraries exist in JavaScript and other languages that will perform this conversion for you, but you can also use information from “geo_bounds Aggregation” to perform a similar job.
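For readers who want to see what such a conversion involves, here is a minimal Python sketch that decodes a geohash into its bounding box and central point using the standard geohash scheme. It is an illustration rather than a replacement for a tested library, and the function names geohash_bounds and geohash_center are our own.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_bounds(geohash):
    """Return the bounding box of a geohash cell as top_left / bottom_right."""
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    use_lon = True                          # bits alternate, starting with longitude
    for ch in geohash:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):      # five bits per character, high bit first
            rng = lon_rng if use_lon else lat_rng
            mid = (rng[0] + rng[1]) / 2
            if (value >> shift) & 1:
                rng[0] = mid                # keep the upper half
            else:
                rng[1] = mid                # keep the lower half
            use_lon = not use_lon
    return {
        "top_left": {"lat": lat_rng[1], "lon": lon_rng[0]},
        "bottom_right": {"lat": lat_rng[0], "lon": lon_rng[1]},
    }

def geohash_center(geohash):
    """Return the (lat, lon) of the center of a geohash cell."""
    b = geohash_bounds(geohash)
    return (
        (b["top_left"]["lat"] + b["bottom_right"]["lat"]) / 2,
        (b["top_left"]["lon"] + b["bottom_right"]["lon"]) / 2,
    )

print(geohash_bounds("dr5rs"))
print(geohash_center("dr5rs"))
```

The box returned for dr5rs spans latitudes 40.693 to 40.737 and longitudes -74.004 to -73.960 (rounded), which contains the restaurant at 40.719, -73.983 from the earlier response.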
In our previous example, we filtered our results by using a bounding box that covered the greater New York area. However, our results were all located in downtown Manhattan. When displaying a map for our user, it makes sense to zoom into the area of the map that contains the data; there is no point in showing lots of empty space.
The geo_bounds aggregation does exactly this: it calculates the smallest bounding box that is needed to encapsulate all of the geo-points:
GET /attractions/restaurant/_search?search_type=count
{
  "query": {
    "filtered": {
      "filter": {
        "geo_bounding_box": {
          "location": {
            "top_left": {
              "lat": 40.8,
              "lon": -74.1
            },
            "bottom_right": {
              "lat": 40.4,
              "lon": -73.9
            }
          }
        }
      }
    }
  },
  "aggs": {
    "new_york": {
      "geohash_grid": {
        "field": "location",
        "precision": 5
      }
    },
    "map_zoom": {
      "geo_bounds": {
        "field": "location"
      }
    }
  }
}
The geo_bounds aggregation will calculate the smallest bounding box required to encapsulate all of the documents matching our query.
The response now includes a bounding box that we can use to zoom our map:
...
"aggregations": {
  "map_zoom": {
    "bounds": {
      "top_left": {
        "lat": 40.722,
        "lon": -74.011
      },
      "bottom_right": {
        "lat": 40.715,
        "lon": -73.983
      }
    }
  },
...
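Conceptually, the bounding box is just the extremes of the matching coordinates. The Python sketch below shows that idea under two stated assumptions: the points do not straddle the antimeridian (longitude ±180), and the three sample points are illustrative values chosen to be consistent with the bounds shown in the responses. Only the first one (40.719, -73.983) appears as a document in this section. The function name geo_bounds simply mirrors the aggregation name.

```python
def geo_bounds(points):
    """Smallest lat/lon box containing all (lat, lon) points.

    Assumes the points do not cross the antimeridian.
    """
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return {
        "top_left": {"lat": max(lats), "lon": min(lons)},
        "bottom_right": {"lat": min(lats), "lon": max(lons)},
    }

points = [
    (40.719, -73.983),   # "Mini Munchies Pizza" from the earlier response
    (40.722, -73.989),   # illustrative point
    (40.715, -74.011),   # illustrative point
]
print(geo_bounds(points))
# {'top_left': {'lat': 40.722, 'lon': -74.011}, 'bottom_right': {'lat': 40.715, 'lon': -73.983}}
```

A map client can pass these two corners to its fit-to-bounds function to pick a zoom level that shows all of the data and little else.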
In fact, we could even use the geo_bounds aggregation inside each geohash cell, in case the geo-points inside a cell are clustered in just a part of the cell:
GET /attractions/restaurant/_search?search_type=count
{
  "query": {
    "filtered": {
      "filter": {
        "geo_bounding_box": {
          "location": {
            "top_left": {
              "lat": 40.8,
              "lon": -74.1
            },
            "bottom_right": {
              "lat": 40.4,
              "lon": -73.9
            }
          }
        }
      }
    }
  },
  "aggs": {
    "new_york": {
      "geohash_grid": {
        "field": "location",
        "precision": 5
      },
      "aggs": {
        "cell": {
          "geo_bounds": {
            "field": "location"
          }
        }
      }
    }
  }
}
Now the points in each cell have a bounding box:
...
"aggregations": {
  "new_york": {
    "buckets": [
      {
        "key": "dr5rs",
        "doc_count": 2,
        "cell": {
          "bounds": {
            "top_left": {
              "lat": 40.722,
              "lon": -73.989
            },
            "bottom_right": {
              "lat": 40.719,
              "lon": -73.983
            }
          }
        }
      },
...