A geo-point is a single latitude/longitude point on the Earth’s surface. Geo-points can be used to calculate distance from a point, to determine whether a point falls within a bounding box, or in aggregations.
Geo-points cannot be automatically detected with dynamic mapping. Instead, geo_point fields should be mapped explicitly:
PUT /attractions
{
  "mappings": {
    "restaurant": {
      "properties": {
        "name": {
          "type": "string"
        },
        "location": {
          "type": "geo_point"
        }
      }
    }
  }
}
With the location field defined as a geo_point, we can proceed to index documents containing latitude/longitude pairs, which can be formatted as strings, arrays, or objects:
PUT /attractions/restaurant/1
{
  "name": "Chipotle Mexican Grill",
  "location": "40.715, -74.011"
}

PUT /attractions/restaurant/2
{
  "name": "Pala Pizza",
  "location": {
    "lat": 40.722,
    "lon": -73.989
  }
}

PUT /attractions/restaurant/3
{
  "name": "Mini Munchies Pizza",
  "location": [ -73.983, 40.719 ]
}
A string representation, with "lat,lon".
An object representation with lat and lon explicitly named.
An array representation with [lon,lat].
Everybody gets caught at least once: string geo-points are "latitude,longitude", while array geo-points are [longitude,latitude], the opposite order!
Originally, both strings and arrays in Elasticsearch used latitude followed by longitude. However, it was decided early on to switch the order for arrays in order to conform with GeoJSON.
The result is a bear trap that captures all unsuspecting users on their journey to full geolocation nirvana.
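To make the ordering trap concrete, here is a minimal Python sketch that normalizes the three formats to a (lat, lon) tuple. The function name parse_geo_point is hypothetical, and this is not how Elasticsearch parses geo-points internally; it only mirrors the three formats described above.

```python
from collections.abc import Mapping
from typing import Tuple


def parse_geo_point(value) -> Tuple[float, float]:
    """Normalize one of the three geo-point formats to a (lat, lon) tuple.

    - string: "lat,lon"                 (latitude first)
    - object: {"lat": ..., "lon": ...}  (named fields)
    - array:  [lon, lat]                (longitude first, GeoJSON order)
    """
    if isinstance(value, str):
        lat_text, lon_text = value.split(",")
        lat, lon = float(lat_text), float(lon_text)
    elif isinstance(value, Mapping):
        lat, lon = float(value["lat"]), float(value["lon"])
    else:
        lon, lat = (float(v) for v in value)  # arrays are reversed!
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError(f"coordinates out of range: lat={lat}, lon={lon}")
    return lat, lon


# The three documents indexed above all normalize to (lat, lon):
print(parse_geo_point("40.715, -74.011"))                # (40.715, -74.011)
print(parse_geo_point({"lat": 40.722, "lon": -73.989}))  # (40.722, -73.989)
print(parse_geo_point([-73.983, 40.719]))                # (40.719, -73.983)
```

The range check cannot save you from the trap itself: parse_geo_point([40.719, -73.983]), written in string order by mistake, returns (-73.983, 40.719), a perfectly valid point somewhere in Antarctica.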
Four geo-point filters can be used to include or exclude documents by geolocation:
geo_bounding_box
Find geo-points that fall within the specified rectangle.
geo_distance
Find geo-points within the specified distance of a central point.
geo_distance_range
Find geo-points within a specified minimum and maximum distance from a central point.
geo_polygon
Find geo-points that fall within the specified polygon. This filter is very expensive. If you find yourself wanting to use it, you should be looking at geo-shapes instead.
All of these filters work in a similar way: the lat/lon values are loaded into memory for all documents in the index, not just the documents that match the query (see “Fielddata”). Each filter performs a slightly different calculation to check whether a point falls into the containing area.
Geo-filters are expensive: they should be used on as few documents as possible. First remove as many documents as you can with cheaper filters, like term or range filters, and apply the geo-filters last.
The bool filter will do this for you automatically. First it applies any bitset-based filters (see “All About Caching”) to exclude as many documents as it can as cheaply as possible. Then it applies the more expensive geo or script filters to each remaining document in turn.
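The ordering that the bool filter applies can be sketched in plain Python. This illustrates the principle only, assuming hypothetical in-memory documents, a cuisine field standing in for any cheap term filter, and a haversine distance with an assumed Earth radius of 6,371 km; it does not model bitsets or Elasticsearch internals.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


# Hypothetical documents; "cuisine" stands in for any cheap term filter.
DOCS = [
    {"id": 1, "cuisine": "mexican", "location": (40.715, -74.011)},
    {"id": 2, "cuisine": "pizza", "location": (40.722, -73.989)},
    {"id": 3, "cuisine": "pizza", "location": (40.719, -73.983)},
    {"id": 4, "cuisine": "thai", "location": (40.716, -73.987)},
]


def bool_filter(docs, cuisine, center, max_km):
    """Cheap term test first; expensive distance test only on survivors."""
    survivors = [d for d in docs if d["cuisine"] == cuisine]
    geo_checks = 0
    hits = []
    for doc in survivors:
        geo_checks += 1
        if haversine_km(center, doc["location"]) <= max_km:
            hits.append(doc["id"])
    return hits, geo_checks


print(bool_filter(DOCS, "pizza", (40.715, -73.988), 1.0))  # ([2, 3], 2)
```

Only the two pizza restaurants ever reach the distance calculation; the Mexican and Thai documents are discarded by the cheap comparison first.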
The geo_bounding_box filter is by far the most efficient geo-filter because its calculation is very simple. You provide it with the top, bottom, left, and right coordinates of a rectangle, and all it does is compare the latitude with the top and bottom coordinates, and the longitude with the left and right coordinates:
GET /attractions/restaurant/_search
{
  "query": {
    "filtered": {
      "filter": {
        "geo_bounding_box": {
          "location": {
            "top_left": {
              "lat": 40.8,
              "lon": -74.0
            },
            "bottom_right": {
              "lat": 40.7,
              "lon": -73.0
            }
          }
        }
      }
    }
  }
}
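A minimal Python sketch of the comparison the geo_bounding_box filter performs, using the same box as the request above. The helper name in_bounding_box is hypothetical, and the sketch ignores boxes that cross the 180th meridian.

```python
from typing import Tuple

LatLon = Tuple[float, float]


def in_bounding_box(point: LatLon, top_left: LatLon, bottom_right: LatLon) -> bool:
    """Return True if point lies inside the rectangle.

    Latitude is compared with the top and bottom edges, longitude with the
    left and right edges. Boxes crossing the 180th meridian are not handled.
    """
    lat, lon = point
    top, left = top_left
    bottom, right = bottom_right
    return bottom <= lat <= top and left <= lon <= right


# The box from the request above and the three restaurants indexed earlier.
TOP_LEFT, BOTTOM_RIGHT = (40.8, -74.0), (40.7, -73.0)
restaurants = {
    "Chipotle Mexican Grill": (40.715, -74.011),
    "Pala Pizza": (40.722, -73.989),
    "Mini Munchies Pizza": (40.719, -73.983),
}
for name, point in restaurants.items():
    print(name, in_bounding_box(point, TOP_LEFT, BOTTOM_RIGHT))
```

Chipotle Mexican Grill falls just outside the box: its longitude, -74.011, lies west of the left edge at -74.0.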
The geo_bounding_box filter is the one geo-filter that doesn’t require all geo-points to be loaded into memory. Because all it has to do is check whether the lat and lon values fall within the specified ranges, it can use the inverted index to do a glorified range filter.
To use this optimization, the geo_point field must be mapped to index the lat and lon values separately:
PUT /attractions
{
  "mappings": {
    "restaurant": {
      "properties": {
        "name": {
          "type": "string"
        },
        "location": {
          "type": "geo_point",
          "lat_lon": true
        }
      }
    }
  }
}
The location.lat and location.lon fields will be indexed separately.
These fields can be used for searching, but their values cannot be retrieved.
Now, when we run our query, we have to tell Elasticsearch to use the indexed lat and lon values:
GET /attractions/restaurant/_search
{
  "query": {
    "filtered": {
      "filter": {
        "geo_bounding_box": {
          "type": "indexed",
          "location": {
            "top_left": {
              "lat": 40.8,
              "lon": -74.0
            },
            "bottom_right": {
              "lat": 40.7,
              "lon": -73.0
            }
          }
        }
      }
    }
  }
}
Setting the type parameter to indexed (instead of the default memory) tells Elasticsearch to use the inverted index for this filter.
While a geo_point field can contain multiple geo-points, the lat_lon optimization can be used only on fields that contain a single geo-point.
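The idea behind the indexed mode, turning a bounding box into two range lookups whose results are intersected, can be modeled in Python with two sorted lists and binary search. This is a conceptual sketch under that assumption, not Lucene's actual inverted-index structure; build_terms, range_docs, and indexed_bounding_box are hypothetical names.

```python
import bisect
from typing import Dict, List, Set, Tuple

Terms = List[Tuple[float, int]]


def build_terms(points: Dict[int, Tuple[float, float]]) -> Tuple[Terms, Terms]:
    """Index lat and lon as two separate sorted (value, doc_id) lists,
    loosely mimicking the location.lat and location.lon fields."""
    lat_terms = sorted((lat, doc) for doc, (lat, _) in points.items())
    lon_terms = sorted((lon, doc) for doc, (_, lon) in points.items())
    return lat_terms, lon_terms


def range_docs(terms: Terms, low: float, high: float) -> Set[int]:
    """Doc ids whose value lies in [low, high], found by binary search."""
    start = bisect.bisect_left(terms, (low, float("-inf")))
    end = bisect.bisect_right(terms, (high, float("inf")))
    return {doc for _, doc in terms[start:end]}


def indexed_bounding_box(lat_terms, lon_terms, top_left, bottom_right) -> Set[int]:
    (top, left), (bottom, right) = top_left, bottom_right
    # Two range lookups, intersected: no per-document geo math at all.
    return range_docs(lat_terms, bottom, top) & range_docs(lon_terms, left, right)


points = {1: (40.715, -74.011), 2: (40.722, -73.989), 3: (40.719, -73.983)}
lat_terms, lon_terms = build_terms(points)
print(sorted(indexed_bounding_box(lat_terms, lon_terms, (40.8, -74.0), (40.7, -73.0))))  # [2, 3]
```

The model also hints at one plausible reason for the single-point restriction: if a document held several points, its latitude could match through one point and its longitude through another, so the intersection could admit a document with no single point inside the box.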
The geo_distance filter draws a circle around the specified location and finds all documents that have a geo-point within that circle:
GET /attractions/restaurant/_search
{
  "query": {
    "filtered": {
      "filter": {
        "geo_distance": {
          "distance": "1km",
          "location": {
            "lat": 40.715,
            "lon": -73.988
          }
        }
      }
    }
  }
}
Find all location fields within 1km of the specified point.
See Distance Units for a list of the accepted units.
The central point can be specified as a string, an array, or (as in this example) an object. See “Lat/Lon Formats”.
A geo-distance calculation is expensive. To optimize performance, Elasticsearch draws a box around the circle and first uses the less expensive bounding-box calculation to exclude as many documents as it can. It runs the geo-distance calculation on only those points that fall within the bounding box.
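The box-then-circle strategy can be sketched in Python as follows. The box is computed with the standard spherical formula for the longitude extent of a circle, and distances use the haversine formula with an assumed mean Earth radius of 6,371 km; Elasticsearch's own box computation and constants may differ. The sketch does not handle circles that cross the 180th meridian or touch a pole.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def box_around(center, radius_km):
    """A lat/lon box that encloses a circle of radius_km around center."""
    lat, lon = center
    angular = radius_km / EARTH_RADIUS_KM
    dlat = math.degrees(angular)
    ratio = math.sin(angular) / math.cos(math.radians(lat))
    dlon = 180.0 if ratio >= 1 else math.degrees(math.asin(ratio))
    return (lat + dlat, lon - dlon), (lat - dlat, lon + dlon)


def geo_distance_filter(points, center, radius_km):
    (top, left), (bottom, right) = box_around(center, radius_km)
    exact_checks = 0
    hits = []
    for doc_id, (lat, lon) in points.items():
        if not (bottom <= lat <= top and left <= lon <= right):
            continue  # rejected by the cheap box test
        exact_checks += 1
        if haversine_km(center, (lat, lon)) <= radius_km:
            hits.append(doc_id)
    return sorted(hits), exact_checks


points = {
    1: (40.715, -74.011),
    2: (40.722, -73.989),
    3: (40.719, -73.983),
    4: (40.800, -73.950),
}
print(geo_distance_filter(points, (40.715, -73.988), 1.0))  # ([2, 3], 2)
```

Of the four points, only documents 2 and 3 survive the cheap box test, so the exact distance is computed just twice.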
The distance between two points can be calculated using different algorithms, which trade performance for accuracy:
arc
The slowest but most accurate is the arc calculation, which treats the world as a sphere. Accuracy is still limited because the world isn’t really a sphere.
plane
The plane calculation, which treats the world as if it were flat, is faster but less accurate. It is most accurate at the equator and becomes less accurate toward the poles.
sloppy_arc
So called because it uses the SloppyMath Lucene class to trade accuracy for speed, the sloppy_arc calculation uses the Haversine formula to calculate distance. It is four to five times as fast as arc, and distances are 99.9% accurate. This is the default calculation.
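To see why plane degrades toward the poles, here is a Python comparison of a spherical (arc) distance with a flat-earth approximation. The plane_km function assumes an equirectangular model that scales longitude by the cosine of the mean latitude; Elasticsearch's exact plane formula and Earth radius may differ, and the SloppyMath speed-ups behind sloppy_arc are not reproduced here.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def arc_km(a, b):
    """Great-circle distance on a sphere (haversine form)."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def plane_km(a, b):
    """Flat-earth approximation: longitude scaled by cos(mean latitude)."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    x = (lon2 - lon1) * math.cos((lat1 + lat2) / 2)
    y = lat2 - lat1
    return EARTH_RADIUS_KM * math.hypot(x, y)


def relative_error(a, b):
    return abs(plane_km(a, b) - arc_km(a, b)) / arc_km(a, b)


print(relative_error((0.0, 0.0), (0.0, 20.0)))    # ~0 along the equator
print(relative_error((80.0, 0.0), (80.0, 20.0)))  # about 0.0049 (0.5%) at 80°N
```

Along the equator the two agree; for the same 20-degree span at 80°N, the flat approximation is off by roughly half a percent.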
You can specify a different calculation as follows:
GET /attractions/restaurant/_search
{
  "query": {
    "filtered": {
      "filter": {
        "geo_distance": {
          "distance": "1km",
          "distance_type": "plane",
          "location": {
            "lat": 40.715,
            "lon": -73.988
          }
        }
      }
    }
  }
}
The only difference between the geo_distance and geo_distance_range filters is that the latter has a doughnut shape and excludes documents within the central hole.
Instead of specifying a single distance from the center, you specify a minimum distance (with gt or gte) and maximum distance (with lt or lte), just like a range filter:
GET /attractions/restaurant/_search
{
  "query": {
    "filtered": {
      "filter": {
        "geo_distance_range": {
          "gte": "1km",
          "lt": "2km",
          "location": {
            "lat": 40.715,
            "lon": -73.988
          }
        }
      }
    }
  }
}
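A minimal Python sketch of the doughnut test, assuming haversine distances with a 6,371 km Earth radius and the same gte/lt semantics as the request above (inner edge included, outer edge excluded). The function name geo_distance_range here is hypothetical and only mirrors the filter's name.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def geo_distance_range(points, center, gte_km, lt_km):
    """Doc ids at distance d from center with gte_km <= d < lt_km."""
    return sorted(
        doc for doc, p in points.items()
        if gte_km <= haversine_km(center, p) < lt_km
    )


points = {1: (40.715, -74.011), 2: (40.722, -73.989), 3: (40.719, -73.983)}
print(geo_distance_range(points, (40.715, -73.988), 1.0, 2.0))  # [1]
```

Chipotle (document 1), about 1.9 km from the center, lands in the ring; the two pizza places fall inside the central hole.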
The results of geo-filters are not cached by default, for two reasons:
Geo-filters are usually used to find entities that are near to a user’s current location. The problem is that users move, and no two users are in exactly the same location. A cached filter would have little chance of being reused.
Filters are cached as bitsets that represent all documents in a segment. Imagine that our query excludes all documents but one in a particular segment. An uncached geo-filter just needs to check the one remaining document, but a cached geo-filter would need to check all of the documents in the segment.
That said, caching can be used to good effect with geo-filters. Imagine that your index contains restaurants from all over the United States. A user in New York is not interested in restaurants in San Francisco. We can treat New York as a hot spot and draw a big bounding box around the city and neighboring areas.
This geo_bounding_box filter can be cached and reused whenever we have a user within the city limits of New York. It will exclude all restaurants from the rest of the country. We can then use an uncached, more specific geo_bounding_box or geo_distance filter to narrow the remaining results to those that are close to the user:
GET /attractions/restaurant/_search
{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            {
              "geo_bounding_box": {
                "type": "indexed",
                "_cache": true,
                "location": {
                  "top_left": {
                    "lat": 40.8,
                    "lon": -74.1
                  },
                  "bottom_right": {
                    "lat": 40.4,
                    "lon": -73.7
                  }
                }
              }
            },
            {
              "geo_distance": {
                "distance": "1km",
                "location": {
                  "lat": 40.715,
                  "lon": -73.988
                }
              }
            }
          ]
        }
      }
    }
  }
}
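The hot-spot pattern can be sketched in Python with functools.lru_cache standing in for the cached filter. This is an analogy under stated assumptions, not how Elasticsearch caches bitsets: the index is a hypothetical dictionary, the coarse box is the New York box from the request above, and distances use the haversine formula with an assumed 6,371 km radius.

```python
import math
from functools import lru_cache

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


# Hypothetical index: three New York restaurants and one in San Francisco.
POINTS = {
    1: (40.715, -74.011),
    2: (40.722, -73.989),
    3: (40.719, -73.983),
    5: (37.775, -122.419),
}
NEW_YORK_BOX = ((40.8, -74.1), (40.4, -73.7))  # (top_left, bottom_right)


@lru_cache(maxsize=16)  # analogy only: stands in for the cached filter
def hot_spot_docs(top_left, bottom_right):
    """Coarse filter whose result is identical for every user in the hot spot."""
    (top, left), (bottom, right) = top_left, bottom_right
    return frozenset(
        doc for doc, (lat, lon) in POINTS.items()
        if bottom <= lat <= top and left <= lon <= right
    )


def restaurants_near(user_location, radius_km):
    """Reuse the cached box, then apply an uncached, per-user distance test."""
    candidates = hot_spot_docs(*NEW_YORK_BOX)
    return sorted(d for d in candidates
                  if haversine_km(user_location, POINTS[d]) <= radius_km)


print(restaurants_near((40.715, -73.988), 1.0))  # [2, 3]
print(restaurants_near((40.720, -74.010), 1.0))  # [1]
print(hot_spot_docs.cache_info())                # one miss, one hit
```

Both users share one cached box result, and only the per-user distance test is recomputed. The San Francisco restaurant, document 5, never reaches it.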
Each lat/lon pair requires 16 bytes of memory, memory that is in short supply. It needs this much memory in order to provide very accurate results. But as we have commented before, such exacting precision is seldom required.
You can reduce the amount of memory that is used by switching to a compressed fielddata format and by specifying how precise you need your geo-points to be. Even reducing precision to 1mm reduces memory usage by a third. A more realistic setting of 3m reduces usage by 62%, and 1km saves a massive 75%!
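The quoted percentages translate into concrete numbers with a little arithmetic. The sketch below assumes a hypothetical index of 10 million geo-points and simply applies the 16-byte figure and the savings ratios from the text; real savings depend on your data.

```python
BYTES_PER_POINT = 16  # per lat/lon pair, as stated above

# Fraction of memory saved by the compressed format at each precision (from the text).
SAVINGS = {"1mm": 1 / 3, "3m": 0.62, "1km": 0.75}


def geo_fielddata_mb(num_points, precision=None):
    """Estimated fielddata size in megabytes (10**6 bytes)."""
    full = num_points * BYTES_PER_POINT
    saved = SAVINGS[precision] if precision else 0.0
    return full * (1 - saved) / 1e6


num_points = 10_000_000  # hypothetical index size
for precision in (None, "1mm", "3m", "1km"):
    print(precision or "uncompressed", round(geo_fielddata_mb(num_points, precision), 1))
```

For 10 million points this prints 160.0 MB uncompressed, 106.7 MB at 1mm, 60.8 MB at 3m, and 40.0 MB at 1km.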
This setting can be changed on a live index with the update-mapping API:
POST /attractions/_mapping/restaurant
{
  "properties": {
    "location": {
      "type": "geo_point",
      "fielddata": {
        "format": "compressed",
        "precision": "1km"
      }
    }
  }
}
Alternatively, you can avoid using memory for geo-points altogether, either by using the technique described in “Optimizing Bounding Boxes”, or by storing geo-points as doc values:
PUT /attractions
{
  "mappings": {
    "restaurant": {
      "properties": {
        "name": {
          "type": "string"
        },
        "location": {
          "type": "geo_point",
          "doc_values": true
        }
      }
    }
  }
}
Mapping a geo-point to use doc values can be done only when the field is first created. There is a small performance cost in using doc values instead of fielddata, but with memory in such short supply, it is often worth doing.
Search results can be sorted by distance from a point:
GET /attractions/restaurant/_search
{
  "query": {
    "filtered": {
      "filter": {
        "geo_bounding_box": {
          "type": "indexed",
          "location": {
            "top_left": {
              "lat": 40.8,
              "lon": -74.0
            },
            "bottom_right": {
              "lat": 40.4,
              "lon": -73.0
            }
          }
        }
      }
    }
  },
  "sort": [
    {
      "_geo_distance": {
        "location": {
          "lat": 40.715,
          "lon": -73.998
        },
        "order": "asc",
        "unit": "km",
        "distance_type": "plane"
      }
    }
  ]
}
Calculate the distance between the specified lat/lon point and the geo-point in the location field of each document.
Return the distance in km in the sort keys for each result.
Use the faster but less accurate plane calculation.
You may ask yourself: why do we specify the distance unit? For sorting, it doesn’t matter whether we compare distances in miles, kilometers, or light years. The reason is that the actual value used for sorting is returned with each result, in the sort element:
...
"hits": [
  {
    "_index": "attractions",
    "_type": "restaurant",
    "_id": "2",
    "_score": null,
    "_source": {
      "name": "New Malaysia",
      "location": {
        "lat": 40.715,
        "lon": -73.997
      }
    },
    "sort": [
      0.08425653647614346
    ]
  },
...
You can set the unit to return these values in whatever form makes sense for your application.
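A Python sketch of what the unit changes, and what it does not: the order is decided by the raw distance, and the unit only scales the value reported alongside each hit. Distances use haversine with an assumed 6,371 km radius, so the numbers will not exactly match Elasticsearch's plane calculation; sort_by_distance is a hypothetical helper.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius
# Conversion factors from kilometers (1 mile = 1.609344 km exactly).
PER_KM = {"km": 1.0, "m": 1000.0, "mi": 1 / 1.609344}


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def sort_by_distance(points, origin, unit="km"):
    """Return (doc, distance_in_unit) pairs, nearest first.

    The unit only changes the reported sort value, never the order."""
    keyed = sorted((haversine_km(origin, p), doc) for doc, p in points.items())
    return [(doc, dist * PER_KM[unit]) for dist, doc in keyed]


points = {"New Malaysia": (40.715, -73.997), "Pala Pizza": (40.722, -73.989)}
for unit in ("km", "m", "mi"):
    print(unit, sort_by_distance(points, (40.715, -73.998), unit))
```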
Geo-distance sorting can also handle multiple geo-points, both in the document and in the sort parameters. Use the sort_mode to specify whether it should use the min, max, or avg distance between each combination of locations.
This can be used to return “friends nearest to my work and home locations.”
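A Python sketch of the min/max/avg choice for documents and origins with several points each. The function sort_value and the friend data are hypothetical; distances use haversine with an assumed 6,371 km radius.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def sort_value(doc_points, origins, mode="min"):
    """Combine the distances between every origin and every document point."""
    distances = [haversine_km(o, p) for o in origins for p in doc_points]
    if mode == "min":
        return min(distances)
    if mode == "max":
        return max(distances)
    if mode == "avg":
        return sum(distances) / len(distances)
    raise ValueError(f"unsupported mode: {mode}")


# Hypothetical data: my work and home locations, and each friend's known locations.
my_places = [(40.748, -73.986), (40.678, -73.944)]
friends = {
    "alice": [(40.741, -73.989)],
    "bob": [(40.850, -73.870), (40.690, -73.950)],
    "carol": [(40.900, -74.100)],
}
nearest = sorted(friends, key=lambda f: sort_value(friends[f], my_places, "min"))
print(nearest)  # ['alice', 'bob', 'carol']
```

Switching "min" to "max" ranks friends by their farthest pairing instead, which favors friends whose locations are all close to both of yours.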
It may be that distance is the only important factor in deciding the order in which results are returned, but more frequently we need to combine distance with other factors, such as full-text relevance, popularity, and price.
In these situations, we should reach for the function_score query that allows us to blend all of these factors into an overall score. See “The Closer, The Better” for an example that uses geo-distance to influence scoring.
The other drawback of sorting by distance is performance: the distance has to be calculated for all matching documents. The function_score query, on the other hand, can be executed during the rescore phase, limiting the number of calculations to just the top n results.
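The rescore idea can be sketched in Python: rank by the cheap query score first, then compute distances only for the top n. The exponential decay and the multiplication of scores below are hypothetical simplifications, not function_score's actual decay functions or the rescore phase's score combination; haversine distances assume a 6,371 km radius.

```python
import heapq
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def rescore_top_n(hits, origin, n, scale_km=1.0):
    """Re-rank only the n best text matches by text score times a distance decay.

    hits: list of (doc_id, text_score, (lat, lon)).
    """
    top = heapq.nlargest(n, hits, key=lambda hit: hit[1])
    distance_calcs = 0
    rescored = []
    for doc_id, score, location in top:
        distance_calcs += 1
        decay = math.exp(-haversine_km(origin, location) / scale_km)  # hypothetical decay
        rescored.append((score * decay, doc_id))
    rescored.sort(reverse=True)
    return [doc_id for _, doc_id in rescored], distance_calcs


hits = [
    ("a", 3.0, (40.80, -73.95)),    # best text match, roughly 10 km away
    ("b", 2.5, (40.716, -73.989)),  # slightly weaker match, very close
    ("c", 2.0, (40.72, -73.99)),
    ("d", 0.5, (40.715, -73.988)),  # closest of all, but outside the top 2
    ("e", 0.4, (40.90, -74.10)),
]
print(rescore_top_n(hits, (40.715, -73.988), n=2))  # (['b', 'a'], 2)
```

Only two distance calculations are made. The trade-off is visible too: document d is the closest of all, but with n=2 it is never considered.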