Geohashes are a way of encoding `lat/lon` points as strings. The original intention was to have a URL-friendly way of specifying geolocations, but geohashes have turned out to be a useful way of indexing geo-points and geo-shapes in databases.
Geohashes divide the world into a grid of 32 cells (4 rows and 8 columns), each represented by a letter or number. The `g` cell covers half of Greenland, all of Iceland, and most of Great Britain. Each cell can be further divided into another 32 cells, which can be divided into another 32 cells, and so on. The `gc` cell covers Ireland and England, `gcp` covers most of London and part of Southern England, and `gcpuuz94k` is the entrance to Buckingham Palace, accurate to about 5 meters.
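To make the subdivision concrete, here is a minimal sketch of how a geohash is computed, assuming the standard algorithm: the longitude and latitude ranges are halved alternately (longitude first), each halving contributes one bit, and every five bits become one base-32 character. The function name `encode` is our own; this is an illustration, not Elasticsearch's implementation.

```python
# Geohash base-32 alphabet: digits plus lowercase letters, without a, i, l, o.
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def encode(lat: float, lon: float, length: int = 12) -> str:
    """Encode a lat/lon point as a geohash string of the given length."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0      # bits accumulated for the current character
    n_bits = 0    # number of bits currently held in `bits`
    even = True   # bits alternate, starting with longitude
    while len(chars) < length:
        if even:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:          # upper half: emit 1, keep the eastern half
                bits = (bits << 1) | 1
                lon_lo = mid
            else:                   # lower half: emit 0, keep the western half
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:          # upper half: emit 1, keep the northern half
                bits = (bits << 1) | 1
                lat_lo = mid
            else:                   # lower half: emit 0, keep the southern half
                bits <<= 1
                lat_hi = mid
        even = not even
        n_bits += 1
        if n_bits == 5:             # five bits make one base-32 character
            chars.append(BASE32[bits])
            bits, n_bits = 0, 0
    return "".join(chars)


print(encode(51.501568, -0.141257, 9))  # gcpuuz94k
```

Each extra character narrows the cell by another five halvings, which is why longer geohashes are more precise.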
In other words, the longer the geohash string, the more accurate it is. If two geohashes share a prefix, such as `gcpuuz`, it implies that they are near each other. The longer the shared prefix, the closer they are.
That said, two locations that are right next to each other may have completely different geohashes. For instance, the Millennium Dome in London has geohash `u10hbp`, because it falls into the `u` cell, the next top-level cell to the east of the `g` cell.
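A shared prefix therefore guarantees proximity, but a missing one proves nothing. A short sketch using the geohashes quoted above (the helper name `shared_prefix_len` is hypothetical):

```python
def shared_prefix_len(a: str, b: str) -> int:
    """Count the leading characters two geohashes have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


palace = "gcpuuz94k"  # entrance to Buckingham Palace
dome = "u10hbp"       # Millennium Dome

print(shared_prefix_len(palace, "gcpuuz"))  # 6: both lie in the same level-6 cell
print(shared_prefix_len(palace, dome))      # 0: no common prefix, yet both are in London
```

A prefix comparison alone is enough to say "near", but it cannot be used to say "far".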
Geo-points can index their associated geohashes automatically, but more importantly, they can also index all geohash prefixes. Indexing the location of the entrance to Buckingham Palace (latitude `51.501568` and longitude `-0.141257`) would index all of the geohashes listed in the following table, which also shows the approximate dimensions of each geohash cell:
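| Geohash | Level | Dimensions |
|---|---|---|
| `g` | 1 | ~ 5,004km x 5,004km |
| `gc` | 2 | ~ 1,251km x 625km |
| `gcp` | 3 | ~ 156km x 156km |
| `gcpu` | 4 | ~ 39km x 19.5km |
| `gcpuu` | 5 | ~ 4.9km x 4.9km |
| `gcpuuz` | 6 | ~ 1.2km x 0.61km |
| `gcpuuz9` | 7 | ~ 152.8m x 152.8m |
| `gcpuuz94` | 8 | ~ 38.2m x 19.1m |
| `gcpuuz94k` | 9 | ~ 4.78m x 4.78m |
| `gcpuuz94kk` | 10 | ~ 1.19m x 0.60m |
| `gcpuuz94kkp` | 11 | ~ 14.9cm x 14.9cm |
| `gcpuuz94kkp5` | 12 | ~ 3.7cm x 1.8cm |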
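The dimensions follow from how the bits are split: a geohash of length n carries 5n bits, and longitude takes the extra bit when 5n is odd, which is why the cells alternate between square and 2:1 rectangles. The sketch below reproduces the table's figures to within rounding, assuming distances measured at the equator with a mean Earth radius of 6,371 km (our assumption; the table does not state its model, and east-west widths shrink away from the equator).

```python
import math

# Assumption: kilometers per degree at the equator for a 6,371 km mean radius.
KM_PER_DEGREE = 2 * math.pi * 6371.0 / 360


def cell_size_km(level: int) -> tuple[float, float]:
    """Approximate (width, height) in km of a geohash cell at the given level."""
    total_bits = 5 * level
    lon_bits = (total_bits + 1) // 2  # longitude gets the extra bit when the total is odd
    lat_bits = total_bits // 2
    width_deg = 360.0 / 2 ** lon_bits
    height_deg = 180.0 / 2 ** lat_bits
    return width_deg * KM_PER_DEGREE, height_deg * KM_PER_DEGREE


for level in range(1, 13):
    w, h = cell_size_km(level)
    print(f"level {level:2}: {w:.4g} km x {h:.4g} km")
```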
The `geohash_cell` filter can use these geohash prefixes to find locations near a specified `lat/lon` point.
The first step is to decide just how much precision you need. Although you could index all geo-points with the default full 12 levels of precision, do you really need to be accurate to within a few centimeters? You can save yourself a lot of space in the index by reducing your precision requirements to something more realistic, such as `1km`:
```
PUT /attractions
{
  "mappings": {
    "restaurant": {
      "properties": {
        "name": {
          "type": "string"
        },
        "location": {
          "type": "geo_point",
          "geohash_prefix": true,
          "geohash_precision": "1km"
        }
      }
    }
  }
}
```
Setting `geohash_prefix` to `true` tells Elasticsearch to index all geohash prefixes, up to the specified precision.
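Conceptually, indexing the prefixes means each geo-point contributes one term per level, from length 1 up to the configured precision. A minimal sketch of that expansion (the `prefixes` helper is hypothetical; it models which terms are produced, not how Elasticsearch stores them):

```python
def prefixes(geohash: str, max_length: int) -> list[str]:
    """Return every prefix of `geohash`, from length 1 up to `max_length`."""
    return [geohash[:n] for n in range(1, min(max_length, len(geohash)) + 1)]


# With a precision of length 7, the palace entrance yields seven terms.
print(prefixes("gcpuuz94kkp5", 7))
# ['g', 'gc', 'gcp', 'gcpu', 'gcpuu', 'gcpuuz', 'gcpuuz9']
```

Capping the length at 7 instead of 12 cuts the number of terms per point from twelve to seven.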
The precision can be specified as an absolute number, representing the length of the geohash, or as a distance. A precision of `1km` corresponds to a geohash of length `7`.

With this mapping in place, geohash prefixes of lengths 1 to 7 will be indexed, providing geohashes accurate to about 150 meters.
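One plausible way to turn a distance into a length is to pick the shortest geohash whose cell fits within that distance in both directions. The sketch below uses that rule, which is our own hypothetical simplification (Elasticsearch's exact conversion may differ in detail), together with the same equator-based cell sizes as the table. It does reproduce the two mappings this section mentions: `1km` becomes 7 and `2km` becomes 6.

```python
import math

# Assumption: kilometers per degree at the equator for a 6,371 km mean radius.
KM_PER_DEGREE = 2 * math.pi * 6371.0 / 360


def cell_size_km(length: int) -> tuple[float, float]:
    """Approximate (width, height) in km of a cell with this geohash length."""
    bits = 5 * length
    width = 360.0 / 2 ** ((bits + 1) // 2) * KM_PER_DEGREE
    height = 180.0 / 2 ** (bits // 2) * KM_PER_DEGREE
    return width, height


def length_for_distance(km: float, max_length: int = 12) -> int:
    """Hypothetical rule: shortest length whose cell is at most `km` on each side."""
    for length in range(1, max_length + 1):
        width, height = cell_size_km(length)
        if width <= km and height <= km:
            return length
    return max_length


print(length_for_distance(1.0))  # 7
print(length_for_distance(2.0))  # 6
```

For `1km`, a length-6 cell is still about 1.2km wide, so the rule has to step down to length 7.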
The `geohash_cell` filter simply translates a `lat/lon` location into a geohash with the specified precision and finds all locations that contain that geohash, which makes it a very efficient filter indeed.
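Because every prefix is indexed as its own term, the filter reduces to an exact-term lookup with no distance math at query time. A toy inverted index illustrates the idea. The document IDs, the `cell_filter` name, and the dictionary standing in for Lucene's term dictionary are all hypothetical; the geohashes are the ones quoted earlier.

```python
from collections import defaultdict

PRECISION = 7  # index prefix lengths 1..7, as with a precision of "1km"

# Hypothetical documents, each with a single indexed geohash.
docs = {
    "palace_entrance": "gcpuuz94k",
    "millennium_dome": "u10hbp",
}

# Inverted index: each prefix term maps to the documents containing it.
index = defaultdict(set)
for doc_id, geohash in docs.items():
    for n in range(1, min(PRECISION, len(geohash)) + 1):
        index[geohash[:n]].add(doc_id)


def cell_filter(cell: str) -> set:
    """Return the documents whose location falls in `cell` (one term lookup)."""
    return index.get(cell, set())


print(cell_filter("gcpuuz"))     # {'palace_entrance'}
print(cell_filter("u10"))        # {'millennium_dome'}
print(cell_filter("gcpuuz94k"))  # set(): longer than the indexed precision
```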
```
GET /attractions/restaurant/_search
{
  "query": {
    "filtered": {
      "filter": {
        "geohash_cell": {
          "location": {
            "lat": 40.718,
            "lon": -73.983
          },
          "precision": "2km"
        }
      }
    }
  }
}
```
This filter translates the `lat/lon` point into a geohash of the appropriate length (in this example `dr5rsk`) and looks for all locations that contain that exact term.
However, the filter as written in the preceding example may not return all restaurants near the specified point, even ones only a few meters away. Remember that a geohash is just a rectangle, and the point may fall anywhere within that rectangle. If the point happens to fall near the edge of a geohash cell, the filter may well exclude any restaurants in the adjacent cell.
To fix that, we can tell the filter to include the neighboring cells by setting `neighbors` to `true`:
```
GET /attractions/restaurant/_search
{
  "query": {
    "filtered": {
      "filter": {
        "geohash_cell": {
          "location": {
            "lat": 40.718,
            "lon": -73.983
          },
          "neighbors": true,
          "precision": "2km"
        }
      }
    }
  }
}
```
Clearly, looking for a geohash with precision `2km` plus all the neighboring cells results in quite a large search area. This filter is not built for accuracy, but it is very efficient and can be used as a prefiltering step before applying a more accurate geo-filter.
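For intuition about which cells `neighbors` adds, here is a sketch that decodes a cell's bounding box and encodes the centers of the eight surrounding cells, assuming the standard geohash bit layout. The function names are ours, and this is not how Elasticsearch implements the option; it only shows which cells end up being searched.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def encode(lat: float, lon: float, length: int) -> str:
    """Standard geohash encoding: alternate lon/lat halvings, 5 bits per char."""
    lat_lo, lat_hi, lon_lo, lon_hi = -90.0, 90.0, -180.0, 180.0
    out, bits, n, even = [], 0, 0, True
    while len(out) < length:
        if even:  # longitude bit
            mid = (lon_lo + lon_hi) / 2
            bit = lon >= mid
            lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
        else:     # latitude bit
            mid = (lat_lo + lat_hi) / 2
            bit = lat >= mid
            lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
        bits = (bits << 1) | int(bit)
        even, n = not even, n + 1
        if n == 5:
            out.append(BASE32[bits])
            bits, n = 0, 0
    return "".join(out)


def bounds(geohash: str) -> tuple:
    """Return (lat_min, lat_max, lon_min, lon_max) of a geohash cell."""
    lat_lo, lat_hi, lon_lo, lon_hi = -90.0, 90.0, -180.0, 180.0
    even = True
    for ch in geohash:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):  # most significant bit first
            bit = (value >> shift) & 1
            if even:
                mid = (lon_lo + lon_hi) / 2
                lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
            else:
                mid = (lat_lo + lat_hi) / 2
                lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
            even = not even
    return lat_lo, lat_hi, lon_lo, lon_hi


def neighbors(geohash: str) -> list:
    """The up-to-eight cells around `geohash` (fewer next to the poles)."""
    lat_min, lat_max, lon_min, lon_max = bounds(geohash)
    h, w = lat_max - lat_min, lon_max - lon_min
    lat_c, lon_c = (lat_min + lat_max) / 2, (lon_min + lon_max) / 2
    result = []
    for dlat in (1, 0, -1):
        for dlon in (-1, 0, 1):
            if dlat == 0 and dlon == 0:
                continue  # skip the cell itself
            lat = lat_c + dlat * h
            if not -90.0 < lat < 90.0:
                continue  # there are no cells beyond the poles
            lon = (lon_c + dlon * w + 180.0) % 360.0 - 180.0  # wrap at the antimeridian
            result.append(encode(lat, lon, len(geohash)))
    return result


print(["dr5rsk"] + neighbors("dr5rsk"))  # the nine cells searched
```

At length 6 each cell is roughly 1.2km x 0.6km at the equator, so the nine cells together span roughly 3.7km x 1.8km, far more than the `2km` the query seems to ask for.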
Specifying the `precision` as a distance can be misleading. A `precision` of `2km` is converted to a geohash of length 6, which actually has dimensions of about 1.2km x 0.6km. You may find it more understandable to specify an actual length such as `5` or `6`.
The other advantage that this filter has over a `geo_bounding_box` filter is that it supports multiple locations per field. The `lat_lon` option that we discussed in “Optimizing Bounding Boxes” is efficient, but only when there is a single `lat/lon` point per field.
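One way to see why term matching copes with multiple values: a field holding several points simply contributes the prefixes of each one, and the document matches if any of them lies in the requested cell. A sketch with a hypothetical two-point document, reusing the geohashes quoted earlier (the names `terms` and `matches` are ours):

```python
PRECISION = 7  # index prefix lengths 1..7, as with a precision of "1km"

# Hypothetical document whose location field holds two points:
# the Buckingham Palace entrance and the Millennium Dome.
locations = ["gcpuuz94k", "u10hbp"]

# Each point contributes its own prefixes; the field's terms are their union.
terms = {gh[:n] for gh in locations for n in range(1, min(PRECISION, len(gh)) + 1)}


def matches(cell: str) -> bool:
    """True if any of the document's points lies in `cell`."""
    return cell in terms


print(matches("gcpuuz"))  # True: the palace is in this cell
print(matches("u10hb"))   # True: the dome is in this cell
print(matches("dr5rsk"))  # False
```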