Geo-shapes use a completely different approach than geo-points. A circle on a computer screen does not consist of a perfect continuous line. Instead it is drawn by coloring adjacent pixels as an approximation of a circle. Geo-shapes work in much the same way.
Complex shapes (such as points, lines, polygons, multipolygons, and polygons with holes) are “painted” onto a grid of geohash cells, and the shape is converted into a list of the geohashes of all the cells that it touches.
Actually, two types of grids can be used with geo-shapes: geohashes, which we have already discussed and which are the default encoding, and quad trees. Quad trees are similar to geohashes except that there are only four cells at each level, instead of 32. The difference comes down to a choice of encoding.
All of the geohashes that compose a shape are indexed as if they were terms. With this information in the index, it is easy to determine whether one shape intersects with another, as they will share the same geohash terms.
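The painting idea can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: shapes are reduced to bounding boxes, every cell has the same fixed precision, and the helper names `geohash_encode` and `cells_for_box` are made up for this example. It is not a description of how Elasticsearch builds its index internally.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_encode(lon, lat, precision):
    """Encode a lon/lat point as a geohash string of the given length."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    chars, bits, bit_count = [], 0, 0
    use_lon = True  # bits alternate: longitude first, then latitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            bit = lon >= mid
            if bit:
                lon_lo = mid
            else:
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            bit = lat >= mid
            if bit:
                lat_lo = mid
            else:
                lat_hi = mid
        bits = (bits << 1) | int(bit)
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


def cells_for_box(min_lon, min_lat, max_lon, max_lat, precision):
    """Return the geohashes of all fixed-precision cells a box touches."""
    lon_bits = (5 * precision + 1) // 2
    lat_bits = (5 * precision) // 2
    cell_w = 360.0 / (1 << lon_bits)
    cell_h = 180.0 / (1 << lat_bits)
    x0, x1 = int((min_lon + 180.0) / cell_w), int((max_lon + 180.0) / cell_w)
    y0, y1 = int((min_lat + 90.0) / cell_h), int((max_lat + 90.0) / cell_h)
    cells = set()
    for x in range(x0, x1 + 1):
        for y in range(y0, y1 + 1):
            # Encode the centre of each cell to get that cell's geohash.
            centre_lon = -180.0 + (x + 0.5) * cell_w
            centre_lat = -90.0 + (y + 0.5) * cell_h
            cells.add(geohash_encode(centre_lon, centre_lat, precision))
    return cells


# Rough bounding boxes, chosen for illustration only.
dam_square = cells_for_box(4.89205, 52.37250, 4.89431, 52.37356, 6)
city_centre = cells_for_box(4.87463, 52.35755, 4.93368, 52.38617, 6)
paris_box = cells_for_box(2.29, 48.85, 2.30, 48.86, 6)

print("shared with city centre:", sorted(dam_square & city_centre))
print("shared with Paris box:", sorted(dam_square & paris_box))
```

Two shapes that share at least one geohash term are candidates for intersection; shapes with no term in common cannot overlap at this precision.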
That is the extent of what you can do with geo-shapes: determine the
relationship between a query shape and a shape in the index. The relation
can be one of the following:
intersects: The query shape overlaps with the indexed shape (default).
disjoint: The query shape does not overlap at all with the indexed shape.
within: The indexed shape is entirely within the query shape.
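The three relations can be illustrated with plain rectangles. The sketch below is a simplification for intuition only: it works on `(min_lon, min_lat, max_lon, max_lat)` tuples rather than real geo-shapes, and `relation_holds` is a hypothetical helper, not part of any Elasticsearch API.

```python
def relation_holds(relation, query_box, indexed_box):
    """Check a relation between two (min_lon, min_lat, max_lon, max_lat) boxes."""
    q_min_lon, q_min_lat, q_max_lon, q_max_lat = query_box
    i_min_lon, i_min_lat, i_max_lon, i_max_lat = indexed_box

    # Two boxes overlap when their extents overlap on both axes.
    overlaps = (
        q_min_lon <= i_max_lon and i_min_lon <= q_max_lon
        and q_min_lat <= i_max_lat and i_min_lat <= q_max_lat
    )
    if relation == "intersects":
        return overlaps
    if relation == "disjoint":
        return not overlaps
    if relation == "within":
        # The indexed box must lie entirely inside the query box.
        return (
            q_min_lon <= i_min_lon and i_max_lon <= q_max_lon
            and q_min_lat <= i_min_lat and i_max_lat <= q_max_lat
        )
    raise ValueError(f"unknown relation: {relation!r}")


# Illustrative boxes: a query area and two indexed shapes.
query_area = (4.87, 52.35, 4.94, 52.39)
small_square = (4.892, 52.372, 4.895, 52.374)
far_away = (2.29, 48.85, 2.30, 48.86)

for name, box in [("small_square", small_square), ("far_away", far_away)]:
    matches = [r for r in ("intersects", "disjoint", "within")
               if relation_holds(r, query_area, box)]
    print(name, matches)
```

Note that `within` is stated from the point of view of the indexed shape: it is the indexed shape that must fit inside the query shape, not the other way around.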
Geo-shapes cannot be used to calculate distance, cannot be used for sorting or scoring, and cannot be used in aggregations.
Like fields of type geo_point, geo-shapes have to be mapped explicitly before they can be used:
PUT /attractions
{
  "mappings": {
    "landmark": {
      "properties": {
        "name": {
          "type": "string"
        },
        "location": {
          "type": "geo_shape"
        }
      }
    }
  }
}
There are two important settings that you should consider changing: precision and distance_error_pct.
The precision parameter controls the maximum length of the geohashes that are generated. It defaults to a precision of 9, which equates to a geohash with dimensions of about 5m x 5m. That is probably far more precise than you need.
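The 5m figure can be checked with some quick arithmetic. Each geohash character adds 5 bits, split alternately between longitude and latitude. The sketch below assumes roughly 111,320 meters per degree, which only holds near the equator (cells get narrower toward the poles), and `geohash_cell_size_m` is a name invented for this example.

```python
METERS_PER_DEGREE = 111_320  # rough value at the equator; an approximation


def geohash_cell_size_m(precision):
    """Approximate (width, height) in meters of a geohash cell at the equator."""
    total_bits = 5 * precision
    lon_bits = (total_bits + 1) // 2  # longitude gets the extra bit when odd
    lat_bits = total_bits // 2
    width_deg = 360.0 / 2 ** lon_bits
    height_deg = 180.0 / 2 ** lat_bits
    return width_deg * METERS_PER_DEGREE, height_deg * METERS_PER_DEGREE


for precision in range(1, 10):
    width, height = geohash_cell_size_m(precision)
    print(f"precision {precision}: about {width:,.1f}m x {height:,.1f}m")
```

Each step down in precision makes the cells considerably larger, which is why lowering the precision by even a level or two reduces the number of terms so sharply.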
The lower the precision, the fewer terms are indexed and the faster the search will be. But of course, the lower the precision, the less accurate your geo-shapes are. Consider just how accurate you need your shapes to be: dropping even one or two levels of precision can represent a significant savings.
You can specify precisions by using distances (for example, 50m or 2km), but ultimately these distances are converted to the same levels as described in Chapter 37.
When indexing a polygon, the big central continuous part can be represented cheaply by a short geohash. It is the edges that matter. Edges require much smaller geohashes to represent them with any accuracy.
If you’re indexing a small landmark, you want the edges to be quite accurate. It wouldn’t be good to have one monument overlapping with the next. When indexing an entire country, you don’t need quite as much precision. Fifty meters here or there isn’t likely to start any wars.
The distance_error_pct parameter specifies the maximum allowable error based on the size of the shape. It defaults to 0.025, or 2.5%. In other words, big shapes (like countries) are allowed to have fuzzier edges than small shapes (like monuments).
The default of 0.025 is a good starting point, but the more error that is allowed, the fewer terms that are required to index a shape.
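To get a feel for what 2.5% means in practice, here is a back-of-the-envelope calculation. It assumes the "size of the shape" can be summarized as a single length in meters, which is a simplification made for this example; the exact measure used internally is not described here, and `max_allowed_error_m` is a hypothetical helper.

```python
def max_allowed_error_m(shape_size_m, distance_error_pct=0.025):
    """Rough edge error allowed for a shape of the given size (in meters)."""
    return shape_size_m * distance_error_pct


# Illustrative sizes only: a monument about 100m across,
# and a country about 300km across.
for label, size_m in [("monument", 100), ("country", 300_000)]:
    error = max_allowed_error_m(size_m)
    print(f"{label}: edges may be off by roughly {error:,.1f}m")
```

Under this assumption the same percentage translates into a few meters for a monument but several kilometers for a country, which matches the intent of letting big shapes have fuzzier edges.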
Shapes are represented using GeoJSON, a simple open standard for encoding two-dimensional shapes in JSON. Each shape definition contains the type of shape (such as point, line, polygon, or envelope) and one or more arrays of longitude/latitude points.
For instance, we can index a polygon representing Dam Square in Amsterdam as follows:
PUT /attractions/landmark/dam_square
{
  "name": "Dam Square, Amsterdam",
  "location": {
    "type": "polygon",
    "coordinates": [[
      [ 4.89218, 52.37356 ],
      [ 4.89205, 52.37276 ],
      [ 4.89301, 52.37274 ],
      [ 4.89392, 52.37250 ],
      [ 4.89431, 52.37287 ],
      [ 4.89331, 52.37346 ],
      [ 4.89305, 52.37326 ],
      [ 4.89218, 52.37356 ]
    ]]
  }
}
The type parameter indicates the type of shape that the coordinates represent.
The coordinates parameter holds the list of lon/lat points that describe the polygon.
The excess of square brackets in the example may look confusing, but the GeoJSON syntax is quite simple:
Each lon/lat point is represented as an array:

    [lon,lat]

A list of points is wrapped in an array to represent a polygon:

    [[lon,lat],[lon,lat], ... ]

A shape of type polygon can optionally contain several polygons; the first represents the polygon proper, while any subsequent polygons represent holes in the first:

    [
      [[lon,lat],[lon,lat], ... ],  # main polygon
      [[lon,lat],[lon,lat], ... ],  # hole in main polygon
      ...
    ]
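The nesting is easier to get right when it is built programmatically. The following Python sketch assembles a polygon definition with an optional hole. The function names `close_ring` and `make_polygon` are invented for this example, and the hole coordinates are made up; the only rule it relies on is that each ring ends on the same point it starts with, as in the Dam Square example.

```python
import json


def close_ring(points):
    """Return a copy of the ring whose last point repeats the first."""
    ring = [list(point) for point in points]
    if len(ring) < 3:
        raise ValueError("a ring needs at least three distinct points")
    if ring[0] != ring[-1]:
        ring.append(list(ring[0]))
    return ring


def make_polygon(outer, holes=()):
    """Build a polygon definition: the outer ring first, then any holes."""
    rings = [close_ring(outer)] + [close_ring(hole) for hole in holes]
    return {"type": "polygon", "coordinates": rings}


# Each point is [lon, lat]. The hole is a hypothetical cut-out.
outer_ring = [[4.89218, 52.37356], [4.89205, 52.37276],
              [4.89431, 52.37287], [4.89331, 52.37346]]
hole_ring = [[4.8925, 52.3730], [4.8930, 52.3730], [4.8930, 52.3733]]

polygon = make_polygon(outer_ring, holes=[hole_ring])
print(json.dumps(polygon, indent=2))
```

Counting brackets from the outside in: the outermost array holds rings, each ring holds points, and each point holds a longitude and a latitude.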
See the Geo-shape mapping documentation for more details about the supported shapes.
The unusual thing about the geo_shape query and geo_shape filter is that they allow us to query using shapes, rather than just points.
For instance, if our user steps out of the central train station in Amsterdam, we could find all landmarks within a 1km radius with a query like this:
GET /attractions/landmark/_search
{
  "query": {
    "geo_shape": {
      "location": {
        "shape": {
          "type": "circle",
          "radius": "1km",
          "coordinates": [ 4.89994, 52.37815 ]
        }
      }
    }
  }
}
The query looks at geo-shapes in the location field.
The shape key indicates that the shape is specified inline in the query.
The shape is a circle, with a radius of 1km.
This point is situated at the entrance of the central train station in Amsterdam.
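When the center point comes from user input, it is usually more convenient to generate the request body than to write it by hand. The sketch below only builds the JSON body shown above; `circle_query` is a hypothetical helper, and sending the body to Elasticsearch is left to whichever HTTP client you use.

```python
import json


def circle_query(field, lon, lat, radius):
    """Build a geo_shape query body for a circle around a lon/lat point."""
    return {
        "query": {
            "geo_shape": {
                field: {
                    "shape": {
                        "type": "circle",
                        "radius": radius,
                        # GeoJSON order: longitude first, then latitude.
                        "coordinates": [lon, lat],
                    }
                }
            }
        }
    }


body = circle_query("location", lon=4.89994, lat=52.37815, radius="1km")
print(json.dumps(body, indent=2))
```

Naming the `lon` and `lat` arguments explicitly helps avoid the common mistake of passing them in latitude/longitude order.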
By default, the query (or filter; both do the same job) looks for indexed shapes that intersect with the query shape. The relation parameter can be set to disjoint to find indexed shapes that don’t intersect with the query shape, or within to find indexed shapes that are completely contained by the query shape.
For instance, we could find all landmarks in the center of Amsterdam with this query:
GET /attractions/landmark/_search
{
  "query": {
    "geo_shape": {
      "location": {
        "relation": "within",
        "shape": {
          "type": "polygon",
          "coordinates": [[
            [ 4.88330, 52.38617 ],
            [ 4.87463, 52.37254 ],
            [ 4.87875, 52.36369 ],
            [ 4.88939, 52.35850 ],
            [ 4.89840, 52.35755 ],
            [ 4.91909, 52.36217 ],
            [ 4.92656, 52.36594 ],
            [ 4.93368, 52.36615 ],
            [ 4.93342, 52.37275 ],
            [ 4.92690, 52.37632 ],
            [ 4.88330, 52.38617 ]
          ]]
        }
      }
    }
  }
}
With shapes that are often used in queries, it can be more convenient to store them in the index and to refer to them by name in the query. Take the central Amsterdam polygon from the previous example. We could store it as a document of type neighborhood.
First, we set up the mapping in the same way as we did for landmark:
PUT /attractions/_mapping/neighborhood
{
  "properties": {
    "name": {
      "type": "string"
    },
    "location": {
      "type": "geo_shape"
    }
  }
}
Then we can index a shape for central Amsterdam:
PUT /attractions/neighborhood/central_amsterdam
{
  "name": "Central Amsterdam",
  "location": {
    "type": "polygon",
    "coordinates": [[
      [ 4.88330, 52.38617 ],
      [ 4.87463, 52.37254 ],
      [ 4.87875, 52.36369 ],
      [ 4.88939, 52.35850 ],
      [ 4.89840, 52.35755 ],
      [ 4.91909, 52.36217 ],
      [ 4.92656, 52.36594 ],
      [ 4.93368, 52.36615 ],
      [ 4.93342, 52.37275 ],
      [ 4.92690, 52.37632 ],
      [ 4.88330, 52.38617 ]
    ]]
  }
}
After the shape is indexed, we can refer to it by index, type, and id in the query itself:
GET /attractions/landmark/_search
{
  "query": {
    "geo_shape": {
      "location": {
        "relation": "within",
        "indexed_shape": {
          "index": "attractions",
          "type": "neighborhood",
          "id": "central_amsterdam",
          "path": "location"
        }
      }
    }
  }
}
By specifying indexed_shape instead of shape, Elasticsearch knows that it needs to retrieve the query shape from the specified document and path.
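Because an indexed shape is identified by just four values, queries that reuse stored shapes are easy to generate. The sketch below builds the same request body as the example above; `indexed_shape_query` and `VALID_RELATIONS` are names invented here, and the relation check simply mirrors the three relations listed earlier in this section.

```python
import json

VALID_RELATIONS = {"intersects", "disjoint", "within"}


def indexed_shape_query(field, index, doc_type, doc_id, path,
                        relation="intersects"):
    """Build a geo_shape query body that refers to a stored shape."""
    if relation not in VALID_RELATIONS:
        raise ValueError(f"unknown relation: {relation!r}")
    return {
        "query": {
            "geo_shape": {
                field: {
                    "relation": relation,
                    "indexed_shape": {
                        "index": index,
                        "type": doc_type,
                        "id": doc_id,
                        "path": path,
                    },
                }
            }
        }
    }


body = indexed_shape_query(
    field="location",
    index="attractions",
    doc_type="neighborhood",
    doc_id="central_amsterdam",
    path="location",
    relation="within",
)
print(json.dumps(body, indent=2))
```

The `field` argument names the field being searched, while `path` names the field in the stored document that holds the query shape. In this example both happen to be called location.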
There is nothing special about the shape for central Amsterdam. We could equally use our existing shape for Dam Square in queries. This query finds neighborhoods that intersect with Dam Square:
GET /attractions/neighborhood/_search
{
  "query": {
    "geo_shape": {
      "location": {
        "indexed_shape": {
          "index": "attractions",
          "type": "landmark",
          "id": "dam_square",
          "path": "location"
        }
      }
    }
  }
}
The geo_shape query and filter perform the same function. The query simply acts as a filter: any matching documents receive a relevance _score of 1. Query results cannot be cached, but filter results can be.
The results are not cached by default. Just as with geo-points, any change in the coordinates of a shape is likely to produce a different set of geohashes, so there is little point in caching filter results. That said, if you filter using the same shapes repeatedly, it can be worth caching the results by setting _cache to true:
GET /attractions/neighborhood/_search
{
  "query": {
    "filtered": {
      "filter": {
        "geo_shape": {
          "_cache": true,
          "location": {
            "indexed_shape": {
              "index": "attractions",
              "type": "landmark",
              "id": "dam_square",
              "path": "location"
            }
          }
        }
      }
    }
  }
}