Multivalue buckets—the terms
, histogram
, and date_histogram
—dynamically produce many buckets. How does Elasticsearch decide the order that
these buckets are presented to the user?
By default, buckets are ordered by doc_count
in descending order. This is a
good default because often we want to find the documents that maximize some
criteria: price, population, frequency. But sometimes you’ll want to modify this sort order, and there are a few ways to
do it, depending on the bucket.
These sort modes are intrinsic to the bucket: they operate on data that bucket
generates, such as doc_count
. They share the same syntax but differ slightly
depending on the bucket being used.
Let’s perform a terms
aggregation but sort by doc_count
, in ascending order:
GET
/
cars
/
transactions
/
_search
?
search_type
=
count
{
"aggs"
:
{
"colors"
:
{
"terms"
:
{
"field"
:
"color"
,
"order"
:
{
"_count"
:
"asc"
}
}
}
}
}
We introduce an order
object into the aggregation, which allows us to sort on
one of several values:
_count
Sort by document count. Works with terms
, histogram
, date_histogram
.
_term
Sort by the string value of a term alphabetically. Works only with terms
.
_key
Sort by the numeric value of each bucket’s key (conceptually similar to _term
).
Works only with histogram
and date_histogram
.
Often, you’ll find yourself wanting to sort based on a metric’s calculated value. For our car sales analytics dashboard, we may want to build a bar chart of sales by car color, but order the bars by the average price, ascending.
We can do this by adding a metric to our bucket, and then referencing that
metric from the order
parameter:
GET
/
cars
/
transactions
/
_search
?
search_type
=
count
{
"aggs"
:
{
"colors"
:
{
"terms"
:
{
"field"
:
"color"
,
"order"
:
{
"avg_price"
:
"asc"
}
},
"aggs"
:
{
"avg_price"
:
{
"avg"
:
{
"field"
:
"price"
}
}
}
}
}
}
The average price is calculated for each bucket.
Then the buckets are ordered by the calculated average in ascending order.
This lets you override the sort order with any metric, simply by referencing
the name of the metric. Some metrics, however, emit multiple values. The
extended_stats
metric is a good example: it provides half a dozen individual
metrics.
If you want to sort on a multivalue metric, you just need to use the dot-path to the metric of interest:
GET
/
cars
/
transactions
/
_search
?
search_type
=
count
{
"aggs"
:
{
"colors"
:
{
"terms"
:
{
"field"
:
"color"
,
"order"
:
{
"stats.variance"
:
"asc"
}
},
"aggs"
:
{
"stats"
:
{
"extended_stats"
:
{
"field"
:
"price"
}
}
}
}
}
}
In this example we are sorting on the variance of each bucket, so that colors with the least variance in price will appear before those that have more variance.
In the prior examples, the metric was a direct child of the bucket. An average price was calculated for each term. It is possible to sort on deeper metrics, which are grandchildren or great-grandchildren of the bucket—with some limitations.
You can define a path to a deeper, nested metric by using angle brackets (>
), like
so: my_bucket>another_bucket>metric
.
The caveat is that each nested bucket in the path must be a single-value bucket.
A filter
bucket produces a single bucket: all documents that match the
filtering criteria. Multivalue buckets (such as terms
) generate many
dynamic buckets, which makes it impossible to specify a deterministic path.
Currently, there are only three single-value buckets: filter
, global
, and reverse_nested
. As
a quick example, let’s build a histogram of car prices, but order the buckets
by the variance in price of red and green (but not blue) cars in each price range:
GET
/
cars
/
transactions
/
_search
?
search_type
=
count
{
"aggs"
:
{
"colors"
:
{
"histogram"
:
{
"field"
:
"price"
,
"interval"
:
20000
,
"order"
:
{
"red_green_cars>stats.variance"
:
"asc"
}
},
"aggs"
:
{
"red_green_cars"
:
{
"filter"
:
{
"terms"
:
{
"color"
:
[
"red"
,
"green"
]}},
"aggs"
:
{
"stats"
:
{
"extended_stats"
:
{
"field"
:
"price"
}}
}
}
}
}
}
}
Sort the buckets generated by the histogram according to the variance of a nested metric.
Because we are using a single-value filter
, we can use nested sorting.
Sort on the stats generated by this metric.
In this example, you can see that we are accessing a nested metric. The stats
metric is a child of red_green_cars
, which is in turn a child of colors
. To
sort on that metric, we define the path as red_green_cars>stats.variance
.
This is allowed because the filter
bucket is a single-value bucket.
3.15.220.16