Because creating, deleting, and updating a single document in
Elasticsearch is atomic, it makes sense to store closely related entities
within the same document. For instance, we could store an order and all of
its order lines in one document, or we could store a blog post and all of its
comments together, by passing an array of comments:
PUT /my_index/blogpost/1
{
  "title": "Nest eggs",
  "body":  "Making your money work...",
  "tags":  [ "cash", "shares" ],
  "comments": [
    {
      "name":    "John Smith",
      "comment": "Great article",
      "age":     28,
      "stars":   4,
      "date":    "2014-09-01"
    },
    {
      "name":    "Alice White",
      "comment": "More like this please",
      "age":     31,
      "stars":   5,
      "date":    "2014-10-22"
    }
  ]
}
If we rely on dynamic mapping, the comments field will be autocreated as an object field.
Because all of the content is in the same document, there is no need to join blog posts and comments at query time, so searches perform well.
The problem is that the preceding document would match a query like this:
GET /_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "name": "Alice" }},
        { "match": { "age":  28      }}
      ]
    }
  }
}
The reason for this cross-object matching, as discussed in “Arrays of Inner Objects”, is that our beautifully structured JSON document is flattened into a simple key-value format in the index that looks like this:
{
  "title":            [ eggs, nest ],
  "body":             [ making, money, work, your ],
  "tags":             [ cash, shares ],
  "comments.name":    [ alice, john, smith, white ],
  "comments.comment": [ article, great, like, more, please, this ],
  "comments.age":     [ 28, 31 ],
  "comments.stars":   [ 4, 5 ],
  "comments.date":    [ 2014-09-01, 2014-10-22 ]
}
The correlation between Alice and 31, or between John and 2014-09-01, has been irretrievably lost. While fields of type object (see “Multilevel Objects”) are useful for storing a single object, they are useless, from a search point of view, for storing an array of objects.
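The loss of correlation can be reproduced outside Elasticsearch. The following is a minimal Python sketch, not Elasticsearch's actual indexing code: the flatten and analyze helpers are hypothetical names, the tokenizer is a crude stand-in for a real analyzer, and the document is trimmed to the name and age fields.

```python
import re
from collections import defaultdict


def analyze(value):
    """Crude stand-in for an analyzer: lowercase word tokens for strings."""
    if isinstance(value, str):
        return re.findall(r"\w+", value.lower())
    return [value]


def flatten(doc, prefix=""):
    """Collect all values under one dotted path, as an object field would."""
    index = defaultdict(set)
    for key, value in doc.items():
        path = prefix + key
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, dict):
                # Inner objects contribute to shared "parent.child" keys.
                for sub_path, tokens in flatten(item, path + ".").items():
                    index[sub_path] |= tokens
            else:
                index[path].update(analyze(item))
    return index


doc = {
    "title": "Nest eggs",
    "comments": [
        {"name": "John Smith", "age": 28},
        {"name": "Alice White", "age": 31},
    ],
}

flat = flatten(doc)
# Both clauses are satisfied, but by values from two different comments.
matches = "alice" in flat["comments.name"] and 28 in flat["comments.age"]
print(sorted(flat["comments.name"]))  # ['alice', 'john', 'smith', 'white']
print(matches)                        # True
```

Once the values of every comment share one key, nothing records which name belonged to which age, so the query for Alice aged 28 succeeds.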
This is the problem that nested objects are designed to solve. By mapping the comments field as type nested instead of type object, each nested object is indexed as a hidden separate document, something like this:
{
  "comments.name":    [ john, smith ],
  "comments.comment": [ article, great ],
  "comments.age":     [ 28 ],
  "comments.stars":   [ 4 ],
  "comments.date":    [ 2014-09-01 ]
}
{
  "comments.name":    [ alice, white ],
  "comments.comment": [ like, more, please, this ],
  "comments.age":     [ 31 ],
  "comments.stars":   [ 5 ],
  "comments.date":    [ 2014-10-22 ]
}
{
  "title":            [ eggs, nest ],
  "body":             [ making, money, work, your ],
  "tags":             [ cash, shares ]
}
By indexing each nested object separately, the fields within the object maintain their relationships. We can run queries that will match only if the match occurs within the same nested object.
Not only that, because of the way that nested objects are indexed, joining the nested documents to the root document at query time is fast—almost as fast as if they were a single document.
These extra nested documents are hidden; we can’t access them directly. To update, add, or remove a nested object, we have to reindex the whole document. Note that the result returned by a search request is not the nested object alone; it is the whole document.
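To see why separate indexing preserves the relationships, here is a small Python sketch under simplifying assumptions: the nested_docs list and the nested_match helper are illustrative names only, and the tokenizer is a rough approximation of an analyzer rather than what Elasticsearch does internally.

```python
import re


def tokens(text):
    """Rough stand-in for analysis: the set of lowercase word tokens."""
    return set(re.findall(r"\w+", text.lower()))


comments = [
    {"name": "John Smith", "age": 28},
    {"name": "Alice White", "age": 31},
]

# Each comment becomes its own hidden document, so its fields stay together.
nested_docs = [
    {"comments.name": tokens(c["name"]), "comments.age": {c["age"]}}
    for c in comments
]


def nested_match(docs, name, age):
    """True only if one single nested document satisfies both clauses."""
    return any(
        name in d["comments.name"] and age in d["comments.age"]
        for d in docs
    )


print(nested_match(nested_docs, "alice", 28))  # False
print(nested_match(nested_docs, "john", 28))   # True
```

Because both clauses are checked against the same hidden document, Alice and 28 no longer combine into a false match.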
Setting up a nested field is simple: where you would normally specify type object, make it type nested instead:
PUT /my_index
{
  "mappings": {
    "blogpost": {
      "properties": {
        "comments": {
          "type": "nested",
          "properties": {
            "name":    { "type": "string" },
            "comment": { "type": "string" },
            "age":     { "type": "short"  },
            "stars":   { "type": "short"  },
            "date":    { "type": "date"   }
          }
        }
      }
    }
  }
}
That’s all that is required. Any comments objects would now be indexed as separate nested documents. See the nested type reference docs for more.
Because nested objects are indexed as separate hidden documents, we can’t query them directly. Instead, we have to use the nested query or nested filter to access them:
GET /my_index/blogpost/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "eggs" }},
        {
          "nested": {
            "path": "comments",
            "query": {
              "bool": {
                "must": [
                  { "match": { "comments.name": "john" }},
                  { "match": { "comments.age":  28     }}
                ]
              }
            }
          }
        }
      ]
    }
  }
}
The title clause operates on the root document.
The nested clause “steps down” into the nested comments field. It no longer has access to fields in the root document, nor fields in any other nested document.
The comments.name and comments.age clauses operate on the same nested document.
A nested field can contain other nested fields. Similarly, a nested query can contain other nested queries. The nesting hierarchy is applied as you would expect.
Of course, a nested query could match several nested documents. Each matching nested document would have its own relevance score, but these multiple scores need to be reduced to a single score that can be applied to the root document.
By default, the nested query averages the scores of the matching nested documents. This can be controlled by setting the score_mode parameter to avg, max, sum, or even none (in which case the root document gets a constant score of 1.0).
GET /my_index/blogpost/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "eggs" }},
        {
          "nested": {
            "path":       "comments",
            "score_mode": "max",
            "query": {
              "bool": {
                "must": [
                  { "match": { "comments.name": "john" }},
                  { "match": { "comments.age":  28     }}
                ]
              }
            }
          }
        }
      ]
    }
  }
}
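The effect of score_mode can be pictured as a simple reduction over the scores of the matching nested documents. The Python sketch below only illustrates the arithmetic described in the text. The reduce_scores function is a hypothetical helper, the scores are made up, and real relevance scoring involves more than this.

```python
def reduce_scores(scores, score_mode="avg"):
    """Collapse nested-document scores into one root-document score."""
    if not scores:
        raise ValueError("no matching nested documents")
    if score_mode == "avg":
        return sum(scores) / len(scores)
    if score_mode == "max":
        return max(scores)
    if score_mode == "sum":
        return sum(scores)
    if score_mode == "none":
        return 1.0  # constant score, as described in the text
    raise ValueError(f"unsupported score_mode: {score_mode}")


nested_scores = [0.5, 1.5, 1.0]  # made-up scores of three matching comments

print(reduce_scores(nested_scores))          # 1.0 (avg is the default)
print(reduce_scores(nested_scores, "max"))   # 1.5
print(reduce_scores(nested_scores, "sum"))   # 3.0
print(reduce_scores(nested_scores, "none"))  # 1.0
```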
A nested filter behaves much like a nested query, except that it doesn’t accept the score_mode parameter. It can be used only in filter context, such as inside a filtered query, and it behaves like any other filter: it includes or excludes, but it doesn’t score.
While the results of the nested filter itself are not cached, the usual caching rules apply to the filter inside the nested filter.
It is possible to sort by the value of a nested field, even though the value exists in a separate nested document. To make the result more interesting, we will add another record:
PUT /my_index/blogpost/2
{
  "title": "Investment secrets",
  "body":  "What they don't tell you ...",
  "tags":  [ "shares", "equities" ],
  "comments": [
    {
      "name":    "Mary Brown",
      "comment": "Lies, lies, lies",
      "age":     42,
      "stars":   1,
      "date":    "2014-10-18"
    },
    {
      "name":    "John Smith",
      "comment": "You're making it up!",
      "age":     28,
      "stars":   2,
      "date":    "2014-10-16"
    }
  ]
}
Imagine that we want to retrieve blog posts that received comments in October, ordered by the lowest number of stars that each blog post received. The search request would look like this:
GET /_search
{
  "query": {
    "nested": {
      "path": "comments",
      "filter": {
        "range": {
          "comments.date": {
            "gte": "2014-10-01",
            "lt":  "2014-11-01"
          }
        }
      }
    }
  },
  "sort": {
    "comments.stars": {
      "order": "asc",
      "mode":  "min",
      "nested_filter": {
        "range": {
          "comments.date": {
            "gte": "2014-10-01",
            "lt":  "2014-11-01"
          }
        }
      }
    }
  }
}
The nested query limits the results to blog posts that received a comment in October.
Results are sorted in ascending (asc) order by the lowest value (min) in the comments.stars field in any matching comments.
The nested_filter in the sort clause is the same as the nested query in the main query clause. The reason is explained next.
Why do we need to repeat the query conditions in the nested_filter? The reason is that sorting happens after the query has been executed. The query matches blog posts that received comments in October, but it returns blog post documents as the result. If we didn’t include the nested_filter clause, we would end up sorting based on any comments that the blog post has ever received, not just those received in October.
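The difference the nested_filter makes can be checked by hand on the two sample posts. This Python sketch mimics the behavior described above with plain data structures. The search, sort_value, and in_october names are illustrative helpers, not part of any Elasticsearch client.

```python
# Star ratings and dates of the comments on the two sample blog posts.
posts = {
    1: [
        {"stars": 4, "date": "2014-09-01"},
        {"stars": 5, "date": "2014-10-22"},
    ],
    2: [
        {"stars": 1, "date": "2014-10-18"},
        {"stars": 2, "date": "2014-10-16"},
    ],
}


def in_october(comment):
    # ISO-formatted dates compare correctly as plain strings.
    return "2014-10-01" <= comment["date"] < "2014-11-01"


def sort_value(comments, nested_filter=None):
    """Lowest star rating (mode min) among comments passing the filter."""
    kept = [c for c in comments if nested_filter is None or nested_filter(c)]
    return min(c["stars"] for c in kept)


def search(nested_filter):
    # The query keeps posts that have at least one October comment.
    hits = [pid for pid, cs in posts.items() if any(in_october(c) for c in cs)]
    # Each hit is returned as (sort value, post id), in ascending order.
    return sorted((sort_value(posts[pid], nested_filter), pid) for pid in hits)


print(search(in_october))  # [(1, 2), (5, 1)]
print(search(None))        # [(1, 2), (4, 1)]
```

Without the filter, post 1 is sorted on its September comment (4 stars) rather than on its only October comment (5 stars). The order happens to be the same for this small data set, but the sort value is wrong.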
In the same way as we need to use the special nested query to gain access to nested objects at search time, the dedicated nested aggregation allows us to aggregate fields in nested objects:
GET /my_index/blogpost/_search?search_type=count
{
  "aggs": {
    "comments": {
      "nested": {
        "path": "comments"
      },
      "aggs": {
        "by_month": {
          "date_histogram": {
            "field":    "comments.date",
            "interval": "month",
            "format":   "yyyy-MM"
          },
          "aggs": {
            "avg_stars": {
              "avg": {
                "field": "comments.stars"
              }
            }
          }
        }
      }
    }
  }
}
The nested aggregation “steps down” into the nested comments object.
Comments are bucketed into months based on the comments.date field.
The average number of stars is calculated for each bucket.
The results show that aggregation has happened at the nested document level:
...
"aggregations": {
  "comments": {
     "doc_count": 4,
     "by_month": {
        "buckets": [
           {
              "key_as_string": "2014-09",
              "key": 1409529600000,
              "doc_count": 1,
              "avg_stars": {
                 "value": 4
              }
           },
           {
              "key_as_string": "2014-10",
              "key": 1412121600000,
              "doc_count": 3,
              "avg_stars": {
                 "value": 2.6666666666666665
              }
           }
        ]
     }
  }
}
...
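These numbers can be verified by hand. The Python sketch below recomputes the buckets from the four sample comments. It only mirrors the arithmetic of the aggregation, the variable names are illustrative, and it leaves out the epoch-millisecond key.

```python
from collections import defaultdict

# Every nested comment from both sample blog posts.
comments = [
    {"stars": 4, "date": "2014-09-01"},
    {"stars": 5, "date": "2014-10-22"},
    {"stars": 1, "date": "2014-10-18"},
    {"stars": 2, "date": "2014-10-16"},
]

# Bucket at the nested-document level: one entry per comment, keyed by yyyy-MM.
by_month = defaultdict(list)
for c in comments:
    by_month[c["date"][:7]].append(c["stars"])

buckets = [
    {
        "key_as_string": month,
        "doc_count": len(stars),
        "avg_stars": sum(stars) / len(stars),
    }
    for month, stars in sorted(by_month.items())
]

for bucket in buckets:
    print(bucket)
```

The doc_count values add up to 4, the number of comments, rather than 2, the number of blog posts.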
A nested aggregation can access only the fields within the nested document. It can’t see fields in the root document or in a different nested document. However, we can step out of the nested scope back into the parent with a reverse_nested aggregation.
For instance, we can find out which tags our commenters are interested in, based on the age of the commenter. The comments.age field is a nested field, while the tags are in the root document:
GET /my_index/blogpost/_search?search_type=count
{
  "aggs": {
    "comments": {
      "nested": {
        "path": "comments"
      },
      "aggs": {
        "age_group": {
          "histogram": {
            "field":    "comments.age",
            "interval": 10
          },
          "aggs": {
            "blogposts": {
              "reverse_nested": {},
              "aggs": {
                "tags": {
                  "terms": {
                    "field": "tags"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
The nested agg steps down into the comments object.
The histogram agg groups on the comments.age field, in buckets of 10 years.
The reverse_nested agg steps back up to the root document.
The terms agg counts popular terms per age group of the commenter.
The abbreviated results show us the following:
...
"aggregations": {
  "comments": {
     "doc_count": 4,
     "age_group": {
        "buckets": [
           {
              "key": 20,
              "doc_count": 2,
              "blogposts": {
                 "doc_count": 2,
                 "tags": {
                    "doc_count_error_upper_bound": 0,
                    "buckets": [
                       { "key": "shares",   "doc_count": 2 },
                       { "key": "cash",     "doc_count": 1 },
                       { "key": "equities", "doc_count": 1 }
                    ]
                 }
              }
           },
...
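The step down and back up can be traced on the sample data. In this Python sketch, which assumes only the two blog posts indexed earlier, comments are first grouped by age decade. Each group is then mapped back to its distinct parent posts before the tags are counted. The structure and names are illustrative, not Elasticsearch internals.

```python
from collections import Counter, defaultdict

# Root-level tags plus the commenter ages of each sample blog post.
posts = {
    1: {"tags": ["cash", "shares"], "ages": [28, 31]},
    2: {"tags": ["shares", "equities"], "ages": [42, 28]},
}

# Step down: bucket the nested comments by age in intervals of 10 years.
groups = defaultdict(lambda: {"doc_count": 0, "post_ids": set()})
for post_id, post in posts.items():
    for age in post["ages"]:
        bucket = groups[age // 10 * 10]
        bucket["doc_count"] += 1
        bucket["post_ids"].add(post_id)

# Step back up: count tags once per distinct root document in each bucket.
result = {}
for key, bucket in sorted(groups.items()):
    tags = Counter(
        tag for pid in bucket["post_ids"] for tag in posts[pid]["tags"]
    )
    result[key] = {
        "doc_count": bucket["doc_count"],
        "blogposts": len(bucket["post_ids"]),
        "tags": dict(tags),
    }

for key, value in result.items():
    print(key, value)
```

For the 20 to 29 bucket this reproduces the counts shown above: two comments, two blog posts, and the tags shares (2), cash (1), and equities (1).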
Nested objects are useful when there is one main entity, like our blogpost, with a limited number of closely related but less important entities, such as comments. It is useful to be able to find blog posts based on the content of the comments, and the nested query and filter provide for fast query-time joins.
The disadvantages of the nested model are as follows:
To add, change, or delete a nested document, the whole document must be reindexed. This becomes more costly the more nested documents there are.
Search requests return the whole document, not just the matching nested documents. Although there are plans afoot to support returning the best-matching nested documents with the root document, this is not yet supported.
Sometimes you need a complete separation between the main document and its associated entities. This separation is provided by the parent-child relationship.