Search lite—a query-string search—is useful for ad
hoc queries from the command line. To harness the full power of search,
however, you should use the request body search
API, so called because
most parameters are passed in the HTTP request body instead of in the query
string.
Request body search—henceforth known as search—not only handles the query itself, but also allows you to return highlighted snippets from your results, aggregate analytics across all results or subsets of results, and return did-you-mean suggestions, which will help guide your users to the best results quickly.
Let’s start with the simplest form of the search
API, the empty search,
which returns all documents in all indices:
    GET /_search
    {}
Just as with a query-string search, you can search on one, many, or _all
indices, and one, many, or all types:
    GET /index_2014*/type1,type2/_search
    {}
And you can use the from
and size
parameters for pagination:
    GET /_search
    {
      "from": 30,
      "size": 10
    }
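Because from is an offset (the number of hits to skip) and size is a count, converting a 1-based page number into a request body is simple arithmetic. The following is a minimal Python sketch. page_body is a hypothetical helper, and sending the body to GET /_search is left to whatever HTTP client you use.

```python
import json

def page_body(page: int, page_size: int = 10) -> dict:
    """Build a request body for a 1-based page number (hypothetical helper)."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    # 'from' = hits to skip, 'size' = hits to return.
    return {"from": (page - 1) * page_size, "size": page_size}

print(json.dumps(page_body(4)))  # {"from": 30, "size": 10}
```

With 10 results per page, the body in the example above (from 30, size 10) therefore requests the fourth page.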
We present aggregations in depth in Part IV, but for now, we’re going to focus just on the query.
Instead of the cryptic query-string approach, a request body search allows us to write queries by using the query domain-specific language, or query DSL.
The query DSL is a flexible, expressive search language that Elasticsearch uses to expose most of the power of Lucene through a simple JSON interface. It is what you should be using to write your queries in production. It makes your queries more flexible, more precise, easier to read, and easier to debug.
To use the query DSL, pass a query in the query
parameter:
    GET /_search
    {
        "query": YOUR_QUERY_HERE
    }
The empty search ({}) is functionally equivalent to using the match_all
query clause, which, as the name suggests, matches all documents:
    GET /_search
    {
        "query": {
            "match_all": {}
        }
    }
A query clause typically has this structure:
    {
        QUERY_NAME: {
            ARGUMENT: VALUE,
            ARGUMENT: VALUE,...
        }
    }
If it references one particular field, it has this structure:
    {
        QUERY_NAME: {
            FIELD_NAME: {
                ARGUMENT: VALUE,
                ARGUMENT: VALUE,...
            }
        }
    }
For instance, you can use a match
query clause to find tweets that
mention elasticsearch
in the tweet
field:
    {
        "match": {
            "tweet": "elasticsearch"
        }
    }
The full search request would look like this:
    GET /_search
    {
        "query": {
            "match": {
                "tweet": "elasticsearch"
            }
        }
    }
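Since every clause is just a nested JSON object, it can help to build request bodies in application code instead of concatenating strings. This is a minimal Python sketch. match and search_body are hypothetical helper names that follow the single-field clause shape described above.

```python
import json

def match(field: str, value) -> dict:
    # Single-field leaf clause: {QUERY_NAME: {FIELD_NAME: VALUE}}
    return {"match": {field: value}}

def search_body(query: dict) -> dict:
    # The search API takes one clause under the top-level "query" key.
    return {"query": query}

body = search_body(match("tweet", "elasticsearch"))
print(json.dumps(body))  # {"query": {"match": {"tweet": "elasticsearch"}}}
```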
Query clauses are simple building blocks that can be combined with each other to create complex queries. Clauses can be as follows:
Leaf clauses (like the match
clause) that are used to
compare a field (or fields) to a query string.
Compound clauses that are used to combine other query clauses.
For instance, a bool
clause allows you to combine other clauses that
either must
match, must_not
match, or should
match if possible:
    {
        "bool": {
            "must":     { "match": { "tweet": "elasticsearch" }},
            "must_not": { "match": { "name":  "mary" }},
            "should":   { "match": { "tweet": "full text" }}
        }
    }
It is important to note that a compound clause can combine any other query clauses, including other compound clauses. This means that compound clauses can be nested within each other, allowing the expression of very complex logic.
As an example, the following query looks for emails that contain
business opportunity
and should either be starred, or be both in the Inbox
and not marked as spam:
    {
        "bool": {
            "must": { "match": { "email": "business opportunity" }},
            "should": [
                { "match": { "starred": true }},
                { "bool": {
                    "must":     { "match": { "folder": "inbox" }},
                    "must_not": { "match": { "spam": true }}
                }}
            ],
            "minimum_should_match": 1
        }
    }
Don’t worry about the details of this example yet; we will explain in full later. The important thing to take away is that a compound query clause can combine multiple clauses—both leaf clauses and other compound clauses—into a single query.
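Nesting is easier to see when compound clauses are produced by a function that can call itself. The following Python sketch rebuilds the nested email example with every leaf written as a match clause. bool_clause and match are hypothetical helpers that simply assemble dictionaries. They perform no validation and do not talk to Elasticsearch.

```python
def match(field, value) -> dict:
    return {"match": {field: value}}

def bool_clause(must=None, must_not=None, should=None, minimum_should_match=None) -> dict:
    """Assemble a bool clause, omitting any parameter that was not supplied."""
    body = {}
    for key, value in (("must", must), ("must_not", must_not), ("should", should)):
        if value is not None:
            body[key] = value
    if minimum_should_match is not None:
        body["minimum_should_match"] = minimum_should_match
    return {"bool": body}

# A compound clause nested inside another compound clause.
email_query = bool_clause(
    must=match("email", "business opportunity"),
    should=[
        match("starred", True),
        bool_clause(must=match("folder", "inbox"), must_not=match("spam", True)),
    ],
    minimum_should_match=1,
)
```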
Although we refer to the query DSL, in reality there are two DSLs: the query DSL and the filter DSL. Query clauses and filter clauses are similar in nature, but have slightly different purposes.
A filter asks a yes/no question of every document and is used for fields that contain exact values:
Is the created date in the range 2013 to 2014?
Does the status field contain the term published?
Is the lat_lon field within 10km of a specified point?
A query is similar to a filter, but also asks the question: How well does this document match?
Typical uses for a query include finding documents that are:
Best matching the words full text search
Containing the word run, but maybe also matching runs, running, jog, or sprint
Containing the words quick, brown, and fox (the closer together they are, the more relevant the document)
Tagged with lucene, search, or java (the more tags, the more relevant the document)
A query calculates how relevant each document is to the
query, and assigns it a relevance _score, which is later used to
sort matching documents by relevance. This concept of relevance is
well suited to full-text search, where there is seldom a completely
“correct” answer.
The output from most filter clauses—a simple list of the documents that match the filter—is quick to calculate and easy to cache in memory, using only 1 bit per document. These cached filters can be reused efficiently for subsequent requests.
Queries not only have to find matching documents, but also have to calculate how relevant each document is, which typically makes queries heavier than filters. Also, query results are not cacheable.
Thanks to the inverted index, a simple query that matches just a few documents may perform as well as or better than a cached filter that spans millions of documents. In general, however, a cached filter will outperform a query, and will do so consistently.
The goal of filters is to reduce the number of documents that have to be examined by the query.
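The following toy Python model shows the division of labor described above. Filters give a yes/no answer per document, packed into an integer used as a bitset (1 bit per document) and cached by name. The query then scores only the documents whose bits survive. The documents, cache, and term-counting "relevance" are all hypothetical stand-ins. This is not how Lucene stores filters or computes _score. The date range is assumed to mean the whole of 2013 through the end of 2014.

```python
from datetime import date

# Hypothetical documents; a document's id is its position in the list.
docs = [
    {"status": "published", "created": date(2013, 5, 1), "body": "full text search with lucene"},
    {"status": "draft", "created": date(2014, 2, 1), "body": "full text search"},
    {"status": "published", "created": date(2012, 7, 9), "body": "search engines"},
    {"status": "published", "created": date(2014, 11, 3), "body": "text search"},
]

filter_cache = {}

def filter_bits(name, predicate):
    """Answer a yes/no question for every document, stored as 1 bit per document."""
    if name not in filter_cache:  # later requests reuse the cached bitset
        bits = 0
        for doc_id, doc in enumerate(docs):
            if predicate(doc):
                bits |= 1 << doc_id
        filter_cache[name] = bits
    return filter_cache[name]

def relevance(doc, terms):
    # Crude stand-in for a relevance score: count query terms present in the body.
    words = set(doc["body"].split())
    return sum(1 for term in terms if term in words)

def search(terms, candidate_bits):
    # Only documents whose bit is set are scored; all others are skipped.
    hits = []
    for doc_id, doc in enumerate(docs):
        if (candidate_bits >> doc_id) & 1:
            score = relevance(doc, terms)
            if score > 0:
                hits.append((score, doc_id))
    return sorted(hits, key=lambda hit: (-hit[0], hit[1]))

published = filter_bits("status:published", lambda d: d["status"] == "published")
in_range = filter_bits("created:2013-2014",
                       lambda d: date(2013, 1, 1) <= d["created"] <= date(2014, 12, 31))
print(search(["full", "text", "search"], published & in_range))  # [(3, 0), (2, 3)]
```

Combining two cached filters is a single bitwise AND, so only documents 0 and 3 ever reach the (more expensive) scoring step.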
While Elasticsearch comes with many queries and filters, you will use just a few of them frequently. We discuss them in much greater detail in Part II, but next we give you a quick introduction to the most important queries and filters.
The exists and missing filters are used to find documents in which the
specified field either has one or more values (exists) or doesn’t have any
values (missing). They are similar in nature to IS NULL (missing) and
IS NOT NULL (exists) in SQL:
    {
        "exists": {
            "field": "title"
        }
    }
These filters are frequently used to apply a condition only if a field is present, and to apply a different condition if it is missing.
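To make "has one or more values" concrete, here is a toy Python predicate pair. It assumes that a null value, an empty array, and an array containing only nulls all count as having no value. That matches how the exists/missing filters are usually described, but treat it as an assumption rather than a specification. field_exists and field_missing are hypothetical names.

```python
def field_exists(doc: dict, field: str) -> bool:
    """Toy model of 'exists': the field holds at least one non-null value."""
    value = doc.get(field)
    if value is None:
        return False
    if isinstance(value, list):
        # An empty array, or an array of only nulls, holds no values.
        return any(v is not None for v in value)
    return True

def field_missing(doc: dict, field: str) -> bool:
    """Toy model of 'missing': the exact complement of field_exists."""
    return not field_exists(doc, field)

samples = [{"title": "Elasticsearch"}, {"title": None}, {"title": []}, {"title": [None, "x"]}, {}]
print([field_exists(d, "title") for d in samples])  # [True, False, False, True, False]
```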
The bool
filter is used to combine multiple filter clauses using
Boolean logic. It accepts three parameters:
must
These clauses must match, like AND.
must_not
These clauses must not match, like NOT.
should
At least one of these clauses must match, like OR.
Each of these parameters can accept a single filter clause or an array of filter clauses:
    {
        "bool": {
            "must":     { "term": { "folder": "inbox" }},
            "must_not": { "term": { "tag":    "spam"  }},
            "should": [
                        { "term": { "starred": true   }},
                        { "term": { "unread":  true   }}
            ]
        }
    }
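The AND/NOT/OR semantics of the bool filter can be checked against in-memory documents with a few lines of Python. This sketch handles only term clauses, uses hypothetical email documents, and treats a term on an array field as matching if any element equals the value. It models the yes/no logic only, not how Elasticsearch executes filters.

```python
def term_matches(doc: dict, clause: dict) -> bool:
    # Only the {"term": {field: value}} shape is handled in this sketch.
    (field, value), = clause["term"].items()
    field_value = doc.get(field)
    if isinstance(field_value, list):
        return value in field_value
    return field_value == value

def as_list(clauses):
    # Each bool parameter accepts a single clause or an array of clauses.
    if clauses is None:
        return []
    return clauses if isinstance(clauses, list) else [clauses]

def bool_filter_matches(doc: dict, bool_filter: dict) -> bool:
    spec = bool_filter["bool"]
    if not all(term_matches(doc, c) for c in as_list(spec.get("must"))):   # AND
        return False
    if any(term_matches(doc, c) for c in as_list(spec.get("must_not"))):   # NOT
        return False
    should = as_list(spec.get("should"))
    return not should or any(term_matches(doc, c) for c in should)         # OR

inbox_filter = {"bool": {
    "must": {"term": {"folder": "inbox"}},
    "must_not": {"term": {"tag": "spam"}},
    "should": [{"term": {"starred": True}}, {"term": {"unread": True}}],
}}
emails = [
    {"folder": "inbox", "tag": [], "starred": True, "unread": False},
    {"folder": "inbox", "tag": ["spam"], "starred": True, "unread": True},
    {"folder": "inbox", "tag": [], "starred": False, "unread": False},
    {"folder": "sent", "tag": [], "starred": True, "unread": True},
]
print([bool_filter_matches(e, inbox_filter) for e in emails])  # [True, False, False, False]
```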
The match_all
query simply matches all documents. It is the default
query that is used if no query has been specified:
    { "match_all": {}}
This query is frequently used in combination with a filter (for instance, to
retrieve all emails in the inbox folder). All documents are considered to be
equally relevant, so they all receive a neutral _score of 1.
The match
query should be the standard query that you reach for whenever
you want to query for a full-text or exact value in almost any field.
If you run a match
query against a full-text field, it will analyze
the query string by using the correct analyzer for that field before executing
the search:
    { "match": { "tweet": "About Search" }}
If you use it on a field containing an exact value, such as a number, a date,
a Boolean, or a not_analyzed
string field, then it will search for that
exact value:
    { "match": { "age":    26           }}
    { "match": { "date":   "2014-09-01" }}
    { "match": { "public": true         }}
    { "match": { "tag":    "full_text"  }}
Unlike the query-string search that we showed in “Search Lite”, the match
query does not use a query syntax like +user_id:2 +tweet:search. It just
looks for the words that are specified. This means that it is safe to expose
to your users via a search field; you control what fields they can query, and
it is not prone to throwing syntax errors.
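In application code, this typically means that the user's text is placed verbatim into the match clause as a JSON string value, while the field comes from a list the application controls. The Python sketch below illustrates this. SEARCHABLE_FIELDS and user_search are hypothetical names, and json.dumps takes care of escaping. Characters such as + and : remain part of the text and are not interpreted as operators.

```python
import json

# Hypothetical allowlist: the application, not the user, decides which fields are searchable.
SEARCHABLE_FIELDS = {"tweet", "name"}

def user_search(field: str, user_text: str) -> str:
    """Return a request body that passes raw user text to a match query."""
    if field not in SEARCHABLE_FIELDS:
        raise ValueError(f"field {field!r} is not searchable")
    return json.dumps({"query": {"match": {field: user_text}}})

print(user_search("tweet", "+user_id:2 +tweet:search"))
# {"query": {"match": {"tweet": "+user_id:2 +tweet:search"}}}
```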
The bool
query, like the bool
filter, is used to combine multiple
query clauses. However, there are some differences. Remember that while
filters give binary yes/no answers, queries calculate a relevance score
instead. The bool
query combines the _score
from each must
or
should
clause that matches. This query accepts the following parameters:
must
Clauses that must match for the document to be included.
must_not
Clauses that must not match for the document to be included.
should
If these clauses match, they increase the _score; otherwise, they have no effect. They are simply used to refine the relevance score for each document.
The following query finds documents whose title field matches
the query string how to make millions and that are not marked
as spam. If any documents are starred or are from 2014 onward,
they will rank higher than they would have otherwise. Documents that
match both conditions will rank even higher:
    {
        "bool": {
            "must":     { "match": { "title": "how to make millions" }},
            "must_not": { "match": { "tag":   "spam" }},
            "should": [
                { "match": { "tag": "starred" }},
                { "range": { "date": { "gte": "2014-01-01" }}}
            ]
        }
    }
If there are no must clauses, at least one should clause has to
match. However, if there is at least one must clause, no should clauses
are required to match.
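The inclusion rules for the bool query, including the special case for should clauses, fit into a small decision function. In this toy Python model, each argument is a list with one entry per clause for a single document: a float score if the clause matched, or None if it did not. The combined score is simply the sum of the matching must and should scores. This is a deliberate simplification, and Elasticsearch's actual scoring formula is more involved.

```python
def bool_query_score(must, must_not, should):
    """Return a toy combined score, or None if the document is excluded."""
    if any(s is None for s in must):
        return None  # every must clause has to match
    if any(s is not None for s in must_not):
        return None  # no must_not clause may match
    matched_should = [s for s in should if s is not None]
    if not must and should and not matched_should:
        return None  # no must clauses: at least one should clause must match
    # Simplification: matching should clauses just add to the score.
    return sum(must) + sum(matched_should)

print(bool_query_score(must=[2.0], must_not=[None], should=[None, None]))  # 2.0
print(bool_query_score(must=[2.0], must_not=[None], should=[0.5, 1.0]))    # 3.5
print(bool_query_score(must=[], must_not=[], should=[None, None]))         # None
```

The first two calls show that should clauses are optional when a must clause matches, but they raise the score when they also match.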
Queries can be used in query context, and filters can be used
in filter context. Throughout the Elasticsearch API, you will see parameters
with query
or filter
in the name. These
expect a single argument containing either a single query or filter clause
respectively. In other words, they establish the
outer context as query context or filter context.
Compound query clauses can wrap other query clauses, and compound filter clauses can wrap other filter clauses. However, it is often useful to apply a filter to a query or, less frequently, to use a full-text query as a filter.
To do this, there are dedicated query clauses that wrap filter clauses, and vice versa, thus allowing us to switch from one context to another. It is important to choose the correct combination of query and filter clauses to achieve your goal in the most efficient way.
Say that we have this query:
    { "match": { "email": "business opportunity" }}
We want to combine it with the following term
filter, which will
match only documents that are in our inbox:
    { "term": { "folder": "inbox" }}
The search
API accepts only a single query
parameter, so we need
to wrap the query and the filter in another query, called the filtered
query:
    {
        "filtered": {
            "query":  { "match": { "email": "business opportunity" }},
            "filter": { "term":  { "folder": "inbox" }}
        }
    }
We can now pass this query to the query
parameter of the search
API:
    GET /_search
    {
        "query": {
            "filtered": {
                "query":  { "match": { "email": "business opportunity" }},
                "filter": { "term":  { "folder": "inbox" }}
            }
        }
    }
While in query context, if you need to use a filter without a query (for instance, to match all emails in the inbox), you can just omit the query:
    GET /_search
    {
        "query": {
            "filtered": {
                "filter": { "term": { "folder": "inbox" }}
            }
        }
    }
If a query is not specified, it defaults to using the match_all query, so
the preceding query is equivalent to the following:
    GET /_search
    {
        "query": {
            "filtered": {
                "query":  { "match_all": {}},
                "filter": { "term": { "folder": "inbox" }}
            }
        }
    }
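This equivalence can also be expressed as a normalization step: fill in the implicit default and compare. The following Python sketch builds the filtered query in the form used in this section. filtered and with_default_query are hypothetical helpers that only manipulate dictionaries.

```python
def filtered(filter_clause: dict, query: dict = None) -> dict:
    """Build a filtered query; the "query" key is omitted when none is given."""
    body = {"filter": filter_clause}
    if query is not None:
        body["query"] = query
    return {"filtered": body}

def with_default_query(filtered_query: dict) -> dict:
    # Make the implicit default explicit: a missing "query" means match_all.
    inner = dict(filtered_query["filtered"])
    inner.setdefault("query", {"match_all": {}})
    return {"filtered": inner}

inbox = {"term": {"folder": "inbox"}}
implicit = filtered(inbox)
explicit = filtered(inbox, query={"match_all": {}})
print(with_default_query(implicit) == explicit)  # True
```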
Occasionally, you will want to use a query while you are in filter context.
This can be achieved with the query
filter, which just wraps a query. The following
example shows one way we could exclude emails that look like spam:
    GET /_search
    {
        "query": {
            "filtered": {
                "filter": {
                    "bool": {
                        "must":     { "term": { "folder": "inbox" }},
                        "must_not": {
                            "query": {
                                "match": { "email": "urgent business proposal" }
                            }
                        }
                    }
                }
            }
        }
    }
Queries can become quite complex and, especially when combined with
different analyzers and field mappings, can become a bit difficult to follow.
The validate-query
API can be used to check whether a query is valid.
    GET /gb/tweet/_validate/query
    {
       "query": {
          "tweet" : {
             "match" : "really powerful"
          }
       }
    }
The response to the preceding validate
request tells us that the query is
invalid:
    {
      "valid" :         false,
      "_shards" : {
        "total" :       1,
        "successful" :  1,
        "failed" :      0
      }
    }
To find out why it is invalid, add the explain
parameter to the query
string:
    GET /gb/tweet/_validate/query?explain
    {
       "query": {
          "tweet" : {
             "match" : "really powerful"
          }
       }
    }
Apparently, we’ve mixed up the type of query (match) with the name
of the field (tweet):
    {
      "valid" :     false,
      "_shards" :   { ... },
      "explanations" : [ {
        "index" :   "gb",
        "valid" :   false,
        "error" :   "org.elasticsearch.index.query.QueryParsingException: [gb] No query registered for [tweet]"
      } ]
    }
Using the explain
parameter has the added advantage of returning
a human-readable description of the (valid) query, which can be useful for
understanding exactly how your query has been interpreted by Elasticsearch:
    GET /_validate/query?explain
    {
       "query": {
          "match" : {
             "tweet" : "really powerful"
          }
       }
    }
An explanation
is returned for each index that we query, because each
index can have different mappings and analyzers:
    {
      "valid" :         true,
      "_shards" :       { ... },
      "explanations" : [ {
        "index" :       "us",
        "valid" :       true,
        "explanation" : "tweet:really tweet:powerful"
      }, {
        "index" :       "gb",
        "valid" :       true,
        "explanation" : "tweet:realli tweet:power"
      } ]
    }
From the explanation, you can see how the match query for the query string
really powerful has been rewritten as two single-term queries against
the tweet field, one for each term.
Also, for the us index, the two terms are really and powerful, while
for the gb index, the terms are realli and power. The reason
for this is that we changed the tweet field in the gb index to use the
english analyzer.
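When comparing analyzers across indices, it can be convenient to extract the per-index terms from a validate response programmatically. The following Python sketch parses the response shown above, with the elided _shards details omitted. It assumes that each explanation is a space-separated list of field:term tokens. That holds for the simple term queries in this example but not for more complex queries, so terms_by_index is a hypothetical helper rather than a general parser.

```python
import json

# The validate response from this section, with the elided "_shards" details omitted.
response_text = """
{
  "valid": true,
  "explanations": [
    {"index": "us", "valid": true, "explanation": "tweet:really tweet:powerful"},
    {"index": "gb", "valid": true, "explanation": "tweet:realli tweet:power"}
  ]
}
"""

def terms_by_index(response: dict, field: str) -> dict:
    """Map each index to the terms that its explanation lists for one field."""
    prefix = field + ":"
    result = {}
    for entry in response.get("explanations", []):
        if not entry.get("valid"):
            continue  # invalid entries carry an "error" instead of an explanation
        tokens = entry.get("explanation", "").split()
        result[entry["index"]] = [t[len(prefix):] for t in tokens if t.startswith(prefix)]
    return result

print(terms_by_index(json.loads(response_text), "tweet"))
# {'us': ['really', 'powerful'], 'gb': ['realli', 'power']}
```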