In the previous recipes, we have seen several types of queries that use text to match the results. The query string query is a special type of query that allows us to define complex queries by mixing field rules. It uses the Lucene query parser to parse text into complex queries.
You need an up-and-running Elasticsearch installation, as described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.
To execute curl via the command line, you need to install curl for your operating system.
To correctly execute the following commands, you need an index populated with the chapter_05/populate_query.sh script available in the online code.
To execute a query_string query, we will perform the following steps:
We want to search for the text nice guy, discarding documents whose parsedtext contains the term not and keeping only those with a price less than 5. The query will be:
curl -XPOST 'http://127.0.0.1:9200/test-index/test-type/_search?pretty=true' -d '{
  "query": {
    "query_string": {
      "query": "\"nice guy\" -parsedtext:not price:{ * TO 5 }",
      "fields": [ "parsedtext^5" ],
      "default_operator": "and"
    }
  }
}'
The result returned by Elasticsearch, if everything works correctly, should be similar to:
{
  "took" : 17,
  "timed_out" : false,
  "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 },
  "hits" : {
    "total" : 1,
    "max_score" : 3.8768208,
    "hits" : [ {
      "_index" : "test-index",
      "_type" : "test-type",
      "_id" : "1",
      "_score" : 3.8768208,
      "_source" : {
        "position" : 1,
        "parsedtext" : "Joe Testere nice guy",
        "name" : "Joe Tester",
        "uuid" : "11111",
        "price" : 4.0
      }
    } ]
  }
}
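The request body used in this recipe can also be assembled programmatically, which avoids escaping the inner double quotes around the phrase by hand. The following is a minimal sketch in Python: the helper name build_query_string_body is hypothetical, only the standard json module is used, and no request is actually sent to Elasticsearch.

```python
import json


def build_query_string_body(query, fields=None, default_operator="or"):
    """Return a search body containing a single query_string query.

    Hypothetical helper for illustration only: it assembles the JSON
    structure shown in the recipe and does not contact Elasticsearch.
    """
    query_string = {"query": query, "default_operator": default_operator}
    if fields:
        query_string["fields"] = list(fields)
    return {"query": {"query_string": query_string}}


body = build_query_string_body(
    '"nice guy" -parsedtext:not price:{ * TO 5 }',
    fields=["parsedtext^5"],
    default_operator="and",
)

# json.dumps escapes the inner double quotes, so the phrase is
# serialized as \"nice guy\" inside the JSON string.
payload = json.dumps(body)
print(payload)
```

The resulting payload string can be passed to curl with the -d option or sent with any HTTP client.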
The query_string query is one of the most powerful types of queries. The only required field is query, which contains the query that must be parsed with the Lucene query parser (for more information, refer to http://lucene.apache.org/core/6_2_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html).
The Lucene query parser is able to analyze a complex query syntax and convert it into many of the query types that we have seen in the previous recipes.
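To give an idea of this conversion, the query string used in this recipe can be thought of as roughly equivalent to a bool query written by hand. The following Python sketch is an approximation under stated assumptions, not the parser's actual output: the clause types chosen here (match_phrase, match, and range) and the variable name approx_bool_body are assumptions made for illustration, and scoring details may differ from what the parser produces.

```python
import json

# Hand-written approximation (an assumption, not the parser's real output)
# of: "nice guy" -parsedtext:not price:{ * TO 5 }
# with fields ["parsedtext^5"] and default_operator "and".
approx_bool_body = {
    "query": {
        "bool": {
            # With the AND operator, every positive clause is required.
            "must": [
                # The quoted phrase, searched in parsedtext with boost 5.
                {"match_phrase": {"parsedtext": {"query": "nice guy", "boost": 5}}},
                # {* TO 5} is an exclusive range with an open lower bound.
                {"range": {"price": {"lt": 5}}},
            ],
            # The leading minus sign excludes documents matching the clause.
            "must_not": [{"match": {"parsedtext": "not"}}],
        }
    }
}

print(json.dumps(approx_bool_body, indent=2))
```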
The optional parameters that can be passed to the query string query are:
default_field: This defines the default field to be used for the query. It can also be set at index level by defining the index property index.query.default_field (default _all).
fields: This defines a list of fields to be used. It replaces default_field. The fields parameter also allows wildcards as values (for example, city.*).
default_operator: This is the default operator to be used for text in the query parameter (default OR; the available values are AND and OR).
analyzer: This is the analyzer that must be used for the query string.
allow_leading_wildcard: This controls whether the * and ? wildcards are allowed as the first character. Using such wildcards incurs a performance penalty (default true).
lowercase_expanded_terms: This controls whether all expansion terms (generated by fuzzy, range, wildcard, and prefix) must be lowercased (default true).
enable_position_increments: This enables position increments in queries. For every query token, the positional value is incremented by 1 (default true).
fuzzy_max_expansions: This controls the number of terms to be used in fuzzy term expansion (default 50).
fuzziness: This sets the fuzziness value for fuzzy queries (default AUTO).
fuzzy_prefix_length: This sets the prefix length for fuzzy queries (default 0).
phrase_slop: This sets the default slop (the number of optional terms that can be present in the middle of the given terms) for phrases. If it is set to zero, the query is an exact phrase match (default 0).
boost: This defines the boost value of the query (default 1.0).
analyze_wildcard: This enables the processing of wildcard terms in the query (default false).
auto_generate_phrase_queries: This enables the autogeneration of phrase queries from the query string (default false).
minimum_should_match: This controls how many should clauses must match for a document to be returned. The value can be an integer (for example, 3), a percentage (for example, 40%), or a combination of both (default 1).
lenient: If it's set to true, the parser will ignore all format-based failures, such as text to number or date conversion (default false).
locale: This is the locale used for string conversion (default ROOT).
The query parser is powerful enough to support a wide range of complex queries. The most common cases are:
field:text: This is used to match a field that contains some text. It's mapped on a term query.
field:(term1 OR term2): This is used to match some terms in OR. It's mapped on a terms query.
field:"text": This is used to match the exact text. It's mapped on a match query.
_exists_:field: This is used to match documents that have a field. It's mapped on an exists filter.
_missing_:field: This is used to match documents that don't have a field. It's mapped on a missing filter.
field:[start TO end]: This is used to match a range from the start value to the end value. The start and end values can be terms, numbers, or valid datetime values. The start and end values are included in the range; to exclude them, replace the [] delimiters with {}.
field:/regex/: This is used to match a regular expression.
The query parser also supports text modifiers, which are used to manipulate the text functionalities. The most used ones are:
Fuzziness using the form text~. The default fuzziness value is 2, which allows a Damerau-Levenshtein edit distance (http://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance) of 2.
Wildcards using ?, which replaces a single character, or *, which replaces zero or more characters (for example, b?ll or bi* to match bill).
Proximity search using the form "term1 term2"~3, which allows matching phrase terms with a defined slop (for example, "my umbrella"~3 matches "my green umbrella", "my new umbrella", and so on).
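The range syntax and the text modifiers described in this recipe are plain string conventions, so they can be composed with small helpers before being placed in the query parameter. The following Python sketch is illustrative only: the function names range_clause, fuzzy, and proximity are hypothetical, and the fnmatchcase call from the standard library merely mimics the ? and * semantics locally; it is not how Lucene evaluates wildcards.

```python
from fnmatch import fnmatchcase


def range_clause(field, start="*", end="*", inclusive=True):
    """Build field:[start TO end] (inclusive) or field:{start TO end} (exclusive)."""
    left, right = ("[", "]") if inclusive else ("{", "}")
    return f"{field}:{left}{start} TO {end}{right}"


def fuzzy(term, distance=None):
    """Build term~ (default fuzziness) or term~N (explicit edit distance)."""
    return f"{term}~" if distance is None else f"{term}~{distance}"


def proximity(phrase, slop):
    """Build "phrase"~slop, a phrase match that tolerates slop extra positions."""
    return f'"{phrase}"~{slop}'


print(range_clause("price", end=5, inclusive=False))  # price:{* TO 5}
print(fuzzy("bill"))                                   # bill~
print(proximity("my umbrella", 3))                     # "my umbrella"~3

# Local illustration of the wildcard semantics: ? stands for exactly one
# character and * for zero or more characters.
print(fnmatchcase("bill", "b?ll"), fnmatchcase("bill", "bi*"))  # True True
```

The strings returned by these helpers can be joined with spaces and used as the value of the query parameter of a query_string query.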