Using a query string query

In the previous recipes, we have seen several types of queries that use text to match the results. The query string query is a special type of query that allows you to define complex queries by mixing field rules.

It uses the Lucene query parser to parse text into complex queries.

Getting ready

You need an up-and-running Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.

To execute curl via the command line, you need to install curl for your operating system.

To correctly execute the following commands, you need an index populated with the chapter_05/populate_query.sh script available in the online code.

How to do it...

To execute a query_string query, we will perform the following steps:

  1. We want to search for the phrase nice guy, excluding documents that contain the term not and keeping only those with a price less than 5. The query will be:
            curl -XPOST 'http://127.0.0.1:9200/test-index/test-type/_search?pretty=true' -d '{
              "query": {
                "query_string": {
                  "query": "\"nice guy\" -parsedtext:not price:{ * TO 5 } ",
                  "fields": [
                    "parsedtext^5"
                  ],
                  "default_operator": "and"
                }
              }
            }'
    
  2. The result returned by Elasticsearch, if everything is alright, should be:
            {
              "took" : 17,
              "timed_out" : false,
              "_shards" : {
                "total" : 5,
                "successful" : 5,
                "failed" : 0
              },  
              "hits" : {
                "total" : 1,
                "max_score" : 3.8768208,
                "hits" : [
                  {
                    "_index" : "test-index",
                    "_type" : "test-type",
                    "_id" : "1",
                    "_score" : 3.8768208,
                    "_source" : {
                      "position" : 1,
                      "parsedtext" : "Joe Testere nice guy",
                      "name" : "Joe Tester",
                      "uuid" : "11111",
                      "price" : 4.0
                    }
                  }
                ]
              }
            }
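
The request body from step 1 can also be built programmatically, which avoids hand-escaping the double quotes around the phrase. The following is a minimal sketch in Python using only the standard library; the variable names are illustrative and not part of the recipe's code. The printed JSON is what would be passed to the -d option of curl:

```python
import json

# Query string from step 1: the phrase "nice guy", excluding documents whose
# parsedtext contains "not", restricted to a price below 5 (exclusive bound).
query_text = '"nice guy" -parsedtext:not price:{* TO 5}'

body = {
    "query": {
        "query_string": {
            "query": query_text,
            "fields": ["parsedtext^5"],  # search parsedtext, boosted by 5
            "default_operator": "and",
        }
    }
}

# json.dumps escapes the inner double quotes as \" so the payload is valid JSON.
payload = json.dumps(body, indent=2)
print(payload)
```

Serializing a dictionary keeps the Lucene syntax readable in the source while leaving the JSON escaping to the library.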
    

How it works...

The query_string query is one of the most powerful types of queries. The only required field is query, which contains the query that must be parsed with the Lucene query parser. For more information, refer to http://lucene.apache.org/core/6_2_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html.

The Lucene query parser is able to analyze complex query syntax and convert it into many of the query types that we have seen in the previous recipes.

The optional parameters that can be passed to the query string query are:

  • default_field: This defines the default field to be used for the query. It can also be set at index level by defining the index property index.query.default_field (default _all).
  • fields: This defines a list of fields to be used. It replaces the default_field. The fields parameter also allows using wildcards as values (for example, city.*).
  • default_operator: This is the default operator to be used for text in the query parameter (default OR; the available values are AND and OR).
  • analyzer: This is the analyzer that must be used for the query string.
  • allow_leading_wildcard: This controls whether the * and ? wildcards are allowed as first characters. Leading wildcards incur performance penalties (default true).
  • lowercase_expanded_terms: This controls if all expansion terms (generated by fuzzy, range, wildcard, and prefix) must be lowercased (default true).
  • enable_position_increments: This enables the position increment in queries. For every query token, the positional value is incremented by 1 (default true).
  • fuzzy_max_expansions: This controls the number of terms to be used in fuzzy term expansion (default 50).
  • fuzziness: This sets the fuzziness value for fuzzy queries (default AUTO).
  • fuzzy_prefix_length: This sets the prefix length for fuzzy queries (default 0).
  • phrase_slop: This sets the default slop (the number of optional terms that can be present in the middle of the given terms) for phrases. If it is set to zero, the query is an exact phrase match (default 0).
  • boost: This defines the boost value of the query (default 1.0).
  • analyze_wildcard: This enables the processing of wildcard terms in the query (default false).
  • auto_generate_phrase_queries: This enables the autogeneration of phrase queries from the query string (default false).
  • minimum_should_match: This controls how many should clauses must match for a document to be returned. The value can be an integer (for example, 3), a percentage (for example, 40%), or a combination of both (default 1).
  • lenient: If it's set to true, the parser will ignore all format-based failures (such as text-to-number or date conversion) (default false).
  • locale: This is the locale used for string conversion (default ROOT).
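
To show how these optional parameters sit next to the required query field, here is a small sketch in Python. The helper build_query_string and the KNOWN_OPTIONS set are hypothetical names introduced for illustration; the option names themselves are taken from the preceding list, and the name field comes from the sample document shown earlier:

```python
import json

# Option names copied from the parameter list; the helper is hypothetical.
KNOWN_OPTIONS = {
    "default_field", "fields", "default_operator", "analyzer",
    "allow_leading_wildcard", "lowercase_expanded_terms",
    "enable_position_increments", "fuzzy_max_expansions", "fuzziness",
    "fuzzy_prefix_length", "phrase_slop", "boost", "analyze_wildcard",
    "auto_generate_phrase_queries", "minimum_should_match", "lenient",
    "locale",
}


def build_query_string(query, **options):
    """Return a search body holding a query_string query with optional parameters."""
    unknown = set(options) - KNOWN_OPTIONS
    if unknown:
        raise ValueError("unknown query_string options: %s" % sorted(unknown))
    clause = {"query": query}  # the only required field
    clause.update(options)
    return {"query": {"query_string": clause}}


body = build_query_string(
    "nice guy",
    fields=["parsedtext^5", "name"],  # replaces default_field
    default_operator="and",           # every term must match
    phrase_slop=1,                    # tolerate one extra term inside phrases
    lenient=True,                     # ignore format-based failures
)
print(json.dumps(body, indent=2))
```

Rejecting unknown option names on the client side is a design choice of this sketch; it catches misspelled parameters before the request is sent.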

There's more...

The query parser is powerful enough to support a wide range of complex queries. The most common cases are:

  • field:text: This is used to match a field that contains some text. It's mapped on a term query.
  • field:(term1 OR term2): This is used to match some terms in OR. It's mapped on a terms query.
  • field:"text": This is used to match the exact text as a phrase. It's mapped on a match phrase query.
  • _exists_:field: This is used to match documents that have a field. It's mapped on an exists filter.
  • _missing_:field: This is used to match documents that don't have a field. It's mapped on a missing filter.
  • field:[start TO end]: This is used to match a range from the start value to the end value. The start and end values could be terms, numbers, or a valid datetime value. The start and end values are included in the range; to exclude the boundary values, replace the [] delimiters with {}.
  • field:/regex/: This is used to match a regular expression.
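
The forms listed above can be written out as plain query strings. The following sketch in Python uses the field names of the sample document; range_clause is a hypothetical helper that only formats the range syntax, choosing [] for inclusive bounds and {} for exclusive ones:

```python
def range_clause(field, start="*", end="*", inclusive=True):
    """Format field:[start TO end] (inclusive) or field:{start TO end} (exclusive)."""
    left, right = ("[", "]") if inclusive else ("{", "}")
    return "%s:%s%s TO %s%s" % (field, left, start, end, right)


examples = [
    "parsedtext:nice",                              # field:text
    "parsedtext:(nice OR guy)",                     # field:(term1 OR term2)
    'parsedtext:"nice guy"',                        # field:"text"
    "_exists_:price",                               # documents that have a price
    range_clause("price", 1, 5),                    # bounds included
    range_clause("price", end=5, inclusive=False),  # open start, bounds excluded
    "name:/jo.*/",                                  # field:/regex/
]

for example in examples:
    print(example)
```

Any of these strings can be used as the value of the query field in a query_string query.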

The query parser also supports text modifiers, which change how the text is matched. The most used ones are:

  • Fuzziness, using the form text~. The default fuzziness value is 2, which allows matches within a Damerau-Levenshtein edit distance (http://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance) of 2.
  • Wildcards, with ? replacing a single character and * replacing zero or more characters (for example, b?ll or bi* to match bill).
  • Proximity search, using the form "term1 term2"~3, which allows matching phrase terms within a defined slop (for example, "my umbrella"~3 matches "my green umbrella", "my new umbrella", and so on).
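
To make the fuzziness and wildcard semantics concrete, the following Python sketch reproduces them outside Elasticsearch. It is an illustration only and not how Lucene implements these features: osa_distance is a plain implementation of the restricted Damerau-Levenshtein (optimal string alignment) distance, and fnmatchcase from the standard library is used as a stand-in for the ? and * wildcards. All function names are hypothetical:

```python
from fnmatch import fnmatchcase


def osa_distance(a, b):
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete every character of a
    for j in range(cols):
        d[0][j] = j  # insert every character of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent characters counts as one edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def fuzzy_matches(term, candidate, max_edits=2):
    """Mimic text~: accept candidates within max_edits edits of term."""
    return osa_distance(term, candidate) <= max_edits


def wildcard_matches(pattern, term):
    """Mimic ? (one character) and * (zero or more characters)."""
    return fnmatchcase(term, pattern)


# Query strings using the three modifiers described above.
modifier_examples = ["bill~", "b?ll", "bi*", '"my umbrella"~3']

print(fuzzy_matches("bill", "blil"))
print(wildcard_matches("b?ll", "bill"), wildcard_matches("bi*", "bill"))
```

Here the swapped letters in blil count as a single edit, which is what distinguishes the Damerau variant from plain Levenshtein distance.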

See also
