The match keyword

There are four kinds of queries under the match keyword:

  • match: The query string is tokenized, each token is matched against the text field being searched, and a Boolean operator, and or or (default = or), combines the individual matches to compute the final result. Let's look at an example that uses the default operator, or, to find the ETFs whose fund_name field contains either the word ishares or the word global. The following screenshot shows that there are 75 hits in total:

Another example uses the operator parameter with the and option to find the ETFs whose fund_name field contains both words. The following screenshot shows that there are 4 hits in total:
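As a minimal sketch, the two queries described above can be written as request bodies for the _search endpoint. The bodies are built here as Python dictionaries and printed as JSON. The helper name match_query is hypothetical, and the query text "ishares global" is inferred from the description rather than copied from the screenshots. The hit counts quoted above depend on the book's data set and are not reproduced by this code.

```python
import json


def match_query(field, text, operator="or"):
    """Return a search request body with a match query on one field."""
    return {"query": {"match": {field: {"query": text, "operator": operator}}}}


# Default operator (or): fund_name contains ishares OR global.
or_body = match_query("fund_name", "ishares global")

# operator=and: fund_name must contain BOTH ishares and global.
and_body = match_query("fund_name", "ishares global", operator="and")

print(json.dumps(or_body, indent=2))
print(json.dumps(and_body, indent=2))
```

Either body can be sent as the JSON payload of a search request against the index that holds the ETF documents.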

Elasticsearch provides fuzzy matching for match queries through the fuzziness parameter. Its value is the maximum Levenshtein edit distance allowed for a token. It can be given as a number, or as AUTO to let Elasticsearch choose the distance based on the length of each token.

Let's look at an example where we have fuzziness=2 (this permits up to two edits per token). In the following screenshot, you can see 5 hits, which is one hit more than the result without fuzziness. The extra ETF result, fund_name=iShares J.P. Morgan EM Local Currency Bond ETF, comes from the fuzzy matching because the word global can be changed to local with two edits (delete the g, then replace b with c):
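The two-edit claim can be checked with a short sketch. The levenshtein function below is a plain dynamic-programming implementation written for illustration, not Elasticsearch's own code, and the request body assumes the fuzzy query is the earlier and query with fuzziness added (an inference from the hit counts, not a copy of the screenshot).

```python
import json


def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein distance: insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


# Assumed request body: the "and" query from before, with fuzziness=2.
fuzzy_body = {
    "query": {
        "match": {
            "fund_name": {
                "query": "ishares global",
                "operator": "and",
                "fuzziness": 2,
            }
        }
    }
}

print(levenshtein("global", "local"))
print(json.dumps(fuzzy_body, indent=2))
```

Because the distance between global and local is within the permitted two edits, a fund name containing local can satisfy the global token.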

Another parameter is zero_terms_query. When the analyzer removes every token from the query string (for example, when the query consists only of stop words), no hits are returned by default. You can use the all option to return all documents instead of none. Another useful parameter is cutoff_frequency, which ranks the importance of a token by how often it occurs. Unless a special tokenizer or token filter is applied, most of the high-frequency tokens generated are stop words.

Consequently, high-frequency tokens are treated as less important and low-frequency tokens as more important. Low-frequency tokens determine which documents match the search criteria, whereas high-frequency tokens only add to the score of documents that already match. When you specify a fraction below 1 for cutoff_frequency, it is the ratio of the token's occurrences to the total number of documents in the index. When you specify a number ≥ 1, it is an absolute number of occurrences. Let's look at an example that uses cutoff_frequency with an absolute value.

We set cutoff_frequency to 10, so only tokens that occur fewer than 10 times are treated as the more important, low-frequency tokens. The query matches the fund_name field against the words usa, global, or emerging.

In the response, there are six documents whose fund_name field contains the word usa. None of the results come from the words global and emerging, since both words occur with a high frequency and are treated as being less important.

The following screenshot shows what happens when the earlier steps are implemented:
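The cutoff_frequency example can be sketched as a request body plus a simplified model of how tokens are split into two groups. The function split_by_cutoff and the document frequencies for global and emerging are hypothetical, chosen only for illustration; the frequency of 6 for usa follows the six hits mentioned above. Whether the boundary comparison is strict is an assumption of this sketch. Note that cutoff_frequency was deprecated in Elasticsearch 7.3 and removed in 8.0, so the body applies to earlier versions.

```python
import json

# Request body: match usa, global, or emerging in fund_name, with an
# absolute cutoff of 10 occurrences.
cutoff_body = {
    "query": {
        "match": {
            "fund_name": {
                "query": "usa global emerging",
                "cutoff_frequency": 10,
            }
        }
    }
}


def split_by_cutoff(doc_freq, cutoff, total_docs):
    """Split tokens into (low_frequency, high_frequency) lists.

    Simplified model: a cutoff >= 1 is an absolute count; a cutoff
    below 1 is a fraction of total_docs. Treating a token exactly at
    the threshold as low-frequency is an assumption of this sketch.
    """
    threshold = cutoff if cutoff >= 1 else cutoff * total_docs
    low = sorted(t for t, n in doc_freq.items() if n <= threshold)
    high = sorted(t for t, n in doc_freq.items() if n > threshold)
    return low, high


# Hypothetical document frequencies (only usa=6 is taken from the text).
freqs = {"usa": 6, "global": 40, "emerging": 35}
low, high = split_by_cutoff(freqs, cutoff=10, total_docs=300)

print(json.dumps(cutoff_body, indent=2))
print("low-frequency:", low)
print("high-frequency:", high)
```

Under these assumed frequencies, only usa falls in the low-frequency group, which is consistent with the response described above: documents are selected by usa alone.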

  • match_phrase: The query string is analyzed into tokens, which are then matched as a phrase, in the same order, against the text field being searched. Let's look at an example that uses match_phrase to retrieve all iShares msci ETFs. In the following screenshot, 18 ETFs have a name with that phrase. The match_phrase query supports a slop parameter that specifies how large a gap is tolerated between the matched terms of the query string. The gap is the number of words in the text that can be skipped between those terms:
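A minimal sketch of the match_phrase request body, with and without slop, is shown here. The helper name match_phrase_query and the slop value of 1 are hypothetical; the phrase text is taken from the description above.

```python
import json


def match_phrase_query(field, phrase, slop=0):
    """Return a search request body with a match_phrase query."""
    return {"query": {"match_phrase": {field: {"query": phrase, "slop": slop}}}}


# Exact phrase: the terms must be adjacent and in order.
exact_body = match_phrase_query("fund_name", "iShares msci")

# slop=1 (illustrative value): tolerate one skipped word between the terms.
sloppy_body = match_phrase_query("fund_name", "iShares msci", slop=1)

print(json.dumps(exact_body, indent=2))
print(json.dumps(sloppy_body, indent=2))
```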

  • match_phrase_prefix: This is similar to the match_phrase query, except that the last term of the query string is treated as a prefix.
  • multi_match: This provides a way to match the query string against multiple fields. You can use a wildcard in the field names. Since there are multiple fields, you can use the type parameter to define how the matches from the individual fields are combined, as described in the following table:
Match type      Description
best_fields     Scores each document by its single best-matching field. This is the default.
most_fields     Combines the scores of all matching fields.
cross_fields    Treats the fields as one combined field. Each term may match in any field, and a blended term query blends the term statistics across the fields to compute the score.
phrase          Similar to best_fields, but runs a match_phrase query on each field.
phrase_prefix   Similar to best_fields, but runs a match_phrase_prefix query on each field.
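
The last two query kinds in the list can be sketched as request bodies as well. The helper name multi_match_query, the category field, the fund_* wildcard pattern, and the query strings are all hypothetical; only fund_name comes from the earlier examples. The helper accepts only the five match types listed in the table.

```python
import json

# The five match types listed in the table above.
MATCH_TYPES = {"best_fields", "most_fields", "cross_fields",
               "phrase", "phrase_prefix"}


def multi_match_query(text, fields, match_type="best_fields"):
    """Return a search request body with a multi_match query."""
    if match_type not in MATCH_TYPES:
        raise ValueError(f"unsupported match type: {match_type}")
    return {
        "query": {
            "multi_match": {
                "query": text,
                "fields": list(fields),
                "type": match_type,
            }
        }
    }


# match_phrase_prefix: the last term ("em") is matched as a prefix.
prefix_body = {
    "query": {"match_phrase_prefix": {"fund_name": {"query": "ishares msci em"}}}
}

# multi_match over two named fields, combining the per-field scores.
multi_body = multi_match_query("ishares global", ["fund_name", "category"],
                               "most_fields")

# multi_match with a wildcard in the field name and the default type.
wildcard_body = multi_match_query("ishares global", ["fund_*"])

for body in (prefix_body, multi_body, wildcard_body):
    print(json.dumps(body, indent=2))
```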