Fuzziness

With the fuzziness parameter, we can turn the match query into a fuzzy query. Fuzziness is based on the Levenshtein edit distance: the number of single-character edits needed to turn one term into another. An edit can be an insertion, a deletion, a substitution, or a transposition of two adjacent characters. The fuzziness parameter can take one of the following values: 0, 1, 2, or AUTO.

For example, the following query has a misspelled word, victor instead of victory. Because the fuzziness is 1 and victor is one edit away from victory, the query still finds all victory multimedia records:

GET /amazon_products/_search
{
  "query": {
    "match": {
      "manufacturer": {
        "query": "victor multimedia",
        "fuzziness": 1
      }
    }
  }
}

To tolerate more errors, increase the fuzziness to 2. For example, a fuzziness of 2 will even match victer, because victer is two edits away from victory (substitute o for e, then append y):

GET /amazon_products/_search
{
  "query": {
    "match": {
      "manufacturer": {
        "query": "victer multimedia",
        "fuzziness": 2
      }
    }
  }
}
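
The edit counts quoted for the two queries above can be checked with a small script. The following is a minimal sketch of an edit distance that counts insertions, deletions, substitutions, and transpositions of adjacent characters. It is only an illustration of the definition, not how Elasticsearch computes matches internally, and the function name edit_distance is an assumption made for this example.

```python
def edit_distance(a: str, b: str) -> int:
    """Edit distance where each insertion, deletion, substitution, or
    transposition of two adjacent characters costs one edit.
    (Illustrative helper; not part of any Elasticsearch client.)"""
    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] holds the distance between a[:i] and b[:j].
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # Transposition of two adjacent characters.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(edit_distance("victor", "victory"))   # one insertion
print(edit_distance("victer", "victory"))   # one substitution, one insertion
print(edit_distance("vitcory", "victory"))  # one transposition
```

A term in the index is a candidate match when its distance from the query term is no greater than the configured fuzziness.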

With the AUTO value, the numeric fuzziness (0, 1, or 2) is chosen automatically based on the length of each term. Terms of up to 2 characters get fuzziness = 0 (they must match exactly), terms of 3 to 5 characters get fuzziness = 1, and terms of more than 5 characters get fuzziness = 2.
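
The length thresholds described above can be written as a short function. This is a sketch of the rule as stated in the text, assuming the default thresholds; the name auto_fuzziness is hypothetical and is not an Elasticsearch API.

```python
def auto_fuzziness(term: str) -> int:
    """Return the fuzziness that AUTO would pick for a term,
    following the length thresholds described in the text."""
    length = len(term)
    if length <= 2:
        return 0  # must match exactly
    if length <= 5:
        return 1  # one edit allowed
    return 2      # two edits allowed


for term in ["tv", "sony", "victor", "multimedia"]:
    print(term, auto_fuzziness(term))
```

Because both victor and multimedia are longer than five characters, AUTO would allow up to two edits on each of them.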

Fuzziness comes at a cost because Elasticsearch has to generate extra terms to match against. To control the number of terms, the match query supports the following additional parameters:

  • max_expansions: The maximum number of terms that the query is allowed to expand to.
  • prefix_length: A number, such as 0, 1, 2, and so on. The first prefix_length characters of the term must match exactly; edits are applied only to the characters after this prefix.
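
As an illustration of how these parameters sit alongside fuzziness, the following sketch builds a match query body as a Python dictionary and prints it as JSON that could be pasted into the console. The helper name build_fuzzy_match and the values 1 and 10 are assumptions chosen for the example, not recommendations from the text.

```python
import json


def build_fuzzy_match(field, text, fuzziness="AUTO",
                      prefix_length=0, max_expansions=50):
    """Build the body of a fuzzy match query.
    (Hypothetical helper for illustration only.)"""
    return {
        "query": {
            "match": {
                field: {
                    "query": text,
                    "fuzziness": fuzziness,
                    # Leading characters that must match exactly.
                    "prefix_length": prefix_length,
                    # Upper bound on the number of generated terms.
                    "max_expansions": max_expansions,
                }
            }
        }
    }


body = build_fuzzy_match("manufacturer", "victer multimedia",
                         fuzziness=2, prefix_length=1, max_expansions=10)
print(json.dumps(body, indent=2))
```

With prefix_length set to 1, only terms starting with v are considered for victer, which reduces the number of candidate terms Elasticsearch has to examine.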