Edge n-gram tokenizer

This tokenizer generates n-gram tokens that are all anchored at the start of the input string. Like the n-gram tokenizer, the edge n-gram tokenizer does not treat white space as a delimiter, so white space characters are included in the generated tokens.

Factory class: solr.EdgeNGramTokenizerFactory

Arguments:

  • minGramSize (integer, default is 1): The minimum n-gram size
  • maxGramSize (integer, default is 1): The maximum n-gram size (0 < minGramSize <= maxGramSize)

Earlier versions of Solr supported a side argument (front or back; the default was front), which selected the end of the input string from which the n-grams were generated. This argument has been removed, and Solr now always generates tokens from the front of the input string.
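To make the front/back distinction concrete, here is a minimal Python sketch of what back-anchored n-grams look like. It is not Solr code. It assumes that back meant n-grams anchored at the end of the input, and the function name back_edge_ngrams and its parameter names are hypothetical.

```python
from typing import List


def back_edge_ngrams(text: str, min_gram: int = 1, max_gram: int = 1) -> List[str]:
    """Return the suffixes of text whose lengths lie in [min_gram, max_gram].

    Illustrative only: models the assumed behavior of the removed
    side="back" setting, not Solr's actual implementation.
    """
    if not 0 < min_gram <= max_gram:
        raise ValueError("expected 0 < min_gram <= max_gram")
    # A suffix can never be longer than the input itself.
    longest = min(max_gram, len(text))
    return [text[-size:] for size in range(min_gram, longest + 1)]


print(back_edge_ngrams("send me", min_gram=2, max_gram=10))
```

Every token in the result ends with the last character of the input, which is the mirror image of the front behavior that Solr now always uses.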

Example:

<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.EdgeNGramTokenizerFactory" minGramSize="2" maxGramSize="10"/>
</analyzer>
</fieldType>

Input: send me

Output: "se", "sen", "send", "send ", "send m", "send me"
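The tokens for this input can be reproduced outside Solr with a short Python sketch. It only mimics the behavior described in this section (prefixes of the whole input, white space included) and is not how Solr implements the tokenizer. The name edge_ngrams and its parameters are hypothetical stand-ins for minGramSize and maxGramSize.

```python
from typing import List


def edge_ngrams(text: str, min_gram: int = 1, max_gram: int = 1) -> List[str]:
    """Return the prefixes of text whose lengths lie in [min_gram, max_gram].

    White space is not treated as a delimiter, so it can appear in tokens.
    """
    # Mirrors the documented constraint 0 < minGramSize <= maxGramSize.
    if not 0 < min_gram <= max_gram:
        raise ValueError("expected 0 < min_gram <= max_gram")
    # A prefix can never be longer than the input itself.
    longest = min(max_gram, len(text))
    return [text[:size] for size in range(min_gram, longest + 1)]


tokens = edge_ngrams("send me", min_gram=2, max_gram=10)
print(tokens)  # ['se', 'sen', 'send', 'send ', 'send m', 'send me']
```

Note that the fourth token, "send ", ends with a space, and that no token is longer than the seven-character input even though the maximum size is 10.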

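The entire input string, white space included, is split into n-gram tokens that all begin at the first character, with lengths ranging from minGramSize (2) up to maxGramSize (10). Because the input is only seven characters long, the longest token here is the full string, send me. The edge n-gram tokenizer is useful when you need to match the first n characters of a string.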

For example, suppose the input string is Please send me a mail at [email protected] and we want to match Please send me a mail, but it will not match.
