This tokenizer generates n-gram tokens of every size in the configured range from the input string.
Factory class: solr.NGramTokenizerFactory
Arguments: minGramSize (integer, default 1): The minimum n-gram size.
maxGramSize (integer, default 2): The maximum n-gram size.
The sizes must satisfy 0 < minGramSize <= maxGramSize.
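The two size arguments determine how many tokens a string produces: a string of length L contains L - k + 1 grams of size k, and the tokenizer emits grams for every k from minGramSize to maxGramSize. The following is a minimal Python sketch of that arithmetic, plus the size constraint. The function names are hypothetical and are not part of Solr or Lucene.

```python
def validate_gram_sizes(min_gram_size: int, max_gram_size: int) -> None:
    """Raise ValueError unless 0 < min_gram_size <= max_gram_size."""
    if min_gram_size < 1:
        raise ValueError("minGramSize must be greater than 0")
    if min_gram_size > max_gram_size:
        raise ValueError("minGramSize must not exceed maxGramSize")


def ngram_token_count(text_length: int, min_gram_size: int = 1, max_gram_size: int = 2) -> int:
    """Count the n-grams of every size in [min_gram_size, max_gram_size]."""
    validate_gram_sizes(min_gram_size, max_gram_size)
    # A string of length L holds L - k + 1 grams of size k, or none if k > L.
    return sum(max(0, text_length - k + 1) for k in range(min_gram_size, max_gram_size + 1))


if __name__ == "__main__":
    print(ngram_token_count(len("send me"), 2, 3))  # 11
    print(ngram_token_count(len("send me")))        # 13 with the defaults 1 and 2
```

With minGramSize="2" and maxGramSize="3", the seven-character input "send me" yields 6 bigrams and 5 trigrams, for 11 tokens in total.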
Example:
<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.NGramTokenizerFactory" minGramSize="2" maxGramSize="3"/>
</analyzer>
</fieldType>
Input: send me
Output: "se", "sen", "en", "end", "nd", "nd ", "d ", "d m", " m", " me", "me"
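The following plain-Python sketch reproduces that output. It is not Lucene's implementation. It assumes the ordering shown in the listing: grams are grouped by start offset and, within each offset, emitted in increasing size. This matches our understanding of Lucene 4.4 and later, but treat it as an assumption. The function name `ngram_tokens` is hypothetical.

```python
def ngram_tokens(text: str, min_gram_size: int = 1, max_gram_size: int = 2) -> list[str]:
    """Emit n-grams ordered by start offset, then by increasing size.

    Every character, including spaces, is part of the input; nothing
    is treated as a delimiter.
    """
    if not 0 < min_gram_size <= max_gram_size:
        raise ValueError("require 0 < minGramSize <= maxGramSize")
    tokens = []
    for start in range(len(text)):
        for size in range(min_gram_size, max_gram_size + 1):
            end = start + size
            if end > len(text):
                break  # larger sizes would also run past the end of the input
            tokens.append(text[start:end])
    return tokens


print(ngram_tokens("send me", 2, 3))
# ['se', 'sen', 'en', 'end', 'nd', 'nd ', 'd ', 'd m', ' m', ' me', 'me']
```

Printing the Python list makes the spaces inside "nd ", "d ", "d m", " m", and " me" visible. Without quotes, these tokens are easy to misread.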
The n-gram tokenizer operates over the entire input string. It does not treat whitespace as a delimiter, so whitespace characters are included in the generated tokens. In the preceding example, the space in "send me" is preserved inside every token that spans it. The n-gram tokenizer is therefore useful when we want search terms to match at the start, the end, or anywhere in the middle of a string, including across whitespace.
For example, suppose the input string is "Please send me a mail at [email protected]" and we want to match "mail at [email protected]".
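The following Python sketch shows why this works: every n-gram of a substring is also an n-gram of the full string, including grams that cross a space. The address in the example is elided in this text, so the sketch uses only the part of the sentence before it. This is a crude set-containment approximation. Real Solr matching also depends on the query parser and scoring, which are not modelled here. The helper name `ngram_set` is hypothetical.

```python
def ngram_set(text: str, min_gram_size: int, max_gram_size: int) -> set[str]:
    """All distinct n-grams of sizes min_gram_size..max_gram_size, spaces included."""
    return {
        text[i:i + k]
        for k in range(min_gram_size, max_gram_size + 1)
        for i in range(len(text) - k + 1)
    }


document = "Please send me a mail at"  # address omitted, as noted above
doc_grams = ngram_set(document, 2, 3)

# Every gram of the query phrase also occurs in the document.
print(ngram_set("mail at", 2, 3) <= doc_grams)  # True
# Grams that span the space between words are indexed too.
print("l a" in doc_grams)  # True
# A phrase that does not occur in the document has grams that are missing.
print(ngram_set("mail to", 2, 3) <= doc_grams)  # False
```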