Edge n-gram tokenizer

This tokenizer generates n-gram tokens that are all anchored at the start of the input string. Like the n-gram tokenizer, the edge n-gram tokenizer does not treat white space as a delimiter, so white space characters are included in the tokens.

Factory class: solr.EdgeNGramTokenizerFactory

Arguments:

minGramSize (integer, default is 1): The minimum n-gram size
maxGramSize (integer, default is 1): The maximum n-gram size (0 < minGramSize <= maxGramSize)

In earlier versions, Solr supported a side argument (front or back; the default was front) that selected the end of the input string from which tokens were generated. This argument has been removed, and Solr now always generates tokens from the front of the input string.

Example:

<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.EdgeNGramTokenizerFactory" minGramSize="2" maxGramSize="10"/>
  </analyzer>
</fieldType>

Input: send me

Output: "se", "sen", "send", "send " (with a trailing space), "send m", "send me"

The entire input string, white space included, is split into prefix tokens whose lengths range from minGramSize (2) to maxGramSize (10). The edge n-gram tokenizer is useful when a query must match the first n characters of a string.
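
The token generation described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Solr's implementation: the function name edge_ngrams and its parameters are hypothetical, chosen to mirror minGramSize and maxGramSize.

```python
def edge_ngrams(text, min_gram=1, max_gram=1):
    """Return the prefixes of text whose lengths run from min_gram to max_gram.

    Hypothetical helper that mimics the behaviour described in this section;
    it is not part of Solr.
    """
    if not 0 < min_gram <= max_gram:
        raise ValueError("expected 0 < min_gram <= max_gram")
    # A prefix cannot be longer than the input itself.
    longest = min(max_gram, len(text))
    # White space is kept: it is an ordinary character here, not a delimiter.
    return [text[:n] for n in range(min_gram, longest + 1)]


print(edge_ngrams("send me", min_gram=2, max_gram=10))
# ['se', 'sen', 'send', 'send ', 'send m', 'send me']
```

Every token is a prefix of the input, and the fourth token keeps its trailing space because white space is not a delimiter.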

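For example, suppose the input string is Please send me a mail at [email protected] and we want to match Please send me a mail. With the configuration above it will not match: the longest token produced is Please sen (10 characters), so the 21-character phrase Please send me a mail is never generated as a token.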
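
A rough way to check this is to build the set of prefixes that the example settings (minGramSize 2, maxGramSize 10) would produce and test membership. The sketch below is plain Python rather than Solr code; edge_ngrams is a hypothetical helper, and only the start of the input is used because the rest of the sentence cannot affect tokens of at most 10 characters.

```python
def edge_ngrams(text, min_gram=2, max_gram=10):
    """Prefixes of text with lengths min_gram..max_gram (hypothetical helper)."""
    longest = min(max_gram, len(text))
    return [text[:n] for n in range(min_gram, longest + 1)]


# Only the beginning of the input matters; the remainder is omitted here.
indexed_tokens = set(edge_ngrams("Please send me a mail"))

print("Please sen" in indexed_tokens)             # True: 10 characters
print("Please send me a mail" in indexed_tokens)  # False: 21 characters
```

Raising maxGramSize to at least the length of the longest prefix you expect to match would make the longer phrase available as a token, at the cost of generating more tokens.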
