White space tokenizer

This tokenizer splits the text stream at whitespace only. Unlike the standard tokenizer, it does not split the text at punctuation, so all punctuation remains intact inside the generated tokens.

Factory class: solr.WhitespaceTokenizerFactory

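Arguments:

rule: Specifies how whitespace is defined for tokenization. Valid values are java (the default, which uses Character.isWhitespace(int)) and unicode (which uses Unicode's WHITESPACE property).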

Example:

<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory" rule="java"/>
  </analyzer>
</fieldType>

Input: Please send a mail at [email protected] by 12-11.

Output: "Please", "send", "a", "mail", "at", "[email protected]", "by", "12-11."

The input string was split at whitespace, but the punctuation (@, ., and -) was preserved in the tokens.
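
The behavior described above can be approximated outside Solr with a short Python sketch. This is not the Lucene implementation: the function name whitespace_tokenize is hypothetical, the address user@example.com is a stand-in chosen for illustration, and Python's definition of whitespace may differ slightly from the one used by rule="java".

```python
def whitespace_tokenize(text):
    """Split text on runs of whitespace; punctuation stays inside the tokens."""
    # str.split() with no arguments splits on any run of whitespace
    # and discards empty strings, so leading/trailing spaces are ignored.
    return text.split()


# Hypothetical sample sentence (the address is an illustrative stand-in).
sample = "Please send a mail at user@example.com by 12-11."
tokens = whitespace_tokenize(sample)
print(tokens)
# ['Please', 'send', 'a', 'mail', 'at', 'user@example.com', 'by', '12-11.']
```

Note that the trailing period stays attached to the last token, which is usually why a whitespace tokenizer is paired with additional filters when punctuation should not affect matching.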
