Letter tokenizer

The letter tokenizer creates tokens from strings of contiguous letters and discards all non-letter characters. Every non-letter character therefore acts as a token boundary.

Factory class: solr.LetterTokenizerFactory

Arguments: None


<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.LetterTokenizerFactory"/>
  </analyzer>
</fieldType>

Input: I haven't received mail by Nov12Sunday

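Output: "I", "haven", "t", "received", "mail", "by", "Nov", "Sunday"

The non-letter characters (the spaces, the apostrophe, and the digits 12) are discarded, and each one also ends the current token. This is why "haven't" yields the two tokens "haven" and "t", and "Nov12Sunday" yields "Nov" and "Sunday".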
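The splitting rule can be illustrated with a minimal Python sketch. This is not Solr's implementation. The function name letter_tokenize is hypothetical, and Python's str.isalpha() is assumed to be a close enough stand-in for the letter test the real tokenizer applies.

```python
def letter_tokenize(text):
    """Return the maximal runs of letters in text (hypothetical helper).

    Any non-letter character ends the current token and is dropped.
    str.isalpha() is an assumed approximation of Solr's letter check.
    """
    tokens = []
    current = []
    for ch in text:
        if ch.isalpha():
            # Still inside a run of letters: keep accumulating.
            current.append(ch)
        elif current:
            # A non-letter closes the token that was being built.
            tokens.append("".join(current))
            current = []
    if current:
        # Flush the final token if the text ends with a letter.
        tokens.append("".join(current))
    return tokens


print(letter_tokenize("I haven't received mail by Nov12Sunday"))
# prints ['I', 'haven', 't', 'received', 'mail', 'by', 'Nov', 'Sunday']
```

Note that the sketch never joins letters across a discarded character, so the apostrophe and the digits split their surrounding words instead of merging them.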

