This filter splits tokens at word delimiters and is a graph-aware alternative to the word delimiter filter. It is best suited to query-time analysis, where the token graph it produces can be used directly. The indexer cannot consume a graph, so if you use this filter at index time, follow it with a flatten graph filter.
The rules for determining delimiters are as follows:
- A change in case within a word: KnowMore -> Know, More. This can be disabled by setting splitOnCaseChange="0".
- A transition from alpha to numeric characters or vice versa: Alpha1000 -> Alpha, 1000 and 100MS -> 100, MS. This can be disabled by setting splitOnNumerics="0".
- Non-alphanumeric characters are discarded: air-crew -> air, crew.
- A trailing 's is removed: Solr's -> Solr. This can be disabled by setting stemEnglishPossessive="0".
- Any leading or trailing delimiters are discarded: -air-crew!! -> air, crew.
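As a rough sketch of how these switches are applied, the filter declaration below turns off both the case-change rule and the numeric-transition rule, so that only the delimiter-based rules remain. The attribute names are the ones mentioned in the rules above; the comment describes the intended effect rather than verified output.
<!-- sketch: keep delimiter-based splitting, disable case-change and numeric splits -->
<filter class="solr.WordDelimiterGraphFilterFactory" splitOnCaseChange="0" splitOnNumerics="0"/>
With this configuration, KnowMore and Alpha1000 should pass through as single tokens, while air-crew should still be split into air, crew.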
Factory class: solr.WordDelimiterGraphFilterFactory
Arguments: This filter accepts too many arguments to list here. Refer to the Solr Reference Guide for the complete list.
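For orientation, the declaration below writes out several of the commonly used arguments explicitly. The values shown are what we understand to be the defaults, so this should behave like a bare filter declaration with no arguments; treat the values as an assumption and check them against the reference guide for your Solr version.
<!-- sketch: commonly used arguments, set to their assumed default values -->
<filter class="solr.WordDelimiterGraphFilterFactory"
        generateWordParts="1"
        generateNumberParts="1"
        catenateWords="0"
        catenateNumbers="0"
        catenateAll="0"
        preserveOriginal="0"
        splitOnCaseChange="1"
        splitOnNumerics="1"
        stemEnglishPossessive="1"/>
Writing the arguments out like this is mainly useful as a starting point: change one value at a time and inspect the result on the Analysis screen of the Solr Admin UI.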
Example:
<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterGraphFilterFactory"/>
    <!-- required on index analyzers after graph filters -->
    <filter class="solr.FlattenGraphFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterGraphFilterFactory"/>
  </analyzer>
</fieldType>
Input: KnowMore air-crew Alpha1000
Tokenizer to filter: KnowMore, air-crew, Alpha1000
Output: Know, More, air, crew, Alpha, 1000
This is a simple example of the word delimiter graph filter with its default settings. Its arguments allow considerably more elaborate behavior, such as concatenating the split parts or preserving the original token alongside them.
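As one illustration of a more elaborate setup, the sketch below concatenates word and number parts and keeps the original token at index time, while leaving the query analyzer at its defaults. The field type name text_en_catenate is hypothetical, and the choice to catenate only on the index side is an assumption made for this example, not a requirement of the filter.
<fieldType name="text_en_catenate" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- assumption: catenated and original tokens are added at index time only -->
    <filter class="solr.WordDelimiterGraphFilterFactory" catenateWords="1" catenateNumbers="1" preserveOriginal="1"/>
    <!-- still required on index analyzers after graph filters -->
    <filter class="solr.FlattenGraphFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterGraphFilterFactory"/>
  </analyzer>
</fieldType>
For the input air-crew, the index analyzer should emit aircrew and the original air-crew in addition to air and crew, which would let a query for aircrew match the document. The exact token order and positions are not shown here.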