Word delimiter graph filter 

This filter splits tokens at word delimiters. It is the graph-aware alternative to the word delimiter filter. Prefer it at query time rather than at index time, because the indexer cannot directly consume a token graph. If you still need to use this filter at index time, follow it with a flatten graph filter.

The rules for determining delimiters are as follows:

  • A change in case within a word: KnowMore -> Know, More. This can be disabled by setting splitOnCaseChange="0".
  • A transition from alpha to numeric characters or vice versa: Alpha1000 -> Alpha, 1000 and 100MS -> 100, MS. This can be disabled by setting splitOnNumerics="0".
  • Non-alphanumeric characters act as delimiters and are discarded: air-crew -> air, crew.
  • A trailing 's is removed: Solr's -> Solr.
  • Any leading or trailing delimiters are discarded: -air-crew!! -> air, crew.
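
The switches named in the rules above can be sketched as a single filter declaration. This is a minimal illustration, not a recommended configuration: splitOnCaseChange and splitOnNumerics come from the list above, while stemEnglishPossessive is an additional factory argument that this section does not describe, so check it against the Solr documentation for your version.

```xml
<!-- Sketch only: turns off three of the default splitting rules. -->
<filter class="solr.WordDelimiterGraphFilterFactory"
        splitOnCaseChange="0"
        splitOnNumerics="0"
        stemEnglishPossessive="0"/>
<!-- splitOnCaseChange="0": KnowMore is no longer split at the case change -->
<!-- splitOnNumerics="0": Alpha1000 is no longer split at the letter/digit boundary -->
<!-- stemEnglishPossessive="0" (assumption, not covered above): a trailing 's is no longer stripped -->
```

With these settings, air-crew should still split into air and crew, because splitting on non-alphanumeric characters is not affected by these arguments.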

Factory class: solr.WordDelimiterGraphFilterFactory

Arguments: This filter accepts too many arguments to list here. Refer to the Solr documentation for the full set.

Example:

<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterGraphFilterFactory"/>
    <!-- required on index analyzers after graph filters -->
    <filter class="solr.FlattenGraphFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterGraphFilterFactory"/>
  </analyzer>
</fieldType>

Input: KnowMore air-crew Alpha1000

Tokenizer to filter: KnowMore, air-crew, Alpha1000

Output: Know, More, air, crew, Alpha, 1000

This is a simple example of the word delimiter graph filter. The filter's arguments allow much more complex splitting and joining behavior.
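
As one hedged illustration of a more involved setup, the sketch below keeps the original token and also joins split parts back together at index time, while leaving the query analyzer on the defaults. The field type name text_en_wdgf is hypothetical, and preserveOriginal, catenateWords, and catenateNumbers are factory arguments that this section has not introduced; treat the combination as an assumption to validate against your own data, not as a prescribed configuration.

```xml
<!-- Sketch only: "text_en_wdgf" is a hypothetical field type name. -->
<fieldType name="text_en_wdgf" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- preserveOriginal="1": also emit the unsplit token, e.g. air-crew -->
    <!-- catenateWords="1": also emit joined word parts, e.g. aircrew -->
    <!-- catenateNumbers="1": also emit joined number parts, e.g. 500-42 as 50042 -->
    <filter class="solr.WordDelimiterGraphFilterFactory"
            preserveOriginal="1"
            catenateWords="1"
            catenateNumbers="1"/>
    <!-- required on index analyzers after graph filters -->
    <filter class="solr.FlattenGraphFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- defaults at query time: split only, no catenation -->
    <filter class="solr.WordDelimiterGraphFilterFactory"/>
  </analyzer>
</fieldType>
```

The idea behind indexing the extra forms is that a query for aircrew, air crew, or air-crew then has a chance of matching the same document. The Analysis screen in the Solr Admin UI is a convenient place to confirm the tokens each analyzer actually emits.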
