Synonym graph filter

The synonym graph filter maps single- or multi-token synonyms and produces a correct token graph for them, which the older synonym filter could not do for multi-token synonyms.

The synonym graph filter is normally configured at query time rather than index time, which keeps the index smaller. If the filter is configured at index time, all documents must be re-indexed whenever new synonyms are added to the synonyms.txt file. Configured at query time, the filter does not require re-indexing when new synonyms are added. To use this filter at index time, we must also add the flatten graph filter after it, so that the token graph is flattened into a plain token stream like the one the synonym filter produces.
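A query-time-only setup can be sketched as follows. This is a minimal illustration, not a recommended schema; the field type name text_en_query_syn is hypothetical and does not come from the original configuration.

```xml
<!-- Sketch only: the field type name is hypothetical -->
<fieldType name="text_en_query_syn" class="solr.TextField" positionIncrementGap="100">
  <!-- Index time: no synonym filter, so adding synonyms needs no re-indexing -->
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
  </analyzer>
  <!-- Query time: synonyms are applied when the query text is analyzed -->
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true"/>
  </analyzer>
</fieldType>
```

Because the filter appears only in the query analyzer, no flatten graph filter is needed in this variant.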

The position of this filter in the analysis chain is also important. If the synonym graph filter is configured before the ASCII folding filter, then synonyms.txt must also contain the words with their diacritics (such as caffé).
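One way to avoid listing accented forms is to reverse the order and place the ASCII folding filter first. The fragment below is a sketch under that assumption, not the only valid arrangement. With this order the synonym filter only sees folded tokens, so entries in synonyms.txt would be written in their unaccented form (caffe rather than caffé).

```xml
<!-- Sketch only: folding runs first, so the synonym filter sees unaccented tokens -->
<analyzer type="query">
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.ASCIIFoldingFilterFactory"/>
  <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true"/>
</analyzer>
```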

Factory class: solr.SynonymGraphFilterFactory

Arguments:

  • synonyms (required): The path of a file (synonyms.txt) that contains a list of synonyms, one per line. Blank lines and lines that begin with # are ignored. This may be a comma-separated list of absolute paths, or paths relative to the Solr config directory.

Sample format of synonyms.txt:

A comma-separated list of words. If the token matches any of the words, then all the words in the list are substituted, which will include the original token.

For example:

football,soccer
dumb,stupid,dull

Two comma-separated lists of words with the symbol => between them. If the token matches any word on the left, then the list on the right is substituted. The original token will not be included unless it is also in the list on the right.

For example:

country => nation
smart,clever,bright => intelligent,genius
  • ignoreCase (optional; default: false): Determines whether entries in the synonyms file are matched case-sensitively. If it is true, synonyms are matched case-insensitively.
  • expand (optional; default: true): If this is set to true, a synonym will be expanded to all equivalent synonyms. If false, all equivalent synonyms will be reduced to the first in the list.
  • format (optional; default: solr): Controls how the synonyms will be parsed. Supported formats are:
    • solr (SolrSynonymParser)
    • wordnet (WordnetSynonymParser)
    • We can pass the name of our own SynonymMap.Builder subclass.
  • tokenizerFactory (optional; default: WhitespaceTokenizerFactory): The name of the tokenizer factory to use when parsing the synonyms file. If tokenizerFactory is specified, then analyzer may not be, and vice versa.
  • analyzer (optional; default: WhitespaceTokenizerFactory): The name of the analyzer class to use when parsing the synonyms file. If the analyzer is specified, then tokenizerFactory may not be, and vice versa.
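The optional arguments above can be spelled out explicitly, as in the following sketch. The values are illustrative assumptions rather than recommendations. With expand="false", a line such as football,soccer would reduce both words to football, the first in the list. Only tokenizerFactory is given here, since it cannot be combined with analyzer.

```xml
<!-- Sketch only: optional arguments written out; values are illustrative -->
<filter class="solr.SynonymGraphFilterFactory"
        synonyms="synonyms.txt"
        ignoreCase="true"
        expand="false"
        format="solr"
        tokenizerFactory="solr.StandardTokenizerFactory"/>
```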

Example:

<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true"/>
  </analyzer>
</fieldType>

Input: He is stupid, not clever

Tokenizer to filter: "He", "is", "stupid", "not", "clever"

Output: "He", "is", "dumb", "dull", "stupid", "not", "intelligent", "genius"

All the matching synonyms (from synonyms.txt) are added to the token stream.

If we want to apply the synonym graph filter at index time, we must add FlattenGraphFilterFactory after it in the index analyzer definition.

Example:

<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true"/>
    <!-- required on index analyzers after synonym graph filters -->
    <filter class="solr.FlattenGraphFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true"/>
  </analyzer>
</fieldType>

Input: He is stupid, not clever

Tokenizer to filter: "He", "is", "stupid", "not", "clever"

Output: "He", "is", "dumb", "dull", "stupid", "not", "intelligent", "genius"

All the matching synonyms (from synonyms.txt) are added to the token stream.
