Language detection configuration

Language detection is configured in solrconfig.xml. The Tika and LangDetect implementations accept the same parameters, as follows:

<processor class="org.apache.solr.update.processor.TikaLanguageIdentifierUpdateProcessorFactory">
  <lst name="defaults">
    <str name="langid.fl">title,subject,text,keywords</str>
    <str name="langid.langField">language_s</str>
  </lst>
</processor>
<processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
  <lst name="defaults">
    <str name="langid.fl">title,subject,text,keywords</str>
    <str name="langid.langField">language_s</str>
  </lst>
</processor>
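
In solrconfig.xml, processor elements such as the ones above sit inside an update request processor chain rather than at the top level. The following is a minimal sketch of such a chain, not a definitive setup: the chain name langid is an assumption chosen for illustration, and the two trailing processors are the standard Solr logging and indexing steps that normally end a chain.

```xml
<!-- Sketch only: the chain name "langid" is a hypothetical choice -->
<updateRequestProcessorChain name="langid">
  <!-- Detect the language before the document is indexed -->
  <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
    <lst name="defaults">
      <str name="langid.fl">title,subject,text,keywords</str>
      <str name="langid.langField">language_s</str>
    </lst>
  </processor>
  <!-- Standard processors that log the update and write it to the index -->
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

A chain defined this way is typically selected per request with the update.chain parameter, or by setting that parameter in the defaults of an update request handler.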

As you can see, both configurations use the same parameters; the only difference is the processor class. The parameters are listed here:

Parameter Description
langid

Enables language detection when set to true.

langid.fl

This is a required parameter. It contains a comma-delimited or space-delimited list of the fields to be processed by langid.

langid.langField

This is a required parameter. It specifies the field in which the detected language code is stored.

langid.langsField

The same as langid.langField, except that it specifies the field for a list of language codes instead of a single code.

langid.overwrite

If you enable this parameter, the contents of the langField and langsField fields are overwritten even if they already have a value. By default, the value is set to false.

langid.lcmap

Contains a space-separated list of colon-delimited language code mappings to apply to the detected languages.

langid.threshold

A value between 0 and 1 that the language identification score must reach before the detected language is accepted. The default value is 0.5.

langid.whitelist

Specifies the list of allowed language identification codes.

langid.map

Used to enable field name mapping. The default value is false.
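
To show how several of these parameters might be combined, here is a hedged sketch of a single processor definition. The values are illustrative assumptions rather than recommendations: the languages_ss field name presumes that a matching multivalued field exists in the schema, the language codes in langid.lcmap and langid.whitelist are examples only, and the comma-delimited form of the whitelist is assumed.

```xml
<processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
  <lst name="defaults">
    <!-- Fields to inspect and the field that receives the detected code -->
    <str name="langid.fl">title,subject,text,keywords</str>
    <str name="langid.langField">language_s</str>
    <!-- Hypothetical multivalued field for the list of detected codes -->
    <str name="langid.langsField">languages_ss</str>
    <!-- Leave existing values in langField and langsField untouched (the default) -->
    <str name="langid.overwrite">false</str>
    <!-- Illustrative mappings: space-separated pairs, each written as from:to -->
    <str name="langid.lcmap">zh-cn:zh zh-tw:zh</str>
    <!-- Require a higher score than the 0.5 default before accepting a result -->
    <str name="langid.threshold">0.8</str>
    <!-- Illustrative list of allowed codes; the comma delimiter is an assumption -->
    <str name="langid.whitelist">en,fr,de,zh</str>
  </lst>
</processor>
```

Raising langid.threshold trades coverage for precision: fewer documents receive a language code, but the codes that are assigned rest on a stronger detection score.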
