Language detection configuration

Language detection is configured in solrconfig.xml. The Tika and LangDetect implementations accept the same parameters, as follows:

<processor class="org.apache.solr.update.processor.TikaLanguageIdentifierUpdateProcessorFactory">
    <lst name="defaults">
        <str name="langid.fl">title,subject,text,keywords</str>
        <str name="langid.langField">language_s</str>
    </lst>
</processor>
<processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
    <lst name="defaults">
        <str name="langid.fl">title,subject,text,keywords</str>
        <str name="langid.langField">language_s</str>
    </lst>
</processor>
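
A processor takes effect only when it is part of an update request processor chain. The following is a minimal sketch of how one of the two processors above might be placed in such a chain; only one of them is needed. The chain name langid is an assumption chosen for illustration, and the log and run processors are the standard ones that usually close a chain:

```xml
<!-- Hypothetical chain name "langid"; pick any name that suits your setup -->
<updateRequestProcessorChain name="langid">
    <!-- Language detection runs first so that the language field is set before indexing -->
    <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
        <lst name="defaults">
            <str name="langid.fl">title,subject,text,keywords</str>
            <str name="langid.langField">language_s</str>
        </lst>
    </processor>
    <!-- Standard processors that log the update and pass it on to the index -->
    <processor class="solr.LogUpdateProcessorFactory" />
    <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

The chain is then selected by an update handler or by an update request, typically through the update.chain parameter.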

As you can see, both configurations use the same parameters; the only difference is the processor class. The parameters are listed here:

Parameter	Description
`langid`	Enables language detection when set to true. The default value is true.
`langid.fl`	This is a required parameter that contains a comma-delimited or space-delimited list of the fields to be processed by `langid`.
`langid.langField`	This is a required parameter used to specify the field for the returned language code.
`langid.langsField`	The same as `langid.langField`, but it specifies the field that receives a list of detected language codes instead of a single code.
`langid.overwrite`	If you enable this parameter, the content of the `langField` and `langsField` fields will be overwritten even if they already have a value. By default, the value is set to false.
`langid.lcmap`	Contains a space-separated list of colon-delimited language code mappings to apply to the detected languages.
`langid.threshold`	Sets a threshold between `0` and `1` that the language identification score must reach before the detected language is accepted. The default value is `0.5`.
`langid.whitelist`	Specifies the list of allowed language identification codes.
`langid.map`	Used to enable field name mapping. The default value is false.
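
To show how the optional parameters from the table fit together, here is a hedged sketch of a fuller configuration. The field name languages_ss, the threshold of 0.8, the code mappings, and the whitelist values are all assumptions chosen for illustration, and the whitelist is assumed to be a comma-separated list of codes:

```xml
<processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
    <lst name="defaults">
        <!-- Required: the fields to analyze and the field that receives the detected code -->
        <str name="langid.fl">title,subject,text,keywords</str>
        <str name="langid.langField">language_s</str>
        <!-- Assumed multi-valued field for the list of detected codes -->
        <str name="langid.langsField">languages_ss</str>
        <!-- Keep any language value already present in the document (the default) -->
        <str name="langid.overwrite">false</str>
        <!-- Illustrative mapping: fold Japanese, Chinese, and Korean into one code -->
        <str name="langid.lcmap">ja:cjk zh:cjk ko:cjk</str>
        <!-- Illustrative value, stricter than the default of 0.5 -->
        <str name="langid.threshold">0.8</str>
        <!-- Illustrative list of accepted language codes -->
        <str name="langid.whitelist">en,de,fr,cjk</str>
    </lst>
</processor>
```

All values are written as str elements here to match the earlier examples. Raising the threshold trades coverage for precision, so more documents may end up without an accepted language.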
