Creating separate indexes per language

In this approach, we create a separate Solr index (Solr core) per language. Solr supports multiple cores, and every core holds its own index and uses its own configuration files, including the managed-schema.xml file. At search time, each core searches its own data according to its own configuration. The results from all cores are then merged and returned as the output of the query. Here is a simple configuration that defines one core per language.

File en_managed-schema.xml:

<field name="content" type="text_en" indexed="true" stored="true" />

File el_managed-schema.xml:

<field name="content" type="text_el" indexed="true" stored="true" />

File es_managed-schema.xml:

<field name="content" type="text_es" indexed="true" stored="true" />

File solr.xml:

<cores>
  <core name="english" instanceDir="shared" dataDir="../cores/core-perlanguage/data/english/" schema="en_managed-schema.xml" />
  <core name="greek" instanceDir="shared" dataDir="../cores/core-perlanguage/data/greek/" schema="el_managed-schema.xml" />
  <core name="spanish" instanceDir="shared" dataDir="../cores/core-perlanguage/data/spanish/" schema="es_managed-schema.xml" />
  <core name="aggregator" instanceDir="shared" dataDir="data/aggregator" />
</cores>
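
With one core per language, each document has to be sent to the core that matches its language at index time. The following is a minimal sketch of that routing step in Python, not a definitive implementation. It assumes the core names from the solr.xml above, a hypothetical Solr base URL of http://localhost:8983/solr, and that each document already carries a language code (en, el, or es). The helper name update_url_for is illustrative only.

```python
# Hypothetical base URL; adjust to your own deployment.
SOLR_BASE_URL = "http://localhost:8983/solr"

# Language code -> core name, following the cores defined in solr.xml.
CORE_BY_LANGUAGE = {
    "en": "english",
    "el": "greek",
    "es": "spanish",
}


def update_url_for(language_code, base_url=SOLR_BASE_URL):
    """Return the update URL of the core that indexes the given language."""
    try:
        core = CORE_BY_LANGUAGE[language_code]
    except KeyError:
        # Fail loudly rather than index into the wrong analysis chain.
        raise ValueError(
            f"no core configured for language {language_code!r}"
        ) from None
    return f"{base_url}/{core}/update"


if __name__ == "__main__":
    print(update_url_for("el"))
```

Rejecting unknown language codes is a deliberate choice in this sketch: a document analyzed with the wrong language chain would be indexed without error but would match poorly at search time.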

At query time, a request is sent to each of the language-specific cores using the shards parameter. The search then runs independently and in parallel for each language. Because the field "content" is defined differently in each language-specific core, language analysis is executed according to the configuration of that core. Defining a core per language improves search performance because the search executes in parallel across multiple smaller indexes. By contrast, searching across a growing number of fields in a much larger index (the separate fields per language approach) hurts performance. However, managing a separate core per language is somewhat more difficult.
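
As an illustration of the shards parameter, the following Python sketch builds a distributed query URL that is sent to the aggregator core and fans out to the language-specific cores. It is a sketch under stated assumptions: the host localhost:8983 is hypothetical, all cores are assumed to run in the same Solr instance, and the function name build_distributed_query_url is illustrative only. The core names come from the solr.xml shown earlier.

```python
from urllib.parse import urlencode

# Assumed values for illustration; adjust to your own deployment.
SOLR_HOST = "localhost:8983"      # hypothetical host:port
AGGREGATOR_CORE = "aggregator"    # core that receives the request


def build_distributed_query_url(query, language_cores, host=SOLR_HOST):
    """Return a select URL that fans the query out to each language core."""
    # shards is a comma-separated list of host:port/solr/<core> entries.
    shards = ",".join(f"{host}/solr/{core}" for core in language_cores)
    params = {"q": query, "shards": shards}
    # Keep ':', '/' and ',' readable instead of percent-encoding them.
    query_string = urlencode(params, safe=":/,")
    return f"http://{host}/solr/{AGGREGATOR_CORE}/select?{query_string}"


if __name__ == "__main__":
    url = build_distributed_query_url(
        "content:solr", ["english", "greek", "spanish"]
    )
    print(url)
```

To search only a subset of languages, pass a shorter list of cores; the cores that are left out are not queried.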

When configuring multilingual search, the choice of implementation approach depends entirely on the search requirements. The separate fields per language approach suits cases where the index size is under control. The separate indexes per language approach suits cases where the former does not satisfy the requirements and we have sufficient capacity to maintain the environment.
