Chapter 9. Introducing Customizations

In this chapter, we will introduce some possible paths for customizing Solr.

First of all, we will see how to handle the Solr core configurations in a more flexible way, so that they are compatible with the new standard that will become mandatory from Version 5.

Then, we will focus on language analysis, starting from the adoption of a widely used stemmer and moving on to the creation of a very simple named entity recognizer. This process will give us the chance to introduce a quick and easy way to create new plugins. We will use the Scala language for this because of its simplicity, and we will create the base for a new ResponseWriter. Then, it's up to you to complete its development.

Looking at the Solr customizations

This is the last chapter, and will be a little different from the previous ones, because it will involve writing code for some of our examples.

Once we learn how to manage the basic components and typical configurations for common Solr usage, it's important to have an idea of where to start for specific customizations—even if it's obvious that in most cases we will not need them at all.

The following will be the main topics:

  • Detecting and managing language: We will start with what is better described as an advanced configuration than an actual customization: language recognition and the management of multiple languages. I decided to put this topic in this chapter because configurations that depend on language require specific analysis, testing, and libraries. So, we will introduce some of the libraries that can be used for this task, but our examples will only be a beginning.
  • In order to test language analysis chains, we will see that it's possible to test the components more precisely by using unit tests. For this purpose, we will use a specific Solr testing library and the Scala language, because it's concise and simple to use here.
  • The choice of the Scala language could seem a bit risky, but the motivations are simple. If you already have experience with Java, you should find the examples very concise and simple to read. If not, it's a good point from which to start, since it can be used like Java without too much boilerplate code. With this approach in mind, we can introduce the basic structural elements for writing new plugins using Java or Scala. We do this by looking at the methods inherited from the most common interfaces. Writing new plugins requires knowing how a plugin is structured and what the basic phases of a plugin lifecycle are.

The code we will write and analyze should be considered only as a start. There is a lot more to learn, and I hope you'll find our first experiment a good starting point for more specific and deeper explorations.

Adding some more details to the core discovery

Previously, we introduced the solr.xml configuration syntax only briefly. Since the current syntax will become mandatory from Solr Version 5.0, it's important to consult the Solr reference for the remaining details at https://cwiki.apache.org/confluence/display/solr/Format+of+solr.xml. The structure now also includes configurations for shards and nodes, as shown in the following example code:

<solr>
  <solrcloud>
    <str name="host">127.0.0.1</str>
    <int name="hostPort">${hostPort:8983}</int>
    <str name="hostContext">${hostContext:solr}</str>
    <int name="zkClientTimeout">${solr.zkclienttimeout:30000}</int>
    <str name="shareSchema">${shareSchema:false}</str>
    <str name="genericCoreNodeNames">${genericCoreNodeNames:true}</str>
    <str name="zkHost">${zkHost:}</str>
  </solrcloud>
  <shardHandlerFactory name="shardHandlerFactory" class="HttpShardHandlerFactory">
    <int name="socketTimeout">${socketTimeout:120000}</int>
    <int name="connTimeout">${connTimeout:15000}</int>
  </shardHandlerFactory>
  <logging>
    <str name="class">${loggingClass:}</str>
    <str name="enabled">${loggingEnabled:}</str>
    <watcher>
      <int name="size">${loggingSize:}</int>
      <int name="threshold">${loggingThreshold:}</int>
    </watcher>
  </logging>
</solr>

In the examples for this chapter, we have added some of these configurations with predefined values just to use them as a template. We can easily recognize the options to be used for configuring logging and the component that will be used as a factory for managing shards. The configurations can be written as usual by using the core.properties file.
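
Most values in this template use the property substitution syntax ${name:default}: the value is read from the system property called name when it is set, and otherwise falls back to the text after the colon. As a small illustration, here are two lines from the example again with comments added (the port 8984 mentioned in the comment is only an assumed value):

```xml
<!-- Resolves to 8983 unless the JVM is started with another value,
     for example -DhostPort=8984 (assumed value) -->
<int name="hostPort">${hostPort:8983}</int>

<!-- Empty default: the value stays empty unless -DzkHost=... is given at startup -->
<str name="zkHost">${zkHost:}</str>
```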

Please remember that the <solrcloud> element must be present, but its presence doesn't mean that the current instance is running in SolrCloud mode. To start the instance in SolrCloud mode, we need to specify the -DzkHost or -DzkRun parameter at startup, or add the corresponding configuration in the <solrcloud> element, as done in some of the previous examples.
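
As a minimal sketch of the last option, the ZooKeeper address can be written directly into the <solrcloud> element instead of being passed on the command line. The address localhost:2181 is only an assumption for illustration and should be replaced with the address of your own ZooKeeper instance:

```xml
<solr>
  <solrcloud>
    <str name="host">127.0.0.1</str>
    <int name="hostPort">${hostPort:8983}</int>
    <str name="hostContext">${hostContext:solr}</str>
    <!-- Assumed ZooKeeper address, hard-coded here only as an example -->
    <str name="zkHost">localhost:2181</str>
  </solrcloud>
</solr>
```

Hard-coding the address ties the file to a single environment, so keeping the ${zkHost:} placeholder and passing the value at startup is usually the more flexible choice.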
