Time for action – defining the basic solrconfig.xml file

In this section, we will define a basic configuration file and add an update handler that triggers commits when a certain amount of data has been posted to the core.

  1. Let's start with a very basic solrconfig.xml file that will have the following structure:
    <config>
      <luceneMatchVersion>LUCENE_45</luceneMatchVersion>
      <directoryFactory name="DirectoryFactory" class="solr.MMapDirectoryFactory" />
      <codecFactory name="CodecFactory" class="solr.SchemaCodecFactory" />
    
      <requestHandler name="standard" class="solr.StandardRequestHandler" default="true" />
      <requestHandler name="/update" class="solr.UpdateRequestHandler"/>
    
      <requestHandler name="/admin/" class="org.apache.solr.handler.admin.AdminHandlers" />
      <admin>
        <defaultQuery>*:*</defaultQuery>
      </admin>
      <requestHandler name="/analysis/field" class="solr.FieldAnalysisRequestHandler" />
    
      <updateHandler class="solr.DirectUpdateHandler2">
        <updateLog>
          <str name="dir">${solr.ulog.dir:}</str>
        </updateLog>
        <autoCommit>
          <maxTime>60000</maxTime>
          <maxDocs>100</maxDocs>
        </autoCommit>
      </updateHandler>
    
    </config>

    The only notable differences from the previous examples shown in Chapter 2, Indexing with Local PDF Files, are the addition of the update handler solr.DirectUpdateHandler2, which Solr needs for handling internal calls to the update process, and a different choice for the codec used to save the binary data.

  2. In this case, we are using the standard codec, but you can easily adopt the SimpleTextCodec seen before if you want to use it for your tests. If you change the codec during your tests, remember to clean and rebuild the index.

What just happened?

Most of the configuration is identical to the previous examples and will be reused in the next ones, so we will focus only on the elements that are new to us.

Tip

solr.* alias

When writing Solr configuration files, we can use a short alias in place of the fully qualified Java class name. For example, we wrote solr.DirectUpdateHandler2, which is a short alias for the full name org.apache.solr.update.DirectUpdateHandler2.

This alias works for Solr's internal types and components defined in the main packages: org.apache.solr.(schema/core/analysis/search/update/request/response).
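
As a small illustration of the alias, the following two declarations should resolve to the same component; only the class attribute differs. This is just a sketch to compare the two spellings: in a real file you would use one or the other, not both.

    <!-- short alias -->
    <updateHandler class="solr.DirectUpdateHandler2" />

    <!-- fully qualified class name -->
    <updateHandler class="org.apache.solr.update.DirectUpdateHandler2" />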

This example introduced the DirectUpdateHandler2 component that is used to perform commits automatically depending on certain conditions.

With <autoCommit>, we can trigger an automatic commit when a certain number of milliseconds has passed (maxTime) or when a certain number of documents has been posted (maxDocs) and is waiting to be committed, whichever happens first.
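
As a sketch of how these thresholds can be tuned, the following fragment asks for a commit after 15 seconds or 1,000 pending documents. The values are arbitrary assumptions chosen for illustration, not recommendations. The openSearcher element is not part of our file; it is shown here only as an option that Solr 4.x accepts inside <autoCommit> when we want the data flushed to disk without opening a new searcher.

    <updateHandler class="solr.DirectUpdateHandler2">
      <autoCommit>
        <!-- commit at most 15 seconds after the first uncommitted update (illustrative value) -->
        <maxTime>15000</maxTime>
        <!-- or as soon as 1,000 documents are pending (illustrative value) -->
        <maxDocs>1000</maxDocs>
        <!-- optional: make the commit durable without opening a new searcher -->
        <openSearcher>false</openSearcher>
      </autoCommit>
    </updateHandler>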

The <updateLog/> tag enables the transaction log, which the Atomic Update feature introduced in Solr 4.0 relies on (for more details on the Atomic Update feature, see http://wiki.apache.org/solr/Atomic_Updates). This feature permits us to perform an update on a per-field basis instead of using the default delete-and-add mechanism for an entire document. We also have to add a specific stored field for tracking versions (_version_), as we will see in the schema.xml configuration details.
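
To make this more concrete, here is a minimal sketch of the two pieces involved. The first fragment is the version-tracking field as it would appear in schema.xml; it assumes that a field type named long is already defined there. The second is an update message posted to the /update handler; the id and title fields and their values are hypothetical, and update="set" asks Solr to replace only that field while leaving the rest of the document as it is.

    <!-- schema.xml: stored field used by Solr to track document versions -->
    <field name="_version_" type="long" indexed="true" stored="true" />

    <!-- update message: change only the title of the document with id doc-1 -->
    <add>
      <doc>
        <field name="id">doc-1</field>
        <field name="title" update="set">A new title</field>
      </doc>
    </add>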

Looking at the differences between commits and soft commits

So far, we have always referred to the standard commit mechanism: a document cannot be found in an index until it has been included in a commit, which makes the pending modifications to the index permanent. This way, we obtain an almost stable version of the binary data for index storage, but reconstructing the index every time a new document is added can be very expensive, and it does not help when we need features such as atomic updates, distributed indexes, and near real-time searches (http://wiki.apache.org/solr/NearRealtimeSearch).

For this reason, you'll also find references to a soft commit. A soft commit makes modifications to a document available for search even if a complete (hard) commit has not been performed yet. Because a hard commit can consume time and resources on big indexes, a soft commit is useful for exposing a list of operations on a document while waiting to update the entire index with its new values. These small temporary updates can also be triggered automatically with a corresponding <autoSoftCommit> configuration, similar to <autoCommit>.
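
As a minimal sketch of how the two mechanisms can sit side by side, the following fragment keeps a hard commit every 60 seconds, as in our file, and adds a soft commit every second. The one-second value is an assumption chosen only for illustration; the right interval depends on how quickly new documents need to become searchable.

    <updateHandler class="solr.DirectUpdateHandler2">
      <!-- hard commit: writes the changes durably to the index -->
      <autoCommit>
        <maxTime>60000</maxTime>
      </autoCommit>
      <!-- soft commit: makes recent changes searchable sooner (illustrative value) -->
      <autoSoftCommit>
        <maxTime>1000</maxTime>
      </autoSoftCommit>
    </updateHandler>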
