Chapter 8. Commits, Real-Time Index Optimizations, and Atomic Updates

In the previous chapter, we saw how we can use Apache Nutch to crawl websites and index them in Solr. In this chapter, we'll see how we can index data in real-time using the features available from Solr 4. By using these features, the indexed data will be available in real-time for the user to see.

This chapter will cover the following topics:

  • Understanding soft commit, optimize, and hard commit
  • Using atomic updates to update fields
  • Using RealTime Get

Understanding soft commit, optimize, and hard commit

Solr provides Near-Real-Time (NRT) search, which makes documents available for searching almost immediately after they have been indexed. Additions or updates to documents are seen nearly in real-time after we index them in Solr. This near-real-time search is achieved with a soft commit (available in Solr 4.0+), which makes changes visible to searches while avoiding the high cost of calling fsync. A hard commit, by contrast, calls fsync and flushes the index data to stable storage so that it can be recovered in the event of a JVM crash.

An optimize, on the other hand, forces all index segments to be merged into a single segment, rewriting the index in the process. It is similar to defragmenting a hard disk: the data is rewritten compactly, and the space held by deleted documents is freed. Normally, index segments are merged over time as specified in the merge policy, but the merge happens immediately when forced using the optimize command.

Let's see how we can use soft commit and optimize in Solr. We'll use our musicCatalog example and create a new core based on it. We'll call the new core musicCatalog-commit; it contains the updated solrconfig.xml. The code for this core can be found in the Chapter 8 code that comes with this book.

Let's index an XML file (sampleAlbumData.xml) into Solr; the file is available in the Chapter 8 code examples provided with this book. We'll send it with a soft commit and then check that the documents are immediately visible in the Solr query browser.

Let's run the following command:

$ cd %BOOK_EXAMPLES%/Chapter-8/example-files
$ curl "http://localhost:8983/solr/musicCatalog-commit/update?softCommit=true" -H "Content-Type: text/xml" --data-binary @sampleAlbumData.xml
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">44</int></lst>
</response>

Now let's index two more files into Solr:

$ curl "http://localhost:8983/solr/musicCatalog-commit/update?softCommit=true" -H "Content-Type: text/xml" --data-binary @sampleAlbumData2.xml

$ curl "http://localhost:8983/solr/musicCatalog-commit/update?softCommit=true" -H "Content-Type: text/xml" --data-binary @sampleAlbumData3.xml

After executing these commands, we can navigate to the Solr query browser by going to http://localhost:8983/solr/#/musicCatalog-commit/query, where we can see the indexed data. At this point, the soft commit has only made the indexed data visible to searches; there is no guarantee yet that the data has been written to persistent storage.
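Once the documents are visible through a soft commit, a hard commit can be issued to persist them, and an optimize can be forced in the same way. The following is a minimal sketch, assuming the same host, port, and core as in the commands above. The commit=true and optimize=true parameters are standard parameters of Solr's update handler; the update_url and run helpers and the DRY_RUN switch are names made up for this illustration. By default, the script only prints the commands; set DRY_RUN=0 to actually send them.

```shell
# Core used in this chapter; adjust host and port for your own setup.
SOLR_CORE_URL="http://localhost:8983/solr/musicCatalog-commit"

# Hypothetical switch: 1 prints the commands, 0 sends them to Solr.
DRY_RUN="${DRY_RUN:-1}"

# Build the update URL for one action: softCommit, commit, or optimize.
update_url() {
  printf '%s/update?%s=true' "$SOLR_CORE_URL" "$1"
}

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hard commit: asks Solr to write pending changes to stable storage.
run curl "$(update_url commit)"

# Optimize: asks Solr to merge the index segments right away.
run curl "$(update_url optimize)"
```

An optimize rewrites the index and can be expensive on a large core, so it is usually run sparingly rather than after every batch of updates.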

In Solr, we can also configure softCommit using the autoSoftCommit element in solrconfig.xml.

In solrconfig.xml, we'll add the following configuration:

<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog class="solr.FSUpdateLog">
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>

  <autoSoftCommit>
    <maxTime>10000</maxTime>
  </autoSoftCommit>
</updateHandler>

The autoSoftCommit element accepts the following config elements:

  • maxTime: The maximum time (in milliseconds) to wait after a document is added before a soft commit is triggered
  • maxDocs: The maximum number of documents that can be added since the last commit before a soft commit is triggered

In the preceding example, maxTime is set to 10 seconds, which tells Solr to perform a soft commit no later than 10 seconds after a document has been added. To test this, we can change maxTime to a higher value and then use the curl command to send a document to Solr without a commit parameter. We'll see that the document won't be available in the Solr query browser until maxTime has elapsed.
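The test described above can also be scripted. The following is a rough sketch, assuming the core exposes the default /select handler with JSON output and that the data file holds documents that are not yet in the index (otherwise the count does not change). The wait_seconds and count_docs helpers and the RUN_CHECK and DATA_FILE variables are hypothetical names introduced for this illustration. Nothing is sent to Solr unless RUN_CHECK=1 is set.

```shell
SOLR_CORE_URL="http://localhost:8983/solr/musicCatalog-commit"
AUTO_SOFT_COMMIT_MS=10000                      # maxTime from solrconfig.xml
DATA_FILE="${DATA_FILE:-sampleAlbumData.xml}"  # assumed to contain new documents

# Convert maxTime (ms) to whole seconds, rounded up, plus one second of slack.
wait_seconds() {
  echo $(( ($1 + 999) / 1000 + 1 ))
}

# Extract the numFound field from a match-all query that returns no rows.
count_docs() {
  curl -s "$SOLR_CORE_URL/select?q=*:*&rows=0&wt=json" | grep -o '"numFound":[0-9]*'
}

if [ "${RUN_CHECK:-0}" = "1" ]; then
  # Send the documents without any commit parameter.
  curl -s "$SOLR_CORE_URL/update" -H "Content-Type: text/xml" --data-binary @"$DATA_FILE"
  echo "right after indexing: $(count_docs)"

  # Wait until the automatic soft commit should have run, then query again.
  sleep "$(wait_seconds "$AUTO_SOFT_COMMIT_MS")"
  echo "after maxTime:        $(count_docs)"
fi
```

If the second count is higher than the first, the automatic soft commit made the new documents visible without an explicit commit request.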

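Now we'll see how we can configure the autoCommit feature, which is available in solr.DirectUpdateHandler2 and triggers hard commits automatically. In the following example, openSearcher is set to false, so each automatic hard commit persists the index without opening a new searcher; making new documents visible is then left to soft commits: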

<autoCommit>
  <!-- maximum number of documents before an autocommit is triggered -->
  <maxDocs>2</maxDocs>

  <!-- maximum time (in MS) after adding a doc before an autocommit is triggered -->
  <maxTime>15000</maxTime>

  <openSearcher>false</openSearcher>
</autoCommit>

Note

For more information on using autoCommit, visit http://wiki.apache.org/solr/UpdateXmlMessages#A.22commit.22_and_.22optimize.22.

Now let's see how we can update specific fields of an indexed document using atomic updates in Solr.
