Using the check index functionality

It's the middle of the night, the phone rings, you answer it and hear: "We've got a problem: the index is corrupted, nothing works, and the apocalypse is coming." What can we do? Is there anything besides full reindexing or restoring from a backup? There is, and this recipe will show you what.

How to do it...

For the purpose of this recipe, let's suppose that we have a corrupted index that we want to check and fix. To do this, we will use Lucene's CheckIndex class and point it at the index we want to fix by running a command similar to the following one:

java -cp LUCENE_JAR_LOCATION -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex INDEX_PATH -fix

Here, INDEX_PATH is the path to the index (for example, /usr/share/solr/data/index) and LUCENE_JAR_LOCATION is the path to the Lucene core JAR library, which is provided with the Solr distribution. So, for the example index location, the command looks as follows:

java -cp lucene-core-4.10.0.jar -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex /usr/share/solr/data/index -fix
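If you run this command often, it can be wrapped in a small bash function so that the JAR location and index path are passed as arguments and -fix is only added when you explicitly ask for it. This is a minimal sketch: the function name run_checkindex and the CHECKINDEX_DRY_RUN variable are hypothetical helpers, not part of Lucene or Solr. The Java options are exactly the ones used in the command above.

```shell
# Hypothetical wrapper around the CheckIndex command (bash).
# Usage: run_checkindex LUCENE_JAR_LOCATION INDEX_PATH [--fix]
run_checkindex() {
  local jar="$1" index="$2" mode="${3:-}"
  if [ ! -d "$index" ]; then
    echo "index directory not found: $index" >&2
    return 2
  fi
  # Same options as the manual command: classpath, assertions enabled for
  # org.apache.lucene and its subpackages, then the index path.
  local cmd=(java -cp "$jar" -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex "$index")
  # Append -fix only when it was explicitly requested.
  if [ "$mode" = "--fix" ]; then
    cmd+=(-fix)
  fi
  # Hypothetical dry-run switch: print the command instead of running it.
  if [ "${CHECKINDEX_DRY_RUN:-0}" = "1" ]; then
    echo "${cmd[*]}"
    return 0
  fi
  "${cmd[@]}"
}
```

With this function loaded, run_checkindex lucene-core-4.10.0.jar /usr/share/solr/data/index --fix runs the same command as above, and setting CHECKINDEX_DRY_RUN=1 lets you inspect the command before letting it touch the index.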

After running the preceding command, you should see a series of messages describing the index check and repair process, which in my case looked as follows:

Opening index @ /usr/share/solr/data/index

Segments file=segments_2 numSegments=1 version=4.10.0 format= userData={commitTimeMSec=1395237413525}
  1 of 1: name=_0 docCount=11
    version=4.7.0
    codec=Lucene46
    compound=false
    numFiles=10
    size (MB)=0.002
    diagnostics = {os=Windows 8.1, java.vendor=Oracle Corporation, java.version=1.8.0, lucene.version=4.10.0 1570806 - simon - 2014-12-22 08:25:23, os.arch=amd64, source=flush, os.version=6.3, timestamp=1395237413563}
    no deletions
    test: open reader.........FAILED
    WARNING: fixIndex() would remove reference to this segment; full exception:
java.io.IOException: Invalid vInt detected (too many bits)
        at org.apache.lucene.store.DataInput.readVInt(DataInput.java:138)
        at org.apache.lucene.store.DataInput.readString(DataInput.java:232)
        at org.apache.lucene.store.DataInput.readStringStringMap(DataInput.java:263)
        at org.apache.lucene.codecs.lucene46.Lucene46FieldInfosReader.read(Lucene46FieldInfosReader.java:93)
        at org.apache.lucene.index.SegmentReader.readFieldInfos(SegmentReader.java:289)
        at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:107)
        at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:583)
        at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:2096)

WARNING: 1 broken segments (containing 11 documents) detected
WARNING: 11 documents will be lost

NOTE: will write new segments file in 5 seconds; this will remove 11 docs from the index. THIS IS YOUR LAST CHANCE TO CTRL+C!
  5...
  4...
  3...
  2...
  1...
Writing...
OK
Wrote new segments file "segments_3"

And that's all. After this, the index has been processed and, depending on the kind of damage, it may have been repaired. Now, let's see how it works.

How it works...

As you can see, the command runs the CheckIndex class from the org.apache.lucene.index package. We passed the Lucene core library that contains the necessary classes (using the -cp option), the absolute path to the directory that holds the index files, and the -fix parameter, which tells the CheckIndex tool to try to repair any errors it finds in the index structure. In addition, we passed the -ea option to enable Java assertions in the Lucene packages, which makes the check more thorough.

Let's take a look at the response that the CheckIndex tool produced. It gives us information about the segments, the number of documents, and the version of Lucene used to build the index. We can also see the number of files the index consists of, the operating system, and so on. This information might be useful, but it is not crucial. The most interesting part for us is the following:

WARNING: 1 broken segments (containing 11 documents) detected
WARNING: 11 documents will be lost

This tells us that the CheckIndex tool found one broken segment containing 11 documents, and that all 11 documents will be lost during the repair. Losing documents does not always happen, but it can, and you should be aware of that.
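When CheckIndex runs from a script, these two warning lines are the ones worth extracting. The following bash function is a minimal sketch that reads CheckIndex output on standard input and reports both numbers. The function name summarize_checkindex is hypothetical, and the patterns match only the two message formats shown above; other Lucene versions may word them differently.

```shell
# Hypothetical helper (bash + sed): summarize the "broken segments" and
# "documents will be lost" warnings from CheckIndex output on stdin.
summarize_checkindex() {
  local text broken lost
  text=$(cat)
  # "WARNING: 1 broken segments (containing 11 documents) detected" -> 1
  broken=$(printf '%s\n' "$text" | sed -n 's/^WARNING: \([0-9][0-9]*\) broken segments.*/\1/p')
  # "WARNING: 11 documents will be lost" -> 11
  lost=$(printf '%s\n' "$text" | sed -n 's/^WARNING: \([0-9][0-9]*\) documents will be lost.*/\1/p')
  # Missing warnings are reported as zero.
  echo "broken_segments=${broken:-0} lost_docs=${lost:-0}"
}
```

For example, piping the output shown earlier in this recipe through summarize_checkindex prints broken_segments=1 lost_docs=11.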

The next lines of the response describe writing the new segments file (segments_3 in our case), which no longer references the broken segment. And that's actually all. Of course, with larger indexes, the response generated by the CheckIndex tool will be much longer and will contain information about every segment of the index. The preceding example is simple, but it illustrates how the tool works.
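After a repair, you may want to confirm which segments file is now the current one (segments_2 before the repair and segments_3 after it, in the example above). Lucene uses the segments_N file with the highest generation N, and it writes N in base 36, so segments_a (generation 10) is newer than segments_9. The bash function below is a hypothetical helper, not part of Lucene, that prints the highest-generation segments file in a directory.

```shell
# Hypothetical helper (bash): print the segments_N file with the highest
# generation in an index directory. N is written in base 36 by Lucene.
latest_segments_file() {
  local dir="$1" f gen best="" best_gen=-1
  for f in "$dir"/segments_*; do
    [ -e "$f" ] || continue             # the glob matched nothing
    gen=${f##*/segments_}               # strip directory and prefix
    [[ $gen =~ ^[0-9a-z]+$ ]] || continue  # skip unexpected names
    # Bash arithmetic accepts base#digits, so 36#a evaluates to 10.
    if (( 36#$gen > best_gen )); then
      best_gen=$((36#$gen))
      best=${f##*/}
    fi
  done
  # Return a non-zero status when no segments_N file was found.
  [ -n "$best" ] && echo "$best"
}
```

Running latest_segments_file /usr/share/solr/data/index after the repair above should print segments_3.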

Note

Note that you should stop Solr, and make sure that no other process accesses the index, while the CheckIndex tool is running.

When using the CheckIndex tool, you need to be very careful. There are many situations in which the index files can't be repaired, and running the CheckIndex tool results in all the documents being deleted from the index. That's not always the case, but you should be aware of it and be extra careful. For example, a good practice is to make a backup of the existing index before running the CheckIndex tool.
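The backup itself can be as simple as copying the index directory while Solr is stopped. The bash function below is a minimal sketch of that idea. The name backup_index and the naming scheme for the copy (the index path plus a .backup-TIMESTAMP suffix) are hypothetical choices made for this example.

```shell
# Hypothetical helper (bash): copy the whole index directory to a
# timestamped sibling directory before running CheckIndex with -fix.
# Stop Solr first so that no index files change during the copy.
backup_index() {
  local index="${1%/}" stamp dest
  if [ ! -d "$index" ]; then
    echo "index directory not found: $index" >&2
    return 1
  fi
  stamp=$(date +%Y%m%d-%H%M%S)
  dest="${index}.backup-${stamp}"
  if [ -e "$dest" ]; then
    echo "backup target already exists: $dest" >&2
    return 1
  fi
  # -R copies recursively; -p preserves timestamps and permissions.
  cp -Rp "$index" "$dest" || return 1
  # Print the backup location so the caller can restore from it.
  echo "$dest"
}
```

Calling backup_index /usr/share/solr/data/index prints the location of the copy; if the repair loses more than you can accept, stop Solr again and put that copy back in place of the index directory.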

There's more...

There is one more thing worth noting about the CheckIndex tool.

Checking the index without the repair procedure

If you only want to check the index for errors without repairing it, you can run the CheckIndex tool in check-only mode. To do this, run the command shown earlier in the recipe without the -fix part. For example:

java -cp lucene-core-4.10.0.jar -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex /usr/share/solr/data/index
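Because this mode does not modify the index, it is a good candidate for regular, scripted checks. The sketch below assumes, based on the Lucene 4.x sources, that CheckIndex exits with status 0 when no problems were detected and with a non-zero status otherwise; please verify this for the Lucene version you use. The function name check_index_only is hypothetical, and it takes the complete command as its arguments so that the same logic works with any classpath or index path.

```shell
# Hypothetical helper (bash): run a check-only command (no -fix), keep its
# output in a temporary log, and turn the exit status into a verdict.
# Assumption: exit status 0 means the index is clean.
check_index_only() {
  local log rc
  log=$(mktemp) || return 2
  # Capture the exit status without aborting under "set -e".
  "$@" > "$log" 2>&1 && rc=0 || rc=$?
  if [ "$rc" -eq 0 ]; then
    echo "index OK"
    rm -f "$log"
    return 0
  fi
  # Keep the log so the problems can be inspected.
  echo "index has problems (exit status $rc), see $log"
  return "$rc"
}
```

A typical invocation passes the check-only command shown above: check_index_only java -cp lucene-core-4.10.0.jar -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex /usr/share/solr/data/index.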