It's night; the phone rings, you answer it and hear: "We've got a problem: the index is corrupted, nothing works, the apocalypse is coming". What can we do? Is there anything besides a full reindex or restoring from backup? There is, and this recipe will show you.
For the purpose of this recipe, let's suppose that we have a corrupted index that we want to check and fix. We will use the CheckIndex class, which needs to be pointed at the index we want to fix. To do this, we run a command similar to the following one:
java -cp LUCENE_JAR_LOCATION -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex INDEX_PATH -fix
Here, INDEX_PATH is the path to the index, for example, /usr/share/solr/data/index, and LUCENE_JAR_LOCATION is the path to the Lucene core JAR library (which is provided with the Solr distribution). So, with the given index location, the command will look as follows:
java -cp lucene-core-4.10.0.jar -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex /usr/share/solr/data/index -fix
After running the preceding command, you should see a series of messages about the process of index repair, which in my case looked as follows:
Opening index @ /usr/share/solr/data/index

Segments file=segments_2 numSegments=1 version=4.10.0 format= userData={commitTimeMSec=1395237413525}
  1 of 1: name=_0 docCount=11
    version=4.7.0
    codec=Lucene46
    compound=false
    numFiles=10
    size (MB)=0.002
    diagnostics = {os=Windows 8.1, java.vendor=Oracle Corporation, java.version=1.8.0, lucene.version=4.10.0 1570806 - simon - 2014-12-22 08:25:23, os.arch=amd64, source=flush, os.version=6.3, timestamp=1395237413563}
    no deletions
    test: open reader.........FAILED
    WARNING: fixIndex() would remove reference to this segment; full exception:
java.io.IOException: Invalid vInt detected (too many bits)
    at org.apache.lucene.store.DataInput.readVInt(DataInput.java:138)
    at org.apache.lucene.store.DataInput.readString(DataInput.java:232)
    at org.apache.lucene.store.DataInput.readStringStringMap(DataInput.java:263)
    at org.apache.lucene.codecs.lucene46.Lucene46FieldInfosReader.read(Lucene46FieldInfosReader.java:93)
    at org.apache.lucene.index.SegmentReader.readFieldInfos(SegmentReader.java:289)
    at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:107)
    at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:583)
    at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:2096)

WARNING: 1 broken segments (containing 11 documents) detected
WARNING: 11 documents will be lost

NOTE: will write new segments file in 5 seconds; this will remove 11 docs from the index. THIS IS YOUR LAST CHANCE TO CTRL+C!
  5...
  4...
  3...
  2...
  1...
Writing...
OK
Wrote new segments file "segments_3"
And that's all. The index has now been processed and, depending on the kind of damage, it may have been repaired. Now, let's see how it works.
As you see, the command-line instruction runs the CheckIndex class from the org.apache.lucene.index package. We also provided the library that contains the necessary classes (through -cp), the absolute path to the directory that contains the index files, and the -fix parameter, which tells the CheckIndex tool to try to repair any errors found in the index structure. In addition to this, we provided the -ea parameter to enable assertions in the Lucene classes. We did this to make the check more thorough.
Let's take a look at the response that the CheckIndex tool provided. As you can see, we have information about the segments, the number of documents, and the version of Lucene used to build the index. We can also see the number of files that the index consists of, the operating system, and so on. This information might be useful but it is not crucial. The most interesting thing for us is the following information:
WARNING: 1 broken segments (containing 11 documents) detected
WARNING: 11 documents will be lost
This information tells us that the CheckIndex tool found one broken segment that contains 11 documents, and that all 11 documents will be lost in the repair process. This is not always the case, but it can happen and you should be aware of that.
The next lines of the CheckIndex tool response tell us about writing the new segments file (segments_3 in our case), which no longer references the broken segment. And that's actually all. Of course, when dealing with larger indexes, the response generated by the CheckIndex tool will be much larger and will contain information about all the segments of the index. The preceding example is simple but it should illustrate how the tool works.
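On a large index the two WARNING lines are easy to miss in the output. The following is a minimal sketch of pulling them out with awk; the function name summarize_checkindex is hypothetical, and the sketch assumes each warning is printed on its own line in the same wording as in the response shown in this recipe.

```shell
#!/bin/sh
# summarize_checkindex (hypothetical name) reads CheckIndex output on stdin
# and prints "<broken segments> <documents lost>".
# Assumption: the warnings look exactly like the ones shown in this recipe.
# If neither warning is present, it prints "0 0".
summarize_checkindex() {
  awk '
    /WARNING: [0-9]+ broken segments/        { broken = $2 }
    /WARNING: [0-9]+ documents will be lost/ { lost = $2 }
    END { printf "%d %d\n", broken, lost }
  '
}
```

You could pipe the output of a CheckIndex run into this function to get a short summary before deciding whether a repair is acceptable.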
When using the CheckIndex tool, you need to be very careful. There are many situations where the index files can't be repaired and running the CheckIndex tool with -fix will result in the deletion of all the documents in the index. That's not always the case, but you should be aware of it and be extra careful. For example, a good practice is to make a backup of the existing index before running the CheckIndex tool.
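The backup mentioned above can be as simple as copying the index directory. Here is a minimal sketch, assuming Solr is stopped (or at least not writing to the index) while the copy is made; the function name backup_index and the backup directory layout are hypothetical and not part of Solr or Lucene.

```shell
#!/bin/sh
# backup_index (hypothetical name): copy an index directory before repair.
# Usage: backup_index INDEX_PATH BACKUP_ROOT
# Copies INDEX_PATH to BACKUP_ROOT/index-backup-<timestamp> and prints that path.
# Assumption: nothing is writing to the index while the copy runs.
backup_index() {
  index_path=$1
  backup_root=$2
  if [ ! -d "$index_path" ]; then
    echo "Index directory not found: $index_path" >&2
    return 1
  fi
  target="$backup_root/index-backup-$(date +%Y%m%d%H%M%S)"
  # -R copies the directory recursively, -p preserves timestamps and modes
  mkdir -p "$backup_root" && cp -Rp "$index_path" "$target" || return 1
  echo "$target"
}
```

For the index used in this recipe, the call could look like backup_index /usr/share/solr/data/index /some/backup/dir, where the second path is whatever location you have enough space in.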
There is one more thing worth noting about the CheckIndex tool. If you only want to check the index for errors without repairing it, you can run the CheckIndex tool without the repair mode. To do this, run the command shown in the recipe without the -fix part. For example:
java -cp lucene-core-4.10.0.jar -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex /usr/share/solr/data/index
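A check-only run also fits nicely into a script. The sketch below wraps the command in a function and reacts to its exit status. It relies on the assumption that CheckIndex exits with a non-zero status when the index is not clean, which you should verify for your Lucene version. The names check_index, LUCENE_JAR, and JAVA_BIN are hypothetical.

```shell
#!/bin/sh
# Assumed locations; adjust them to your installation.
LUCENE_JAR=${LUCENE_JAR:-lucene-core-4.10.0.jar}
JAVA_BIN=${JAVA_BIN:-java}

# check_index (hypothetical name): run CheckIndex WITHOUT -fix, so nothing is modified.
# Usage: check_index INDEX_PATH
# Assumption: a non-zero exit status means the index is not clean.
check_index() {
  if "$JAVA_BIN" -cp "$LUCENE_JAR" -ea:org.apache.lucene... \
       org.apache.lucene.index.CheckIndex "$1"; then
    echo "index is clean"
  else
    echo "problems found; back up the index before considering -fix"
    return 1
  fi
}
```

Keeping the check and the repair as separate steps means that the destructive -fix run only happens after someone has looked at the report.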