Generational indexing with TrackingIndexWriter

A generation is analogous to a version in a revision control system. In TrackingIndexWriter, when the index changes, a new generation is created that can be used to open the index as of that particular point in time. TrackingIndexWriter is a wrapper class for IndexWriter. It provides the corresponding addDocument, updateDocument, and deleteDocuments methods to keep track of index changes. Each update returns a long value reflecting the current index generation. This value can be used to acquire an IndexSearcher that includes all the updates up to that specific point (generation).

TrackingIndexWriter is intended to run alongside ControlledRealTimeReopenThread, a utility class that runs as a separate thread and manages the periodic reopening of the IndexSearcher. Its constructor accepts the TrackingIndexWriter and a SearcherManager. The generation value returned from TrackingIndexWriter can then be used to tell ControlledRealTimeReopenThread to reopen the index to that specific generation.

The relationship between components can be seen in the following diagram:

[Diagram: Generational indexing with TrackingIndexWriter]

These utility classes are used in conjunction with SearcherManager and its periodic index refreshes. The ControlledRealTimeReopenThread class provides the facility to refresh an index to a specific generation, which guarantees the inclusion of certain changes (say, for a particular user after an update), while SearcherManager can trigger regular refreshes to maintain general IndexSearcher freshness.
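As a mental model only, the relationship between updates, generations, and reopens can be sketched with plain JDK classes. This is a toy, not Lucene code: the class GenerationModel and its methods addDocument, refresh, isSearchable, and search are hypothetical names, and Lucene's actual bookkeeping may differ in detail. The sketch assumes that an update is stamped with the current generation and that a reopen makes every update stamped so far searchable.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of generational indexing. NOT Lucene code: every name here is hypothetical.
final class GenerationModel {
    private final List<String> pending = new ArrayList<>();    // written but not yet searchable
    private final List<String> searchable = new ArrayList<>(); // what a "searcher" can see
    private long indexingGen = 1;  // generation stamped on new updates
    private long searchingGen = 0; // newest generation covered by the last refresh

    // Apply an update and return the generation it belongs to.
    synchronized long addDocument(String doc) {
        pending.add(doc);
        return indexingGen;
    }

    // Model of a reopen: close the current generation, then expose everything written so far.
    synchronized void refresh() {
        long refreshedGen = indexingGen++;
        searchable.addAll(pending);
        pending.clear();
        searchingGen = refreshedGen;
    }

    // True once a refresh has covered the given generation.
    synchronized boolean isSearchable(long generation) {
        return searchingGen >= generation;
    }

    // Snapshot of the documents currently visible to searches.
    synchronized List<String> search() {
        return new ArrayList<>(searchable);
    }

    public static void main(String[] args) {
        GenerationModel model = new GenerationModel();
        long gen = model.addDocument("first post");
        System.out.println("searchable before refresh: " + model.isSearchable(gen));
        model.refresh();
        System.out.println("searchable after refresh: " + model.isSearchable(gen));
    }
}
```

In this model, the value returned by addDocument plays the role of the long returned by TrackingIndexWriter, and refresh stands in for the reopen that ControlledRealTimeReopenThread performs on its own thread.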

How to do it…

Here is sample code that uses TrackingIndexWriter:

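// Lucene 4.x imports; indexWriter and doc are assumed to be created elsewhere
import org.apache.lucene.index.TrackingIndexWriter;
import org.apache.lucene.search.ControlledRealTimeReopenThread;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.SearcherFactory;
import org.apache.lucene.search.SearcherManager;

SearcherManager searcherManager = new SearcherManager(indexWriter, true, new SearcherFactory());
TrackingIndexWriter trackingIndexWriter = new TrackingIndexWriter(indexWriter);
// 5 = target maximum staleness in seconds, 0.001 = target minimum staleness in seconds
ControlledRealTimeReopenThread<IndexSearcher> controlledRealTimeReopenThread =
    new ControlledRealTimeReopenThread<IndexSearcher>(trackingIndexWriter, searcherManager, 5, 0.001f);
controlledRealTimeReopenThread.start();

long indexGeneration = 0;

// add documents to index here

indexGeneration = trackingIndexWriter.addDocument(doc);

// block until a searcher covering this generation is available
controlledRealTimeReopenThread.waitForGeneration(indexGeneration);
IndexSearcher indexSearcher = searcherManager.acquire();
try {
    // perform search here
} finally {
    // release takes the searcher that was acquired
    searcherManager.release(indexSearcher);
}
indexWriter.commit();

// add more documents to index here

indexGeneration = trackingIndexWriter.addDocument(doc);

controlledRealTimeReopenThread.waitForGeneration(indexGeneration);
indexSearcher = searcherManager.acquire();
try {
    // perform another search here
} finally {
    searcherManager.release(indexSearcher);
}

// stop the reopen thread, then commit the remaining changes
controlledRealTimeReopenThread.close();
indexWriter.commit();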

How it works...

Here, we instantiated a SearcherManager and a TrackingIndexWriter, both built on the same IndexWriter. Then, we passed both of these objects into ControlledRealTimeReopenThread to instantiate it. Note that we keep a long value, indexGeneration, to store the generation value returned by each index update (for example, addDocument). We call ControlledRealTimeReopenThread's waitForGeneration method with indexGeneration. This tells the thread to refresh the index to the specified generation and blocks the caller until that refresh has happened. When we then call searcherManager.acquire(), the returned IndexSearcher includes all the changes up to the specified generation. You can also pass a maximum wait time (in milliseconds) to waitForGeneration so that the call waits only up to the specified duration instead of indefinitely. If the time is up and the requested generation is not yet available, the currently opened index is left as it is.
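The bounded wait described above can be pictured with a small JDK-only sketch. It is an illustration of the idea, not Lucene's implementation: GenerationGate, publish, and the variable names are hypothetical. In Lucene itself, the corresponding call is the two-argument waitForGeneration(targetGen, maxMS), which returns a boolean indicating whether the generation became searchable in time.

```java
import java.util.concurrent.TimeUnit;

// Toy model of waiting for a generation with a timeout. NOT Lucene code; names are hypothetical.
final class GenerationGate {
    private long searchingGen = 0; // newest generation known to be searchable

    // Called by the (hypothetical) refresher once a reopen covering `gen` has completed.
    synchronized void publish(long gen) {
        if (gen > searchingGen) {
            searchingGen = gen;
            notifyAll(); // wake every caller blocked in waitForGeneration
        }
    }

    // Blocks until `targetGen` is searchable or `maxMs` milliseconds have elapsed.
    // Returns true if the generation became searchable, false on timeout.
    synchronized boolean waitForGeneration(long targetGen, long maxMs) throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(maxMs);
        while (searchingGen < targetGen) {
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return false; // timed out; whatever is currently searchable stays in place
            }
            TimeUnit.NANOSECONDS.timedWait(this, remainingNanos);
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        GenerationGate gate = new GenerationGate();
        // Simulated refresher thread that makes generation 1 searchable shortly after start.
        Thread refresher = new Thread(() -> gate.publish(1));
        refresher.start();
        System.out.println("generation 1 ready: " + gate.waitForGeneration(1, 1000));
        refresher.join();
    }
}
```

A caller that gets false back can either search the currently opened index anyway, accepting that the latest change may be missing, or report that the change is not yet visible.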

This mechanism is useful when search results must reflect a specific point in time, for example, a search issued right after a user submits a post to a forum. It is especially useful when certain changes must be guaranteed to appear in the search results.
