The Apache Solr 1045 patch provides Solr users a way to build Solr indexes using the MapReduce framework of Apache Hadoop. Once created, this index can be pushed to Solr storage. The following diagram depicts the mapper and reducer in Hadoop:
Each Apache Hadoop mapper transforms input records into a set of (key, value) pairs, which are then transformed into SolrInputDocument objects. The mapper task ends up creating an index from these SolrInputDocument objects.
The reducer's job is to de-duplicate documents across the different indexes and merge the indexes if needed. Once the indexes are created, you can load them into your Solr instance and use them for search. You can read more about this patch at https://issues.apache.org/jira/browse/SOLR-1045.
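To make the mapper/reducer split concrete, the following plain-Java simulation walks a few records through a map step and a de-duplicating reduce step. It is a sketch under stated assumptions, not the patch's code: no Hadoop or SolrJ classes are used, a Map<String, String> stands in for SolrInputDocument, the comma-separated "id,title" record format is hypothetical, and "last record wins" is just one possible de-duplication policy.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of the map -> reduce flow; no Hadoop or SolrJ classes are used.
// A Map<String, String> stands in for SolrInputDocument, and the "id,title"
// record format is a hypothetical example.
class MapReduceIndexSketch {

    // Map phase: one input record becomes a (key, document) pair keyed by the document id.
    static Map.Entry<String, Map<String, String>> map(String record) {
        String[] parts = record.split(",", 2);
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", parts[0].trim());
        doc.put("title", parts.length > 1 ? parts[1].trim() : "");
        return new AbstractMap.SimpleEntry<>(doc.get("id"), doc);
    }

    // Reduce phase: documents that share a key collapse into one entry
    // (de-duplication); the last document seen for a key wins. TreeMap keeps
    // the keys sorted, much as Hadoop sorts map output keys before reducing.
    static Map<String, Map<String, String>> reduce(List<Map.Entry<String, Map<String, String>>> pairs) {
        Map<String, Map<String, String>> index = new TreeMap<>();
        for (Map.Entry<String, Map<String, String>> pair : pairs) {
            index.put(pair.getKey(), pair.getValue());
        }
        return index;
    }

    // Runs every record through map(), then hands all pairs to reduce().
    static Map<String, Map<String, String>> run(List<String> records) {
        List<Map.Entry<String, Map<String, String>>> pairs = new ArrayList<>();
        for (String record : records) {
            pairs.add(map(record));
        }
        return reduce(pairs);
    }
}
```

For example, running MapReduceIndexSketch.run(Arrays.asList("1, Solr", "2, Hadoop", "1, Solr in Action")) leaves two documents, and document 1 keeps the title "Solr in Action" because it was seen last.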
The patch follows the standard SVN patching process. To apply the patch, you first need to build your Solr instance from source, using a Solr version that the Solr 1045 patch supports. Next, download the patch from the Apache JIRA site (https://issues.apache.org/jira/secure/attachment/12401278/SOLR-1045.0.patch). Before applying the patch, do a dry run, which checks whether the patch applies cleanly without actually changing any files. You can do this with the following commands:
cd <solr-trunk-dir>
svn patch <name-of-patch> --dry-run
If the dry run completes without any failures, you can apply the patch directly. You can also perform the dry run with the standard patch utility:
patch -p0 --dry-run -i <name-of-patch>
If the dry run succeeds, run the same command without the --dry-run option to apply the patch. On Windows, you can apply the patch with a right-click, using an SVN client such as TortoiseSVN.
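The "dry run first, apply only if it succeeds" rule can be automated. The following is a minimal Java sketch that assumes GNU patch is on the PATH, that it runs from the Solr source checkout, and that the patch file uses SVN-style paths (hence -p0). The CommandRunner interface is a hypothetical seam added so the workflow can be exercised without starting real processes.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Sketch of the "dry run first, apply only if it succeeds" workflow using GNU patch.
// Assumptions: patch is on the PATH, the current directory is the Solr source
// checkout, and the patch file uses SVN-style paths (hence -p0).
class PatchApplier {

    // Hypothetical seam so the workflow can be tested without running processes.
    interface CommandRunner {
        int run(List<String> command);
    }

    // Builds "patch -p0 [--dry-run] -i <patchFile>".
    static List<String> patchCommand(String patchFile, boolean dryRun) {
        List<String> cmd = new ArrayList<>();
        cmd.add("patch");
        cmd.add("-p0");
        if (dryRun) {
            cmd.add("--dry-run");
        }
        cmd.add("-i");
        cmd.add(patchFile);
        return cmd;
    }

    // GNU patch exits with 0 only if every hunk applies, so a non-zero dry-run
    // status means the real run is skipped. Returns true if the patch was applied.
    static boolean applyIfDryRunSucceeds(String patchFile, CommandRunner runner) {
        if (runner.run(patchCommand(patchFile, true)) != 0) {
            return false; // leave the checkout untouched
        }
        return runner.run(patchCommand(patchFile, false)) == 0;
    }

    // Default runner: starts the command, shows its output on this console,
    // and returns its exit status (-1 if it could not be run).
    static int runProcess(List<String> command) {
        try {
            return new ProcessBuilder(command).inheritIO().start().waitFor();
        } catch (IOException e) {
            return -1;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        }
    }
}
```

A real invocation would be PatchApplier.applyIfDryRunSucceeds("SOLR-1045.0.patch", PatchApplier::runProcess). Relying on the exit status rather than parsing output keeps the check simple, since GNU patch documents 0 as "all hunks applied".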
On Linux, you can use the svn patch command shown earlier.
Let's look at some of the important classes in the patch. The SolrIndexUpdateMapper class is responsible for creating new indexes from the input documents. The SolrXMLDocRecordReader class reads Solr input XML files for indexing. The SolrIndexUpdater class is responsible for creating a MapReduce job, running it to read the documents, and updating the Solr instance.
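To show what reading "Solr input XML files" involves, here is a sketch that turns a Solr XML update message (<add><doc><field name="...">...</field></doc></add>) into one record per <doc>. It is not the patch's SolrXMLDocRecordReader: it uses the JDK's built-in DOM parser, loads the whole message into memory (a Hadoop record reader would instead stream records from its input split), uses a Map as a stand-in for SolrInputDocument, and does not handle multi-valued fields.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Illustrative parser for the Solr XML update format; not the patch's record reader.
// Each <doc> becomes a Map of field name -> value (a stand-in for SolrInputDocument).
// A repeated field name overwrites the earlier value (no multi-valued support).
class SolrXmlDocParser {

    static List<Map<String, String>> parse(String xml) {
        try {
            Document dom = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<Map<String, String>> docs = new ArrayList<>();
            NodeList docNodes = dom.getElementsByTagName("doc");
            for (int i = 0; i < docNodes.getLength(); i++) {
                Element docElement = (Element) docNodes.item(i);
                Map<String, String> fields = new LinkedHashMap<>();
                NodeList fieldNodes = docElement.getElementsByTagName("field");
                for (int j = 0; j < fieldNodes.getLength(); j++) {
                    Element field = (Element) fieldNodes.item(j);
                    fields.put(field.getAttribute("name"), field.getTextContent());
                }
                docs.add(fields);
            }
            return docs;
        } catch (Exception e) {
            // Parser configuration, I/O, and malformed-XML errors all surface here.
            throw new IllegalArgumentException("Could not parse Solr XML input", e);
        }
    }
}
```

Each returned map would then be handed to the mapper logic, in the same way that each parsed record becomes a SolrInputDocument in the flow described earlier.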
This patch also lets users merge the indexes in the reducer phase. The patch is not yet part of a Solr release, but it is targeted for the Solr 4.9/5.0 release.