The contrib folder contains other modules and plugins, briefly described in the following sections.
The clustering module is a framework for plugging in third-party clustering implementations. At the time of writing, it supports clustering search results using the Carrot2 project.
The Solr example that comes with the download bundle already declares a ClusteringComponent in the solrconfig.xml configuration file. The declaration happens in two phases. First, the component has to be configured:
<searchComponent name="clustering"
                 enable="${solr.clustering.enabled:false}"
                 class="solr.clustering.ClusteringComponent">
  <lst name="engine">
    <str name="name">lingo</str>
    <str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>
    <str name="carrot.resourcesDir">clustering/carrot2</str>
  </lst>
  …
</searchComponent>
After this, as with any other SearchComponent, you enable it by including its name in the RequestHandler instance where it is supposed to run:
<requestHandler name="/myRequestHandler" class="solr.SearchHandler">
…
<arr name="last-components">
<str>clustering</str>
</arr>
</requestHandler>
In this way, it can contribute to search results by adding a "clusters" section, like this:
<response>
  <result>
    …
  </result>
  <arr name="clusters">
    <arr name="labels">
      <str>iPod</str>
    </arr>
    <double name="score">1.3174612693376382</double>
    <arr name="docs">
      <str>F8V7067-APL-KIT</str>
      <str>IW-02</str>
      …
    </arr>
    <arr name="labels">
      <str>Hard Drive</str>
    </arr>
    …
  </arr>
</response>
If you want to try this yourself, open a shell and type the following commands:
# cd $INSTALL_DIR/example
# java -Dsolr.clustering.enabled=true -jar start.jar
These commands start Solr with the ClusteringComponent enabled. Now, in another shell, type this:
# cd $INSTALL_DIR/example/exampledocs
# ./post.sh *.xml
Finally, open a browser and execute this query:
http://localhost:8983/solr/clustering?q=*:*&rows=10
You should get a response similar to the preceding example, with the "clusters" section at the bottom.
The UIMA module integrates Apache UIMA into Solr, providing a powerful metadata extraction library that can be used for tasks such as automatic keyword extraction and named-entity recognition (for example, places, names, concepts, and dates).
The plugin can be used either as an UpdateRequestProcessor subclass, to decorate the indexing chain, or as a set of Tokenizers/Filters, to add such behavior in the (index-time or query-time) text analysis phase.
Using this module, you can enrich your Solr documents with additional metadata extracted from the input data. UIMA provides an analysis engine consisting of several components arranged in a pipeline. The default pipeline supports existing analysis engines such as Alchemy or OpenCalais. Keep in mind that these engines are not free of charge, although they offer a free trial period: you can register and obtain an API key, which must then be configured in the solrconfig.xml file. Other components in the pipeline handle language and sentence detection.
The UIMA UpdateRequestProcessor intercepts documents as they are being indexed and sends them through its analysis pipeline. Those documents are automatically enriched with extracted information such as sentences, languages, or named entities (for example, places or names).
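As a sketch of the first integration mode, the UIMA processor can be wired into an update chain in solrconfig.xml roughly as follows. The factory class name comes from the UIMA contrib module; the parameter key, API key placeholder, and the elided field mappings are illustrative and depend on which analysis engines you enable:

```
<updateRequestProcessorChain name="uima">
  <processor class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory">
    <lst name="uimaConfig">
      <lst name="runtimeParameters">
        <!-- API key for the external engine (illustrative placeholder) -->
        <str name="keyword_apikey">YOUR_API_KEY</str>
      </lst>
      <!-- fields to analyze and field mappings for the extracted metadata -->
      …
    </lst>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

A request handler then references this chain through its update.chain parameter, so that every document posted to that handler passes through the UIMA pipeline before being indexed.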
The MapReduce contrib module provides integration with Apache Hadoop. MapReduce is a programming model, implemented in Apache Hadoop, for processing large datasets with a parallel, distributed algorithm.
The contribution contains a MapReduce job to build Solr indexes and merge them into a Solr cluster.
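The job is typically launched through Hadoop's command line. The following is a hedged sketch of an invocation of the contrib's MapReduceIndexerTool; the jar name, paths, ZooKeeper address, and collection name are placeholders that depend on your installation:

```
# hadoop jar solr-map-reduce-*.jar \
    org.apache.solr.hadoop.MapReduceIndexerTool \
    --zk-host zkhost:2181/solr \
    --collection collection1 \
    --output-dir hdfs://namenode/outdir \
    --go-live \
    hdfs://namenode/indir
```

The --go-live option asks the tool to merge the freshly built index shards into the live Solr cluster identified by the --zk-host address once the MapReduce job completes.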