Other extensions

The contrib folder contains other modules and plugins, which are briefly described in the following sections.

Clustering

The clustering module is a framework used to plug in third-party (clustering) implementations. At the time of writing this book, it provides support for clustering search results using the Carrot2 project.

The Solr example that comes with the download bundle already contains a ClusteringComponent within the solrconfig.xml configuration file. The declaration happens in two phases. First, the component has to be configured:

<searchComponent 
  name="clustering"
  enable="${solr.clustering.enabled:false}"
  class="solr.clustering.ClusteringComponent" >
  <lst name="engine">
    <str name="name">lingo</str>
    <str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>
    <str name="carrot.resourcesDir">clustering/carrot2</str>
  </lst>
  …
</searchComponent>

After this, as with any other SearchComponent, you enable it by adding its name to the RequestHandler instance in which it should run:

<requestHandler name="/myRequestHandler" class="solr.SearchHandler">
  …
  <arr name="last-components">
    <str>clustering</str>
  </arr>
</requestHandler>

In this way, it can contribute to search results by adding a "clusters" section, like this:

<response>
  <result>
    …
  </result>
  <arr name="clusters">
    <lst>
      <arr name="labels">
        <str>iPod</str>
      </arr>
      <double name="score">1.3174612693376382</double>
      <arr name="docs">
        <str>F8V7067-APL-KIT</str>
        <str>IW-02</str>
        …
      </arr>
    </lst>
    <lst>
      <arr name="labels">
        <str>Hard Drive</str>
      </arr>
      …
    </lst>
  </arr>
</response>

If you want to try this yourself, open a shell and type the following commands:

# cd $INSTALL_DIR/example
# java -Dsolr.clustering.enabled=true -jar start.jar

These commands start Solr with the ClusteringComponent enabled. Now, in another shell, type this:

# cd $INSTALL_DIR/example/exampledocs
# ./post.sh *.xml

Finally, open a browser and execute this query:

http://localhost:8983/solr/clustering?q=*:*&rows=10

You should get a response similar to the preceding example, with the "clusters" section at the bottom.
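If you want to consume the clusters from a client program, the following minimal Python sketch shows one way to read them from an XML response. It assumes that each cluster is wrapped in its own <lst> element inside the "clusters" array. The sample response is a trimmed, hypothetical one that reuses the labels and document IDs shown earlier, and the parse_clusters function name is an illustrative choice, not part of Solr.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down response used only for illustration.
SAMPLE_RESPONSE = """<response>
  <result/>
  <arr name="clusters">
    <lst>
      <arr name="labels"><str>iPod</str></arr>
      <double name="score">1.3174612693376382</double>
      <arr name="docs">
        <str>F8V7067-APL-KIT</str>
        <str>IW-02</str>
      </arr>
    </lst>
  </arr>
</response>"""


def parse_clusters(xml_text):
    """Return one dict per cluster with its labels, score, and document IDs."""
    root = ET.fromstring(xml_text)
    clusters = []
    # Assumption: every cluster is a <lst> child of <arr name="clusters">.
    for lst in root.findall("./arr[@name='clusters']/lst"):
        score = lst.find("./double[@name='score']")
        clusters.append({
            "labels": [s.text for s in lst.findall("./arr[@name='labels']/str")],
            "score": float(score.text) if score is not None else None,
            "docs": [s.text for s in lst.findall("./arr[@name='docs']/str")],
        })
    return clusters


if __name__ == "__main__":
    for cluster in parse_clusters(SAMPLE_RESPONSE):
        print(cluster["labels"], cluster["score"], cluster["docs"])
```

In a real client, you would pass the body returned by the clustering request handler to parse_clusters instead of the embedded sample.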

UIMA Metadata Extraction Library

This module integrates Apache UIMA into Solr by providing a Metadata Extraction Library that can be used for tasks such as automatic keyword extraction and Named Entity Recognition (for example, places, names, concepts, and dates).

The plugin can be used either as an UpdateRequestProcessor subclass, to decorate the index process chain, or as a set of Tokenizers/Filters, to add the same behavior to the (index or query) text analysis phase.

Using this module, you can enrich your Solr documents with additional metadata extracted from the input data. UIMA provides an analysis engine made up of several components arranged in a pipeline. The default pipeline supports the use of existing analysis engines such as Alchemy or OpenCalais. Keep in mind that these engines are not free of charge, although they offer a free trial period. To use them, you register and obtain an API key, which must then be configured in the solrconfig.xml file. Other components in the pipeline handle language and sentence detection.

Note

Under the contrib/uima folder, you will find a README file with detailed information about the Solr UIMA module usage.

The UIMA UpdateRequestProcessor intercepts the documents that are being indexed and sends them to its analysis pipeline. Those documents will be automatically enriched with extracted information such as sentences, languages, or named entities (for example, places or names).
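To make the idea of an enrichment pipeline more concrete, here is a toy sketch in plain Python. It does not use UIMA or Solr at all: the stage functions, the KNOWN_PLACES gazetteer, and the field names are all hypothetical, and the "entity recognition" is a simple lookup. It only illustrates how a document can pass through a chain of stages, each adding extracted metadata before indexing.

```python
import re

# Hypothetical gazetteer standing in for a real named entity recognizer.
KNOWN_PLACES = {"Rome", "Paris"}


def detect_sentences(doc):
    # Naive splitter: breaks after '.', '!' or '?' followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", doc["text"].strip())
    doc["sentences"] = [p for p in parts if p]
    return doc


def tag_places(doc):
    # Keeps only the words found in the gazetteer, without duplicates.
    words = re.findall(r"[A-Za-z]+", doc["text"])
    doc["places"] = sorted({w for w in words if w in KNOWN_PLACES})
    return doc


def enrich(doc, pipeline):
    """Run the document through each stage in order and return it."""
    for stage in pipeline:
        doc = stage(doc)
    return doc


if __name__ == "__main__":
    document = {"id": "1", "text": "I flew to Rome. Then I took a train to Paris!"}
    print(enrich(document, [detect_sentences, tag_places]))
```

The real module plays the same role inside the update chain, except that the stages are UIMA analysis engines and the mapping between extracted features and Solr fields is declared in the configuration.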

MapReduce

The MapReduce contrib module provides integration with Apache Hadoop. MapReduce is the name of a paradigm (programming model) that is implemented in Apache Hadoop to process large datasets with a parallel and distributed algorithm.
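The programming model itself can be sketched in a few lines of plain Python. The following word count runs in a single process and does not use the Hadoop API. The function names (map_phase, shuffle, reduce_phase, run_job) are illustrative assumptions, chosen only to show the three steps that a framework such as Hadoop distributes across many machines.

```python
from collections import defaultdict


def map_phase(record):
    # Map: emit a (key, value) pair for every word in the input record.
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    # Shuffle: group all the values emitted for the same key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce: collapse the values of one key into a single result.
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce sequentially over the input records."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(run_job(["solr hadoop", "hadoop index"]))
```

Because each map call depends only on its own record, and each reduce call only on the values of its own key, both phases can run in parallel on different nodes.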

The contribution contains a MapReduce job to build Solr indexes and merge them into a Solr cluster.
