Cassandra and Solr

Apache Solr is a text search platform built on top of Apache Lucene. Solr uses the Lucene search library and provides a simpler interface for managing indexes and searching over a variety of sources, such as RDBMSs, plain text, and rich documents (for example, PDF and Word files). Solr can run as a standalone Java web application in any servlet container, such as Tomcat or Jetty.

Lucene, Solr, and search indexing each deserve a book of their own, so we will keep this section brief. You can learn more about Solr from the Apache Solr wiki (http://wiki.apache.org/solr/).

In this section, we will see how to use Cassandra as the storage backend for Solr, so that Solr runs on top of Cassandra. Note that this does not let us run text searches over the data our application stores in Cassandra.

Solandra (https://github.com/tjake/Solandra) is an open source project that lets you use Cassandra as the storage layer for Solr. To configure Solandra, follow these steps:

$ git clone https://github.com/tjake/Solandra.git
Cloning into 'Solandra'...
[-- snip --]
$ cd Solandra
$ ant -Dcassandra=/opt/cassandra11 cassandra-dist
$ $CASSANDRA_HOME/bin/solandra
INFO 11:41:52,784 Starting Messaging Service on port 7000
[-- snip --]
INFO 11:41:53,063 Bootstrap/Replace/Move completed! Now serving reads.
INFO 11:41:53,116 Binding thrift service to localhost/127.0.0.1:9160
[-- snip --]
INFO 11:41:54,547 QuerySenderListener done.
INFO 11:41:54,547 [] Registered new searcher Searcher@63ab3977 main
INFO 11:41:54,548 user.dir=/home/nishant/apps/solandra/Solandra
INFO 11:41:54,548 SolrDispatchFilter.init() done
INFO 11:41:54,551 Started [...]:8983

Now Solr is ready to serve, and you can post data to it for indexing. For large-scale search in a cluster, you will need to run embedded Solandra on each node of the ring; this is how Solandra makes Solr as scalable as Cassandra. Let's check Solr with Cassandra by running one of the built-in examples.

$ cd $SOLANDRA_HOME/reuters-demo/
$ ./1-download-data.sh
[-- snip --]
Data downloaded, now run ./2-import-data.sh
$ ./2-import-data.sh
Posted schema.xml to http://localhost:8983/solandra/schema/reuters
Loading data to solandra, note: this importer uses a slow xml parser
READING FILE: /home/nishant/apps/solandra/Solandra/reuters-demo/data/reut2-002.sgm
 - reut2-002.sgm(0) title(1.0)={JAGUAR SEES STRONG GROWTH IN NEW MODEL SALES}
 - reut2-002.sgm(1) title(1.0)={OCCIDENTAL PETROLEUM COMMON STOCK OFFERING RAISED TO 36 MLN SHARES
}
 - reut2-002.sgm(2) title(1.0)={CCC ACCEPTS BONUS BID ON WHEAT FLOUR TO IRAQ}
 - reut2-002.sgm(3) title(1.0)={DIAMOND SHAMROCK RAISES CRUDE POSTED PRICES ONE DLR, EFFECTIVE MARCH 4, WTI NOW 17.00 DLRS/BBL
}
 - reut2-002.sgm(4) title(1.0)={NORD RESOURCES CORP <NRD> 4TH QTR NET}

[-- snip --]

Data loaded, now open ./website/index.html in your favorite browser!

As you can see, the demo simply downloads some data and posts it to Solr, exactly as you would with a regular Solr installation. Open ./website/index.html in a browser to play with Solr (see Figure 8.6).
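Posting to Solandra works the same way as posting to any Solr server: an HTTP POST of an update message, followed by a commit. The following Python sketch illustrates this under stated assumptions: the core is reachable at http://localhost:8983/solandra/reuters (extrapolated from the schema URL in the import log above, not taken from Solandra's documentation), and its schema has id and a tokenized title field. It uses Solr's standard XML update format and the /select query handler; the helper names build_add_xml, post_xml, select_url, and index_and_search are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen
from xml.sax.saxutils import escape, quoteattr

# ASSUMPTION: core URL extrapolated from the "/solandra/schema/reuters" log line.
CORE_URL = "http://localhost:8983/solandra/reuters"


def build_add_xml(docs):
    """Render a list of dicts as a Solr XML <add> update message."""
    parts = ["<add>"]
    for doc in docs:
        parts.append("<doc>")
        for name, value in doc.items():
            # Escape &, <, > so titles such as "<NRD>" remain valid XML.
            parts.append(f"<field name={quoteattr(name)}>{escape(str(value))}</field>")
        parts.append("</doc>")
    parts.append("</add>")
    return "".join(parts)


def post_xml(core_url, xml):
    """POST an XML update message to the core's /update handler; return the reply body."""
    request = Request(
        core_url + "/update",
        data=xml.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urlopen(request) as response:
        return response.read().decode("utf-8")


def select_url(core_url, query, rows=10):
    """Build a /select URL for a Solr query, requesting a JSON response."""
    return core_url + "/select?" + urlencode({"q": query, "rows": rows, "wt": "json"})


def index_and_search(core_url=CORE_URL):
    """Index one sample document, commit it, and search for it (needs a running server)."""
    # ASSUMPTION: the schema defines "id" and "title" fields.
    docs = [{"id": "demo-1", "title": "NORD RESOURCES CORP <NRD> 4TH QTR NET"}]
    post_xml(core_url, build_add_xml(docs))
    post_xml(core_url, "<commit/>")  # make the new document visible to searchers
    with urlopen(select_url(core_url, "title:nord")) as response:
        return response.read().decode("utf-8")
```

Calling index_and_search() against a running Solandra instance should return Solr's JSON response for the query. The same two HTTP calls work unchanged against a plain Solr server, which is the point the demo makes.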

Newcomers to Solandra are often confused because it seems to add text search to Cassandra. It does not. If you want something like that, your application has to make two calls for every change: one that posts the change to Solr for indexing, and another that writes the same change to Cassandra. The upside is that you do not need to run two Cassandra instances, one for Solr and one for your application.
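The two-call pattern can be sketched in Python as follows. This is a minimal illustration, not Solandra's API: DualWriter, write_cassandra, and index_solr are hypothetical names, and the two callables stand in for whatever Cassandra client and Solr HTTP client your application uses (for example, an HTTP POST of a Solr XML update message). Because the two calls are not atomic, the sketch writes to Cassandra first and queues any record whose Solr update fails so that it can be re-posted later.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]


class DualWriter:
    """Apply each change twice: once to Cassandra, once to the Solr index (hypothetical helper)."""

    def __init__(self,
                 write_cassandra: Callable[[Record], None],
                 index_solr: Callable[[Record], None]) -> None:
        self.write_cassandra = write_cassandra
        self.index_solr = index_solr
        self.pending_index: List[Record] = []  # stored in Cassandra, not yet indexed

    def save(self, record: Record) -> bool:
        """Store the record, then index it; return False if indexing was deferred."""
        # Call 1: Cassandra holds the application's data, so write it first.
        self.write_cassandra(record)
        # Call 2: post the same change to Solr so that it becomes searchable.
        try:
            self.index_solr(record)
        except OSError:  # urllib.error.URLError is a subclass of OSError
            self.pending_index.append(record)
            return False
        return True

    def retry_pending(self) -> int:
        """Re-post queued records to Solr; return how many are still pending."""
        still_pending: List[Record] = []
        for record in self.pending_index:
            try:
                self.index_solr(record)
            except OSError:
                still_pending.append(record)
        self.pending_index = still_pending
        return len(still_pending)
```

Writing to Cassandra first is a design choice: if the process dies between the two calls, the data is safe and only the index lags, and an index can be rebuilt from stored data. The reverse order could leave search results pointing at rows that were never written.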

Development note on Solandra

Solandra is a pretty impressive project. Unfortunately, it is no longer actively developed; at the time of writing, the project was last updated a year ago. The project's owner has moved to DataStax to work on a more efficient integration of Solr with Cassandra, one that brings the long-awaited text search capability to Cassandra (we will discuss this briefly in the next section).

The project seems to be in decent working condition, but it is probably not advisable for production deployments. If you plan to use Solandra, test it rigorously.

Figure 8.6: Solandra in action

DataStax Enterprise – the next-level Solr integration

The DataStax Enterprise distribution of Cassandra comes with built-in Solr integration that provides text search directly over column families. According to the DataStax documentation, its Solr implementation is two to four times faster than Solandra.

To learn more about Solr in the DataStax offering, see the DataStax blog post at http://www.datastax.com/dev/blog/cassandra-with-solr-integration-details.
