Summary

So, we can store large amounts of data in Cassandra and run MapReduce jobs on it to analyze that data. We can also set up Hadoop so that it does not degrade the transactional side of Cassandra. We also know how to set up Pig for those who want to assemble an analysis quickly instead of writing lengthy Java code. Finally, Cassandra can power Solr searches, making Solr even more scalable than it already is.

With a plethora of analytical tooling available in the market, you may or may not choose Cassandra. You might, for instance, need stream analysis that does not require the data to be stored and analyzed later. If you want to apply multiple operations to live streaming tweets and show the result immediately, you would likely use a tool such as Twitter Storm. Although there is no dedicated project that shows how to combine the two, it is fairly simple to configure Twitter as a Storm Spout. The Spout emits the tweet stream to a Bolt, which processes the tweets and forwards its output to the next Bolt, and finally you can use the Cassandra Java driver to store the result. It is as simple as that. You may want to put a queue between the last Bolt and Cassandra as a buffer if you find that tweets arrive faster than Cassandra can absorb them, but normally you wouldn't need that.
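The Spout, Bolt, and Cassandra flow described above can be modeled in plain Java to show how data moves between the stages. This is a minimal, hypothetical sketch, not Storm or driver code: an in-memory list stands in for the Twitter Spout, the `extractHashtags` method stands in for a processing Bolt, a `HashMap` stands in for the Cassandra table, and a writer thread stands in for the code that would call the Cassandra Java driver. The bounded `ArrayBlockingQueue` plays the role of the optional buffer between the last Bolt and Cassandra. The class and method names are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical plain-Java model of: Spout -> Bolt -> (optional queue) -> store.
// No Storm or Cassandra driver classes are used; the HashMap stands in for a
// Cassandra table and the writer thread for code that calls the Java driver.
public class TweetPipelineSketch {

    // Sentinel telling the writer that the stream has ended.
    // Extracted hashtags always start with '#', so an empty string cannot collide.
    private static final String END_OF_STREAM = "";

    // Processing "Bolt": pulls lowercase hashtags out of one tweet.
    static List<String> extractHashtags(String tweet) {
        List<String> tags = new ArrayList<>();
        for (String token : tweet.split("\\s+")) {
            if (token.startsWith("#") && token.length() > 1) {
                tags.add(token.toLowerCase());
            }
        }
        return tags;
    }

    // Runs the pipeline over a finite batch of tweets and returns the
    // per-hashtag counts that the writer "stored".
    static Map<String, Integer> run(List<String> tweets) {
        // Bounded buffer between the last Bolt and the store; put() blocks
        // when it is full, so a slow store applies back-pressure upstream.
        BlockingQueue<String> buffer = new ArrayBlockingQueue<>(1024);
        Map<String, Integer> store = new HashMap<>();

        // Writer: stand-in for the component that writes through the driver.
        Thread writer = new Thread(() -> {
            try {
                while (true) {
                    String tag = buffer.take();
                    if (tag.equals(END_OF_STREAM)) {
                        break;
                    }
                    store.merge(tag, 1, Integer::sum); // stand-in for a counter update
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();

        try {
            for (String tweet : tweets) {                   // "Spout" emits each tweet
                for (String tag : extractHashtags(tweet)) { // "Bolt" processes it
                    buffer.put(tag);                        // and forwards the result
                }
            }
            buffer.put(END_OF_STREAM);
            writer.join(); // join() also makes the writer's updates visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("pipeline interrupted", e);
        }
        return store;
    }

    public static void main(String[] args) {
        List<String> tweets = List.of(
                "Loving #Cassandra with #Storm",
                "#cassandra 1.2 is out",
                "no tags here");
        System.out.println(run(tweets));
    }
}
```

In a real topology, the in-memory queue would be replaced by whatever buffer you choose (or dropped entirely, as is usually possible), and the writer's `merge` call by an insert or update issued through the Cassandra Java driver.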

For some distributed computation frameworks, such as Spark, the community has developed good integration tools, such as Calliope (http://tuplejump.github.io/calliope/). In general, you are unlikely to regret choosing Cassandra for lack of documentation on integrating it with a reasonably popular framework.

Cassandra is a rapidly developing project. Major changes and feature additions to this open source project land roughly every six months, a pace that many big-label proprietary applications do not match. Every half year you get a faster, stronger, and better Cassandra for free (with some technical debt, obviously). While this is a great thing, it comes with a pain point: new learning. To upgrade, you will need to learn new ways of doing things, and some changes may require you to modify your code to keep pace with Cassandra. Most of the time you can simply upgrade Cassandra and things will work as expected, but then you may not be taking advantage of the new features. The next chapter is about the current edition of Cassandra, Cassandra 1.2.x. It has many new features, new ways to do things, and new ways to visualize the data.
