Cassandra

Cassandra is distributed, decentralized, fault tolerant, eventually consistent, linearly scalable, and a column-oriented data store. This means Cassandra is made to easily deploy over a cluster of machines located at geographically different places. There is no central master server, so no single point of failure, no bottleneck, data is replicated, and a faulty node can be replaced without any downtime. It's eventually consistent. It is linearly scalable, which means with a greater number of nodes, the requests served per second per node would not go down. Also, the total throughput of the system will increase with each node being added. And finally, it's column oriented, much like a map (or better, a map of sorted maps) or a table with flexible columns where each column is essentially a key-value pair. So, you can add columns as you go, and each row can have a different set of columns (key-value pairs). It does not provide any relational integrity. It is up to the application developer to perform relation management.

So, if Cassandra is so good at everything, why not everyone drop whatever database they are using and jump start with Cassandra? This is a natural question. We'll discuss in a later chapter what Cassandra is not good at, but there may be several obvious reasons, such as not everyone needs a super-scalable data store, and some are good with rather slow, but cozy RDBMS tools. Some applications require strong ACID compliance, such as a booking system. If you are a person who goes by statistics, you'd ask how Cassandra fares with other existing data stores. TilmannRabl et al in their paper, Solving Big Data Challenges for Enterprise Application Performance Management (http://vldb.org/pvldb/vol5/p1724_tilmannrabl_vldb2012.pdf), told that, "In terms of scalability, there is a clear winner throughout our experiments. Cassandra achieves the highest throughput for the maximum number of nodes in all experiments with a linear increasing throughput from one to 12 nodes. This comes at the price of a high write and read latency. Cassandra's performance is best for high insertion rates."

If you go through the paper, Cassandra wins in almost all the criteria. Equipped with proven concepts of distributed computing, made to reliably serve from commodity servers, and simple and easy maintenance, Cassandra is one of the most scalable, fastest, and very robust NoSQL database. So, the next natural question is what makes Cassandra so blazing fast? Let us dive deeper into the Cassandra architecture.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.118.152.58