Preface

Back in 2007, Twitter users would occasionally see the "fail whale", captioned "Too many tweets...". On August 3, 2013, Twitter posted a new record tweet rate of 143,199 tweets per second, and we rarely saw the fail whale. Many things have changed since 2007. The number of people and things connected to the Internet has increased exponentially. Cloud computing and on-demand hardware have become cheap and easily available. Distributed computing and the NoSQL paradigm have taken off, with a plethora of freely available, robust, proven, open source projects to store large datasets, process them, and visualize them. "Big Data" has become a cliché. As people and machines generate massive amounts of data at very high speed, our capability to store and analyze that data has increased as well. Cassandra is one of the most successful of these data stores: it scales linearly, is easy to deploy and manage, and is blazing fast.

This book is about Cassandra and its ecosystem. Its aim is to take you from the basics of Apache Cassandra to an understanding of what goes on under the hood. The book has three broad goals. First, to help you make the right design decisions and understand the patterns and antipatterns. Second, to enable you to manage infrastructure on a rainy day. Third, to introduce you to some of the tools that work with Cassandra to monitor and manage it, and to analyze the big data that you have inside it.

This book takes a practical approach rather than a purist one. You will encounter proprietary tools, GitHub projects, shell scripts, third-party monitoring tools, and enough references to go beyond the book and dive deeper if you want.

What this book covers

Chapter 1, Quick Start, is about getting excited and having the instant gratification of Cassandra. If you have no prior experience with Cassandra, you will leave this chapter with enough information to get started on your next big project.

Chapter 2, Cassandra Architecture, covers design decisions and Cassandra's internal plumbing. If you have never worked with a distributed system, this chapter has some gems of distributed design concepts. It will be helpful for the rest of the book when we look at patterns and infrastructure management. This chapter will also help you follow the discussions on the Cassandra mailing list and JIRA. It is a theoretical chapter; you can skip it and come back to it later if you wish.

Chapter 3, Effective CQL, covers CQL, which is the de facto language to communicate with Cassandra. This chapter goes into the details of CQL and various things that you can do using it.

Chapter 4, Deploying a Cluster, is about deploying a cluster correctly. Once you go through the chapter, you will realize that it is not really hard to deploy a cluster. Cassandra is probably one of the simplest distributed systems to deploy.

Chapter 5, Performance Tuning, deals with getting the maximum out of the hardware the cluster is deployed on. Usually, you will not need to turn many knobs, and the defaults are just fine.

Chapter 6, Managing a Cluster – Scaling, Node Repair, and Backup, is about the daily DevOps drills. Scaling up a cluster, shrinking it down, replacing a dead node, and balancing the data load across the cluster are covered in this chapter.

Chapter 7, Monitoring, talks about the various tools that can be used to monitor Cassandra. If you already have a monitoring system, you will probably want to plug Cassandra health monitoring into it; otherwise, you can choose one of the dedicated and thorough Cassandra monitoring tools.

Chapter 8, Integration with Hadoop, covers analyzing the data that you store in Cassandra. Cassandra is about large datasets, fast writes and reads, and terabytes of data, but what is the use of data if you can't analyze it? This chapter gives an introduction to get you started with Cassandra and Hadoop setups.
