Chapter 4. Deploying a Cluster

So, you have played a bit with Cassandra on your local machine and read something about how great it scales. Now it's time to evaluate all the tall claims that Cassandra makes.

This chapter deals with cluster deployment and the decisions you need to make that affect the number of nodes, the types of machines, and the tweaks in the Cassandra configuration file. We start with hardware evaluation and then dive into OS-level tweaks, followed by the prerequisite software applications and how to install them. Once the base machine is ready, we will discuss the Cassandra installation, which is fairly easy. The rest of the chapter discusses the various settings available, what fits which situation, the pros and cons, and so on. Equipped with all this information, you will be ready to launch your first cluster. The chapter provides working code that deploys Cassandra on any number of nodes, sets up the entire configuration, and starts Cassandra, effectively launching each node in about 40 seconds and enabling you to get going with an eight-node cluster in about 5 minutes.

Note

Code pattern

All the shell commands mentioned in this chapter follow a pattern. Each line starting with a # sign is just a comment to clarify the context. Each line starting with a $ sign is a command. Some longer commands may be broken into multiple lines for readability. If a command is broken, the end of the line contains a line-continuation character, a backslash (\). Each line that starts with neither of these symbols is the output of a command. Please assume this pattern unless specified otherwise.

Evaluating requirements

It is generally a good idea to examine what kind of load Cassandra is going to face when deployed on a production server. The estimate does not have to be accurate, but some sense of the traffic gives you more clarity about what you expect from Cassandra (criteria for load tests), whether you really need Cassandra at all (the halo effect), and whether you can bear the expenses that a running Cassandra cluster incurs on a daily basis (the value proposition). Let's see how to choose various hardware specifications for a specific need.

Hard disk capacity

A rough calculation of the disk space taken up by the user data stored in Cassandra involves adding up the sizes of the four data components kept on disk: commit logs, SSTables, index files, and bloom filters. Compared with the incoming raw data, each of these components carries some database overhead, so the data on disk can be about twice as large as the raw data. Disk usage can be estimated using the following code snippet:

# Size of one normal column
column_size (in bytes) = column_name_size + column_val_size + 15

# Size of an expiring or counter column
column_size (in bytes) = column_name_size + column_val_size + 23

# Size of a row
row_size (in bytes) = size_of_all_columns + row_key_size + 23

# Primary index file size
index_size (in bytes) = number_of_rows * (32 + mean_key_size)

# Additional space consumption due to replication
replication_overhead = total_data_size * (replication_factor - 1)
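
To get a feel for the numbers, the formulas can be run through the shell. All the figures below (row count, columns per row, name, value, and key sizes, and the replication factor) are purely illustrative assumptions; plug in your own estimates. Bloom filter and commit log overheads are left out for brevity.

# Illustrative estimate: 100 million rows, 10 normal columns per row,
# 20-byte column names, 50-byte values, 32-byte row keys, RF = 3
$ ROWS=100000000; COLS=10; NAME=20; VAL=50; KEY=32; RF=3
$ COLUMN=$(( NAME + VAL + 15 ))        # size of one normal column
$ ROW=$(( COLS * COLUMN + KEY + 23 ))  # size of one row
$ DATA=$(( ROWS * ROW ))               # raw SSTable data
$ INDEX=$(( ROWS * (32 + KEY) ))       # primary index file
$ echo "$(( (DATA + INDEX) * RF / (1024 * 1024 * 1024) )) GiB on disk (approximately)"
270 GiB on disk (approximately)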

Apart from this, the disk also sees heavy read and write activity during compaction. Compaction is the process that merges SSTables to improve search efficiency. The important thing about compaction is that, in the worst case, it may use as much space as is occupied by the user data. Therefore, it is a good idea to leave plenty of free space. We'll discuss this again, but the amount needed depends on the compaction strategy that is applied. For LeveledCompactionStrategy, keeping about 10 percent of the disk free is enough. SizeTieredCompactionStrategy, on the other hand, requires 50 percent free disk space in the worst case.
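
The compaction strategy is set per table rather than cluster-wide. As a minimal sketch, assuming CQL 3 and a hypothetical table mykeyspace.mytable, switching that table to leveled compaction looks like this:

# Switch the (hypothetical) table mykeyspace.mytable to leveled compaction
$ cqlsh
cqlsh> ALTER TABLE mykeyspace.mytable WITH compaction = {'class': 'LeveledCompactionStrategy'};

Here are some rules of thumb with regard to disk choice and disk operations: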

  • Commit logs and data files on separate disks: Commit logs are updated on each write and are read only at startup, which is rare. The data directory, on the other hand, is used to flush MemTables into SSTables asynchronously, is read from and written to during compaction, and, most importantly, may be looked up by a client to satisfy the consistency level. Having the two directories on the same disk may cause client operations to block. Their I/O patterns are quite different too: the commit log is an append-only write stream, whereas SSTable access is largely random. A configuration sketch covering this disk layout appears after this list.

    Note

    It is important to note that keeping the commit log and data directories on separate disks only matters if you have spinning disks. For Solid State Drives (SSDs), you can keep them on the same disk. However, if you have both an SSD and a spinning disk as your storage devices, it is recommended that you keep the commit log on the spinning disk and the data directory on the SSD.

  • RAID 0: Cassandra performs its own replication by means of the replication factor, so it does not need any hardware-level redundancy: if one node dies completely, the data is still available on its replica nodes. This is the reason that RAID 0 (http://en.wikipedia.org/wiki/RAID#RAID_0) is the preferred RAID level; it also gives improved disk performance and extra space.
  • Filesystem: If one has a choice, XFS is the preferred filesystem for Cassandra deployments. XFS supports up to 16 TB on a 32-bit architecture, and a whopping 8 EiB (exbibytes) on 64-bit machines. If XFS is not available, the ext4, ext3, and ext2 filesystems (in that order of preference) can also be used, within their storage size limitations.
  • SCSI and SSD: With disks, the guideline is the faster, the better. SCSI is faster than SATA, and SSD is faster than SCSI. SSDs are extremely fast as there are no moving parts. It is recommended that you use relatively low-priced, consumer-grade SSDs for Cassandra, as enterprise-grade SSDs offer no particular benefit for this workload.
  • No EBS on EC2: This is specific to Amazon Web Services (AWS) users. Elastic Block Store (EBS) from AWS is strongly discouraged for storing Cassandra data, whether data directories or the commit log. Poor throughput and a tendency to become unusably slow instead of dying cleanly are major roadblocks of this network-attached storage.
  • XFS filesystem: http://en.wikipedia.org/wiki/XFS.
  • AWS EBS: http://aws.amazon.com/ebs/. Instead of using EBS, use the ephemeral devices attached to the instance (also known as the instance store). Instance stores are fast and do not suffer from the problems that EBS does. They can be configured as RAID 0 to get even more out of them.
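
Putting these guidelines together, the following sketch prepares the disks and points Cassandra at them. The device names (/dev/xvdb, /dev/xvdc, /dev/xvdd), the mount points, and the cassandra.yaml keys' final values are assumptions for illustration; adjust them to your environment.

# Stripe two data disks into a RAID 0 array (device names are illustrative)
$ sudo mdadm --create /dev/md0 --level=0 --raid-devices=2 \
    /dev/xvdb /dev/xvdc

# Format the array with XFS and mount it for the data directory
$ sudo mkfs.xfs /dev/md0
$ sudo mkdir -p /mnt/cassandra-data /mnt/cassandra-commitlog
$ sudo mount /dev/md0 /mnt/cassandra-data

# Keep the commit log on a separate (spinning) disk, for example /dev/xvdd
$ sudo mkfs.xfs /dev/xvdd
$ sudo mount /dev/xvdd /mnt/cassandra-commitlog

# Then, in cassandra.yaml, point Cassandra at the two mount points:
#   data_file_directories:
#       - /mnt/cassandra-data
#   commitlog_directory: /mnt/cassandra-commitlog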

RAM

Larger memory boosts Cassandra's performance in multiple ways. More memory can hold larger MemTables, which means that fresh data stays in memory for a longer duration, leading to fewer disk accesses for recent data. It also means fewer flushes (less frequent disk I/O) of MemTables to SSTables, and the SSTables will be larger and fewer in number. This improves read performance, as fewer SSTables need to be scanned during a lookup. Larger RAM can also accommodate a larger row cache, further decreasing disk access.

For any sort of production setup, a RAM capacity less than 8 GB is not recommended. A memory capacity above 16 GB is preferred.
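
How much of that memory the Cassandra JVM actually uses is controlled in conf/cassandra-env.sh. As a rough, hedged sketch, on a 16 GB machine you might pin the heap explicitly instead of relying on the automatic calculation; the figures below are illustrative, not a recommendation:

# Heap settings live in conf/cassandra-env.sh; uncomment and set both values
# together (illustrative figures for a 16 GB machine with 8 cores)
$ grep -E '^(MAX_HEAP_SIZE|HEAP_NEWSIZE)' conf/cassandra-env.sh
MAX_HEAP_SIZE="8G"
HEAP_NEWSIZE="800M"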

CPU

Cassandra is a highly concurrent application. CPU-intensive tasks, such as compaction, writes, and fetching results from multiple SSTables and joining them to create a single view for the client, keep running throughout the life cycle of a Cassandra process. An eight-core CPU is suggested, but anything with more cores will only be better.
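
Two cassandra.yaml settings, concurrent_reads and concurrent_writes, control how many read and write tasks run in parallel. The commonly cited rule of thumb is 16 * number_of_drives for concurrent_reads and 8 * number_of_cores for concurrent_writes; the values below assume an eight-core machine with two data drives and a package installation with the configuration under /etc/cassandra, and are only a sketch:

# In cassandra.yaml, assuming 8 cores and 2 data drives
$ grep -E '^concurrent_(reads|writes)' /etc/cassandra/cassandra.yaml
concurrent_reads: 32
concurrent_writes: 64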

For a cloud-based setup, a couple of things need to be kept in mind:

  • A provider that gives a CPU-bursting feature should be used. One such provider is Rackspace.
  • AWS Micro instances should be avoided for any serious work. There are many reasons for this. AWS Micro comes with EBS storage and no option to use an instance store. But the deal-breaker is CPU throttling, which makes it unsuitable for Cassandra: if you run a CPU-intensive task for about 10 seconds, CPU usage gets throttled on Micro instances. However, AWS Micro instances may be a good (cheap) choice if you just want to get started with Cassandra.

Is a node a server?

With vnodes (virtual nodes), available from version 1.2 onward, it can be confusing what we mean by a node. This section assumes that a node is a machine. Unless you have specifically turned off vnodes by setting initial tokens and commenting out num_tokens in cassandra.yaml, a machine has as many vnodes as the number specified by num_tokens. With this virtualization of nodes, which makes each Cassandra machine behave as multiple small Cassandra nodes that each hold a small range of row keys, version 1.2 and onward can hold a lot more data per machine than earlier versions could.
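
The switch lives in cassandra.yaml. A minimal sketch of the two relevant settings, assuming a package installation with the configuration under /etc/cassandra (descriptive comment lines in the file are omitted from the output):

# Enable vnodes: set num_tokens and leave initial_token commented out
$ grep -E 'num_tokens|initial_token' /etc/cassandra/cassandra.yaml
num_tokens: 256
# initial_token: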

The suggested amount of data per machine depends closely on the application's read and write load. In general, however, keeping the data size per machine to 1 TB or below is a good idea, although Cassandra can work decently well with up to 5 TB of data per node.

Network

Like any other distributed system, Cassandra depends heavily on the network. Although Cassandra tolerates network partitions, a reliable network with few outages is preferred: fewer outages mean fewer repairs and fewer inconsistencies.

A congestion-free, high-speed (gigabit or higher), reliable network is important, as reads, writes, replication, and moving or draining nodes all put a heavy load on the network.
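
To see what replication, repair, and node movement are actually doing on the wire, nodetool can report the streams that are active or pending on a node; a quick, hedged example, run against the local node:

# Show active and pending streams for this node
$ nodetool -h localhost netstats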
