Configuring a Cassandra cluster

Now that you have a single-node setup, you can start Cassandra by executing <cassandra_installation>/bin/cassandra for a tarball installation or by running sudo service cassandra start for a repository installation. (We'll see later in this chapter how to write an init script for Cassandra and set it up to start on boot.) A couple of configuration tweaks, however, are needed to get a Cassandra cluster working.

If you look at cassandra.yaml, you will find that it has the following six sections:

  • Cluster setup properties: These cover startup properties, file locations, ports, replica placement strategies, and inter-node communication settings.
  • Performance tuning properties: These help you set appropriate values for system and network resources based on your setup.
  • Client connection properties: These define the behavior of client-to-node connectivity, things such as the number of requests per client, the maximum number of threads (clients), and much more.
  • Inter-node communication: This section contains configurations for node-to-node communication within a cluster, including hinted handoff settings and failure detection settings.
  • Backup settings: These control Cassandra's automated backups.
  • Authorization and authentication settings: These provide protected access to the cluster. The default is to allow all.

In most cases, you will never have to bother about the client connection properties and the inter-node communication settings; the defaults are smart and robust enough for any modern-day machine. The rest of this chapter discusses the cluster setup properties and the various options that Cassandra provides out of the box. Security is discussed briefly in the section Authorization and authentication. Throughout, we will tune Cassandra using various properties in cassandra.yaml.

The cluster name

A cluster name is used to logically group the nodes that belong to the same cluster; it works as a namespace for nodes. In an environment with multiple clusters, it prevents the nodes of one cluster from joining another.

It is always a good idea to give a meaningful cluster name, even if you have a single cluster at the time. Consider cassandra.yaml:

cluster_name: 'MyAppMainCluster'

It is suggested that you set the cluster name before you launch the cluster. Changing the cluster name while the cluster is up and running throws an exception. This is because the cluster name resides in the system keyspace, and as of Version 1.1.11, you cannot edit the system keyspace. (It was possible in some older versions.)

If you have to change the cluster name, there is an unofficial trick to do that. The cluster name is stored in the LocationInfo column family of the system keyspace. So, you need to stop the node, update cassandra.yaml with the new cluster name, and delete the contents of the <data_directory>/system/LocationInfo directory (or just move the contents elsewhere so that you can restore them if something goes wrong). On restarting the node and connecting to it via cassandra-cli, you will see that you are welcomed to the new cluster name. This process needs to be repeated for every node in the cluster.
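
For illustration, here is a hypothetical Python helper for the file-shuffling part of this trick. It must be run on each node after Cassandra has been stopped, and the data_dir path is an assumption; adjust it to your configured data directory:

import os
import shutil

# Assumption: change this to your node's configured data directory.
data_dir = '/var/lib/cassandra/data'

location_info = os.path.join(data_dir, 'system', 'LocationInfo')
backup = location_info + '.bak'

# Move the directory aside rather than deleting it, so that the change
# can be rolled back if something goes wrong.
if os.path.isdir(location_info):
    shutil.move(location_info, backup)
    print('Moved %s to %s; now edit cluster_name in cassandra.yaml '
          'and restart the node.' % (location_info, backup))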

The seed node

A seed node is the one that a newly launched node consults to learn about the cluster. Although gossip (refer to the section Gossip, Chapter 2, Cassandra Architecture) is how nodes get to know one another, to a new node the seeds are the first nodes it knows of and starts gossiping with. Eventually, all the nodes learn of its presence, and it learns about the other nodes.

There must be at least one seed node in a cluster, and seed nodes should be rather stable nodes. You may configure more than one seed node for added redundancy and increased availability of seeds. In cassandra.yaml, seed nodes are specified as a comma-separated list of addresses:

seed_provider: 
 - class_name: org.apache....SimpleSeedProvider
   parameters: 
    # Ex: "<ip1>,<ip2>,<ip3>" 
    - seeds: "c1.mydomain.com,c2.mydomain.com"

Listen, broadcast, and RPC addresses

A listen address is the address that nodes use to connect to one another for communication/gossip. It is important to set this to the private IP of the machine. If it is not set, Cassandra picks up the hostname value, which, if incorrectly configured, can cause problems.

A broadcast address is the public address of the node. If it is not set, it takes whatever value the listen address has. It is generally required only if you are setting up Cassandra across multiple data centers.

Note

EC2 users: With a multiple data center installation, you need to use an EC2 snitch. listen_address may be left blank because AWS assigns a hostname based on the private IP, but you may also set it to a private IP or private DNS name. broadcast_address must be a public IP or public DNS name. Also, remember to open storage_port (default: 7000) in the security group that holds your Cassandra instances.

An RPC address is for client connections to the node. It can be anything: the IP address, the hostname, or 0.0.0.0. If it is not set, it takes the hostname.

In cassandra.yaml, these properties look like the following:

# listen_address is not set, so it takes the hostname.
listen_address:

# broadcast_address is commented out, so it takes the value of listen_address.
# broadcast_address: 1.2.3.4

# rpc_address is not set, so it takes the hostname.
rpc_address:

Initial token

An initial token is a number assigned to a node that decides the range of keys the node will hold. A node holds the tokens ranging from the previous node's token plus one (Tn-1 + 1, the previous node being the one that holds the token immediately less than the current node's) up to and including its own token, Tn.

In cassandra.yaml, you may mention an initial token as follows:

initial_token: 85070591730234615865843651857942052864

It is important to choose initial tokens wisely to make sure the data is evenly distributed across the cluster. If you do not assign initial tokens to the nodes at the start of a new cluster, tokens get assigned automatically, which may lead to hotspots. Adding a new node, however, is relatively smart: if you do not give the new node an initial token, Cassandra picks the most heavily loaded node and assigns the new node a token that splits that node's range in half. It is also possible, and fairly easy, to load balance a running cluster; we'll learn more about that in the section Load balancing, Chapter 6, Managing a Cluster – Scaling, Node Repair, and Backup. The next logical question is: how do we determine the initial token for a node? It depends on the partitioner that you are using. Basically, we divide the whole range of keys that the partitioner supports into N equal parts, where N is the total number of nodes in the cluster, and assign each node one of the resulting numbers. We will see this in the next section.

Partitioners

By assigning initial tokens, what we have done is create buckets of keys. What determines which key goes into which bucket? The partitioner. A partitioner generates a hash value of the row key; based on the hash value, Cassandra determines which bucket (node) the row needs to go to. Since the partitioner always generates the same hash for a given row key, the same computation is also used to determine the node to read from.

Like everything else in Cassandra, the partitioner is a pluggable interface. You can implement your own partitioner by implementing org.apache.cassandra.dht.IPartitioner and dropping the .class or .jar file into Cassandra's lib directory.

Here is how you specify the partitioner in cassandra.yaml:

partitioner: org.apache.cassandra.dht.RandomPartitioner

In most cases, the default partitioner is good for you; it distributes keys evenly. As of Version 1.1.11, the default is RandomPartitioner, but in 1.2 and higher the default changes to a more efficient implementation, Murmur3Partitioner.

Be warned that choosing a partitioner is a pretty critical decision: it determines which data stays where, and it affects the SSTable structure. If you decide to change it, you need to clean the data directory. So, whatever decision is made about the partitioner at the start of a cluster is likely to stay for the life of the storage.

Cassandra provides three partitioners by default.

Note

Actually, there are five, but two of them, OrderPreservingPartitioner and CollatingOrderPreservingPartitioner, are deprecated and not recommended, so we will not discuss them here.

The random partitioner

The random partitioner is the default partitioner up to Version 1.1. It uses an MD5 hash to generate token values for row keys. Since the hashes are not generated in any orderly manner, it does not guarantee any ordering: two lexicographically close row keys can easily be thrown onto two different nodes. This effectively random assignment of tokens to keys is what makes it suitable for an even distribution of keys among nodes, which means a balanced cluster is highly unlikely to ever develop hotspots.
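
To make the idea concrete, here is a rough sketch, not Cassandra's exact implementation, of how an MD5-based partitioner maps a row key into the 0 to 2^127 - 1 token range:

import hashlib

def md5_token(row_key):
    # Hash the raw key bytes, interpret the digest as a big integer,
    # and fold it into the random partitioner's token range.
    digest = hashlib.md5(row_key).hexdigest()
    return int(digest, 16) % (2 ** 127)

print(md5_token(b'user:jdoe'))  # the same key always yields the same token

Because the token depends only on the key, every node (and client) can compute the same placement independently.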

The tokens generated by a random partitioner vary in the range 0 to 2^127 - 1. So, for the ith node in an N-node cluster, the initial token can be calculated as 2^127 * (i - 1) / N. The following is a simple Python snippet that generates the complete sequence of initial tokens for a random partitioner on a cluster of eight nodes:

# running in a Python shell
>>> nodes = 8
>>> print("\n".join(["Node #" + str(i+1) + ": " + str((2 ** 127)*i/nodes) for i in xrange(nodes)]))

Node #1: 0 
Node #2: 21267647932558653966460912964485513216 
Node #3: 42535295865117307932921825928971026432 
Node #4: 63802943797675961899382738893456539648 
Node #5: 85070591730234615865843651857942052864 
Node #6: 106338239662793269832304564822427566080 
Node #7: 127605887595351923798765477786913079296 
Node #8: 148873535527910577765226390751398592512

The byte-ordered partitioner

A byte-ordered partitioner, as the name suggests, generates tokens for row keys that follow the order of the hexadecimal representations of the keys. This makes it possible to keep rows ordered by row key and to iterate through rows as if iterating through an ordered list. But this benefit comes with a major drawback: hotspots, caused by uneven distribution of data across the cluster. Imagine a cluster of 26 nodes using a partitioner such as ByteOrderedPartitioner, with each node responsible for one letter: the first node holds all the keys starting with A, the second all the keys starting with B, and so on. A column family that uses usernames as row keys will have very uneven data distribution across the ring; the distribution will be skewed, with nodes X, Q, and Z being very light, and nodes A and S being heavily loaded. This is bad for multiple reasons, but the most important one is the hotspot: the nodes with more data will be accessed more than the ones with less data, and the overall performance of the cluster may drop to the number of requests that a couple of heavily loaded nodes can serve.

The best way to assign initial tokens to a cluster using ByteOrderedPartitioner is to sample the data and determine which keys are the best to assign as initial tokens to ensure an evenly balanced cluster.
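
Here is a minimal, hypothetical sketch of that sampling approach: take a representative sample of row keys, sort it, and pick the keys at evenly spaced positions as initial tokens. The sample_keys list is a stand-in for your real data:

import binascii

# Stand-in for a representative sample of real row keys.
sample_keys = sorted([b'aaron', b'alice', b'bob', b'carol', b'dave',
                      b'mallory', b'sam', b'trent', b'zoe'])
nodes = 3

for i in range(nodes):
    # Pick keys at evenly spaced positions in the sorted sample.
    key = sample_keys[len(sample_keys) * i // nodes]
    print("Node #%d: %s" % (i + 1, binascii.hexlify(key).decode()))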

Let's take a hypothetical case where the keys of all your keyspaces can be represented by five-character strings from "00000" to "zzzzz". Here is how we generate initial tokens in Python:

>>> start = int("00000".encode('hex'), 16)
>>> end = int("zzzzz".encode('hex'), 16)
>>> range = end - start
>>> nodes = 8
>>> print "\n".join(["Node #" + str(i+1) + ": %032x" % (start + range*i/nodes) for i in xrange(nodes)])

Node #1: 00000000000000000000003030303030 
Node #2: 00000000000000000000003979797979 
Node #3: 000000000000000000000042c2c2c2c2 
Node #4: 00000000000000000000004c0c0c0c0b 
Node #5: 00000000000000000000005555555555 
Node #6: 00000000000000000000005e9e9e9e9e 
Node #7: 000000000000000000000067e7e7e7e7 
Node #8: 00000000000000000000007131313130

Remember, this is just an example; in a real case you will decide this only after evaluating your data. Or, you may prefer to have initial tokens assigned based on UUIDs.

The Murmur3 partitioner

The Murmur3 partitioner is the new default for Cassandra Version 1.2 and higher. If you are starting a new cluster, it is suggested that you keep the Murmur3 partitioner. It is not order preserving, but it has all the qualities of the random partitioner, and it is faster and provides better performance. One difference from the random partitioner is that it generates token values between -2^63 and +2^63 - 1.

If you are migrating from a previous version to 1.2 or higher, please make sure that you keep using the same partitioner as before. If you were using the default, it is likely that you were using the random partitioner; this will cause trouble if you have not edited cassandra.yaml to change the new default, Murmur3Partitioner, back to the random partitioner.

To generate initial tokens, we'll again apply our familiar formula, but this time the start position is not zero, so the range of tokens is (end - start): +2^63 - (-2^63) = 2^64. Here is a simple Python snippet to do this:

>>> nodes = 8
>>> print "\n".join(["Node #" + str(i+1) + ": " + str(-(2 ** 63) + (2 ** 64)*i/nodes) for i in xrange(nodes)])

Node #1: -9223372036854775808 
Node #2: -6917529027641081856 
Node #3: -4611686018427387904 
Node #4: -2305843009213693952 
Node #5: 0 
Node #6: 2305843009213693952 
Node #7: 4611686018427387904 
Node #8: 6917529027641081856

Snitches

Snitches are the way to tell Cassandra about the topology of the cluster: the locations of the nodes and their proximity to one another. There are two tasks that snitches help Cassandra with. They are as follows:

  • Replica placement: As discussed in the section Replication from Chapter 2, Cassandra Architecture, depending on the configured replication factor, data gets written to more than one node. Snitches are part of the decision-making mechanism for where the replicas are sent; an efficient snitch places replicas in a manner that provides the highest availability of data.
  • Efficient read and write routing: Since snitches define the cluster's layout, they help Cassandra decide the most efficient path for performing reads and writes.

Similar to partitioners, snitches are pluggable. You can plug in your own custom snitch by extending org.apache.cassandra.locator.EndPointSnitch; the PropertyFileEndPointSnitch class can be used as a guideline on how to write a snitch. To configure a snitch, you need to set endpoint_snitch in cassandra.yaml:

endpoint_snitch: SimpleSnitch

For a custom snitch, specify the fully qualified class name of the snitch, assuming you have dropped the custom snitch's .class/.jar file into Cassandra's lib directory.

Out of the box, Cassandra provides the snitches detailed in the following sections.

SimpleSnitch

SimpleSnitch is basically a do-nothing snitch. If you look at the code, it simply returns rack1 and datacenter1 for whatever IP address an endpoint has. Since it discards any information that could be inferred from the IP address, it is appropriate for installations where data center-related information is not available, or where all the nodes are in the same data center.

This is the default snitch as of Version 1.1.11.

PropertyFileSnitch

PropertyFileSnitch is a way to explicitly tell Cassandra the relative locations of the various nodes in the cluster. It gives you a means to handpick the nodes to group under a data center and a rack. The location definition of each node in the cluster is stored in a configuration file, cassandra-topology.properties, which can be found under the conf directory (for a tarball installation, it is <installation>/conf; for repository installations, it is /etc/cassandra). Note that if you are using PropertyFileSnitch, all the nodes must have an identical topology file.

cassandra-topology.properties is a standard properties file whose keys are the IP addresses of the nodes and whose values take the form <data-center-name>:<rack-name>; which data center name and rack name you give is up to you. Two nodes with the same data center name are treated as nodes within a single data center, and two nodes with the same data center name and rack name combination are treated as two nodes on the same rack.

Here is an example topology file:

# Cassandra Node IP=Data Center:Rack
# Data-center 1
10.110.6.30=DC1:RAC1 
10.110.6.11=DC1:RAC1 
10.110.4.30=DC1:RAC2
# Data-center 2
10.120.8.10=DC2:RAC1 
10.120.8.11=DC2:RAC1

# Data-center 3
10.130.1.13=DC3:RAC1 
10.130.2.10=DC3:RAC2

# default for unknown nodes
default=DC1:RAC0

DCX and RACX are commonly used patterns to denote a data center and a rack respectively, but you are free to choose anything that suits you. The default entry takes care of any node that is not listed in the topology file.
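
As a quick illustration of the file's semantics, here is a small, hypothetical Python sketch that parses a cassandra-topology.properties file into a node-to-(data center, rack) map:

def load_topology(path):
    topology = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith('#'):
                continue  # skip blank lines and comments
            node, location = line.split('=', 1)
            dc, rack = location.split(':', 1)
            topology[node] = (dc, rack)
    return topology

# load_topology('/etc/cassandra/cassandra-topology.properties')
# -> {'10.110.6.30': ('DC1', 'RAC1'), ..., 'default': ('DC1', 'RAC0')}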

GossipingPropertyFileSnitch

Even with all the fancy naming and grouping, one thing that keeps bugging you with PropertyFileSnitch is the manual effort of keeping the topology file updated on every node with each addition or removal of a node. GossipingPropertyFileSnitch solves this problem: it uses the gossip mechanism to propagate information about each node's location through the cluster.

On each node, you put a file named cassandra-rackdc.properties under the conf directory. This file contains just two things: the name of the node's data center and the name of the node's rack. It looks like this:

dc=DC3
rack=RAC2

RackInferringSnitch

If SimpleSnitch is one end of the spectrum, where the snitch assumes nothing, RackInferringSnitch is the other extreme. RackInferringSnitch uses the IP address to guess the data center and rack of a node: it assumes that the second octet of the IP address uniquely identifies a data center, and the third octet uniquely identifies a rack within that data center. So, for 10.110.6.30, 10.110.6.4, 10.110.2.42, and 10.108.10.1, this snitch assumes that the first two nodes reside in the same data center and the same rack, the third node lives in the same data center but in a different rack, and the fourth node is in a different data center than the rest of the nodes in the example:

    +---------> Data center
    |  +------> Rack
    |  |
10.110.6.30

This can be dangerous to use if your data centers do not follow this pattern when assigning IP addresses to machines.
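
The inference itself is trivial. Here is a minimal sketch of the rule just described:

def infer_location(ip):
    # Second octet = data center, third octet = rack.
    octets = ip.split('.')
    return {'dc': octets[1], 'rack': octets[2]}

for ip in ['10.110.6.30', '10.110.6.4', '10.110.2.42', '10.108.10.1']:
    print("%s -> %s" % (ip, infer_location(ip)))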

EC2Snitch

Ec2Snitch is a snitch written specially for Amazon AWS installations. It uses the node's local metadata to get its availability zone and then breaks that value into pieces to determine the rack and the data center. Please note that the rack and data center determined this way do not correspond to the physical location of hardware in Amazon's cloud facility, but they give a pretty nice abstraction.

Ec2Snitch treats the region name as the data center name and the availability zone as the rack name. It does not work across regions, so effectively Ec2Snitch is limited to a single data center setup. If one of your nodes is in us-east-1a and another in us-east-1b, you have two nodes in a data center named us-east, in two racks, 1a and 1b.
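
Here is a hypothetical sketch of that naming rule, splitting an availability zone string into the data center and rack names:

def ec2_location(availability_zone):
    # 'us-east-1a' -> data center 'us-east', rack '1a'
    parts = availability_zone.split('-')  # ['us', 'east', '1a']
    return {'dc': '-'.join(parts[:2]), 'rack': parts[2]}

print(ec2_location('us-east-1a'))  # {'dc': 'us-east', 'rack': '1a'}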

EC2MultiRegionSnitch

Ec2Snitch does not work well if you decide to keep nodes across different EC2 regions, because it uses private IPs, which do not work across regions (but do work across availability zones within a region).

If your cluster spans multiple regions, you should use Ec2MultiRegionSnitch.

Note

EC2 users: If you plan to distribute nodes across different regions, more than just a proper snitch is needed to make the nodes communicate with one another successfully. You need to change the following:

  • broadcast_address: This should be the public IP or public DNS of the node.
  • listen_address: This should be set to a private IP or DNS. But if not set, the hostname on EC2 is generally derived from the private IP, which is fine.
  • endpoint_snitch: This should be set to Ec2MultiRegionSnitch.
  • storage_port: The default 7000 is fine, but remember to open this port in the security group that holds Cassandra instances.

Replica placement strategies

Apart from putting data in various buckets based on the nodes' tokens, Cassandra has to replicate the data according to the replication factor associated with the keyspace. Replica placement strategies come into action when Cassandra has to decide where a replica needs to be placed.

There are two strategies that can be used based on the demand and structure of the cluster.

SimpleStrategy

SimpleStrategy places the data on the node that owns it, as determined by the configured partitioner. It then moves to the next node in the ring (toward higher tokens) and places a replica, moves to the next and places another, and so on, until the replication factor is met.
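
Here is a minimal, simplified sketch of that ring walk: find the node that owns the key's token, then take the next RF - 1 nodes clockwise. Note that racks never enter the picture, which is exactly the weakness discussed next:

from bisect import bisect_left

def simple_strategy_replicas(key_token, ring, rf):
    # ring: a sorted list of (token, node) tuples.
    tokens = [token for token, _ in ring]
    # The owner is the first node whose token is >= the key's token,
    # wrapping around to the start of the ring if necessary.
    start = bisect_left(tokens, key_token) % len(ring)
    return [ring[(start + i) % len(ring)][1] for i in range(rf)]

ring = [(0, 'node1'), (42, 'node2'), (85, 'node3'), (127, 'node4')]
print(simple_strategy_replicas(50, ring, 3))  # ['node3', 'node4', 'node1']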

SimpleStrategy is blind to the cluster topology: it does not check whether the next node chosen to hold a replica is in the same rack as the previous one. Thus, this may not be the most robust strategy for storing data. What happens if all three replicas of a key range (assuming RF=3) are physically located in the same rack and that rack suffers a power failure? You lose access to some data until power is restored. This leads us to a rather smarter strategy, NetworkTopologyStrategy.

Although we discussed how bad SimpleStrategy can be, it is the default strategy. Plus, if you do not know the placement or any configuration details of your data center and you decide to stay in a single data center, NetworkTopologyStrategy cannot help you much anyway.

NetworkTopologyStrategy

NetworkTopologyStrategy, as the name suggests, is a data-center- and rack-aware replica placement strategy. It tries to avoid the pitfalls of SimpleStrategy by taking into account the rack and data center names that it learns from the configured snitch. With an appropriate strategy_options setting, stating how many replicas go to each data center, NetworkTopologyStrategy gives you a very powerful and robust mirrored database system.

NetworkTopologyStrategy requires the system admin to put a little extra thought into deciding appropriate initial tokens for multiple data center installations. For a single data center setup, the initial tokens simply divide the token range evenly among the nodes.

NetworkTopologyStrategy and multiple data center setups

Here is the issue with multiple data center setups. Suppose you have two data centers, each having three nodes; here is what the keyspace definition looks like:

CREATE KEYSPACE myKS
  WITH placement_strategy = 'NetworkTopologyStrategy' 
  AND strategy_options={DC1:3, DC2:3};

It says: keep three copies of each row in DC1 and three more copies in DC2, which implies that there are at least six nodes in the ring.

Assume the system actually has four nodes in each data center, and you calculated the initial tokens by dividing the possible token range into eight equidistant values. If you assign the first four tokens to the four nodes in DC1 and the rest to the nodes in DC2, you will end up with a lopsided data distribution.

Let's take an example. Say we have a partitioner that generates tokens from 0 to 200. If the tokens are distributed in the way previously mentioned, the resulting ring looks like the following figure. Since the replication factor is bound to the data center, all the data from 25 to 150 goes to one single node in Data Center 1, while the other nodes in that data center own a relatively small number of keys. The same happens in Data Center 2, which also ends up with one overloaded node.

This creates a need for a mechanism that balances the nodes within each data center. The obvious first idea is to divide the partitioner's range by the number of nodes in each data center and assign the resulting values to the nodes of every data center. On its own, that wouldn't work, because no two nodes can have the same token.


Figure 4.1: Multiple data centers – even key distribution causing lopsided nodes

There are two ways to avoid this imbalance in key distribution:

  • Offsetting tokens slightly: This mechanism is the same as the one that we just discussed, except that colliding tokens are nudged apart (a scripted version of this technique appears after this list). The algorithm is as follows:
    1. Calculate the token ranges as if each data center were an independent ring.
    2. Offset any duplicated values by a small amount, say by 1, 10, or 100.

    Here is an example. Let's say we have a cluster spanning three data centers: Data-center 1 and Data-center 2 have three nodes each, and Data-center 3 has two nodes. We use RandomPartitioner. Here is the split (the final value is assigned; duplicates are offset):

    # Duplicates are offset, final values are assigned
    
    Data-center 1:
    node1: 
    0 (final)
    
    node2: 
    56713727820156410577229101238628035242 (final)
    
    node3: 
    113427455640312821154458202477256070485 (final)
    
    Data-center 2:
    node1: 
    0 (duplicate, offset to avoid collision) 
    1 (final)
    
    node2: 
    56713727820156410577229101238628035242 (duplicate, offset) 
    56713727820156410577229101238628035243 (final)
    
    node3: 
    113427455640312821154458202477256070485 (duplicate, offset)
    113427455640312821154458202477256070486 (final)
    
    Data-center 3:
    node1:
    0 (duplicate, offset) 
    2 (final)
    
    node2:
    85070591730234615865843651857942052864 (final)

    If you draw the ring and re-evaluate the token ownership, you will find that all the data centers have balanced nodes.

  • Alternating token assignment: This is a much simpler technique than the previous one, but it works only when all the data centers have an equal number of nodes, which is a pretty common setup.

    In this method, we divide the token range by the total number of nodes across all the data centers. Then we take the first token and assign it to a node in Data-center 1, take the second token and assign it to a node in Data-center 2, and so on, revolving through the data centers and assigning the next initial token until all the nodes are done (and all the tokens are exhausted).

    For a setup with three data centers, each having two nodes, here are the details:

    $ python -c 'print "\n".join([str((2 ** 127)*i/6) for i in xrange(6)])'
    
    0 
    28356863910078205288614550619314017621
    56713727820156410577229101238628035242
    85070591730234615865843651857942052864
    113427455640312821154458202477256070485
    141784319550391026443072753096570088106
    Data-center1: node1
    0
    
    Data-center2: node1
    28356863910078205288614550619314017621
    
    Data-center3: node1
    56713727820156410577229101238628035242
    
    Data-center1: node2
    85070591730234615865843651857942052864
    
    Data-center2: node2
    113427455640312821154458202477256070485
    
    Data-center3: node2
    141784319550391026443072753096570088106
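
To see the offsetting technique from the first bullet in code, here is a small sketch that treats each data center as its own ring and offsets every token in a data center by that data center's index. This is a slightly simpler variant of offsetting only the duplicated values, but it yields collision-free tokens that are balanced within each data center:

# Nodes per data center, as in the offsetting example above.
data_centers = [('DC1', 3), ('DC2', 3), ('DC3', 2)]

for offset, (dc, node_count) in enumerate(data_centers):
    print(dc)
    for i in range(node_count):
        # Each DC is computed as an independent ring, then shifted by
        # the DC's index so that no two tokens collide across DCs.
        print("  node%d: %d" % (i + 1, (2 ** 127) * i // node_count + offset))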

Launching a cluster with a script

Now that we have configured the machines and know the cluster settings to apply, the snitch to use, and the initial tokens to assign, we need to download the latest Cassandra release on multiple machines, set it up, and start it. But that is too much work to do manually.

We will use a custom script that does all of this for us; after all, we are dealing with large data and a large number of machines, so doing it all manually is error prone and exhausting (and, more importantly, no fun!). The script is available on GitHub at https://github.com/naishe/mastering_cassandra. You may tweak it as per your needs and work with it.

There are two scripts: install_cassandra.sh and upload_and_execute.sh. The former is executed on the to-be Cassandra nodes; the latter uploads the former to all the nodes, passes the appropriate initial token, and executes it. You run the latter on your local machine, making sure that both scripts are in the directory you execute from. If you plan to use these scripts, you may need to change a couple of variables at the top.

Here is the script, configured to set up a three-node cluster on Amazon EC2 machines. Note that it uses Ec2Snitch, so it does not need to set up a snitch configuration file as it would if it were using PropertyFileSnitch or GossipingPropertyFileSnitch. If you are using those snitches, you may need to upload their configuration files to the appropriate locations on the remote machines too:

#install_cassandra.sh

#!/bin/bash
set -e 

# This script does the following:
# 1. downloads cassandra
# 2. creates directories
# 3. updates cassandra.yaml with
#      cluster_name
#      seeds
#      listen_address
#      rpc_address
#      initial_token
#      endpoint_snitch
# 4. starts Cassandra

# SYNOPSIS
function printHelp(){ 
  cat << EOF 
Synopsis: 
  $0 <initial_token>
  Downloads, installs, configures, and starts Cassandra. 
  Required Parameters: 
<initial_token>: initial token for this node 
EOF 
}


if [ $# -lt 1 ] ; then
  printHelp
  exit 1
fi

# VARIABLES !!! EDIT AS YOUR CONFIG
download_url='http://mirror.sdunix.com/apache/cassandra/1.1.11/apache-cassandra-1.1.11-bin.tar.gz' 
name='apache-cassandra-1.1.11' 
install_dir='/opt' 
data_dir='/mnt/cassandra-data' 
logging_dir='/mnt/cassandra-logs'

# CASSANDRA CONFIG !!! EDIT AS YOUR CONFIG 
cluster_name='"My Cluster"' 
seeds='"10.99.9.67"' #yeah, the double quotes within the quotes! listen_address='' 
rpc_address='' 
initial_token="$1" 
endpoint_snitch="Ec2Snitch" 
cassandra_user="$2"

echo "--- DOWNLOADING CASSANDRA" 
wget -P /tmp/ ${download_url}

echo "--- EXTRACTING..." 
sudo tar xzf /tmp/apache-cassandra-1.1.11-bin.tar.gz -C ${install_dir}

echo "--- SETTING UP SYM-LINK" 
sudo ln -s ${install_dir}/${name} ${install_dir}/cassandra

echo "--- CREATE DIRECTORIES" 
sudo mkdir -p ${data_dir}/data ${data_dir}/commitlog ${logging_dir} 
sudo chown -R ${USER} ${data_dir} ${logging_dir}
echo "--- UPDATING CASSANDRA YAML (in place)" 
sudo cp ${install_dir}/cassandra/conf/cassandra.yaml ${install_dir}/cassandra/conf/cassandra.yaml.BKP

sudo sed -i \
  -e "s/^cluster_name.*/cluster_name: ${cluster_name}/g" \
  -e "s/\(-\s*seeds:\).*/\1 ${seeds}/g" \
  -e "s/^listen_address.*/listen_address: ${listen_address}/g" \
  -e "s/^rpc_address.*/rpc_address: ${rpc_address}/g" \
  -e "s/^initial_token.*/initial_token: ${initial_token}/g" \
  -e "s/^endpoint_snitch.*/endpoint_snitch: ${endpoint_snitch}/g" \
  -e "s|/var/lib/cassandra/data|${data_dir}/data|g" \
  -e "s|/var/lib/cassandra/commitlog|${data_dir}/commitlog|g" \
  -e "s|/var/lib/cassandra/saved_caches|${data_dir}/saved_caches|g" \
  ${install_dir}/cassandra/conf/cassandra.yaml
sudo sed -i \
  -e "s|/var/log/cassandra/system.log|${logging_dir}/system.log|g" \
  ${install_dir}/cassandra/conf/log4j-server.properties

echo "--- STARTING CASSANDRA" 
# nohup: ignore the SIGHUP signal so the Cassandra daemon survives logout
nohup ${install_dir}/cassandra/bin/cassandra > ${logging_dir}/startup.log &

sleep 5

echo "--- INSTALLATION FINISHED" 
exit 0

#upload_and_execute.sh

#!/bin/bash
set -e
# This script:
#  - takes an array of node addresses
#  - generates initial tokens for RandomPartitioner
#  - uploads install_cassandra.sh and executes it

## !!! Change these variables to suit your settings 
identity_file="${HOME}/.ec2/prodadmin.pem"
remote_user="ec2-user" 
install_script="${HOME}/Desktop/install_cassandra.sh" 
servers=( 'c1.mydomain.com' 'c2.mydomain.com' 'c3.mydomain.com' )
nodes=${#servers[@]} 
init_tokens=( $(python -c "print ' '.join([str((2 ** 127)*i/${nodes}) for i in xrange(${nodes})])") )
i=0 

for server in ${servers[@]} ; do 
  ikey=${init_tokens[$i]}
 
  echo ">> Uploading script to ${server} to remote user's home" 
  scp -i ${identity_file} ${install_script} ${remote_user}@${server}:~/install_cassandra.sh

  echo ">> Executing script with initial_key=${ikey}" 
  ssh -t -i ${identity_file} ${remote_user}@${server} "sudo chmod a+x ~/install_cassandra.sh && ~/install_cassandra.sh ${ikey}"

  echo ">> Installation finished for server: ${server}" 
  echo "----------------------------------------------" 
  i=$(($i+1))
done

echo ">> Cluster initialization is Finished." 
exit 0;

When the author executed this script to demonstrate a three-node cluster, it took less than two minutes for the cluster to get up and running. Here is how it looks:

$ ./upload_and_execute.sh

>> Uploading script to c1.mydomain.com to remote user's home 
install_cassandra.sh    100% 2484     2.4KB/s   00:00
>> Executing script with initial_key=0
--- DOWNLOADING CASSANDRA
[-- snip --]
Saving to: '/tmp/apache-cassandra-1.1.11-bin.tar.gz' 
100%[=========>] 1,29,73,061  598KB/s   in 22s

--- EXTRACTING...
--- SETTING UP SYM-LINK 
--- CREATE DIRECTORIES 
--- UPDATING CASSANDRA YAML (in place)
--- STARTING CASSANDRA 
--- INSTALLATION FINISHED 
>> Installation finished for server: c1.mydomain.com 
----------------------------------------------

>> Uploading script to c2.mydomain.com to remote user's home
[-- snip --]
>> Executing script with initial_key=56713727820156410577229101238628035242
--- DOWNLOADING CASSANDRA
[-- snip --]

--- EXTRACTING...
--- SETTING UP SYM-LINK
--- CREATE DIRECTORIES
--- UPDATING CASSANDRA YAML (in place)
--- STARTING CASSANDRA
--- INSTALLATION FINISHED
[-- snip --]
----------------------------------------------

>> Uploading script to c3.mydomain.com to remote user's home
[-- snip --]
--- DOWNLOADING CASSANDRA 
[-- snip --] 

--- EXTRACTING... 
--- SETTING UP SYM-LINK
--- CREATE DIRECTORIES 
--- UPDATING CASSANDRA YAML (in place)
--- STARTING CASSANDRA 
--- INSTALLATION FINISHED
----------------------------------------------

>> Cluster initialization is Finished.

Let's check what the newly launched cluster looks like:


Figure 4.2: The ring query showing all three nodes are up and running with tokens equally distributed among them

Creating a keyspace

It may strike you as odd that creating a keyspace is discussed here, in a chapter oriented more toward system administration tasks. The reason is that keyspace creation is tightly linked with the way you have set up the snitch.

Unless you are using SimpleSnitch, you should use NetworkTopologyStrategy as the replica placement strategy. It needs to know the replication factor of the keyspace for each data center.

So, if you have PropertyFileSnitch or GossipingPropertyFileSnitch, your keyspace creation looks like the following code:

CREATE KEYSPACE myKS
  WITH placement_strategy = 'NetworkTopologyStrategy' 
  AND strategy_options={DC1:3, DC2:3};

Here, the keys of strategy_options are the data center names defined in the snitch configuration files, and the values are the replication factors for each data center.

With the EC2-related snitches, Ec2MultiRegionSnitch or Ec2Snitch, the data center name is nothing but the region name as it appears in the availability zone. So, for us-east-1a, the data center is us-east. The command to create a keyspace is as follows (for Ec2MultiRegionSnitch):

CREATE KEYSPACE myKS
  WITH placement_strategy = 'NetworkTopologyStrategy' 
  AND strategy_options={us-east:3,us-west:3};

So, if you have set the replication factor smartly and your queries use the right consistency level, your requests will never have to travel beyond one data center (as long as all the replicas in that data center are up).
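
For example, with a consistency level such as LOCAL_QUORUM, only a majority of the replicas in the local data center must respond. A quick sketch of the arithmetic (not client code):

def quorum(replication_factor):
    # A quorum is a majority of the replicas: floor(RF / 2) + 1.
    return replication_factor // 2 + 1

# With strategy_options={DC1:3, DC2:3}, a LOCAL_QUORUM operation in DC1
# needs only quorum(3) = 2 replicas, and both live in DC1.
print(quorum(3))  # 2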

For SimpleSnitch, you just specify replication_factor as the strategy option, since SimpleStrategy is oblivious to data centers and racks:

CREATE KEYSPACE myKS
  WITH placement_strategy = 'SimpleStrategy'
  AND strategy_options = {replication_factor : 2};