Now that you have a single-node setup, you may start Cassandra by executing <cassandra_installation>/bin/cassandra for a tarball install, or by running sudo service cassandra start for a repository installation. (We'll see later in this chapter how to write an init script for Cassandra and set it up to start on boot.) However, a couple of configuration tweaks are needed to get a Cassandra cluster working.
If you look at cassandra.yaml, you will find that it has the following six sections:
In most cases, you will never have to bother about client connection properties and inter-node communication settings; the defaults are sensible and robust for any modern-day computer. The rest of this chapter will discuss the cluster setup properties and the various options that Cassandra provides out of the box. Security will be discussed briefly. In the section Authorization and authentication, we will tune Cassandra using various properties in cassandra.yaml.
A cluster name is used to logically group nodes that belong to the same cluster. It works as a namespace for nodes. In a multi-cluster environment, it prevents nodes of one cluster from accidentally joining another.
It is always a good idea to give a meaningful cluster name, even if you have only a single cluster at the time. Consider cassandra.yaml:
cluster_name: 'MyAppMainCluster'
It is suggested that you change the cluster name before you launch the cluster. Changing the cluster name while the cluster is up and running throws an exception, because the cluster name resides in the system keyspace, and as of Version 1.1.11, you cannot edit the system keyspace. (It was possible in some older versions.)
If you have to change the cluster name, there is an unofficial trick to do it. The cluster name is stored in the LocationInfo column family of the system keyspace. So, you need to stop the nodes, update the cassandra.yaml file with the new cluster name, and delete the contents of the <data_directory>/system/LocationInfo directory (or just move the contents elsewhere so that you can restore them if something goes wrong). On restarting the node, you will see when you connect to the local node via cassandra-cli that you are welcomed to the new cluster name. This process needs to be repeated for all the nodes in the cluster.
A seed node is the one that a newly launched node consults to learn about the cluster. Although gossip (refer to the section Gossip, Chapter 2, Cassandra Architecture) is how nodes come to know one another, to a new node the seeds are the first nodes it knows about and starts gossiping with. Eventually, all the nodes learn of its presence, and it learns about all the other nodes.
There must be at least one seed node in a cluster. Seed nodes should be rather stable nodes. You may configure more than one seed node for added redundancy and increased availability of seeds. In cassandra.yaml, seed nodes are specified as a comma-separated list of addresses:

seed_provider:
    - class_name: org.apache....SimpleSeedProvider
      parameters:
          # Ex: "<ip1>,<ip2>,<ip3>"
          - seeds: "c1.mydomain.com,c2.mydomain.com"
A listen address is the address that nodes use to connect to one another for communication/gossip. It is important to set this to the private IP of the machine. If it is not set, Cassandra picks up the hostname value, which, if incorrectly configured, can cause problems.
A broadcast address is the public address of the node. If it is not set, it takes whatever value the listen address bears. It is generally required only if you are setting up Cassandra across multiple data centers.
EC2 users: With a multiple data center installation, you need to use the EC2 snitch. listen_address may be left blank because AWS assigns a hostname that is based on the private IP, but you may set it to a private IP or private DNS. broadcast_address must be a public IP or a public DNS name. Also, remember to open storage_port (default 7000) in the security group that holds your Cassandra instances.
An RPC address is for client connections to the node. It can be the IP address, a hostname, or 0.0.0.0. If not set, it will take the hostname.
In cassandra.yaml, these properties look like the following:

# listen_address is not set, so it takes hostname.
listen_address:

# broadcast is commented. So, it is what listen_address is.
# broadcast_address: 1.2.3.4

# rpc_address is not set, so it is hostname.
rpc_address:
An initial token is a number assigned to a node that decides the range of keys it will hold. A node basically holds the tokens ranging from the previous node's token plus one, T(n-1) + 1 (the previous node being the one that holds the token immediately less than the current node's), to the current node's token, Tn.
In cassandra.yaml, you may specify an initial token as follows:
initial_token: 85070591730234615865843651857942052864
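To make range ownership concrete, here is a small Python 3 sketch (with made-up toy tokens, not real Cassandra code) that finds which node owns a given key token:

```python
from bisect import bisect_left

def find_owner(node_tokens, key_token):
    """Return the initial token of the node that owns key_token.

    A node owns the range (previous node's token, its own token];
    key tokens beyond the highest initial token wrap around to the
    first node on the ring.
    """
    ring = sorted(node_tokens)
    i = bisect_left(ring, key_token)
    return ring[i % len(ring)]

# A toy 4-node ring with tokens 0, 50, 100, 150:
print(find_owner([0, 50, 100, 150], 60))   # token 60 falls in (50, 100] -> 100
print(find_owner([0, 50, 100, 150], 160))  # beyond the last token, wraps -> 0
```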
It is important to choose initial tokens wisely to make sure the data is evenly distributed across the cluster. If you do not assign initial tokens to the nodes at the start of a new cluster, they get assigned automatically, which may lead to hotspots. Adding a new node, however, is relatively smart: if you do not give the new node an initial token, Cassandra picks the most heavily loaded node and assigns the new node a token that splits that node's range in half. It is also possible, and fairly easy, to load balance a running cluster; we'll learn more about this in the section Load balancing, Chapter 6, Managing a Cluster – Scaling, Node Repair, and Backup. The next logical question is: how do we determine the initial token for a node? It depends on the partitioner that you are using. Basically, we divide the whole range of keys that the partitioner supports into N equal parts, where N is the total number of nodes in the cluster, and assign each node one of these numbers. We will see this in the next section.
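The split-in-half heuristic for an untokened new node can be sketched in a couple of lines of Python (toy numbers; the real bootstrap logic lives inside Cassandra):

```python
def split_token(range_start_token, heavy_node_token):
    """Midpoint of the heavily loaded node's range, claimed by the new node."""
    return (range_start_token + heavy_node_token) // 2

# If the most loaded node owns the range (0, 100], the new node takes
# token 50, leaving the old node responsible only for (50, 100].
print(split_token(0, 100))  # 50
```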
By assigning initial tokens, what we have done is create buckets of keys. What determines which key goes into which bucket? The partitioner. A partitioner generates a hash value of the row key, and based on this hash value, Cassandra determines which bucket (node) the row goes to. Since the hash of a given row key is always the same, this is also how Cassandra determines which node to read from.
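As a rough illustration of this hashing step, here is a simplified Python 3 sketch in the spirit of the random partitioner (not Cassandra's exact implementation): the row key is hashed with MD5 and the digest is read as a big integer that becomes the token:

```python
import hashlib

def md5_token(row_key):
    """Simplified RandomPartitioner-style token: the MD5 digest of the
    row key read as a big-endian integer, folded into 0..2**127 - 1."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** 127)

# Lexicographically close keys still get wildly different tokens,
# which is what spreads rows evenly over the nodes.
print(md5_token("user-1"))
print(md5_token("user-2"))
```

Because the same key always hashes to the same token, the token lookup works identically for writes and reads.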
Like everything else in Cassandra, the partitioner is a pluggable interface. You can implement your own partitioner by implementing org.apache.cassandra.dht.IPartitioner and dropping the .class or .jar file in Cassandra's lib directory.
Here is how you specify the partitioner of choice in cassandra.yaml:
partitioner: org.apache.cassandra.dht.RandomPartitioner
In most cases, the default partitioner is good enough for you; it distributes keys evenly. As of Version 1.1.11, the default is RandomPartitioner, but in 1.2 or higher this changes to a more efficient implementation, Murmur3Partitioner.
Be warned that choosing a partitioner is a pretty critical decision, because it determines what data stays where and it affects the SSTable structure. If you decide to change it, you need to clean the data directory. So, whatever partitioner is chosen at the start of the cluster is likely to stay for the life of the storage.
Cassandra provides three partitioners by default.
The random partitioner is the default partitioner up to Version 1.1. It uses an MD5 hash to generate token values for row keys. Since hashes are not generated in any orderly manner, it does not guarantee any ordering: two lexicographically close row keys will likely be thrown onto two different nodes. This random assignment of tokens to keys is what makes it suitable for distributing keys evenly among nodes, so a properly balanced cluster is highly unlikely to develop hotspots.
The tokens generated by a random partitioner lie in the range 0 to 2^127 - 1. So, for the ith node in an N-node cluster, the initial token can be calculated as 2^127 * (i - 1) / N. The following is a simple Python snippet to generate the complete sequence of initial tokens for a random partitioner in a cluster of eight nodes:
# running in a Python shell
>>> nodes = 8
>>> print ("\n".join(["Node #" + str(i+1) + ": " + str((2 ** 127)*i/nodes) for i in xrange(nodes)]))
Node #1: 0
Node #2: 21267647932558653966460912964485513216
Node #3: 42535295865117307932921825928971026432
Node #4: 63802943797675961899382738893456539648
Node #5: 85070591730234615865843651857942052864
Node #6: 106338239662793269832304564822427566080
Node #7: 127605887595351923798765477786913079296
Node #8: 148873535527910577765226390751398592512
A byte-ordered partitioner, as the name suggests, generates tokens for row keys that follow the order of the hexadecimal representations of the keys. This makes it possible to keep rows ordered by row key and to iterate through rows as one iterates through an ordered list. But this benefit comes with a major drawback: hotspots. A hotspot arises from uneven distribution of data across the cluster. Say you have a 26-node cluster with a partitioner such as ByteOrderedPartitioner, where each node is responsible for one letter: the first node holds all the keys starting with A, the second those starting with B, and so on. A column family that uses usernames as row keys will then have uneven data distribution across the ring, skewed with nodes X, Q, and Z being very light, and nodes A and S being heavily loaded. This is bad for multiple reasons, but the most important one is the hotspot: the nodes with more data will be accessed more than the ones with less data, and the overall performance of the cluster may drop to the number of requests that a couple of heavily loaded nodes can serve.
The best way to assign initial tokens to a cluster using ByteOrderedPartitioner is to sample the data and determine which keys are best to assign as initial tokens to ensure an equally balanced cluster.
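One way to do that sampling, sketched in Python 3 (the key sample here is hypothetical; a real cluster would use a much larger sample): sort the sampled row keys and take evenly spaced quantiles, one per node:

```python
def sampled_tokens(sample_keys, nodes):
    """Choose one initial token per node by taking evenly spaced
    quantiles from a sorted sample of real row keys."""
    keys = sorted(sample_keys)
    step = len(keys) // nodes
    return [keys[i * step] for i in range(nodes)]

# hypothetical sample of username-like row keys
sample = ["alice", "arun", "bob", "carol", "dave", "meg", "sam", "zoe"]
print(sampled_tokens(sample, 4))  # ['alice', 'bob', 'dave', 'sam']
```

The chosen keys would then be hex-encoded, as in the five-character example that follows, before being set as initial_token values.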
Let's take a hypothetical case where the keys of all your keyspaces can be represented by five-character strings from "00000" to "zzzzz". Here is how we generate initial tokens in Python:
>>> start = int("00000".encode('hex'), 16)
>>> end = int("zzzzz".encode('hex'), 16)
>>> range = end - start
>>> nodes = 8
>>> print "\n".join([ "Node #" + str(i+1) + ": %032x" % (start + range*i/nodes) for i in xrange(nodes) ])
Node #1: 00000000000000000000003030303030
Node #2: 00000000000000000000003979797979
Node #3: 000000000000000000000042c2c2c2c2
Node #4: 00000000000000000000004c0c0c0c0b
Node #5: 00000000000000000000005555555555
Node #6: 00000000000000000000005e9e9e9e9e
Node #7: 000000000000000000000067e7e7e7e7
Node #8: 00000000000000000000007131313130
Remember, this is just an example; in a real case you would decide this only after evaluating the data. Or you may want to have initial tokens assigned based on UUIDs.
The Murmur3 partitioner is the new default for Cassandra Version 1.2 or higher. If you are starting a new cluster, it is suggested that you keep the Murmur3 partitioner. It is not order preserving, and it has all the desirable properties of the random partitioner; in addition, it is faster and provides better performance than the random partitioner. One difference from the random partitioner is that it generates token values between -2^63 and +2^63.
If you are migrating from a previous version to 1.2 or higher, please make sure that you keep using the same partitioner as before. If you were using the default, it is likely that you were using the random partitioner. This will cause trouble if you have not edited cassandra.yaml to change the new default, the Murmur3 partitioner, back to the random partitioner.
To generate initial tokens, we'll again apply our familiar formula, but this time the start position is not zero, so the range of tokens is (end - start): +2^63 - (-2^63) = 2^64. Here is a simple Python script to do this:
>>> nodes = 8
>>> print "\n".join(["Node #" + str(i+1) + ": " + str(-(2 ** 63) + (2 ** 64)*i/nodes) for i in xrange(nodes)])
Node #1: -9223372036854775808
Node #2: -6917529027641081856
Node #3: -4611686018427387904
Node #4: -2305843009213693952
Node #5: 0
Node #6: 2305843009213693952
Node #7: 4611686018427387904
Node #8: 6917529027641081856
Snitches are the way to tell Cassandra about the topology of the cluster, and about nodes' locations and their relative proximity. Snitches help Cassandra with two tasks: routing requests efficiently, and placing replicas according to the cluster topology.
Similar to partitioners, snitches are pluggable. You can plug in your own custom snitch by extending org.apache.cassandra.locator.EndPointSnitch. The PropertyFileEndPointSnitch class can be used as a guideline on how to write a snitch. To configure a snitch, you need to alter endpoint_snitch in cassandra.yaml:
endpoint_snitch: SimpleSnitch
For a custom snitch, mention the fully qualified class name of the snitch, assuming you have dropped the custom snitch's .class/.jar file in Cassandra's lib directory.
Out of the box, Cassandra provides the snitches detailed in the following sections.
SimpleSnitch is basically a do-nothing snitch. If you look at the code, it returns rack1 and datacenter1 for whatever IP address the endpoint has. Since it discards any information that could be derived from the IP address, it is appropriate for installations where data center information is not available, or where all the nodes are in the same data center.
This is the default snitch as of Version 1.1.11.
PropertyFileSnitch is a way to explicitly tell Cassandra the relative location of the various nodes in the cluster. It gives you a means to handpick the nodes to group under a data center and a rack. The location definition of each node in the cluster is stored in a configuration file, cassandra-topology.properties, which can be found under the conf directory (for a tarball installation, it is <installation>/conf; for repository installations, it is /etc/cassandra). Note that if you are using PropertyFileSnitch, all the nodes must have an identical topology file.
cassandra-topology.properties is a standard properties file with the IP address of a node as the key and <data-center-name>:<rack-name> as the value; it is up to you what data center name and rack name you give. Two nodes with the same data center name will be treated as nodes within a single data center, and two nodes with the same data center name and rack name combination will be treated as two nodes on the same rack.
Here is an example topology file:
# Cassandra Node IP=Data Center:Rack

# Data-center 1
10.110.6.30=DC1:RAC1
10.110.6.11=DC1:RAC1
10.110.4.30=DC1:RAC2

# Data-center 2
10.120.8.10=DC2:RAC1
10.120.8.11=DC2:RAC1

# Data-center 3
10.130.1.13=DC3:RAC1
10.130.2.10=DC3:RAC2

# default for unknown nodes
default=DC1:RAC0
DCX and RACX are commonly used patterns to denote a data center and a rack respectively, but you are free to choose anything that suits you. The default entry takes care of any node that is not listed in the topology file.
Even with all the fancy naming and grouping, one thing that keeps nagging with PropertyFileSnitch is the manual effort of keeping the topology files updated on every addition or removal of a node. GossipingPropertyFileSnitch solves this problem by using the gossip mechanism to propagate information about each node's location.
On each node, you put a file named cassandra-rackdc.properties under the conf directory. This file contains two things: the name of the node's data center and the name of the node's rack. It looks like this:
dc=DC3
rack=RAC2
If SimpleSnitch is one end of the spectrum, where the snitch assumes nothing, RackInferringSnitch is the other extreme. RackInferringSnitch uses the IP address to guess the data center and rack of a node: it assumes that the second octet of the IP address uniquely denotes a data center, and the third octet uniquely identifies a rack within that data center. So, for 10.110.6.30, 10.110.6.4, 10.110.2.42, and 10.108.10.1, this snitch assumes that the first two nodes reside in the same data center and the same rack, the third node lives in the same data center but in a different rack, and the fourth node belongs to a different data center than the rest:
   +---------> Data center
   |   +------> Rack
   |   |
10.110.6.30
This snitch can be dangerous to use if your data centers do not follow this pattern when assigning IPs to machines.
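The octet rule is easy to express in code. Here is a toy Python 3 sketch of the inference (the DC-/RAC- labels are made up for readability; the snitch works with the raw octets):

```python
def infer_location(ip):
    """RackInferringSnitch-style guess: the second octet names the data
    center, the third octet names the rack within that data center."""
    octets = ip.split(".")
    return ("DC-" + octets[1], "RAC-" + octets[2])

# The four addresses from the example above:
for ip in ["10.110.6.30", "10.110.6.4", "10.110.2.42", "10.108.10.1"]:
    print(ip, infer_location(ip))
```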
Ec2Snitch is a snitch written specifically for Amazon AWS installations. It uses the node's local metadata to get its availability zone and then breaks that into pieces to determine the rack and data center. Please note that the rack and data center determined this way do not correspond to the physical location of hardware in Amazon's cloud facility, but they give a pretty nice abstraction.
Ec2Snitch treats the region name as the data center name and the availability zone as the rack name. It does not work across regions, so effectively Ec2Snitch amounts to a single data center setup. If one of your nodes is in us-east-1a and another in us-east-1b, you have two nodes in a data center named us-east, in two racks, 1a and 1b.
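The region/zone split can be sketched like so (a Python 3 illustration of the naming convention described above, not the snitch's actual code):

```python
def ec2_location(availability_zone):
    """Ec2Snitch-style naming: everything before the final dash-separated
    piece is the data center (region); the trailing zone is the rack."""
    region, _, zone = availability_zone.rpartition("-")
    return (region, zone)

print(ec2_location("us-east-1a"))  # ('us-east', '1a')
print(ec2_location("us-east-1b"))  # ('us-east', '1b')
```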
Ec2Snitch does not work well if you decide to keep nodes across different EC2 regions, because it uses private IPs, which do not work across regions (but do work across availability zones within a region).
If your cluster spans multiple regions, you should use Ec2MultiRegionSnitch.
EC2 users: If you plan to distribute nodes across different regions, more than just a proper snitch is needed to make the nodes communicate successfully with one another. You need to change the following:

broadcast_address: This should be the public IP or public DNS of the node.
listen_address: This should be set to a private IP or DNS. If not set, the hostname on EC2 is generally derived from the private IP, which is fine.
endpoint_snitch: This should be set to Ec2MultiRegionSnitch.
storage_port: The default 7000 is fine, but remember to open this port in the security group that holds the Cassandra instances.

Apart from putting data in various buckets based on nodes' tokens, Cassandra has to replicate the data according to the replication factor associated with the keyspace. Replica placement strategies come into action when Cassandra has to decide where a replica needs to be placed.
There are two strategies that can be used based on the demand and structure of the cluster.
SimpleStrategy places the data on the node that owns it, based on the configured partitioner. It then moves to the next node (toward a higher token), places a replica, moves on to the next node and places another, and so on, until the replication factor is met.
SimpleStrategy is blind to the cluster topology. It does not check whether the next node in line is in the same rack or not, so this may not be the most robust strategy for storing data. What happens if all three replicas of a key range are physically located in the same rack (assuming RF=3) and that rack suffers a power failure? You lose access to some data until power is restored. This leads us to a rather smarter strategy, NetworkTopologyStrategy.
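A minimal Python 3 sketch of the placement rule just described (toy tokens; real details such as bootstrap state are ignored):

```python
def simple_strategy_replicas(node_tokens, owner_token, replication_factor):
    """SimpleStrategy sketch: start at the owning node and walk the ring
    clockwise (toward higher tokens, wrapping) until RF replicas are picked."""
    ring = sorted(node_tokens)
    start = ring.index(owner_token)
    return [ring[(start + i) % len(ring)] for i in range(replication_factor)]

# On a 4-node ring with RF=3, data owned by the node at token 100 is
# also replicated on the nodes at tokens 150 and 0 (wrapping around).
print(simple_strategy_replicas([0, 50, 100, 150], 100, 3))  # [100, 150, 0]
```

Notice that nothing in this walk looks at racks or data centers, which is exactly the weakness discussed above.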
Although we have discussed how bad SimpleStrategy can be, it is the default strategy. Moreover, if you do not know the placement or configuration details of your data center, or you decide to stay within a single data center, NetworkTopologyStrategy cannot help you much.
NetworkTopologyStrategy, as the name suggests, is a data-center- and rack-aware replica placement strategy. It tries to avoid the pitfalls of SimpleStrategy by considering the rack and data center names that it obtains from the configured snitch. With appropriate strategy_options stating how many replicas go to which data centers, NetworkTopologyStrategy gives you a very powerful and robust mirrored database setup.
NetworkTopologyStrategy requires the system administrator to put a little extra thought into deciding appropriate initial tokens for multiple data center installations. For a single data center setup, the initial tokens are simply the evenly divided token range assigned to the various nodes.
Here is the issue with multiple data center setups. Suppose you have two data centers, each with three nodes; here is what the keyspace definition looks like:
CREATE KEYSPACE myKS
WITH placement_strategy = 'NetworkTopologyStrategy'
AND strategy_options={DC1:3, DC2:3};
It says that there are at least six nodes in the ring; keep three copies of each row in DC1 and three more copies in DC2.
Assume the system actually has four nodes in each data center, and you calculated the initial tokens by dividing the possible token range into eight equidistant values. If you assign the first four tokens to the four nodes in DC1 and the rest to the nodes in DC2, you will end up with a lopsided data distribution.
Let's take an example. Say we have a partitioner that generates tokens from 0 to 200. The token distribution, if done in the way previously mentioned, results in a ring like the following figure. Since the replication factor is bound by data center, all the data from 25 to 150 goes to one single node in Data Center 1, while the other nodes in that data center own relatively few keys. The same happens in Data Center 2, which also ends up with one overloaded node.
This creates a need for a mechanism that balances the nodes within each data center. The obvious first idea is to divide the partitioner range by the number of nodes in each data center and assign those values to the nodes of every data center. But this wouldn't work as-is, because no two nodes can have the same token.
There are two ways to avoid this imbalance in key distribution:
Here is an example. Let's say we have a cluster spanning three data centers. Data-center 1 and Data-center 2 each have three nodes, and Data-center 3 has two nodes. We use RandomPartitioner. Here is the split (the final values are used; duplicates are offset):
# Duplicates are offset, final values are assigned
Data-center 1:
  node1: 0 (final)
  node2: 56713727820156410577229101238628035242 (final)
  node3: 113427455640312821154458202477256070485 (final)
Data-center 2:
  node1: 0 (duplicate, offset to avoid collision)
         1 (final)
  node2: 56713727820156410577229101238628035242 (duplicate, offset)
         56713727820156410577229101238628035243 (final)
  node3: 113427455640312821154458202477256070485 (duplicate, offset)
         113427455640312821154458202477256070486 (final)
Data-center 3:
  node1: 0 (duplicate, offset)
         2 (final)
  node2: 85070591730234615865843651857942052864 (final)
If you draw the ring and re-evaluate the token ownership, you will find that all the data centers have balanced nodes.
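The offset bookkeeping can be automated. Here is a Python 3 sketch that, for simplicity, adds the data center's index to every token of that data center (the worked example above offsets only the actual collisions, but the resulting balance is the same):

```python
def offset_tokens(nodes_per_dc, token_range=2 ** 127):
    """Compute balanced RandomPartitioner tokens per data center, then
    shift each data center's tokens by its index so no two nodes collide."""
    plan = {}
    for dc_index, (dc, n) in enumerate(sorted(nodes_per_dc.items())):
        plan[dc] = [token_range * i // n + dc_index for i in range(n)]
    return plan

for dc, tokens in offset_tokens({"DC1": 3, "DC2": 3}).items():
    print(dc, tokens)
```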
In this method, we divide the token range by the total number of nodes across all the data centers. Then we take the first token value and assign it to a node in Data-center 1, take the second token and assign it to a node in the second data center, and so on, revolving through the data centers and assigning the next initial token to a node in each until all the nodes are done (and all the tokens are exhausted).
For a setup with three data centers, each having two nodes, here are the details:
$ python -c 'print "\n".join([str((2 ** 127)*i/6) for i in xrange(6)])'
0
28356863910078205288614550619314017621
56713727820156410577229101238628035242
85070591730234615865843651857942052864
113427455640312821154458202477256070485
141784319550391026443072753096570088106

Data-center1: node1 0
Data-center2: node1 28356863910078205288614550619314017621
Data-center3: node1 56713727820156410577229101238628035242
Data-center1: node2 85070591730234615865843651857942052864
Data-center2: node2 113427455640312821154458202477256070485
Data-center3: node2 141784319550391026443072753096570088106
Now that we have configured the machines and know the cluster settings to apply, which snitch to use, and what the initial tokens should be, we'll download the latest Cassandra release on multiple machines, set it up, and start it. But that is too much work to do manually.
We will see a custom script that does all of this for us; after all, we are dealing with a lot of data and a large number of machines, so doing it all manually is error prone and exhausting (and, more importantly, no fun!). The script is available on GitHub at https://github.com/naishe/mastering_cassandra. You may tweak it to your needs and work with it.
There are two scripts: install_cassandra.sh and upload_and_execute.sh. The former is the one that is executed on the to-be Cassandra nodes; the latter uploads the former to all the nodes, passes the appropriate initial token, and executes it. It is the latter that you need to run on your local machine; make sure both scripts are in the directory from which you are executing. If you plan to use these scripts, you may need to change a couple of variables at the top.
Here is the script, configured to set up a three-node cluster on Amazon EC2 machines. Please note that it uses Ec2Snitch, so it does not need to set up any snitch configuration file, as it would have with PropertyFileSnitch or GossipingPropertyFileSnitch. If you are using those snitches, you may need to upload those files to the appropriate locations on the remote machines too:
#!/bin/bash
# install_cassandra.sh
set -e

# This script does the following:
# 1. downloads Cassandra
# 2. creates directories
# 3. updates cassandra.yaml with
#      cluster_name
#      seeds
#      listen_address
#      rpc_address
#      initial_token
#      endpoint_snitch
# 4. starts Cassandra

# SYNOPSIS
function printHelp() {
cat << EOF
Synopsis: $0 <initial_token>
Downloads, installs, configures, and starts Cassandra.
Required Parameters:
<initial_token>: initial token for this node
EOF
}

if [ $# -lt 1 ] ; then
  printHelp
  exit 1
fi

# VARIABLES !!! EDIT AS YOUR CONFIG
download_url='http://mirror.sdunix.com/apache/cassandra/1.1.11/apache-cassandra-1.1.11-bin.tar.gz'
name='apache-cassandra-1.1.11'
install_dir='/opt'
data_dir='/mnt/cassandra-data'
logging_dir='/mnt/cassandra-logs'

# CASSANDRA CONFIG !!! EDIT AS YOUR CONFIG
cluster_name='"My Cluster"'
seeds='"10.99.9.67"' # yeah, the double quotes within the quotes!
listen_address=''
rpc_address=''
initial_token="$1"
endpoint_snitch="Ec2Snitch"
cassandra_user="$2"

echo "--- DOWNLOADING CASSANDRA"
wget -P /tmp/ ${download_url}

echo "--- EXTRACTING..."
sudo tar xzf /tmp/apache-cassandra-1.1.11-bin.tar.gz -C ${install_dir}

echo "--- SETTING UP SYM-LINK"
sudo ln -s ${install_dir}/${name} ${install_dir}/cassandra

echo "--- CREATE DIRECTORIES"
sudo mkdir -p ${data_dir}/data ${data_dir}/commitlog ${logging_dir}
sudo chown -R ${USER} ${data_dir} ${logging_dir}

echo "--- UPDATING CASSANDRA YAML (in place)"
sudo cp ${install_dir}/cassandra/conf/cassandra.yaml ${install_dir}/cassandra/conf/cassandra.yaml.BKP
sudo sed -i \
 -e "s/^cluster_name.*/cluster_name: ${cluster_name}/g" \
 -e "s/\(-\s*seeds:\).*/\1 ${seeds}/g" \
 -e "s/^listen_address.*/listen_address: ${listen_address}/g" \
 -e "s/^rpc_address.*/rpc_address: ${rpc_address}/g" \
 -e "s/^initial_token.*/initial_token: ${initial_token}/g" \
 -e "s/^endpoint_snitch.*/endpoint_snitch: ${endpoint_snitch}/g" \
 -e "s|/var/lib/cassandra/data|${data_dir}/data|g" \
 -e "s|/var/lib/cassandra/commitlog|${data_dir}/commitlog|g" \
 -e "s|/var/lib/cassandra/saved_caches|${data_dir}/saved_caches|g" \
 ${install_dir}/cassandra/conf/cassandra.yaml
sudo sed -i -e "s|/var/log/cassandra/system.log|${logging_dir}/system.log|g" ${install_dir}/cassandra/conf/log4j-server.properties

echo "--- STARTING CASSANDRA"
# nohup: ignore the SIGHUP signal so the Cassandra daemon is not killed on logout
nohup ${install_dir}/cassandra/bin/cassandra > ${logging_dir}/startup.log &
sleep 5
echo "--- INSTALLATION FINISHED"
exit 0
The following is the code for upload_and_execute.sh:

#!/bin/bash
# upload_and_execute.sh
set -e

# This script:
# - takes an array of node addresses
# - generates initial tokens for RandomPartitioner
# - uploads install_cassandra.sh and executes it

## !!! Change these variables to suit your settings
identity_file="${HOME}/.ec2/prodadmin.pem"
remote_user="ec2-user"
install_script="${HOME}/Desktop/install_cassandra.sh"
servers=( 'c1.mydomain.com' 'c2.mydomain.com' 'c3.mydomain.com' )
nodes=${#servers[@]}

init_tokens=( $(python -c "print ' '.join([str((2 ** 127)*i/${nodes}) for i in xrange(${nodes})])") )

i=0
for server in ${servers[@]} ; do
  ikey=${init_tokens[$i]}
  echo ">> Uploading script to ${server} to remote user's home"
  scp -i ${identity_file} ${install_script} ${remote_user}@${server}:~/install_cassandra.sh
  echo ">> Executing script with initial_key=${ikey}"
  ssh -t -i ${identity_file} ${remote_user}@${server} "sudo chmod a+x ~/install_cassandra.sh && ~/install_cassandra.sh ${ikey}"
  echo ">> Installation finished for server: ${server}"
  echo "----------------------------------------------"
  i=$(($i+1))
done
echo ">> Cluster initialization is Finished."
exit 0
When the author executed this script to demonstrate a three-node cluster, it took less than two minutes to get it up and running. Here is how it looks:
$ ./upload_and_execute.sh
>> Uploading script to c1.mydomain.com to remote user's home
install_cassandra.sh              100% 2484     2.4KB/s   00:00
>> Executing script with initial_key=0
--- DOWNLOADING CASSANDRA
[-- snip --]
Saving to: '/tmp/apache-cassandra-1.1.11-bin.tar.gz'
100%[=========>] 1,29,73,061  598KB/s   in 22s
--- EXTRACTING...
--- SETTING UP SYM-LINK
--- CREATE DIRECTORIES
--- UPDATING CASSANDRA YAML (in place)
--- STARTING CASSANDRA
--- INSTALLATION FINISHED
>> Installation finished for server: c1.mydomain.com
----------------------------------------------
>> Uploading script to c2.mydomain.com to remote user's home
[-- snip --]
>> Executing script with initial_key=56713727820156410577229101238628035242
--- DOWNLOADING CASSANDRA
[-- snip --]
--- EXTRACTING...
--- SETTING UP SYM-LINK
--- CREATE DIRECTORIES
--- UPDATING CASSANDRA YAML (in place)
--- STARTING CASSANDRA
--- INSTALLATION FINISHED
[-- snip --]
----------------------------------------------
>> Uploading script to c3.mydomain.com to remote user's home
[-- snip --]
--- DOWNLOADING CASSANDRA
[-- snip --]
--- EXTRACTING...
--- SETTING UP SYM-LINK
--- CREATE DIRECTORIES
--- UPDATING CASSANDRA YAML (in place)
--- STARTING CASSANDRA
--- INSTALLATION FINISHED
----------------------------------------------
>> Cluster initialization is Finished.
Let's check what the newly launched cluster looks like:
It may strike you as odd that creating a keyspace is discussed here, in a chapter oriented toward system administration tasks. The reason is that keyspace creation is closely tied to the way you have set up the snitch.
Unless you are using SimpleSnitch, you should use NetworkTopologyStrategy as the replica placement strategy. It needs to know the keyspace's replication factor for each data center.
So, if you have PropertyFileSnitch or GossipingPropertyFileSnitch, your keyspace creation looks like the following code:
CREATE KEYSPACE myKS
WITH placement_strategy = 'NetworkTopologyStrategy'
AND strategy_options={DC1:3, DC2:3};
Here, the keys of strategy_options are the data center names as defined in the snitch configuration files, and the values are the replication factors in each data center.
With the EC2-related snitches, Ec2MultiRegionSnitch or Ec2Snitch, the data center name is simply the region name as it appears in the availability zone. So, for us-east-1a, the data center is us-east. The command to create a keyspace is as follows (for Ec2MultiRegionSnitch):
CREATE KEYSPACE myKS
WITH placement_strategy = 'NetworkTopologyStrategy'
AND strategy_options={us-east:3,us-west:3};
So, if you have set the replication factor smartly, and your queries use the right consistency level, a request will never have to travel beyond one data center (as long as all the replicas in that data center are up).
For SimpleSnitch, you just specify replication_factor as the strategy option, as SimpleSnitch is oblivious to data centers and racks:
create keyspace myKS
  with placement_strategy = 'SimpleStrategy'
  and strategy_options = {replication_factor : 2};