Installing Cassandra

With the JVM ready, installing Cassandra is as easy as downloading the appropriate tarball from the Apache Cassandra download page, http://cassandra.apache.org/download, and untarring it. On Debian or Ubuntu, you may choose to install either from a tarball or from an Apache Software Foundation repository.

Installing from a tarball

This guide assumes that Cassandra is installed in the /opt directory, the datafiles are in the /cassandra-data directory, and the system logs are in /var/log/cassandra. These are just conventions that I have chosen; you may choose locations that suit you best:

# Download. Please select appropriate version and 
# URL from http://cassandra.apache.org/download page 
$ wget http://mirror.sdunix.com/apache/cassandra/1.1.11/apache-cassandra-1.1.11-bin.tar.gz
[-- snip --]
Saving to: 'apache-cassandra-1.1.11-bin.tar.gz'

# extract
$ tar xzf apache-cassandra-1.1.11-bin.tar.gz

# (optional) Symbolic link to easily switch versions in
# future without having to change dependent scripts 
$ ln -s apache-cassandra-1.1.11 cassandra
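
The symbolic-link step above can be wrapped in a small helper so that switching versions later is a single call. This is only a minimal sketch: the function name switch_cassandra_version and its arguments are hypothetical, and it assumes that all extracted version directories sit side by side under one base directory such as /opt.

```shell
# Hypothetical helper: repoint the "cassandra" symlink in a base directory
# at one of the extracted version directories.
# Usage: switch_cassandra_version <base_dir> <version_dir_name>
switch_cassandra_version() {
    base_dir=$1
    version_dir=$2
    # Refuse to link to a directory that has not been extracted yet.
    if [ ! -d "$base_dir/$version_dir" ]; then
        echo "no such directory: $base_dir/$version_dir" >&2
        return 1
    fi
    # -s: symbolic link, -f: replace an existing link,
    # -n: treat an existing "cassandra" link as a file, not as its target.
    ( cd "$base_dir" && ln -sfn "$version_dir" cassandra )
}
```

For example, switch_cassandra_version /opt apache-cassandra-1.1.11 would make /opt/cassandra point at the 1.1.11 directory, so dependent scripts can keep referring to /opt/cassandra.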

Installing from the ASF repository for Debian/Ubuntu

The Apache Software Foundation provides Debian packages for different versions of Cassandra, so you can install it directly from the repository. To add the repository, first open the APT sources list for editing:

# Edit sources
$ sudo vi /etc/apt/sources.list

Also, append the following three lines:

# Cassandra repo 
deb http://www.apache.org/dist/cassandra/debian 11x main
deb-src http://www.apache.org/dist/cassandra/debian 11x main
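
If you prefer not to edit the file by hand, the same entries can be appended from a script. The following is a minimal sketch, not an official tool: the function name add_cassandra_repo is hypothetical, and it assumes the repository URL and the 11x series shown above.

```shell
# Hypothetical helper: append the Cassandra repository entries to an APT
# sources file, skipping any entry that is already present.
# Usage: add_cassandra_repo <sources_file> <series>   (for example, 11x)
add_cassandra_repo() {
    sources_file=$1
    series=$2
    repo_url="http://www.apache.org/dist/cassandra/debian"
    for kind in deb deb-src; do
        entry="$kind $repo_url $series main"
        # -x matches whole lines only, -F treats the entry as a fixed string.
        if ! grep -qxF "$entry" "$sources_file" 2>/dev/null; then
            echo "$entry" >> "$sources_file"
        fi
    done
}
```

Writing to /etc/apt/sources.list requires root privileges, so a call such as add_cassandra_repo /etc/apt/sources.list 11x would have to run in a root shell or inside a script started with sudo. Running it twice leaves the file unchanged the second time.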

Next, execute sudo apt-get update, as shown in the following code:

$ sudo apt-get update 
Ign http://security.ubuntu.com natty-security InRelease
[-- snip --]
GPG error: http://www.apache.org 11x InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 4BD736A82B5C1B00

If you get this error, add the public keys as shown:

$ gpg --keyserver pgp.mit.edu --recv-keys F758CE318D77295D 
$ gpg --export --armor F758CE318D77295D | sudo apt-key add -

$ gpg --keyserver pgp.mit.edu --recv-keys 2B5C1B00 
$ gpg --export --armor 2B5C1B00 | sudo apt-key add -

Now, you can install Cassandra using the following commands:

$ sudo apt-get update
$ sudo apt-get install cassandra

This installation does most of the system-wide configuration for you. It makes all the executables available on the system $PATH, copies the configuration files to /etc/cassandra, and adds an init script that sets up the proper JVM options and ulimits. It also sets the run levels so that Cassandra starts at boot as the "cassandra" user.

Anatomy of the installation

There are a few programs and files that one must know about to work effectively with Cassandra. They come in handy during investigation, maintenance, configuration, and optimization.

Depending on how the installation is done, these files may be found at different locations. For a tarball installation, everything is neatly packaged under the directory where Cassandra is installed: binaries under the bin directory and configuration files under the conf directory. For repository-based installations, binaries are available in the /usr/bin and /usr/sbin directories, and configuration files are under /etc/cassandra and /etc/default/cassandra.

Cassandra binaries

These contain executables for various tasks. Let's take a quick glance at them:

  • cassandra: It starts the Cassandra daemon using the default configuration. To start Cassandra in the foreground, use the -f option; the logs are then printed on the console, and you can use Ctrl + C to kill Cassandra. One may also use -p <pid_file> to record the process ID in a file, and then kill Cassandra running in the background by using kill `cat <pid_file>`.

    If Cassandra is installed from the repository, the installation will have created a service for it. So, one should use sudo service cassandra start, sudo service cassandra stop, and sudo service cassandra status to start, stop, and query the status of Cassandra.

  • cassandra-cli: Cassandra's command-line interface (CLI) gives very basic access for executing simple commands that modify and access keyspaces and column families. More discussion on Cassandra's CLI can be found at http://wiki.apache.org/cassandra/CassandraCli. A typical invocation of cassandra-cli looks like this:
    cassandra-cli -h <hostname> -p <port> -k <keyspace>

    A file of statements can be passed to the CLI using the -f option.

  • cqlsh: This is a command-line interface to execute CQL queries. The default version is CQL 2 as of Cassandra Version 1.1.*. It may change in Version 1.2.0+. One may switch to CQL 3 using the --cql3 switch. Typically, the cqlsh connect command looks like this:
    cqlsh <hostname> <port> -k <keyspace>
  • json2sstable and sstable2json: As the names suggest, these are the yin and yang of serializing and deserializing the data in an SSTable. They are loosely comparable to the mysqldump --xml <database> command, except that they work with the JSON format. sstable2json exports an SSTable as JSON, and json2sstable takes JSON and materializes a functional SSTable.

    sstable2json accepts the following three options:
    • -k: the keys to be dumped
    • -x: the keys to be excluded
    • -e: dump just the keys, with no column family data

    One can use the -k or -x switches up to 500 times. A general sstable2json command looks like this:
    sstable2json -k <key1> -k <key2> <sstable_path>

    sstable_path must be the full path to the SSTable, such as /cassandra-data/data/mykeyspace/mykeyspace-hc-1.data. Also, each key must be given as a hex string.
  • sstablekeys: This is essentially sstable2json with a -e switch.
  • sstableloader: This is used to bulk load data into Cassandra. One can simply copy SSTable datafiles and load them into another Cassandra setup without much hassle. Essentially, sstableloader reads the datafiles and streams them to the current Cassandra setup as specified by Cassandra's YAML file. We will see this tool in more detail in section Using Cassandra bulk loader to restore the data, Chapter 6, Managing a Cluster – Scaling, Node Repair, and Backup.
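
Because sstable2json expects keys as hex strings, it helps to have a quick way to convert a readable key. The following is a minimal sketch: the function name to_hex_key is hypothetical, and it assumes that the row key is stored as the raw bytes of its text (ASCII or UTF-8), which may not hold for column families that use other key types.

```shell
# Hypothetical helper: print the hexadecimal encoding of a text key,
# assuming the key is stored as its raw (ASCII/UTF-8) bytes.
to_hex_key() {
    # od prints each byte as two hex digits (-v disables the "*" shorthand
    # for repeated lines); tr strips the spaces and newlines between them.
    printf '%s' "$1" | od -An -v -tx1 | tr -d ' \n'
}
```

For instance, to_hex_key user123 prints 75736572313233, which could then be passed as sstable2json -k "$(to_hex_key user123)" <sstable_path>.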

Configuration files

Cassandra has a central configuration file named cassandra.yaml. It contains cluster settings, node-to-node communication specifications, performance-related settings, authentication, security, and backup settings.

Apart from this, there are the log4j-server.properties and cassandra-topology.properties files. The log4j-server.properties file is used to tweak Cassandra's logging settings. The only thing that one may want to change in this file is the following line, which sets the location where the logs are written:

log4j.appender.R.File=/var/log/cassandra/system.log
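
Changing this line can also be scripted, which is handy when several nodes need the same setting. This is a minimal sketch rather than a supported tool: the function name set_cassandra_log_file is hypothetical, and it assumes that the new path contains no |, &, or backslash characters, since these are special in the sed expression used.

```shell
# Hypothetical helper: point the log4j file appender at a new log path.
# Usage: set_cassandra_log_file <log4j-server.properties> <new_log_path>
# Assumes the new path contains no '|', '&', or '\' characters.
set_cassandra_log_file() {
    props_file=$1
    new_log_path=$2
    tmp_file="$props_file.tmp"
    # Rewrite only the log4j.appender.R.File line; leave the rest untouched.
    sed "s|^log4j\.appender\.R\.File=.*|log4j.appender.R.File=$new_log_path|" \
        "$props_file" > "$tmp_file" && mv "$tmp_file" "$props_file"
}
```

The edit goes through a temporary file instead of sed -i because the in-place flag behaves differently across sed implementations.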

The cassandra-topology.properties file is to be filled with cluster-specific values if you use PropertyFileSnitch. We'll discuss more on this later in this chapter.

cassandra.yaml and other files can be accessed from the conf directory under the installation directory for a tarball installation. For a repository installation, the cassandra.yaml file and others can be found under /etc/cassandra.
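
Scripts that must work with both kinds of installation can probe the two locations in turn. The following is a minimal sketch: the function name find_cassandra_yaml is hypothetical, and it assumes only the two layouts described above.

```shell
# Hypothetical helper: print the path of cassandra.yaml.
# Usage: find_cassandra_yaml [install_dir]
# Checks the tarball layout first, then the repository layout.
find_cassandra_yaml() {
    install_dir=${1:-}
    if [ -n "$install_dir" ] && [ -f "$install_dir/conf/cassandra.yaml" ]; then
        echo "$install_dir/conf/cassandra.yaml"
    elif [ -f /etc/cassandra/cassandra.yaml ]; then
        echo /etc/cassandra/cassandra.yaml
    else
        echo "cassandra.yaml not found" >&2
        return 1
    fi
}
```

With the conventions of this guide, find_cassandra_yaml /opt/cassandra would print the path under the conf directory of a tarball installation, and fall back to /etc/cassandra otherwise.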

Setting up Cassandra's data directory and commit log directory

As discussed earlier, one should configure the data directory and the commit log directory to separate disk drives to improve performance. The cassandra.yaml file holds all these configurations and more.

Note

AWS EC2 users: Although the general advice is to keep data and commit logs on two separate drives, for EC2 instance store instances it is suggested to set up a RAID0 configuration and use it for both the data directories and the commit log. This performs better than having one of them on the root device and the other on an ephemeral drive.

EBS-backed instances are a bad choice for a Cassandra installation due to slow I/O performance, and the same goes for any NAS setup.

To update data directories, edit the following lines in cassandra.yaml:

# directories where Cassandra should store data on disk.
data_file_directories: 
    - /var/lib/cassandra/data

Change /var/lib/cassandra/data to a directory that is suitable for your setup. You may also add more directories spanning different hard disks. Then change the commit log directory as shown in the following code:

# commit log 
commitlog_directory: /var/lib/cassandra/commitlog

Edit this to set a desired location.

These directories (data and commit log) must be writable. If this is not a fresh install, one may want to migrate data from the old data directories and commit log directory to the new ones.
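
Before starting Cassandra, it is worth confirming that the configured directories really are writable. This is a minimal sketch under stated assumptions: the function name check_writable_dirs is hypothetical, and the check is only meaningful when it is run as the same user that will run Cassandra.

```shell
# Hypothetical helper: verify that every given directory exists and is
# writable by the current user. Returns non-zero if any check fails.
check_writable_dirs() {
    status=0
    for dir in "$@"; do
        if [ -d "$dir" ] && [ -w "$dir" ]; then
            echo "ok: $dir"
        else
            echo "not writable or missing: $dir" >&2
            status=1
        fi
    done
    return "$status"
}
```

With the default settings shown above, the call would be check_writable_dirs /var/lib/cassandra/data /var/lib/cassandra/commitlog.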
