Chapter 3. Installing Cassandra

For those among us who like instant gratification, we’ll start by installing Cassandra. Because Cassandra introduces a lot of new vocabulary, there might be some unfamiliar terms as we walk through this. That’s OK; the idea here is to get set up quickly in a simple configuration to make sure everything is running properly. This will serve as an orientation. Then, we’ll take a step back and understand Cassandra in its larger context.

Installing the Apache Distribution

Cassandra is available for download from the Web at http://cassandra.apache.org. Just click the link on the home page to download a version as a gzipped tarball. Typically two versions of Cassandra are provided. The latest release is recommended for those starting new projects not yet in production. The most stable release is the one recommended for production usage. For all releases, the prebuilt binary is named apache-cassandra-x.x.x-bin.tar.gz, where x.x.x represents the version number. The download is around 23MB.

Extracting the Download

The simplest way to get started is to download the prebuilt binary. Because the download is a gzipped tarball, you can unpack it with any archive utility that handles the GZip and TAR formats. On Unix-based systems such as Linux or MacOS, the necessary extraction utilities should be preinstalled; on Windows, you’ll need to get a program such as WinZip, which is commercial, or something like 7-Zip, which is free.

Open your extracting program. You might have to decompress the GZip file and then extract the resulting TAR file in separate steps. Once you have a folder on your filesystem called apache-cassandra-x.x.x, you’re ready to run Cassandra.
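On Linux or MacOS, both extraction steps can usually be handled by a single tar invocation. The following is a minimal sketch rather than an official procedure; the function name unpack_cassandra and its optional destination argument are hypothetical and not part of the distribution.

```shell
# Unpack a gzipped tarball (e.g. apache-cassandra-x.x.x-bin.tar.gz) into a
# destination directory. unpack_cassandra is a hypothetical helper name.
unpack_cassandra() {
  uc_tarball="$1"
  uc_dest="${2:-.}"                  # default: current directory
  mkdir -p "$uc_dest" || return 1
  # -x extract, -z decompress gzip, -f read the named file, -C change directory
  tar -xzf "$uc_tarball" -C "$uc_dest"
}

# Usage (substitute the real version number for x.x.x):
#   unpack_cassandra apache-cassandra-x.x.x-bin.tar.gz "$HOME"
```

After this runs, the apache-cassandra-x.x.x folder should appear under the destination directory.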

What’s In There?

Once you decompress the tarball, you’ll see that the Cassandra binary distribution includes several files and directories.

The files include NEWS.txt, which contains the release notes describing features included in the current and prior releases, and CHANGES.txt, which is similar but focuses on bug fixes. Review these files whenever you upgrade to a new version so you know what changes to expect.

Let’s take a moment to look around in the directories and see what we have.

bin: This directory contains the executables to run Cassandra as well as clients, including the query language shell (cqlsh) and the command-line interface (CLI) client. It also has scripts to run the nodetool, which is a utility for inspecting a cluster to determine whether it is properly configured, and to perform a variety of maintenance operations. We look at nodetool in depth later. The directory also contains several utilities for performing operations on SSTables, including listing the keys of an SSTable (sstablekeys), bulk extraction and restoration of SSTable contents (sstableloader), and upgrading SSTables to a new version of Cassandra (sstableupgrade).
conf: This directory contains the files for configuring your Cassandra instance. The required configuration files include: the cassandra.yaml file, which is the primary configuration for running Cassandra; and the logback.xml file, which lets you change the logging settings to suit your needs. Additional files can optionally be used to configure the network topology, archival and restore commands, and triggers. We see how to use these configuration files when we discuss configuration in Chapter 7.
interface: This directory contains a single file, called cassandra.thrift. This file defines a legacy Remote Procedure Call (RPC) API based on the Thrift syntax. The Thrift interface was used to create clients in Java, C++, PHP, Ruby, Python, Perl, and C# prior to the creation of CQL. The Thrift API has been officially marked as deprecated in the 3.2 release and will be deleted in the 4.0 release.
javadoc: This directory contains a documentation website generated using Java’s JavaDoc tool. Note that JavaDoc reflects only the comments that are stored directly in the Java code, and as such does not represent comprehensive documentation. It’s helpful if you want to see how the code is laid out. Moreover, Cassandra is a wonderful project, but the code contains relatively few comments, so you might find the JavaDoc’s usefulness limited. It may be more fruitful to simply read the Java source files directly if you’re familiar with Java. Nonetheless, to read the JavaDoc, open the javadoc/index.html file in a browser.
lib: This directory contains all of the external libraries that Cassandra needs to run. For example, it uses two different JSON serialization libraries, the Google collections project, and several Apache Commons libraries.
pylib: This directory contains Python libraries that are used by cqlsh.
tools: This directory contains tools that are used to maintain your Cassandra nodes. We’ll look at these tools in Chapter 11.

Additional Directories

If you’ve already run Cassandra using the default configuration, you will notice two additional directories under the main Cassandra directory: data and logs. We’ll discuss the contents of these directories momentarily.

Building from Source

Cassandra uses Apache Ant for its build scripting language and Maven for dependency management.

Downloading Ant

You can download Ant from http://ant.apache.org. You don’t need to download Maven separately just to build Cassandra.

Building from source requires a complete Java 7 or 8 JDK, not just the JRE. If you see a message about how Ant is missing tools.jar, either you don’t have the full JDK or you’re pointing to the wrong path in your environment variables. Maven downloads files from the Internet, so if your connection is unavailable or Maven cannot determine the proxy, the build will fail.

Downloading Development Builds

If you want to download the most cutting-edge builds, you can get the source from Jenkins, which the Cassandra project uses as its Continuous Integration tool. See http://cassci.datastax.com for the latest builds and test coverage information.

If you are a Git fan, you can get a read-only trunk version of the Cassandra source using this command:

$ git clone git://git.apache.org/cassandra.git

What Is Git?

Git is a source code management system created by Linus Torvalds to manage development of the Linux kernel. It’s increasingly popular and is used by projects such as Android, Fedora, Ruby on Rails, Perl, and many Cassandra clients (as we’ll see in Chapter 8). If you’re on a Linux distribution such as Ubuntu, it couldn’t be easier to get Git. At a console, just type apt-get install git and it will be installed and ready for commands. For more information, visit http://git-scm.com.

Because Maven takes care of all the dependencies, it’s easy to build Cassandra once you have the source. Just make sure you’re in the root directory of your source download and execute the ant program, which will look for a file called build.xml in the current directory and execute the default build target. Ant and Maven take care of the rest. To execute the Ant program and start compiling the source, just type:

$ ant

That’s it. Maven will retrieve all of the necessary dependencies, and Ant will build the hundreds of source files and execute the tests. If all went well, you should see a BUILD SUCCESSFUL message. If all did not go well, make sure that your path settings are all correct, that you have the most recent versions of the required programs, and that you downloaded a stable Cassandra build. You can check the Jenkins report to make sure that the source you downloaded actually can compile.

More Build Output

If you want to see detailed information on what is happening during the build, you can pass Ant the -v option to cause it to output verbose details regarding each operation it performs.

Additional Build Targets

To compile the server, you can simply execute ant as shown previously. This command executes the default target, jar. This target will perform a complete build including unit tests and output a file into the build directory called apache-cassandra-x.x.x.jar.

If you want to see a list of all of the targets supported by the build file, simply pass Ant the -p option to get a description of each target. Here are a few others you might be interested in:

test: Users will probably find this the most helpful, as it executes the battery of unit tests. You can also check out the unit test sources themselves for some useful examples of how to interact with Cassandra.
stress-build: This target builds the Cassandra stress tool, which we will try out in Chapter 12.
clean: This target removes locally created artifacts such as generated source files and classes and unit test results. The related target realclean performs a clean and additionally removes the Cassandra distribution JAR files and JAR files downloaded by Maven.

Running Cassandra

In earlier versions of Cassandra, before you could start the server there were some required steps to edit configuration files and set environment variables. But the developers have done a terrific job of making it very easy to start using Cassandra immediately. We’ll note some of the available configuration options as we go.

Required Java Version

Cassandra requires a Java 7 or 8 JVM, preferably the latest stable version. It has been tested on both the Open JDK and Oracle’s JDK. You can check your installed Java version by opening a command prompt and executing java -version. If you need a JDK, you can get one at http://www.oracle.com/technetwork/java/javase/downloads/index.html.
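The version check can also be scripted. The sketch below assumes version strings in the style that appears in this chapter’s log output (such as 1.8.0_60); the helper names java_major_version and java_version_supported are hypothetical.

```shell
# Print the major Java version for a version string such as 1.7.0_79 or
# 1.8.0_60. java_major_version is a hypothetical helper name.
java_major_version() {
  jmv="$1"
  case "$jmv" in
    1.*) jmv="${jmv#1.}" ;;          # legacy scheme: 1.8.0_60 -> 8.0_60
  esac
  echo "${jmv%%[._-]*}"              # keep what precedes the first separator
}

# Succeed only for the Java 7 and 8 versions this chapter names.
java_version_supported() {
  case "$(java_major_version "$1")" in
    7|8) return 0 ;;
    *)   return 1 ;;
  esac
}

# Usage: "java -version" writes to standard error and quotes the version.
#   v=$(java -version 2>&1 | awk -F'"' '/version/ { print $2; exit }')
#   java_version_supported "$v" && echo "Java $v is OK"
```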

On Windows

Once you have the binary or the source downloaded and compiled, you’re ready to start the database server.

Setting the JAVA_HOME environment variable is recommended. To do this on Windows 7, click the Start button and then right-click on Computer. Click Advanced System Settings, and then click the Environment Variables... button. Click New... to create a new system variable. In the Variable Name field, type JAVA_HOME. In the Variable Value field, type the path to your Java installation. This is probably something like C:\Program Files\Java\jre7 if running Java 7 or C:\Program Files\Java\jre1.8.0_25 if running Java 8.

Remember that if you create a new environment variable, you’ll need to reopen any currently open terminals in order for the system to become aware of the new variable. To make sure your environment variable is set correctly and that Cassandra can subsequently find Java on Windows, execute this command in a new terminal: echo %JAVA_HOME%. This prints the value of your environment variable.

You can also define an environment variable called CASSANDRA_HOME that points to the top-level directory where you have placed or built Cassandra, so you don’t have to pay as much attention to where you’re starting Cassandra from. This is useful for other tools besides the database server, such as nodetool and cqlsh.

Once you’ve started the server for the first time, Cassandra will add directories to your system to store its data files. The default configuration creates these directories under the CASSANDRA_HOME directory.

data: This directory is where Cassandra stores its data. By default, there are three sub-directories under the data directory, corresponding to the various data files Cassandra uses: commitlog, data, and saved_caches. We’ll explore the significance of each of these data files in Chapter 6. If you’ve been trying different versions of the database and aren’t worried about losing data, you can delete these directories and restart the server as a last resort.

logs: This directory is where Cassandra stores its logs in a file called system.log. If you encounter any difficulties, consult the log to see what might have happened.

Data File Locations

The data file locations are configurable in the cassandra.yaml file, located in the conf directory. The properties are called data_file_directories, commitlog_directory, and saved_caches_directory. We’ll discuss the recommended configuration of these directories in Chapter 7.
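To see where a given installation stores its files, the relevant settings can be pulled out of cassandra.yaml. The following is a minimal sketch: show_data_locations is a hypothetical helper, the default path conf/cassandra.yaml assumes you run it from the Cassandra home directory, and the property names are assumed to be spelled as in the shipped configuration file (note commitlog_directory, with no underscore after “commit”).

```shell
# Print the storage-location settings from a cassandra.yaml file.
# show_data_locations is a hypothetical helper; settings that are commented
# out (and so fall back to defaults) are not printed.
show_data_locations() {
  awk '
    /^(commitlog_directory|saved_caches_directory):/ { print; in_list = 0; next }
    /^data_file_directories:/                        { print; in_list = 1; next }
    in_list && /^[ \t]*-/                            { print; next }
                                                     { in_list = 0 }
  ' "${1:-conf/cassandra.yaml}"
}

# Usage, from the Cassandra home directory:
#   show_data_locations
```

If the command prints nothing, the settings are probably commented out and Cassandra is using its default locations.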

On Linux

The process on Linux and other *nix operating systems (including Mac OS) is similar to that on Windows. Make sure that your JAVA_HOME variable is properly set, according to the earlier description. Then, extract the Cassandra gzipped tarball; gunzip only decompresses the file, so you will also need tar to unpack it (tar -xzf does both in one step). Many users prefer to use the /var/lib directory for data storage. If you are changing this configuration, you will need to edit the conf/cassandra.yaml file and create the referenced directories for Cassandra to store its data and logs, making sure to configure write permissions for the user that will be running Cassandra:

$ sudo mkdir -p /var/lib/cassandra
$ sudo chown -R username /var/lib/cassandra

Instead of username, substitute your own username, of course.
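Before starting the server, it can be worth confirming that JAVA_HOME points at a usable Java installation. This is a minimal sketch; check_java_home is a hypothetical helper, and the JDK path in the usage comment is only an example that will differ between systems.

```shell
# Succeed if the given directory (default: $JAVA_HOME) contains an executable
# bin/java. check_java_home is a hypothetical helper name.
check_java_home() {
  cjh="${1:-${JAVA_HOME:-}}"
  [ -n "$cjh" ] && [ -x "$cjh/bin/java" ]
}

# Usage, e.g. in ~/.profile (example path; yours will likely differ):
#   export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
#   check_java_home || echo "JAVA_HOME is not set correctly" >&2
```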

Starting the Server

To start the Cassandra server on any OS, open a command prompt or terminal window, navigate to the <cassandra-directory>/bin directory under the location where you unpacked Cassandra, and run the command cassandra -f to start your server.

Starting Cassandra in the Foreground

Using the -f switch tells Cassandra to stay in the foreground instead of running as a background process, so that all of the server logs will print to standard out and you can see them in your terminal window, which is useful for testing. In either case, the logs will append to the system.log file, described earlier.

In a clean installation, you should see quite a few log statements as the server gets running. The exact syntax of logging statements will vary depending on the release you’re using, but there are a few highlights we can look for. If you search for “cassandra.yaml”, you’ll quickly run into the following:

DEBUG [main] 2015-12-08 06:02:38,677 YamlConfigurationLoader.java:104 - 
  Loading settings from file:/.../conf/cassandra.yaml
INFO [main] 2015-12-08 06:02:38,781 YamlConfigurationLoader.java:179 - 
  Node configuration:[authenticator=AllowAllAuthenticator; 
  authorizer=AllowAllAuthorizer; auto_bootstrap=false; auto_snapshot=true; 
  batch_size_fail_threshold_in_kb=50; ...

These log statements indicate the location of the cassandra.yaml file containing the configured settings. The Node configuration statement lists out the settings from the config file.

Now search for “JVM” and you’ll find something like this:

INFO [main] 2015-12-08 06:02:39,239 CassandraDaemon.java:436 - 
  JVM vendor/version: Java HotSpot(TM) 64-Bit Server VM/1.8.0_60
INFO [main] 2015-12-08 06:02:39,239 CassandraDaemon.java:437 - 
  Heap size: 519045120/519045120

These log statements provide information describing the JVM being used, including memory settings.

Next, search for versions in use—“Cassandra version”, “Thrift API Version”, “CQL supported versions”:

INFO [main] 2015-12-08 06:02:43,931 StorageService.java:586 - 
  Cassandra version: 3.0.0
INFO [main] 2015-12-08 06:02:43,932 StorageService.java:587 - 
  Thrift API version: 20.1.0
INFO [main] 2015-12-08 06:02:43,932 StorageService.java:588 - 
  CQL supported versions: 3.3.1 (default: 3.3.1)

We can also find statements where Cassandra is initializing internal data structures such as caches:

INFO [main] 2015-12-08 06:02:43,633 CacheService.java:115 - 
  Initializing key cache with capacity of 24 MBs.
INFO [main] 2015-12-08 06:02:43,679 CacheService.java:137 - 
  Initializing row cache with capacity of 0 MBs
INFO [main] 2015-12-08 06:02:43,686 CacheService.java:166 - 
  Initializing counter cache with capacity of 12 MBs

If we search for terms like “JMX”, “gossip”, and “clients”, we can find statements like the following:

WARN [main] 2015-12-08 06:08:06,078 StartupChecks.java:147 - 
  JMX is not enabled to receive remote connections. 
  Please see cassandra-env.sh for more info.
INFO [main] 2015-12-08 06:08:18,463 StorageService.java:790 - 
  Starting up server gossip
INFO [main] 2015-12-08 06:02:48,171 Server.java:162 - 
  Starting listening for CQL clients on /127.0.0.1:9042 (unencrypted)

These log statements indicate the server is beginning to initiate communications with other servers in the cluster and expose publicly available interfaces. By default, the management interface via the Java Management Extensions (JMX) is disabled for remote access. We’ll explore the management interface in Chapter 10.

Finally, search for “state jump” and you’ll see the following:

INFO [main] 2015-12-08 06:02:47,351 StorageService.java:1936 - 
  Node /127.0.0.1 state jump to normal

Congratulations! Now your Cassandra server should be up and running with a new single-node cluster called Test Cluster, listening for CQL clients on port 9042, as the log output above shows. If you continue to monitor the output, you’ll begin to see periodic output such as memtable flushing and compaction, which we’ll learn about soon.
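If you start Cassandra from a script, the same “state jump” message can serve as a readiness signal. The sketch below polls the log file for it; wait_for_node is a hypothetical helper, and the default log path assumes the default configuration and that you run it from the Cassandra home directory.

```shell
# Poll a Cassandra system.log until the node reports that it has reached the
# normal state, or give up after a timeout. wait_for_node is a hypothetical
# helper; the log path depends on your configuration.
wait_for_node() {
  wfn_log="${1:-logs/system.log}"
  wfn_timeout="${2:-60}"             # seconds to wait before giving up
  wfn_waited=0
  while :; do
    # Match case-insensitively in case capitalization differs in your release.
    if grep -qi 'state jump to normal' "$wfn_log" 2>/dev/null; then
      return 0
    fi
    [ "$wfn_waited" -ge "$wfn_timeout" ] && return 1
    sleep 1
    wfn_waited=$((wfn_waited + 1))
  done
}

# Usage:
#   wait_for_node logs/system.log 120 && echo "node is up"
```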

Starting Over

The committers work hard to ensure that data is readable from one minor dot release to the next and from one major version to the next. The commit log, however, needs to be completely cleared out from version to version (even minor versions).

If you have any previous versions of Cassandra installed, you may want to clear out the data directories for now, just to get up and running. If you’ve messed up your Cassandra installation and want to get started cleanly again, you can delete the data folders.

Stopping Cassandra

Now that we’ve successfully started a Cassandra server, you may be wondering how to stop it. You may have noticed the stop-server command in the bin directory. Let’s try running that command. Here’s what you’ll see on Unix systems:

$ ./stop-server
please read the stop-server script before use

So you see that our server has not been stopped, but instead we are directed to read the script. Taking a look inside with our favorite code editor, you’ll learn that the way to stop Cassandra is to kill the JVM process that is running Cassandra. The file suggests a couple of different techniques by which you can identify the JVM process and kill it.

The first technique is to start Cassandra using the -p option, which provides Cassandra with the name of a file to which it should write the process identifier (PID) upon starting up. This is arguably the most straightforward approach to making sure we kill the right process.
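The PID-file technique might look like the following sketch. stop_by_pidfile is a hypothetical helper and the PID file location is an arbitrary example. It sends the default SIGTERM rather than SIGKILL, a deliberate choice here so that the JVM has a chance to shut down cleanly.

```shell
# Stop a process whose PID was recorded in a file, as Cassandra does when
# started with the -p option. stop_by_pidfile is a hypothetical helper.
stop_by_pidfile() {
  sbp_file="$1"
  [ -r "$sbp_file" ] || return 1     # no PID file, nothing to stop
  sbp_pid=$(cat "$sbp_file")
  # Plain kill sends SIGTERM, which lets the JVM run its shutdown hooks.
  kill "$sbp_pid" 2>/dev/null || return 1
  rm -f "$sbp_file"
}

# Usage (the PID file location is an arbitrary example):
#   bin/cassandra -p /tmp/cassandra.pid
#   stop_by_pidfile /tmp/cassandra.pid
```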

However, because we did not start Cassandra with the -p option, we’ll need to find the process ourselves and kill it. The script suggests using pgrep to locate processes for the current user containing the term “cassandra”:

user=`whoami`
pgrep -u $user -f cassandra | xargs kill -9

Stopping Cassandra on Windows

On Windows installations, you can find the JVM process and kill it using the Task Manager.

Other Cassandra Distributions

The instructions we just reviewed showed us how to install the Apache distribution of Cassandra. In addition to the Apache distribution, there are a couple of other ways to get Cassandra:

DataStax Community Edition: This free distribution is provided by DataStax via the Planet Cassandra website. Installation options for various platforms include RPM and Debian (Linux), MSI (Windows), and a MacOS library. The community edition provides additional tools, including an integrated development environment (IDE) known as DevCenter, and the OpsCenter monitoring tool. Another useful feature is the ability to configure Cassandra as an OS-managed service on Windows. Releases of the community edition generally track the Apache releases, with availability soon after each Apache release.

DataStax Enterprise Edition: DataStax also provides a fully supported version certified for production use. The product line provides an integrated database platform with support for complementary data technologies such as Hadoop and Apache Spark. We’ll explore some of these integrations in Chapter 14.

Virtual machine images: A frequent model for deployment of Cassandra is to package one of the preceding distributions in a virtual machine image. For example, multiple such images are available in the Amazon Web Services (AWS) Marketplace.

We’ll take a deeper look at several options for deploying Cassandra in production environments, including cloud computing environments, in Chapter 14.

Selecting the right distribution will depend on your deployment environment; your needs for scale, stability, and support; and your development and maintenance budgets. Having both open source and commercial deployment options provides the flexibility to make the right choice for your organization.

Running the CQL Shell

Now that you have a Cassandra installation up and running, let’s give it a quick try to make sure everything is set up properly. We’ll use the CQL shell (cqlsh) to connect to our server and have a look around.

Deprecation of the CLI

If you’ve used Cassandra in releases prior to 3.0, you may also be familiar with the command-line client interface known as cassandra-cli. The CLI was removed in the 3.0 release because it depends on the legacy Thrift API.

To run the shell, create a new terminal window, change to the Cassandra home directory, and type the following command (you should see output similar to that shown here):

$ bin/cqlsh
Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.0.0 | CQL spec 3.3.1 | Native protocol v4]
Use HELP for help.
cqlsh>

Because we did not specify a node to which we wanted to connect, the shell helpfully checks for a node running on the local host, and finds the node we started earlier. The shell also indicates that you’re connected to a Cassandra server cluster called “Test Cluster”. That’s because this cluster of one node at localhost is set up for you by default.

Renaming the Default Cluster

In a production environment, be sure to change the cluster name to something more suitable to your application.

To connect to a specific node, specify the hostname and port on the command line. For example, the following will connect to our local node:

$ bin/cqlsh localhost 9042

Another alternative for configuring the cqlsh connection is to set the environment variables $CQLSH_HOST and $CQLSH_PORT. This approach is useful if you will be frequently connecting to a specific node on another host. The environment variables will be overridden if you specify the host and port on the command line.
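The precedence just described can be illustrated with a small sketch. cqlsh_target is a hypothetical helper that only prints the host and port a cqlsh invocation would be expected to use; it does not call cqlsh itself. The fallback of 127.0.0.1 and 9042 is taken from the connection banner shown earlier, and cassandra1.example.com is a placeholder hostname.

```shell
# Print the host and port that a cqlsh invocation would target, following the
# precedence described above: command-line arguments first, then CQLSH_HOST
# and CQLSH_PORT, then the local defaults. cqlsh_target is a hypothetical
# helper for illustration; it does not call cqlsh.
cqlsh_target() {
  echo "${1:-${CQLSH_HOST:-127.0.0.1}} ${2:-${CQLSH_PORT:-9042}}"
}

# Usage (cassandra1.example.com is a placeholder hostname):
#   export CQLSH_HOST=cassandra1.example.com
#   export CQLSH_PORT=9042
#   bin/cqlsh                  # uses the environment variables
#   bin/cqlsh localhost 9042   # arguments override the variables
```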

Connection Errors

Have you run into an error like this while trying to connect to a server?

Exception connecting to localhost/9160. Reason: 
  Connection refused.

If so, make sure that a Cassandra instance is started at that host and port, and that you can ping the host you’re trying to reach. There may be firewall rules preventing you from connecting.

To see a complete list of the command-line options supported by cqlsh, type the command cqlsh --help.

Basic cqlsh Commands

Let’s take a quick tour of cqlsh to learn what kinds of commands you can send to the server. We’ll see how to use the basic environment commands and how to do a round-trip of inserting and retrieving some data.

Case in cqlsh

The cqlsh commands are all case insensitive. For our examples, we’ll adopt the convention of uppercase to be consistent with the way the shell describes its own commands in help topics and output.

cqlsh Help

To get help for cqlsh, type HELP or ? to see the list of available commands:

cqlsh> HELP
Documented shell commands:
===========================
CAPTURE      COPY  DESCRIBE  EXPAND  PAGING  SOURCE
CONSISTENCY  DESC  EXIT      HELP    SHOW    TRACING

CQL help topics:
================
ALTER                        CREATE_TABLE_TYPES  PERMISSIONS        
ALTER_ADD                    CREATE_USER         REVOKE             
ALTER_ALTER                  DATE_INPUT          REVOKE_ROLE        
ALTER_DROP                   DELETE              SELECT             
ALTER_RENAME                 DELETE_COLUMNS      SELECT_COLUMNFAMILY
ALTER_USER                   DELETE_USING        SELECT_EXPR        
ALTER_WITH                   DELETE_WHERE        SELECT_LIMIT       
APPLY                        DROP                SELECT_TABLE       
ASCII_OUTPUT                 DROP_AGGREGATE      SELECT_WHERE       
BEGIN                        DROP_COLUMNFAMILY   TEXT_OUTPUT        
BLOB_INPUT                   DROP_FUNCTION       TIMESTAMP_INPUT    
BOOLEAN_INPUT                DROP_INDEX          TIMESTAMP_OUTPUT   
COMPOUND_PRIMARY_KEYS        DROP_KEYSPACE       TIME_INPUT         
CREATE                       DROP_ROLE           TRUNCATE           
CREATE_AGGREGATE             DROP_TABLE          TYPES              
CREATE_COLUMNFAMILY          DROP_USER           UPDATE             
CREATE_COLUMNFAMILY_OPTIONS  GRANT               UPDATE_COUNTERS    
CREATE_COLUMNFAMILY_TYPES    GRANT_ROLE          UPDATE_SET         
CREATE_FUNCTION              INSERT              UPDATE_USING       
CREATE_INDEX                 INT_INPUT           UPDATE_WHERE       
CREATE_KEYSPACE              LIST                USE                
CREATE_ROLE                  LIST_PERMISSIONS    UUID_INPUT         
CREATE_TABLE                 LIST_ROLES        
CREATE_TABLE_OPTIONS         LIST_USERS

cqlsh Help Topics

You’ll notice that the help topics listed differ slightly from the actual command syntax. The CREATE_TABLE help topic describes how to use the CREATE TABLE ... syntax, for example.

To get additional documentation about a particular command, type HELP <command>. Many cqlsh commands may be used with no parameters, in which case they print out the current setting. Examples include CONSISTENCY, EXPAND, and PAGING.

Describing the Environment in cqlsh

After connecting to your Cassandra instance Test Cluster, if you’re using the binary distribution, an empty keyspace, or Cassandra database, is set up for you to test with.

To learn about the current cluster you’re working in, type:

cqlsh> DESCRIBE CLUSTER;
Cluster: Test Cluster
Partitioner: Murmur3Partitioner
...

For releases 3.0 and later, this command also prints out a list of token ranges owned by each node in the cluster, which have been omitted here for brevity.

To see which keyspaces are available in the cluster, issue this command:

cqlsh> DESCRIBE KEYSPACES;
system_auth   system_distributed  system_schema
system        system_traces

Initially this list will consist of several system keyspaces. Once you have created your own keyspaces, they will be shown as well. The system keyspaces are managed internally by Cassandra, and aren’t for us to put data into. In this way, these keyspaces are similar to the master and temp databases in Microsoft SQL Server. Cassandra uses these keyspaces to store the schema, tracing, and security information. We’ll learn more about these keyspaces in Chapter 6.

You can use the following command to learn the client, server, and protocol versions in use:

cqlsh> SHOW VERSION;
[cqlsh 5.0.1 | Cassandra 3.0.0 | CQL spec 3.3.1 | Native protocol v4]

You may have noticed that this version info is printed out when cqlsh starts. There are a variety of other commands with which you can experiment. For now, let’s add some data to the database and get it back out again.

Creating a Keyspace and Table in cqlsh

A Cassandra keyspace is sort of like a relational database. It defines one or more tables or “column families.” When you start cqlsh without specifying a keyspace, the prompt will look like this: cqlsh>, with no keyspace specified.

Let’s create our own keyspace so we have something to write data to. In creating our keyspace, there are some required options. To walk through these options, we could use the command HELP CREATE_KEYSPACE, but instead we’ll use the helpful command-completion features of cqlsh. Type the following and then hit the Tab key:

cqlsh> CREATE KEYSPACE my_keyspace WITH

When you hit the Tab key, cqlsh begins completing the syntax of our command:

cqlsh> CREATE KEYSPACE my_keyspace WITH replication = {'class': '

This is informing us that in order to specify a keyspace, we also need to specify a replication strategy. Let’s Tab again to see what options we have:

cqlsh> CREATE KEYSPACE my_keyspace WITH replication = {'class': '
  NetworkTopologyStrategy    SimpleStrategy            
  OldNetworkTopologyStrategy

Now cqlsh is giving us three strategies to choose from. We’ll learn more about these strategies in Chapter 6. For now, we will choose the SimpleStrategy by typing the name. We’ll indicate we’re done with a closing quote and Tab again:

cqlsh> CREATE KEYSPACE my_keyspace WITH replication = {'class': 
  'SimpleStrategy', 'replication_factor':

The next option we’re presented with is a replication factor. For the simple strategy, this indicates how many nodes the data in this keyspace will be written to. For a production deployment, we’d want copies of our data stored on multiple nodes, but because we’re just running a single node at the moment, we’ll ask for a single copy. Let’s specify a value of “1” and Tab again:

cqlsh> CREATE KEYSPACE my_keyspace WITH replication = {'class': 
  'SimpleStrategy', 'replication_factor': 1};

We see that cqlsh has now added a closing brace, indicating we’ve completed all of the required options. Let’s complete our command with a semicolon and return, and our keyspace will be created.

Keyspace Creation Options

For a production keyspace, we would probably never want to use a value of 1 for the replication factor. There are additional options on creating a keyspace depending on the replication strategy that is chosen. The command completion feature will walk through the different options.

Let’s have a look at our keyspace using the DESCRIBE KEYSPACE command:

cqlsh> DESCRIBE KEYSPACE my_keyspace
CREATE KEYSPACE my_keyspace WITH replication = {'class': 
  'SimpleStrategy', 'replication_factor': '1'} AND 
  durable_writes = true;

We see that the keyspace has been created with the SimpleStrategy, a replication_factor of one, and durable writes. Notice that our keyspace is described in much the same syntax that we used to create it, with one additional option that we did not specify: durable_writes = true. Don’t worry about these settings now; we’ll look at them in detail later.

After you have created your own keyspace, you can switch to it in the shell by typing:

cqlsh> USE my_keyspace;
cqlsh:my_keyspace>

Notice that the prompt has changed to indicate that we’re using the keyspace.

Now that we have a keyspace, we can create a table in our keyspace. To do this in cqlsh, use the following command:

cqlsh:my_keyspace> CREATE TABLE user ( first_name text , 
  last_name text, PRIMARY KEY (first_name)) ;

This creates a new table called “user” in our current keyspace with two columns to store first and last names, both of type text. The text and varchar types are synonymous and are used to store strings. We’ve specified the first_name column as our primary key and taken the defaults for other table options.

Using Keyspace Names in cqlsh

We could have also created this table without switching to our keyspace by using the syntax CREATE TABLE my_keyspace.user (... .

We can use cqlsh to get a description of the table we just created using the DESCRIBE TABLE command:

cqlsh:my_keyspace> DESCRIBE TABLE user;
CREATE TABLE my_keyspace.user (
    first_name text PRIMARY KEY,
    last_name text
) WITH bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.
      SizeTieredCompactionStrategy', 'max_threshold': '32', 
      'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 
      'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';

You’ll notice that cqlsh prints a nicely formatted version of the CREATE TABLE command that we just typed in, but it also includes values for all of the available table options. Because we did not specify any of these options, the values shown are the defaults. We’ll worry about these settings later. For now, we have enough to get started.

Writing and Reading Data in cqlsh

Now that we have a keyspace and a table, we’ll write some data to the database and read it back out again. It’s OK at this point not to know quite what’s going on. We’ll come to understand Cassandra’s data model in depth later. For now, you have a keyspace (roughly, a database), which contains a table, which holds columns, the atomic unit of data storage.

To write a value, use the INSERT command:

cqlsh:my_keyspace> INSERT INTO user (first_name, last_name)
  VALUES ('Bill', 'Nguyen');

Here we have created a new row, identified by the key Bill, with two columns that store a set of related values. The column names are first_name and last_name. We can use the SELECT COUNT command to make sure that the row was written:

cqlsh:my_keyspace> SELECT COUNT (*) FROM user;
 count
-------
     1

(1 rows)

Now that we know the data is there, let’s read it, using the SELECT command:

cqlsh:my_keyspace> SELECT * FROM user WHERE first_name='Bill';

 first_name | last_name
------------+-----------
       Bill |    Nguyen

(1 rows)

In this command, we asked for all columns of the rows matching the primary key Bill. You can delete a column using the DELETE command. Here we will delete the last_name column for the row whose key is Bill:

cqlsh:my_keyspace> DELETE last_name FROM user WHERE
  first_name='Bill';

To make sure that it’s removed, we can query again:

cqlsh:my_keyspace> SELECT * FROM user WHERE first_name='Bill';

 first_name | last_name
------------+-----------
       Bill |      null

(1 rows)

Now we’ll clean up after ourselves by deleting the entire row. It’s the same command, but we don’t specify a column name:

cqlsh:my_keyspace> DELETE FROM user WHERE first_name='Bill';

To make sure that it’s removed, we can query again:

cqlsh:my_keyspace> SELECT * FROM user WHERE first_name='Bill';

 first_name | last_name
------------+-----------

(0 rows)

If we really want to clean up after ourselves, we can remove all data from the table using the TRUNCATE command, or even delete the table itself, schema and all, using the DROP TABLE command:

cqlsh:my_keyspace> TRUNCATE user;
cqlsh:my_keyspace> DROP TABLE user;
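
As a rough mental model of the commands in this section, the session can be mimicked with nested Python dictionaries. This is only an illustrative sketch: the function names (insert, select, delete) are hypothetical helpers, no Cassandra driver is involved, and it says nothing about how Cassandra actually stores data on disk. It only mirrors what we observed in cqlsh: deleting a column leaves the row in place with a null value, while deleting without a column name removes the whole row.

```python
# Toy in-memory model of the cqlsh session above. This is an assumption-laden
# mental model only: it is NOT how Cassandra stores data and uses no driver.

# keyspace name -> table name -> primary key value -> {column: value}
keyspaces = {"my_keyspace": {"user": {}}}
COLUMNS = ("first_name", "last_name")
PRIMARY_KEY = "first_name"


def insert(table, first_name, last_name):
    """Mimic: INSERT INTO user (first_name, last_name) VALUES (...)."""
    table[first_name] = {"first_name": first_name, "last_name": last_name}


def select(table, first_name):
    """Mimic: SELECT * FROM user WHERE first_name = ...; returns a list of rows."""
    row = table.get(first_name)
    if row is None:
        return []
    # A column with no value is reported as None, like "null" in cqlsh.
    return [{col: row.get(col) for col in COLUMNS}]


def delete(table, first_name, column=None):
    """Mimic: DELETE [column] FROM user WHERE first_name = ..."""
    if column is None:
        table.pop(first_name, None)          # no column named: remove the row
    elif column == PRIMARY_KEY:
        raise ValueError("cannot delete a primary key column")
    elif first_name in table:
        table[first_name].pop(column, None)  # remove a single column value


user = keyspaces["my_keyspace"]["user"]
insert(user, "Bill", "Nguyen")
print(len(user))                   # like SELECT COUNT(*) FROM user
print(select(user, "Bill"))        # both columns present
delete(user, "Bill", "last_name")
print(select(user, "Bill"))        # row remains, last_name is None
delete(user, "Bill")
print(select(user, "Bill"))        # no rows

user.clear()                            # like TRUNCATE user: data gone, table stays
del keyspaces["my_keyspace"]["user"]    # like DROP TABLE user: table gone too
```

The last two lines highlight the difference between the two cleanup commands: after the equivalent of TRUNCATE the empty table still exists, whereas after the equivalent of DROP TABLE the keyspace no longer contains it.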

cqlsh Command History

Now that you’ve been using cqlsh for a while, you may have noticed that you can navigate through commands you’ve executed previously with the up and down arrow keys. This history is stored in a file called cqlsh_history, which is located in a hidden directory called .cassandra within your home directory. This acts like your bash shell history, listing the commands in a plain-text file in the order in which they were entered. Nice!
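
Because the history is a plain-text file, it can be inspected with a few lines of Python. The following is a minimal sketch, assuming the file lives at .cassandra/cqlsh_history under your home directory as described above and holds one entry per line; the function name read_cqlsh_history and its parameters are hypothetical, and some platforms' line-editing libraries may encode entries differently.

```python
from pathlib import Path


def read_cqlsh_history(path=None, last=10):
    """Return the last `last` non-empty lines of the cqlsh history file.

    Assumes one history entry per line; returns an empty list if the
    file does not exist (for example, if cqlsh has never been run).
    """
    if path is None:
        # Default location described in the text: ~/.cassandra/cqlsh_history
        path = Path.home() / ".cassandra" / "cqlsh_history"
    path = Path(path)
    if not path.exists():
        return []
    lines = path.read_text(encoding="utf-8", errors="replace").splitlines()
    entries = [line for line in lines if line.strip()]
    return entries[-last:] if last > 0 else []


if __name__ == "__main__":
    for entry in read_cqlsh_history():
        print(entry)
```

Run as a script, this prints up to the ten most recent entries; pass a different path or a different value of last to look elsewhere or further back.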

Summary

Now you should have a Cassandra installation up and running. You’ve worked with the cqlsh client to insert and retrieve some data, and you’re ready to take a step back and get the big picture on Cassandra before really diving into the details.
