For those among us who like instant gratification, we’ll start by installing Cassandra. Because Cassandra introduces a lot of new vocabulary, there might be some unfamiliar terms as we walk through this. That’s OK; the idea here is to get set up quickly in a simple configuration to make sure everything is running properly. This will serve as an orientation. Then, we’ll take a step back and understand Cassandra in its larger context.
Cassandra is available for download from the Web at http://cassandra.apache.org. Just click the link on the home page to download a version as a gzipped tarball. Typically two versions of Cassandra are provided. The latest release is recommended for those starting new projects not yet in production. The most stable release is the one recommended for production usage. For all releases, the prebuilt binary is named apache-cassandra-x.x.x-bin.tar.gz, where x.x.x represents the version number. The download is around 23MB.
The simplest way to get started is to download the prebuilt binary. You can unpack the compressed file using any regular ZIP utility. On Unix-based systems such as Linux or MacOS, GZip extraction utilities should be preinstalled; on Windows, you’ll need to get a program such as WinZip, which is commercial, or something like 7-Zip, which is freeware.
Open your extracting program. You might have to decompress the gzip layer and extract the TAR archive in separate steps. Once you have a folder on your filesystem called apache-cassandra-x.x.x, you’re ready to run Cassandra.
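On Unix-based systems, the extraction described above can usually be done in one step from a terminal. The following is a minimal sketch, assuming a tar that supports the -z flag (GNU and BSD tar both do); the function name unpack_tarball and the target directory are illustrative choices, not part of the Cassandra distribution.

```shell
# Unpack a gzipped tarball into a target directory in a single step.
# tar's -z flag handles the gzip layer, so no separate gunzip pass is needed.
unpack_tarball() {
    archive="$1"
    target_dir="$2"
    mkdir -p "$target_dir" && tar -xzf "$archive" -C "$target_dir"
}

# Example (x.x.x is a placeholder for the version you downloaded):
# unpack_tarball apache-cassandra-x.x.x-bin.tar.gz "$HOME"
```

After it runs, the target directory should contain the apache-cassandra-x.x.x folder.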
Once you decompress the tarball, you’ll see that the Cassandra binary distribution includes several files and directories.
The files include the NEWS.txt file, which includes the release notes describing features included in the current and prior releases, and the CHANGES.txt, which is similar but focuses on bug fixes. You’ll want to make sure to review these files whenever you are upgrading to a new version so you know what changes to expect.
Let’s take a moment to look around in the directories and see what we have.
The bin directory contains the executables to run Cassandra as well as clients, including the query language shell (cqlsh) and the command-line interface (CLI) client. It also has scripts to run nodetool, which is a utility for inspecting a cluster to determine whether it is properly configured, and to perform a variety of maintenance operations. We look at nodetool in depth later. The directory also contains several utilities for performing operations on SSTables, including listing the keys of an SSTable (sstablekeys), bulk extraction and restoration of SSTable contents (sstableloader), and upgrading SSTables to a new version of Cassandra (sstableupgrade).
The conf directory contains the files for configuring your Cassandra instance. The required configuration files include: the cassandra.yaml file, which is the primary configuration for running Cassandra; and the logback.xml file, which lets you change the logging settings to suit your needs. Additional files can optionally be used to configure the network topology, archival and restore commands, and triggers. We see how to use these configuration files when we discuss configuration in Chapter 7.
The interface directory contains a single file, called cassandra.thrift. This file defines a legacy Remote Procedure Call (RPC) API based on the Thrift syntax. The Thrift interface was used to create clients in Java, C++, PHP, Ruby, Python, Perl, and C# prior to the creation of CQL. The Thrift API has been officially marked as deprecated in the 3.2 release and will be deleted in the 4.0 release.
The javadoc directory contains a documentation website generated using Java’s JavaDoc tool. Note that JavaDoc reflects only the comments that are stored directly in the Java code, and as such does not represent comprehensive documentation. It’s helpful if you want to see how the code is laid out. Moreover, Cassandra is a wonderful project, but the code contains relatively few comments, so you might find the JavaDoc’s usefulness limited. It may be more fruitful to simply read the class files directly if you’re familiar with Java. Nonetheless, to read the JavaDoc, open the javadoc/index.html file in a browser.
The lib directory contains all of the external libraries that Cassandra needs to run. For example, it uses two different JSON serialization libraries, the Google collections project, and several Apache Commons libraries.
The pylib directory contains Python libraries that are used by cqlsh.
The tools directory contains tools that are used to maintain your Cassandra nodes. We’ll look at these tools in Chapter 11.
If you’ve already run Cassandra using the default configuration, you will notice two additional directories under the main Cassandra directory: data and log. We’ll discuss the contents of these directories momentarily.
Cassandra uses Apache Ant for its build scripting language and Maven for dependency management.
You can download Ant from http://ant.apache.org. You don’t need to download Maven separately just to build Cassandra.
Building from source requires a complete Java 7 or 8 JDK, not just the JRE. If you see a message about how Ant is missing tools.jar, either you don’t have the full JDK or you’re pointing to the wrong path in your environment variables. Maven downloads files from the Internet so if your connection is invalid or Maven cannot determine the proxy, the build will fail.
If you want to download the most cutting-edge builds, you can get the source from Jenkins, which the Cassandra project uses as its Continuous Integration tool. See http://cassci.datastax.com for the latest builds and test coverage information.
If you are a Git fan, you can get a read-only trunk version of the Cassandra source using this command:
$ git clone git://git.apache.org/cassandra.git
Git is a source code management system created by Linus Torvalds to manage development of the Linux kernel. It’s increasingly popular and is used by projects such as Android, Fedora, Ruby on Rails, Perl, and many Cassandra clients (as we’ll see in Chapter 8). If you’re on a Linux distribution such as Ubuntu, it couldn’t be easier to get Git. At a console, just type apt-get install git and it will be installed and ready for commands. For more information, visit http://git-scm.com.
Because Maven takes care of all the dependencies, it’s easy to build Cassandra once you have the source. Just make sure you’re in the root directory of your source download and execute the ant program, which will look for a file called build.xml in the current directory and execute the default build target. Ant and Maven take care of the rest. To execute the Ant program and start compiling the source, just type:
$ ant
That’s it. Maven will retrieve all of the necessary dependencies, and Ant will build the hundreds of source files and execute the tests. If all went well, you should see a BUILD SUCCESSFUL message. If all did not go well, make sure that your path settings are all correct, that you have the most recent versions of the required programs, and that you downloaded a stable Cassandra build. You can check the Jenkins report to make sure that the source you downloaded actually can compile.
If you want to see detailed information on what is happening during the build, you can pass Ant the -v option to cause it to output verbose details regarding each operation it performs.
To compile the server, you can simply execute ant as shown previously. This command executes the default target, jar. This target will perform a complete build including unit tests and output a file into the build directory called apache-cassandra-x.x.x.jar.
If you want to see a list of all of the targets supported by the build file, simply pass Ant the -p option to get a description of each target. Here are a few others you might be interested in:
Users will probably find the test target the most helpful, as it executes the battery of unit tests. You can also check out the unit test sources themselves for some useful examples of how to interact with Cassandra.
The stress-build target builds the Cassandra stress tool, which we will try out in Chapter 12.
The clean target removes locally created artifacts such as generated source files and classes and unit test results. The related target realclean performs a clean and additionally removes the Cassandra distribution JAR files and JAR files downloaded by Maven.
In earlier versions of Cassandra, before you could start the server there were some required steps to edit configuration files and set environment variables. But the developers have done a terrific job of making it very easy to start using Cassandra immediately. We’ll note some of the available configuration options as we go.
Cassandra requires a Java 7 or 8 JVM, preferably the latest stable version. It has been tested on both the Open JDK and Oracle’s JDK. You can check your installed Java version by opening a command prompt and executing java -version. If you need a JDK, you can get one at http://www.oracle.com/technetwork/java/javase/downloads/index.html.
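If you want to script the version check, a small helper can pull the major version out of the text that java -version prints. This is a sketch under assumptions: the helper name java_major_version is made up for illustration, and it expects the usual first-line format containing version "…" in double quotes.

```shell
# Extract the major Java version from the output of `java -version`.
# Legacy version strings look like "1.8.0_60" (the major version is the second field);
# newer ones look like "11.0.2" (the major version is the first field).
java_major_version() {
    version=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
    case "$version" in
        1.*) printf '%s\n' "$version" | cut -d. -f2 ;;
        *)   printf '%s\n' "$version" | cut -d. -f1 ;;
    esac
}

# `java -version` writes to standard error, so redirect it when capturing:
# java_major_version "$(java -version 2>&1)"
```

For the Cassandra release discussed here, a result of 7 or 8 is what you are looking for.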
Once you have the binary or the source downloaded and compiled, you’re ready to start the database server.
Setting the JAVA_HOME environment variable is recommended. To do this on Windows 7, click the Start button and then right-click on Computer. Click Properties, click Advanced system settings, and then click the Environment Variables button. Click New to create a new system variable. In the Variable Name field, type JAVA_HOME. In the Variable Value field, type the path to your Java installation. This is probably something like C:\Program Files\Java\jre7 if running Java 7 or C:\Program Files\Java\jre1.8.0_25 if running Java 8.
Remember that if you create a new environment variable, you’ll need to reopen any currently open terminals in order for the system to become aware of the new variable. To make sure your environment variable is set correctly and that Cassandra can subsequently find Java on Windows, execute this command in a new terminal: echo %JAVA_HOME%. This prints the value of your environment variable.
You can also define an environment variable called CASSANDRA_HOME that points to the top-level directory where you have placed or built Cassandra, so you don’t have to pay as much attention to where you’re starting Cassandra from. This is useful for other tools besides the database server, such as nodetool and cqlsh.
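On Linux or Mac OS, CASSANDRA_HOME can be set from a shell profile. The sketch below assumes Cassandra was unpacked under your home directory (a hypothetical location; x.x.x stands for your version), and adding the bin directory to PATH is an optional convenience rather than something Cassandra requires.

```shell
# Hypothetical install location: replace with the directory you unpacked or built.
CASSANDRA_HOME="$HOME/apache-cassandra-x.x.x"
export CASSANDRA_HOME

# Optional assumption: putting bin on PATH lets you run cassandra, nodetool,
# and cqlsh from any directory.
PATH="$PATH:$CASSANDRA_HOME/bin"
export PATH

echo "CASSANDRA_HOME is $CASSANDRA_HOME"
```

As with Windows, terminals that were already open will not see the new values until you reopen them or re-read the profile.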
Once you’ve started the server for the first time, Cassandra will add directories to your system to store its data files. The default configuration creates these directories under the CASSANDRA_HOME directory.
The data directory is where Cassandra stores its data. By default, there are three sub-directories under the data directory, corresponding to the various data files Cassandra uses: commitlog, data, and saved_caches. We’ll explore the significance of each of these data files in Chapter 6. If you’ve been trying different versions of the database and aren’t worried about losing data, you can delete these directories and restart the server as a last resort.
The log directory is where Cassandra stores its logs in a file called system.log. If you encounter any difficulties, consult the log to see what might have happened.
The data file locations are configurable in the cassandra.yaml file, located in the conf directory. The properties are called data_file_directories, commitlog_directory, and saved_caches_directory. We’ll discuss the recommended configuration of these directories in Chapter 7.
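To see what these properties are currently set to, you can search the configuration file directly. The helper below is an illustrative sketch (show_storage_settings is not a Cassandra tool); it uses the key spellings found in cassandra.yaml, where the commit log property is written commitlog_directory, and it also matches entries that are still commented out.

```shell
# Print the storage-location settings from a cassandra.yaml file, with line numbers.
# The optional leading "#" also matches entries that are still commented out.
show_storage_settings() {
    grep -nE '^#? *(data_file_directories|commitlog_directory|saved_caches_directory):' "$1"
}

# Example, assuming CASSANDRA_HOME is set:
# show_storage_settings "$CASSANDRA_HOME/conf/cassandra.yaml"
```

Note that data_file_directories is a YAML list, so its values appear on the lines following the key.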
The process on Linux and other *nix operating systems (including Mac OS) is similar to that on Windows. Make sure that your JAVA_HOME variable is properly set, according to the earlier description. Then, you need to extract the Cassandra gzipped tarball using gunzip and tar. Many users prefer to use the /var/lib directory for data storage. If you are changing this configuration, you will need to edit the conf/cassandra.yaml file and create the referenced directories for Cassandra to store its data and logs, making sure to configure write permissions for the user that will be running Cassandra:
$ sudo mkdir -p /var/lib/cassandra
$ sudo chown -R username /var/lib/cassandra
Instead of username, substitute your own username, of course.
To start the Cassandra server on any OS, open a command prompt or terminal window, navigate to the <cassandra-directory>/bin where you unpacked Cassandra, and run the command cassandra -f to start your server.
Using the -f switch tells Cassandra to stay in the foreground instead of running as a background process, so that all of the server logs will print to standard out and you can see them in your terminal window, which is useful for testing. In either case, the logs will append to the system.log file, described earlier.
In a clean installation, you should see quite a few log statements as the server gets running. The exact syntax of logging statements will vary depending on the release you’re using, but there are a few highlights we can look for. If you search for “cassandra.yaml”, you’ll quickly run into the following:
DEBUG [main] 2015-12-08 06:02:38,677 YamlConfigurationLoader.java:104 - Loading settings from file:/.../conf/cassandra.yaml
INFO [main] 2015-12-08 06:02:38,781 YamlConfigurationLoader.java:179 - Node configuration:[authenticator=AllowAllAuthenticator; authorizer=AllowAllAuthorizer; auto_bootstrap=false; auto_snapshot=true; batch_size_fail_threshold_in_kb=50; ...
These log statements indicate the location of the cassandra.yaml file containing the configured settings. The Node configuration statement lists out the settings from the config file.
Now search for “JVM” and you’ll find something like this:
INFO [main] 2015-12-08 06:02:39,239 CassandraDaemon.java:436 - JVM vendor/version: Java HotSpot(TM) 64-Bit Server VM/1.8.0_60
INFO [main] 2015-12-08 06:02:39,239 CassandraDaemon.java:437 - Heap size: 519045120/519045120
These log statements provide information describing the JVM being used, including memory settings.
Next, search for versions in use—“Cassandra version”, “Thrift API Version”, “CQL supported versions”:
INFO [main] 2015-12-08 06:02:43,931 StorageService.java:586 - Cassandra version: 3.0.0
INFO [main] 2015-12-08 06:02:43,932 StorageService.java:587 - Thrift API version: 20.1.0
INFO [main] 2015-12-08 06:02:43,932 StorageService.java:588 - CQL supported versions: 3.3.1 (default: 3.3.1)
We can also find statements where Cassandra is initializing internal data structures such as caches:
INFO [main] 2015-12-08 06:02:43,633 CacheService.java:115 - Initializing key cache with capacity of 24 MBs.
INFO [main] 2015-12-08 06:02:43,679 CacheService.java:137 - Initializing row cache with capacity of 0 MBs
INFO [main] 2015-12-08 06:02:43,686 CacheService.java:166 - Initializing counter cache with capacity of 12 MBs
If we search for terms like “JMX”, “gossip”, and “clients”, we can find statements like the following:
WARN [main] 2015-12-08 06:08:06,078 StartupChecks.java:147 - JMX is not enabled to receive remote connections. Please see cassandra-env.sh for more info.
INFO [main] 2015-12-08 06:08:18,463 StorageService.java:790 - Starting up server gossip
INFO [main] 2015-12-08 06:02:48,171 Server.java:162 - Starting listening for CQL clients on /127.0.0.1:9042 (unencrypted)
These log statements indicate the server is beginning to initiate communications with other servers in the cluster and expose publicly available interfaces. By default, the management interface via the Java Management Extensions (JMX) is disabled for remote access. We’ll explore the management interface in Chapter 10.
Finally, search for “state jump” and you’ll see the following:
INFO [main] 2015-12-08 06:02:47,351 StorageService.java:1936 - Node /127.0.0.1 state jump to normal
Congratulations! Now your Cassandra server should be up and running with a new single node cluster called Test Cluster listening for CQL clients on port 9042. If you continue to monitor the output, you’ll begin to see periodic output such as memtable flushing and compaction, which we’ll learn about soon.
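Rather than scrolling through the terminal, you can pull the same milestones out of the log file with grep. The following sketch uses only phrases that appear in the log statements quoted above; the function names are hypothetical, and because the exact wording of log statements varies between releases, the patterns may need adjusting for your version.

```shell
# Print the startup milestones discussed above from a Cassandra log file.
startup_highlights() {
    grep -E 'Loading settings from|JVM vendor/version|Cassandra version|Starting listening for CQL clients|state jump to' "$1"
}

# Succeed (exit status 0) only if the node has reached the normal state.
node_is_normal() {
    grep -q 'state jump to normal' "$1"
}

# Example (substitute the path to your own system.log):
# startup_highlights path/to/system.log
# node_is_normal path/to/system.log && echo "node is up"
```

The second function is convenient in scripts that need to wait for the server before connecting a client.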
The committers work hard to ensure that data is readable from one minor dot release to the next and from one major version to the next. The commit log, however, needs to be completely cleared out from version to version (even minor versions).
If you have any previous versions of Cassandra installed, you may want to clear out the data directories for now, just to get up and running. If you’ve messed up your Cassandra installation and want to get started cleanly again, you can delete the data folders.
Now that we’ve successfully started a Cassandra server, you may be wondering how to stop it. You may have noticed the stop-server command in the bin directory. Let’s try running that command. Here’s what you’ll see on Unix systems:
$ ./stop-server
please read the stop-server script before use
So you see that our server has not been stopped, but instead we are directed to read the script. Taking a look inside with our favorite code editor, you’ll learn that the way to stop Cassandra is to kill the JVM process that is running Cassandra. The file suggests a couple of different techniques by which you can identify the JVM process and kill it.
The first technique is to start Cassandra using the -p option, which provides Cassandra with the name of a file to which it should write the process identifier (PID) upon starting up. This is arguably the most straightforward approach to making sure we kill the right process.
However, because we did not start Cassandra with the -p option, we’ll need to find the process ourselves and kill it. The script suggests using pgrep to locate processes for the current user containing the term “cassandra”:
user=`whoami`
pgrep -u $user -f cassandra | xargs kill -9
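For comparison, the PID file technique mentioned above might look like the following sketch. The function name stop_by_pidfile and the PID file path are assumptions for illustration. Unlike the pgrep example, this sends the default termination signal rather than kill -9, which gives the JVM a chance to shut down in an orderly way.

```shell
# Stop a process whose PID was recorded in a file, then remove the file.
stop_by_pidfile() {
    pidfile="$1"
    if [ ! -r "$pidfile" ]; then
        echo "no readable PID file: $pidfile" >&2
        return 1
    fi
    pid=$(cat "$pidfile")
    # Plain kill sends SIGTERM; the PID file is removed only if the signal was delivered.
    kill "$pid" && rm -f "$pidfile"
}

# Example (the PID file location is a hypothetical choice):
# bin/cassandra -p /tmp/cassandra.pid
# stop_by_pidfile /tmp/cassandra.pid
```

Because the PID comes from the file Cassandra itself wrote, there is no risk of matching an unrelated process whose command line happens to contain “cassandra”.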
The instructions we just reviewed showed us how to install the Apache distribution of Cassandra. In addition to the Apache distribution, there are a couple of other ways to get Cassandra:
The DataStax Community Edition is a free distribution provided by DataStax via the Planet Cassandra website. Installation options for various platforms include RPM and Debian (Linux), MSI (Windows), and a MacOS library. The community edition provides additional tools, including an integrated development environment (IDE) known as DevCenter, and the OpsCenter monitoring tool. Another useful feature is the ability to configure Cassandra as an OS-managed service on Windows. Releases of the community edition generally track the Apache releases, with availability soon after each Apache release.
DataStax also provides a fully supported version certified for production use. The product line provides an integrated database platform with support for complementary data technologies such as Hadoop and Apache Spark. We’ll explore some of these integrations in Chapter 14.
A frequent model for deployment of Cassandra is to package one of the preceding distributions in a virtual machine image. For example, multiple such images are available in the Amazon Web Services (AWS) Marketplace.
We’ll take a deeper look at several options for deploying Cassandra in production environments, including cloud computing environments, in Chapter 14.
Selecting the right distribution will depend on your deployment environment; your needs for scale, stability, and support; and your development and maintenance budgets. Having both open source and commercial deployment options provides the flexibility to make the right choice for your organization.
Now that you have a Cassandra installation up and running, let’s give it a quick try to make sure everything is set up properly. We’ll use the CQL shell (cqlsh) to connect to our server and have a look around.
If you’ve used Cassandra in releases prior to 3.0, you may also be familiar with the command-line client interface known as cassandra-cli. The CLI was removed in the 3.0 release because it depends on the legacy Thrift API.
To run the shell, create a new terminal window, change to the Cassandra home directory, and type the following command (you should see output similar to that shown here):
$ bin/cqlsh
Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.0.0 | CQL spec 3.3.1 | Native protocol v4]
Use HELP for help.
cqlsh>
Because we did not specify a node to which we wanted to connect, the shell helpfully checks for a node running on the local host, and finds the node we started earlier. The shell also indicates that you’re connected to a Cassandra server cluster called “Test Cluster”. That’s because this cluster of one node at localhost is set up for you by default.
In a production environment, be sure to change the cluster name to something more suitable to your application.
To connect to a specific node, specify the hostname and port on the command line. For example, the following will connect to our local node:
$ bin/cqlsh localhost 9042
Another alternative for configuring the cqlsh connection is to set the environment variables $CQLSH_HOST and $CQLSH_PORT. This approach is useful if you will be frequently connecting to a specific node on another host. The environment variables will be overridden if you specify the host and port on the command line.
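The precedence just described (command-line arguments first, then the environment variables, then the local defaults) can be modeled in a few lines of shell. This is an illustration only, not cqlsh’s own code: cqlsh_target is a made-up helper, and cass1.example.com is a placeholder hostname.

```shell
# Illustration only: resolve a connection target using the precedence
# command-line arguments > CQLSH_HOST / CQLSH_PORT > local defaults.
cqlsh_target() {
    host="${1:-${CQLSH_HOST:-127.0.0.1}}"
    port="${2:-${CQLSH_PORT:-9042}}"
    printf '%s %s\n' "$host" "$port"
}

# Example: set the variables once, for instance in your shell profile.
# export CQLSH_HOST=cass1.example.com   # placeholder hostname
# export CQLSH_PORT=9042
# cqlsh_target                  # uses the environment variables
# cqlsh_target localhost 9042   # arguments override the environment
```

With the variables exported, running bin/cqlsh with no arguments connects to the configured node.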
Have you run into an error like this while trying to connect to a server?
Exception connecting to localhost/9160. Reason: Connection refused.
If so, make sure that a Cassandra instance is started at that host and port, and that you can ping the host you’re trying to reach. There may be firewall rules preventing you from connecting.
To see a complete list of the command-line options supported by cqlsh, type the command cqlsh --help.
Let’s take a quick tour of cqlsh to learn what kinds of commands you can send to the server. We’ll see how to use the basic environment commands and how to do a round-trip of inserting and retrieving some data.
The cqlsh commands are all case insensitive. For our examples, we’ll adopt the convention of uppercase to be consistent with the way the shell describes its own commands in help topics and output.
To get help for cqlsh, type HELP or ? to see the list of available commands:
cqlsh> HELP

Documented shell commands:
===========================
CAPTURE      COPY  DESCRIBE  EXPAND  PAGING  SOURCE
CONSISTENCY  DESC  EXIT      HELP    SHOW    TRACING

CQL help topics:
================
ALTER                        CREATE_TABLE_TYPES  PERMISSIONS
ALTER_ADD                    CREATE_USER         REVOKE
ALTER_ALTER                  DATE_INPUT          REVOKE_ROLE
ALTER_DROP                   DELETE              SELECT
ALTER_RENAME                 DELETE_COLUMNS      SELECT_COLUMNFAMILY
ALTER_USER                   DELETE_USING        SELECT_EXPR
ALTER_WITH                   DELETE_WHERE        SELECT_LIMIT
APPLY                        DROP                SELECT_TABLE
ASCII_OUTPUT                 DROP_AGGREGATE      SELECT_WHERE
BEGIN                        DROP_COLUMNFAMILY   TEXT_OUTPUT
BLOB_INPUT                   DROP_FUNCTION       TIMESTAMP_INPUT
BOOLEAN_INPUT                DROP_INDEX          TIMESTAMP_OUTPUT
COMPOUND_PRIMARY_KEYS        DROP_KEYSPACE       TIME_INPUT
CREATE                       DROP_ROLE           TRUNCATE
CREATE_AGGREGATE             DROP_TABLE          TYPES
CREATE_COLUMNFAMILY          DROP_USER           UPDATE
CREATE_COLUMNFAMILY_OPTIONS  GRANT               UPDATE_COUNTERS
CREATE_COLUMNFAMILY_TYPES    GRANT_ROLE          UPDATE_SET
CREATE_FUNCTION              INSERT              UPDATE_USING
CREATE_INDEX                 INT_INPUT           UPDATE_WHERE
CREATE_KEYSPACE              LIST                USE
CREATE_ROLE                  LIST_PERMISSIONS    UUID_INPUT
CREATE_TABLE                 LIST_ROLES
CREATE_TABLE_OPTIONS         LIST_USERS
You’ll notice that the help topics listed differ slightly from the actual command syntax. The CREATE_TABLE help topic describes how to use the syntax CREATE TABLE ..., for example.
To get additional documentation about a particular command, type HELP <command>. Many cqlsh commands may be used with no parameters, in which case they print out the current setting. Examples include CONSISTENCY, EXPAND, and PAGING.
After connecting to your Cassandra instance Test Cluster, if you’re using the binary distribution, an empty keyspace, or Cassandra database, is set up for you to test with.
To learn about the current cluster you’re working in, type:
cqlsh> DESCRIBE CLUSTER;

Cluster: Test Cluster
Partitioner: Murmur3Partitioner
...
For releases 3.0 and later, this command also prints out a list of token ranges owned by each node in the cluster, which have been omitted here for brevity.
To see which keyspaces are available in the cluster, issue this command:
cqlsh> DESCRIBE KEYSPACES;

system_auth  system_distributed  system_schema
system       system_traces
Initially this list will consist of several system keyspaces. Once you have created your own keyspaces, they will be shown as well. The system keyspaces are managed internally by Cassandra, and aren’t for us to put data into. In this way, these keyspaces are similar to the master and tempdb databases in Microsoft SQL Server. Cassandra uses these keyspaces to store the schema, tracing, and security information. We’ll learn more about these keyspaces in Chapter 6.
You can use the following command to learn the client, server, and protocol versions in use:
cqlsh> SHOW VERSION;
[cqlsh 5.0.1 | Cassandra 3.0.0 | CQL spec 3.3.1 | Native protocol v4]
You may have noticed that this version info is printed out when cqlsh starts. There are a variety of other commands with which you can experiment. For now, let’s add some data to the database and get it back out again.
A Cassandra keyspace is sort of like a relational database. It defines one or more tables or “column families.” When you start cqlsh without specifying a keyspace, the prompt is simply cqlsh>, with no keyspace name in it.
Let’s create our own keyspace so we have something to write data to. In creating our keyspace, there are some required options. To walk through these options, we could use the command HELP CREATE_KEYSPACE, but instead we’ll use the helpful command-completion features of cqlsh. Type the following and then hit the Tab key:
cqlsh> CREATE KEYSPACE my_keyspace WITH
When you hit the Tab key, cqlsh begins completing the syntax of our command:
cqlsh> CREATE KEYSPACE my_keyspace WITH replication = {'class': '
This is informing us that in order to specify a keyspace, we also need to specify a replication strategy. Let’s Tab again to see what options we have:
cqlsh> CREATE KEYSPACE my_keyspace WITH replication = {'class': '
NetworkTopologyStrategy    SimpleStrategy
OldNetworkTopologyStrategy
Now cqlsh is giving us three strategies to choose from. We’ll learn more about these strategies in Chapter 6. For now, we will choose the SimpleStrategy by typing the name. We’ll indicate we’re done with a closing quote and Tab again:
cqlsh> CREATE KEYSPACE my_keyspace WITH replication = {'class': 'SimpleStrategy', 'replication_factor':
The next option we’re presented with is a replication factor. For the simple strategy, this indicates how many nodes the data in this keyspace will be written to. For a production deployment, we’d want copies of our data stored on multiple nodes, but because we’re just running a single node at the moment, we’ll ask for a single copy. Let’s specify a value of “1” and Tab again:
cqlsh> CREATE KEYSPACE my_keyspace WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
We see that cqlsh has now added a closing bracket, indicating we’ve completed all of the required options. Let’s complete our command with a semicolon and return, and our keyspace will be created.
For a production keyspace, we would probably never want to use a value of 1 for the replication factor. There are additional options on creating a keyspace depending on the replication strategy that is chosen. The command completion feature will walk through the different options.
Let’s have a look at our keyspace using the DESCRIBE KEYSPACE command:
cqlsh> DESCRIBE KEYSPACE my_keyspace

CREATE KEYSPACE my_keyspace WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'} AND durable_writes = true;
We see that the keyspace has been created with the SimpleStrategy, a replication_factor of one, and durable writes. Notice that our keyspace is described in much the same syntax that we used to create it, with one additional option that we did not specify: durable_writes = true. Don’t worry about these settings now; we’ll look at them in detail later.
After you have created your own keyspace, you can switch to it in the shell by typing:
cqlsh> USE my_keyspace; cqlsh:my_keyspace>
Notice that the prompt has changed to indicate that we’re using the keyspace.
Now that we have a keyspace, we can create a table in our keyspace. To do this in cqlsh, use the following command:
cqlsh:my_keyspace> CREATE TABLE user ( first_name text, last_name text, PRIMARY KEY (first_name) );
This creates a new table called “user” in our current keyspace with two columns to store first and last names, both of type text. The text and varchar types are synonymous and are used to store strings. We’ve specified the first_name column as our primary key and taken the defaults for other table options.
We could have also created this table without switching to our keyspace by using the syntax CREATE TABLE my_keyspace.user (... .
We can use cqlsh to get a description of the table we just created using the DESCRIBE TABLE command:
cqlsh:my_keyspace> DESCRIBE TABLE user;

CREATE TABLE my_keyspace.user (
    first_name text PRIMARY KEY,
    last_name text
) WITH bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';
You’ll notice that cqlsh prints a nicely formatted version of the CREATE TABLE command that we just typed in but also includes values for all of the available table options. Because we did not specify any of these options, the values shown are the defaults. We’ll worry about these settings later. For now, we have enough to get started.
Now that we have a keyspace and a table, we’ll write some data to the database and read it back out again. It’s OK at this point not to know quite what’s going on. We’ll come to understand Cassandra’s data model in depth later. For now, you have a keyspace (database), which has a table, which holds columns, the atomic unit of data storage.
To write a value, use the INSERT command:
cqlsh:my_keyspace> INSERT INTO user (first_name, last_name ) VALUES ('Bill', 'Nguyen');
Here we have created a new row with two columns for the key Bill, to store a set of related values. The column names are first_name and last_name. We can use the SELECT COUNT command to make sure that the row was written:
cqlsh:my_keyspace> SELECT COUNT (*) FROM user;

 count
-------
     1

(1 rows)
Now that we know the data is there, let’s read it, using the SELECT command:
cqlsh:my_keyspace> SELECT * FROM user WHERE first_name='Bill';

 first_name | last_name
------------+-----------
       Bill |    Nguyen

(1 rows)
In this command, we asked for the row matching the primary key Bill, including all of its columns. You can delete a column using the DELETE command. Here we will delete the last_name column for the Bill row key:
cqlsh:my_keyspace> DELETE last_name FROM USER WHERE first_name='Bill';
To make sure that it’s removed, we can query again:
cqlsh:my_keyspace> SELECT * FROM user WHERE first_name='Bill';

 first_name | last_name
------------+-----------
       Bill |      null

(1 rows)
Now we’ll clean up after ourselves by deleting the entire row. It’s the same command, but we don’t specify a column name:
cqlsh:my_keyspace> DELETE FROM USER WHERE first_name='Bill';
To make sure that it’s removed, we can query again:
cqlsh:my_keyspace> SELECT * FROM user WHERE first_name='Bill';

 first_name | last_name
------------+-----------

(0 rows)
If we really want to clean up after ourselves, we can remove all data from the table using the TRUNCATE command, or even delete the table schema using the DROP TABLE command.
cqlsh:my_keyspace> TRUNCATE user;
cqlsh:my_keyspace> DROP TABLE user;
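The statements from this walk-through can also be collected in a file and run non-interactively, since cqlsh accepts a -f option that executes the commands in a file and then exits. The sketch below only writes the file; the function name write_roundtrip_cql and the /tmp path are illustrative assumptions.

```shell
# Write the round-trip statements from this walk-through to a CQL script file.
write_roundtrip_cql() {
    cat > "$1" <<'EOF'
CREATE KEYSPACE my_keyspace WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
USE my_keyspace;
CREATE TABLE user (first_name text, last_name text, PRIMARY KEY (first_name));
INSERT INTO user (first_name, last_name) VALUES ('Bill', 'Nguyen');
SELECT * FROM user WHERE first_name='Bill';
DELETE FROM user WHERE first_name='Bill';
DROP TABLE user;
EOF
}

# Example, run from the Cassandra home directory with the server started:
# write_roundtrip_cql /tmp/roundtrip.cql
# bin/cqlsh -f /tmp/roundtrip.cql
```

If you already created my_keyspace interactively, the first statement will report an error; drop the keyspace first or remove that line from the file.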
Now that you’ve been using cqlsh for a while, you may have noticed that you can navigate through commands you’ve executed previously with the up and down arrow keys. This history is stored in a file called cqlsh_history, which is located in a hidden directory called .cassandra within your home directory. This acts like your bash shell history, listing the commands in a plain-text file in the order Cassandra executed them. Nice!
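Because the history is a plain-text file, ordinary shell tools work on it. As a small sketch, the hypothetical helper below prints the most recent entries, defaulting to the location described above.

```shell
# Print the most recent cqlsh commands from the history file.
# Arguments: number of entries (default 10) and history file (default location).
recent_cql_history() {
    count="${1:-10}"
    history_file="${2:-$HOME/.cassandra/cqlsh_history}"
    tail -n "$count" "$history_file"
}

# Example: show the last five commands you ran in cqlsh.
# recent_cql_history 5
```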
Now you should have a Cassandra installation up and running. You’ve worked with the cqlsh client to insert and retrieve some data, and you’re ready to take a step back and get the big picture on Cassandra before really diving into the details.