CQL 3 basics

This section is a brief introduction to CQL 3 on Cassandra 1.1.x. We will look at queries and discuss the options available. For more formal query definitions, please refer to the Cassandra CQL 3 documentation appropriate for the Cassandra version that you are using; good sources are the Apache CQL 3 documentation and the DataStax Cassandra documentation. Most of the queries mentioned here should work in 1.1.x as well as 1.2.x, but they have not been tested against 1.2.x and later versions.

Note

Apache CQL 3 documentation at http://cassandra.apache.org/doc/cql3/CQL.html.

Cassandra CQL documentation at http://www.datastax.com/docs/1.1/references/cql/index.

The CREATE KEYSPACE query

Creating a keyspace requires only the keyspace name, the strategy class, and the options for that strategy class as mandatory elements. Keyspace names have the same restrictions as Java variable names: they must be alphanumeric and must start with a letter.

CREATE KEYSPACE "MyKeyspace1" 
with
STRATEGY_CLASS = SimpleStrategy
and
STRATEGY_OPTIONS:REPLICATION_FACTOR = 2;

Double quotes around the keyspace name are not mandatory. If you do not use quotes, the casing will be ignored and the name will be stored in lowercase.

STRATEGY_CLASS has two options: SimpleStrategy and NetworkTopologyStrategy. Refer to section Replica placement strategies from Chapter 4, Deploying a Cluster, for details.

If you use SimpleStrategy, all you need to specify is REPLICATION_FACTOR as a part of STRATEGY_OPTIONS.

If you use NetworkTopologyStrategy, you will need to specify the replication factor for each data center. Something like this:

CREATE KEYSPACE "MyKeyspace1" 
with
STRATEGY_CLASS = NetworkTopologyStrategy
and
STRATEGY_OPTIONS:DC1 = 2
and
STRATEGY_OPTIONS:DC2 = 1;

The preceding statement basically says to set the replication factor to 2 in the data center DC1 and the replication factor to 1 in the data center DC2.

Apart from this there is another optional parameter called DURABLE_WRITES. Its value can be true or false, depending on your need.
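For illustration, here is a minimal sketch (not from the original examples; the keyspace name ScratchKeyspace1 is made up) that disables durable writes using the same 1.1.x syntax as above:

CREATE KEYSPACE "ScratchKeyspace1" 
with
STRATEGY_CLASS = SimpleStrategy
and
STRATEGY_OPTIONS:REPLICATION_FACTOR = 1
and
DURABLE_WRITES = false;

Disabling durable writes makes Cassandra skip the commit log for that keyspace, trading durability for a small gain in write performance.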

The CREATE TABLE query

A table is what CQL 3 calls a column family. If you are used to seeing things the Thrift way, always translate a table to a column family.

In its simplest form, creating a table is just like creating an SQL table (except, obviously, for foreign keys and some other constraints). The other settings belong to the column family and are given sensible defaults.

# create table with default settings
CREATE TABLE simpletable
  (
id UUID PRIMARY KEY, 
name VARCHAR, 
age INT, 
aboutMe TEXT
);

# let's see what the default assignments are
cqlsh:MyKeyspace>DESC TABLE simpletable; 

CREATE TABLE simpletable ( 
id uuid PRIMARY KEY, 
aboutme text, 
age int, 
name text 
) WITH 
comment='' AND 
caching='KEYS_ONLY' AND 
read_repair_chance=0.100000 AND 
gc_grace_seconds=864000 AND 
replicate_on_write='true' AND 
compaction_strategy_class='SizeTieredCompactionStrategy' AND 
compression_parameters:sstable_compression='SnappyCompressor';

The second CREATE TABLE command in the previous snippet is interesting. Apart from describing everything that constitutes that column family/table, it also prints the command to recreate the table with all the optional parameters. Let's go through the optional parameters quickly.

  1. comment: It is used to store any comment or information that anyone might be interested in while looking at the table.
  2. caching: As we know from section Cache settings, Chapter 5, Performance Tuning, there are four caching options available to a column family. These options can be KEYS_ONLY, ROWS_ONLY, ALL, and NONE. The default option is KEYS_ONLY. If you have no idea about caching, leave it to the default. It is generally a good option.
  3. read_repair_chance: It denotes the probability of querying extra nodes (beyond what the consistency level dictates) for the purpose of read repair. Refer to the section Read repair and Anti-entropy from Chapter 2, Cassandra Architecture, for more on read repair.
  4. gc_grace_seconds: It determines the duration to wait after deletion to remove the tombstones.
  5. replicate_on_write: This is a boolean that can be set to true or false based on whether you want to replicate the data on write. It can be set to false only for column families/tables with counter columns. It is not advisable to set this to false.
  6. compaction_strategy_class: The approach that you want to take for SSTable compaction for this column family. Refer to the section Choosing the right compaction strategy, Chapter 5, Performance Tuning, for details. There are two options available: SizeTieredCompactionStrategy and LeveledCompactionStrategy. The former is the default.
  7. compression_parameters:sstable_compression: This sets the type of compressor you want to use for a column family. There are two compressors available out of the box: SnappyCompressor and DeflateCompressor. SnappyCompressor is the default.
  8. To disable compression altogether, set sstable_compression to an empty string '' (two single quotes without anything in between).
  9. compression_parameters:chunk_length_kb: The SSTable block size that the specified compressor works on. The larger the chunk length, the better the compression, but a large block size negatively affects reads. Read more on compression in the section Enabling compression from Chapter 5, Performance Tuning. The default value is 64, which implies 64 kilobytes.
  10. compression_parameters:crc_check_chance: The probability of validating checksums during reads. By default, this parameter is 1; it helps prevent corrupted data from propagating to other replicas. It can be set to any value between 0 and 1, both included. To disable CRC checks, set it to zero, but this is not a good idea.

Here is an example of a fully customized CREATE TABLE command with all the settings changed from their defaults:

CREATE TABLE non_default_table ( 
id uuid PRIMARY KEY, 
aboutme text, 
age int, 
name text 
 ) WITH 
comment='test c' AND 
caching='ALL' AND 
read_repair_chance=0.900000 AND 
gc_grace_seconds=432000 AND 
replicate_on_write='false' AND 
compaction_strategy_class='LeveledCompactionStrategy' AND 
compression_parameters:chunk_length_kb='128' AND 
compression_parameters:sstable_compression='DeflateCompressor' AND 
compression_parameters:crc_check_chance=0.5; 

Compact storage

Compact storage is a true representation of wide rows. Have a look at the following statement:

# Wide row definition in CQL 3 with compact storage
CREATE TABLE compact_cf
(key BIGINT, column1 VARCHAR, column2 varchar, PRIMARY KEY(key, column1)) 
WITH COMPACT STORAGE; 

If what you want is a row key, a column name (which may be of composite type), and a column value, the preceding statement is exactly the same as the following:

# Wide row definition in cassandra-cli
CREATE COLUMN FAMILY compact_cf
WITH 
KEY_VALIDATION_CLASS = LongType
AND COMPARATOR = UTF8Type
AND DEFAULT_VALIDATION_CLASS = UTF8Type;
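As a quick illustration (this insert is not part of the original example), writing a row into compact_cf shows the mapping: the clustering component column1 becomes the Thrift column name, and column2 becomes its value:

# Illustrative insert into the compact table
INSERT INTO compact_cf (key, column1, column2) 
VALUES (1, 'dynamic_column_name', 'dynamic_value');

In Thrift terms, this creates a single column named dynamic_column_name with the value dynamic_value under the row key 1.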

Compact storage is generally discouraged, especially if you are starting with CQL 3. There are a couple of reasons for this. First, like wide rows in Thrift, you can have just one value per column name. Speaking in CQL 3 terms, if you declare a table with compact storage, you can have only one column that is not part of the primary key. This is because compact storage treats the first component of the primary key as the row key and the remaining components as the column name (in the Thrift realm); the single non-key column becomes the column value. This makes the structure rigid. A non-compact table, on the other hand, represents each non-key column as a static slice within a wide row. So, a non-compact wide row in CQL looks like this:

cqlsh:MyKeyspace> CREATE TABLE non_compact_cf
(key BIGINT, column1 VARCHAR, column2 varchar, value VARCHAR, PRIMARY KEY(key, column1)); 

cqlsh:MyKeyspace> INSERT INTO non_compact_cf (key, column1, column2, value)
VALUES (1, 'text1', 'text2', 'val1' ); 

cqlsh:MyKeyspace> SELECT * FROM non_compact_cf; 
 key | column1 | column2 | value 
-----+---------+---------+------- 
   1 |   text1 |   text2 |  val1

From the Thrift perspective, this is basically a smartly sliced, materialized wide row:

[default@MyKeyspace] LIST non_compact_cf; 
[-- snip --] 
RowKey: 1 
=> (column=text1:column2, value=text2, timestamp=1375074891655000) 
=> (column=text1:value, value=val1, timestamp=1375074891655001)

You can see that non-key columns just create a static slice for a given column name.

The second reason to discourage compact storage involves static tables and collections (please note that collections were introduced in Cassandra 1.2.x, so this is not valid in older versions). Static tables, as one would expect, have no dynamic parts, so you might expect their column names to be simple UTF8Type, but that is not true. They use CompositeType internally, and the reason is a new feature to support: collections. We will see collections support later in this chapter under the Collections support section. A collection provides the ability to store sets, lists, and maps as column values, and to get them working you need composite column names. So, the following code snippet will fail:

cqlsh:my_keyspace> CREATE TABLE compact_with_set
(id INT PRIMARY KEY, name VARCHAR, emails SET<VARCHAR>) 
WITH COMPACT STORAGE; 
Bad Request: Collection types are not supported with COMPACT STORAGE 

The bottom line: unless you are sure that you are not going to modify a table, do not use compact storage. It provides a small storage benefit at the cost of a major loss of flexibility.

Creating a secondary index

Secondary indexes can be created on the named columns of a static column family. You can create a secondary index only after defining the table. It is good practice to give the index a name, but that is optional.

CREATE INDEX index_name 
ON table_name (column_name);

The previous query creates an index named index_name on a table named table_name for a column named column_name.
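For instance, assuming the simpletable table defined earlier in this section, an index on its name column could be created like this (an illustrative example, not from the original text):

CREATE INDEX simpletable_name_idx ON simpletable (name);

Once the index exists, name can be used in WHERE clauses with an equality condition.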

The INSERT query

The insert query looks exactly the same as SQL, except there are a few extra optional parameters that you can set, such as TTL (Time To Live), TIMESTAMP, and obviously, CONSISTENCY. These three have their usual meaning.

INSERT INTO 
simpletable (id, name, age) 
VALUES 
(f7a1cb71-50b4-4128-a805-fc3cd0c1623c, 'Foo Bar', 42) 
USING 
CONSISTENCY QUORUM 
and
TTL 864000 
and
TIMESTAMP 1374893426000;

A few things worth noting:

  • Did you notice how there is no equal sign after the TTL or TIMESTAMP keywords?
  • Although Cassandra is a schema-optional database, CQL 3 does not allow you to add an arbitrary column without declaring it during table creation. If you try to insert an unknown column, you will get an error (see the sketch after this list).
  • Cassandra 1.2.x onward does not support the CONSISTENCY option in queries.
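Here is a sketch of the second point; the nickname column is made up and was never declared on simpletable, so Cassandra rejects the statement (the exact error text may vary by version):

# Illustrative only: nickname was never declared on simpletable
cqlsh:MyKeyspace> INSERT INTO simpletable (id, nickname) 
VALUES (f7a1cb71-50b4-4128-a805-fc3cd0c1623c, 'foo'); 
Bad Request: Unknown identifier nickname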

To view the remaining TTL and write TIMESTAMP, you can always query using special functions for each column. Here is an example:

cqlsh:MyKeyspace> select name, age, WRITETIME(name), TTL(age) from simpletable; 

   name   | age | writetime(name) | ttl(age) 
----------+-----+-----------------+---------- 
 Leon Kik |  59 |   1374893123000 |   853426 
  Foo Bar |  42 |   1374893426000 |   855792 

Note that insert is basically upsert (update or insert). Insertion on an existing key will overwrite the contents.
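As a quick sketch (not from the original text), re-running an insert with the same id used above simply overwrites the earlier name and age for that row:

INSERT INTO 
simpletable (id, name, age) 
VALUES 
(f7a1cb71-50b4-4128-a805-fc3cd0c1623c, 'Foo Baz', 43);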

The SELECT query

The SELECT statement follows a pseudo-SQL pattern, but there is fine print to be aware of if you come from SQL development. The caveats arise from the difference in the underlying storage mechanism: the Cassandra storage model is not a fixed tabular or B-tree-like structure. The general structure looks like this:

SELECT <select expression>
FROM <column family name>
USING CONSISTENCY <consistency level>            //OPTIONAL 
WHERE <AND separated where clauses>             //OPTIONAL
ORDER BY <orderable field>                     //OPTIONAL SUBCLAUSE
LIMIT <limit number>                          //OPTIONAL

The simplest select query just pulls all the columns from a column family. Since we do not specify a limit, it returns at most 10,000 rows by default. Here's what it looks like:

SELECT * FROM select_test; 
id                                   | age | location | name 
-------------------------------------+-----+----------+------- 
5bc25ea7-1636-4a54-b292-bc0877653d24 |  15 |       LA | name4
367d2c7e-d878-4211-8021-3e92cb2a45f5 |  13 |       VA | name2
ae64284b-5413-41b8-a446-fdcd373c8fc0 |  14 |       NY | name3
98cf99b9-8c93-4b64-bd06-a3c91f57e5b9 |  12 |       TX | name1

Notice that the rows are unordered: they are not ordered by key, nor are they in the order of insertion. An order is not preserved because we use RandomPartitioner (set in cassandra.yaml), which returns rows in the order of their hashed tokens. For more on partitioners, read the section Partitioner from Chapter 2, Cassandra Architecture.

Let's see what all those terms within angled brackets in the previous snippet mean.

select expression

select expression can be a * to select all the columns of a CQL row. To select specific columns, use a comma-separated list of the column names to be fetched. You can also apply the WRITETIME and TTL functions to column names: WRITETIME retrieves the timestamp set during the latest write of that column, and TTL returns the remaining time for that particular cell, after which it will be marked as a tombstone and deleted.

COUNT(*) is another function; it returns the number of CQL rows matched by a particular query. It looks exactly the same as an SQL count function.

select count(*) from select_test; 
count
------- 
     4 

The WHERE clause

This clause is used to narrow down the result set by specifying constraints. The WHERE clause in CQL is not as fancy as it is in the SQL world and it is where you need to understand what you can or cannot do with Cassandra.

The WHERE clause supports only the AND conjunction; there is no OR. You can only use columns that compose the PRIMARY KEY or that have a secondary index created on them. However, you cannot use a partition key (or a row key in Thrift terms) with an inequality. Here are a few examples using the data we inserted in the previous example:

# WHERE clause with equality sign on partitioning key
cqlsh:MyKeyspace> select * from select_test where 
id = ae64284b-5413-41b8-a446-fdcd373c8fc0; 

 id                                   | age | location | name 
--------------------------------------+-----+----------+------- 
 ae64284b-5413-41b8-a446-fdcd373c8fc0 |  14 |       NY | name3 

# Inequality is not supported on the partition key
cqlsh:MyKeyspace> select * from select_test where 
id >= ae64284b-5413-41b8-a446-fdcd373c8fc0; 

Bad Request: Only EQ and IN relation are supported on first component of the PRIMARY KEY for RandomPartitioner (unless you use the token() function) 


# If you have to use inequality, you may use token comparison; it may or may not serve your purpose.
cqlsh:MyKeyspace> select * from select_test where 
token(id) >= token(ae64284b-5413-41b8-a446-fdcd373c8fc0); 

 id                                   | age | location | name 
--------------------------------------+-----+----------+------- 
 ae64284b-5413-41b8-a446-fdcd373c8fc0 |  14 |       NY | name3 
 98cf99b9-8c93-4b64-bd06-a3c91f57e5b9 |  12 |       TX | name1

The trick to using inequality with partition keys is the token function. However, it may not help you. What it does is compute the token of the partition key under the partitioner that you are using (RandomPartitioner, ByteOrderedPartitioner, and so on) and use it to fetch the rows whose tokens satisfy the comparison. It rarely makes sense to compare tokens under RandomPartitioner, because hashing scrambles the order; under ByteOrderedPartitioner, tokens are ordered by key bytes, so ranges are meaningful, but they may or may not be lexicographically ordered.

You can, however, use the IN keyword to provide a comma-separated list of partition keys to pull the rows for them. IN follows the same format as in SQL.
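For example, here is an illustrative query (not from the original output) that pulls two of the select_test rows shown earlier by listing their ids:

SELECT * FROM select_test 
WHERE id IN (ae64284b-5413-41b8-a446-fdcd373c8fc0, 98cf99b9-8c93-4b64-bd06-a3c91f57e5b9);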

There is another caveat with compound keys: if you use them, the selection is limited to contiguous CQL 3 rows. It is much easier to think in terms of SLICE from Thrift. Thrift columns (CQL 3 cells) are ordered entities, and a SLICE can be applied only to a contiguous region. This example will clarify why this happens:

# Create table
cqlsh:MyKeyspace> CREATE TABLE where_clause_test
(id int, name varchar, role int, message text, PRIMARY KEY (id, name, role) );

# Insert some data
cqlsh:MyKeyspace> INSERT INTO where_clause_test (id, name, role, message) 
VALUES (1, 'A Smith', 3, 'Hi! everyone'); 
cqlsh:MyKeyspace> INSERT INTO where_clause_test (id, name, role, message) 
VALUES (2, 'B Washington', 2, 'Logging out'); 
cqlsh:MyKeyspace> INSERT INTO where_clause_test (id, name, role, message) 
VALUES (2, 'B Washington', 3, 'Logging in'); 
cqlsh:MyKeyspace> INSERT INTO where_clause_test (id, name, role, message) 
VALUES (2, 'A Smith', 1, 'Creating new tickets'); 
cqlsh:MyKeyspace> INSERT INTO where_clause_test (id, name, role, message) 
VALUES (2, 'B Washington', 4, 'Added new employees'); 


# How it looks from the CQL point of view
cqlsh:MyKeyspace> SELECT * FROM where_clause_test;

id  | name         | role | message 
----+--------------+------+---------------------- 
  1 |      A Smith |    3 |         Hi! everyone
  2 |      A Smith |    1 | Creating new tickets 
  2 | B Washington |    2 |          Logging out 
  2 | B Washington |    3 |           Logging in 
  2 | B Washington |    4 |  Added new employees 

Let's experiment with a few select queries:

# Get all users logged in via machine id = 2, named 'B Washington', with a role less than or equal to 3 (3 = manager privilege, say).
cqlsh:MyKeyspace> SELECT * FROM where_clause_test WHERE id = 2 AND name = 'B Washington' AND role <= 3; 

id  | name         | role | message 
----+--------------+------+------------- 
  2 | B Washington |    2 | Logging out 
  2 | B Washington |    3 |  Logging in 

# Get all the users logged in from machine id = 2 as a manager (role = 3) with a name lexically greater than 'A'
cqlsh:MyKeyspace> SELECT * FROM where_clause_test WHERE id = 2 AND name > 'A' AND role = 3; 
Bad Request: PRIMARY KEY part role cannot be restricted (preceding part name is either not restricted or by a non-EQ relation)

# Get all the users logged into the system from machine id = 2 as a manager (role = 3), irrespective of their names.
cqlsh:MyKeyspace> SELECT * FROM where_clause_test WHERE id = 2 AND role = 3; 
Bad Request: PRIMARY KEY part role cannot be restricted (preceding part name is either not restricted or by a non-EQ relation) 

It is a bit frustrating for a beginner: you can see the columns just as in an RDBMS, and you can run some queries but not others that look very similar. The answer lies in the underlying representation. When we define a compound primary key, the data for each combination of key values does not actually go into separate rows. Instead, the first part of the compound key becomes the row key, and the rest become secondary keys that create partitions within a wide row. This view can be seen using Thrift.

# in cassandra-cli
[default@MyKeyspace] list where_clause_test; 
Using default limit of 100 
Using default column limit of 100 
------------------- 
RowKey: 1 
=> (column=A Smith:3:message, value=Hi! everyone, timestamp=1374934726143000) 
------------------- 
RowKey: 2 
=> (column=A Smith:1:message, value=Creating new tickets, timestamp=1374934870771000) 
=> (column=B Washington:2:message, value=Logging out, timestamp=1374934807710000) 
=> (column=B Washington:3:message, value=Logging in, timestamp=1374934836937000) 
=> (column=B Washington:4:message, value=Added new employees, timestamp=1374934915607000) 

2 Rows Returned. 
Elapsed time: 84 msec(s). 

Now you can see why inequality can be used only on the last primary key component that is restricted in the WHERE clause.

The ORDER BY clause

ORDER BY enables ordering of the rows by the second component of the primary key, even if your primary key contains more than two components. The reason again is the same as we discussed in the previous section, when discussing the WHERE clause for tables with compound keys. ORDER BY can order only one column and there must be a valid WHERE clause before it. Here is an example:

cqlsh:MyKeyspace> SELECT * FROM where_clause_test WHERE id = 2 ORDER BY name DESC;

id  | name         | role | message 
----+--------------+------+---------------------- 
  2 | B Washington |    4 |  Added new employees 
  2 | B Washington |    3 |           Logging in 
  2 | B Washington |    2 |          Logging out 
  2 |      A Smith |    1 | Creating new tickets 

You can sort in descending order by specifying the DESC keyword. Ascending order is the default if nothing is mentioned; to mention it explicitly, use the ASC keyword.
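For completeness, the explicit ascending form of the earlier query would look like this (illustrative, not from the original output); it returns the same rows as the DESC example above, just in reverse order:

cqlsh:MyKeyspace> SELECT * FROM where_clause_test WHERE id = 2 ORDER BY name ASC;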

The LIMIT clause

The limit clause works the same way as in an RDBMS. It limits the number of rows to be returned to the user. It can be used to perform pagination. If nothing is mentioned, CQL limits the results to 10,000 rows. Use limit to fetch either more or fewer results than the default limit value.

# limiting resulting rows to 2 of total 4 rows
cqlsh:MyKeyspace1> SELECT * FROM where_clause_test LIMIT 2; 

id  | name    | role | message 
----+---------+------+---------------------- 
  1 | A Smith |    3 |         Hi! everyone
  2 | A Smith |    1 | Creating new tickets 
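One way to page beyond a LIMIT in this version of CQL (which has no automatic paging) is to use the token() function on the partition key as a cursor. The following is an illustrative sketch, not from the original text; note that it pages by partition token rather than by individual CQL row:

# Fetch the next batch of partitions after the last id seen (here, 2)
SELECT * FROM where_clause_test WHERE token(id) > token(2) LIMIT 2;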

The USING CONSISTENCY clause

In Cassandra 1.1.x, you can specify the consistency level for a read request. For example:

# Consistency clause in Cassandra 1.1.x select query
cqlsh:MyKeyspace> SELECT * FROM where_clause_test USING CONSISTENCY QUORUM LIMIT 2;

id  | name    | role | message 
----+---------+------+---------------------- 
  1 | A Smith |    3 |         Hi! everyone
  2 | A Smith |    1 | Creating new tickets 

The UPDATE query

The UPDATE query is basically an upsert query: it updates the specified columns for the given row or rows if they exist, and otherwise creates the columns and/or the row. It is the same thing as insert, just with a different syntax.

In its simplest form, UPDATE is the same as an SQL update:

# Updating a column requires complete information about PRIMARY KEY
cqlsh:MyKeyspace> UPDATE where_clause_test SET message = 'Deleting old tickets' WHERE id = 2 AND name = 'A Smith' AND role = 1; 

# Column is updated
cqlsh:MyKeyspace> SELECT * FROM where_clause_test; 
id  | name         | role | message 
----+--------------+------+---------------------- 
  1 |      A Smith |    3 |         Hi! everyone
  2 |      A Smith |    1 | Deleting old tickets 
  2 | B Washington |    2 |          Logging out 
  2 | B Washington |    3 |           Logging in 
  2 | B Washington |    4 |  Added new employees 

# An UPDATE with a primary key that does not exist yet leads to new column or row creation
cqlsh:MyKeyspace1> UPDATE where_clause_test SET message = 'This row will be inserted' WHERE id = 3 AND name = 'W Smith' AND role = 1; 

# New row is created!
cqlsh:MyKeyspace1> SELECT * FROM where_clause_test; 
 id | name         | role | message 
----+--------------+------+--------------------------- 
  1 |      A Smith |    3 |              Hi! everyone
  2 |      A Smith |    1 |      Deleting old tickets 
  2 | B Washington |    2 |               Logging out 
  2 | B Washington |    3 |                Logging in 
  2 | B Washington |    4 |       Added new employees 
  3 |      W Smith |    1 | This row will be inserted 

There are a few optional elements for the UPDATE statement, and they are the same as the optional elements for the INSERT statement: TTL, write TIMESTAMP, and CONSISTENCY level. These must be specified before the SET keyword:

UPDATE where_clause_test
USING 
  CONSISTENCY QUORUM AND 
  TIMESTAMP 1374945623000 AND 
  TTL 84600 
SET 
  MESSAGE = 'Fixing bugs.' 
WHERE 
  id = 2 AND 
  name = 'A Smith' AND 
  role = 1;

The DELETE query

The DELETE query deletes columns by marking them with a tombstone and a timestamp. These columns are removed after gc_grace_seconds has elapsed. Refer to the Tombstones section in Chapter 2, Cassandra Architecture, for more details.

Unlike SQL, in CQL, you can delete specified columns from a row. A simple query looks like an SQL delete query with the columns to be deleted mentioned before the FROM keyword.

# before deletion
cqlsh:MyKeyspace> SELECT * FROM my_table; 
id | address                | age | name 
----+------------------------+-----+---------- 
  1 |     36, Keyye Blvd, NJ |  42 | John Doe 
  2 | 277, Block 4, James St |  31 |  Sam Woe 

# Delete the address for id = 1
cqlsh:MyKeyspace> DELETE address FROM my_table WHERE id = 1; 

# Deleted column is shown as null

cqlsh:MyKeyspace> SELECT * FROM my_table; 
id  | address                | age | name 
----+------------------------+-----+---------- 
  1 |                   null |  42 | John Doe 
  2 | 277, Block 4, James St |  31 |  Sam Woe 

There are two optional parameters: CONSISTENCY and tombstone creation TIMESTAMP. These can be specified before the WHERE keyword like this:

DELETE address FROM my_table
USING 
CONSISTENCY ALL AND 
TIMESTAMP 1374945623000 
WHERE id = 2; 

The TRUNCATE query

Truncate is like the DELETE FROM TABLE_NAME query in SQL without any WHERE clause. It deletes all the data in the column family. The syntax is like this:

 TRUNCATE my_table;

The ALTER TABLE query

The ALTER TABLE query performs four tasks: adding a new column, dropping an existing column, altering the type of a column, and changing the table options.

Adding a new column

Adding a new column follows a syntax like this:

ALTER TABLE my_table ADD email varchar;

Existing data is not validated or modified; existing rows simply have no value for the new column until one is written.
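For instance (an illustrative query, not from the original text), the two rows that already existed in my_table would simply show null for the new email column:

cqlsh:MyKeyspace> SELECT id, email FROM my_table; 
 id | email 
----+------- 
  1 |  null 
  2 |  null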

Dropping an existing column

Dropping an existing column is not the same as deleting data and removing the column name from the metadata. It is just the latter. On the DROP query execution, the column family metadata loses the column definition that is being dropped but the data in the existing rows still stays. Let's delete the address column from the my_table column family and see what happens:

# Existing data in my_table
cqlsh:MyKeyspace> SELECT * FROM my_table; 
id  | address                | age | email | name 
----+------------------------+-----+-------+---------- 
  1 |     36, Keyye Blvd, NJ |  42 |  null | John Doe 
  2 | 277, Block 4, James St |  31 |  null |  Sam Woe 

# Drop the address column
cqlsh:MyKeyspace> ALTER TABLE my_table DROP address; 

# CQL does not show it. 
cqlsh:MyKeyspace> SELECT * FROM my_table; 
id  | age | email | name 
----+-----+-------+---------- 
  1 |  42 |  null | John Doe 
  2 |  31 |  null |  Sam Woe

# But it exists, let's use Thrift call (cassandra-cli console)
[default@MyKeyspace] list my_table; 
[-- snip --] 
RowKey: 1 
=> (column=address, value=36, Keyye Blvd, NJ, timestamp=1374948385939000)
=> (column=age, value=42, timestamp=1374948385939001) 
=> (column=name, value=John Doe, timestamp=1374948385939002) 
------------------- 
RowKey: 2 
=> (column=address, value=277, Block 4, James St, timestamp=1374948401402000) 
=> (column=age, value=31, timestamp=1374948401402001) 
=> (column=name, value=Sam Woe, timestamp=1374948401402002) 

Modifying the data type of an existing column

The syntax to change the data type of an existing column looks like this:

ALTER TABLE my_table1 ALTER name TYPE int; 

The downside is that this does not check the data type of the existing cells. It may lead to problems if you modify a column to a type that is not compatible with the existing data, because deserialization can fail.

# Existing data – name is a varchar type
cqlsh:MyKeyspace> SELECT * FROM my_table; 
id  | age | email | name 
----+-----+-------+---------- 
  1 |  42 |  null | John Doe 
  2 |  31 |  null |  Sam Woe 

# altered the name column to an incompatible type
cqlsh:MyKeyspace> ALTER TABLE my_table ALTER name TYPE int; 

# accessing the columns throws deserialization exception
cqlsh:MyKeyspace> SELECT * FROM my_table; 
id  | age | email | name 
----+-----+-------+------------ 
  1 |  42 |  null | 'John Doe' 
  2 |  31 |  null |  'Sam Woe' 

Failed to decode value 'John Doe' (for column 'name') as int: unpack requires a string argument of length 4 
Failed to decode value 'Sam Woe' (for column 'name') as int: unpack requires a string argument of length 4 

Altering table options

The table's options that were used during the table creation can be altered using this command. Here is a sample of that:

cqlsh:MyKeyspace> ALTER TABLE my_table WITH comment = 'updated caching to none' and caching = NONE; 

cqlsh:MyKeyspace>DESC TABLE my_table; 

CREATE TABLE my_table ( 
id int PRIMARY KEY, 
age int, 
email text, 
name int
) WITH 
comment='updated caching to none' AND 
caching='NONE' AND 
read_repair_chance=0.100000 AND 
gc_grace_seconds=864000 AND 
replicate_on_write='true' AND 
compaction_strategy_class='SizeTieredCompactionStrategy' AND 
compression_parameters:sstable_compression='SnappyCompressor'; 

The ALTER KEYSPACE query

ALTER KEYSPACE lets you change all the aspects of a keyspace that can be set during keyspace creation. The query looks very similar to the CREATE KEYSPACE specification. Here is an example:

CREATE KEYSPACE "MyKespace" 
  WITH strategy_class = NetworkTopologyStrategy
  AND strategy_options:DC1 = 2 
  AND strategy_options:DC2 = 2 
  AND durable_writes=false;

This updates the placement strategy and sets durable writes to false.

Note

CQL is a case-insensitive query language; unquoted identifiers are converted to lowercase before being processed. If you have created a column family or keyspace that contains uppercase letters, either via the Thrift interface or by using double quotes to preserve case in CQL, you will not be able to access it in CQL without putting double quotes around the name, as we did in the previous query.

BATCH querying

BATCH enables users to club a set of data modification operations, such as INSERT, UPDATE, and DELETE, into a single logical execution. The CONSISTENCY level and TIMESTAMP may be provided at the batch level and are then applied to all the statements within the batch. An example of a batch is as follows:

BEGIN BATCH USING CONSISTENCY ALL TIMESTAMP 1374983889000 
  INSERT INTO my_table1 (id, name, age, email) VALUES (42, 'Cameleon Woe', 59, 'email not available') 
  UPDATE my_table1 SET name = 'Leon Woe' WHERE id = 42 
  DELETE email FROM my_table1 WHERE id = 42 
APPLY BATCH;

Note that there is no semicolon after the individual statements inside the batch. The semicolon is the delimiter, so it is applied only at the end of the batch. Execution will break if you place a semicolon within the BATCH statement.

The DROP INDEX query

DROP INDEX drops secondary indexes from a table. The syntax is like this:

DROP INDEX index_name;

index_name is the name of the index that you provided during index creation. If you haven't provided a name for the index during creation, the index_name can be replaced with <table_name>_<column_name>_idx. For example, an unnamed index for a column name in a table named users will be referred to as users_name_idx in the DROP INDEX statement.
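So, as a sketch (the users table here is hypothetical), dropping that unnamed index would look like this:

DROP INDEX users_name_idx;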

The DROP TABLE query

It deletes the table and all its contents. The syntax follows an SQL pattern.

DROP TABLE my_table;

The DROP KEYSPACE query

It deletes the keyspace and all the tables that it contains.

DROP KEYSPACE my_keyspace;

The USE statement

The USE statement tells the client to connect to a specified keyspace. After this statement is issued, all queries in the client's current session run against the tables and indexes of that keyspace. The syntax is like this:

USE my_keyspace;