Chapter 3. Design Patterns

This chapter will introduce you to some of the most commonly used patterns in application development. Many of these patterns arise from the way Cassandra stores data internally and from the fact that there is no relational integrity. The chapter has two parts: one discusses data models and the patterns that emerge from them, and the other exploits Cassandra's non-relational ability to store large amounts of data per row.

Although many patterns are discussed in this chapter, it certainly doesn't cover all the cases. Coming up with an innovative modeling approach for a very specific or obscure problem depends on your imagination. But no matter what the problem is, the most efficient way to attack it is to keep the following things in mind:

  • Denormalize, denormalize, and denormalize: Forget about old-school 3NF (read more about Normal Forms at https://en.wikipedia.org/wiki/Database_normalization#Normal_forms). In Cassandra, the fewer the network trips, the better the performance. Denormalize wherever you can for quicker retrieval and let application logic handle the responsibility of reliably updating all the redundancies.
  • Rows are gigantic and sorted: The giga-sized rows (a row can accommodate two billion columns) can be used to store sortable and sliceable columns. Need to sort comments by timestamp? Need to sort bids by quoted price? Put in a column with the appropriate comparator (you can always write your own comparator).
  • One row, one machine: Each row stays on one machine. Rows are not sharded across nodes. So beware of this. A high-demand row may create a hotspot.
  • From query to model: Unlike an RDBMS, where you model tables after the entities in your application and then write whatever analytical queries you need to get the data out, Cassandra has no such provision. So you may need to denormalize your model in such a way that all your queries stay limited to a handful of simple operations such as get, slice, count, multi_get, and some simple indexed searches.

Note

Before you go ahead with this chapter, please note that the data model uses Pycassa, a Python client for Cassandra, and the Cassandra command-line interface shell called cassandra-cli to demonstrate various programming aspects of Cassandra. Also, it might be a good idea to learn about the Thrift interface for Cassandra. It will help you understand when we talk about GET, MULTIGET_SLICE, SLICE, and other operations. At the time of writing this book, CQL 3 was in beta, so the examples are done using the Thrift interface. Thrift is supported across all the versions of Cassandra and will continue to be supported in future versions. CQL 3 is poised to be the preferred interfacing language from Cassandra 1.2 onward. Here are a few resources that can help you:

  • Refer to the online documentation and Apache Cassandra Wiki for a quick tutorial on Thrift API, Pycassa, and cassandra-cli.
  • Refer to Chapter 9, Introduction to CQL 3 and Cassandra 1.2 for an introduction to CQL 3 and how to think in terms of CQL 3 when you are coming from a Thrift world.
  • It is very important to have full code in hand while working through this book. The examples in this chapter are just a relevant snippet from the full code that fits into the context. The full code for all examples and other relevant material can be downloaded from the author's GitHub account (https://github.com/naishe/mastering_cassandra) or the download page for this book on the publisher's website (http://www.packtpub.com/support).

The Cassandra data model

To a person coming from the relational database world to the NoSQL world, Cassandra can seem like a pretty featureless system. First, there is no relational integrity; then, there is a whole different approach of defining your queries before modeling your tables, which is quite the opposite of what we learned: model the entities first and then think of the queries.

It may be confusing if you keep thinking in terms of a relational setup and translating it into an equivalent Cassandra representation. So forget about tables, foreign keys, joins, cascading deletes, update-on-insert, and the like when we speak in the context of Cassandra. If it helps, think of a concrete problem you are dealing with. For example, you need to show the number of votes by day and by city. We cannot run a sort or a group by; instead, we will have a column family with counter as the value data type and dates as column names. (At this point, if you start to think like an RDBMS person, you will wonder how you could create a table whose columns are dynamic. You can't, but there you have smart functions such as sort and group; in Cassandra, your application manages these.) Every time a vote is cast for city A, we look up the column family, go to the row for that city, and then find and update the column for that date. If this quick tour did not fully make sense, it's fine. We'll see these ideas again later in this chapter.


Figure 3.1: The Cassandra data model

Note

CQL 3 may be a big relief to people from the relational background. It lets you express queries and schema in a way much closer to SQL. The internal representation of the data may be different.

In the heart of Cassandra lie three structures: column family, column, and super column. There is a container entity for these three entities called keyspace. In our discussion, we'll use a ground-up approach: we will start with the smallest unit, the column, and go up to the top-level container, the keyspace.

The column

The column is the atomic unit of the Cassandra data model. It is the smallest component that can be operated on. Columns are contained within a column family. A column is essentially a key-value pair. The word column is confusing; it creates a mental image of a tabular structure where a column is a unit vertical block that stores the value referred by the heading of the column. This is not entirely true with Cassandra. Cassandra's columns are best represented by a tuple with the first element as a name/key and the second as a value. The key of a column is commonly referred to as a column name or a column key.

A column can be represented as a map of key/column name, value, and timestamp. A timestamp is used to resolve conflicts during read repair or to reconcile two writes that happen to the same column at the same time; the one written later wins. A timestamp is client-supplied data, and since it is critical to write resolution, it is a good idea to have all your client application servers clock-synchronized (refer to NTP, http://en.wikipedia.org/wiki/Clock_synchronization#Network_Time_Protocol).

# A column, much like a relational system
{
  name: "username",
  value: "Leo Scott",
  timestamp: 1366048948904
}

# A column with its name as timestamp, value as page-viewed
{
  name: 1366049577,
  value: "http://foo.com/bar/view?itemId=123&ref=email",
  timestamp: 1366049578003
}

This is an example of two columns; the first looks more like a traditional column, where one would expect each row of the users column family to have a username column. The latter is more like a dynamic column: its name is the timestamp at which a user accessed a web page, and its value is the URL of that page.

Further in this book, we'll ignore the timestamp field whenever we refer to a column because it is generally not needed for application use and is used by Cassandra internally. A column can be viewed as shown in the following figure:


Figure 3.2: Representing a column

The counter column

A counter column is a special-purpose column used to keep a count. A client application can increment or decrement it by an integer value. Counter columns cannot be mixed with regular or any other column types (as of Cassandra v1.2). When we have to use a counter, we either plug it into an existing counter column family or create a separate column family with the default validator set to CounterColumnType.

Counters require tight consistency, and this makes it a little complex for Cassandra. Under the hood, Cassandra tracks distributed counters and uses system-generated timestamps. So clock synchronization is crucial.

The counter column family behaves a little differently from the regular ones. When a write to a counter column occurs, Cassandra performs a read in the background, so it reads the counter value before updating it; this keeps all the replicas consistent. We can leverage this property while writing: since the data is kept consistent, we can write to a counter column with a consistency level of ONE. We know from the chapter on Cassandra's architecture that the lower the consistency level, the faster the read/write operation, so counter writes can be very fast without risking a false read.

Note

Clock synchronization can easily be achieved with NTP (http://www.ntp.org) daemon (ntpd). In general, it's a good idea to keep your servers in sync.

Here is an example of a column family with counter columns in it and the way to update it:

[default@mastering_cassandra] create column family votes_by_candidate with 
default_validation_class = CounterColumnType and 
key_validation_class = UTF8Type and 
comparator = UTF8Type;

a84caeec-b3dd-3484-9011-5c292e04105d

[default@mastering_cassandra] incr votes_by_candidate['candidate1']['voter1'] by 5;

Value incremented.

[-- snip multiple increment commands --]
[default@mastering_cassandra] list votes_by_candidate;

Using default limit of 100 
Using default column limit of 100 
------------------- 
RowKey: candidate2 
=> (counter=voter1, value=3) 
------------------- 
RowKey: candidate1 
=> (counter=voter1, value=5) 
=> (counter=voter2, value=-4)
2 Rows Returned.

There is an optional attribute that can be used with counter columns called replicate_on_write. When set to true, this attribute tells Cassandra to write to all the replicas irrespective of the consistency level set by the client. It should always be set to true for the counter columns. The default value is true for counter columns (except for the Versions 0.8.1 and 0.8.2).
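
The same counters can be driven from a client as well. Here is a minimal Pycassa sketch, assuming a ConnectionPool named pool for the mastering_cassandra keyspace and the votes_by_candidate column family created above:

import pycassa
from pycassa.columnfamily import ColumnFamily

pool = pycassa.ConnectionPool('mastering_cassandra', ['localhost:9160'])
votesCF = ColumnFamily(pool, 'votes_by_candidate')

# add() increments (or, with a negative value, decrements) a counter column
votesCF.add('candidate1', 'voter1', 5)
votesCF.add('candidate1', 'voter2', -4)
votesCF.add('candidate2', 'voter1', 3)

# Read the counters back: voter1 => 5, voter2 => -4
print votesCF.get('candidate1')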

Note

Please note that counter updates are not idempotent. In the event of a write failure, the client has no way of knowing whether the write operation succeeded. A retry may cause the counter columns to be updated twice, leading to the column value being incremented or decremented by twice the intended amount.

The expiring column

A column is referred to as an expiring column if an optional time-to-live (TTL) attribute is added to it. The expiring column is deleted once the TTL has elapsed from the time of insertion, which means that the client can no longer see it in results.

On insertion of a TTL-containing column, the coordinator node sets a deletion timestamp by adding the provided TTL to the current local time. The column expires when the local time of a querying node goes past the set expiration timestamp. The expired column is marked for deletion with a tombstone and is removed during a compaction that runs after the expiration timestamp, or during a repair. Expiring columns take eight bytes of extra space to record the TTL. Here is an example:

# Create column family normally

[default@mastering_cassandra] create column family user_session 
with 
default_validation_class = UTF8Type and 
key_validation_class = UTF8Type and 
comparator = UTF8Type;

0055e984-d3e9-3b74-9a48-89b63a6e371d

# Add a regular column
[default@mastering_cassandra] set user_session['user1']['keep_loggedin'] = 'false';

# Add a column with 60s TTL
[default@mastering_cassandra] set 
user_session['user1']['session_data'] = '{sessionKey: "ee207430-a6b2-11e2-9e96-0800200c9a66", via: "mobileApp"}' 
with ttl = 60;

# Retrieve the column family data immediately
[default@mastering_cassandra] list user_session;

RowKey: user1 
=> (column=keep_loggedin, value=false, …)
=> (column=session_data, value={sessionKey: "ee207430-a6b2-11e2-9e96-0800200c9a66", via: "mobileApp"}, …, ttl=60)

# Wait for more than 60 seconds, retrieve again
[default@mastering_cassandra] list user_session;

RowKey: user1 
=> (column=keep_loggedin, value=false, …)

A few things to be noted about expiring columns:

  • The TTL is in seconds, so the smallest TTL can be one second
  • You can change the TTL by reinserting the column (that is, read the column, update the TTL, and insert the column)
  • Although the client does not see the expired column, the space remains occupied until a compaction is triggered; note, though, that tombstones take rather little space

Expiring columns have some good uses: they remove the need for cron-like jobs that constantly watch for and delete data that has expired or is no longer required. For example, an expiring shopping coupon or a user session can be stored with a TTL.
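
From Pycassa, the TTL is just an optional argument to insert(). A minimal sketch, assuming the pool from the earlier sketch and the user_session column family created above:

from pycassa.columnfamily import ColumnFamily

sessionCF = ColumnFamily(pool, 'user_session')

# The column silently disappears from reads about 30 minutes after insertion
sessionCF.insert('user1',
                 {'session_data': '{sessionKey: "...", via: "web"}'},
                 ttl = 1800)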

The super column

A super column is a column containing more columns. It contains an ordered map of columns and subcolumns. It is often used for denormalization by putting multiple rows of a column family in one single row, which can be used as a materialized view on data retrieval. Refer to the following diagram:


Figure 3.2: A row of super columns

The diagram shows a row of super columns that contains one brand per super column, and each super column has subcolumns containing the models of that brand. Each subcolumn, in turn, holds the relative URL as its value. So where can this be used? One good place is an online shop such as Amazon. It shows drop-downs of categories such as books, mobiles, and many more. You can hover your mouse over the mobiles menu, and a submenu shows up with brand names such as Apple, Samsung, and many more. You scroll down the submenu to find links to the detail pages for each model of mobile. To render this, all you need to do is read the mobiles row, whose super columns contain the models with their URLs.

Although the idea of super columns seems pretty lucrative, it has some serious drawbacks:

  • To read a single subcolumn value, you will need to deserialize all the subcolumns of a super column
  • A secondary index cannot be created on the subcolumns

Note

CQL 3 does not support super columns. DataStax (http://www.datastax.com) suggests avoiding super columns and using composite keys instead. They perform better than super columns and cover most of their use cases.

The column family

A column family is a collection of rows where each row is a key-value pair. The key to a row is called a row key and the value is a sorted collection of columns. Essentially, a column family is a map with its keys as row keys and values as an ordered collection of columns. To define a column family, you need to provide a comparator that determines the sorting order of columns within a row.

Internally, each column family is stored in a file of its own, and there is no relational integrity between two column families. So one should keep all related information that the application might require in the column family.

In a column family, the row key is unique and serves as the primary key. It is used to identify and get/set records to a particular row of a column family.

Although a column family looks like a table from the relational database world, it is not. When you use CQL, it is treated as a table, but having an idea about the underlying structure helps in designing—how the columns are sorted, sliced, and persisted, and the fact that it's a schema-free map of maps.


Figure 3.3: A dynamic column family showing daily hits on a website; each column represents a city, and the column value is the number of hits from that city

There are dynamic and static column families, also known as wide and narrow column families respectively; I will follow the dynamic and static terminology here. These are two design patterns in Cassandra that take advantage of its flexibility to fit application requirements.

A dynamic column family utilizes the column family's ability to store an arbitrary number of columns (key-value pairs). A typical use of a dynamic column family is for statistics aggregation, for example, to store the number of hits to a certain web page from various cities on a day-by-day basis. It will be cumbersome to make a static column family with all the cities of the world as the column names. (Plus, the developer can get some coffee in the time it would take to type in all the cities.) A time series data may be another example. Figure 3.3 displays a dynamic column family.

The following is the syntax:

[default@mastering_cassandra] create column family daily_hits
with 
default_validation_class = CounterColumnType
and key_validation_class = LongType
and comparator = UTF8Type;

Let's take a moment to see what's going on here. We ask Cassandra to create a column family named daily_hits whose column values will be validated as CounterColumnType, because that's what default_validation_class is. The row keys are going to be of the type dictated by key_validation_class, which is LongType, because we are storing the number of days since the Unix epoch (January 01, 1970). And finally, the column names will be sorted by the comparator, which is UTF8Type text. We'll look into validators and comparators shortly.
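
Here is a quick Pycassa sketch of how an application might feed this column family (assuming the pool from the earlier sketches); the row key is the current day as days since the epoch, and every hit bumps the counter column named after the city:

import time
from pycassa.columnfamily import ColumnFamily

hitsCF = ColumnFamily(pool, 'daily_hits')

def record_hit(city):
    day = int(time.time()) // 86400   # days since the Unix epoch (LongType row key)
    hitsCF.add(day, city)             # increment the counter column for this city

record_hit('New York')
record_hit('Chennai')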

A static column family is vaguely close to a table in a relational database system. A static column family has some predefined column names and their validators, but this does not prevent you from adding arbitrary columns on the fly. A column family that represents users is a typical example of a static column family. The major benefit of a static column family is validation: you define validation_class on a per-column basis, so you can control what data type each column should have. A write with an incompatible data type will be rejected. Consider the following code:

# Typical definition of static column family
[default@mastering_cassandra] create column family users 
with 
comparator = UTF8Type and
key_validation_class = UTF8Type and 
column_metadata = [ 
 {column_name: username, validation_class: UTF8Type}, 
 {column_name: email, validation_class: UTF8Type}, 
 {column_name: last_login, validation_class: DateType}, 
 {column_name: is_admin, validation_class: BooleanType} 
];

# Insert something valid
[default@mastering_cassandra] set users['user1']['username'] = 'Leo Scott';

# something invalid
[default@mastering_cassandra] set users['user1']['is_admin'] = 'Yes';
java.lang.RuntimeException: org.apache.cassandra.db.marshal.MarshalException: unable to make boolean from 'Yes'

# extend to undefined fields
[default@mastering_cassandra] set users['user1']['state'] = utf8('VA');

# Observe
[default@mastering_cassandra] get users['user1']; 
=> (column=state, value=VA, timestamp=...) 
=> (column=username, value=Leo Scott, timestamp=...)

Keyspaces

Keyspaces are the outermost containers in Cassandra. A keyspace contains column families and super column families, and can be roughly compared to a database in a relational database system. Its purpose is to group column families. In general, one application uses one keyspace, much like in an RDBMS.

Keyspaces hold properties such as replication factors and replica placement strategies, which are globally applied to each column family in the keyspace. Keyspaces are global management points for an application. Here is an example:

[default@unknown] create keyspace mastering_cassandra 
... with placement_strategy = SimpleStrategy 
... and strategy_options = {replication_factor: 1}; 

We will discuss more on storage configuration in Chapter 4, Deploying a Cluster.

Data types – comparators and validators

Comparators and validators are the mechanisms used to define and validate data types of components (row key, column names, and column values) in a column family:

  • Validators: Validators are the means to fix data types for row keys and column values. In a column family, key_validation_class specifies the data type of row keys, default_validation_class specifies the default data type of column values, and the column_metadata property is used to specify the data types of the values of individual, named columns.
  • Comparators: Comparators specify the data type of a column name. In a row, columns are sorted; this property determines the order of the columns in a row. The comparator is specified by a comparator keyword while creating the column family.

In a dynamic column family, where any number of columns may be present, it becomes crucial to decide how the columns will be sorted. An example would be a column family that stores daily stock values minute by minute in a row. A natural comparator for this case is DateType. Since the columns are sorted by timestamp, a slice query can pull all the variations in a stock's value from 11 A.M. to 1 P.M.
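
For instance, with a DateType comparator you can hand Pycassa plain datetime objects as column names and slice on them. A minimal sketch, assuming a hypothetical stock_ticks column family keyed by ticker symbol and day, with a DateType comparator:

from datetime import datetime
from pycassa.columnfamily import ColumnFamily

ticksCF = ColumnFamily(pool, 'stock_ticks')

# All the recorded values for the ticker between 11 A.M. and 1 P.M.
midday_ticks = ticksCF.get('GOOG:2013-04-15',
                           column_start = datetime(2013, 4, 15, 11, 0),
                           column_finish = datetime(2013, 4, 15, 13, 0))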

For static column families, columns' sorting does not matter that much and the column name is generally a character string.

If comparators and validators are not set, hex byte array (BytesType) is assumed. Although comparators and validators seem the same from their description, one crucial difference between them is that validators can be added or modified in a column definition at any time but comparators can't. So a little thinking on how the columns need to be sorted before implementing the solution may be worth the effort.

As of Cassandra 1.2, the following basic data types are supported (refer to http://www.datastax.com/docs/1.2/cql_cli/using_cli#about-data-types-comparators-and-validators):

Type              | CQL type      | Description
------------------+---------------+---------------------------------------------
AsciiType         | ascii         | US-ASCII character string
BooleanType       | boolean       | True or false
BytesType         | blob          | Arbitrary hexadecimal bytes (no validation)
CounterColumnType | counter       | Distributed counter value (8-byte long)
DateType          | timestamp     | Date plus time, encoded as 8 bytes since epoch
DecimalType       | decimal       | Variable-precision decimal
DoubleType        | double        | 8-byte floating point
FloatType         | float         | 4-byte floating point
InetAddressType   | inet          | IP address string in xxx.xxx.xxx.xxx form
Int32Type         | int           | 4-byte integer
IntegerType       | varint        | Arbitrary-precision integer
LongType          | bigint        | 8-byte long
TimeUUIDType      | timeuuid      | Type 1 UUID only (CQL 3)
UTF8Type          | text, varchar | UTF-8 encoded string
UUIDType          | uuid          | Type 1 or type 4 UUID

Cassandra 1.2 also supports collection types, such as set, list, and map. This means one can add and retrieve collections without ad hoc workarounds such as keeping multiple columns for multiple shipping addresses with names patterned as shipping_addr1, shipping_addr2, and so on, or bundling the whole set of shipping addresses into one long UTF8Type column holding serialized JSON.

Writing a custom comparator

With the number of comparators provided by Cassandra, chances are that you'll never need to write one of your own. But if there must be some column ordering that cannot be achieved or worked around using the given comparators, you can always write your own. For example, sorting with UTF8Type is case sensitive; capital letters come before small letters as seen in the following code:

[default@mastering_cassandra] list daily_hits;
RowKey: 15812 
=> (counter=CA, value=9502) 
=> (counter=MA, value=123) 
=> (counter=NY, value=31415) 
=> (counter=la, value=6023) 
=> (counter=ma, value=43)

One may want to have a case-insensitive sorting. To write a custom comparator, you need to extend org.apache.cassandra.db.marshal.AbstractType<T> and implement the abstract methods. Once you are done with your comparator and supporting classes, package them into a JAR file and copy this file to Cassandra's lib directory on all the servers you want this comparator to be on. It may be a good idea to look into the org.apache.cassandra.db.marshal package and observe how the comparators are implemented. Here is an example. The following snippet shows a custom comparator that orders the columns by the length of the column name, which behaves mostly like UTF8Type except for the ordering:

import java.nio.ByteBuffer;

import org.apache.cassandra.db.marshal.AbstractType;
import org.apache.cassandra.db.marshal.UTF8Type;

public class LengthComparator extends AbstractType<String> {

  // Cassandra looks up this public static singleton at startup
  public static final LengthComparator instance = new LengthComparator();

  public int compare(ByteBuffer o1, ByteBuffer o2) {
    // Order columns by the length of their column names
    return (getString(o1).length() - getString(o2).length());
  }

  // Rest of the methods utilize UTF8Type for operations
}

You can view the complete code online. One thing to note is that if you don't declare the public static instance singleton, Cassandra will throw an exception at startup. Compile this class and put the JAR or .class file in the $CASSANDRA_HOME/lib folder. You will see the JAR file listed in the classpath listing of the Cassandra startup log. Here is an example of column family creation, insertion, and retrieval using this comparator. You will see that the columns are ordered by the increasing length of their column names:

# Create Column Family with Custom Comparator
[default@MyKeyspace] CREATE COLUMN FAMILY custom_comparator 
WITH 
KEY_VALIDATION_CLASS = LongType AND 
COMPARATOR = 'in.naishe.mc.comparator.LengthComparator' AND
DEFAULT_VALIDATION_CLASS = UTF8Type;

# Insert some data
[default@MyKeyspace] SET custom_comparator[1]['hello'] = 'world';
[default@MyKeyspace] SET custom_comparator[1]['hell'] = 'whirl';
[default@MyKeyspace] SET custom_comparator[1]['he'] = 'she';    
[default@MyKeyspace] SET custom_comparator[1]['mimosa pudica'] = 'some plant';
[default@MyKeyspace] SET custom_comparator[1]['a'] = 'smallest col name';

# Get Data, columns are ordered by column name length
[default@MyKeyspace] get custom_comparator[1];
=> (column=a, value=smallest col name, timestamp=1375673015868000)
=> (column=he, value=she, timestamp=1375672967041000)
=> (column=hell, value=whirl, timestamp=1375672959028000)
=> (column=hello, value=world, timestamp=1375672928673000)
=> (column=mimosa pudica, value=some plant, timestamp=1375672990036000)

Indexes

Indexing a database is a means to improve the retrieval speed of data. Before Cassandra 0.7, there was only one type of index—the default one—which is the index on row keys.

If you are coming from the RDBMS world, you may be a bit disappointed with what Cassandra has to offer in terms of indexing. Cassandra's indexing is a little inferior; it is better to think of its indexes as hash keys. In this topic, we'll first discuss the primary index, or row key index. Then we'll move to use cases demonstrating a couple of handy techniques that give a secondary-index-like effect by hand. Finally, we'll discuss the built-in secondary index, its pros and cons, and how it helps keep boilerplate code down.

The primary index

A primary key or row key is the unique identifier of a row, in much the same way as the primary key of a table in a relational database system. It provides quick, random access to rows. Since rows are sharded among the servers of the ring, each server holds just a subset of rows, and hence the primary keys are distributed too. Cassandra uses the partitioner (a cluster-level setting) and the replica placement strategy (a keyspace-level setting) to locate the nodes in the ring that hold a particular row. On each node, an index file and an index sample are maintained locally and can be looked up via a binary search followed by a short sequential read (see Chapter 1, Quick Start).

The problem with primary keys is that their location is governed by the partitioner. Partitioners use a hash function to convert a row key into a unique number (called a token) and then write/read that key to/from the node that owns this token. This means that if you use a partitioner whose hash does not follow the keys' natural ordering, chances are that you can't read the keys sequentially just by accessing the next token on the node. The following snippet shows an example of this. The row keys 1234 and 1235 should naturally fall next to each other if they are not altered (or if an order-preserving partitioner is used). However, if we take a consistent MD5 hash of these values, we can see that the two values are far away from each other. There is a good chance that they might not even live on the same machine.

ROW KEY | MD5 HASH VALUE
--------+----------------------------------
1234    | 81dc9bdb52d04dc20036dbd8313ed055
1235    | 9996535e07258a7bbfd8b132435c5962

Let's take an example of two partitioners: ByteOrderPartitioner, which preserves lexical ordering by bytes, and RandomPartitioner, which uses an MD5 hash of the row key as the token. Let's assume that we have a users_visits column family with a row key of the form <city>_<userId>. ByteOrderPartitioner will let you iterate through rows to get more users from the same city, in much the same way as a SortedMap interface (refer to http://docs.oracle.com/javase/6/docs/api/java/util/SortedMap.html). However, with RandomPartitioner, the token being the MD5 hash of <city>_<userId>, two consecutive userIds from the same city may lie on two different nodes. So we cannot just iterate and expect grouping to work, much like accessing the entries of a HashMap.
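
You can see the effect with a few lines of Python: since RandomPartitioner tokens come from an MD5 hash of the key, consecutive keys hash to values that are nowhere near each other, so there is a good chance they are owned by different nodes.

import hashlib

# Two consecutive users from the same city end up with unrelated tokens
for key in ('newyork_1234', 'newyork_1235'):
    print key, '->', hashlib.md5(key).hexdigest()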

It may be useful to keep in mind that queries such as the range slice query pull rows from the start row key to the end row key provided by the user. If you are not using an order-preserving partitioner, the rows returned will not be ordered by key; rather, they will be ordered by the tokens generated by the partitioner that is set.

We will see partitioners in more detail in Chapter 4, Deploying a Cluster, in the section on partitioners. But using the obviously better-looking partitioner, ByteOrderPartitioner, is considered a bad practice. There are a couple of reasons for this, the major one being an uneven row key distribution across nodes, which can potentially cause a hotspot in the ring.

The wide-row index

Any real application needs information grouped by some criterion and searchable by it. Secondary indexes came into existence in Version 0.7, so the natural question is: how were these requirements fulfilled before that? The answer is a wide-row index, aka manual index, aka alternate index. Basically, we create a column family that holds the row keys of another column family as its columns, and the row key of this column family is the value we want to group by. We can use this in the city-user example that we discussed in the previous section. Instead of relying on the primary key, we create a column family that has the city as the row key and the user ID as the column name (and probably the username as the column value, caching it so that we do not pull data from the parent column family unless we need more details than the username)—all done. Whenever you need to get users by city name, you can select the row for that city.

Before we go ahead and discuss some of the patterns using a wide-row index, the one thing that should be kept in mind is that a lot of these cases can be handled by secondary indexes, and it would be worth counting the pros and cons of using a wide-row pattern versus a secondary index.

Simple groups

The idea is simple: you create a column family with its row keys as the name of the group (the group by predicate in SQL) and the column names as the row keys of the column family that you wanted to group.

Let's take an example of a social networking application that lets users create groups and other people can join it. You have a users column family and groups column family. You make another column family that holds a user per group and call it group_users. This will have the group's row key as the row key, and the user's row key as the column name. Does it ring a bell? It's the same thing as a join table or link table in a many-to-many relationship in the RDBMS world.
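
A minimal Pycassa sketch of this pattern (assuming a group_users column family with a UTF8Type comparator and the pool from the earlier sketches):

from pycassa.columnfamily import ColumnFamily

groupUsersCF = ColumnFamily(pool, 'group_users')

# Joining a group: store the user's row key as a column in the group's row
groupUsersCF.insert('group:cassandra-fans', {'user:leo': ''})
groupUsersCF.insert('group:cassandra-fans', {'user:kara': ''})

# Membership lookup: one row read returns every member of the group
members = groupUsersCF.get('group:cassandra-fans').keys()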

Another use case is grouping by a field. Let's say your application provides hotel search where you can select a city and see its hotels. Or perhaps your video-streaming website offers a tagging mechanism as part of the video metadata, and you want users to click on a tag and see the videos with the same tag—it is the same logic. We create a column family named tag_videos where each row key is a tag and the columns are row keys of the videos column family. On clicking a tag, we just load the whole row from tag_videos for that tag and show it to the user (with pagination, perhaps):


Figure 3.4: An example of grouping by tag name

One improvement can be made to this mechanism by putting meaningful values in the columns. For example, it may be worth storing the video title as the column value. This saves an extra query to the videos column family: just pull the columns and show the names, and pull more data when and if required.

Sorting for free, free as in speech

Unlike row keys, columns are sorted by the comparator that you provide. If you provide a UTF8Type comparator, the columns will be sorted in a string order. In the video tag example, you may want videos to be sorted by the username of the user who uploaded the video. You may just set up the column key as <userName>:<videoId> (note that the separator can be confusing if either of the values that make a column name have a separator as a part of it) and now, you get videos for a given tag sorted by the username. You can range slice the column or further filter the videos for a given tag by a particular user. Here is a sample of the code:

#Get hold of CFs
vidCF = ColumnFamily(con, 'videos') 
tagCF = ColumnFamily(con, 'tag_videos') 

#insert in videos as well as tag index CF
vidCF.insert( 
  rowKey, #<title>:<uploader> 
  { 
  'title':title, 
  'user_name':uploader, 
  'runtime_in_sec':runtime, 
  'tags_csv': tags  #this is CSV string of tags
  }) 
  
for tag in tags.split(','): 
  tagCF.insert( 
    tag.strip().lower(), #index CF's row-key = tag
    {
     #key=<uploader>_<rowKeyOfVideosCF>, value=<title>
     uploader+ "_" + rowKey: title 
    } 
  ); 

#retrieve video details grouped by tag
tag = 'action'
movies = tagCF.get(tag.strip().lower()) 
for key, val in movies.iteritems(): 
  vidId = key.split('_')[1] 
  movieDetail = vidCF.get(vidId) 
  print ''' 
  {{ 
    user: {0}, 
    movie: {1}, 
    tags: {2} 
  }}'''.format(movieDetail['user_name'], movieDetail['title'], movieDetail['tags_csv']) 
  
#Result for tag='action' sorted by user name
    { 
      user: Kara, 
      movie: Olympus Has Fallen, 
      tags: action, thriller 
    }
    { 
      user: Kara, 
      movie: The Croods, 
      tags: animation, action, mystery 
    } 
    { 
      user: Leo Scott, 
      movie: Oblivion, 
      tags: action, mystery, sci-fi 
    } 
    { 
      user: Sally, 
      movie: G.I. Joe: Retaliation, 
      tags: action, adventure 
    }

Push all in one: In Cassandra, there is often more than one way to skin a cat. Our previous approach uses multiple rows, one for each tag name. But you can very well push all the tags in one single row. You may just have one single row, perhaps named tag_index, with column names as <tag>_<videoId> and comparator as UTF8Type. Since the columns are sorted, you can slice the row with column names starting with the desired tag.


Figure 3.5: Indexing all videos by tag in one row

There are a couple of drawbacks with this approach: one, a row is stored entirely on one machine (and on its replicas). If it's a frequently accessed index, these machines may get excessively loaded. Two, the number of columns per row is limited to approximately two billion, and if you happen to have more videos than that, this may be an issue.
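
Slicing that single row by tag is straightforward because the columns are already sorted by the UTF8Type comparator. A sketch, assuming the index lives in a hypothetical tag_index_cf column family under the row key tag_index, with column names of the form <tag>_<videoId>:

from pycassa.columnfamily import ColumnFamily

tagIndexCF = ColumnFamily(pool, 'tag_index_cf')

tag = 'action'
# u'\uffff' is a high code point, so the slice covers the columns whose
# names start with 'action_' (for typical video IDs)
action_videos = tagIndexCF.get('tag_index',
                               column_start = tag + '_',
                               column_finish = tag + '_' + u'\uffff')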

An inverse index with a super column family

In the previous section, we saw a common pattern where the column names are a concatenated string of two properties of another object—username and video ID. In an even simpler version, we just dumped all the movies into one row, with the column name being a tag, a separator, and the video ID. One wrinkle in that implementation is that if our chosen separator is part of either of the strings we concatenate, splitting them back into attribute values becomes a nightmare. And what if there are more than two attributes making up a column name? Conjure a super column, a column of columns.

The idea is simple: a super column is a level two nesting. Let's work out some tagging problems using a super column. It may not seem super advantageous over what we had with the string concatenation mechanism, but we can see cleaner decoupling here.


Figure 3.6: An inverse index with a super column

The code is as follows:

# Tag insertion in super-column

for tag in tags.split(','): 
  tagCF.insert( 
   tag.strip().lower(), #row-key = tag name 
   { 
     uploader: {        #level 1 nesting = uploader name
       vidId: title     #level 2 nesting: key=videoId, value=title 
         } 
      } 
  )

# Fetch videos by tag, grouped by users
tagCF = ColumnFamily(connection, 'tag_videos_sup') 
movies = tagCF.get(tag.strip().lower())

for key, val in movies.iteritems():           #level 1 iteration 
  username = key 
  # Some string formatting
  print ''' 
  {{ 
    {0}:'''.format(username)
  for k, v in val.iteritems():                #level 2 iteration 
    print '''	{{{0}=> {1}}}'''.format(k, v) 
  print ''' 
  }'''

# Result
tag: action 
    { 
      Kara: 
       {Olympus Has Fallen:Kara=> Olympus Has Fallen} 
       {The Croods:Kara=> The Croods} 
    }

    {
      Leo Scott: 
       {Oblivion:Leo Scott=> Oblivion} 
    }

    { 
      Sally: 
       {G.I. Joe: Retaliation:Sally=> G.I. Joe: Retaliation} 
       {Scary MoVie:Sally=> Scary MoVie} 
    }

This is the exact same code as we had in the previous section with two exceptions:

  • It uses a super column to store and retrieve the tag index
  • We did not have to use string operations to fetch the video ID

Let's see what we gain by using super columns. One thing that's clear is that we avoided the dangers of manually splitting strings. We have a cleaner structure now. This is a typical use case for super columns.

All good again! Let's think of the situation where we dumped all the tags in a row. Can we convert that into a super-column-based index to something like the following snippet?

tag_index:
[
  tag1:[
    {user1:[{rowKey1: videoId1},{rowKey2: videoId2},...]},
    {user2:[{rowKey3: videoId3},{rowKey4: videoId4},...]},
    ...
  ],
  tag2:[
    {user4:[{rowKey9: videoId9},{rowKey7: videoId7},...]},
    {user2:[{rowKey3: videoId3},{rowKey8: videoId8},...]},
    ...
  ],
  ...
]

Basically, we are asking for one more level of nesting, that is, subcolumns, in order to be able to have a column family or subcolumns of a subcolumn. Unfortunately, Cassandra just provides two levels of nesting. This takes us back to having some sort of ad hoc mechanism at a subcolumn level, something like having subcolumn names as <username>:<videoId>.

An inverse index with composite keys

The super-column-based approach did give some relief, but it is not the best solution. It has its own limitations, such as only one level of nesting, and reading a single subcolumn still requires deserializing the whole super column. Super columns, although not deprecated, are not really preferred by the Cassandra community. The general consensus is to use composite columns and avoid super columns. They are a better, cleaner, and more elegant solution, and they scale much better.

Composite column names: Composite column names are made of multiple components. They can be viewed as ordered tuples of multiple types. For example, this is a composite column name: (username, city, age, SSN, loginTimestamp). The components are of types UTF8Type, UTF8Type, Int32Type, UTF8Type, and DateType, in that order.

Note

There is a very interesting long-hauled discussion on the Cassandra bug tracker. You can observe how CompositeType evolved at https://issues.apache.org/jira/browse/CASSANDRA-2231.

Composite types behave much in the same way as normal comparators do: columns are sorted by them, names are validated with definitions, and slice queries can be executed. Plus, you don't need to manage the ad hoc arrangement for keys with multiple components such as in string concatenation. Composite keys are deserialized as a tuple and easily used on the client side.

The sorting of composite comparators works in the same way as the SQL query order by field1, field2, ... does. Columns get sorted by the first component of the key and then within each first component, it is ordered by the second component, and so on. Validation is done for each component individually. And slice can be applied to each component in the order in which they appear in the column family definition.

Out of the box, Cassandra provides two composite types: static CompositeType, which can have a fixed number of components that are ordered, and DynamicCompositeType, which can have any number of components of all the types defined at the column family creation in any order. So let's use the static CompositeType column names to get rid of super columns and string concatenation as shown in the following code snippet:

# A sketch using static composite column names (the column family and
# variable names are illustrative). The comparator is a CompositeType of
# two UTF8 components: (uploader, video row key).
from pycassa.system_manager import SystemManager, UTF8_TYPE
from pycassa.types import CompositeType, UTF8Type

sys = SystemManager('localhost:9160')
sys.create_column_family(keyspace, 'tag_videos_comp',
    comparator_type = CompositeType(UTF8Type(), UTF8Type()),
    key_validation_class = UTF8_TYPE,
    default_validation_class = UTF8_TYPE)

# Tag insertion with composite column names
tagCF = ColumnFamily(con, 'tag_videos_comp')
for tag in tags.split(','): 
  tagCF.insert( 
      tag.strip().lower(),          #row-key = tag name 
      { 
        (uploader, rowKey): title   #column name = (uploader, videoId) tuple
      } 
  )

# Fetch videos by tag, grouped by users
movies = tagCF.get(tag.strip().lower()) 
for (username, vidId), title in movies.iteritems(): 
  print '{{{0}: {1} => {2}}}'.format(username, vidId, title)

# The columns come back sorted by uploader first and then by the video
# row key, as an ordered map keyed by (uploader, videoId) tuples, with
# no string splitting and no super columns needed.

Let's stop for a moment and think about the applicability of composite columns. You can have tags on a video page. On clicking a tag, you can show a whole list of videos just by pulling a row from the tag_videos index column family. If you have a composed key, such as <title>:<length>:<username>:<rowkey>, you can get all the details shown in the list, plus you may have the thumbnail's URI as the column value to be displayed next to each list item. Time series data can be stored and sorted by timestamps that can later be pulled to view the history, and the timestamp can be pulled off the composite column name. These are just a couple of examples; composite type is a really powerful tool to come up with innovative solutions.

Note

Check for composite column support in the Cassandra driver API you are using. Pycassa, for example, does not support DynamicCompositeType as of Version 1.8.0. But it is likely to be available anytime now (refer to https://github.com/pycassa/pycassa/pull/183).

The secondary index

Working through the previous examples, a relational database person would think that all this manual management of a custom index column family is a little too much work. In an RDBMS, one can just write ...where user_name = 'Kara'. If the data is large, one can put an index on user_name and improve reads and searches at the cost of taxing the writes. If one wanted only the videos longer than a minute, it would be ...where user_name = 'Kara' and runtime_in_sec > 60. Cassandra Version 0.6 and earlier had no way of doing something like this without holding denormalized data in a separate column family, much like a materialized view, similar to what we have been discussing. Version 0.7 and later provide secondary indexes, which are well supported by CQL and the client APIs.

The indexes created on columns are called secondary indexes. Secondary indexes—much the same way as the primary index—behave like hashes, not like the B-trees commonly used in relational databases and filesystems. This means you can run equality queries on secondary indexed columns, but not range or inequality queries on them alone (actually, you can in combination with an equality, as we'll see later on).

Let's take our old video example and search for all the videos uploaded by a particular user without having to use an index column family. Later, we'll try to get all the videos by a user that are shorter than 7,500 seconds. Check the following code:

# creating index on columns
from pycassa.system_manager import SystemManager, UTF8_TYPE, INT_TYPE
from pycassa.index import create_index_expression, create_index_clause, LTE

sys = SystemManager('localhost:9160') #assume default installation
sys.create_index(keyspace, 'videos', 'user_name', UTF8_TYPE) 
sys.create_index(keyspace, 'videos', 'runtime_in_sec', INT_TYPE)

# criteria: where user_name = <username>
username_criteria = create_index_expression('user_name', username) 

# criteria: where runtime_in_sec <= <max_length>
# LTE? Less than or equal to
length_criteria = create_index_expression('runtime_in_sec', max_length, LTE) 

# query: where user_name = 'Sally'
# just pull 3 
user_only_clause = create_index_clause([username_criteria], count=3)
movies_by_user = videoCF.get_indexed_slices(user_only_clause)
print_movie(movies_by_user)

# query: where user_name = 'Sally' and runtime_in_sec <= 7500
# pull all 
user_runtime_clause = create_index_clause([username_criteria, length_criteria])
movies_by_user_runtime = videoCF.get_indexed_slices(user_runtime_clause)

print '''-- movies for username: {} 
  and length <= {} --'''.format(username, max_length)

print_movie(movies_by_user_runtime)

# results
-- movies for username: Sally  -- 
{
  user: Sally, 
  title: Scary MoVie, 
  runtime: 7610 
}
{
 user: Sally, 
  title: Oz the Great and Powerful, 
  runtime: 7800 
}
{
 user: Sally, 
  title: The Place Beyond the Pines, 
  runtime: 6610 
}
-- movies for username: Sally and length <= 7500 -- 
{ 
  user: Sally, 
  title: The Place Beyond the Pines, 
  runtime: 6610 
}
{
 user: Sally, 
  title: G.I. Joe: Retaliation, 
  runtime: 7410 
}

So now that we are excited by the secondary index, let's try to get all the movies that are smaller than 7,500 seconds across all the users using the following code:

# Fetch using inequality alone
# Intended to fail: at least one equality expression is necessary! 
runtime_clause = create_index_clause([length_criteria])

movies_by_runtime = videoCF.get_indexed_slices(runtime_clause)

print_movie(movies_by_runtime)

# Result
InvalidRequestException(why='No indexed columns present in index clause with operator EQ')

Oops! But we were kind of expecting that, as a secondary index is much like a hash map. What we were not expecting is that the inequality worked in the earlier query when it ran alongside an equality (for example, where user_name = 'Sally' and runtime_in_sec <= 7500); ideally, it should fail there as well. The secret behind this magic is in-memory comparison: the result returned from the equality operation gets filtered in memory on the coordinator node.

So, now that we have seen that the secondary index is good but not so great, let's demystify the secondary index further. The first thing to keep in mind is that though a secondary index seems similar to a traditional database index, it is inferior to it. In fact, the mechanism itself is different. Under the hood, indexes are stored in their own column family and cannot be accessed by users. They are synchronized with the node local data, which means the index column family will always be locally consistent.

This hidden column family has the index value as its row key, and its columns are the corresponding row keys of the column family whose data we are indexing. Does this ring a bell? We did the same thing in an earlier section, Simple groups, with tags as the row key and the video column family's row keys as column names in our manual index. But there is a fine distinction between the two. Each node indexes only the data it holds locally; since a row's data lives on RF nodes (where RF is the replication factor), its index entries live on those same nodes. As a consequence, a query over a secondary index needs to touch at least NUMBER_OF_NODES / RF nodes to cover the whole index, while in the manually managed case we touch just one node—the one that has the row corresponding to the index value. On the other hand, row and index updates are performed as a single atomic operation, so a secondary index is managed for you by Cassandra, while a manual index has to be maintained by application logic. Also, a manual index caps the number of row keys per index value at the column-per-row limit, which is two billion; the automated index, storing only node-local entries, is not bound by that limit.

Tip

A word of caution

It is recommended to use secondary indexes with columns having low cardinality. Indexes on the columns such as day_of_week, tag, city, and item_type are expected to have a limited number of values, so they are good for secondary indexes. On the other hand, values that are almost unique to each row are a bad idea. So, timestamp, video_title, geo_coordinate, and item_name are bad choices.

Cardinality: A fancy word for the number of items in a set. A week has cardinality equal to 7 as it has seven distinct elements—one for each day.
