Chapter 8. Clients

We’re used to connecting to relational databases using drivers. For example, in Java, JDBC is an API that abstracts the vendor implementation of the relational database to present a consistent way of storing and retrieving data using Statements, PreparedStatements, ResultSets, and so forth. To interact with the database you get a driver that works with the particular database you’re using, such as Oracle, SQL Server, or MySQL; the implementation details of this interaction are hidden from the developer. Given the right driver, you can use a wide variety of programming languages to connect to a wide variety of databases.

Cassandra is somewhat different in that there are no drivers for it. If you’ve decided to use Python to interact with Cassandra, you don’t go out and find a Cassandra driver for Python; there is no such thing. Instead of just abstracting the database interactions from the developer’s point of view, the way JDBC does, an entirely different mechanism is used. This is a client generation layer, provided by the Thrift API and the Avro project. But there are also high-level Cassandra clients for Java, Scala, Ruby, C#, Python, Perl, PHP, C++, and other languages, written as conveniences by third-party developers.

There are benefits to these clients, in that you can easily embed them in your own applications (which we’ll see how to do) and that they frequently offer more features than the basic Thrift interface does, including connection pooling and JMX integration and monitoring.

In the following sections, we see how Thrift and Avro work and how they’re used with Cassandra. Then, we move on to examine more robust client projects that independent developers have written in various languages to offer different options for working with the database.

Note

If you’re going to write a Cassandra application, use one of these clients instead of writing all of that plumbing code yourself. The only difficulty is in choosing a client that will continue to stay in lockstep with updates to Cassandra itself.

Basic Client API

In Cassandra version 0.6 and earlier, Thrift served as the foundation for the entire client API. With version 0.7, Avro started being supported due to certain limitations of the Thrift interface and the fact that Thrift development is no longer particularly active. For example, there are a number of bugs in Thrift that have remained open for over a year, and the Cassandra committers wanted to provide a client layer that is more active and receives more attention. Thrift is currently in version 0.2, with no release since 2009, and there is precious little documentation.

It is not clear at the time of this writing how long Thrift will be supported alongside Avro; both are present in the current source tree. Because it is uncertain which will ultimately win out, I’ve included a little about each here.

Thrift

Thrift is the driver-level interface; it provides the API for client implementations in a wide variety of languages. Thrift was developed at Facebook and donated as an Apache project with Incubator status in 2008. It’s available at http://incubator.apache.org/thrift, though you don’t need to download it separately to use Cassandra.

Thrift is a code generation library for clients in C++, C#, Erlang, Haskell, Java, Objective C/Cocoa, OCaml, Perl, PHP, Python, Ruby, Smalltalk, and Squeak. Its goal is to provide an easy way to support efficient RPC calls in a wide variety of popular languages, without requiring the overhead of something like SOAP.

To use it, you create a language-neutral service definition file that describes your data types and service interface. This file is then used as input into the engine that generates RPC client code libraries for each of the supported languages. The effect of the static generation design choice is that it is very easy for the developer to use, and the code can perform efficiently because validation happens at compile time instead of runtime.

Note

You can read the full paper that describes the Thrift implementation, written by its creators, at http://incubator.apache.org/thrift/static/thrift-20070401.pdf.

The design of Thrift offers the following features:

Language-independent types

Because types are defined in a language-neutral manner using the definition file, they can be shared between different languages. For example, a C++ struct can be exchanged with a Python dictionary.

Common transport interface

The same application code can be used whether you are using disk files, in-memory data, or a streaming socket.

Protocol independence

Thrift encodes and decodes the data types for use across protocols.

Versioning support

The data types are capable of being versioned to support updates to the client API.

The data definitions are created using a file ending with a .thrift extension. Under your Cassandra source folder, there’s a folder called interface. In it is a file called cassandra.thrift. This file holds the data definitions for Cassandra. I won’t include the whole file contents here, but it looks like this:

//data structures
struct Column {
   1: required binary name,
   2: required binary value,
   3: required i64 timestamp,
}
struct SuperColumn {
   1: required binary name,
   2: required list<Column> columns,
}

//exceptions
exception NotFoundException {
}
//etc...

//service API structures
enum ConsistencyLevel {
    ZERO = 0,
    ONE = 1,
    QUORUM = 2,
    DCQUORUM = 3,
    DCQUORUMSYNC = 4,
    ALL = 5,
    ANY = 6,
}
struct SliceRange {
    1: required binary start,
    2: required binary finish,
    3: required bool reversed=0,
    4: required i32 count=100,
}
struct SlicePredicate {
    1: optional list<binary> column_names,
    2: optional SliceRange   slice_range,
}
struct KeyRange {
    1: optional string start_key,
    2: optional string end_key,
    3: optional string start_token,
    4: optional string end_token,
    5: required i32 count=100
}

//service operations
service Cassandra {
  # auth methods
  void login(1:required string keyspace, 2:required AuthenticationRequest auth_request)
             throws (1:AuthenticationException authnx, 2:AuthorizationException authzx),

  i32 get_count(1:required string keyspace, 
                2:required string key, 
                3:required ColumnParent column_parent, 
                4:required ConsistencyLevel consistency_level=ONE)
      throws (1:InvalidRequestException ire, 2:UnavailableException ue, 3:TimedOutException te),
//etc...

//meta-APIs
  /** list the defined keyspaces in this cluster */
  set<string> describe_keyspaces(),

//etc...

Here I have shown a representative sample of what’s in the Thrift definition source file that makes up the Cassandra API. By looking at this file you can understand how Thrift definitions are made, what types of operations are available from the Cassandra client interface, and what kind of data structures they use.
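
To make the connection between this definition file and working code concrete, here is a minimal sketch of what a read and a write look like through the raw, Thrift-generated Java client (the org.apache.cassandra.thrift package in the 0.6-era tree). It assumes a node listening on localhost:9160 and the default Keyspace1/Standard1 schema; the higher-level clients discussed later in this chapter wrap exactly these kinds of calls.

import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ColumnOrSuperColumn;
import org.apache.cassandra.thrift.ColumnPath;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

public class RawThriftExample {
    public static void main(String[] args) throws Exception {
        // Open a raw socket transport to a single node and wrap it in the binary protocol.
        TTransport transport = new TSocket("localhost", 9160);
        transport.open();
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));

        // A ColumnPath names the column family and column we want to touch.
        ColumnPath path = new ColumnPath("Standard1");
        path.setColumn("name".getBytes("UTF-8"));

        // Write a single column value; the client supplies the timestamp.
        client.insert("Keyspace1", "key1", path, "Alison".getBytes("UTF-8"),
                System.currentTimeMillis(), ConsistencyLevel.ONE);

        // Read it back.
        ColumnOrSuperColumn cosc = client.get("Keyspace1", "key1", path, ConsistencyLevel.ONE);
        System.out.println(new String(cosc.getColumn().getValue(), "UTF-8"));

        transport.close();
    }
}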

In the Cassandra distribution, the same interface folder that has this .thrift file also has a folder called thrift. This folder contains subfolders, one for each language that has bindings generated from Thrift based on the definitions.

When Cassandra is built, here’s what happens. The Ant build executes the following targets to create bindings for Java, Python, and Perl:

    <target name="gen-thrift-java">
      <echo>Generating Thrift Java code from ${basedir}/interface/cassandra.thrift 
....</echo>
      <exec executable="thrift" dir="${basedir}
/interface">
        <arg line="--gen java" />
        <arg line="-o ${interface.thrift.dir}" />
        <arg line="cassandra.thrift" />
      </exec>
    </target>
    <target name="gen-thrift-py">
      <echo>Generating Thrift Python code from ${basedir}
/interface/cassandra.thrift ....</echo>
      <exec executable="thrift" dir="${basedir}/interface">
        <arg line="--gen py" />
        <arg line="-o ${interface.thrift.dir}" />
        <arg line="cassandra.thrift" />
      </exec>
    </target>
//etc...

These Ant targets call the Thrift program directly, passing arguments to it for each of the different languages. Note that the distribution ships with generated Java API; these targets are not called during a regular build. So if you want to get a Perl or Python interface, you need to execute these targets directly (or modify the build file to include these targets).

Note

To generate Thrift bindings for other languages, pass the language name to the --gen switch (for example, thrift --gen php).

The Ant targets use the libthrift-r917130.jar located in Cassandra’s lib directory. Note that the Thrift JAR version number changes as Cassandra is updated.

Thrift Support for Java

To build Thrift for Java, navigate to the directory <thrift-home>/lib/java. Execute the build.xml script by typing >ant in a terminal.

Exceptions

There are several exceptions that can be thrown from the client interface that you might see on occasion. The following is a list of basic exceptions and explanations of why you might see them, though a couple of them are not in the Thrift definition (a short sketch of catching the most common ones in client code follows the list):

AuthenticationException

The user has invalid credentials or does not exist.

AuthorizationException

The user exists but has not been granted access to this keyspace.

ConfigurationException

This is thrown when the class that loads the database descriptor can’t find the configuration file, or if the configuration is invalid. This can happen if you forgot to specify a partitioner or endpoint snitch for your keyspace, used a negative integer for a value that only accepts a positive integer, and so forth. This exception is not thrown from the Thrift interface.

InvalidRequestException

The user request is improperly formed. This might mean that you’ve asked for data from a keyspace or column family that doesn’t exist, or that you haven’t included all required parameters for the given request.

NotFoundException

The user requested a column that does not exist.

TException

You might get this when invoking a Thrift method that is no longer valid for the server. This can occur if you mix and match different Thrift versions with server versions. This exception is not thrown from the Thrift interface, but is part of Thrift itself. TExceptions are uncaught, unexpected exceptions that bubble up from the server and terminate the current Thrift call. They are not used as application exceptions, which you must define yourself.

TimedOutException

The response is taking longer than the configured limit, which by default is 10 seconds. This typically happens because the server is overloaded with requests, the node has failed but this failure has not yet been detected, or a very large amount of data has been requested.

UnavailableException

Not all of the Cassandra replicas that are required to meet the quorum for a read or write are available. This exception is not thrown from the Thrift interface.
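
To make these concrete, here is a rough sketch of catching the exceptions you are most likely to see around a single read, again assuming the raw Thrift-generated Java client from the earlier sketch:

import java.io.UnsupportedEncodingException;

import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ColumnOrSuperColumn;
import org.apache.cassandra.thrift.ColumnPath;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.cassandra.thrift.InvalidRequestException;
import org.apache.cassandra.thrift.NotFoundException;
import org.apache.cassandra.thrift.TimedOutException;
import org.apache.cassandra.thrift.UnavailableException;
import org.apache.thrift.TException;

public class ExceptionHandlingExample {

    // Reads a single column, mapping each exception to the situations described above.
    static String readColumn(Cassandra.Client client, String key, ColumnPath path) {
        try {
            ColumnOrSuperColumn cosc =
                client.get("Keyspace1", key, path, ConsistencyLevel.QUORUM);
            return new String(cosc.getColumn().getValue(), "UTF-8");
        } catch (NotFoundException e) {
            return null;                      // no such column for this key; often not an error
        } catch (InvalidRequestException e) {
            throw new RuntimeException(e);    // bad keyspace, column family, or parameter
        } catch (UnavailableException e) {
            throw new RuntimeException(e);    // too few live replicas to satisfy QUORUM
        } catch (TimedOutException e) {
            throw new RuntimeException(e);    // the node took too long; consider retrying
        } catch (TException e) {
            throw new RuntimeException(e);    // low-level Thrift transport or protocol failure
        } catch (UnsupportedEncodingException e) {
            throw new RuntimeException(e);    // UTF-8 is always available in practice
        }
    }
}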

Thrift Summary

If you want to use Thrift directly for your project, there are numerous prerequisites for working on Windows (see http://wiki.apache.org/thrift/ThriftInstallationWin32) and many things that can go wrong. Partly because the Thrift project is still nascent, has limitations with transport implementations, and has not received a lot of direct attention since its open source inception, Cassandra is probably switching to Avro.

Avro

The Apache Avro project is a data serialization and RPC system targeted as the replacement for Thrift in Cassandra, starting with version 0.7. Avro was created by Doug Cutting, most famous perhaps for creating Apache Hadoop, the implementation of Google’s MapReduce algorithm.

Avro provides many features similar to those of Thrift and other data serialization and RPC mechanisms such as Google’s Protocol Buffers, including:

  • Robust data structures

  • An efficient, small binary format for RPC calls

  • Easy integration with dynamically typed languages such as Python, Ruby, Smalltalk, Perl, PHP, and Objective-C

Avro has certain advantages that Thrift doesn’t, in particular the fact that static code generation is not required to use RPC for your application, though you can use it for performance optimization for statically typed languages. The project is somewhat more mature (the current release version is 1.3.2) and more active.

When you execute the Cassandra Ant file, the build target calls, among other things, the avro-generate target, which generates the Avro interfaces. These files are in the same interface directory where Thrift-generated files are. To find the complete definition of the Avro interface for Cassandra, look in the cassandra.avpr file, which contains the JSON defining all the messages and operations that Cassandra clients can use.

{
  "namespace":  "org.apache.cassandra.avro",
  "protocol":   "Cassandra",

  "types": [
      {"name": "ColumnPath", "type": "record",
          "fields": [
            {"name": "column_family", "type": "string"},
            {"name": "super_column", "type": ["bytes", "null"]},
            {"name": "column", "type": ["bytes", "null"]}
        ]
      },
      {"name": "Column", "type": "record",
          "fields": [
            {"name": "name", "type": "bytes"},
            {"name": "value", "type": "bytes"},
            {"name": "timestamp", "type": "long"}
        ]
      },
      {"name": "SuperColumn", "type": "record",
          "fields": [
            {"name": "name", "type": "bytes"},
            {"name": "columns", "type": {"type": "array", "items": "Column"}}
        ]
      },
//etc..
      }
  ],

  "messages": {
    "get": {
        "request": [
            {"name": "keyspace", "type": "string"},
            {"name": "key", "type": "string"},
            {"name": "column_path", "type": "ColumnPath"},
            {"name": "consistency_level", "type": "ConsistencyLevel"}
        ],
        "response": "ColumnOrSuperColumn",
        "errors": ["InvalidRequestException", "NotFoundException",
            "UnavailableException", "TimedOutException"]
    },
    "insert": {
        "request": [
            {"name": "keyspace", "type": "string"},
            {"name": "key", "type": "string"},
            {"name": "column_path", "type": "ColumnPath"},
            {"name": "value", "type": "bytes"},
            {"name": "timestamp", "type": "long"},
            {"name": "consistency_level", "type": "ConsistencyLevel"}
        ],
        "response": "null",
        "errors": ["InvalidRequestException", "UnavailableException",
            "TimedOutException"]
    },
//etc...

The JSON format is concise and easy to read. You can see, for example, that there are at least two kinds of messages, one representing a get request and the other representing an insert request. The exceptions possible for each type are packaged with their messages.

Note

The meaning of each exception is discussed earlier, in the section on Thrift.

Avro Ant Targets

The Cassandra build.xml file defines two Ant targets that are executed when Cassandra is compiled from source, which is made possible by avro-1.2.0-dev.jar in the Cassandra lib directory. These targets are shown in the following listing:

    <!--
       Generate avro code
    -->
    <target name="check-avro-generate">
        <uptodate property="avroUpToDate"
                  srcfile="${interface.dir}/cassandra.avpr"
                  targetfile="${interface.avro.dir}/org/apache/cassandra/avro/
Cassandra.java" />
      <taskdef name="protocol"
               classname="org.apache.avro.specific.ProtocolTask">
        <classpath refid="cassandra.classpath" />
      </taskdef>
      <taskdef name="schema" classname="org.apache.avro.specific.SchemaTask">
        <classpath refid="cassandra.classpath" />
      </taskdef>
      <taskdef name="paranamer" 
          classname="com.thoughtworks.paranamer.ant.ParanamerGeneratorTask">
        <classpath refid="cassandra.classpath" />
      </taskdef>
    </target>

    <target name="avro-generate" unless="avroUpToDate"
            depends="init,check-avro-generate">
      <echo>Generating avro code...</echo>
      <protocol destdir="${interface.avro.dir}">
        <fileset dir="${interface.dir}">
          <include name="**/*.avpr" />
        </fileset>
      </protocol>
  
      <schema destdir="${interface.avro.dir}">
        <fileset dir="${interface.dir}">
          <include name="**/*.avsc" />
        </fileset>
      </schema>
    </target>

The Ant tasks for Thrift generation ran the Thrift executable directly (as you would on the command line), but the Avro targets are more complex. The check-avro-generate target sets a custom property called avroUpToDate, and the avro-generate target is skipped if check-avro-generate determines that all of the generated files are already up to date. If the generated client API files are not up to date with the current schema, the cassandra.avpr file is reread and the sources are regenerated under <CASSANDRA_HOME>/interface/avro. The org.apache.cassandra.avro.Cassandra.java file represents the runtime Java Avro interface.

The Ant <taskdef> tag defines custom extension tasks that subsequent targets can execute. Here we see three: the ProtocolTask, the SchemaTask, and the ParanamerGeneratorTask. The ProtocolTask and SchemaTask are part of Avro itself and generate Java interfaces and classes from the protocol and schema definitions. The ParaNamer library (paranamer-generator-2.1.jar) is used to allow the parameter names of nonprivate methods and constructors—which are typically dropped by the compiler—to be accessed at runtime. It reads the source directory of Avro-generated Java classes and outputs to the build directory Java classes that maintain the names defined in the source.

Avro Specification

The Avro project defines a specification, so you could theoretically write your own implementation of the Avro spec. Avro supports six kinds of complex types: records, enums, arrays, maps, unions, and fixed.

Note

If you’re interested, you can read the complete Avro specification at http://avro.apache.org/docs/current/spec.html, though it is definitely not required to work with Cassandra.

Avro definitions are written as schemas, using JavaScript Object Notation (JSON). This differs from Thrift in that the schema is always present along with the data when it is read, so less type information needs to be encoded with each value, which keeps the serialization compact and efficient. Because Avro stores data together with its schema, any program can process that data later, independently of the RPC mechanism.
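
To give a feel for how the schema travels with the data, here is a small, self-contained sketch using Avro’s generic Java API. The Column-like record schema and the file name are purely illustrative and are not part of Cassandra’s interface:

import java.io.File;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class AvroSchemaExample {
    public static void main(String[] args) throws Exception {
        // A record schema defined in JSON, in the same style as cassandra.avpr.
        Schema schema = Schema.parse(
            "{\"type\": \"record\", \"name\": \"Column\", \"fields\": ["
          + "  {\"name\": \"name\", \"type\": \"string\"},"
          + "  {\"name\": \"value\", \"type\": \"string\"},"
          + "  {\"name\": \"timestamp\", \"type\": \"long\"}]}");

        GenericRecord column = new GenericData.Record(schema);
        column.put("name", "color");
        column.put("value", "blue");
        column.put("timestamp", System.currentTimeMillis());

        // The writer embeds the schema in the file header, so any reader can
        // decode the records later without out-of-band type information.
        File file = new File("column.avro");
        DataFileWriter<GenericRecord> writer =
            new DataFileWriter<GenericRecord>(new GenericDatumWriter<GenericRecord>(schema));
        writer.create(schema, file);
        writer.append(column);
        writer.close();

        DataFileReader<GenericRecord> reader =
            new DataFileReader<GenericRecord>(file, new GenericDatumReader<GenericRecord>());
        while (reader.hasNext()) {
            GenericRecord r = reader.next(null);
            System.out.println(r.get("name") + " = " + r.get("value"));
        }
        reader.close();
    }
}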

Avro Summary

As of Cassandra version 0.7, Avro is the RPC and data serialization mechanism for Cassandra. It generates code that remote clients can use to interact with the database. It’s well-supported in the community and has the strength of growing out of the larger and very well-known Hadoop project. It should serve Cassandra well for the foreseeable future.

You can read more about Avro at its Apache project page: http://avro.apache.org.

A Bit of Git

Cassandra doesn’t use Git directly, but understanding at least a bit about Git will help you use a variety of client projects that do use it. (If you’re already familiar with Git, feel free to skip this section.) Many open source projects have started moving to GitHub recently. Git is a relatively new, free source code revision system, written by Linus Torvalds to help him develop the Linux kernel. GitHub is a Git project hosting site, written in Ruby on Rails, that adds social features around Git repositories and offers free and commercial options.

The following client libraries, which we’ll look at individually, all use Git: the Web Console, Hector, Pelops, and other Cassandra-related satellite projects such as Twissandra (as discussed in the example of Twitter implementation using Cassandra).

The simplest way to get the code for a Git project is to find the project’s home page on GitHub and click the Download Source button to get either a .tar or .zip file of the project’s trunk.

Note

If you’re on a Linux distribution such as Ubuntu, it couldn’t be easier to get Git. At a console, just type >apt-get install git and it will be installed and ready for commands.

If you want to work with the source itself (i.e., to fork it), you’ll need a Git client. If you’re on Windows, you’ll first have to get the Cygwin POSIX emulator and then install Git. Next, go to the GitHub page hosting the project you’re interested in and find the project’s Git URL. Open a terminal in the directory you want to put the source code into and use the clone command. This will produce output like this:

.../gitrep>git clone http://github.com/suguru/cassandra-webconsole.git
Initialized empty Git repository in C:/git/cassandra-webconsole/.git
remote: Counting objects: 604, done.
remote: Compressing objects: 100% (463/463), done.
remote: Total 604 (delta 248), reused 103 (delta 9)
Receiving objects: 100% (604/604), 6.24 MiB | 228 KiB/s, done.
Resolving deltas: 100% (248/248), done.

Now we have a subdirectory named after the Git project so that we can build the project and start using it. This is enough to get you started so that you can run the clients. A full discussion of Git is outside the scope of this book, but you can read more about it at GitHub.com. There’s a good help section on how to get set up at http://help.github.com, and the site http://gitref.org offers a really good reference for beginners.

Connecting Client Nodes

Once your cluster is set up, it does not matter which node in the cluster your client connects to. That’s because Cassandra nodes are symmetrical, and any node will act as a proxy to a given request, forwarding the request to the node that handles the desired range.

There are a few options here to keep things organized and efficient.

Client List

The most straightforward way to connect to a cluster is to maintain a list of the addresses or hostnames of the servers in the cluster and cycle through them on the client side. You allow clients to choose among them according to some algorithm of your choosing, such as random or sequential. In this way, you’re setting up a sort of poor-man’s client balancer.

This has the advantage of being the simplest to set up, and requires no intervention from operations. For testing, this is fine, but it ultimately will become difficult to manage.
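
Here is a minimal sketch of the idea in Java, with a hypothetical host list and a simple round-robin choice; the host returned is where you would open your Thrift, Hector, or other client connection:

import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Round-robin selection over a static list of Cassandra nodes.
// The host names are hypothetical; in practice you would load them
// from your application's configuration.
public class NodeList {
    private final List<String> hosts =
        Arrays.asList("cass1.example.com", "cass2.example.com", "cass3.example.com");
    private final AtomicInteger next = new AtomicInteger(0);

    public String nextHost() {
        // Mask off the sign bit so the index stays valid if the counter wraps around.
        int i = (next.getAndIncrement() & Integer.MAX_VALUE) % hosts.size();
        return hosts.get(i);
    }
}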

Round-Robin DNS

Another option is to create a record in DNS that represents the set of servers in the cluster. Using round-robin DNS allows clients to connect simply and cleanly. This has the significant advantage of not requiring any maintenance or logic on the client side for connecting to different nodes, and is the recommended approach.

Load Balancer

The third option is to have operations deploy a load balancer in front of the Cassandra cluster, and then configure clients to connect to it. The load balancer will act as the configuration point.

Cassandra Web Console

There’s a web console available from Suguru Namura, who contributed the code as a GitHub project. The console makes it easy to interact with Cassandra to perform a variety of tasks and view information about your cluster. I’m starting with this console before getting into the real clients you will use to interact with the database because it gives you a very user-friendly view into the configuration of your Cassandra instance.

You can download the console, which runs as a WAR, at http://github.com/suguru/cassandra-webconsole. If you want to modify the source code, either fork it with Git or just grab one of the binaries from the Downloads page of the project. The console requires Java 6 and Tomcat 6. If you want to compile the project, you’ll need Maven 2.

Let’s take a brief look at its features:

Keyspaces

The console allows you to view keyspace properties and add, rename, and drop a keyspace. You can view the configuration information about each keyspace as well, including the column families.

Column families

You can add or drop column families and view their keys.

Ring

You can view system information such as uptime and heap usage.

Note

You will likely need to modify the port on which you start Tomcat if you also have Cassandra running on the same box, because Cassandra will use ports 8080 and 8084 for JMX.

I’ve started up the console on my local machine at http://localhost:9999/cassandra-webconsole. The first time you start the console, you’ll be presented with a screen that lets you enter the information required to connect to a particular Cassandra server.

Once you’ve connected to a Cassandra server, the web console reads its configuration information and brings you to a screen to start interacting with Cassandra.

Figure 8-1 shows the configuration screen for the console itself. Figure 8-2 shows a screenshot of the keyspace and column family configuration information that the web console lets you view. You can see here that I have four keyspaces. Keyspace1 has been selected to show the column family definitions; the others are system, Test, and Twitter. Using this screen, you can add a column family to the keyspace, rename a keyspace, drop a keyspace entirely, or create a new keyspace.

Figure 8-1. The Setup configuration screen for a Cassandra web console

Figure 8-2. Keyspace and column family information in the web console

Adding a column family or super column family is easy, as shown in Figure 8-3. However, the web console doesn’t let you add data to your column families, as you might expect.

Figure 8-3. Adding a super column family in the web console

You can determine how long your server has been up, how much memory it’s using, and the load by viewing the Ring screen, as shown in Figure 8-4.

Figure 8-4. The Ring screen shows system usage

Overall, the web console presents an intuitive, attractive interface that makes it easy to perform basic administration tasks for your Cassandra setup.

Hector (Java)

Hector is an open source project written in Java using the MIT license. It was created by Ran Tavory of Outbrain (previously of Google) and is hosted at GitHub. It was one of the early Cassandra clients and is used in production at Outbrain. It wraps Thrift and offers JMX, connection pooling, and failover.

Note

In Greek mythology, Hector was a prince of Troy and was known as an outstanding warrior. He was also Cassandra’s brother.

Because Hector was one of the first Cassandra client projects, and because it is used by a wide variety of developers and even has other client projects based on it (see HectorSharp (C#)), we’ll write a complete but simple example application using it.

To get Hector, clone it from its GitHub site at http://github.com/rantav/hector. Use the git command if you want the source, or just download the binary from the Downloads tab.

Features

Hector is a well-supported and full-featured Cassandra client, with many users and an active community. It offers the following:

High-level object-oriented API

Java developers should find the interfaces that Hector offers, such as Keyspace and Column, very natural to use.

Failover support

Thrift itself does not provide failover support. This is because Cassandra is intended to be used in a highly distributed fashion and has good support for failed nodes in the database ring. But if your client connects to a node that has gone down, it would be nice to have your client fail over—to automatically search for another node to use to complete your request. Happily, Hector provides this.

Connection pooling

Cassandra is specifically built for very high scalability, and it therefore also becomes a requirement on the client side to support connection pools so that your application doesn’t become a bottleneck that robs you of Cassandra’s speed. It’s expensive to open and close connections, just as it is in JDBC. Hector’s connection pooling uses Apache’s GenericObjectPool; a small sketch of the underlying idea appears after this list.

JMX support

Cassandra makes liberal use of JMX, which comes in very handy for monitoring. Hector directly supports JMX by exposing metrics such as bad connections, available connections, idle connections, and more.
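
To illustrate the connection pooling point from a moment ago, here is a toy sketch that pools raw Thrift connections using Commons Pool’s GenericObjectPool, the same class Hector builds on. Hector’s real pool adds failover, timeouts, and the JMX metrics just mentioned on top of this; the host and port are illustrative:

import org.apache.cassandra.thrift.Cassandra;
import org.apache.commons.pool.BasePoolableObjectFactory;
import org.apache.commons.pool.impl.GenericObjectPool;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

// A toy pool of raw Thrift connections, in the spirit of what Hector does internally.
public class SimpleClientPool {
    private final GenericObjectPool pool = new GenericObjectPool(
        new BasePoolableObjectFactory() {
            @Override
            public Object makeObject() throws Exception {
                // Each pooled object is an open connection to a Cassandra node.
                TTransport transport = new TSocket("localhost", 9160);
                transport.open();
                return new Cassandra.Client(new TBinaryProtocol(transport));
            }
        });

    public Cassandra.Client borrow() throws Exception {
        return (Cassandra.Client) pool.borrowObject();
    }

    public void release(Cassandra.Client client) throws Exception {
        pool.returnObject(client);
    }
}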

The Hector API

The following is an example from Ran Tavory’s blog (http://prettyprint.me) illustrating how Hector simplifies working with Cassandra:

// Create a cluster
Cluster cluster = HFactory.getOrCreateCluster("MyCluster", "cassandra1:9160");
// Choose a keyspace
KeyspaceOperator keyspaceOperator = HFactory.createKeyspaceOperator("Keyspace1", cluster);
// Create a string extractor
StringExtractor se = StringExtractor.get();
// Insert a value (createColumn is statically imported from HFactory)
Mutator m = HFactory.createMutator(keyspaceOperator);
m.insert("key1", "ColumnFamily1", createColumn("column1", "value1", se, se));
 
// Now read the value back
// Create a query
ColumnQuery<String, String> q = HFactory.createColumnQuery(keyspaceOperator, se, se);
// Set the key, column name, and column family, then execute
Result<HColumn<String, String>> r = q.setKey("key1").
        setName("column1").
        setColumnFamily("ColumnFamily1").
        execute();
// Read the value from the result
HColumn<String, String> column = r.get();
String value = column.getValue();
System.out.println(value);

HectorSharp (C#)

HectorSharp is a C# port of Ran Tavory’s Hector Java client (Tavory is also a committer on the HectorSharp project). Its features are similar to Hector:

  • A high-level client with an intuitive, object-oriented interface

  • Client-side failover

  • Connection pooling

  • Load balancing

Let’s walk through creating an application using HectorSharp as our interface to Cassandra. This is probably the best way to see how to incorporate it into your projects. We’ll create a simple C# console project that reads and writes some data to Cassandra so you can see how it’s used.

Note

As of this writing, HectorSharp works with version 0.6 of Cassandra, but not 0.7.

Using Git, download HectorSharp from http://github.com/mattvv/hectorsharp. Remember, to easily get source code from Git, open a terminal in the directory you want as the parent, and use the git clone command with the .git URL, like this:

>git clone http://github.com/mattvv/hectorsharp.git

Once you have the source, make sure you also have the .NET framework, version 3.5 or better, which you can download for free from Microsoft.com. You can also use the Visual Studio .NET 2010 Express IDE, which is free, uses the .NET 4.0 framework, and makes it very easy to work with C# projects. Download Visual Studio C# Express from http://www.microsoft.com/express/Downloads. This may take a while and will require you to restart your computer.

Once Visual Studio is installed, open the HectorSharp project so you can view the source code and add it as a reference to our own project. To open the project, choose File > Open Project... and then select the file HectorSharp.sln. The Express version of Visual Studio may complain about not doing Solution files, but don’t worry about that.

Build the HectorSharp source by right-clicking the HectorSharp project name in the Solution Explorer window and choosing Build. You should see a notice in the bottom-left corner saying “Build Succeeded”. This will produce the HectorSharp.dll client library that we can use in our own application.

To create our application that wraps HectorSharp, choose File > New Project... > Console Application. Call your new project ExecuteHector. You’ll be presented with a shell class called Program.cs with a main method.

Now let’s reference the HectorSharp DLL so we can use its classes. To do this, choose Project > Add Reference. When the dialog window comes up, pick the Browse tab, then navigate to the location where you unpacked HectorSharp. Go to the bin\Release directory and pick the HectorSharp.dll file. You should see HectorSharp added as a reference in your Solution Explorer.

Note

I have changed the name of my application to CassandraProgram.cs. If you do this too, you’ll need to change the executable in the project by choosing Project > ExecuteHector Properties. Choose the Application tab, and then enter your program name in the Startup Object field.

Let’s take a quick look at some of the high-level constructs that HectorSharp makes available:

ICassandraClient

This is the interface used by HectorSharp client objects, whose implementation type is typically a KeyedCassandraClientFactory object.

Pool

HectorSharp pools its connections to Cassandra, so you use a factory method to create a pool, like this: Pool = new CassandraClientPoolFactory().Create();. Then, using the pool, you can create a client.

Client

From the connection pool, you can get a Client that is used to connect with Cassandra. This is neat:

Client = new KeyedCassandraClientFactory(
             Pool,
             new KeyedCassandraClientFactory.Config { Timeout = 10 })
             .Make( new Endpoint("localhost", 9160) );

You pass the pool into your client factory, then you can specify additional configuration details (such as timeout in seconds), and finally build an endpoint using the host and port you want to connect to. In the preceding example, we specified a new timeout value of 10 (the default is 20).

Keyspace

This represents a Cassandra keyspace, which you obtain from the Client object. It allows you to specify the name of the keyspace to connect to and a consistency level to use:

Keyspace = Client.GetKeyspace(
                "Keyspace1", 
                ConsistencyLevel.ONE, 
                new FailoverPolicy(0) { Strategy = FailoverStrategy.FAIL_FAST });

The FailoverPolicy class allows you to indicate what HectorSharp should do if it encounters an error in communication (not an application error); that is, if it thinks that a node it’s trying to connect to is down. You can retry, retry on increments, or just decide to quit, which is what I’ve specified here.

ColumnPath

A ColumnPath is a simple wrapper that allows you to easily reference an entire column family, a super column within a particular column family, or a single column within a column family. It consists of nothing but C# properties for each of those three items, plus constructors.

HectorSharp uses the Gang of Four Command pattern for Data Access Objects (DAOs), because that’s how Hector works. So you could create a DAO with a get method, shown here in Hector’s Java form (which HectorSharp mirrors):

  /**
   * Get a string value.
   * @return The string value; null if no value exists for the given key.
   */
  public String get(String key) {
    return execute(new Command<String>(){
      public String execute(Keyspace ks) {
        try {
          return string(ks.getColumn(key, createColumnPath(COLUMN_NAME)).getValue());
        } catch (NotFoundException e) {
          return null;
        }
      }
    });
  }

  protected static <T> T execute(Command<T> command) {
    return command.execute(CASSANDRA_HOST, CASSANDRA_PORT, CASSANDRA_KEYSPACE);
  }

The get command is using the parameterized execute method, as are other sibling commands for insert and delete (not shown in the example). For our sample application, we’ll just keep it simple, but this is a reasonable pattern to follow for such a use case.

We’re finally ready to write some code. Your application should look like the listing in Example 8-1.

Example 8-1. CassandraProgram.cs
using System;
using HectorSharp;
using HectorSharp.Utils;
using HectorSharp.Utils.ObjectPool;

/**
 * Stands in for some C# application that would use HectorSharp 
 * as a high-level Cassandra client.
 */
namespace ExecuteHector
{
    class CassandraProgram
    {
        internal ICassandraClient Client;
        internal IKeyspace Keyspace;
        internal IKeyedObjectPool<Endpoint, ICassandraClient> Pool;

        static void Main(string[] args)
        {
            CassandraProgram app = new CassandraProgram();

            Console.WriteLine("Starting HectorSharp...");
            
            app.Pool = new CassandraClientPoolFactory().Create();
            Console.WriteLine("Set up Pool.");

            app.Client = new KeyedCassandraClientFactory(app.Pool,
                new KeyedCassandraClientFactory.Config { Timeout = 10 })
                .Make(new Endpoint("localhost", 9160));
            Console.WriteLine("Created client.");

            app.Keyspace = app.Client.GetKeyspace(
                "Keyspace1", 
                ConsistencyLevel.ONE, 
                new FailoverPolicy(0) { Strategy = FailoverStrategy.FAIL_FAST });
            Console.WriteLine("Found keyspace " + app.Keyspace.Name);
            
            //set up column path to use
            var cp = new ColumnPath("Standard1", null, "MyColumn");

            // write values
            Console.WriteLine("
Performing write using " + cp.ToString());
            for (int i = 0; i < 5; i++)
            {
                String keyname = "key" + i;
                String value = "value" + i;
                app.Keyspace.Insert(keyname, cp, value);
                Console.WriteLine("wrote to key: " + keyname  + " with value: " + value);
            }

            // read values
            Console.WriteLine("
Performing read.");
            for (int i = 0; i < 5; i++) 
            {
                String keyname = "key" + i;
                var column = app.Keyspace.GetColumn(keyname, cp);
                Console.WriteLine("got value for " + keyname + " = " + column.Value);
            }
            
            Console.WriteLine("All done.");
        }
    }
}

Compile this code into a console application by choosing Debug > Build Solution.

Now we’re ready to test it out. Open a console and start Cassandra as usual: >bin\cassandra -f. Now open a second console and navigate to the directory where you have your “ExecuteHector” project, then switch into the bin\Release directory. This directory has our executable in it; to run our program, just enter ExecuteHector.exe at the prompt. You should see output similar to the following:

C:\git\ExecuteHector\bin\Release>ExecuteHector.exe
Starting HectorSharp...
Set up Pool.
Created client.
Found keyspace Keyspace1

Performing write using ColumnPath(family: 'Standard1', super: '', column: 'MyColumn')
wrote to key: key0 with value: value0
wrote to key: key1 with value: value1
wrote to key: key2 with value: value2
wrote to key: key3 with value: value3
wrote to key: key4 with value: value4

Performing read.
got value for key0 = value0
got value for key1 = value1
got value for key2 = value2
got value for key3 = value3
got value for key4 = value4
All done.

C:\git\ExecuteHector\bin\Release>

As you can see, if you’re creating a C# application and want to use Cassandra as the backing database, it is very easy to get started with HectorSharp, and its object model is very high-level, intuitive, and easy to use. Just be aware that as of this writing, HectorSharp is still in the nascent stages, so make sure that your requirements are supported before going too far.

You can find out more about HectorSharp at http://hectorsharp.com.

Chirper

If you’re a .NET developer, you might be interested in Chirper. Chirper is a port of Twissandra to .NET, written by Chaker Nakhli. It’s available under the Apache 2.0 license, and the source code is on GitHub at http://github.com/nakhli/Chirper. You can read a blog post introducing Chirper at http://www.javageneration.com/?p=318.

Chiton (Python)

Chiton is a Cassandra browser written by Brandon Williams that uses the Python GTK framework. You can get it from http://github.com/driftx/chiton. It has several prerequisites, so a little setup is required. Before you can use it, make sure you have the following setup:

  • Python 2.5 or better.

  • Twisted Python (an event-driven networking interface for Python), available at http://twistedmatrix.com/trac.

  • Thrift (0.2).

  • PyGTK 2.14 or later (a graphical user interface kit for Python), available at http://www.pygtk.org. This in turn requires GTK+. You likely already have it if you’re on Linux; you can download the binary if you’re on Windows. Just uncompress the download into a directory and manually add the bin subfolder to the system’s path environment variable.

Pelops (Java)

Pelops is a free, open source Java client written by Dominic Williams. It is similar to Hector in that it’s Java-based, but it was started more recently. This has become a very popular client. Its goals include the following:

  • To create a simple, easy-to-use client

  • To completely separate concerns for data processing from lower-level items such as connection pooling

  • To act as a close follower to Cassandra so that it’s readily up to date

And the API is much simpler than using the low-level stuff exposed by Thrift and Avro. To write data, you just need a Mutator class; to read data, just use a Selector. Here’s a brief sample from Williams’ website that creates a connection pool to a list of Cassandra servers, then writes multiple subcolumn values to a super column:

Pelops.addPool(
    "Main",
    new String[] { "cass1.database.com", "cass2.database.com", "cass3.database.com"},
    9160,
    new Policy());

Mutator mutator = Pelops.createMutator("Main", "SupportTickets");

// The excerpt writes a batch of subcolumns under a new super column keyed by a
// time-based UUID. The enclosing writeSubColumns call, the row key, the super
// column family name, and the comment value are reconstructed here for illustration.
mutator.writeSubColumns(
    rowKey,
    "L1Tickets",
    UuidHelper.newTimeUuidBytes(), // using a UUID value that sorts by time
    mutator.newColumnList(
        mutator.newColumn("category", "videoPhone"),
        mutator.newColumn("reportType", "POOR_PICTURE"),
        mutator.newColumn("createdDate", NumberHelper.toBytes(System.currentTimeMillis())),
        mutator.newColumn("capture", jpegBytes),
        mutator.newColumn("comment", commentText)));

mutator.execute(ConsistencyLevel.ONE);

Consider how much easier that is than using the API provided out of the box.
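
The read side goes through a Selector, as mentioned earlier. The following is only a rough sketch: the method name and signature are approximate and may differ in the Pelops version you download.

// Rough sketch only: reads back the columns of a row using a Selector.
// The method name and arguments are approximate; check the Pelops source for
// the exact signatures in the version you are using.
Selector selector = Pelops.createSelector("Main", "SupportTickets");

List<Column> columns = selector.getColumnsFromRow(
    "someRowKey",            // illustrative row key
    "L1Tickets",             // column family
    false,                   // not reversed
    ConsistencyLevel.ONE);

for (Column c : columns)
    System.out.println(new String(c.getName()) + " = " + new String(c.getValue()));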

You can get the source code from http://code.google.com/p/pelops, and you can read some samples and explanations of how to use Pelops at Dominic Williams’ site, http://ria101.wordpress.com.

If you’re using Cassandra from a Java application, I encourage you to give Pelops a try.

Kundera (Java ORM)

Kundera is an object-relational mapping (ORM) implementation for Cassandra written using Java annotations. It’s available at http://kundera.googlecode.com under an Apache 2.0 license. According to its author, Impetus Labs, the aim of Kundera is:

...to make working with Cassandra drop-dead simple and fun. Kundera does not reinvent the wheel by making another client library; rather it leverages the existing libraries and builds—on top of them—a wrap-around API to help developers do away with unnecessary boiler plate codes, and program a neater-and-cleaner code that reduces code-complexity and improves quality. And above all, improves productivity.

Kundera uses Pelops under the hood. A sample Java entity bean looks like this:

@Entity 
@ColumnFamily(keyspace = "Keyspace1", family = "Band") 
public class Band { 
    @Id 
    private String id; 
    @Column(name = "name") 
    private String name; 
    @Column(name = "instrument") 
    private String instrument; 

You can perform a JPA query like this:

Query query = entityManager.createQuery("SELECT b from Band b where b.name='george'"); 
List<Band> list = query.getResultList();

This library is quite new at the time of this writing, so it has yet to be seen how readily it will be adopted. Still, it does appear promising, and it speaks to burgeoning interest in Cassandra among general application developers.

Fauna (Ruby)

Ryan King of Twitter and Evan Weaver created a Ruby client for the Cassandra database called Fauna. If you envision using Cassandra from Ruby, this might fit the bill. To find out more about Fauna, see http://github.com/fauna/cassandra/blob/master/README.rdoc.

Summary

You should now have an understanding of the variety of client interfaces available for Cassandra and how to install and use them. There are many Cassandra clients, each with its own strengths and limitations, in different languages and varying degrees of production-readiness. It’s not possible to cover all of them here, so I’ve taken a representative sample from a few different languages. To see a variety of other options that might fit your needs better, see the Cassandra project wiki Client Options page at http://wiki.apache.org/cassandra/ClientOptions.
