Using Groovy to access Apache Cassandra

The Apache Cassandra project was started at Facebook in 2007 to offer users a better experience when searching their inbox. The challenges that Facebook engineers had to face were mostly related to massive amounts of data, very high throughput, and scalability at a mind-blowing rate.

Cassandra is a distributed column-oriented database designed to manage humongous amounts of structured data in a decentralized, highly scalable way. The absence of a single point of failure makes Cassandra highly available and fault tolerant.

While Cassandra resembles a traditional database and shares some design strategies with one, it does not support a full relational data model. Instead, Cassandra's data model is flexible: each row can contain a variable number of columns.

In this recipe, we will go through different aspects of connecting and querying a Cassandra database.

Getting ready

For this recipe, we assume that the reader already has some familiarity with Cassandra's core concepts and data model (columns, super columns, column families, and keyspaces). Installing Cassandra is straightforward.

The only requirement for running Cassandra is a Java 1.6 JVM. Just download the distribution from the product website (http://cassandra.apache.org/download/), unzip it, and run the bin/cassandra or bin/cassandra.bat executable to start a single node.

Before we start, we have to create a couple of entities in Cassandra, a Keyspace and a Column Family. Fire up the CQLSH console located in the bin folder and type:

create keyspace hr with strategy_class='SimpleStrategy'
and strategy_options:replication_factor=1;
use hr;
create columnfamily employee (empid int primary key);

These commands create a Keyspace named hr and a column family named employee. The column family has one field only, which is also a primary key (of type int).

How to do it...

There are several client strategies to interact with Cassandra. In this recipe, we are going to use the open source library Hector, which is a well-established high level Java client.

The Hector APIs are not very fluent, and it takes a lot of boilerplate code to insert or manipulate data with them.

Why not create a simpler, more fluent wrapper on top of the Hector API using Groovy?

@Grab('org.hectorclient:hector-core:1.1-2')
@GrabExclude('org.apache.httpcomponents:httpcore')
import me.prettyprint.hector.api.Cluster
import me.prettyprint.hector.api.factory.HFactory
import me.prettyprint.hector.api.Keyspace
import me.prettyprint.cassandra.serializers.*
import me.prettyprint.hector.api.Serializer
import me.prettyprint.hector.api.mutation.Mutator
import me.prettyprint.hector.api.ddl.*
import me.prettyprint.hector.api.beans.ColumnSlice

class Gassandra {

    def cluster
    def keyspace
    def colFamily
    Serializer serializer
    def stringSerializer = StringSerializer.get()

    private Gassandra (Keyspace keyspace) {
      this.keyspace = keyspace
    }

    Gassandra() {}

    void connect(clusterName, host, port) {
      cluster = HFactory.
        getOrCreateCluster(
        clusterName,
        "$host:$port"
        )
    }

    List<KeyspaceDefinition> getKeyspaces() {
      cluster.describeKeyspaces()
    }

    Gassandra withKeyspace(keyspaceName) {
      keyspace = HFactory.
        createKeyspace(
          keyspaceName,
          cluster
        )
      new Gassandra(keyspace)
    }

    Gassandra withColumnFamily(columnFamily, Serializer c) {
      colFamily =  columnFamily
      serializer = c
      this
    }

    Gassandra insert(key, columnName, value) {
      def mutator = HFactory.
          createMutator(
            keyspace,
            serializer
          )
      def column  = HFactory.
          createStringColumn(
            columnName,
            value
          )
      mutator.insert(key, colFamily, column)
      this
    }

    Gassandra insert(key, Map args) {
      def mutator = HFactory.
          createMutator(
            keyspace,
            serializer
          )
      args.each {
        mutator.insert(
          key,
          colFamily,
          HFactory.
            createStringColumn(
              it.key,
              it.value
            )
        )
      }
      this
    }

    ColumnSlice findByKey(key) {
      def sliceQuery = HFactory.
          createSliceQuery(
            keyspace,
            serializer,
            stringSerializer,
            stringSerializer
          )
      sliceQuery.
        setColumnFamily(colFamily).
        setKey(key).
        setRange('', '', false, 100).
        execute().
        get()
    }
}

How it works...

The Gassandra class exposes a very simple, fluent interface that leverages the dynamic nature of Groovy. The class imports the Hector API and allows writing code as follows:

def g = new Gassandra()
g.connect('test', 'localhost', '9160')

def employee = g
     .withKeyspace('hr')
     .withColumnFamily('employee', IntegerSerializer.get())
employee.insert(5005, 'name', 'Zoe')
employee.insert(5005, 'lastName', 'Ross')
employee.insert(5005, 'age', '31')

The withKeyspace and withColumnFamily methods are written in a fluent style, so that we can pass the relevant information to Hector. Note that withColumnFamily requires a Serializer that specifies the type of the primary key.

The insert method accepts a Map as well, so that the previous code can be rewritten as:

employee.insert(5005,
    ['name': 'Zoe',
    'lastName': 'Ross',
    'age': '31'
    ])
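The connected Gassandra instance (g in the earlier snippet) also exposes getKeyspaces(), which can be used to check that the hr keyspace exists before working with it. The following is a minimal sketch: keyspaceNames is a hypothetical helper that is not part of the recipe, and it assumes only that each returned definition has a name property, as Hector's KeyspaceDefinition does through getName().

```groovy
// Hypothetical helper (not part of Gassandra): collects the names of all
// keyspaces reported by the cluster. It relies on Groovy property access,
// so `gassandra.keyspaces` calls getKeyspaces() and `*.name` calls
// getName() on every KeyspaceDefinition in the returned list.
def keyspaceNames(gassandra) {
    gassandra.keyspaces*.name
}

// Usage (requires a running Cassandra node and a connected instance):
// assert 'hr' in keyspaceNames(g)
```

The helper takes the instance as a parameter, so the Gassandra class itself stays unchanged.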

To find a row by primary key, there is a findByKey method that returns a me.prettyprint.hector.api.beans.ColumnSlice object.

println employee.findByKey(5005)

The previous statement will output:

ColumnSlice([HColumn(age=31),
    HColumn(lastName=Ross),
    HColumn(name=Zoe)])
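
The returned ColumnSlice can also be flattened into a plain Groovy map, which is often easier to work with than the Hector beans. The following is a minimal sketch: sliceToMap is a hypothetical helper that is not part of the recipe, and it assumes only that the argument exposes a columns collection whose elements have name and value properties, as Hector's ColumnSlice and HColumn do through their getters.

```groovy
// Hypothetical helper (not part of Gassandra): turns a slice into a Map
// keyed by column name. It relies on Groovy property access, so it works
// with Hector's ColumnSlice (getColumns) and HColumn (getName, getValue).
def sliceToMap(slice) {
    slice.columns.collectEntries { column ->
        [(column.name): column.value]
    }
}

// Usage against the recipe's data (requires a running Cassandra node):
// def row = sliceToMap(employee.findByKey(5005))
// println row.name
```

Because findByKey limits the range to 100 columns, the resulting map holds at most 100 entries.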

The Gassandra class lacks many basic methods to update or delete rows and other advanced query features. We leave them to the reader as an exercise.
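
As a starting point for that exercise, a delete operation might be sketched as follows. deleteColumns is a hypothetical helper, not part of the recipe. It assumes a Hector Mutator created the same way as in the insert methods (HFactory.createMutator(keyspace, serializer)) and uses the mutator's addDeletion and execute methods, so that all deletions are sent in one batch.

```groovy
// Hypothetical helper (not part of Gassandra): queues one deletion per
// column name and sends them to Cassandra in a single batch.
// `mutator` is expected to be a Hector Mutator; `nameSerializer` is the
// serializer for column names (StringSerializer.get() in this recipe).
def deleteColumns(mutator, key, String columnFamily,
                  List<String> columnNames, nameSerializer) {
    columnNames.each { name ->
        mutator.addDeletion(key, columnFamily, name, nameSerializer)
    }
    // Returns whatever the mutator's execute() call returns
    mutator.execute()
}

// Usage (requires a running Cassandra node):
// def mutator = HFactory.createMutator(employee.keyspace, employee.serializer)
// deleteColumns(mutator, 5005, 'employee', ['age'], StringSerializer.get())
```

Passing the mutator in as a parameter keeps the helper independent of the Gassandra class. The same logic could instead be moved into a method of the class that creates its own mutator.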
