Persisting data to Cassandra

We have examined Apache Ignite's write-behind with MySQL persistence. In this section, we'll persist data to a NoSQL datastore.

NoSQL data stores can be grouped into four categories:

Key-Value pair: DynamoDB and Redis
Graph DB: Neo4j
Document Store: MongoDB and Apache CouchDB
Column Store: Apache Cassandra and HBase

Apache Ignite offers NoSQL integration with a document store (MongoDB) and a column store (Apache Cassandra). Apache Cassandra is a peer-to-peer, distributed, high-performance, linearly scalable, fault-tolerant, open source NoSQL data store. It was designed at Facebook to achieve the scalability of Amazon's Dynamo and Google's Bigtable. Cassandra can be used to store online transaction processing (OLTP) as well as reporting (OLAP) data.

The main advantages of Cassandra are as follows:

It was designed to scale, and supports big data scalability at the petabyte level.
It offers linear scalability. You can add more nodes to store more data.
There is no master-slave hierarchy. Peer-to-peer distributed nodes make sure that there is no single point of failure. Nodes communicate with each other every second using the Gossip protocol.
It provides flexible schema design. You can store structured and unstructured data, and you can add a new column at any time.
It is easy to replicate data.
It provides automatic data partitioning.
It also provides caching out of the box. Data is first stored in memory (memtable). When the memtable is full, Cassandra automatically flushes the data to disk (SSTable).
Writes are durable, similar to Ignite's WAL: Cassandra records every write in a commit log.
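The last two points describe a write path that is easier to follow in code. The following is a minimal sketch in plain Java, not Cassandra's actual implementation: the class name ToyWritePath, the flushThreshold parameter, and the in-memory stand-ins for the commit log and SSTables are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy model of a log-structured write path (hypothetical, for illustration only). */
final class ToyWritePath {
    private final int flushThreshold;                          // assumed: max entries kept in memory
    private final List<String> commitLog = new ArrayList<>();  // stand-in for the on-disk commit log
    private NavigableMap<String, String> memtable = new TreeMap<>();
    private final List<NavigableMap<String, String>> sstables = new ArrayList<>();

    ToyWritePath(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    void write(String key, String value) {
        commitLog.add(key + "=" + value);  // 1. record the write for durability
        memtable.put(key, value);          // 2. apply it to the sorted in-memory table
        if (memtable.size() >= flushThreshold) {
            flush();                       // 3. spill to an immutable, sorted table
        }
    }

    private void flush() {
        sstables.add(Collections.unmodifiableNavigableMap(memtable));
        memtable = new TreeMap<>();
        commitLog.clear();                 // simplification: flushed writes need no replay
    }

    /** Looks in the memtable first, then in the flushed tables from newest to oldest. */
    String read(String key) {
        String value = memtable.get(key);
        if (value != null) {
            return value;
        }
        for (int i = sstables.size() - 1; i >= 0; i--) {
            value = sstables.get(i).get(key);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    int sstableCount() { return sstables.size(); }
    int memtableSize() { return memtable.size(); }

    public static void main(String[] args) {
        ToyWritePath store = new ToyWritePath(2);
        store.write("1", "GOTO");
        store.write("2", "Devoxx");     // reaches the threshold and triggers a flush
        store.write("1", "GOTO 2019");  // newer value stays in the memtable
        System.out.println(store.read("1") + " / " + store.read("2"));
        System.out.println("flushed tables: " + store.sstableCount());
    }
}
```

Reads check the memtable before the flushed tables, so the most recent value for a key wins. A real node also compacts SSTables and persists the commit log to disk, which this sketch leaves out.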

Before we write code to store and retrieve data from Cassandra, we need to understand the basic building blocks of Cassandra. The following are the key concepts of Cassandra:

Clusters: A cluster is a collection of Cassandra nodes and contains keyspaces.
Keyspaces: Keyspaces are analogous to relational databases. A keyspace is a namespace with a name and a set of attributes, such as replication strategy and replication factor. It supports two replication strategies: SimpleStrategy and NetworkTopologyStrategy. It contains a set of column families.
Column family: A column family is a collection of rows. Each row contains a collection of columns:
- row 1 may contain 2 columns: key → 1, value → {'name': 'sujoy', 'age': 10}
- row 2 may contain 4 columns: key → 2, value → {'name': 'Vijay', 'age': 30, 'sex': 'M', 'married': true}
CQL: Cassandra Query Language (CQL) is used to access and manipulate data.
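To make the column family idea concrete, the two example rows can be modeled as a map of row keys to column maps. This is only a mental model in plain Java; the class name ColumnFamilySketch is hypothetical and says nothing about how Cassandra lays out data on disk.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Models a column family as: row key -> (column name -> value). Illustration only. */
final class ColumnFamilySketch {

    static Map<Integer, Map<String, Object>> sampleRows() {
        Map<Integer, Map<String, Object>> rows = new LinkedHashMap<>();

        // Row 1 has two columns.
        Map<String, Object> row1 = new LinkedHashMap<>();
        row1.put("name", "sujoy");
        row1.put("age", 10);
        rows.put(1, row1);

        // Row 2 has four columns; rows need not share the same set of columns.
        Map<String, Object> row2 = new LinkedHashMap<>();
        row2.put("name", "Vijay");
        row2.put("age", 30);
        row2.put("sex", "M");
        row2.put("married", true);
        rows.put(2, row2);

        return rows;
    }

    public static void main(String[] args) {
        for (Map.Entry<Integer, Map<String, Object>> row : sampleRows().entrySet()) {
            System.out.println("key " + row.getKey() + " -> columns " + row.getValue().keySet());
        }
    }
}
```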

You can download the installation media from http://cassandra.apache.org/download/ and untar/unzip the media to start playing with Apache Cassandra:

Open a Terminal/Command Prompt.
Go to the <<CASSANDRA_INSTALLATION_DIRECTORY>>/bin folder.
Execute the cassandra/cassandra.bat file.

It will start Cassandra. Once the node starts, you can launch the CQL terminal to define the data models.
Launch cqlsh/cqlsh.bat.
It will open the cqlsh> prompt.
Type describe keyspaces;. It will list the existing keyspaces:

Now create your own keyspace called 'persistence' to verify the Cassandra persistence. Enter the create keyspace persistence with replication = {'class': 'SimpleStrategy', 'replication_factor': 1}; CQL command:

It will create the new keyspace. Type describe keyspaces; again to view our new keyspace:

We will use this keyspace to create our table:

Create our conference table with CREATE TABLE conference(id int primary key, name text, startDateTime timestamp, endDateTime timestamp);. You must specify the primary key column.
It will create the table. Now view the 'persistence' keyspace by executing the following CQL: describe persistence;

Apache Cassandra, the keyspace, and the table are configured. Now it's time to examine Apache Ignite's integration with Apache Cassandra. Perform the following steps:

Add the following Gradle dependencies to enable Cassandra. ignite-cassandra-store contains the Cassandra integration classes. cassandra-driver-core is the DataStax driver for accessing the Cassandra datastore. ignite-cassandra-serializers is needed for BLOB persistence, where the entire object is serialized and stored in a single Cassandra column:

       // Cassandra
       compile group: 'org.apache.ignite', name: 'ignite-cassandra-store', version: "${igniteVersion}"
       compile group: 'com.datastax.cassandra', name: 'cassandra-driver-core', version: '3.0.0'
       compile group: 'org.apache.ignite', name: 'ignite-cassandra-serializers', version: '2.6.0'
       compile group: 'commons-io', name: 'commons-io', version: '2.6'

Create a Conference POJO class to represent our data model. Make sure to define the default constructor. It will be invoked by the persistence settings to move data from Cassandra to POJO and POJO to Cassandra:

      import java.io.Serializable;
      import java.util.Date;

      public class Conference implements Serializable {
          private static final long serialVersionUID = 1L;

          private Integer id;
          private String name;
          private Date startDateTime;
          private Date endDateTime;

          // No-argument constructor, required by the persistence settings
          public Conference() {}

          public Conference(Integer id, String name, Date startDateTime,
                            Date endDateTime) {
              this.id = id;
              this.name = name;
              this.startDateTime = startDateTime;
              this.endDateTime = endDateTime;
          }

          // Getters/setters and toString
          ...
      }
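To see why the default constructor matters, consider how a mapper might rebuild a POJO from a row of column values. The following is a minimal reflection-based sketch under our own assumptions, not Ignite's actual code: PojoMappingSketch, fromRow, and the trimmed-down nested Conference class are hypothetical names used only for this illustration.

```java
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;

/** Shows why a mapper needs a no-argument constructor. Illustration only. */
final class PojoMappingSketch {

    /** Trimmed-down stand-in for the Conference class, with two fields only. */
    static class Conference {
        private Integer id;
        private String name;

        public Conference() {}

        public Integer getId() { return id; }
        public String getName() { return name; }
    }

    /** Creates an empty instance, then copies each column into the field of the same name. */
    static <T> T fromRow(Class<T> type, Map<String, Object> row) {
        try {
            T target = type.getDeclaredConstructor().newInstance(); // fails without a no-arg constructor
            for (Map.Entry<String, Object> column : row.entrySet()) {
                Field field = type.getDeclaredField(column.getKey());
                field.setAccessible(true);
                field.set(target, column.getValue());
            }
            return target;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot map row to " + type.getName(), e);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("id", 1);
        row.put("name", "GOTO");
        Conference conference = fromRow(Conference.class, row);
        System.out.println(conference.getId() + " " + conference.getName());
    }
}
```

Without the no-argument constructor, the mapper would have no generic way to create the object before filling in its fields.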

Create persistence settings for the cache store factory for our conference table. The persistence settings must have the following attributes:
- Keyspace and table name: <persistence keyspace="persistence" table="conference">
- <keyPersistence class='' strategy='' ... > and <valuePersistence class='' strategy='' ... >
  - class (required): Java class name of the Ignite cache key (for keyPersistence) or value (for valuePersistence)
  - strategy (required): One of three possible persistence strategies:
    - PRIMITIVE: Stores the key/value as is by mapping it to a Cassandra table column with the corresponding type. It should only be used for simple Java types (int, long, String, double, Date), which can be mapped to corresponding Cassandra types.
    - BLOB: Stores the key/value as a BLOB by mapping it to a Cassandra table column with the blob type. It can be used for any Java object. The conversion of a Java object to a BLOB is handled by the serializer, which can be specified in the serializer attribute.
    - POJO: Stores each field of an object as a column with the corresponding type in the Cassandra table. It provides the ability to utilize Cassandra secondary indexes for object fields.
  - serializer (optional): Applies only to the BLOB strategy and names the class that serializes the keys/values. Here are the implementations:
    - org.apache.ignite.cache.store.cassandra.serializer.JavaSerializer: uses Java serialization
    - org.apache.ignite.cache.store.cassandra.serializer.KryoSerializer: uses the Kryo serialization framework
  - column (optional): Specifies the column name for the PRIMITIVE and BLOB strategies to store the key/value. If not specified, a column named key is used for keyPersistence, and a column named value is used for valuePersistence.
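Putting these attributes together, a persistence descriptor for our conference table might look like the XML held in the following Java string. This is a sketch under assumptions: the com.example.Conference package name is hypothetical, as is the choice of PRIMITIVE for the key and POJO for the value, and whether Ignite accepts the descriptor unchanged (in particular the POJO field-to-column mapping) should be checked against the Ignite Cassandra documentation. The code parses the XML with the JDK's own parser only to confirm it is well formed and to read back the attributes. It does not load the settings into Ignite.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Holds a sample persistence descriptor and checks it with the JDK XML parser. */
final class PersistenceSettingsSketch {

    // Assumed mapping: Integer key in the id column, Conference value stored field by field.
    static final String SETTINGS_XML =
        "<persistence keyspace=\"persistence\" table=\"conference\">\n"
        + "  <keyPersistence class=\"java.lang.Integer\" strategy=\"PRIMITIVE\" column=\"id\"/>\n"
        + "  <valuePersistence class=\"com.example.Conference\" strategy=\"POJO\"/>\n"
        + "</persistence>";

    /** Parses the descriptor and returns its root element. */
    static Element parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Malformed persistence settings", e);
        }
    }

    public static void main(String[] args) {
        Element root = parse(SETTINGS_XML);
        Element key = (Element) root.getElementsByTagName("keyPersistence").item(0);
        Element value = (Element) root.getElementsByTagName("valuePersistence").item(0);
        System.out.println(root.getAttribute("keyspace") + "." + root.getAttribute("table"));
        System.out.println("key:   " + key.getAttribute("class") + " / " + key.getAttribute("strategy"));
        System.out.println("value: " + value.getAttribute("class") + " / " + value.getAttribute("strategy"));
    }
}
```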
