Apache Cassandra

Apache Cassandra mixes features of key-value stores and traditional relational databases. In a conventional relational database, the columns of a table are fixed. In Cassandra, however, rows within the same table can have different columns, so each row effectively has its own flexible schema; this is why Cassandra is described as column oriented. Columns are organized in so-called column families, which are the equivalent of tables in relational databases. Cassandra does not support joins or subqueries. Cassandra can be downloaded from http://cassandra.apache.org/download/. The latest version at the time of writing was 2.0.9. Refer to http://wiki.apache.org/cassandra/GettingStarted to get started.
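
The following minimal pure-Python sketch makes the idea of per-row columns concrete. It models a column family as nested dictionaries: the outer key is the row key, and the inner dictionary holds only the columns that row actually has. It does not talk to Cassandra, and the `users` data and helper names are hypothetical.

```python
# Illustration only: a column family modelled as nested dicts.
# Outer key = row key; inner dict = the columns present in that row.
users = {
    "alice": {"email": "alice@example.com", "city": "Paris"},
    "bob": {"email": "bob@example.com", "phone": "555-0100"},
}

def columns_of(family, row_key):
    """Return the sorted column names present in one row."""
    return sorted(family[row_key])

def all_columns(family):
    """Return the union of column names across all rows (no fixed schema)."""
    names = set()
    for columns in family.values():
        names.update(columns)
    return sorted(names)

print(columns_of(users, "alice"))  # ['city', 'email']
print(all_columns(users))          # ['city', 'email', 'phone']
```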

Run the server from the command line as follows:

$ bin/cassandra -f

If you run the previous command, you may get the following error message:

Cassandra 2.0 and later require Java 7 or later.

Java, like Python, is a high-level programming language. Java 7 refers to Java version 1.7 (dropping the leading "1." from the product name is a marketing choice). If you have Java installed, you can check its version as follows:

$ java -version
java version "1.7.0_60"
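
The mapping between "Java 7" and "1.7" takes only a few lines to express. The sketch below is a hypothetical helper, not part of any Java or Cassandra tooling. It takes the version string printed by `java -version` and returns the major version. It assumes two numbering schemes: the older `1.<major>` form, and the form used from Java 9 onwards, where the major number comes first.

```python
import re

def java_major_version(version_string):
    """Map a Java version string to its major ("marketing") version.

    Older releases report "1.<major>..." (e.g. "1.7.0_60" is Java 7);
    Java 9 and later report the major version first (e.g. "11.0.2").
    """
    match = re.match(r"(\d+)(?:\.(\d+))?", version_string)
    if match is None:
        raise ValueError("unrecognised version: %r" % version_string)
    first = int(match.group(1))
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first

def meets_cassandra_requirement(version_string, minimum=7):
    """Cassandra 2.0 and later need Java 7 or later."""
    return java_major_version(version_string) >= minimum

print(java_major_version("1.7.0_60"))  # 7
```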

Note

For most operating systems, except Mac OS X, you can download Java from http://www.oracle.com/technetwork/java/javase/downloads/index.html.

Instructions for installing Java on Mac are given at http://docs.oracle.com/javase/7/docs/webnotes/install/mac/mac-jdk.html. Since this is a Python book, we will not dwell too long on the details of installing Java. A quick web search should give you more than enough information.

Create the directories listed in conf/cassandra.yaml, or point these settings at other directories, for example:

data_file_directories:
    - /tmp/lib/cassandra/data
commitlog_directory: /tmp/lib/cassandra/commitlog
saved_caches_directory: /tmp/lib/cassandra/saved_caches

Because many systems clear /tmp on reboot, the following commands make sense only if you don't need to keep the data:

$ mkdir -p /tmp/lib/cassandra/data
$ mkdir -p /tmp/lib/cassandra/commitlog
$ mkdir -p /tmp/lib/cassandra/saved_caches
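
If you prefer to create the directories from Python, a minimal sketch can use the standard library's `os.makedirs` with `exist_ok=True` (Python 3.2+), which behaves like `mkdir -p`. The function name and the `base` parameter are hypothetical. The subdirectory names follow the cassandra.yaml settings above.

```python
import os

# Subdirectory names mirror the cassandra.yaml settings; the base path is
# a parameter so the sketch can be tried outside /tmp as well.
CASSANDRA_SUBDIRS = ("data", "commitlog", "saved_caches")

def make_cassandra_dirs(base="/tmp/lib/cassandra"):
    """Create the data directories like `mkdir -p` and return their paths."""
    paths = []
    for name in CASSANDRA_SUBDIRS:
        path = os.path.join(base, name)
        os.makedirs(path, exist_ok=True)  # no error if it already exists
        paths.append(path)
    return paths
```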

Install a Python driver with the following command:

$ sudo pip install cassandra-driver
$ pip freeze|grep cassandra-driver
cassandra-driver==2.0.2
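
As an alternative to `pip freeze|grep`, you can read the installed driver version from Python 3.8+ with `importlib.metadata.version`. The helper below is a small hypothetical wrapper that returns `None` instead of raising an exception when the distribution is not installed.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(distribution="cassandra-driver"):
    """Return the installed version string, or None if not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None

print(installed_version())  # a version string, or None if the driver is missing
```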

You might get the following error message:

The required version of setuptools (>=0.9.6) is not available,
    and can't be installed while this script is running. Please
    install a more recent version first, using
    'easy_install -U setuptools'.

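The message tells you what to do: upgrade setuptools with easy_install -U setuptools (or pip install -U setuptools), and then rerun the driver installation.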
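
The setuptools message hinges on a version comparison (>=0.9.6). The sketch below uses hypothetical helper names and compares dotted release numbers as integer tuples. Comparing them as plain strings would go wrong, because `"10.0" < "9.0"` is true for strings. It deliberately ignores suffixes such as `rc1`; real packaging tools use a full version parser.

```python
def version_tuple(text):
    """Turn a dotted release such as '0.9.6' into (0, 9, 6).

    Deliberately simple: non-numeric suffixes such as 'rc1' are not handled.
    """
    return tuple(int(part) for part in text.split("."))

def satisfies_minimum(installed, required="0.9.6"):
    """True if `installed` is at least `required`, compared numerically."""
    return version_tuple(installed) >= version_tuple(required)

print(satisfies_minimum("0.9.5"))  # False
```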

Now it's time for the code. Connect to a cluster and create a session as follows:

from cassandra.cluster import Cluster

cluster = Cluster()  # contact point defaults to 127.0.0.1
session = cluster.connect()

Cassandra groups tables into keyspaces: a keyspace holds tables. Cassandra has its own query language, the Cassandra Query Language (CQL), which is very similar to SQL. Create the keyspace and set the session to use it:

session.execute("CREATE KEYSPACE IF NOT EXISTS mykeyspace WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };")
session.set_keyspace('mykeyspace')
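
The keyspace statement above can also be generated from parameters. The following hypothetical helper is not part of the driver. It builds the same CQL text with a configurable replication factor. It only accepts simple unquoted identifiers for the name, because CQL does not allow identifiers as bound parameters, so the name is pasted into the statement text directly.

```python
import re

# Unquoted CQL identifiers: a letter followed by letters, digits or underscores.
_IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_]*")

def create_keyspace_cql(name, replication_factor=1):
    """Build a CREATE KEYSPACE statement that uses SimpleStrategy."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError("unsupported keyspace name: %r" % name)
    if replication_factor < 1:
        raise ValueError("replication_factor must be >= 1")
    return (
        "CREATE KEYSPACE IF NOT EXISTS %s WITH REPLICATION = "
        "{ 'class' : 'SimpleStrategy', 'replication_factor' : %d };"
        % (name, replication_factor)
    )

print(create_keyspace_cql("mykeyspace"))
```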

Now, create a table for the sunspots data:

session.execute("CREATE TABLE IF NOT EXISTS sunspots (year decimal PRIMARY KEY, sunactivity decimal);")
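
The table declares both columns as CQL `decimal`, which the Python driver maps to Python's `decimal.Decimal`. If exact decimal values matter to you, one option is to convert each float through `str()` first. This is an assumption about your needs; the listing does not require it. The conversion matters because `Decimal(0.1)` captures the full binary expansion of the float rather than `0.1`. The helper name is hypothetical:

```python
from decimal import Decimal

def to_decimal_row(row):
    """Convert a tuple of floats to Decimals via str() to avoid binary noise."""
    return tuple(Decimal(str(value)) for value in row)

print(to_decimal_row((1700.0, 5.0)))  # (Decimal('1700.0'), Decimal('5.0'))
```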

Create a statement that we will use in a loop to insert the rows of data, each given as a tuple:

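from cassandra import ConsistencyLevel
from cassandra.query import SimpleStatement
# %s placeholders are filled from each tuple passed to session.execute()
query = SimpleStatement(
    "INSERT INTO sunspots (year, sunactivity) VALUES (%s, %s)",
    consistency_level=ConsistencyLevel.QUORUM)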
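
`ConsistencyLevel.QUORUM` asks for acknowledgement from a majority of the replicas that hold the row. Within a single data center, the quorum size is `replication_factor // 2 + 1`, so with the `replication_factor` of 1 used here, the single replica is enough. The following minimal sketch shows that arithmetic; the function name is hypothetical:

```python
def quorum(replication_factor):
    """Replicas that must respond for QUORUM (single data center)."""
    if replication_factor < 1:
        raise ValueError("replication factor must be >= 1")
    return replication_factor // 2 + 1

print([quorum(rf) for rf in (1, 2, 3, 5)])  # [1, 2, 2, 3]
```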

The following loop inserts the data; rows is a list of (year, sunactivity) tuples built from the statsmodels sunspots dataset:

for row in rows:
    session.execute(query, row)
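
The loop sends one request per row, which is fine for a few hundred rows. For larger loads, you might group rows, for example to feed the driver's `BatchStatement` from `cassandra.query`. Cassandra batches are primarily an atomicity feature, though, so measure any speed-up rather than assume it. The chunking helper below is a hypothetical, pure-Python sketch and does not call the driver:

```python
from itertools import islice

def chunks(rows, size):
    """Yield successive lists of at most `size` rows from any iterable."""
    if size < 1:
        raise ValueError("size must be >= 1")
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk

print(list(chunks(range(5), 2)))  # [[0, 1], [2, 3], [4]]
```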

Get the count of the rows in the table:
```
print(session.execute("SELECT COUNT(*) FROM sunspots"))
```
This prints the row count as follows:
```
[Row(count=309)]
```
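
By default, the driver returns rows as named tuples, which is why the result prints as `[Row(count=309)]`. To get the number itself, take the first row and read its `count` field. The hypothetical helper below iterates over the result instead of indexing it. That should make it work whether the driver returns a plain list, as in the version used here, or an iterable result object. The `Row` type in the example is a stand-in built with `collections.namedtuple`:

```python
from collections import namedtuple

def first_count(result):
    """Return the `count` field of the first row of a COUNT(*) result."""
    first_row = next(iter(result), None)
    if first_row is None:
        raise ValueError("query returned no rows")
    return first_row.count

# Stand-in for the driver's output, mirroring the [Row(count=309)] shown above.
Row = namedtuple("Row", ["count"])
print(first_count([Row(count=309)]))  # 309
```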

Drop the keyspace and shut down the cluster:

session.execute('DROP KEYSPACE mykeyspace')
cluster.shutdown()
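
If a query fails before the last line runs, `cluster.shutdown()` is skipped. The following minimal sketch uses `try`/`finally` so the cluster is always shut down. The helper name is hypothetical, and `cluster` is assumed to be any object with `connect()` and `shutdown()` methods, such as the driver's `Cluster`:

```python
def run_then_shutdown(cluster, work):
    """Connect, run work(session), and always shut the cluster down."""
    try:
        session = cluster.connect()
        return work(session)
    finally:
        cluster.shutdown()  # runs even if connect() or work() raises
```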

Refer to the cassandra_demo.py file in this book's code bundle:

from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement
import statsmodels.api as sm

# Connect to the local cluster and open a session
cluster = Cluster()
session = cluster.connect()

# Create (if needed) and select the keyspace, then create the sunspots table
session.execute("CREATE KEYSPACE IF NOT EXISTS mykeyspace WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };")
session.set_keyspace('mykeyspace')
session.execute("CREATE TABLE IF NOT EXISTS sunspots (year decimal PRIMARY KEY, sunactivity decimal);")

# Parameterized INSERT, written with QUORUM consistency
query = SimpleStatement(
    "INSERT INTO sunspots (year, sunactivity) VALUES (%s, %s)",
    consistency_level=ConsistencyLevel.QUORUM)

# Load the sunspots data and insert one (year, sunactivity) tuple per row
data_loader = sm.datasets.sunspots.load_pandas()
df = data_loader.data
rows = [tuple(x) for x in df.values]
for row in rows:
    session.execute(query, row)

# Check how many rows were stored
print(session.execute("SELECT COUNT(*) FROM sunspots"))

# Clean up: drop the keyspace and shut down the cluster connection
session.execute('DROP KEYSPACE mykeyspace')
cluster.shutdown()
