Chapter 8. Application Development with Drivers

Now that we’ve looked at how to design a microservice architecture for a hotel application, let’s look at how you might implement one of the services within that application - the Reservation Service. To write an application using Cassandra you’re going to need a driver, and thankfully you are in good hands.

You’re likely used to connecting to relational databases using drivers. For example, in Java, JDBC is an API that abstracts the vendor implementation of the relational database to present a consistent way of storing and retrieving data using Statements, PreparedStatements, ResultSets, and so forth. To interact with the database, you get a driver that works with the particular database you’re using, such as Oracle, SQL Server, or MySQL; the implementation details of this interaction are hidden from the developer.

There are a number of client drivers available for Cassandra as well, including support for most popular languages. There are benefits to these clients, in that you can easily embed them in your own applications, and that they frequently offer more features than the CQL native interface does, including connection pooling and JMX integration and monitoring. In the following sections, you’ll learn about the various clients available and the features they offer.

DataStax Java Driver

The introduction of CQL was the impetus for a major shift in the landscape of Cassandra client drivers. The simplicity and familiar syntax of CQL made the development of client programs similar to traditional relational database drivers. DataStax made a strategic investment in open source drivers for Java and several additional languages in order to fuel Cassandra adoption. These drivers quickly became the de facto standard for new development projects. You can access the drivers as well as additional connectors and tools at https://github.com/datastax.

DataStax Driver Compatibility Matrix

Visit the driver matrix page to access documentation and identify driver versions that are compatible with your server version.

The DataStax Java driver is the oldest and most popular of these drivers, and typically the driver in which new features appear first. For this reason, we’ll focus on using the Java driver and use this as an opportunity to learn about the features that are provided by the DataStax drivers across multiple languages.

Development Environment Configuration

First, you’ll need to access the driver in your development environment. You could download the driver directly from the URL listed before and manage the dependencies manually, but it is more typical in modern Java development to use a tool like Maven or Gradle to manage dependencies. If you’re using Maven, you’ll need to add something like the following to your project pom.xml file, while specifying a value for the driver version:

<dependency>
  <groupId>com.datastax.oss</groupId>
  <artifactId>java-driver-core</artifactId>
  <version>${driver.version}</version>
</dependency>

You can find the documentation manual for the Java drivers at https://docs.datastax.com/en/developer/java-driver/latest, and Javadoc for the Java driver at https://docs.datastax.com/en/drivers/java/latest/. Alternatively, the Javadocs are also part of the source distribution.

All of the DataStax drivers are managed as open source projects on GitHub. If you’re interested in seeing the Java driver source, you can get a read-only trunk version using this command:

$ git clone https://github.com/datastax/java-driver.git

If you’re interested in learning more about the internals of the driver or even potentially contributing to the project, there’s also a Developer Guide on the DataStax documentation site.

Driver API changes

The 4.0 release of the Java driver in late 2018 included significant breaking changes to the API and configuration of the driver in order to simplify application development and discourage configurations contrary to best practices. This book conforms to the newer APIs. The Clients chapter in the second edition of this book remains a good resource for those using the Java Driver 3.x and earlier.

In September 2019, DataStax announced a significant change to its driver strategy. Prior to that point, DataStax had maintained separate open source and enterprise drivers for use with Apache Cassandra and DataStax Enterprise, respectively. In early 2020, the codebases for the drivers in each of the supported languages were merged, bringing the benefit of several performance and availability improvements which were previously only available to DSE customers. DSE-specific driver features are out of the scope of this book but are well documented on the sites referenced above.

Connecting to a Cluster

Once you’ve configured your environment, it’s time to start coding. We’ll base the code samples for this chapter around the Reservation Service, a microservice implementation based on the hotel data model introduced in Chapter 5 and the corresponding application design discussed in Chapter 7. The source code for the Reservation Service is available at https://github.com/jeffreyscarpenter/reservation-service.

To start building your application, you’ll use the driver’s API to connect to a cluster. In the Java driver, connectivity to a cluster is represented by the com.datastax.oss.driver.api.core.CqlSession class.

The CqlSession class is the main entry point of the driver. It supports a fluent-style API using the builder pattern. For example, the following line creates a CqlSession that will attempt to connect to a Cassandra node on the local host at the default Cassandra native protocol port number:

CqlSession cqlSession = CqlSession.builder()
    .addContactPoint(new InetSocketAddress("127.0.0.1", 9042))
    .build();

Elimination of the Cluster object

Previous versions of DataStax drivers supported the concept of a Cluster object used to create Session objects. Recent driver versions (for example the 4.0 Java driver and later) have combined Cluster and Session into CqlSession.

In the terminology of the driver, the nodes you explicitly identify when creating a CqlSession are known as contact points. Contact points are similar to the concept of seed nodes that a Cassandra node uses to connect to other nodes in the same cluster.

The minimum required information to create a CqlSession is a single contact point. The driver defaults to a single contact point consisting of the local host and default port, so this statement is equivalent to the previous one:

CqlSession cqlSession = CqlSession.builder().build();

While this configuration is useful for development, when you might be running a Cassandra node on your local machine, for production environments you’ll want to specify multiple contact points. This is a good practice in case one of the nodes you pick happens to be down when the client application is attempting to create a CqlSession. You’ll also want to specify the name of the local data center. We’ll discuss naming of data centers in Chapter 10.

CqlSession cqlSession = CqlSession.builder()
    .addContactPoint(new InetSocketAddress("<some IP address>", 9042))
    .addContactPoint(new InetSocketAddress("<another IP address>", 9042))
    .withLocalDatacenter("<data center name>")
    .build();

When you create a CqlSession, the driver connects to one of the configured contact points to obtain metadata about the cluster. This action will throw an AllNodesFailedException (the 4.x replacement for the NoHostAvailableException of earlier drivers) if none of the contact points is available, or an AuthenticationException if authentication fails. We’ll discuss authentication in more detail in Chapter 14.

You can optionally provide the name of a keyspace to connect to, as in this example that connects to the reservation keyspace:

CqlSession cqlSession = CqlSession.builder()
    .addContactPoint(new InetSocketAddress("<some IP address>", 9042))
    .addContactPoint(new InetSocketAddress("<another IP address>", 9042))
    .withKeyspace("reservation")
    .build();

If you do not specify a keyspace name when creating the CqlSession, you’ll have to qualify every table reference in your queries with the appropriate keyspace name.

Each CqlSession manages connections to a Cassandra cluster, which are used to execute queries and control operations using the Cassandra native protocol. The CqlSession contains a pool of TCP connections for each host.

Sessions Are Expensive

Because a CqlSession maintains TCP connections to multiple nodes, it is a relatively heavyweight object. In most cases, you’ll want to create a single CqlSession and reuse it throughout your application, rather than continually building up and tearing down CqlSessions. Another acceptable option is to create a CqlSession per keyspace, if your application is accessing multiple keyspaces.
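One way to enforce the single-session guideline is to hide creation behind a small holder that builds the object on first use and hands back the same instance afterward. The following is a minimal sketch using only the JDK; the SharedSession class is hypothetical and is not part of the driver, and many applications would use a dependency injection framework for the same purpose.

```java
import java.util.function.Supplier;

// Hypothetical helper, not a driver class: creates an expensive object once and reuses it.
final class SharedSession<T> {
    private final Supplier<T> factory;
    private T instance;

    SharedSession(Supplier<T> factory) {
        this.factory = factory;
    }

    // Builds the instance on the first call; every later call returns the same object.
    synchronized T get() {
        if (instance == null) {
            instance = factory.get();
        }
        return instance;
    }
}
```

In an application you might construct this as new SharedSession<>(() -> CqlSession.builder().build()) and pass the holder to each component that needs to run queries.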

Statements

Once you have created a CqlSession to connect to a cluster, you’re ready to perform reads or writes. To begin doing some real application work, you’ll create and execute CQL statements using implementations of Statement. Statement is an interface with several implementations, including SimpleStatement, BoundStatement, and BatchStatement.

The simplest way to create and execute a statement is to call the CqlSession.execute() operation with a string representing the statement. Here’s an example of a statement that will return the entire contents of the reservations table:

cqlSession.execute("SELECT * from reservation.reservations_by_confirmation");

This statement creates and executes a query in a single method call. In practice, this could turn out to be a very expensive query to execute in a large database, but it does serve as a useful example of a very simple query. Most queries will be more complex, as you’ll have search criteria to specify or specific values to insert. You can certainly use Java’s various string utilities to build up the syntax of your query by hand, but this of course is error prone. It may even expose your application to injection attacks, if you’re not careful to sanitize strings that come from end users.
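To make the risk of hand-built query strings concrete, here is a small sketch that needs only the JDK. The QueryTextDemo class and its methods are hypothetical; the point is simply that concatenation lets a value escape its string literal and become part of the query syntax, while a parameterized query keeps the text constant and carries values separately. Whether Cassandra would accept a particular altered query depends on the statement.

```java
import java.util.List;

// Illustrative sketch only: names are hypothetical and no driver is involved.
final class QueryTextDemo {

    // Query text with a placeholder; it never changes, whatever the user types.
    static final String PARAMETERIZED =
        "SELECT * FROM reservation.reservations_by_confirmation WHERE confirmation_number = ?";

    // Error-prone: the raw value becomes part of the CQL text itself.
    static String concatenated(String confirmationNumber) {
        return "SELECT * FROM reservation.reservations_by_confirmation"
            + " WHERE confirmation_number = '" + confirmationNumber + "'";
    }

    // Safer: the value travels alongside the query text rather than inside it.
    static List<Object> parameters(String confirmationNumber) {
        return List.of(confirmationNumber);
    }

    public static void main(String[] args) {
        // A well-behaved value yields the query you intended.
        System.out.println(concatenated("RS2G0Z"));
        // A value containing a quote closes the literal early and adds its own syntax.
        System.out.println(concatenated("x' ALLOW FILTERING --"));
        // With a placeholder, the same value is just data.
        System.out.println(PARAMETERIZED + " with " + parameters("x' ALLOW FILTERING --"));
    }
}
```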

Simple Statements

Thankfully, you needn’t make things so hard on yourself. The Java driver provides the SimpleStatement class to help construct parameterized statements. As it turns out, the execute() operation is a convenience method for creating a SimpleStatement. The code above is equivalent to the following, using the SimpleStatement.newInstance() method:

cqlSession.execute(SimpleStatement.newInstance("SELECT * from reservation.reservations_by_confirmation"));

The newInstance() method is most useful in cases where you already have a set query string. Let’s try building a query with variable parameters using a SimpleStatementBuilder. Here’s an example of a statement that will insert a row in the reservations table, which you can then execute:

SimpleStatement reservationInsert = SimpleStatement.builder(
  "INSERT INTO reservations_by_confirmation (confirmation_number, hotel_id, start_date, end_date, room_number, guest_id) VALUES (?, ?, ?, ?, ?, ?)")
  .addPositionalValue("RS2G0Z")
  .addPositionalValue("NY456")
  // date, smallint, and uuid columns need matching Java types:
  // java.time.LocalDate, short, and java.util.UUID
  .addPositionalValue(LocalDate.parse("2020-06-08"))
  .addPositionalValue(LocalDate.parse("2020-06-10"))
  .addPositionalValue((short) 111)
  .addPositionalValue(UUID.fromString("1b4d86f4-ccff-4256-a63d-45c905df2677"))
  .build();
cqlSession.execute(reservationInsert);

The first parameter to the call is the basic syntax of your query, indicating the table and columns you are interested in. The question marks are used to indicate values that you’ll be providing in additional parameters. The values supplied here are the confirmation number, hotel ID, start and end dates, room number, and guest ID, in the same order as the columns.

If you’ve created your statement correctly, the insert will execute successfully (and silently). Now let’s create another statement to read back the row you just inserted:

SimpleStatement reservationSelect = SimpleStatement.builder(
  "SELECT * FROM reservations_by_confirmation WHERE confirmation_number=?")
  .addPositionalValue("RS2G0Z")
  .build();
ResultSet reservationSelectResult = cqlSession.execute(reservationSelect);

Again, you make use of parameterization to provide the ID for the search. This time, when you execute the query, make sure to receive the ResultSet which is returned from the execute() method. You can iterate through the rows returned by the ResultSet as follows:

for (Row row : reservationSelectResult) {
  System.out.format("confirmation_number: %s, hotel_id: %s, start_date: %s, end_date: %s, room_number: %d, guest_id: %s%n",
    row.getString("confirmation_number"), row.getString("hotel_id"),
    row.getLocalDate("start_date"), row.getLocalDate("end_date"),
    row.getShort("room_number"), row.getUuid("guest_id"));
}

This code implicitly uses the ResultSet.iterator() operation to get an Iterator over the rows in the result set and loop over each row, printing out the desired column values. Note that you use special accessors to obtain the value of each column depending on the desired type, in this case Row.getString(), getLocalDate(), getShort(), and getUuid(). As you might expect, this will print out a result such as:

confirmation_number: RS2G0Z, hotel_id: NY456, start_date: 2020-06-08, end_date: 2020-06-10, room_number: 111, guest_id: 1b4d86f4-ccff-4256-a63d-45c905df2677

Of course, you typically will set columns to values you receive as variables, rather than the hardcoded value used here. You can find code samples for working with SimpleStatements on the simple-statement-solution branch of the Reservation Service repository.

Prepared Statements

While SimpleStatements are quite useful for creating ad hoc queries, most applications tend to perform the same set of queries repeatedly. The PreparedStatement is designed to handle these queries more efficiently. The structure of the statement is sent to nodes a single time for preparation, and a handle for the statement is returned. To use the prepared statement, only the handle and the parameters need to be sent.

As you’re building your application, you’ll typically create PreparedStatements for reading data, corresponding to each access pattern you derive in your data model, plus others for writing data to your tables to support those access patterns.

Let’s create some PreparedStatements to represent the same reservation queries as before, using the CqlSession.prepare() operation:

PreparedStatement reservationInsertPrepared = cqlSession.prepare(
  "INSERT INTO reservations_by_confirmation (confirmation_number, hotel_id, start_date, end_date, room_number, guest_id) VALUES (?, ?, ?, ?, ?, ?)");

PreparedStatement reservationSelectPrepared = cqlSession.prepare(
  "SELECT * FROM reservations_by_confirmation WHERE confirmation_number=?");

Note that the PreparedStatement uses the same parameterized syntax used earlier for the SimpleStatement. A key difference, however, is that a PreparedStatement is not a subtype of Statement. This prevents the error of trying to pass an unbound PreparedStatement to the CqlSession to execute. Note that there is also a variant of CqlSession.prepare() that accepts a parameterized SimpleStatement as input.

Let’s take a step back and discuss what is happening behind the scenes of the CqlSession.prepare() operation:

  • The driver passes the contents of your PreparedStatement to a Cassandra node and gets back a unique identifier for the statement. This unique identifier is referenced when you create a BoundStatement. If you’re curious, you can actually see this reference by calling PreparedStatement.getId().

  • Once the driver prepares the statement on one node, it proceeds to prepare the statement on the other nodes in the cluster. Nodes keep track of prepared statements in an internal table so that they are present if the node goes down and comes back up.

  • The driver also provides the advanced.prepared-statements.reprepare-on-up configuration option. If re-preparation is enabled (the default), the driver will re-prepare statements on nodes that have come back up.

  • If the driver tries to execute a PreparedStatement on a node where it has not been prepared, the driver automatically prepares the statement, at the cost of an additional round trip between the driver and the node.
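The exchange described in this list can be modeled with a toy sketch that uses only the JDK. None of these classes belong to the driver: ToyNode stands in for a Cassandra node, and the use of an MD5 digest of the query text as the identifier is an assumption made for the sketch. The execute() method shows the fallback path, in which a node that does not recognize the identifier causes a re-prepare followed by a retry.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the prepare/execute exchange; these are not the driver's real classes.
final class PreparedStatementModel {

    // Stands in for a Cassandra node and its table of prepared statements.
    static final class ToyNode {
        private final Map<String, String> preparedById = new HashMap<>();

        // Stores the query text and returns its identifier.
        // Assumption for this sketch: the identifier is an MD5 digest of the text.
        String prepare(String query) {
            String id = md5Hex(query);
            preparedById.put(id, query);
            return id;
        }

        // Returns false when the node has never seen this identifier.
        boolean execute(String id) {
            return preparedById.containsKey(id);
        }
    }

    // Executes by identifier, re-preparing and retrying once if the node reports
    // that the statement is unknown. Returns the sequence of requests sent.
    static List<String> execute(ToyNode node, String id, String query) {
        List<String> requests = new ArrayList<>();
        requests.add("EXECUTE");
        if (!node.execute(id)) {
            requests.add("PREPARE");
            node.prepare(query);
            requests.add("EXECUTE");
            node.execute(id);
        }
        return requests;
    }

    private static String md5Hex(String text) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                .digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 should be available on every Java platform", e);
        }
    }
}
```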

You can think of a PreparedStatement as a template for creating queries. In addition to specifying the form of your query, there are other attributes that you can set on a PreparedStatement that will be used as defaults for statements it is used to create, including a default consistency level, retry policy, and tracing.

In addition to improving efficiency, PreparedStatements also improve security by separating the query logic of CQL from the data. This provides protection against injection attacks, which attempt to embed commands into data fields in order to gain unauthorized access.

Bound statement

Now your PreparedStatement is available to use to create queries. In order to make use of a PreparedStatement, you bind it with actual values by calling the bind() operation. For example, you can bind the SELECT statement created earlier as follows:

BoundStatement reservationSelectBound = reservationSelectPrepared.bind("RS2G0Z");

The bind() operation used here allows you to provide values that match each variable in the PreparedStatement. It is possible to provide the first n bound values, in which case the remaining values must be bound separately before executing the statement. There is also a version of bind() which takes no parameters, in which case all of the parameters must be bound separately. There are several set() operations provided by BoundStatement that can be used to bind values of different types. For example, you can take the INSERT prepared statement from above and bind each value by column name using operations such as setString() and setLocalDate():

BoundStatement reservationInsertBound = reservationInsertPrepared.bind()
  .setString("confirmation_number", "RS2G0Z")
  .setString("hotel_id", "NY456")
  .setLocalDate("start_date", LocalDate.parse("2020-06-08"))
  .setLocalDate("end_date", LocalDate.parse("2020-06-10"))
  .setShort("room_number", (short) 111)
  .setUuid("guest_id", UUID.fromString("1b4d86f4-ccff-4256-a63d-45c905df2677"));

Once you have bound all of the values, execute a BoundStatement using CqlSession.execute(). If you have failed to bind any of the values, they will be ignored on the server side, if protocol v4 (Cassandra 3.0 or later) is in use. The driver behavior for older protocol versions is to throw an IllegalStateException if there are any unbound values.

You can find code samples for working with PreparedStatement and BoundStatement on the prepared-statement-solution branch of the Reservation Service repository.

Query Builder

The driver also provides a QueryBuilder, which uses a fluent-style API for creating queries programmatically. This is especially useful for cases where there is variation in the query structure (such as optional parameters) that would make using PreparedStatements difficult. Similar to PreparedStatement, it also provides some protection against injection attacks.
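To see why variation in query structure is awkward to handle by hand, consider a search in which one criterion is optional. The sketch below uses only the JDK and assembles the query text and the matching list of values together. The ReservationSearch class is hypothetical, and it assumes that the reservations_by_hotel_date table can be restricted by an optional room_number column in addition to hotel_id and start_date. Each optional criterion doubles the number of distinct query strings you would otherwise need to prepare ahead of time, which is the bookkeeping the QueryBuilder is meant to take off your hands.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative sketch: class and method names are hypothetical, and no driver is involved.
final class ReservationSearch {
    final String cql;          // query text containing only ? markers
    final List<Object> values; // values in the same order as the markers

    private ReservationSearch(String cql, List<Object> values) {
        this.cql = cql;
        this.values = Collections.unmodifiableList(values);
    }

    // roomNumber is optional: pass null to search every room for that hotel and date.
    static ReservationSearch byHotelDate(String hotelId, LocalDate startDate, Short roomNumber) {
        StringBuilder cql = new StringBuilder(
            "SELECT * FROM reservation.reservations_by_hotel_date WHERE hotel_id = ? AND start_date = ?");
        List<Object> values = new ArrayList<>();
        values.add(hotelId);
        values.add(startDate);
        if (roomNumber != null) {
            // The clause and its value must be added together to keep the markers aligned.
            cql.append(" AND room_number = ?");
            values.add(roomNumber);
        }
        return new ReservationSearch(cql.toString(), values);
    }
}
```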

To use the QueryBuilder, you’ll need to include an additional dependency, for example in a Maven POM file:

<dependency>
  <groupId>com.datastax.oss</groupId>
  <artifactId>java-driver-query-builder</artifactId>
  <version>${driver.version}</version>
</dependency>

The QueryBuilder provides a set of static methods to facilitate building different types of statements represented by different classes. The common usage is to import the static methods of the QueryBuilder class:

import static com.datastax.oss.driver.api.querybuilder.QueryBuilder.*;

Importing methods statically improves code readability, as you’ll see as we look at some examples.

The QueryBuilder produces objects that implement the com.datastax.oss.driver.api.querybuilder.BuildableQuery interface and its sub-interfaces, such as Select, Insert, Update, Delete and others. The methods on these interfaces return objects that represent the content of a query as it is being built up. You’ll likely find your IDE quite useful in helping to identify the allowed operations as you’re building queries.

Let’s reproduce the queries from before using the QueryBuilder to see how it works. First, build a CQL INSERT query:

// Column values are Terms; literal() is one of the statically imported QueryBuilder methods
Insert reservationInsert = insertInto("reservation", "reservations_by_confirmation")
  .value("confirmation_number", literal("RS2G0Z"))
  .value("hotel_id", literal("NY456"))
  .value("start_date", literal("2020-06-08"))
  .value("end_date", literal("2020-06-10"))
  .value("room_number", literal(111))
  .value("guest_id", literal(UUID.fromString("1b4d86f4-ccff-4256-a63d-45c905df2677")));

SimpleStatement reservationInsertStatement = reservationInsert.build();

The first operation calls the QueryBuilder.insertInto() operation to create an Insert statement for the reservations_by_confirmation table. Then use the Insert.value() operation repeatedly to specify values for each column you are inserting. The Insert.build() operation returns a SimpleStatement you can then pass to CqlSession.execute().

The construction of the CQL SELECT command is similar:

Select reservationSelect = selectFrom("reservation", "reservations_by_confirmation")
  .all()
  .whereColumn("confirmation_number").isEqualTo(literal("RS2G0Z"));

SimpleStatement reservationSelectStatement = reservationSelect.build();

For this query, call QueryBuilder.selectFrom() to create a Select statement. You use the Select.all() operation to select all columns, although you could also have used the column() operation to select specific columns. Add a CQL WHERE clause via the Select.whereColumn() operation, to which you pass the name of the column and then add an equality check for the confirmation number using the isEqualTo() operation.

This sample demonstrates how you can use the QueryBuilder to create a PreparedStatement instead of a SimpleStatement, using the concept of a bind marker as a placeholder for a value to be specified when the PreparedStatement is bound:

Select reservationSelect = selectFrom("reservation", "reservations_by_confirmation")
  .all()
  .whereColumn("confirmation_number").isEqualTo(bindMarker());
PreparedStatement reservationSelectPrepared = cqlSession.prepare(reservationSelect.build());

// later
BoundStatement reservationSelectBound = reservationSelectPrepared.bind("RS2G0Z");

For a complete code sample using the QueryBuilder, see the query-builder-solution branch of the Reservation Service repository.

Object Mapper

We’ve explored several techniques for creating and executing query statements with the driver. There is one final technique to look at that provides a bit more abstraction. The Java driver provides an object mapper that allows you to focus on developing and interacting with domain models (or data types used on APIs). The object mapper works off of annotations in source code that are used to map Java classes to tables or user-defined types (UDTs). The object mapper is a useful tool for abstracting some of the details of interacting with Cassandra, especially if you have an existing domain model.

The mapper is provided in two separate libraries for use at compile time and runtime, so you will need to include additional Maven dependencies in order to use the mapper in your project. You’ll add the following dependency to the compile path of your application:

<dependency>
  <groupId>com.datastax.oss</groupId>
  <artifactId>java-driver-mapper-processor</artifactId>
  <version>${driver.version}</version>
</dependency>

You’ll also add the runtime library as a runtime dependency:

<dependency>
  <groupId>com.datastax.oss</groupId>
  <artifactId>java-driver-mapper-runtime</artifactId>
  <version>${driver.version}</version>
</dependency>

The mapper API is based on standard design patterns for data access, including entity classes and data access objects (DAOs). You create an entity class to represent each table in your design, a DAO interface to specify queries on entities, and a mapper interface that helps generate DAO instances. The mapper generates code based on the classes and interfaces you provide.

For a complete example of using the mapper, you’ll want to look at the mapper-solution branch of the Reservation Service repository. We’ll share some of the highlights here. Let’s begin by creating a ReservationsByConfirmation entity class which will represent rows in the reservations_by_confirmation table:

import java.time.LocalDate;
import java.util.UUID;

import com.datastax.oss.driver.api.mapper.annotations.Entity;
import com.datastax.oss.driver.api.mapper.annotations.PartitionKey;
import com.datastax.oss.driver.api.mapper.annotations.NamingStrategy;
import static com.datastax.oss.driver.api.mapper.entity.naming.NamingConvention.SNAKE_CASE_INSENSITIVE;

@Entity
@NamingStrategy(convention = SNAKE_CASE_INSENSITIVE)
public class ReservationsByConfirmation {

    @PartitionKey
    private String confirmationNumber;

    private String hotelId;
    private LocalDate startDate;
    private LocalDate endDate;
    private short roomNumber;
    private UUID guestId;

    // constructors, get/set methods, hashcode, equals
}

There are several annotations used in this example. The class is denoted as an @Entity, and also as having a @NamingStrategy, which is a way of specifying how the mapper should correlate Java identifiers to CQL. For example, you can specify a SNAKE_CASE_INSENSITIVE convention as above, which means that the mapper will convert Java-style class and member names to lowercase, with underscores separating words, which is the recommended CQL naming style. Thus the class name ReservationsByConfirmation will be mapped to the reservations_by_confirmation table, the confirmationNumber member will be mapped to the confirmation_number column, and so on.
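The name conversion described here is easy to picture with a few lines of plain Java. The following is only a sketch of the idea and not the mapper’s own implementation; the SnakeCaseSketch class is hypothetical, and it makes no attempt to treat runs of capitals (as in an acronym) specially.

```java
// Illustrative sketch of a camel-case to snake-case conversion; not the mapper's code.
final class SnakeCaseSketch {

    // Lowercases each capital letter and, except at the start, puts an underscore before it.
    static String toSnakeCase(String javaName) {
        StringBuilder cqlName = new StringBuilder();
        for (int i = 0; i < javaName.length(); i++) {
            char c = javaName.charAt(i);
            if (Character.isUpperCase(c)) {
                if (i > 0) {
                    cqlName.append('_');
                }
                cqlName.append(Character.toLowerCase(c));
            } else {
                cqlName.append(c);
            }
        }
        return cqlName.toString();
    }

    public static void main(String[] args) {
        System.out.println(toSnakeCase("ReservationsByConfirmation"));
        System.out.println(toSnakeCase("confirmationNumber"));
    }
}
```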

The Reservation Service uses an additional entity class ReservationsByHotelDate that is used with the reservations_by_hotel_date table. Its implementation is quite similar, so we won’t reproduce it here.

You can also create entity classes corresponding to User Defined Types (UDTs). If your domain model contains classes that reference other classes, you can annotate the referenced classes as user-defined types with the @Entity annotation. The Object Mapper processes objects recursively using your annotated types.

Next, you’ll create a DAO interface to represent queries on these entity classes:

import java.time.LocalDate;

import com.datastax.oss.driver.api.core.PagingIterable;
import com.datastax.oss.driver.api.mapper.annotations.*;

@Dao
public interface ReservationDao {

    @Select
    ReservationsByConfirmation findByConfirmationNumber(String confirmationNumber);

    @Query("SELECT * FROM ${tableId}")
    PagingIterable<ReservationsByConfirmation> findAll();

    @Insert
    void save(ReservationsByConfirmation reservationsByConfirmation);

    @Delete
    void delete(ReservationsByConfirmation reservationsByConfirmation);

    // Bind marker names match the @CqlName values on the parameters
    @Select(customWhereClause = "hotel_id = :hotel_id AND start_date = :start_date")
    PagingIterable<ReservationsByHotelDate> findByHotelDate(
            @CqlName("hotel_id") String hotelId,
            @CqlName("start_date") LocalDate date);

    @Insert
    void save(ReservationsByHotelDate reservationsByHotelDate);

    @Delete
    void delete(ReservationsByHotelDate reservationsByHotelDate);
}

The ReservationDao interface is annotated as a @Dao, and the various queries are marked with annotations such as @Select, @Insert, @Delete, and @Query.

The next step is to create a Mapper interface that can be used to obtain DAO instances:

import com.datastax.oss.driver.api.mapper.annotations.DaoFactory;
import com.datastax.oss.driver.api.mapper.annotations.Mapper;

@Mapper
public interface ReservationMapper {

    @DaoFactory
    ReservationDao reservationDao();

}

Annotate the interface with @Mapper and each operation that returns a DAO with @DaoFactory. When you compile the application, the object mapper interprets your annotations to create a ReservationMapperBuilder class that you can invoke to obtain an implementation of the ReservationMapper interface that wraps the CqlSession, and from there obtain an object implementing the ReservationDao interface:

ReservationMapper reservationMapper = new ReservationMapperBuilder(cqlSession).build();
ReservationDao reservationDao = reservationMapper.reservationDao();

Since the mapper and DAO objects are using your CqlSession, you should reuse them just as you do the CqlSession.

Now you can use the ReservationDao to perform queries using your entity classes. Create a ReservationsByConfirmation object using a simple constructor that you can save using the DAO:

ReservationsByConfirmation reservation = new ReservationsByConfirmation(
  "RS2G0Z", "NY456", LocalDate.parse("2020-06-08"), LocalDate.parse("2020-06-10"), (short) 111,
  UUID.fromString("1b4d86f4-ccff-4256-a63d-45c905df2677"));
reservationDao.save(reservation);

You can use the java.util.UUID.fromString() operation here for convenience; in most applications, the value would have been passed in via a remote invocation.

The ReservationDao.save() operation is all you need to execute to perform a CQL INSERT or UPDATE, as these are really the same operation to Cassandra. The ReservationDao builds and executes the statement on your behalf.

To retrieve a specific reservation, use the ReservationDao.findByConfirmationNumber() operation, passing in an argument list that matches the partition key:

ReservationsByConfirmation reservation = reservationDao.findByConfirmationNumber("RS2G0Z");

Deleting a reservation is also straightforward:

reservationDao.delete(reservation);

The Object Mapper documentation describes more advanced features, including DAO methods that execute asynchronously, the ability to configure CQL statement options such as TTL or consistency level, and customizing how the mapper handles annotations.

Asynchronous Execution

The CqlSession.execute() operation is synchronous, which means that it blocks until a result is obtained or an error occurs, such as a network timeout. The driver also provides the asynchronous executeAsync() operation to support non-blocking interactions with Cassandra. These non-blocking requests can make it simpler to send multiple queries in parallel to speed performance of your client application.

You could take any of the Statements from the examples above and execute it asynchronously:

CompletionStage<AsyncResultSet> resultStage = cqlSession.executeAsync(statement);

The result is of the CompletionStage type introduced in Java 8.

A Future is a Java generic type used to capture the result of an asynchronous operation. Each Future can be checked to see whether the operation has completed, and then queried for the result of the operation according to the bound type. There are also blocking get() operations to wait for the result. A Future can be cancelled if the caller is no longer interested in the result of the operation. The Future interface is a useful tool for implementing asynchronous programming patterns, but on its own it requires either blocking or polling to wait for the operation to complete.

To address this drawback, earlier versions of the Java driver used the ListenableFuture interface from Google’s Guava framework, which extends Future and adds an addListener() operation for registering a callback. The 4.x driver instead returns a CompletionStage, which lets you register callbacks through operations such as thenApply() and whenComplete() that are invoked when the operation completes. The callback is invoked in a thread managed by the driver, so it is important that it complete quickly to avoid tying up driver resources. The CompletionStage returned by executeAsync() is bound to the AsyncResultSet type.
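Because executeAsync() returns a CompletionStage, you can compose several requests with the standard java.util.concurrent tools. The sketch below uses only the JDK: lookupAsync() is a hypothetical stand-in for a call such as cqlSession.executeAsync(), and it completes with a string rather than an AsyncResultSet. What it illustrates is the pattern of starting every request before waiting on any of them and then gathering the results.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.stream.Collectors;

// Illustrative sketch: lookupAsync() is a hypothetical stand-in for an asynchronous query.
final class ParallelLookups {

    // Completes on another thread, as an asynchronous query would.
    static CompletionStage<String> lookupAsync(String confirmationNumber) {
        return CompletableFuture.supplyAsync(() -> "reservation " + confirmationNumber);
    }

    // Starts all lookups first, then gathers the results in request order.
    static CompletionStage<List<String>> lookupAll(List<String> confirmationNumbers) {
        List<CompletableFuture<String>> pending = confirmationNumbers.stream()
            .map(number -> lookupAsync(number).toCompletableFuture())
            .collect(Collectors.toList());
        return CompletableFuture.allOf(pending.toArray(new CompletableFuture[0]))
            // join() does not block here because allOf() has already completed
            .thenApply(done -> pending.stream()
                .map(CompletableFuture::join)
                .collect(Collectors.toList()));
    }

    public static void main(String[] args) {
        // Blocking at the very end is acceptable in a main method; a service would keep composing.
        List<String> results = lookupAll(List.of("RS2G0Z", "RS2G0Y")).toCompletableFuture().join();
        System.out.println(results);
    }
}
```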

Additional Asynchronous Operations

In addition to the CqlSession.executeAsync() operation, the driver supports several other asynchronous operations, including CqlSession.closeAsync(), CqlSession.prepareAsync(), and several operations on the object mapper. You can also build the CqlSession asynchronously using CqlSessionBuilder.buildAsync().

Driver Configuration

We’ve already looked at a few of the available options for configuring the driver, but now let’s take a step back and look at its overall configuration approach.

File-based configuration

While the CqlSession may be configured programmatically via the CqlSessionBuilder class, the Java driver also supports a file-based configuration approach based on the Typesafe Config project, an open source library that provides configuration for JVM languages. In most cases it is preferable to use configuration values based on a configuration file rather than programmatic statements. For example, the configuration values provided above could be specified in a configuration file such as the one provided for the Reservation Service:

datastax-java-driver {
  basic {
    contact-points = [ "127.0.0.1:9042", "127.0.0.2:9042" ]
    session-keyspace = reservation
  }
}

The configuration file above is written in the Human-Optimized Config Object Notation (HOCON) format. The Java driver uses the conventions of the Typesafe Config library for configuration file locations; it searches the Java classpath for files named application.conf, application.json, or application.properties. The configuration loader is a pluggable interface which you can override to create your own implementation.

Basic Configuration Options

The Java driver divides configuration values into two categories: basic configuration values that are customized most frequently, and advanced configuration values that are used less frequently. The basic options include the following:

  • Contact points and keyspace name, as above

  • A session-name that will be used in log messages and metrics (if none is provided, they will be generated in the form s1, s2, and so on for each distinct CqlSession created)

  • The config-reload-interval that specifies how often configuration values will be reloaded from the file (defaults to 5 minutes)

  • Default parameters applied to each request, including the request.timeout, the request.consistency (consistency level), and the request.page-size, which determines how many rows will be retrieved at a time for larger queries

  • The load-balancing-policy, which we’ll discuss in “Load Balancing”
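To make the basic options listed above concrete, here is a sketch of how they might appear together in an application.conf file. The option names come from the list above and from the driver’s reference configuration file; the values shown (the session name, timeout, consistency level, page size, and data center name) are illustrative assumptions rather than recommendations, so check them against the reference file for your driver version.

```hocon
datastax-java-driver {
  basic {
    contact-points = [ "127.0.0.1:9042" ]
    session-keyspace = reservation
    session-name = reservation-service     # hypothetical name used in logs and metrics
    config-reload-interval = 5 minutes
    request {
      timeout = 2 seconds                   # illustrative value
      consistency = LOCAL_QUORUM            # illustrative value
      page-size = 5000                      # illustrative value
    }
    load-balancing-policy {
      local-datacenter = datacenter1        # assumed data center name
    }
  }
}
```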

You can configure advanced options on a CqlSession including query execution, connection management, security, logging, and metrics. We’ll examine several of these options in later sections. The DataStax documentation provides a reference configuration file, which is an excellent resource for learning about all of the available configuration options.

Load Balancing

As discussed in Chapter 6, a query can be made to any node in a cluster, which is then known as the coordinator node for that query. Depending on the contents of the query, the coordinator may communicate with other nodes in order to satisfy the query. If a client were to direct all of its queries at the same node, this would produce an unbalanced load on the cluster, especially if other clients are doing the same.

To get around this issue, the driver provides a pluggable mechanism to balance the query load across multiple nodes. Load balancing is implemented by selecting an implementation of the com.datastax.oss.driver.api.core.loadbalancing.LoadBalancingPolicy interface.

Each LoadBalancingPolicy must provide a distance() operation to classify each node in the cluster as local, remote, or ignored, according to the HostDistance enumeration. The driver prefers interactions with local nodes and maintains more connections to local nodes than remote nodes. The other key operation is newQueryPlan(), which returns a list of nodes in the order they should be queried. The LoadBalancingPolicy interface also contains operations that are used to inform the policy when nodes are added or removed, or go up or down. These operations help the policy avoid including down or removed nodes in query plans.

Versions of the Java driver through the 3.x series provided multiple LoadBalancingPolicy implementations with a composable API that allowed a custom selection of behaviors. Beginning with the 4.0 release, the DataStax Java Driver ships with a single default LoadBalancingPolicy to simplify the developer experience. This default implementation reflects an opinionated point of view based on best practices observed from many deployments, including the following behaviors:

Round-robin queries

The policy allocates requests across the nodes in the cluster in a repeating pattern to spread the processing load (equivalent to the RoundRobinPolicy from the legacy driver).

Token awareness

The policy uses the token value of the partition key in order to select a node which is a replica for the desired data, thus minimizing the number of nodes that must be queried (equivalent to the TokenAwarePolicy from the legacy driver).

Data center awareness

The policy requires setting a local data center. The default load balancing policy will only include nodes in the local data center as part of its query plans. The local data center must be identified explicitly when building the CqlSession via the withLocalDataCenter() operation, or via the configuration property basic.load-balancing-policy.local-datacenter.

This is a difference from the legacy driver, which provided a DCAwareRoundRobinPolicy that would include remote nodes in query plans after local nodes. This was intended as a reliability mechanism in case all replicas in the local data center were unavailable. In practice, however, if all the replicas in a local DC are down, it is typically a broader outage at the data center level, and shifting traffic to other nodes has proven to have undesirable side effects and to be difficult to debug.

Should you wish to set a different default LoadBalancingPolicy, you may specify it when building a CqlSession via the withLoadBalancingPolicy() operation, or by configuring the properties in the basic.load-balancing-policy group.
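The round-robin and data center awareness behaviors can be illustrated with a small JDK-only model. This is a simplified sketch, not the driver’s implementation: the LocalRoundRobinSketch and NodeInfo classes are hypothetical, token awareness is left out, and the query plan is reduced to a list of node names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class LocalRoundRobinSketch {

  // Hypothetical minimal node description: a name plus its data center.
  static final class NodeInfo {
    final String name;
    final String dataCenter;

    NodeInfo(String name, String dataCenter) {
      this.name = name;
      this.dataCenter = dataCenter;
    }
  }

  private final List<NodeInfo> localNodes = new ArrayList<>();
  private final AtomicInteger nextStart = new AtomicInteger();

  // Only nodes in the local data center are kept; all others are ignored.
  LocalRoundRobinSketch(String localDataCenter, List<NodeInfo> allNodes) {
    for (NodeInfo node : allNodes) {
      if (node.dataCenter.equals(localDataCenter)) {
        localNodes.add(node);
      }
    }
  }

  // Returns node names in the order they would be tried for one request.
  // Each call starts one position later, spreading load across the nodes.
  List<String> newQueryPlan() {
    List<String> plan = new ArrayList<>();
    if (localNodes.isEmpty()) {
      return plan;
    }
    int start = Math.floorMod(nextStart.getAndIncrement(), localNodes.size());
    for (int i = 0; i < localNodes.size(); i++) {
      plan.add(localNodes.get((start + i) % localNodes.size()).name);
    }
    return plan;
  }

  public static void main(String[] args) {
    LocalRoundRobinSketch policy = new LocalRoundRobinSketch("dc1", Arrays.asList(
        new NodeInfo("a", "dc1"), new NodeInfo("b", "dc1"), new NodeInfo("c", "dc2")));
    System.out.println(policy.newQueryPlan());
    System.out.println(policy.newQueryPlan());
  }
}
```

Note that the node in the remote data center never appears in a plan, which mirrors the default policy’s decision to keep traffic within the local data center.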

Retrying Failed Queries

When Cassandra nodes fail or become unreachable, the driver automatically and transparently tries other nodes and schedules reconnection to the dead nodes in the background according to the configured reconnection policy. The reconnection policy is determined according to the advanced.reconnection-policy configuration options. Two reconnection policies are provided: the ExponentialReconnectionPolicy and the ConstantReconnectionPolicy.
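The difference between the two reconnection policies comes down to how the delay between attempts grows. The following JDK-only sketch models that schedule under stated assumptions: the ReconnectionDelaySketch class is hypothetical, the one-second base delay and 60-second maximum are example values, and any randomization a real policy might apply is ignored.

```java
import java.time.Duration;

class ReconnectionDelaySketch {

  // Exponential schedule: the delay doubles with each attempt until it reaches max.
  static Duration exponentialDelay(Duration base, Duration max, int attempt) {
    long maxMs = max.toMillis();
    long delayMs = base.toMillis();
    for (int i = 0; i < attempt && delayMs < maxMs; i++) {
      delayMs *= 2;
    }
    return Duration.ofMillis(Math.min(delayMs, maxMs));
  }

  // Constant schedule: the same delay is used for every attempt.
  static Duration constantDelay(Duration base, int attempt) {
    return base;
  }

  public static void main(String[] args) {
    Duration base = Duration.ofSeconds(1);   // example value
    Duration max = Duration.ofSeconds(60);   // example value
    for (int attempt = 0; attempt < 8; attempt++) {
      System.out.println("attempt " + attempt
          + ": exponential=" + exponentialDelay(base, max, attempt).getSeconds() + "s"
          + ", constant=" + constantDelay(base, attempt).getSeconds() + "s");
    }
  }
}
```

An exponential schedule avoids hammering a node that is down for an extended period, while a constant schedule detects recovery at a predictable interval.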

Because temporary changes in network conditions can also make nodes appear offline, the driver also provides a mechanism to retry queries that fail due to protocol or network-related errors. This removes the need to write retry logic in client code.

The driver retries failed queries according to the provided implementation of the com.datastax.oss.driver.api.core.retry.RetryPolicy interface. The onReadTimeout(), onWriteTimeout(), and onUnavailable() operations define the behavior that should be taken when a query fails with the protocol or network-related exceptions ReadTimeoutException, WriteTimeoutException, or UnavailableException, respectively. The onErrorResponse() operation describes the behavior for handling other recoverable server errors, and the onRequestAborted() operation handles cases in which the driver aborts a request before the server responds.

The RetryPolicy operations return a RetryDecision, which indicates whether the query should be retried, and if so, at what consistency level. If the query is not retried, the exception can be rethrown or ignored; if it is ignored, the query operation will return an empty ResultSet.

The 4.0 release of the driver provides a single opinionated implementation of the RetryPolicy based on best practices. Releases through 3.x included a FallthroughRetryPolicy that never recommended retries, and a DowngradingConsistencyRetryPolicy that downgrades the consistency level required on retries, as an attempt to get the query to succeed. The issue with the DowngradingConsistencyRetryPolicy was: if you are willing to accept a downgraded consistency level under some circumstances, do you really require a higher consistency level for the general case?

The RetryPolicy implementation can be overridden using the advanced.retry-policy configuration.
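The way a retry decision steers query execution can be modeled in a few lines of JDK-only Java. This is a simplified sketch and not the driver’s RetryPolicy API: the RetryLoopSketch class, its Decision enumeration, and the retryOnce() policy are hypothetical names, results are reduced to a list of strings, and changes of consistency level on retry are left out.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntFunction;
import java.util.function.Supplier;

class RetryLoopSketch {

  // Hypothetical outcomes a policy can choose after a failure.
  enum Decision { RETRY, RETHROW, IGNORE }

  // Hypothetical policy: retry the first failure, rethrow any later one.
  static Decision retryOnce(int retryCount) {
    return retryCount == 0 ? Decision.RETRY : Decision.RETHROW;
  }

  // Runs the query, consulting the policy after each failure.
  static List<String> execute(Supplier<List<String>> query, IntFunction<Decision> policy) {
    int retryCount = 0;
    while (true) {
      try {
        return query.get();
      } catch (RuntimeException e) {
        Decision decision = policy.apply(retryCount);
        if (decision == Decision.RETHROW) {
          throw e; // surface the error to the caller
        }
        if (decision == Decision.IGNORE) {
          return Collections.emptyList(); // swallow the error, return an empty result
        }
        retryCount++; // RETRY: loop around and run the query again
      }
    }
  }

  // Simulated query that fails the given number of times before returning rows.
  static Supplier<List<String>> failNTimes(int failures, List<String> rows) {
    AtomicInteger calls = new AtomicInteger();
    return () -> {
      if (calls.getAndIncrement() < failures) {
        throw new IllegalStateException("simulated timeout");
      }
      return rows;
    };
  }

  public static void main(String[] args) {
    List<String> rows = Arrays.asList("reservation-1");
    System.out.println(execute(failNTimes(1, rows), RetryLoopSketch::retryOnce));
    System.out.println(execute(failNTimes(1, rows), count -> Decision.IGNORE));
  }
}
```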

Speculative Execution

While it’s great to have a retry mechanism that automates the response to network timeouts, you don’t often have the luxury of being able to wait for timeouts or even long garbage collection pauses. To speed things up, the driver provides a speculative execution feature. If the original coordinator node for a query fails to respond in a predetermined interval, the driver can preemptively start an additional execution of the query against a different coordinator node. When one of the queries returns, the driver provides that response and cancels any other outstanding queries.

Speculative execution is disabled by default via the NoSpeculativeExecutionPolicy, but can be enabled on a CqlSession by setting the ConstantSpeculativeExecutionPolicy. Here’s an example of how you configure this policy in the configuration file by specifying a maximum number of executions and a constant delay between executions (in milliseconds):

advanced.speculative-execution-policy {
  class = ConstantSpeculativeExecutionPolicy
  max-executions = 3
  delay = 100 milliseconds
}

You may create your own policy by implementing the com.datastax.oss.driver.api.core.specex.SpeculativeExecutionPolicy interface.
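To see how a constant speculative execution schedule behaves, consider this JDK-only model. It is a sketch under stated assumptions rather than the driver’s implementation: the SpeculativeExecutionSketch class is hypothetical, coordinators are simulated as suppliers that sleep before answering, and failure handling is omitted (the simulated coordinators never throw).

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

class SpeculativeExecutionSketch {

  // Simulated coordinator that answers with its own name after latencyMs.
  static Supplier<String> node(String name, long latencyMs) {
    return () -> {
      try {
        Thread.sleep(latencyMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // execution was cancelled
      }
      return name;
    };
  }

  // Starts one execution per coordinator, spaced delayMs apart, up to maxExecutions
  // in total (including the first), and returns whichever response arrives first.
  static String execute(List<Supplier<String>> coordinators, int maxExecutions, long delayMs) {
    int executions = Math.min(maxExecutions, coordinators.size());
    if (executions < 1) {
      throw new IllegalArgumentException("at least one execution is required");
    }
    ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(executions);
    CompletableFuture<String> first = new CompletableFuture<>();
    try {
      for (int i = 0; i < executions; i++) {
        Supplier<String> coordinator = coordinators.get(i);
        scheduler.schedule(() -> {
          if (!first.isDone()) { // skip if an earlier execution has already answered
            first.complete(coordinator.get());
          }
        }, i * delayMs, TimeUnit.MILLISECONDS);
      }
      return first.join();
    } finally {
      scheduler.shutdownNow(); // cancel executions that are pending or still running
    }
  }

  public static void main(String[] args) {
    // The first coordinator is slow, so the second execution is expected to answer first.
    List<Supplier<String>> coordinators = Arrays.asList(node("node-a", 2000), node("node-b", 10));
    System.out.println(execute(coordinators, 3, 100));
  }
}
```

The trade-off is extra load on the cluster: every speculative execution is a real request, so this technique is best reserved for queries that are safe to execute more than once.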

Connection Pooling

Because the CQL native protocol is asynchronous, it allows multiple simultaneous requests per connection; the maximum is 128 simultaneous requests in protocol v2, while v3 and later allow up to 32,768 simultaneous requests. Because of this larger number of simultaneous requests, fewer connections per node are required. In fact, the default is a single connection per node.

The connection pool settings are configurable via the advanced.connection configuration options, including the number of connections to use for local and remote hosts, and the maximum number of simultaneous requests per connection (defaults to 1024). Unlike previous versions, the v4 driver does not automatically scale the number of connections up and down based on the number of requests per connection. You can, however, adjust these settings by updating the configuration file, and the changes will be applied the next time the configuration file is reloaded.

The driver uses a connection heartbeat to make sure that connections are not closed prematurely by intervening network devices. This defaults to 30 seconds but can be overridden using the advanced.heartbeat configuration options.
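As an illustration, the connection pool and heartbeat settings described above might be written as follows. The option names are taken from the driver’s reference configuration file as best we know it, and the values simply restate the defaults mentioned in the text; confirm both against the reference file for your driver version before relying on them.

```hocon
datastax-java-driver {
  advanced {
    connection {
      max-requests-per-connection = 1024
      pool {
        local.size = 1     # connections per node in the local data center
        remote.size = 1    # connections per node in remote data centers
      }
    }
    heartbeat {
      interval = 30 seconds
    }
  }
}
```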

Protocol Version

The driver supports multiple versions of the CQL native protocol. Cassandra 4.0 uses CQL native protocol version 5, while Cassandra 3.X releases support version 4.

By default, the driver negotiates the protocol version when establishing connections, even correctly handling connections to mixed clusters in which multiple versions of Cassandra are in use. You can force a protocol version using the advanced.protocol.version configuration option.

Compression

The driver provides the option of compressing messages between your client and Cassandra nodes, according to the compression options supported by the CQL native protocol. Enabling compression reduces network bandwidth consumed by the driver, at the cost of additional CPU usage for the client and server.

Currently there are two compression algorithms available, LZ4 and SNAPPY. The compression defaults to NONE but can be overridden by setting the advanced.protocol.compression configuration property.
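The protocol version and compression settings sit side by side in the configuration file. The fragment below is a sketch that pins the protocol to version 4 and enables LZ4; the values are examples, and the chosen compression algorithm may require its supporting library to be present on the classpath, so consult the driver documentation for the exact dependency.

```hocon
datastax-java-driver {
  advanced.protocol {
    version = V4          # omit this line to let the driver negotiate
    compression = lz4     # or snappy
  }
}
```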

Driver Security

The driver provides a pluggable authentication mechanism that can be used to support a simple username/password login, or integration with other authentication systems. By default, no authentication is performed. You can select an authentication provider by passing an implementation of the com.datastax.oss.driver.api.core.auth.AuthProvider interface such as the PlainTextAuthProvider to the CqlSessionBuilder.withAuthProvider() operation, or by setting the advanced.auth-provider section in your configuration file.
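For instance, a simple username/password login might be configured in the file as shown in this sketch. The credentials are placeholders, and in practice you would want to supply them from a secure source rather than committing them to a configuration file.

```hocon
datastax-java-driver {
  advanced.auth-provider {
    class = PlainTextAuthProvider
    username = reservation_service    # placeholder
    password = change-me              # placeholder
  }
}
```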

The driver can also encrypt its communications with the server to ensure privacy. Client-server encryption options are specified by each node in its cassandra.yaml file. The driver complies with the encryption settings specified by each node.

We’ll examine authentication, authorization, and encryption from both the client and server perspective in more detail in Chapter 14.

Execution Profiles

While some of the configuration values that we’ve examined can be overridden on individual Statements, many of them cannot. So what can you do when the configuration values chosen are appropriate for some of your queries, but not others? The driver allows you to create execution profiles, which are named sets of configuration values that can be applied to individual Statements as an overlay over the default configuration. To learn which configuration options can be set in a profile, see the reference configuration file.

For example, let’s say your default settings include a request timeout of one second and a consistency level of LOCAL_QUORUM. You could create an execution profile to use with requests that you want to give a stronger consistency by adding this to the profiles section of the configuration file:

datastax-java-driver {
  profiles {
    long_request {
      basic.request.timeout = 3 seconds
      basic.request.consistency = QUORUM
    }
  }
}

Then, you can apply the values to a Statement:

statement.setExecutionProfileName("long_request");

There is also a setExecutionProfileName() operation available when using the SimpleStatementBuilder. Or, if you create a PreparedStatement from a SimpleStatement (using CqlSession.prepare()), any execution profile you have set will be inherited by any BoundStatements created from the PreparedStatement.

Metadata

To access the cluster metadata, invoke the CqlSession.getMetadata() method, which returns an object implementing the com.datastax.oss.driver.api.core.metadata.Metadata interface. This object provides information about the cluster at a snapshot in time, including the nodes in the cluster, the tokens assigned to each node, and the schema including keyspaces and tables.

Node Discovery

A CqlSession maintains a control connection to the first node it connects with, which it uses to maintain information on the state and topology of the cluster. Using this connection, the driver will discover all the nodes currently in the cluster, and you can obtain this information through the Metadata.getNodes() operation, which returns a collection of com.datastax.oss.driver.api.core.metadata.Node objects representing each node. You can view the state of each node through the Node.getState() operation, or register an implementation of the com.datastax.oss.driver.api.core.metadata.NodeStateListener interface to receive callbacks when nodes are added or removed from the cluster, or when they go up or down. This state information is also viewable in the driver logs, which we’ll discuss below.

Schema Access

The Metadata class also allows the client to learn about the schema in a cluster, including operations that provide descriptions of individual keyspaces and tables. The schema version in use in a cluster can change over time as keyspaces and tables are created, altered, and deleted.

We discussed Cassandra’s support for eventual consistency at great length in Chapter 2. Because schema information is itself stored using Cassandra, it is also eventually consistent, and as a result it is possible for different nodes to have temporarily different versions of the schema. The driver has internal safeguards to check for schema agreement before initiating any statement that would change the schema.

The driver provides a notification mechanism for clients to learn about schema changes by registering a com.datastax.oss.driver.api.core.metadata.schema.SchemaChangeListener with the CqlSession as it is built using the withSchemaChangeListener() operation on the builder, or via the advanced.schema-change-listener configuration option.

In addition to the schema access we’ve just examined in the Metadata class, the Java driver also provides a facility for managing schema in the com.datastax.oss.driver.api.querybuilder package. The SchemaBuilder provides a fluent-style API for creating Statements representing operations such as CREATE, ALTER, and DROP operations on keyspaces, tables, indexes, and user-defined types (UDTs).

For example, you could create the reservations_by_confirmation table using the createTable() schema builder:

import static com.datastax.oss.driver.api.querybuilder.SchemaBuilder.createTable;
import com.datastax.oss.driver.api.core.type.DataTypes;

// Build and execute a CREATE TABLE IF NOT EXISTS statement in the reservation keyspace
cqlSession.execute(createTable("reservation", "reservations_by_confirmation")
  .ifNotExists()
  // confirmation_number is the partition key; the remaining columns are regular columns
  .withPartitionKey("confirmation_number", DataTypes.TEXT)
  .withColumn("hotel_id", DataTypes.TEXT)
  .withColumn("start_date", DataTypes.DATE)
  .withColumn("end_date", DataTypes.DATE)
  .withColumn("room_number", DataTypes.SMALLINT)
  .withColumn("guest_id", DataTypes.UUID)
  .build());

Managing Case Sensitive Identifiers with the Java Driver

As you learned in Chapter 4, CQL is case-insensitive by default. While the practice is generally discouraged, it is possible to create case-sensitive names for keyspaces, tables, and columns by using quotes around identifiers in CQL. In order to simplify the handling of case sensitivity, the Java driver uses the CqlIdentifier class as a wrapper for all identifiers in its schema API. If you are writing code that manipulates schema, it’s a good practice to make use of these identifiers as well. Java Driver APIs that accept identifiers as arguments support both Java String (as shown above) and CqlIdentifier formats (as shown in the Reservation Service implementation).
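The case rules that an identifier wrapper has to encode are simple enough to model directly. The following JDK-only sketch shows how an identifier written in CQL text maps to the name that is actually stored; the CqlIdentifierCaseSketch class and its toInternal() method are hypothetical and only approximate what a class such as CqlIdentifier does for you.

```java
import java.util.Locale;

class CqlIdentifierCaseSketch {

  // Converts an identifier as written in CQL text to the name that is stored.
  static String toInternal(String cql) {
    if (cql.length() >= 2 && cql.startsWith("\"") && cql.endsWith("\"")) {
      // Quoted: case is preserved; a doubled quote inside stands for a single quote.
      return cql.substring(1, cql.length() - 1).replace("\"\"", "\"");
    }
    // Unquoted: case-insensitive, so the name is folded to lowercase.
    return cql.toLowerCase(Locale.ROOT);
  }

  public static void main(String[] args) {
    System.out.println(toInternal("Hotel_ID"));
    System.out.println(toInternal("\"Hotel_ID\""));
  }
}
```

Because the unquoted and quoted forms of the same text can refer to different names, passing identifiers around as wrapper objects rather than raw strings removes any ambiguity about which form was intended.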

Debugging and Monitoring

The driver provides features for monitoring and debugging your client’s use of Cassandra, including facilities for logging and metrics. There are also capabilities for query tracing and tracking slow queries, which you’ll learn about in Chapter 13.

Driver Logging

As you will learn in Chapter 11, Cassandra uses a logging API called Simple Logging Facade for Java (SLF4J). The Java driver uses the SLF4J API for logging as well. In order to enable logging on your Java client application, you need to provide a compliant SLF4J implementation on the classpath, such as Logback (used by the Reservation service) or Log4J. The Java driver provides information at multiple levels; the ERROR, WARN, and INFO levels are the most useful to application developers.

You configure logging by taking advantage of Logback’s configuration mechanism, which supports separate configuration for test and production environments. Logback inspects the classpath first for the file logback-test.xml representing the test configuration, and then if no test configuration is found, it searches for the file logback.xml. Here’s an example extract from a logback.xml configuration file that enables the INFO log level for the Java Driver:

<configuration>
  <!-- other appenders and loggers -->
  <logger name="com.datastax.oss.driver" level="INFO"/>
</configuration>

For more detail on Logback configuration, including sample configuration files for test and production environments, see the configuration page or the Reservation Service implementation.

Driver Metrics

Sometimes it can be helpful to monitor the behavior of client applications over time in order to detect abnormal conditions and debug errors. The Java driver collects metrics on its activities and makes these available using the Dropwizard Metrics library. The driver reports metrics on connections, task queues, queries, and errors such as connection errors, read and write timeouts, retries, and speculative executions. A full list of metrics is available in the reference configuration.

You can access the Java driver metrics locally via the CqlSession.getMetrics() operation. The Metrics library can also integrate with the Java Management Extensions (JMX) to allow remote monitoring of metrics. JMX reporting is disabled by default in the v4 drivers (it was enabled by default in v3), but can be configured.

Other Cassandra Drivers

DataStax Python Driver

The DataStax Python Driver was introduced in 2014, replacing the Pycassa client built on Cassandra’s legacy Thrift interface as the primary Python driver for Cassandra. The driver supports Python 2.7 as well as current Python 3 versions back to 3.4. You can install the driver by running the Python installer pip:

$ pip install cassandra-driver

The Python driver includes an object mapper called cqlengine and makes use of third party libraries for performance, compression, and metrics. The driver source is available on GitHub.

DataStax Node.js Driver

The DataStax Node.js Driver was introduced in October 2014, based on the node-cassandra-cql project developed by Jorge Bay.

The Node.js driver is installed via the node package manager (NPM):

$ npm install cassandra-driver

As with other DataStax Drivers, the source code is available on GitHub.

DataStax C# Driver

First released in July 2013, the DataStax C# driver provides support for Windows clients using the .NET framework. For this reason, it is also frequently referred to as the “.NET Driver.”

The C# Driver is available on NuGet, the package manager for the Microsoft development platform. Within PowerShell, run the following command at the Package Manager Console:

PM> Install-Package CassandraCSharpDriver

To use the driver, create a new project in Visual Studio and add a using directive that references the Cassandra namespace. The C# driver integrates with Language Integrated Query (LINQ), a Microsoft .NET Framework component that adds query capabilities to .NET languages; there is a separate object mapper available as well.

DataStax C/C++ Driver

The DataStax C/C++ Driver was released in February 2014. The C/C++ Driver is a bit different than the other drivers in that its API focuses on asynchronous operations to the exclusion of synchronous operations.

The C/C++ driver uses the libuv library for asynchronous I/O operations, and optionally uses the OpenSSL library if needed for encrypted client-node connections. Instructions for compilation and linking vary by platform, so see the driver documentation for details.

DataStax Ruby and PHP Drivers

DataStax also has drivers available for Ruby and PHP, although these are considered to be in maintenance mode and are updated only for critical bug fixes.

JDBC and ODBC Drivers

Open Database Connectivity (ODBC) is a standard developed by Microsoft that allows applications to access data using SQL. Java Database Connectivity (JDBC) is a Java API that provides an SQL abstraction - see the java.sql package. JDBC and ODBC drivers are available from vendors including Simba and Progress Software.

GoCQL Driver

The Go language, created at Google, has seen a rapid increase in popularity for server applications since its public introduction in 2009. Its syntax is similar to C, but the language adds improvements in memory management and concurrency.

GoCQL is an open source driver for the Go language. It is under active development but provides many of the same features as the DataStax drivers, including connection management, statement execution, paging, batches, and more.

Summary

You should now understand the various drivers available for Cassandra, the features they provide, and how to install and use them. We gave particular attention to the DataStax Java driver in order to get some hands-on experience, which should serve you well even if you choose to use one of the other DataStax or community drivers. You’ll continue to learn other driver features in the coming chapters as we discuss more details of reading and writing.
