Chapter 11. Monitoring

The term observability is often used to describe a desirable attribute of distributed systems. Observability means having visibility into the various components of a system in order to detect, predict, and perhaps even prevent the complex failures that can occur in distributed systems. Failures in individual components can affect other components in turn, and multiple failures can interact in unforeseen ways, leading to system-wide outages. Common elements of an observability strategy for a system include metrics, logging, and tracing.

In this chapter, you’ll learn how Cassandra supports these elements of observability and how to use available tools to monitor and understand important events in the life cycle of your Cassandra cluster. We’ll look at some simple ways to see what’s going on, such as changing the logging levels and understanding the output.

To begin, let’s discuss how Cassandra uses the Java Management Extensions (JMX) to expose information about its internal operations and allow the dynamic configuration of some of its behavior. That will give you a basis to learn how to monitor Cassandra with various tools.

Monitoring Cassandra with JMX

Cassandra makes use of Java Management Extensions (JMX) to enable remote management of your nodes. JMX started as Java Specification Request (JSR) 160 and has been a core part of Java since version 5.0. You can read more about the JMX implementation in Java by examining the java.lang.management package.

JMX is a Java API that provides management of applications in two key ways. First, it allows you to understand your application’s health and overall performance in terms of memory, threads, and CPU usage—things that apply to any Java application. Second, it allows you to work with specific aspects of your application that you have instrumented.

Instrumentation refers to putting a wrapper around application code that provides hooks from the application to the JVM in order to allow the JVM to gather data that external tools can use. Such tools include monitoring agents, data analysis tools, profilers, and more. JMX allows you not only to view such data but also, if the application enables it, to manage your application at runtime by updating values.

Many popular Java applications are instrumented using JMX, including the JVM itself, the Tomcat application server, and Cassandra. Figure 11-1 shows the JMX architecture as used by Cassandra.

cdg3 1101
Figure 11-1. The JMX architecture

The JMX architecture is simple. The JVM collects information from the underlying operating system. The JVM itself is instrumented, so many of its features are exposed including memory management and garbage collection, threading and deadlock detection, classloading, and logging.

An instrumented Java application (such as Cassandra) runs on top of this, also exposing some of its features as manageable objects. The Java Development Kit (JDK) includes an MBean server that makes the instrumented features available over a remote protocol to a JMX Management Application. The JVM also offers management capabilities via the Simple Network Management Protocol (SNMP), which may be useful if you are using SNMP-based monitoring tools such as Nagios or Zenoss.

Connecting Remotely via JMX

By default, Cassandra runs with JMX enabled for local access only. To enable remote access, edit the file <cassandra-home>/conf/cassandra-env.sh (or cassandra-env.ps1 on Windows). Search for “JMX” to find the section of the file with options to control the JMX port and other local/remote connection settings.

Within a given application, you can manage only what the application developers have made available for you to manage. Luckily, the Cassandra developers have instrumented large parts of the database engine, making management via JMX fairly straightforward.
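
For example, here’s a minimal sketch of a JMX client that connects to a Cassandra node and lists the MBean domains it exposes. The class name is just for illustration, and it assumes JMX is listening on the default port of 7199 on the local machine with authentication disabled; adjust the host, port, and credentials for your environment:

import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class JmxDomainLister
{
    public static void main(String[] args) throws Exception
    {
        // Cassandra's default JMX port is 7199; this assumes local access
        // with authentication disabled
        JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url))
        {
            MBeanServerConnection connection = connector.getMBeanServerConnection();
            System.out.println("MBeans registered: " + connection.getMBeanCount());
            for (String domain : connection.getDomains())
            {
                System.out.println(domain);
            }
        }
    }
}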

Cassandra’s MBeans

A managed bean, or MBean, is a special type of Java bean that represents a single manageable resource inside the JVM. MBeans interact with an MBean server to make their functions remotely available. Many classes in Cassandra are exposed as MBeans, which means in practical terms that they implement a custom interface that describes attributes they expose and operations that need to be implemented and for which the JMX agent will provide hooks.

For example, let’s look at Cassandra’s CompactionManager from the org.apache.cassandra.db.compaction package and how it uses MBeans. Here’s a portion of the definition of the CompactionManagerMBean class, with comments omitted for brevity:

public interface CompactionManagerMBean
{
    public List<Map<String, String>> getCompactions();
    public List<String> getCompactionSummary();
    public TabularData getCompactionHistory();

    public void forceUserDefinedCompaction(String dataFiles);
    public void stopCompaction(String type);
    public void stopCompactionById(String compactionId);

    public int getCoreCompactorThreads();
    public void setCoreCompactorThreads(int number);

    ...
}

Some simple values in the application are exposed as attributes. An example of this is the coreCompactorThreads attribute, for which getter and setter operations are provided. Other attributes are read-only: the compactions currently in progress, the compactionSummary, and the compactionHistory. You can refresh to see the most recent values, but that’s pretty much all you can do with them. Because these values are maintained internally by the JVM, it doesn’t make sense to set them externally (they’re derived from actual events and are not configurable).

MBeans can also make operations available to the JMX agent that let you execute some useful action. The forceUserDefinedCompaction() and stopCompaction() methods are operations that you can use to force a compaction to occur or stop a running compaction from a JMX client.

As you can see by this MBean interface definition, there’s no magic going on. This is just a regular interface defining the set of operations. The CompactionManager class implements this interface and does the work of registering itself with the MBean server for the JMX attributes and operations that it maintains locally:

public static final String MBEAN_OBJECT_NAME =
    "org.apache.cassandra.db:type=CompactionManager";

static
{
    instance = new CompactionManager();
    MBeanWrapper.instance.registerMBean(instance, MBEAN_OBJECT_NAME);
}

Note that the MBean is registered in the domain org.apache.cassandra.db with a type of CompactionManager. The attributes and operations exposed by this MBean appear under org.apache.cassandra.db > CompactionManager in JMX clients.
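
As an illustration, here’s a hedged sketch of how a JMX client might use this object name to create a proxy for the CompactionManagerMBean, reusing a connection obtained as in the earlier example. The proxy approach assumes the Cassandra classes (for example, the cassandra-all jar) are on the client classpath; the commented getAttribute() call shows an alternative that avoids that dependency:

import javax.management.JMX;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

import org.apache.cassandra.db.compaction.CompactionManagerMBean;

public class CompactionInspector
{
    // Sketch only: assumes 'connection' was obtained as in the earlier JMX example
    static void printCompactions(MBeanServerConnection connection) throws Exception
    {
        ObjectName name = new ObjectName("org.apache.cassandra.db:type=CompactionManager");

        // Create a typed client-side proxy backed by the remote MBean
        CompactionManagerMBean compactionManager =
            JMX.newMBeanProxy(connection, name, CompactionManagerMBean.class);

        for (String summary : compactionManager.getCompactionSummary())
        {
            System.out.println(summary);
        }

        // Alternative without Cassandra classes on the classpath:
        // connection.getAttribute(name, "CompactionSummary");
    }
}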

In the following sections, you’ll learn about some of the key MBeans that Cassandra exposes to allow monitoring and management via JMX. Many of these MBeans correspond to the services and managers introduced in Chapter 6. In most cases, the operations and attributes exposed by the MBeans are accessible via nodetool commands discussed throughout this book.

Database MBeans

These are the Cassandra classes related to the core database itself that are exposed to clients in the org.apache.cassandra.db domain. There are many MBeans in this domain, but we’ll focus on a few key ones related to the data the node is responsible for storing, including caching, the commit log, and metadata about specific tables.

Storage Service MBean

Because Cassandra is a database, it’s essentially a very sophisticated storage program; therefore, Cassandra’s storage engine as implemented in the org.apache.cassandra.service.StorageService is an essential focus of monitoring and management. The corresponding MBean for this service is the StorageServiceMBean, which provides many useful attributes and operations.

The MBean exposes identifying information about the node, including its host ID, the cluster name, and the partitioner in use. It also allows you to inspect the node’s OperationMode, which reports normal if everything is going smoothly (other possible states are leaving, joining, decommissioned, and client). These attributes are used by nodetool commands such as describecluster and info.

You can also view the current set of live nodes, as well as the set of unreachable nodes in the cluster. If any nodes are unreachable, Cassandra will tell you their IP addresses in the UnreachableNodes attribute.

To get an understanding of how much data is stored on each node, you can use the getLoadMapWithPort() method, which will return a Java Map whose keys are IP addresses and whose values are the corresponding storage loads. You can also use the effectiveOwnershipWithPort(String keyspace) operation to access the percentage of the data in a keyspace owned by each node. This information is used in the nodetool ring command.

If you’re looking for which nodes own a certain partition key, you can use the getNaturalEndpointsWithPort() operation. Pass it the keyspace name, table name and the partition key for which you want to find the endpoint value, and it will return a list of IP addresses (with port number) that are responsible for storing this key.

You can also use the describeRingWithPortJMX() operation to get a list of token ranges in the cluster including their ownership by nodes in the cluster. This is used by the nodetool describering operation.

There are many standard maintenance operations that the StorageServiceMBean affords you, including resumeBootstrap(), joinRing(), flush(), truncate(), repairAsync(), cleanup(), scrub(), drain(), removeNode(), decommission(), and operations to start and stop gossip, and the native transport. We’ll dig into the nodetool commands that correspond to these operations in Chapter 12.

If you want to change Cassandra’s log level at runtime without interrupting service (as you’ll see in “Logging”), you can invoke the setLoggingLevel(String classQualifier, String level) method.
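
For example, a JMX client could enable DEBUG logging for Cassandra’s gossip classes without restarting the node. Here’s a sketch that assumes the MBean is registered under the object name org.apache.cassandra.db:type=StorageService and reuses a connection obtained as in the earlier example; the same effect is available more conveniently via nodetool setlogginglevel:

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

public class LogLevelChanger
{
    // Sketch only: assumes 'connection' was obtained as in the earlier JMX example
    static void enableGossipDebugLogging(MBeanServerConnection connection) throws Exception
    {
        ObjectName storageService =
            new ObjectName("org.apache.cassandra.db:type=StorageService");

        // Invoke setLoggingLevel(String classQualifier, String level) remotely
        connection.invoke(storageService,
                          "setLoggingLevel",
                          new Object[] { "org.apache.cassandra.gms", "DEBUG" },
                          new String[] { "java.lang.String", "java.lang.String" });
    }
}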

Storage Proxy MBean

As you learned in Chapter 6, the org.apache.cassandra.service.StorageProxy provides a layer on top of the StorageService to handle client requests and inter-node communications. The StorageProxyMBean provides the ability to check and set timeout values for various operations including read and write. Along with many other attributes exposed by Cassandra’s MBeans, these timeout values would originally be specified in the cassandra.yaml file. Setting these attributes takes effect only until the next time the node is restarted, whereupon they’ll be initialized to the values in the configuration file.

This MBean also provides access to hinted handoff settings such as the maximum time window for storing hints. Hinted handoff statistics include getTotalHints() and getHintsInProgress(). You can disable hints for nodes in a particular data center with the disableHintsForDC() operation.

You can also turn this node’s participation in hinted handoff on or off via setHintedHandoffEnabled(), or check the current status via getHintedHandoffEnabled(). These are used by nodetool’s enablehandoff, disablehandoff, and statushandoff commands, respectively.

Hints Service MBean

In addition to the hinted handoff operations on the StorageServiceMBean, Cassandra provides more hinted handoff controls via the org.apache.cassandra.hints.HintsServiceMBean.

This MBean exposes the ability to pause and resume hint delivery to all nodes with pauseDispatch() and resumeDispatch(). You can delete stored hints for all nodes with the deleteAllHints() operation, or for a specific node with deleteAllHintsForEndpoint(). These operations are used by nodetool’s pausehandoff, resumehandoff, and truncatehints commands.

Column Family Store MBean

Cassandra registers an instance of the org.apache.cassandra.db.ColumnFamilyStoreMBean for each table stored in the node under org.apache.cassandra.db > Tables (this is a legacy name: tables were known as column families in early versions of Cassandra).

The ColumnFamilyStoreMBean provides access to the compaction and compression settings for each table. This allows you to temporarily override these settings on a specific node. The values will be reset to those configured on the table schema when the node is restarted.

The MBean also exposes a lot of information about the node’s storage of data for this table on disk. The getSSTableCountPerLevel() operation provides a list of how many SSTables are in each tier. The estimateKeys() operation provides an estimate of the number of partitions stored on this node. Taken together, this information can give you some insight as to whether invoking the forceMajorCompaction() operation for this table might help free space on this node and increase read performance.
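
Here’s a hedged sketch of reading a couple of these values for the reservations_by_confirmation table over JMX, again reusing a connection obtained as in the earlier example. The object name follows the org.apache.cassandra.db > Tables grouping described above, but you should verify the exact naming pattern for your Cassandra version in a JMX client such as JConsole:

import java.util.Arrays;

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

public class TableStoreInspector
{
    // Sketch only: assumes 'connection' was obtained as in the earlier JMX example
    // and that the table MBean is registered under the name pattern shown here
    static void inspectTable(MBeanServerConnection connection) throws Exception
    {
        ObjectName table = new ObjectName(
            "org.apache.cassandra.db:type=Tables," +
            "keyspace=reservation,table=reservations_by_confirmation");

        // estimateKeys() is exposed as a no-argument operation
        Object partitionEstimate =
            connection.invoke(table, "estimateKeys", new Object[0], new String[0]);
        System.out.println("Estimated partitions: " + partitionEstimate);

        // getSSTableCountPerLevel() is exposed as the SSTableCountPerLevel attribute
        // (it may be null for tables that don't use leveled compaction)
        int[] perLevel = (int[]) connection.getAttribute(table, "SSTableCountPerLevel");
        System.out.println("SSTables per level: " + Arrays.toString(perLevel));
    }
}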

There is also a trueSnapshotsSize() operation that allows you to determine the size of SSTable snapshots that are no longer active. A large value here indicates that you should consider deleting these snapshots, possibly after making an archive copy.

Because Cassandra stores indexes as tables, there is also a ColumnFamilyStoreMBean instance for each indexed column, available under org.apache.cassandra.db > IndexTables (previously IndexColumnFamilies), with the same attributes and operations.

Commit Log MBean

The org.apache.cassandra.db.commitlog.CommitLogMBean exposes attributes and operations that allow you to learn about the current state of commit logs. The CommitLogMBean also exposes the recover() operation which can be used to restore database state from archived commit log files.

The default settings that control commit log recovery are specified in the conf/commitlog_archiving.properties file, but can be overridden via the MBean. You’ll learn more about data recovery in Chapter 12.

Compaction Manager MBean

We’ve already taken a peek inside the source of the org.apache.cassandra.db.compaction.CompactionManagerMBean to see how it interacts with JMX, but we didn’t really talk about its purpose. This MBean provides statistics about compactions performed in the past, as well as the ability to force compaction of specific SSTable files we identify by calling the forceUserDefinedCompaction() method of the CompactionManager class. This MBean is leveraged by nodetool commands including compact, compactionhistory, and compactionstats.

Cache Service MBean

The org.apache.cassandra.service.CacheServiceMBean provides access to Cassandra’s key cache, row cache, and counter cache under the domain org.apache.cassandra.db > Caches. The information available for each cache includes the maximum size and time duration to cache items, and the ability to invalidate each cache.

Internal MBeans

The final MBeans we’ll consider describe internal operations of the Cassandra node, including threading, garbage collection, security, and exposing metrics.

Thread Pool MBeans

Cassandra’s thread pools are implemented via the JMXEnabledThreadPoolExecutor and JMXConfigurableThreadPoolExecutor classes in the org.apache.cassandra.concurrent package. The MBeans for each stage implement the JMXEnabledThreadPoolExecutorMBean and JMXConfigurableThreadPoolExecutorMBean interfaces, respectively, which allow you to view and configure the number of core threads in each thread pool as well as the maximum number of threads. The MBeans for each type of thread pool appear under the org.apache.cassandra.internal domain to JMX clients.

Garbage Collection MBeans

The JVM’s garbage collection processing can impact tail latencies if not tuned properly, so it’s important to monitor its performance, as we’ll see in Chapter 13. The GCInspectorMXBean appears in the org.apache.cassandra.service domain. It exposes the operation getAndResetStats() which retrieves and then resets garbage collection metrics that Cassandra collects on its JVM, which is used by the nodetool gcstats command. It also exposes attributes that control the thresholds at which INFO and WARN logging messages are generated for long GC pauses.

Security MBeans

The org.apache.cassandra.auth domain defines the AuthCacheMBean, which exposes operations used to configure how Cassandra caches client authentication records. We’ll discuss this MBean in Chapter 14.

Metrics MBeans

The ability to access metrics related to application performance, health, and key activities has become an essential tool for maintaining web-scale applications. Fortunately, Cassandra collects a wide range of metrics on its own activities to help us understand its behavior. These metrics come in several different styles, including counters, gauges, meters, histograms, and timers.

To make its metrics accessible via JMX, Cassandra uses the Dropwizard Metrics open source Java library. It uses the org.apache.cassandra.metrics.CassandraMetricsRegistry to register its metrics with the Dropwizard Metrics library, which in turn exposes them as MBeans in the org.apache.cassandra.metrics domain. In “Metrics” below, you’ll see a summary of the specific metrics that Cassandra reports and learn how these can be exposed to metrics aggregation frameworks.
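
To see the mechanism at work outside of Cassandra, here’s a minimal sketch that uses the Dropwizard Metrics library directly to expose a counter and a timer over JMX. It assumes the metrics-core and metrics-jmx modules are on the classpath (in older Metrics releases, JmxReporter lives in metrics-core under a different package); Cassandra’s own registration goes through CassandraMetricsRegistry rather than this exact code:

import com.codahale.metrics.Counter;
import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.Timer;
import com.codahale.metrics.jmx.JmxReporter;

public class MetricsJmxExample
{
    public static void main(String[] args) throws Exception
    {
        MetricRegistry registry = new MetricRegistry();

        // A counter and a timer, two of the metric styles mentioned above
        Counter requests = registry.counter("requests");
        Timer latency = registry.timer("request-latency");

        // Expose everything in the registry as MBeans under a custom JMX domain
        JmxReporter reporter = JmxReporter.forRegistry(registry)
                                          .inDomain("com.example.metrics")
                                          .build();
        reporter.start();

        // Record some activity so there is something to see in a JMX client
        requests.inc();
        try (Timer.Context timed = latency.time())
        {
            Thread.sleep(100);
        }

        Thread.sleep(60_000); // keep the JVM alive long enough to browse with JConsole
        reporter.stop();
    }
}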

Monitoring with nodetool

We’ve already explored a few of the commands offered by nodetool in previous chapters, but let’s take this opportunity to get properly acquainted.

nodetool ships with Cassandra and can be found in <cassandra-home>/bin. This is a command-line program that offers a rich array of ways to look at your cluster, understand its activity, and modify it. nodetool lets you get statistics about the cluster, see the ranges each node maintains, move data from one node to another, decommission a node, and even repair a node that’s having trouble.

Behind the scenes, nodetool uses JMX to access the MBeans described above using a helper class called org.apache.cassandra.tools.NodeProbe. The NodeProbe class connects to the JMX agent at a specified node by its IP address and JMX port, locates MBeans, retrieves their data, and invokes their operations. The NodeToolCmd class in the same package is an abstract class that is extended by each nodetool command to provide access to administrative functionality in an interactive command-line interface.

nodetool uses the same environment settings as the Cassandra daemon: bin/cassandra.in.sh and conf/cassandra-env.sh on Unix (or bin/cassandra.in.bat and conf/cassandra-env.ps1 on Windows). The logging settings are found in the conf/logback-tools.xml file; these work the same way as the Cassandra daemon logging settings found in conf/logback.xml.

Starting nodetool is a breeze. Just open a terminal, navigate to <cassandra-home>, and enter the following command:

$ bin/nodetool help

This causes the program to print a list of available commands, several of which we will cover momentarily. Running nodetool with no arguments is equivalent to the help command. You can also execute help with the name of a specific command to get additional details.

Connecting to a Specific Node

With the exception of the help command, nodetool must connect to a Cassandra node in order to access information about that node or the cluster as a whole.

You can use the -h option to identify the IP address of the node to connect to with nodetool. If no IP address is specified, the tool attempts to connect to the default port on the local machine.

If you have a ccm cluster available as discussed in Chapter 10, you can run nodetool commands against specific nodes, for example:

$ ccm node1 nodetool help

To get more interesting statistics from a cluster as you try out the commands in this chapter yourself, you might want to run your own instance of the Reservation Service introduced in Chapter 7 and Chapter 8.

Getting Cluster Information

You can get a variety of information about the cluster and its nodes, which we look at in this section. You can get basic information on an individual node or on all the nodes participating in a ring.

describecluster

The describecluster command prints out basic information about the cluster, including the name, snitch, and partitioner. For example, here’s a portion of the output when run against the cluster created for the Reservation Service using ccm:

$ ccm node1 nodetool describecluster

Cluster Information:
	Name: reservation_service
	Snitch: org.apache.cassandra.locator.SimpleSnitch
	DynamicEndPointSnitch: enabled
	Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
	Schema versions:
		2b88dbfd-6e40-3ef1-af11-d88b6dff2c3b: [127.0.0.4, 127.0.0.3, 127.0.0.2, 127.0.0.1]
...

We’ve shortened the output a bit for brevity. The “Schema versions” portion of the output is especially important for identifying any disagreements in table definitions, or “schema,” between nodes. While Cassandra propagates schema changes through a cluster, any differences are typically resolved quickly, so any lingering schema differences usually correspond to a node that is down or unreachable and needs to be restarted, which you should be able to confirm via the summary statistics on nodes that are also printed out.

status

A more direct way to identify the nodes in your cluster and what state they’re in is to use the status command:

$ ccm node1 nodetool status

Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load        Tokens  Owns (effective)  Host ID                               Rack
UN  127.0.0.1  251.77 KiB  256     48.7%             d23716cb-a6e2-423a-a387-f26f5b623ae7  rack1
UN  127.0.0.2  250.28 KiB  256     50.0%             635f2ab7-e81a-423b-a566-674d8010c819  rack1
UN  127.0.0.3  250.47 KiB  256     53.6%             a1cd5663-02a3-4038-8c34-0ce6de397da7  rack1
UN  127.0.0.4  403.46 KiB  256     47.7%             b493769e-3dce-47d7-bae6-1bed956cf560  rack1

The status is organized by data center and rack. Each node’s status is identified by a two-character code, in which the first character indicates whether the node is up (currently available and ready for queries) or down, and the second character indicates the state or operational mode of the node. The load column represents the byte count of the data each node is holding. The owns column indicates the effective percentage of the token range owned by the node, taking replication into account.

info

The info command tells nodetool to connect with a single node and get basic data about its current state. Just pass it the address of the node you want info for:

$ ccm node2 nodetool info

ID                     : 635f2ab7-e81a-423b-a566-674d8010c819
Gossip active          : true
Native Transport active: true
Load                   : 250.28 KiB
Generation No          : 1574894395
Uptime (seconds)       : 146423
Heap Memory (MB)       : 191.69 / 495.00
Off Heap Memory (MB)   : 0.00
Data Center            : datacenter1
Rack                   : rack1
Exceptions             : 0
Key Cache              : entries 10, size 896 bytes, capacity 24 MiB, 32 hits, 44 requests, 0.727 recent hit rate, 14400 save period in seconds
Row Cache              : entries 0, size 0 bytes, capacity 0 bytes, 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds
Counter Cache          : entries 0, size 0 bytes, capacity 12 MiB, 0 hits, 0 requests, NaN recent hit rate, 7200 save period in seconds
Chunk Cache            : entries 16, size 256 KiB, capacity 91 MiB, 772 misses, 841 requests, 0.082 recent hit rate, NaN microseconds miss latency
Percent Repaired       : 100.0%
Token                  : (invoke with -T/--tokens to see all 256 tokens)

The information reported includes the memory and disk usage (“Load”) of the node and the status of various services offered by Cassandra. You can also check the status of individual services using the nodetool commands statusgossip, statusbinary, and statushandoff (note that handoff status is not part of info).

ring

To determine what nodes are in your ring and what state they’re in, use the ring command on nodetool, like this:

$ ccm node1 nodetool ring

Datacenter: datacenter1
==========
Address    Rack        Status State   Load            Owns                Token
                                                                          9218490134647118760
127.0.0.1  rack1       Up     Normal  251.77 KiB      48.73%              -9166983985142207552
127.0.0.4  rack1       Up     Normal  403.46 KiB      47.68%              -9159867377343852899
127.0.0.2  rack1       Up     Normal  250.28 KiB      49.99%              -9159653278489176223
127.0.0.1  rack1       Up     Normal  251.77 KiB      48.73%              -9159520114055706114
...

This output is organized in terms of vnodes. Here we see the IP addresses of all the nodes in the ring. In this case, the four nodes of our cluster are all up (currently available and ready for queries). The load column represents the byte count of the data each node is holding. The output of the describering command is similar but is organized around token ranges.

Other useful status commands provided by nodetool include:

  • The getlogginglevels and setlogginglevel commands allow dynamic configuration of logging levels, using Logback’s JMXConfiguratorMBean.

  • The gossipinfo command prints the information this node disseminates about itself and has obtained from other nodes via gossip, while failuredetector provides the Phi failure detection value calculated for other nodes.

  • The version command prints the version of Cassandra this node is running.

Getting Statistics

nodetool also lets you gather statistics about the state of your server at the aggregate level as well as down to the level of specific keyspaces and tables. Two of the most frequently used commands are tpstats and tablestats, both of which we examine now.

Using tpstats

The tpstats tool gives us information on the thread pools that Cassandra maintains. Cassandra is highly concurrent, and optimized for multiprocessor/multicore machines, so understanding the behavior and health of the thread pools is important to good Cassandra maintenance.

To find statistics on the thread pools, execute nodetool with the tpstats command:

$ ccm node1 nodetool tpstats

Pool Name                         Active   Pending      Completed   Blocked  All time blocked
ReadStage                              0         0            399         0                 0
MiscStage                              0         0              0         0                 0
CompactionExecutor                     0         0          95541         0                 0
MutationStage                          0         0              0         0                 0
...

Message type           Dropped                  Latency waiting in queue (micros)
                                             50%               95%               99%               Max
READ_RSP                     0              0.00              0.00              0.00              0.00
RANGE_REQ                    0              0.00              0.00              0.00              0.00
PING_REQ                     0              0.00              0.00              0.00              0.00
_SAMPLE                      0              0.00              0.00              0.00              0.00

The top portion of the output presents data on tasks in each of Cassandra’s thread pools. You can see directly how many operations are in what stage, and whether they are active, pending, or completed. For example, by reviewing the number of active tasks in the MutationStage, you can learn how many writes are in progress.

The bottom portion of the output indicates the number of dropped messages for the node. Dropped messages are an indicator of Cassandra’s load shedding implementation, which each node uses to defend itself when it receives more requests than it can handle. For example, internode messages that are received by a node but not processed within the configured timeout are dropped rather than processed, as the coordinator node will no longer be waiting for a response.

Seeing lots of zeros in the output for blocked tasks and dropped messages means that you either have very little activity on the server or that Cassandra is doing an exceptional job of keeping up with the load. Many nonzero values are indicative of situations where Cassandra is having a hard time keeping up, and may indicate a need for some of the techniques described in Chapter 13.

Using tablestats

To see overview statistics for keyspaces and tables, you can use the tablestats command. You may also recognize this command by its previous name, cfstats. Here is sample output for the reservations_by_confirmation table:

$ ccm node1 nodetool tablestats reservation.reservations_by_confirmation

Total number of tables: 43
----------------
Keyspace : reservation
	Read Count: 0
	Read Latency: NaN ms
	Write Count: 0
	Write Latency: NaN ms
	Pending Flushes: 0
		Table: reservations_by_confirmation
		SSTable count: 0
		Old SSTable count: 0
		Space used (live): 0
		Space used (total): 0
		Space used by snapshots (total): 0
		Off heap memory used (total): 0
		SSTable Compression Ratio: -1.0
		Number of partitions (estimate): 0
		Memtable cell count: 0
		Memtable data size: 0
		Memtable off heap memory used: 0
		Memtable switch count: 0
		Local read count: 0
		Local read latency: NaN ms
		Local write count: 0
		Local write latency: NaN ms
		Pending flushes: 0
		Percent repaired: 100.0
		Bloom filter false positives: 0
		Bloom filter false ratio: 0.00000
		Bloom filter space used: 0
		Bloom filter off heap memory used: 0
		Index summary off heap memory used: 0
		Compression metadata off heap memory used: 0
		Compacted partition minimum bytes: 0
		Compacted partition maximum bytes: 0
		Compacted partition mean bytes: 0
		Average live cells per slice (last five minutes): NaN
		Maximum live cells per slice (last five minutes): 0
		Average tombstones per slice (last five minutes): NaN
		Maximum tombstones per slice (last five minutes): 0
		Dropped Mutations: 0

We can see the read and write latency and total number of reads and writes. We can also see detailed information about Cassandra’s internal structures for the table, including memtables, Bloom filters and SSTables. You can get statistics for all the tables in a keyspace by specifying just the keyspace name, or specify no arguments to get statistics for all tables in the cluster.

Virtual Tables

In the 4.0 release, Cassandra added a virtual tables feature. Virtual tables are so named because they are not actual tables that are stored using Cassandra’s typical write path with data written to memtables and SSTables. Instead, these virtual tables are views that provide metadata about nodes and tables via standard CQL.

This metadata is available via two keyspaces, system_views and system_virtual_schema, which you may have noticed in earlier chapters when we used the DESCRIBE KEYSPACES command:

cqlsh> DESCRIBE KEYSPACES;

reservation    system_traces  system_auth  system_distributed     system_views
system_schema  system       system_virtual_schema

These two keyspaces contain virtual tables that provide different types of metadata. Before we look into them, here are a couple of important things you should know about virtual tables:

  • You may not define your own virtual tables.

  • The scope of virtual tables is the local node.

  • When interacting with virtual tables through cqlsh, results will come from the node that cqlsh is connected to, as you’ll see next.

  • Virtual tables are not persisted, so any statistics will be reset when the node restarts.

System Virtual Schema

Let’s look first at the tables in the system_virtual_schema keyspace:

cqlsh:system> USE system_virtual_schema;
cqlsh:system_virtual_schema> DESCRIBE TABLES;

keyspaces  columns  tables

If you examine the schema and contents of the keyspaces table, you’ll see that it is extremely simple, consisting of just a list of keyspace names:

cqlsh:system_virtual_schema> SELECT * FROM KEYSPACES;

 keyspace_name
-----------------------
           reservation
          system_views
 system_virtual_schema

(3 rows)

The design of the tables table is quite similar, consisting of keyspace_name, table_name, and comment columns, where the primary key is (keyspace_name, table_name).

The columns table is more interesting. We’ll focus on a subset of the available columns:

cqlsh:system_virtual_schema> SELECT column_name, clustering_order, kind, position, type
  FROM columns WHERE keyspace_name = 'reservation' and table_name = 'reservations_by_hotel_date';

 column_name         | clustering_order | kind          | position | type
---------------------+------------------+---------------+----------+----------
 confirmation_number |             none |       regular |       -1 |     text
            end_date |             none |       regular |       -1 |     date
            guest_id |             none |       regular |       -1 |     uuid
            hotel_id |             none | partition_key |        0 |     text
         room_number |              asc |    clustering |        0 | smallint
          start_date |             none | partition_key |        1 |     date

(6 rows)

As you can see, this query provides enough data to describe the schema of a table including the columns and primary key definition. Although it does not include the table options, the response is otherwise quite similar in content to the output of the cqlsh DESCRIBE operations.

Interestingly, cqlsh traditionally scanned tables in the system keyspace to implement these operations, but was updated in the 4.0 release to use virtual tables.

System Views

The second keyspace containing virtual tables is the system_views keyspace. Let’s get a list of the available views:

cqlsh:system_virtual_schema> SELECT * from tables where keyspace_name = 'system_views';

 keyspace_name | table_name                | comment
---------------+---------------------------+-----------------------------
  system_views |                    caches |               system caches
  system_views |                   clients | currently connected clients
  system_views |  coordinator_read_latency |
  system_views |  coordinator_scan_latency |
  system_views | coordinator_write_latency |
  system_views |                disk_usage |
  system_views |         internode_inbound |
  system_views |        internode_outbound |
  system_views |        local_read_latency |
  system_views |        local_scan_latency |
  system_views |       local_write_latency |
  system_views |        max_partition_size |
  system_views |             rows_per_read |
  system_views |                  settings |            current settings
  system_views |             sstable_tasks |       current sstable tasks
  system_views |              thread_pools |
  system_views |       tombstones_per_read |

(17 rows)

As you can see, there is a mix of tables that provide latency histograms for reads and writes to local storage and when this node is acting as a coordinator.

The max_partition_size and tombstones_per_read tables are particularly useful in helping to identify some of the situations that lead to poor performance in Cassandra clusters, which we’ll address in Chapter 12.

The disk_usage view provides the storage used by each table, expressed in mebibytes (1,048,576 bytes). Remember, this is the amount of storage used for each table on that individual node. Related to this is the max_partition_size view, which could be useful in determining whether a node is affected by a wide partition. We’ll learn more about how to detect and address these issues in Chapter 13.

Let’s look a bit more closely at a couple of these tables. First, let’s have a look at the clients table:

cqlsh:system_virtual_schema> USE system_views;
cqlsh:system_views> SELECT * FROM clients;

 address   | port  | connection_stage | driver_name | driver_version | hostname  | protocol_version | request_count | ssl_cipher_suite | ssl_enabled | ssl_protocol | username
-----------+-------+------------------+-------------+----------------+-----------+------------------+---------------+------------------+-------------+--------------+-----------
 127.0.0.1 | 50631 |            ready |        null |           null | localhost |                4 |           261 |             null |       False |         null | anonymous
 127.0.0.1 | 50632 |            ready |        null |           null | localhost |                4 |           281 |             null |       False |         null | anonymous

As you can see, this table provides information about each client with an active connection to the node including its location and number of requests. This information is useful to make sure the list of clients and their level of usage is in line with what you expect for your application.

Another useful table is settings. This allows you to see the values of configurable parameters for the node, whether set via the cassandra.yaml file or subsequently modified via JMX:

cqlsh:system_views> SELECT * FROM settings LIMIT 10;

 name                                         | value
----------------------------------------------+--------------------------------------------------------------------
                 allocate_tokens_for_keyspace |                                                               null
 allocate_tokens_for_local_replication_factor |                                                               null
         audit_logging_options_audit_logs_dir | /Users/jeffreycarpenter/.ccm/reservation_service/node1/logs/audit/
                audit_logging_options_enabled |                                                              false
    audit_logging_options_excluded_categories |
     audit_logging_options_excluded_keyspaces |                         system,system_schema,system_virtual_schema
         audit_logging_options_excluded_users |
    audit_logging_options_included_categories |
     audit_logging_options_included_keyspaces |
         audit_logging_options_included_users |

(10 rows)

You’ll notice that much of this same information could be accessed via various nodetool commands. However, the value of virtual tables is that they may be accessed through any client using the CQL native protocol, including applications you write using the DataStax Java Drivers. Of course, some of these values you may not wish to allow your clients access to; we’ll discuss how to secure access to keyspaces and tables in Chapter 14.
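
For instance, here’s a short sketch that uses the DataStax Java Driver (4.x API) to read the settings virtual table. Because the scope of virtual tables is the local node, the rows returned describe whichever node coordinates the query; the contact point and data center name below are assumptions you’d replace with values for your own cluster:

import java.net.InetSocketAddress;

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.ResultSet;
import com.datastax.oss.driver.api.core.cql.Row;

public class VirtualTableReader
{
    public static void main(String[] args)
    {
        // Assumed contact point and data center; substitute your own values
        try (CqlSession session = CqlSession.builder()
                .addContactPoint(new InetSocketAddress("127.0.0.1", 9042))
                .withLocalDatacenter("datacenter1")
                .build())
        {
            // Results reflect the node that coordinates this query
            ResultSet rs = session.execute(
                "SELECT name, value FROM system_views.settings LIMIT 10");
            for (Row row : rs)
            {
                System.out.printf("%s = %s%n", row.getString("name"), row.getString("value"));
            }
        }
    }
}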

More Virtual Table Functionality

While the 4.0 release provides a very useful set of virtual tables, the Cassandra community has come up with many other virtual table requests, which you can find in the Cassandra Jira.

We expect that the data available via virtual tables will eventually catch up with JMX, and even surpass it in some areas.

If you’re interested in the implementation of virtual tables, you can find the code in the org.apache.cassandra.db.virtual package. For more detail on using virtual tables, see Alexander Dejanovski’s blog post Virtual tables are coming in Cassandra 4.0.

Metrics

As we mentioned at the start of this chapter, metrics are a vital component of the observability of the system. It’s important to have access to metrics at the OS, JVM, and application level. The metrics Cassandra reports at the application level include:

  • Buffer pool metrics describing Cassandra’s use of memory.

  • CQL metrics including the number of prepared and regular statement executions.

  • Cache metrics for key, row, and counter caches such as the number of entries versus capacity, as well as hit and miss rates.

  • Client metrics including the number of connected clients, and information about client requests such as latency, failures, and timeouts.

  • Commit log metrics including the commit log size and statistics on pending and completed tasks.

  • Compaction metrics including the total bytes compacted and statistics on pending and completed compactions.

  • Connection metrics to each node in the cluster including gossip.

  • Dropped message metrics, which are used as part of nodetool tpstats.

  • Read repair metrics describing the number of background versus blocking read repairs performed over time.

  • Storage metrics, including counts of hints in progress and total hints.

  • Streaming metrics, including the total incoming and outgoing bytes of data streamed to other nodes in the cluster.

  • Thread pool metrics, including active, completed, and blocked tasks for each thread pool.

  • Table metrics, including caches, memtables, SSTables, and Bloom filter usage and the latency of various read and write operations, reported at one, five, and fifteen minute intervals.

  • Keyspace metrics that aggregate the metrics for the tables in each keyspace.

Many of these metrics are used by nodetool commands such as tpstats, tablehistograms, and proxyhistograms. For example, tpstats is simply a presentation of the thread pool and dropped message metrics.

Resetting Metrics

Note that in Cassandra releases through 4.0, the metrics reported are lifetime metrics since the node was started. To reset the metrics on a node, you have to restart it. The JIRA issue CASSANDRA-8433 requests the ability to reset the metrics via JMX and nodetool.

Logging

While you can learn a lot about the overall health of your cluster from metrics, logging provides a way to get more specific detail about what’s happening in your database so that you can investigate and troubleshoot specific issues. Cassandra uses the Simple Logging Facade for Java (SLF4J) API for logging, with Logback as the implementation. SLF4J provides a facade over various logging frameworks such as Logback, Log4J, and Java’s built-in logger (java.util.logging). You can learn more about Logback at http://logback.qos.ch/. Cassandra’s default logging configuration is found in the file <cassandra-home>/conf/logback.xml.

Logback is built around the concepts of loggers and appenders. Each class in a Java application has a dedicated logger, plus there are loggers for each level of the package hierarchy as well as a root logger. This allows fine-grained control over logging: you can configure the log level for a particular class, for any level in the package hierarchy, or even at the root level. Logback uses a progression of log levels: TRACE < DEBUG < INFO < WARN < ERROR. When you configure a log level for a logger, messages at that log level and greater will be output via appenders (which we’ll introduce below). You can see how the logging level for Cassandra’s classes is set in the logback.xml file:

  <logger name="org.apache.cassandra" level="DEBUG"/>

Note that the root logger defaults to the INFO logging level, so that is the level at which all other classes will report.
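
To make the logger hierarchy concrete, here’s a small sketch of how a class obtains its dedicated logger through SLF4J; the class is hypothetical and not part of Cassandra:

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class ReservationAuditor
{
    // The logger's name is the fully qualified class name, so its effective
    // level is inherited from the nearest configured ancestor logger
    // (ultimately the root logger at INFO if nothing more specific is set)
    private static final Logger logger = LoggerFactory.getLogger(ReservationAuditor.class);

    void recordLookup(String confirmationNumber)
    {
        // Suppressed unless this logger's effective level is DEBUG or lower
        logger.debug("Looking up reservation {}", confirmationNumber);

        // Emitted under the default root logger level of INFO
        logger.info("Reservation {} accessed", confirmationNumber);
    }
}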

An appender in Logback is responsible for taking generated log messages that match a provided filter and outputting them to some location. According to the default configuration found in logback.xml, Cassandra provides appenders for logging into three different files:

  • The system.log contains logging messages at the INFO logging level and greater. You’ve already seen some of the contents of the system.log in Chapter 10 as you were starting and stopping a node, so you know that this log will contain information about nodes joining and leaving a cluster. It also contains information about schema changes.

  • The debug.log contains more detailed messages useful for debugging, incorporating the DEBUG log level and above. This log can be pretty noisy but provides a lot of useful information about internal activity within a node, including memtable flushing and compaction.

  • The gc.log contains messages related to the JVM’s garbage collection. This is a standard Java GC log file and is particularly useful for identifying long GC pauses. We’ll discuss GC tuning in Chapter 13.

The default configuration also describes an appender for the console log, which you can access in the terminal window where you start Cassandra with the -f flag (which keeps output visible in the foreground of the terminal window).

By default, Cassandra’s log files are stored in the logs directory underneath the Cassandra installation directory. If you want to change the location of the logs directory, you can override this value using the CASSANDRA_LOG_DIR environment variable when starting Cassandra, or you could edit the logback.xml file directly.

The default configuration does not pick up changes to the logging settings on a live node. You can ask Logback to rescan the configuration file once a minute, by setting properties in the logback.xml file:

<configuration scan="true" scanPeriod="60 seconds">

You may also view the log levels on a running node through the nodetool getlogginglevels command and override the log level for the logger at any level of the Java package and class hierarchy using nodetool setlogginglevel.

Other settings in the logback.xml file support rolling log files. By default, the logs are configured to use the SizeAndTimeBasedRollingPolicy. Each log file will be rolled to an archive once it reaches a size of 50 MB or at midnight, whichever comes first, with a maximum of 5GB across all system logs. For example, look at the configuration of the rolling policy for the system.log:

<rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
  <fileNamePattern>${cassandra.logdir}/system.log.%d{yyyy-MM-dd}.%i.zip</fileNamePattern>
  <maxFileSize>50MB</maxFileSize>
  <maxHistory>7</maxHistory>
  <totalSizeCap>5GB</totalSizeCap>
</rollingPolicy>

Each log file archive is compressed in zip format and named according to the pattern described above, which will lead to files named system.log.2020-05-30.0.zip, system.log.2020-05-30.1.zip, and so on. These archived log files are kept for 7 days by default. The default settings may well be suitable for development and production environments; just make sure you account for the storage that will be required in your planning.

Examining Log Files

You can examine log files in order to determine things that are happening with your nodes. One of the most important tasks in monitoring Cassandra is to regularly check log files for statements at the WARN and ERROR log levels. Several of the conditions under which Cassandra generates WARN log messages are configurable via the cassandra.yaml file:

  • The tombstone_warn_threshold property sets the maximum number of tombstones that Cassandra will encounter on a read before generating a warning. This value defaults to 1000.

  • The batch_size_warn_threshold_in_kb property sets the maximum size of the total data in a batch command, which is useful for detecting clients that might be trying to insert a quantity of data in a batch that will negatively impact the performance of the coordinator node. The default value is 5 KB.

  • The gc_warn_threshold_in_ms property sets the maximum garbage collection pause that will cause a warning log. This defaults to 1000ms (1 second), and the corresponding setting for INFO log messages gc_log_threshold_in_ms is set at 200ms.

Here’s an example of a message you might find in the logs for a query that encounters a large number of tombstones:

WARN  [main] 2020-04-08 14:30:45,111 ReadCommand.java:598 - Read 0 live rows and 3291 tombstone cells for query SELECT * FROM reservation.reservations_by_hotel_date (see tombstone_warn_threshold)

Log Aggregation and Distributed Tracing

As with the use of metrics aggregation, it’s also frequently helpful to aggregate logs across multiple microservices and infrastructure components in order to analyze threads of related activity correlated by time. There are many commercial log aggregation solutions available, and the ELK stack consisting of Elasticsearch, Logstash and Kibana is a popular combination of open source projects used for log aggregation and analysis.

An additional step beyond basic aggregation is the ability to perform distributed traces of calls throughout a system. This involves incorporating a correlation ID into metadata passed on remote calls or messages between services. The correlation ID is a unique identifier, typically assigned by a service at the entry point into the system. The correlation ID can be used as a search criteria through aggregated logs to identify work performed across a system associated with a particular request. You’ll learn more about tracing with Cassandra in Chapter 13.
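
As a hedged illustration of the technique, here’s how a Java service might attach a correlation ID to its log output using SLF4J’s Mapped Diagnostic Context (MDC). The key name is an assumption, and Logback will only include the value in log output if the appender’s pattern references %X{correlationId}:

import java.util.UUID;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.MDC;

public class CorrelationIdExample
{
    private static final Logger logger = LoggerFactory.getLogger(CorrelationIdExample.class);

    // Hypothetical entry point: reuse an incoming ID if one was passed in,
    // otherwise mint a new one for this request
    void handleRequest(String incomingCorrelationId)
    {
        String correlationId = incomingCorrelationId != null
            ? incomingCorrelationId
            : UUID.randomUUID().toString();
        MDC.put("correlationId", correlationId);
        try
        {
            logger.info("Creating reservation"); // searchable by ID in aggregated logs
            // ... call downstream services, passing correlationId in request metadata ...
        }
        finally
        {
            MDC.remove("correlationId");
        }
    }
}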

You can also observe the regular operation of the cluster through the log files. For example, you could connect to a node in a cluster started using ccm as described in Chapter 10 and write a simple value to the database using cqlsh:

$ ccm node2 cqlsh
cqlsh> INSERT INTO reservation.reservations_by_confirmation (confirmation_number, hotel_id, start_date, end_date, room_number, guest_id) VALUES ('RS2G0Z', 'NY456', '2020-06-08', '2020-06-10', 111, 1b4d86f4-ccff-4256-a63d-45c905df2677);

If you execute this command, cqlsh will use node2 as the coordinator (a behavior enabled by the underlying Python driver), so we can check the logs for node2:

$ tail ~/.ccm/reservation_service/node2/logs/system.log
INFO  [Messaging-EventLoop-3-5] 2019-12-07 16:00:45,542 OutboundConnection.java:1135 - 127.0.0.2:7000(127.0.0.3:7000)->127.0.0.3:7000(127.0.0.3:62706)-SMALL_MESSAGES-627a8d80 successfully connected, version = 12, framing = CRC, encryption = disabled
INFO  [Messaging-EventLoop-3-10] 2019-12-07 16:00:45,545 OutboundConnection.java:1135 - 127.0.0.2:7000(127.0.0.4:7000)->127.0.0.4:7000(127.0.0.4:62707)-SMALL_MESSAGES-5bc34c55 successfully connected, version = 12, framing = CRC, encryption = disabled
INFO  [Messaging-EventLoop-3-8] 2019-12-07 16:00:45,593 InboundConnectionInitiator.java:450 - 127.0.0.1:7000(127.0.0.2:62710)->127.0.0.2:7000-SMALL_MESSAGES-9e9a00e9 connection established, version = 12, framing = CRC, encryption = disabled
INFO  [Messaging-EventLoop-3-7] 2019-12-07 16:00:45,593 InboundConnectionInitiator.java:450 - 127.0.0.3:7000(127.0.0.2:62709)->127.0.0.2:7000-SMALL_MESSAGES-e037c87e connection established, version = 12, framing = CRC, encryption = disabled

This output shows connections initiated from node2 to the other nodes in the cluster to write replicas, and the corresponding responses. If you examine the debug.log, you’ll see similar information, but not the details of the specific query that was executed.

Full Query Logging

If you want more detail on exact CQL query strings that are used by client applications, use the full query logging feature introduced in Cassandra 4.0. The full query log is a binary log which is designed to be extremely fast and add the minimum possible overhead to your queries. Full query logging is also useful for live traffic capture and replay.

To enable full query logging on a node, create a directory to hold the logs and then set the full_query_logging_options in the cassandra.yaml file to point to the directory:

full_query_logging_options:
  log_dir: /var/tmp/fql_logs

Other configuration options allow you to control how often the log is rolled over to a new file (hourly by default), specify a command used to archive the log files, and set a limit on the size of the full query logs. The full query log will not be enabled until you run the nodetool enablefullquerylog command.

Cassandra provides a tool to read these logs, tools/bin/fqltool. Here’s an example of what the output looks like after running some simple queries:

$ tools/bin/fqltool dump /var/tmp/fql_logs
Type: single-query
Query start time: 1575842591188
Protocol version: 4
Generated timestamp:-9223372036854775808
Generated nowInSeconds:1575842591
Query: INSERT INTO reservation.reservations_by_confirmation (confirmation_number, hotel_id, start_date, end_date, room_number, guest_id) VALUES ('RS2G0Z', 'NY456', '2020-06-08', '2020-06-10', 111, 1b4d86f4-ccff-4256-a63d-45c905df2677);
Values:

Type: single-query
Query start time: 1575842597849
Protocol version: 4
Generated timestamp:-9223372036854775808
Generated nowInSeconds:1575842597
Query: SELECT * FROM reservation.reservations_by_confirmation ;
Values:

Once you’re done collecting full query logs, run the nodetool disablefullquerylog command.

Summary

In this chapter, we looked at ways you can monitor and manage your Cassandra cluster. In particular, we went into some detail on JMX and learned the rich variety of operations Cassandra makes available to the MBean server. We saw how to use nodetool, virtual tables, metrics, and logs to view what’s happening in your Cassandra cluster. You are now ready to learn how to perform routine maintenance tasks to help keep your Cassandra cluster healthy.
