Cassandra in-memory data structures

To understand how Cassandra uses the available system memory, it is important to understand its in-memory data structures, which are as follows.

Index summary

The index is a map of row keys to the locations in the SSTables where the actual data resides. Because of its size, it is expensive to keep the whole index in memory; instead, Cassandra keeps a sampled summary of it, the index summary, in memory.
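
As a rough sketch of how the sampling granularity is tuned (the exact knob has moved between releases, so treat this as illustrative): older releases expose it as a global setting in cassandra.yaml, while newer ones make it a per-table schema option.

    # cassandra.yaml (older releases): keep roughly one index entry in memory
    # for every 128 entries in the on-disk index
    index_interval: 128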

Bloom filter

A bloom filter is an in-memory structure kept per SSTable that helps Cassandra avoid a disk seek. Before the index is scanned for the data, the bloom filter is consulted to check whether the row is present in that SSTable. A bloom filter returns Boolean advice about whether the data may be in the SSTable; it can return a false positive, but never a false negative. If a false positive happens, we read the SSTable and come back without any row, which is fine since the filter is only an optimization. The bloom filter's false-positive chance can be set at the column family level.
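
For example, assuming a hypothetical keyspace and table, the false-positive chance is set per column family through CQL; a lower value means fewer wasted SSTable reads at the cost of a larger filter in memory:

    ALTER TABLE my_keyspace.my_table
      WITH bloom_filter_fp_chance = 0.01;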

Compression metadata

Compression metadata records where each compressed block starts. This metadata has to be maintained so that, to read a row, we can find the block that holds it and uncompress it from the beginning of the block.
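
The size of each compressed block, and therefore how much compression metadata is kept per SSTable, follows from the column family's compression options. A hedged sketch with hypothetical names, using the option names of the 1.2/2.0 releases (larger chunks mean less metadata, but more data to uncompress per read):

    ALTER TABLE my_keyspace.my_table
      WITH compression = {'sstable_compression': 'LZ4Compressor',
                          'chunk_length_kb': 64};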

Cassandra (starting from Version 1.2) uses off-heap memory to store bloom filters, the index summary (starting from Version 2.0), and compression metadata. Keeping the whole index in memory would speed up reads, but rather than doing that we expect the OS to keep it in its file buffer cache. Since the index is accessed fairly often during reads, there is a good chance the OS will keep the file in memory. In my experience, setting the Java heap between 4 and 8 GB and the new generation between 1 and 2 GB gives optimal performance.

The best way to find the right configuration is by testing a few custom settings; VisualVM (http://visualvm.java.net/) and jmap (part of the JDK, http://docs.oracle.com/javase/6/docs/technotes/tools/share/jmap.html) are your friends.
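
As a concrete sketch of the figures above (the process ID is a placeholder), the heap sizes are usually pinned in conf/cassandra-env.sh, and jmap can then report how the running node actually uses that heap:

    # conf/cassandra-env.sh -- override the auto-calculated heap defaults
    MAX_HEAP_SIZE="8G"
    HEAP_NEWSIZE="2G"

    # inspect the heap configuration and usage of the running Cassandra JVM
    jmap -heap <cassandra-pid>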

SSDs versus spinning disks

Cassandra's log-structured storage is well suited to spinning disks as well as SSDs. When data is read from or written to a spinning disk, the read/write (RW) head of the drive has to move to the position on the disk where the data resides. The seek time varies because the distance between the head's current position and where it has been instructed to go differs from request to request. Once the seek is complete, the subsequent sequential reads/writes are much faster.

A common problem with SSDs is that frequent updates to the same locations wear the drive out and shorten the time for which it can operate reliably; the extra internal rewriting this causes is known as write amplification. Cassandra inherently uses log-structured storage for its persistence, which avoids in-place updates and is therefore a good fit for SSDs. SSDs also eliminate the high seek times seen in traditional drives.

There are many use cases for which spinning drives are a good fit:

  • The reads-to-writes ratio is low
  • The entire data set can fit in the page cache

SSDs are good for most use cases if the project can afford them; they are particularly helpful in the following cases:

  • Lots of updates, which in turn translate into a lot of I/O for compaction
  • Strict SLA requirements on reads
  • Column families that need a lot of I/O for compactions

The filesystem cache minimizes disk seeks by keeping frequently used data sets in system memory.

Key cache

In addition to all the other data structures, Cassandra uses a key cache to avoid disk seeks. When a row is looked up, Cassandra queries multiple SSTables to resolve it. Caching the key along with an offset into the SSTable allows Cassandra to go directly to the right location in the file. Since Version 1.2, the column index has been moved into the SSTable index, so it is cached in the key cache as well. Hence, a read costs exactly one disk seek when the key is cached.
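
As an illustration with hypothetical names (the caching syntax changed to a map form in later releases), key caching is enabled per column family, and the overall capacity is bounded by key_cache_size_in_mb in cassandra.yaml:

    -- cache keys only for this column family (1.2/2.0 string syntax)
    ALTER TABLE my_keyspace.my_table WITH caching = 'keys_only';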

Row cache

The row cache in Cassandra is not a query cache; when a query is executed against a row, Cassandra tries to cache the whole row in memory.
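
A hedged example with hypothetical names: row caching is likewise enabled per column family (again in the 1.2/2.0 string syntax), and it only takes effect when row_cache_size_in_mb in cassandra.yaml is greater than zero:

    -- cache whole rows for this column family; use with care for wide rows
    ALTER TABLE my_keyspace.my_table WITH caching = 'rows_only';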
