To understand how Cassandra utilizes the available system memory, it is important to understand its in-memory data structures. Cassandra's in-memory data structures are as follows.
The index is a map of row keys to the SSTable locations where the actual data resides. Because of its size, it is expensive to keep the whole index in memory.
A bloom filter is an in-memory structure maintained per SSTable that helps Cassandra avoid a disk seek: before the index is scanned for the data, the bloom filter is consulted to check whether the row might be present. A bloom filter returns Boolean advice about whether the data is in the SSTable; it never produces a false negative, but it can produce a false positive. If that happens, we read the SSTable and return without finding the row, which is acceptable since the filter is only an optimization. The bloom filter's false-positive chance can be set at the column family level.
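To make the false-positive/false-negative behavior concrete, here is a minimal bloom filter sketch in Java. This is an illustration only, not Cassandra's implementation (Cassandra uses murmur-based hashing and stores its filters off-heap); the class name, sizes, and double-hashing scheme are my own assumptions.

```java
import java.util.BitSet;

// A minimal bloom filter sketch (not Cassandra's implementation) showing
// why lookups can return false positives but never false negatives.
public class BloomFilterSketch {
    private final BitSet bits;
    private final int size;
    private final int hashes;

    public BloomFilterSketch(int size, int hashes) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashes = hashes;
    }

    // Derive the i-th bit position from two base hashes (double hashing).
    private int position(String key, int i) {
        int h1 = key.hashCode();
        int h2 = (h1 >>> 16) | 1; // force odd to spread positions
        return Math.floorMod(h1 + i * h2, size);
    }

    public void add(String key) {
        for (int i = 0; i < hashes; i++) bits.set(position(key, i));
    }

    // "Boolean advice": false means definitely absent;
    // true means possibly present (may be a false positive).
    public boolean mightContain(String key) {
        for (int i = 0; i < hashes; i++) {
            if (!bits.get(position(key, i))) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        BloomFilterSketch filter = new BloomFilterSketch(1024, 3);
        filter.add("row-key-1");
        System.out.println(filter.mightContain("row-key-1")); // never a false negative
    }
}
```

A `false` answer lets Cassandra skip the SSTable entirely; a `true` answer only means the SSTable must be checked.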
Compression metadata records the starting offset of each compressed block. This metadata must be maintained so that, to read a row, we can seek to the right block and uncompress it from its beginning.
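The lookup this metadata enables can be sketched as follows. This is an assumed, simplified layout rather than Cassandra's exact on-disk format: the metadata maps each fixed-size uncompressed block to the file offset where its compressed form starts.

```java
// Sketch of how per-block compression metadata is used (assumed layout,
// not Cassandra's exact on-disk format): given the logical position of a
// row, find the file offset of the compressed block that must be
// decompressed from its beginning to read that row.
public class CompressionMetadataSketch {
    private final long[] blockOffsets;       // start offset of each compressed block
    private final int uncompressedBlockSize; // fixed uncompressed block size

    public CompressionMetadataSketch(long[] blockOffsets, int uncompressedBlockSize) {
        this.blockOffsets = blockOffsets;
        this.uncompressedBlockSize = uncompressedBlockSize;
    }

    // Map an uncompressed position to the start of its compressed block.
    public long blockStartFor(long uncompressedPosition) {
        int block = (int) (uncompressedPosition / uncompressedBlockSize);
        return blockOffsets[block];
    }

    public static void main(String[] args) {
        // Three 64 KB logical blocks whose compressed forms start at these offsets.
        CompressionMetadataSketch meta =
            new CompressionMetadataSketch(new long[] {0L, 41_120L, 80_004L}, 64 * 1024);
        System.out.println(meta.blockStartFor(70_000)); // row lives in block 1 -> 41120
    }
}
```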
Cassandra (starting from Version 1.2) uses off-heap memory to store bloom filters, the index summary (starting from Version 2.0), and compression metadata. Although keeping the whole index in memory would speed up reads, we instead rely on the OS to keep it in its file buffer cache. Since the index file is accessed fairly often during reads, there is a good chance the OS will keep it in memory. In my experience, setting the Java heap to between 4 and 8 GB and the new generation to between 1 and 2 GB gives optimal performance.
The best way to find the right configuration is to test a few custom settings; VisualVM (http://visualvm.java.net/) and jmap (part of the JDK, http://docs.oracle.com/javase/6/docs/technotes/tools/share/jmap.html) are your friends.
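While testing settings, you can also sanity-check heap and non-heap usage from inside the JVM with the standard `java.lang.management` API. This is a lightweight companion to VisualVM and jmap, not a Cassandra API; it reports only what the current JVM sees.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Print current heap and non-heap (off-heap-managed) usage for the
// running JVM, to compare against the -Xmx/-Xmn settings being tested.
public class HeapCheck {
    public static void main(String[] args) {
        MemoryMXBean bean = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = bean.getHeapMemoryUsage();
        MemoryUsage nonHeap = bean.getNonHeapMemoryUsage();
        System.out.println("heap max (MB):      " + heap.getMax() / (1024 * 1024));
        System.out.println("heap used (MB):     " + heap.getUsed() / (1024 * 1024));
        System.out.println("non-heap used (MB): " + nonHeap.getUsed() / (1024 * 1024));
    }
}
```

Note that `getNonHeapMemoryUsage()` covers JVM-managed areas such as metaspace; memory that Cassandra allocates directly off-heap (bloom filters, compression metadata) shows up in tools such as jmap and the OS process size instead.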
Cassandra's log-structured storage is well suited to spinning drives as well as SSDs. When data is read from or written to a spinning drive, the read/write head must first move to the correct position on the disk. This seek time varies, since the distance the head has to travel to the target location differs from one request to the next. Once the seek is complete, subsequent sequential reads and writes are fast.
A common problem with SSDs is that frequent rewrites of the same location wear out the underlying flash cells, shortening the time for which the drive can operate reliably; this wear is made worse by write amplification, where a small logical update forces a much larger physical write. Cassandra inherently uses log-structured storage for its persistence, which avoids in-place updates and is therefore a good fit for SSDs. SSDs also eliminate the high seek times seen in traditional drives.
There are many use cases for which spinning drives are a good fit:
SSDs are good for most use cases if the project can afford them; consider the following cases:
The filesystem cache minimizes the disk seeks by using the system memory to access the frequently used data sets.
In addition to all the other data structures, Cassandra uses a key cache to avoid disk seeks. When a row is looked up, Cassandra may have to query multiple SSTables to resolve it. Caching the row key along with its SSTable offset allows Cassandra to go directly to that location in the file. Since Version 1.2, column indexes have been moved into the SSTable index, so they are also cached in the key cache. Hence, a cached read costs exactly one disk seek.
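The key-cache idea can be sketched as a small LRU map from row key to SSTable offset. This is an illustration of the concept only, not Cassandra's implementation; the class name, capacity, and example keys are my own assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A sketch of the key-cache idea (not Cassandra's implementation): map a
// row key to the offset of that row inside an SSTable, evicting the
// least-recently-used entry when full, so a cache hit replaces an index
// scan with a single seek to the cached offset.
public class KeyCacheSketch {
    private final Map<String, Long> cache;

    public KeyCacheSketch(int capacity) {
        // An access-ordered LinkedHashMap gives simple LRU eviction.
        this.cache = new LinkedHashMap<String, Long>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Long> eldest) {
                return size() > capacity;
            }
        };
    }

    public void put(String rowKey, long sstableOffset) {
        cache.put(rowKey, sstableOffset);
    }

    // Returns the cached SSTable offset, or null on a miss
    // (on a miss, fall back to scanning the index).
    public Long offsetFor(String rowKey) {
        return cache.get(rowKey);
    }

    public static void main(String[] args) {
        KeyCacheSketch keyCache = new KeyCacheSketch(2);
        keyCache.put("user:1", 4096L);
        keyCache.put("user:2", 8192L);
        System.out.println(keyCache.offsetFor("user:1")); // hit -> 4096
        keyCache.put("user:3", 12288L);                   // evicts LRU entry "user:2"
        System.out.println(keyCache.offsetFor("user:2")); // miss -> null
    }
}
```

On a hit, the read seeks straight to the cached offset; on a miss, it pays the extra index lookup first.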