I would like to start this chapter by showing you some latency numbers published by Jeff Dean (a Google Fellow, http://www.cs.cornell.edu/projects/ladis2009/talks/dean-keynote-ladis2009.pdf):
| Operation | Time taken (ns) | Time taken (ms) |
|---|---|---|
| Send 1K bytes over 1 Gbps network | 10,000 | 0.01 |
| Read 4K randomly from SSD* | 150,000 | 0.15 |
| Read 1 MB sequentially from memory | 250,000 | 0.25 |
| Round trip within the same datacenter | 500,000 | 0.5 |
| Read 1 MB sequentially from SSD* | 1,000,000 | 1 |
| Disk seek | 10,000,000 | 10 |
| Send packet CA->Netherlands->CA | 150,000,000 | 150 |
The preceding table lists the approximate latency of common hardware and network operations. A single read or write request in Cassandra typically involves several of these operations.
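To see how these numbers add up, here is a rough back-of-the-envelope sketch. It assumes a simplified, hypothetical read path: one coordinator-to-replica round trip inside the datacenter plus one random block read for each SSTable the replica has to consult. The function name `estimate_read_ns` and the `sstables_read` parameter are illustrative only; CPU time, the page cache, and queueing are ignored.

```python
# Approximate latencies in nanoseconds, copied from Jeff Dean's table.
LATENCY_NS = {
    "round_trip_same_dc": 500_000,
    "ssd_random_read_4k": 150_000,
    "disk_seek": 10_000_000,
}


def estimate_read_ns(sstables_read: int, on_ssd: bool = True) -> int:
    """Rough latency of a hypothetical single-replica read.

    Assumes one coordinator-to-replica round trip inside the datacenter,
    plus one random block read per SSTable the replica has to consult.
    Ignores CPU time, the page cache, and queueing.
    """
    per_block = LATENCY_NS["ssd_random_read_4k"] if on_ssd else LATENCY_NS["disk_seek"]
    return LATENCY_NS["round_trip_same_dc"] + sstables_read * per_block


for medium, on_ssd in (("SSD", True), ("spinning disk", False)):
    ns = estimate_read_ns(sstables_read=3, on_ssd=on_ssd)
    print(f"{medium}: {ns:,} ns = {ns / 1_000_000} ms")
```

With three SSTables, this model gives 950,000 ns (0.95 ms) on SSD but 30,500,000 ns (30.5 ms) on a spinning disk: the disk seeks, not the network, dominate the total.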
To understand Cassandra's memory requirements, it's important to know that Cassandra is a Java-based service that uses the JVM heap both for temporary objects and for its in-memory data structures. For caching frequently used file blocks, Cassandra relies on the OS kernel's page cache. Most OS kernels use several heuristics, such as read-ahead and least-recently-used (LRU) style eviction, to predict which file blocks the application will access next and which blocks can be evicted from the cache.
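As a toy illustration of the eviction side, the following sketch models a page cache that evicts the least recently used block. This is a deliberate simplification: real kernels (Linux, for example) combine more elaborate schemes with read-ahead. The class `LRUPageCache` and the file name `sstable-1-Data.db` are hypothetical.

```python
from collections import OrderedDict


class LRUPageCache:
    """Toy model of an OS page cache that evicts the least recently used block."""

    def __init__(self, capacity_blocks: int):
        self.capacity = capacity_blocks
        # (file, block_no) -> None, kept in order from least to most recently used.
        self.blocks = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, file: str, block_no: int) -> bool:
        """Return True on a cache hit, False if the block had to come from disk."""
        key = (file, block_no)
        if key in self.blocks:
            self.blocks.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1  # in a real system this would be an SSD read or disk seek
        self.blocks[key] = None
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block
        return False


cache = LRUPageCache(capacity_blocks=2)
for block in (0, 1, 0, 2, 1):
    cache.read("sstable-1-Data.db", block)
print(cache.hits, cache.misses)  # 1 4
```

Block 0 is re-read while still cached (a hit), so when block 2 arrives it is block 1, the least recently used, that gets evicted; the later read of block 1 therefore misses again.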
A Cassandra node has two main functions: coordinating client requests and serving data. The coordinator is a simple proxy: it forwards reads and writes to the replica nodes that own the data and waits for their responses. How many responses it waits for depends on the consistency level; for QUORUM, it waits for floor(RF / 2) + 1 replicas, where RF is the replication factor (for example, 2 out of 3 replicas). Every node in the cluster performs both functions, and each node learns the current state of the cluster through gossip, so whichever node acts as coordinator knows which replicas to contact.
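The coordinator's waiting logic can be sketched as follows, assuming a single datacenter and modeling each replica as a plain Python callable. `required_responses`, `coordinate`, and `make_replica` are hypothetical names, not Cassandra APIs, and real Cassandra differs in many details (for example, reads usually contact only as many replicas as the consistency level requires, rather than all of them).

```python
import concurrent.futures as cf
import time


def required_responses(consistency_level: str, replication_factor: int) -> int:
    """Replica responses needed for a few common consistency levels (single datacenter)."""
    needed = {
        "ONE": 1,
        "TWO": 2,
        "THREE": 3,
        "QUORUM": replication_factor // 2 + 1,  # floor(RF / 2) + 1
        "ALL": replication_factor,
    }[consistency_level]
    if needed > replication_factor:
        raise ValueError(f"{consistency_level} needs {needed} replicas, RF is {replication_factor}")
    return needed


def coordinate(replicas, request, consistency_level, timeout_s=2.0):
    """Forward request to all replicas in parallel; return as soon as enough answer.

    Raises cf.TimeoutError if too few replicas answer within timeout_s.
    """
    needed = required_responses(consistency_level, len(replicas))
    pool = cf.ThreadPoolExecutor(max_workers=len(replicas))
    futures = [pool.submit(replica, request) for replica in replicas]
    responses = []
    try:
        for future in cf.as_completed(futures, timeout=timeout_s):
            if future.exception() is None:  # skip replicas that failed
                responses.append(future.result())
            if len(responses) >= needed:
                return responses
    finally:
        pool.shutdown(wait=False)  # slower replicas keep running in the background
    raise RuntimeError(f"only {len(responses)} of {needed} required replicas responded")


def make_replica(name, delay_s):
    """Hypothetical replica that answers after a fixed delay."""
    def replica(request):
        time.sleep(delay_s)
        return (name, request)
    return replica


replicas = [make_replica("node-a", 0.0), make_replica("node-b", 0.05), make_replica("node-c", 0.5)]
print(coordinate(replicas, "read key=42", "QUORUM"))
```

With RF = 3, QUORUM needs two responses, so the coordinator returns as soon as `node-a` and `node-b` answer instead of waiting for the slow `node-c`. This is why a single slow replica does not necessarily slow down a QUORUM request.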