Alternative caching mechanisms

While the native RADOS tiering functionality offers flexibility and can be managed with the same Ceph toolset, it cannot be denied that for pure performance it lags behind other caching technologies, which typically operate at the block device level.

Bcache is a block device cache in the Linux kernel that can use an SSD to cache a slower block device, such as a spinning disk.

Bcache is one popular way of increasing the performance of Ceph with SSDs. Unlike RADOS tiering, where you can choose which pool you wish to cache, with bcache the entire OSD is cached. This method of caching brings a number of advantages around performance. First, the OSD itself has a much more consistent latency response due to the SSD caching. Filestore adds an increased amount of random I/O to every Ceph request, regardless of whether the request is random or sequential in nature. Bcache can absorb these random I/Os and allow the spinning disk to perform a larger amount of sequential I/O. This can be very helpful during periods of high utilization, where normal spinning disk OSDs would start to exhibit high latency. Second, whereas RADOS tiering operates at the size of the object stored in the pool, which is 4 MB by default for RBD workloads, bcache caches data in much smaller blocks; this allows it to make better use of the available SSD space and to suffer less from promotion overheads.
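As a rough sketch, a bcache device can be assembled with the bcache-tools utilities along the following lines. The device names /dev/nvme0n1 and /dev/sdb are placeholder assumptions; in a real deployment, the resulting /dev/bcache0 device would then be formatted and the OSD deployed on top of it:

```shell
# Format the SSD as a cache device and the spinning disk as a backing device
make-bcache -C /dev/nvme0n1    # cache device (SSD/NVMe, assumed name)
make-bcache -B /dev/sdb        # backing device (spinning disk, assumed name)

# Find the cache set UUID and attach the backing device to it
bcache-super-show /dev/nvme0n1 | grep cset.uuid
echo <cset-uuid> > /sys/block/bcache0/bcache/attach

# Switch to writeback mode so random writes are absorbed by the SSD
echo writeback > /sys/block/bcache0/bcache/cache_mode
```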

The SSD capacity assigned to bcache will also be used as a read cache for hot data, so it will improve read performance as well as writes. Since bcache uses this capacity only for read caching, it stores just one copy of the data and therefore provides 3x the read cache capacity compared with using the same SSD in a RADOS-tiered pool with 3x replication.
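To make the capacity difference concrete, a back-of-the-envelope calculation follows; the SSD size and replication factor are assumptions chosen purely for illustration:

```shell
#!/bin/sh
# Usable read cache from the same SSD capacity in the two approaches
ssd_gib=1024      # SSD capacity in GiB (assumed)
replicas=3        # pool replication factor (assumed)

# A RADOS cache tier stores replicated objects, so usable capacity is divided
tier_gib=$((ssd_gib / replicas))

# bcache holds a single copy, so the whole SSD is available as read cache
echo "Cache tier: ${tier_gib} GiB usable; bcache: ${ssd_gib} GiB usable"
```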

However, there are a number of disadvantages to using bcache that make RADOS cache pools still look attractive. As mentioned earlier, bcache caches the entire OSD; in cases where multiple pools reside on the same OSDs, this behavior may be undesirable. Also, once bcache has been configured with an SSD and HDD, it is harder to expand the amount of cache if needed in the future. This also applies if your cluster does not currently have any form of caching; in this scenario, introducing bcache would be very disruptive. With RADOS tiering, you can simply add additional SSDs, or specifically designed SSD nodes, to add or expand the top tier as and when needed.
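For comparison, attaching a RADOS cache tier is only a handful of commands once an SSD-backed pool exists; the pool names cold-pool and hot-pool below are placeholders:

```shell
# Attach an SSD-backed pool as a writeback cache tier in front of a base pool
ceph osd tier add cold-pool hot-pool
ceph osd tier cache-mode hot-pool writeback
ceph osd tier set-overlay cold-pool hot-pool

# Expanding the tier later is a matter of adding SSD OSDs to the CRUSH
# hierarchy backing hot-pool; the base pool is not disturbed
```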

Another approach is to place the spinning disk OSDs behind a RAID controller with a battery-backed write-back cache. The RAID controller performs a similar role to bcache and absorbs a lot of the random write I/O relating to filestore's extra metadata. Both latency and sequential write performance will increase as a result; read performance is unlikely to increase, however, due to the relatively small size of the RAID controller's cache. With a RAID controller, the OSD's journal can also be placed directly on the disk instead of on a separate SSD. By doing this, journal writes are absorbed by the RAID controller's cache, improving the random performance of the journal, since most of the time the journal's contents will just be sitting in the controller's cache. Care does need to be taken, though: if the incoming write traffic exceeds the capacity of the controller's cache, journal contents will start being flushed to disk, and performance will degrade. For best performance, a separate SSD or NVMe should be used for the filestore journal, although attention should be paid to the cost of using a RAID controller with sufficient performance and cache in addition to the cost of the SSDs.
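The point about the controller cache filling up can be sketched with a rough calculation; all the figures below are assumptions for illustration, not measurements from any particular controller:

```shell
#!/bin/sh
# How long a sustained write burst can exceed disk throughput before a
# battery-backed write-back cache fills and performance degrades
cache_mib=2048       # controller cache size in MiB (assumed)
ingress_mibs=400     # incoming write rate in MiB/s (assumed)
drain_mibs=150       # rate the spinning disks can drain, MiB/s (assumed)

fill_rate=$((ingress_mibs - drain_mibs))
burst_s=$((cache_mib / fill_rate))   # integer seconds until the cache is full
echo "Cache absorbs roughly ${burst_s}s of this burst before flushing to disk"
```

Once the burst outlasts this window, journal writes land on the spinning disks themselves and latency climbs sharply, which is why sizing the cache against expected peak traffic matters.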

Both methods have their merits and should be considered before implementing caching in your cluster.
