RBDs and erasure-coded pools

When using RBDs stored in erasure-coded pools, you should try to generate full-stripe writes wherever possible to maintain the best performance. When an erasure-coded pool receives a full-stripe write, the operation can be completed in a single IO and avoids the penalties of the read-modify-write cycle that partial writes incur.
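As a reminder of how such a pool might be created, the following sketch builds a 4 + 2 erasure-code profile and a data pool with overwrites enabled, which RBD requires (and which in turn requires BlueStore OSDs); the pool, profile, and image names are illustrative:

ceph osd erasure-code-profile set ec_42_profile k=4 m=2
ceph osd pool create ec_data 64 64 erasure ec_42_profile
ceph osd pool set ec_data allow_ec_overwrites true
ceph osd pool create rbd_pool 64 64 replicated
rbd create rbd_pool/test_image --size 102400 --data-pool ec_data

The RBD's metadata lives in the replicated pool, while its data objects are written to the erasure-coded pool via the --data-pool option.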

The RBD clients have some intelligence built in: they will issue RADOS writefull operations if they detect that the higher-level client IO is overwriting an entire object. Making sure that the filesystem on top of the RBD is formatted with the correct stripe alignment is important to ensure that as many of these full-object writes are generated as possible.

An example of formatting an XFS filesystem on an RBD backed by a 4 + 2 erasure-coded pool is as follows:

mkfs.xfs /dev/rbd0 -d su=1m,sw=4

This instructs XFS to align its allocations to suit the four 1 MB data shards that make up a 4 MB object stored on a 4 + 2 erasure-coded pool: su sets the stripe unit to 1 MB, and sw sets the stripe width to the number of data shards.
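If you want to verify that the alignment took effect, you can inspect the filesystem's stripe geometry after mounting it; the mount point here is illustrative:

mount /dev/rbd0 /mnt/rbd
xfs_info /mnt/rbd

The sunit and swidth values in the output should reflect the 1 MB stripe unit and four-shard stripe width requested at format time.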

Additionally, if the use case requires mounting RBDs directly on a Linux server rather than presenting them through a QEMU/KVM virtual machine, it is also worth considering rbd-nbd. This userspace RBD client makes use of librbd, whereas the kernel RBD client relies entirely on the Ceph code present in the running kernel.
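As a brief sketch, mapping an RBD through rbd-nbd instead of the kernel client looks like the following; the pool and image names match the illustrative ones used earlier:

rbd-nbd map rbd_pool/test_image
rbd-nbd list-mapped
rbd-nbd unmap /dev/nbd0

The map command prints the NBD device it has attached (for example, /dev/nbd0), which can then be formatted and mounted like any other block device.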

Not only does librbd give you access to the latest features, which may not be present in the running kernel, but it also provides a writeback cache. The writeback cache does a much better job of coalescing writes into full-sized object writes than the kernel client can, so less of the partial-write overhead is incurred. Keep in mind that the writeback cache in librbd is not persistent, so synchronous writes will not benefit from it.
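The librbd cache is controlled through client-side settings in ceph.conf; the following values are illustrative starting points rather than recommendations:

[client]
rbd cache = true
# Total cache size and dirty threshold, in bytes (illustrative values)
rbd cache size = 67108864
rbd cache max dirty = 50331648
# Stay in writethrough mode until the first flush is seen, for safety
rbd cache writethrough until flush = true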
