Overwrites on erasure code pools with Kraken

The Kraken release of Ceph introduced, as an experimental feature, support for partial overwrites on erasure-coded pools. Partial overwrite support allows RBD volumes to be created on erasure-coded pools, making better use of the raw capacity of the Ceph cluster.

In parity RAID, a write request that doesn't span the entire stripe requires a read-modify-write operation, because changing any of the data chunks makes the existing parity chunk incorrect. The RAID controller has to read all the current chunks in the stripe, modify them in memory, calculate the new parity chunk, and finally write the modified chunks and the new parity back out to disk.
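The read-modify-write cycle can be sketched in a few lines of Python. This is a minimal illustration, assuming a single XOR parity chunk (as in RAID 5) and tiny in-memory chunks; the function names are hypothetical and do not correspond to any RAID controller's firmware.

```python
from functools import reduce
from typing import Dict, List


def xor_bytes(*chunks: bytes) -> bytes:
    """XOR equal-length byte strings together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


def compute_parity(data_chunks: List[bytes]) -> bytes:
    """Single XOR parity, as used by RAID 5 (simplified)."""
    return xor_bytes(*data_chunks)


def read_modify_write(stripe: Dict[str, object], chunk_index: int,
                      offset: int, new_bytes: bytes) -> Dict[str, object]:
    """Apply a write that covers only part of one chunk in the stripe."""
    # Read: fetch every current data chunk in the stripe.
    data = [bytearray(chunk) for chunk in stripe["data"]]
    target = data[chunk_index]
    if offset < 0 or offset + len(new_bytes) > len(target):
        raise ValueError("write must fall inside a single chunk")
    # Modify: apply the partial write in memory.
    target[offset:offset + len(new_bytes)] = new_bytes
    new_data = [bytes(chunk) for chunk in data]
    # Recalculate the parity, since the old parity no longer matches the data.
    new_parity = compute_parity(new_data)
    # Write: the caller persists the data chunks and the new parity chunk.
    return {"data": new_data, "parity": new_parity}


old_data = [b"AAAA", b"BBBB", b"CCCC"]
stripe = {"data": old_data, "parity": compute_parity(old_data)}
updated = read_modify_write(stripe, chunk_index=1, offset=2, new_bytes=b"zz")
print(updated["data"])  # [b'AAAA', b'BBzz', b'CCCC']
```

Because XOR is its own inverse, a controller can alternatively derive the new parity from only the old parity, the old chunk and the new chunk (`old_parity ^ old_chunk ^ new_chunk`), avoiding reads of the untouched chunks; the sketch instead follows the full-stripe read described above.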

Ceph also has to perform this read-modify-write operation; however, Ceph's distributed model makes it more complex. When the primary OSD for a PG receives a write request that will partially overwrite an existing object, it first works out which shards will not be fully modified by the request and contacts the relevant OSDs to request a copy of these shards. The primary OSD then combines these received shards with the new data and calculates the new erasure (coding) shards. Finally, the modified shards are sent out to the respective OSDs to be committed. The entire operation must also conform to the other consistency requirements that Ceph enforces; this entails the use of temporary objects on the OSDs, so that Ceph can roll back a write operation if the need arises.
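The primary OSD's part in this flow can be modelled as a toy simulation. This is a sketch under heavy simplifying assumptions, not Ceph's implementation: the object is a single stripe of `k` data shards, a single XOR coding shard stands in for the configured erasure code plugin (a real profile, such as one using jerasure, has `m` coding shards), and `ShardOSD`, `partial_overwrite` and the `fail` flag are hypothetical names invented for illustration. The staged `temp` copy plays the role of the temporary object that allows a rollback.

```python
from dataclasses import dataclass
from functools import reduce
from typing import List, Optional, Set, Tuple

CHUNK = 4  # hypothetical chunk size in bytes; real pools use far larger chunks


def xor_bytes(*chunks: bytes) -> bytes:
    """XOR equal-length byte strings; stands in for a real erasure code."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


@dataclass
class ShardOSD:
    """Hypothetical OSD holding one shard of a single-stripe object."""
    shard: bytes
    temp: Optional[bytes] = None  # staged copy, like a temporary object
    reads: int = 0

    def read(self) -> bytes:
        self.reads += 1
        return self.shard

    def stage(self, data: bytes) -> None:
        self.temp = data

    def commit(self) -> None:
        self.shard, self.temp = self.temp, None

    def rollback(self) -> None:
        self.temp = None


def partial_overwrite(osds: List[ShardOSD], k: int, offset: int, data: bytes,
                      fail: bool = False) -> Tuple[Set[int], Set[int]]:
    """Primary-OSD view of a partial overwrite; osds[k] holds the coding shard."""
    end = offset + len(data)
    if offset < 0 or end > k * CHUNK:
        raise ValueError("write must fall inside the stripe")
    ranges = [(i * CHUNK, (i + 1) * CHUNK) for i in range(k)]
    # 1. Work out which data shards the request will not fully modify.
    need = {i for i, (s, e) in enumerate(ranges) if not (offset <= s and end >= e)}
    touched = {i for i, (s, e) in enumerate(ranges) if s < end and e > offset}
    # 2. Fetch copies of those shards and merge the new data in memory.
    stripe = bytearray(k * CHUNK)
    for i in need:
        stripe[i * CHUNK:(i + 1) * CHUNK] = osds[i].read()
    stripe[offset:end] = data
    new_data = [bytes(stripe[s:e]) for s, e in ranges]
    # 3. Recalculate the coding shard from the merged data shards.
    coding = xor_bytes(*new_data)
    # 4. Stage only the modified shards, then commit (or roll back on failure).
    written = touched | {k}
    for i in written:
        osds[i].stage(coding if i == k else new_data[i])
    for i in written:
        if fail:
            osds[i].rollback()
        else:
            osds[i].commit()
    return need, written


shards = [b"AAAA", b"BBBB", b"CCCC"]
osds = [ShardOSD(s) for s in shards] + [ShardOSD(xor_bytes(*shards))]
need, written = partial_overwrite(osds, k=3, offset=2, data=b"xxxxxx")
print(sorted(need), sorted(written))  # [0, 2] [0, 1, 3]
```

In this run, shard 1 is completely overwritten, so the primary never reads it; shards 0 and 2 must be fetched even though shard 2 is not rewritten, because the coding shard depends on every data shard.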

As might be expected, this partial overwrite operation has a performance impact, and in general, the smaller the write I/Os, the greater the apparent impact. The impact results from the longer I/O path, which requires more disk I/Os and extra network hops. However, it should be noted that, due to the striping effect of erasure-coded pools, performance in scenarios where full stripe writes occur will normally exceed that of a replicated pool. This is simply because striping results in less write amplification: for example, a pool with k=4 and m=2 writes 1.5 times the client data to disk, compared with three times for a pool with three replicas. If the performance of an erasure-coded pool is not suitable, consider placing it behind a cache tier made up of a replicated pool.
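The write amplification comparison can be worked through numerically. The sketch below assumes a simplified cost model: full-stripe writes store `(k + m) / k` bytes per client byte, replicated pools store one copy per replica, and a partial overwrite confined to one data chunk is counted in shard operations following the read-modify-write flow described in the text. It ignores metadata, journaling and BlueStore internals, and the function names are hypothetical.

```python
from fractions import Fraction
from typing import Dict


def replicated_amplification(replicas: int) -> Fraction:
    """Bytes written to disk per byte of client data in a replicated pool."""
    return Fraction(replicas)


def ec_full_stripe_amplification(k: int, m: int) -> Fraction:
    """Full-stripe write: k data shards plus m coding shards for k shards of data."""
    return Fraction(k + m, k)


def ec_partial_overwrite_ops(k: int, m: int) -> Dict[str, int]:
    """Simplified shard operations for a write inside one data chunk of a stripe.

    Assumes, following the flow described in the text, that the primary reads
    every data shard the write does not fully modify (here all k, one of which
    may be local) and writes the modified data shard plus all m coding shards.
    """
    return {"shard_reads": k, "shard_writes": 1 + m}


client_mib = 4
print(client_mib * replicated_amplification(3))         # 12
print(client_mib * ec_full_stripe_amplification(4, 2))  # 6
print(ec_partial_overwrite_ops(4, 2))                   # {'shard_reads': 4, 'shard_writes': 3}
```

Under this model a 4 MiB full-stripe write costs 6 MiB of disk writes on a k=4, m=2 pool against 12 MiB with three replicas, whereas a small overwrite inside one chunk turns a single client write into several shard reads followed by several shard writes, which is why small I/Os suffer most.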

Although partial overwrite support has come to erasure-coded pools in Ceph, not every operation is supported. In order to store RBD data on an erasure-coded pool, a replicated pool is still required to hold key metadata about the RBD. This configuration is enabled by using the --data-pool option with the rbd utility. Partial overwrites are also not recommended with filestore: filestore lacks several features that partial overwrites on erasure-coded pools rely on, and without these features, performance is extremely poor.
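Creating such an image from a script might look like the following sketch. The pool names `rbd` and `ec_pool` and the image name `test_image` are hypothetical; the erasure-coded pool is assumed to already exist and be usable for RBD data. The helper only builds the `rbd create` command (image metadata in the replicated pool named in the image spec, data objects in the pool passed to `--data-pool`) and runs it only when `dry_run` is turned off.

```python
import shlex
import subprocess
from typing import List


def rbd_create_cmd(image: str, size_mb: int, metadata_pool: str,
                   data_pool: str) -> List[str]:
    """Build an `rbd create` command that keeps the image's metadata in a
    replicated pool and stores its data objects in an erasure-coded pool."""
    return [
        "rbd", "create",
        "--size", str(size_mb),      # size in megabytes (rbd's default unit)
        "--data-pool", data_pool,    # erasure-coded pool for the data objects
        f"{metadata_pool}/{image}",  # replicated pool holds the image metadata
    ]


def create_image(cmd: List[str], dry_run: bool = True) -> str:
    """Return the shell form of the command; run it only when dry_run is False."""
    if not dry_run:
        subprocess.run(cmd, check=True)  # needs the rbd CLI and cluster access
    return shlex.join(cmd)


cmd = rbd_create_cmd("test_image", 1024, metadata_pool="rbd", data_pool="ec_pool")
print(create_image(cmd))  # rbd create --size 1024 --data-pool ec_pool rbd/test_image
```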
