Flushing and eviction

The main tuning options to look at first are the ones that define the size limit of the top tier, when it should flush, and when it should evict.

The following two configuration options set the maximum size of the data to be stored in the top tier pool:

    target_max_bytes
    target_max_objects

The size is specified either in bytes or as a number of objects and does not have to match the actual size of the pool, but it cannot be larger. The size is also based on the capacity available after replication in the RADOS pool, so for a 3x replica pool, this will be one-third of your raw capacity. If the number of bytes or objects in the pool goes above this limit, I/O will block; it is therefore important to give thought to the other configuration options covered later so that this limit is never reached. It's also important that this value is set, as without it no flushing or eviction will occur and the pool will simply fill the OSDs to their full limit and then block I/O.
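As a quick sketch of how these limits are applied, assuming a cache tier pool named hot-pool (the pool name and sizes here are only examples), they can be set with the standard pool commands:

    ceph osd pool set hot-pool target_max_bytes 1099511627776
    ceph osd pool set hot-pool target_max_objects 1000000

The first command caps the tier at 1 TiB of post-replication data; the second caps it by object count instead, and whichever limit is reached first applies.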

The reason that this setting exists, rather than Ceph simply using the underlying capacity of the disks in the RADOS pool, is that by specifying the size you could, if you desire, have multiple top-tier pools on the same set of disks.

As you learned earlier, target_max_bytes sets the maximum size of the tiered data in the pool, and if this limit is reached, I/O will block. To make sure that the RADOS pool does not reach this limit, cache_target_full_ratio instructs Ceph to try to keep the pool at a percentage of target_max_bytes by evicting objects once this target is breached. Unlike promotions and flushes, evictions are fairly low-cost operations:

    cache_target_full_ratio

The value is specified as a number between 0 and 1 and works like a percentage. It should be noted that although target_max_bytes and cache_target_full_ratio are set against the pool, internally Ceph uses these values to calculate per-PG limits instead. This can mean that in certain circumstances some PGs reach the calculated maximum limit before others, which can sometimes lead to unexpected results. For this reason, it is recommended not to set cache_target_full_ratio too high and to leave some headroom; a value of 0.8 normally works well.
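As a minimal example, again using the hypothetical hot-pool from above:

    ceph osd pool set hot-pool cache_target_full_ratio 0.8

The next two configuration options to look at are as follows: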

    cache_target_dirty_ratio
    cache_target_dirty_high_ratio

These two configuration options control when Ceph flushes dirty objects from the top tier down to the base tier when tiering has been configured in writeback mode. An object is considered dirty if it has been modified while in the top tier; objects modified in the base tier do not get marked as dirty. Flushing involves copying the object out of the top tier and into the base tier; as this is a full-object write, the base tier can be an erasure-coded pool. The behavior is asynchronous and, aside from increasing I/O on the RADOS pools, is not directly linked to any impact on client I/O. Objects are typically flushed at a lower speed than they can be evicted at; because flushing is an expensive operation compared with eviction, large numbers of objects can still be evicted quickly if required.

The two ratios control the speed of flushing the OSD allows by restricting the number of parallel flushing threads that are allowed to run at once. These are controlled by the osd_agent_max_low_ops and osd_agent_max_ops OSD configuration options, respectively. By default, these are set to 2 and 4 parallel threads.

In theory, the percentage of dirty objects should hover around the low dirty ratio during normal cluster usage. This means that objects are flushed with low parallelism to minimize the impact on cluster latency. As normal bursts of writes hit the cluster, the number of dirty objects may rise, but over time these writes are flushed down to the base tier.

However, if there are periods of sustained writes that outstrip the capability of the low-speed flushing, then the number of dirty objects will start to rise. Hopefully, this period of high write I/O will not last long enough to fill the tier with dirty objects, and the count will gradually reduce back down to the low threshold. However, if the number of dirty objects continues to increase and reaches the high ratio, the flushing parallelism is increased and will hopefully stop the number of dirty objects from rising any further. Once the write traffic reduces, the number of dirty objects will be brought back down to the low ratio again. This sequence of events is illustrated in the following graph:

The two dirty ratios should have sufficient difference between them that normal bursts of writes can be absorbed without the high ratio kicking in. The high ratio should be thought of as an emergency limit. Good values to start with are 0.4 for the low ratio and 0.6 for the high ratio.
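As a sketch, applying these starting values to the example hot-pool looks like this:

    ceph osd pool set hot-pool cache_target_dirty_ratio 0.4
    ceph osd pool set hot-pool cache_target_dirty_high_ratio 0.6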

The osd_agent_max_low_ops configuration setting should be adjusted so that, in normal operating conditions, the number of dirty objects hovers around or just over the low dirty ratio. It's not easy to recommend a value for these settings as they will largely depend on the ratio of the size and performance of the top tier to the base tier. However, start by setting osd_agent_max_low_ops to 1 and increase as necessary, and set osd_agent_max_ops to at least double that.
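Unlike the ratios above, these are OSD options rather than pool options, so they would normally go into the [osd] section of ceph.conf; they can also be injected into running OSDs. The values below are only the suggested starting point:

    # In ceph.conf, applied to OSDs at startup
    [osd]
    osd_agent_max_low_ops = 1
    osd_agent_max_ops = 2

    # Or inject into the running OSDs without a restart
    ceph tell osd.* injectargs '--osd_agent_max_low_ops 1 --osd_agent_max_ops 2'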

If you see messages in the Ceph status screen indicating that high-speed flushing is occurring, then you will want to increase osd_agent_max_low_ops. If you ever see the top tier getting full and blocking I/O, then you should either consider lowering the cache_target_dirty_high_ratio setting or increase the osd_agent_max_ops setting to stop the tier filling up with dirty objects.
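To keep an eye on this, the cache pool can be watched while the cluster is under load; the exact output varies between Ceph releases, but the following commands (using the example hot-pool again) are a reasonable starting point:

    ceph -s                         # overall status, including any cache-tier related warnings
    ceph df detail                  # per-pool usage; older releases include a DIRTY object column
    ceph osd pool stats hot-pool    # per-pool I/O rates, including flush and evict activity when reported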
