Like erasure coding, the cache tiering feature was introduced in the Ceph Firefly release, and it has been one of the most talked-about features of that release. Cache tiering creates a Ceph pool constructed on top of faster disks, typically SSDs. This cache pool should be placed in front of a regular replicated or erasure-coded pool so that all client I/O operations are handled by the cache pool first; the data is later flushed to the existing data pools.
The clients enjoy high performance out of the cache pool, while their data is written to regular pools transparently.
Generally, a cache tier is constructed on top of expensive, faster SSDs, so it provides clients with better I/O performance. The cache pool is backed by a storage tier, which is made up of HDDs and is of type replicated or erasure. In this type of setup, clients submit I/O requests to the cache pool and get instant responses, whether the request is a read or a write; the faster cache tier serves the client request. After a while, the cache tier flushes its data to the backing storage tier so that it can cache new requests from clients. All data migration between the cache and storage tiers happens automatically and is transparent to clients. Cache tiering can be configured in two modes.
When Ceph cache tiering is configured in writeback mode, a Ceph client writes data to the cache tier pool, that is, to the faster pool, and hence receives an acknowledgement instantly. Based on the flushing/evicting policy that you have set for your cache tier, data is migrated from the cache tier to the storage tier and eventually removed from the cache tier by the cache-tiering agent. During a read operation by the client, data is first migrated from the storage tier to the cache tier by the cache-tiering agent, and it is then served to the client. The data remains in the cache tier until it becomes inactive or cold.
When Ceph cache tiering is configured in read-only mode, it works only for a client's read operations. The client's write operations do not involve cache tiering; rather, all client writes go directly to the storage tier. During read operations by clients, the cache-tiering agent copies the requested data from the storage tier to the cache tier. Based on the policy that you have configured for the cache tier, stale objects are removed from the cache tier. This approach is ideal when multiple clients need to read large amounts of similar data.
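As a quick preview of how these two modes are selected, the cache mode is set per cache pool with the ceph osd tier cache-mode command (the pool name below is a placeholder; the full tier setup is walked through later in this chapter):

```shell
# Select the cache mode for a cache pool (pool name is an example).
# Writeback mode: writes land in the cache tier and are flushed later.
ceph osd tier cache-mode cache-pool writeback

# Read-only mode: only reads are served from the cache tier;
# writes go directly to the backing storage tier.
ceph osd tier cache-mode cache-pool readonly
```

Only one mode is active on a cache pool at a time; switching modes simply re-runs the command with the other mode name.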
A cache tier is implemented on faster physical disks, generally SSDs, which makes a fast cache layer on top of slower regular pools made up of HDDs. In this section, we will create two separate pools, a cache pool and a regular pool, which will be used as the cache tier and storage tier, respectively:
In Chapter 7, Ceph Operations and Maintenance, we discussed the process of creating Ceph pools on top of specific OSDs by modifying a CRUSH map. Similarly, we will create a cache pool based on osd.0, osd.3, and osd.6. Since we do not have real SSDs in this setup, we will assume these OSDs are SSDs and create a cache pool on top of them. The following are the instructions to create a cache pool on osd.0, osd.3, and osd.6:
Get the current CRUSH map and decompile it:
# ceph osd getcrushmap -o crushmapdump
# crushtool -d crushmapdump -o crushmapdump-decompiled
Edit the decompiled CRUSH map and add the following bucket after the root default section:
# vim crushmapdump-decompiled
root cache {
    id -5
    alg straw
    hash 0
    item osd.0 weight 0.010
    item osd.3 weight 0.010
    item osd.6 weight 0.010
}
Create a CRUSH rule in the rules section, generally at the end of the file. Finally, save and exit the CRUSH map file:
rule cache-pool {
    ruleset 4
    type replicated
    min_size 1
    max_size 10
    step take cache
    step chooseleaf firstn 0 type osd
    step emit
}
Compile and inject the new CRUSH map into the Ceph cluster:
# crushtool -c crushmapdump-decompiled -o crushmapdump-compiled
# ceph osd setcrushmap -i crushmapdump-compiled
Once the new CRUSH map is applied to the cluster, check the OSD tree to view the new cache bucket:
# ceph osd tree
Create a new pool named cache-pool, and set its crush_ruleset as 4 so that the new pool gets created on the SSD disks:
# ceph osd pool create cache-pool 32 32
# ceph osd pool set cache-pool crush_ruleset 4
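To confirm the ruleset actually took effect, the pool's setting can be read back (a small verification sketch; the expected value matches the ruleset set above):

```shell
# Read back the CRUSH ruleset assigned to the cache pool.
# It should report the ruleset we just set (4), meaning new
# objects in cache-pool will be placed on the cache bucket's OSDs.
ceph osd pool get cache-pool crush_ruleset
```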
To make sure the pool works as expected, list it, put a test object into it, verify the object's placement, and then remove the object:
# rados -p cache-pool ls
# rados -p cache-pool put object1 /etc/hosts
# rados -p cache-pool ls
# ceph osd map cache-pool object1
# rados -p cache-pool rm object1
In the previous section, we created a pool based on SSDs; we will now use this pool as a cache tier for an erasure-coded pool named EC-pool that we created earlier in this chapter.
The following instructions will guide you through creating a cache tier with the writeback mode and setting the overlay with an EC-pool:
Create a cache tier with the syntax ceph osd tier add <storage_pool> <cache_pool>:
# ceph osd tier add EC-pool cache-pool
Set the cache mode to writeback with the syntax ceph osd tier cache-mode <cache_pool> writeback:
# ceph osd tier cache-mode cache-pool writeback
Direct all client traffic to the cache pool by setting the overlay, with the syntax ceph osd tier set-overlay <storage_pool> <cache_pool>:
# ceph osd tier set-overlay EC-pool cache-pool
Check the pool details. For EC-pool, you will find tier, read_tier, and write_tier set as 16, which is the pool ID of cache-pool. Similarly, for cache-pool, the settings will be tier_of set as 15 and cache_mode as writeback; all these settings imply that the cache pool is configured correctly:
# ceph osd dump | egrep -i "EC-pool|cache-pool"
A cache tier has several configuration options that define its flushing and eviction policies. In this section, we will configure the cache tier policies:
Set hit_set_type, which enables hit set tracking on the cache pool, using the bloom filter:
# ceph osd pool set cache-pool hit_set_type bloom
Set hit_set_count, which is the number of hit sets to store for a cache pool:
# ceph osd pool set cache-pool hit_set_count 1
Set hit_set_period, which is the duration of a hit set period, in seconds, for a cache pool:
# ceph osd pool set cache-pool hit_set_period 300
Set target_max_bytes, which is the maximum number of bytes after which the cache-tiering agent starts flushing/evicting objects from a cache pool:
# ceph osd pool set cache-pool target_max_bytes 1000000
Set target_max_objects, which is the maximum number of objects after which the cache-tiering agent starts flushing/evicting objects from a cache pool:
# ceph osd pool set cache-pool target_max_objects 10000
Set cache_min_flush_age and cache_min_evict_age, which are the minimum ages, in seconds, that an object must reach before the cache-tiering agent flushes or evicts it from the cache tier to the storage tier:
# ceph osd pool set cache-pool cache_min_flush_age 300
# ceph osd pool set cache-pool cache_min_evict_age 300
Set cache_target_dirty_ratio, which is the percentage of the cache pool containing dirty (modified) objects before the cache-tiering agent flushes them to the storage tier:
# ceph osd pool set cache-pool cache_target_dirty_ratio .01
Set cache_target_full_ratio, which is the percentage of the cache pool containing unmodified (clean) objects before the cache-tiering agent evicts them from the cache tier:
# ceph osd pool set cache-pool cache_target_full_ratio .02
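To see what these ratios mean in practice, the byte thresholds they imply can be worked out from the values set above (a small arithmetic sketch; the numbers come from the commands in this section):

```shell
# Thresholds implied by the cache tier policy values set above.
target_max_bytes=1000000        # capacity target set for cache-pool
dirty_ratio_pct=1               # cache_target_dirty_ratio .01 -> 1%
full_ratio_pct=2                # cache_target_full_ratio .02 -> 2%

# Flushing of dirty objects starts at roughly 1% of target_max_bytes.
flush_threshold=$(( target_max_bytes * dirty_ratio_pct / 100 ))

# Eviction of clean objects starts at roughly 2% of target_max_bytes.
evict_threshold=$(( target_max_bytes * full_ratio_pct / 100 ))

echo "flushing starts near ${flush_threshold} bytes"   # 10000 bytes
echo "eviction starts near ${evict_threshold} bytes"   # 20000 bytes
```

These deliberately tiny thresholds make the tiering agent act quickly, which is convenient for the test that follows; production values would be far larger.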
Create a 500 MB test file, which we will use to test the cache tier:
# dd if=/dev/zero of=/tmp/file1 bs=1M count=500
The following screenshot shows the preceding commands in action:
Until now, we have created and configured a cache tier; next, we will test it. As explained earlier, during a client write operation, data appears to be written to the regular pool, but it is actually written to the cache pool first, so clients benefit from faster I/O. Based on the cache tier policies, data is then migrated transparently from the cache pool to the storage pool. In this section, we will test our cache tiering setup by writing objects and observing them on the cache and storage tiers:
In the previous section, we created a 500 MB test file, /tmp/file1; we will now put this file into the EC-pool:
# rados -p EC-pool put object1 /tmp/file1
Since the EC-pool is tiered with the cache-pool, file1 should not get written to the EC-pool at this stage; it should get written to the cache-pool. List each pool to get the object names, and use the date command to track time and changes:
# rados -p EC-pool ls
# rados -p cache-pool ls
# date
After 300 seconds (since we set cache_min_evict_age to 300 seconds), the cache-tiering agent will migrate object1 from the cache-pool to the EC-pool, and object1 will be removed from the cache-pool:
# rados -p EC-pool ls
# rados -p cache-pool ls
# date
As shown in the preceding output, data is migrated from the cache-pool to the EC-pool after a certain time.
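Rather than re-running the list commands by hand, a small polling loop (a convenience sketch, assuming the EC-pool/cache-pool tier from this section is in place) can watch both pools until the migration completes:

```shell
# Poll the cache-pool every 30 seconds until object1 has been evicted.
while rados -p cache-pool ls | grep -q '^object1$'; do
    echo "$(date): object1 still in cache-pool, waiting..."
    sleep 30
done
echo "$(date): object1 has left the cache-pool"

# object1 should now appear in the backing storage tier.
rados -p EC-pool ls
```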