Tuning CephFS

Two main characteristics determine CephFS performance: the speed of metadata access and the speed of data access, although in the majority of cases, both contribute to servicing access requests.

It is important to understand that in CephFS, once the metadata has been retrieved for a file, reads of the actual file data do not require any further metadata operations until the file is closed by the client. Similarly, when writing files, the metadata is only updated when dirty data is flushed by the client. Thus, for large, sequential, buffered IO, metadata operations will likely make up only a small proportion of the total cluster IO.

Conversely, for CephFS filesystems serving a large number of clients that constantly open and close lots of smaller files, metadata operations will play a much bigger role in determining overall performance. Additionally, metadata operations service client requests for information about the filesystem, such as directory listings.

The performance of CephFS's data pool should be approached like any other Ceph performance requirement covered earlier in this chapter, so for the purpose of this section, metadata performance will be the focus.

Metadata performance is determined by two factors: the speed of reading and writing metadata via the RADOS metadata pool, and the speed at which the MDS can handle client requests. First, make sure that the metadata pool resides on flash storage, as this will reduce the latency of metadata requests by at least an order of magnitude, if not more. However, as was discussed earlier in the Latency section of this chapter, the latency introduced by a distributed network-storage platform can also have an impact on metadata performance.
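As a rough sketch, assuming the metadata pool is named cephfs_metadata and the cluster's flash devices are reported under the ssd device class, a CRUSH rule targeting SSDs can be created and applied to the pool (the rule and pool names here are purely illustrative):

    ceph osd crush rule create-replicated ssd_rule default host ssd
    ceph osd pool set cephfs_metadata crush_rule ssd_rule

Once the new rule is applied, Ceph will backfill the metadata objects onto the SSD-backed OSDs selected by that rule.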

To work around some of this latency, the MDS has a local cache from which it serves hot metadata requests. By default, an MDS reserves 1 GB of RAM to use as a cache and, generally speaking, the more RAM you can allocate, the better. The reservation is controlled by the mds_cache_memory_limit variable. By increasing the amount of memory the MDS can use as a cache, the number of requests that have to go to the RADOS pool is reduced, and the locality of the RAM will reduce metadata access latency.
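As a hedged example, assuming a release that supports the centralized configuration database, the cache limit could be raised to 4 GB for all MDS daemons with the following command (the value is in bytes and is purely illustrative; size it to the RAM actually spare on the MDS node):

    ceph config set mds mds_cache_memory_limit 4294967296

On older releases without the configuration database, the same setting can instead be placed under the [mds] section of ceph.conf and the MDS restarted.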

There will come a point when adding additional RAM brings very little benefit. This may be because the cache is already large enough that the majority of requests are served from it, or because the limit of requests the MDS process itself can handle has been reached.

Regarding the latter point, the MDS process is single-threaded, so there will come a point where the volume of metadata requests causes an MDS to consume 100% of a single CPU core, and no additional caching or SSDs will help. The current recommendation is to run the MDS on as highly clocked a CPU as possible. The quad-core Xeon E3s are ideal for this use and can often be obtained with frequencies nearing 4 GHz for a reasonable price. Compared to some of the lower-clocked Xeon CPUs, which often have higher core counts, performance could be almost doubled by ensuring a fast CPU is used.

If you have purchased the fastest CPU possible and are finding that a single MDS process is still the bottleneck, the last option is to deploy multiple active MDSes, so that the metadata requests are sharded across them.
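As a minimal sketch, assuming a filesystem named cephfs and standby MDS daemons already deployed, a second active MDS could be enabled with:

    ceph fs set cephfs max_mds 2

With multiple active MDSes, the directory tree is dynamically partitioned between them; if the automatic balancing does not spread the load evenly, specific directories can also be pinned to a particular MDS rank via the ceph.dir.pin extended attribute.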
