VFS cache pressure

As the name suggests, the filestore object store works by storing RADOS objects as files on a standard Linux file system, which in most cases will be XFS. As each object is stored as a file, there will likely be hundreds of thousands, if not millions, of files per disk. Consider a Ceph cluster built from 8 TB disks and used for an RBD workload: assuming the RBDs are made up of the standard 4 MB objects, there would be nearly 2 million objects per disk.
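
As a quick sanity check on that figure, the object count is simply the disk capacity divided by the object size (using decimal units here, as drive vendors do):

    # 8 TB disk / 4 MB objects = 2,000,000 objects per disk
    echo $(( 8 * 1000 * 1000 * 1000 * 1000 / (4 * 1000 * 1000) ))
    # 2000000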

When an application asks Linux to read or write a file on a file system, the kernel needs to know where that file actually exists on the disk. To find this location, it has to walk the chain of directory entries and inodes, and each of these lookups requires a disk access if the entry isn't already cached in memory. This can lead to poor performance if the Ceph objects that need to be read or written haven't been accessed in a while and are therefore not cached. The penalty is much higher on spinning-disk clusters than on SSD-based clusters due to the cost of the extra random reads.
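
To get a feel for how much of this metadata is currently held in memory, you can inspect the kernel's slab caches, where dentries and inodes live. A brief example (the exact inode slab names, such as xfs_inode, depend on the file system in use):

    # One-shot view of the dentry and inode slab caches
    sudo slabtop -o | grep -E 'dentry|inode'

    # Or read the raw counters directly
    sudo grep -E 'dentry|xfs_inode' /proc/slabinfo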

By default, Linux favors caching data in the pagecache over caching inodes and directory entries. In many Ceph deployments, this is the opposite of what you actually want to happen. Luckily, there is a kernel tunable that allows you to tell Linux to prefer directory entries and inodes over the pagecache; it is controlled by the following sysctl setting:

    vm.vfs_cache_pressure

A lower number sets a preference for caching inodes and directory entries, but do not set this to zero: a zero setting tells the kernel never to flush old entries, even under low memory conditions, and can have adverse effects. A value of 1 is recommended.
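
To apply and persist that recommendation, something like the following should work (the file name under /etc/sysctl.d/ is just an illustrative choice):

    # Apply immediately on the running system
    sudo sysctl -w vm.vfs_cache_pressure=1

    # Persist across reboots (the config file name here is arbitrary)
    echo 'vm.vfs_cache_pressure = 1' | sudo tee /etc/sysctl.d/90-ceph-vfs.conf
    sudo sysctl --system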
