CephFS recovery

Unlike RBDs, which are simply a concatenation of objects, CephFS requires consistent data in both the data and metadata pools. It also requires a healthy CephFS journal; if any of these data sources has issues, CephFS will go offline and may not recover. This section of the chapter looks first at recovering CephFS to an active state, and then at further recovery steps for the scenario in which the metadata pool is corrupt or incomplete.

There are a number of conditions in which CephFS may go offline without any permanent data loss. These are often caused by transient events in the Ceph cluster, and in most cases CephFS should recover automatically.

As CephFS sits on top of RADOS, barring any software bugs in CephFS itself, data loss or corruption should only occur when data has been lost in the RADOS layer, perhaps due to multiple OSD failures leading to the loss of a PG.

The loss of objects or PGs from the data pool will not take the CephFS filesystem offline, but it will cause access requests to the affected files to return zeroes. This will likely cause any applications higher up the stack to fail and, because files and parts of files map to PGs in a semi-random fashion, the damage is spread across many files, so the CephFS filesystem is likely to be largely unusable. The best course of action in this scenario is to try to recover the RADOS pool PGs, as seen later in this chapter.
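To illustrate why a single lost PG can touch parts of many files, the following minimal sketch lists the RADOS object names that would back a file's data. It assumes the default CephFS file layout (4 MiB objects, a stripe count of 1), under which each object is named after the file's inode number in hexadecimal followed by an eight-digit hexadecimal object index. The function name cephfs_object_names and the example inode are hypothetical and are not taken from the text; files with a custom layout, and sparse files, will not match this simple calculation.

```python
from typing import List

# Assumed default layout: 4 MiB objects, stripe_count = 1.
DEFAULT_OBJECT_SIZE = 4 * 1024 * 1024


def cephfs_object_names(inode: int, file_size: int,
                        object_size: int = DEFAULT_OBJECT_SIZE) -> List[str]:
    """Return the data-pool object names expected to back a file.

    Hypothetical helper: only valid for the simple layout assumed above.
    """
    if file_size <= 0:
        return []  # no data extents to list
    # Round up so a trailing partial object is still counted.
    count = (file_size + object_size - 1) // object_size
    # Object name format: <inode in hex>.<object index as 8 hex digits>
    return [f"{inode:x}.{index:08x}" for index in range(count)]


if __name__ == "__main__":
    # A 10 MiB file with a made-up inode number spans three objects.
    for name in cephfs_object_names(0x10000000000, 10 * 1024 * 1024):
        print(name)
```

Each of these object names is hashed to a PG independently, so the pieces of one large file are normally scattered across many PGs. On a live cluster, running ceph osd map with the data pool name and one of these object names reports the PG and OSDs that the object maps to.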

The loss of objects or PGs from the metadata pool will take the CephFS filesystem offline, and it will not recover without manual intervention. It is important to point out that the actual data contents are unaffected by metadata loss, but the objects storing this data would be largely meaningless without the metadata. Ceph has a number of tools that can be used to recover and rebuild metadata, which may enable you to recover from metadata loss. However, as has been mentioned several times throughout this book, prevention is better than cure; these tools should not be seen as a standard recovery mechanism, but should only be used as a last resort when recovery from regular backups has failed.
