One of the key features of Ceph is its self-repairing and self-healing quality. Ceph achieves this by keeping multiple copies of placement groups across different OSDs, which ensures a very high probability that you will not lose your data. In rare cases, you may see the failure of multiple OSDs where one or more PG replicas are on a failed OSD; the PG state then becomes incomplete, which leads to errors in the cluster health. For granular recovery, Ceph provides a low-level PG and object data recovery tool known as ceph-objectstore-tool.
Using ceph-objectstore-tool can be a risky operation, and the command needs to be run either as root or with sudo. Do not attempt this on a production cluster without engaging Red Hat Ceph Storage Support, unless you are sure of what you are doing. It can cause irreversible data loss in your cluster.
Before using the tool, identify the incomplete PGs from the cluster health, locate the node hosting the affected OSD, and stop the OSD service on that node:
# ceph health detail | grep incomplete
# ceph osd find <osd_number>
# service ceph stop <osd_ID>
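As an illustrative sketch of these preparatory steps, the values below (PG 0.1f, OSD number 1, and the service name osd.1) are hypothetical placeholders; substitute the IDs reported by your own cluster:

```shell
# All IDs below are hypothetical placeholders for illustration only.

# 1. Check the cluster health for incomplete PGs
ceph health detail | grep incomplete

# 2. Locate the node hosting the affected OSD (here, OSD number 1)
ceph osd find 1

# 3. On that node, stop the OSD daemon before running ceph-objectstore-tool
service ceph stop osd.1
```

The OSD must remain stopped for the entire time you work on it with ceph-objectstore-tool; the tool operates directly on the OSD's data store.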
The following sections describe the OSD and placement group functions that you can use with the ceph-objectstore-tool:
To list all the objects on an OSD:
# ceph-objectstore-tool --data-path </path/to/osd> --journal-path </path/to/journal> --op list
To list all the objects within a placement group:
# ceph-objectstore-tool --data-path </path/to/osd> --journal-path </path/to/journal> --pgid <pgid> --op list
To list all the placement groups on an OSD:
# ceph-objectstore-tool --data-path </path/to/osd> --journal-path </path/to/journal> --op list-pgs
To identify the placement group that a particular object belongs to:
# ceph-objectstore-tool --data-path </path/to/osd> --journal-path </path/to/journal> --op list <object-id>
To retrieve information about a particular placement group:
# ceph-objectstore-tool --data-path </path/to/osd> --journal-path </path/to/journal> --pgid <pg-id> --op info
To retrieve a log of operations on a placement group:
# ceph-objectstore-tool --data-path </path/to/osd> --journal-path </path/to/journal> --pgid <pg-id> --op log
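As a concrete sketch of the inspection commands, the paths and PG ID below are hypothetical (a typical FileStore OSD layout for osd.1); adjust them to your deployment:

```shell
# Hypothetical paths for a FileStore osd.1; adjust to your deployment.
OSD_PATH=/var/lib/ceph/osd/ceph-1
JOURNAL=/var/lib/ceph/osd/ceph-1/journal

# List every object on the stopped OSD
ceph-objectstore-tool --data-path $OSD_PATH --journal-path $JOURNAL --op list

# List the placement groups held by this OSD
ceph-objectstore-tool --data-path $OSD_PATH --journal-path $JOURNAL --op list-pgs

# Show metadata for a hypothetical placement group 0.1f
ceph-objectstore-tool --data-path $OSD_PATH --journal-path $JOURNAL --pgid 0.1f --op info
```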
Removing a placement group is a risky operation and may cause data loss; use this feature with caution. If a corrupt placement group on an OSD prevents peering or prevents the OSD service from starting, ensure that you have a valid copy of that placement group on another OSD before removing it. As a precaution, you can also back up the PG before removing it by exporting it to a file.
To remove a placement group:
# ceph-objectstore-tool --data-path </path/to/osd> --journal-path </path/to/journal> --pgid <pg-id> --op remove
To export a placement group to a file:
# ceph-objectstore-tool --data-path </path/to/osd> --journal-path </path/to/journal> --pgid <pg-id> --file /path/to/file --op export
To import a placement group from a file:
# ceph-objectstore-tool --data-path </path/to/osd> --journal-path </path/to/journal> --file </path/to/file> --op import
To list the lost objects on an OSD:
# ceph-objectstore-tool --data-path </path/to/osd> --journal-path </path/to/journal> --op list-lost
To list the lost objects of a particular placement group, specify its pgid:
# ceph-objectstore-tool --data-path </path/to/osd> --journal-path </path/to/journal> --pgid <pgid> --op list-lost
To fix the lost objects on an OSD:
# ceph-objectstore-tool --data-path </path/to/osd> --journal-path </path/to/journal> --op fix-lost
To fix the lost objects of a particular placement group, specify its pgid:
# ceph-objectstore-tool --data-path </path/to/osd> --journal-path </path/to/journal> --pgid <pg-id> --op fix-lost
To fix a single lost object:
# ceph-objectstore-tool --data-path </path/to/osd> --journal-path </path/to/journal> --op fix-lost <object-id>
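A cautious workflow is to list the lost objects first and only then fix them, ideally scoped to a single PG. The paths and the PG ID 0.1f below are hypothetical placeholders:

```shell
# Hypothetical FileStore paths for a stopped osd.1; adjust to your deployment.
OSD_PATH=/var/lib/ceph/osd/ceph-1
JOURNAL=/var/lib/ceph/osd/ceph-1/journal

# First, inspect which objects are recorded as lost
ceph-objectstore-tool --data-path $OSD_PATH --journal-path $JOURNAL --op list-lost

# Then fix lost objects, limited to the hypothetical PG 0.1f
ceph-objectstore-tool --data-path $OSD_PATH --journal-path $JOURNAL --pgid 0.1f --op fix-lost
```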
The syntax for the ceph-objectstore-tool is: ceph-objectstore-tool <options>
The values for <options> can be as follows:
--data-path: The path to the OSD
--journal-path: The path to the journal
--op: The operation
--pgid: The placement group ID
--skip-journal-replay: Use this when the journal is corrupted
--skip-mount-omap: Use this when the leveldb data store is corrupted and cannot be mounted
--file: The path to the file, used with the import and export operations
To understand this tool better, let's take an example: a pool keeps two copies of an object, and the PGs are located on osd.1 and osd.2. At this point, if a failure happens, the following sequence will occur:
1. osd.1 goes down.
2. osd.2 handles all the write operations in a degraded state.
3. osd.1 comes up and peers with osd.2 for data replication.
4. osd.2 goes down before replicating all the objects to osd.1.
5. At this point, you have data on osd.1, but it's stale.
After troubleshooting, you will find that you can read the osd.2 data from the file system, but its OSD service is not starting. In such a situation, you should use the ceph-objectstore-tool to export and retrieve data from the failed OSD. The ceph-objectstore-tool provides you with enough capability to examine, modify, and retrieve object data and metadata.
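For this scenario, one possible recovery sketch is to export the up-to-date PG from the failed osd.2's file system, remove the stale copy from osd.1, and import the exported PG into osd.1. All paths and the PG ID 0.1f are hypothetical placeholders, and both OSD daemons must be stopped while the tool runs:

```shell
# Hypothetical recovery of PG 0.1f: osd.2 holds the newest copy but will
# not start; osd.1 starts but holds a stale copy. Paths are placeholders.

# On the node with the failed osd.2: export the good PG to a file
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-2 \
    --journal-path /var/lib/ceph/osd/ceph-2/journal \
    --pgid 0.1f --op export --file /tmp/pg.0.1f.export

# Copy /tmp/pg.0.1f.export to the osd.1 node, then (with osd.1 stopped)
# remove the stale PG and import the exported copy
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-1 \
    --journal-path /var/lib/ceph/osd/ceph-1/journal \
    --pgid 0.1f --op remove
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-1 \
    --journal-path /var/lib/ceph/osd/ceph-1/journal \
    --op import --file /tmp/pg.0.1f.export
```

After the import completes, restart the OSD and let the cluster peer and backfill before checking `ceph health detail` again.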
Finally, we have reached the end of this chapter and the book as well. I hope your journey with the Ceph cookbook has been informative. You should have learned several concepts around Ceph that will give you enough confidence to operate the Ceph cluster in your environment. Congratulations! You have now attained the next level in Ceph.
Keep Learning, Keep Exploring, Keep Sharing…
Cheers!