Using the Ceph objectstore tool

Hopefully, if you have followed best practice, your cluster is running with three replicas and is not configured with any dangerous configuration options. Ceph, in most cases, should be able to recover from any failure.

However, in the scenario where a number of OSDs go offline, a number of PGs and/or objects may become unavailable. If you are unable to reintroduce these OSDs back into the cluster to allow Ceph to recover them gracefully, then the data in those PGs is effectively lost. However, there is a possibility that the failed OSD's disk is still readable, in which case the objectstore tool can be used to recover the contents of the PGs. The process involves exporting the PGs from the failed OSDs and then importing them back into the cluster. The objectstore tool does require that the OSD's internal metadata is still in a consistent state, so full recovery is not guaranteed.

In order to demonstrate the use of the objectstore tool, we will shut down two of our three test cluster OSDs and then recover the missing PGs back into the cluster. In real life, it's unlikely that you would be facing a situation where every single PG from the failed OSDs is missing, but for demonstration purposes, the required steps are the same:

  1. First, let's set the pool size to 2, so we can make sure that we lose all the copies of some PGs when we stop the OSD services:
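For example, assuming the default rbd pool (substitute the name of your own pool):
sudo ceph osd pool set rbd size 2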
  1. Now, shut down two of the OSD services, and you will see from the Ceph status screen that a number of PGs go offline:
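For example, if the two OSDs being stopped are osd.1 and osd.2 (use the IDs of the OSDs in your own cluster):
sudo systemctl stop ceph-osd@1
sudo systemctl stop ceph-osd@2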

  1. Running ceph health detail will also show which PGs are in a degraded state:
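The command to run is simply:
sudo ceph health detail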

The stale PGs are the ones that no longer have a surviving copy, and it can be seen that the acting OSD is the one that was shut down.

If we use grep to filter out just the stale PGs, we can use the resulting list to work out what PGs we need to recover. If the OSDs have actually been removed from the cluster, then the PGs will be listed as incomplete rather than stale.
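For example, something along these lines (the exact wording of the output varies between Ceph releases, so adjust the pattern if needed):
sudo ceph health detail | grep stale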

  1. Check the OSD to make sure the PG exists in it:
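One way to confirm this is to list the PGs held on the stopped OSD with the objectstore tool; this assumes the same data path that is used for the export in the next step:
sudo ceph-objectstore-tool --op list-pgs --data-path /var/lib/ceph/osd/ceph-2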

  1. We will now use the objectstore tool to export the PG to a file. As the amount of data in our test cluster is small, we can just export the data to the OS disk. In real life, you will probably want to connect additional storage to the server. USB disks are ideal for this, as they can easily be moved between servers as part of the recovery process:
sudo ceph-objectstore-tool --op export --pgid 0.2a --data-path /var/lib/ceph/osd/ceph-2 --file 0.2a_export

If you experience an assert while running the tool, you can try running it with the --skip-journal-replay flag, which will skip replaying the journal into the OSD. Any outstanding data in the journal will be lost, but this may allow you to recover the bulk of the missing PGs that would otherwise have been impossible to retrieve. Repeat the export until you have exported all of the missing PGs.
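For example, the export from the previous step would become:
sudo ceph-objectstore-tool --op export --pgid 0.2a --data-path /var/lib/ceph/osd/ceph-2 --skip-journal-replay --file 0.2a_export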

  1. Now, we can import the missing PGs back into an OSD that is operating in the cluster. While we could import the PGs into an existing OSD, it is much safer to perform the import on a new OSD, so we don't risk further data loss. For this demonstration, we will create a directory-based OSD on the disk used by the failed OSD. In a real disaster scenario, it's highly recommended that the data be inserted into an OSD running on a separate disk, rather than using an existing OSD; this ensures there is no further risk to any data in the Ceph cluster.

Also, it doesn't matter that the PGs that are being imported are all inserted into the same temporary OSD. As soon as Ceph discovers the objects, it will recover them to the correct location in the cluster.

  1. Create a new empty folder for the OSD:
sudo mkdir /var/lib/ceph/osd/ceph-2/tmposd/
  1. Use ceph-disk to prepare the directory for Ceph:
sudo ceph-disk prepare /var/lib/ceph/osd/ceph-2/tmposd/
  1. Change the ownership of the folder to the ceph user and group:
sudo chown -R ceph:ceph /var/lib/ceph/osd/ceph-2/tmposd/
  1. Activate the OSD to bring it online:
sudo ceph-disk activate /var/lib/ceph/osd/ceph-2/tmposd/
  1. Set the CRUSH weight of the new OSD to 0 to stop any objects from being backfilled into it:
sudo ceph osd crush reweight osd.3 0
  1. Now, we can proceed with the PG import, specifying the temporary OSD location and the PG files that we exported earlier:
sudo ceph-objectstore-tool --op import --data-path /var/lib/ceph/osd/ceph-3 --file 0.2a_export

  1. Repeat this for every PG that you exported previously. Once complete, reset the file ownership and restart the new temporary OSD:
sudo chown -R ceph:ceph /var/lib/ceph/osd/ceph-2/tmposd/
sudo systemctl start ceph-osd@3
  1. After checking the Ceph status output, you will see that your PGs are now active, but in a degraded state. In the case of our test cluster, there are not sufficient OSDs to allow the objects to recover to the correct number of copies. If there were more OSDs in the cluster, the objects would be backfilled around the cluster and it would recover to full health with the correct number of copies.
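The status can be viewed at any point with:
sudo ceph -s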
