Repairing inconsistent objects

We will now see how we can correctly repair inconsistent objects.

  1. To recreate an inconsistent scenario, first create an RBD image; we will then put a file system on it:
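
     The exact names and sizes don't matter; a minimal sketch, assuming the default rbd pool and an illustrative image name of test, would be:

     ```
     rbd create test --size 1024 --pool rbd
     ```
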
  1. Now, format the RBD with a file system and check which objects have been created:
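
     A sketch of this step, assuming the image created above maps to /dev/rbd0 (check the output of rbd map on your system):

     ```
     rbd map rbd/test
     mkfs.ext4 /dev/rbd0
     rados -p rbd ls | grep rbd_data
     ```

     With image format 2, the data objects have names of the form rbd_data.<image id>.<object number>.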

  1. Pick one object at random and use the osd map command to find out which PG the object is stored in:
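
     For example, taking one of the object names from the previous listing (the name used here is purely illustrative; substitute one from your own cluster):

     ```
     ceph osd map rbd rbd_data.1041238e1f29.0000000000000000
     ```

     The output shows the PG ID and the up/acting set of OSDs that hold the object.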

  1. Find this object on disk on one of the OSD nodes; in this case, it is stored on OSD.0, which lives on the OSD1 node:
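
     On a FileStore OSD, the object is stored as a file under the PG's _head directory; a sketch, using osd.0 and an illustrative PG ID of 0.2b (substitute the PG ID returned by ceph osd map):

     ```
     find /var/lib/ceph/osd/ceph-0/current/0.2b_head/ -name '*1041238e1f29*'
     ```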

  1. Corrupt it by echoing garbage over the top of it:
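
     For example, overwriting the file found in the previous step (the path below is a placeholder) to simulate silent corruption:

     ```
     echo "this is garbage" > /var/lib/ceph/osd/ceph-0/current/0.2b_head/<object file>
     ```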

  1. Now, tell Ceph to do a scrub on the PG that contains the object that we corrupted:
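
     A deep scrub reads the object data on every replica and so will catch the corruption; 0.2b is still the illustrative PG ID:

     ```
     ceph pg deep-scrub 0.2b
     ```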

  1. If you check the Ceph status, you will see that Ceph has detected the corrupted object and marked the PG as inconsistent. From this point onward, forget that we corrupted the object manually and work through the process as if it were for real:
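
     For example:

     ```
     ceph -s              # overall status; the cluster will report scrub errors
     ceph health detail   # names the inconsistent PG and the OSDs involved
     ```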

By looking at the detailed health report, we can find the PG that contains the corrupted object. We could simply tell Ceph to repair the PG now; however, if the primary OSD is the one that holds the corrupted object, it will overwrite the remaining good copies with the corrupt one. To make sure this doesn't happen, we will confirm which OSD holds the corrupt object before running the repair command.

By looking at the health report, we can see the three OSDs that hold a copy of the object; the first OSD listed is the primary.
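
The acting set can also be confirmed directly for the PG; with the illustrative PG ID used earlier, something like this should show it:

```
ceph pg map 0.2b   # prints the up and acting OSD sets; the first OSD in the acting set is the primary
```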

  1. Log on to the primary OSD node and open the log file for the primary OSD. You should be able to find the log entry indicating which object was flagged by the PG scrub.
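
     For example, on the primary OSD node (osd.2 in this walkthrough), using the default log location of a package-based install:

     ```
     grep -Ei 'scrub|inconsistent' /var/log/ceph/ceph-osd.2.log
     ```
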
  1. Now, by logging on to each OSD node and navigating through the PG structure, find the object mentioned in the log file and calculate an md5sum of each copy.
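
     For example, on each of the three nodes in turn, following the FileStore layout used earlier (substitute the OSD ID, PG directory, and object file name for your cluster):

     ```
     md5sum /var/lib/ceph/osd/ceph-0/current/0.2b_head/<object file>   # on the node hosting osd.0
     md5sum /var/lib/ceph/osd/ceph-2/current/0.2b_head/<object file>   # on the node hosting osd.2
     md5sum /var/lib/ceph/osd/ceph-3/current/0.2b_head/<object file>   # on the node hosting osd.3
     ```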

The md5sums of the object on each of the three OSD nodes are as follows:

OSD.0 = d599f0ec05c3bda8c3b8a68c32a1b47
OSD.2 = 5cfa9d6c8febd618f91ac2843d50a1c
OSD.3 = 5cfa9d6c8febd618f91ac2843d50a1c

We can see that the object on OSD.0 has a different md5sum, and so we know that it is the corrupt object.

Although we already know which copy of the object was corrupted, since we manually corrupted the object on OSD.0, let's pretend we hadn't done it and that this corruption was caused by some random cosmic ray. We now have the md5sums of the three replica copies and can clearly see that the copy on OSD.0 is wrong. This is also a big reason why a 2x replication scheme is a bad idea: if a PG becomes inconsistent with only two copies, you can't tell which copy is the good one and which is the bad one.

As the primary OSD for this PG is OSD.2, as can be seen in both the ceph health detail and ceph osd map output, we can safely run the ceph pg repair command without fear of copying the bad object over the top of the remaining good copies.
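
With the illustrative PG ID used throughout, the repair is a single command:

```
ceph pg repair 0.2b
```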

Checking the cluster health again, we can see that the inconsistent PG has repaired itself.
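
The same status commands as before can be used for this check:

```
ceph -s              # health should return to HEALTH_OK once the repair completes
ceph health detail   # the inconsistent flag on the PG should be gone
```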

In the event that the corrupt copy is on the primary OSD, the following steps should be taken instead; a sketch of the commands follows the list:

  1. Stop the primary OSD.
  2. Delete the corrupt object from the PG directory.
  3. Restart the OSD.
  4. Instruct Ceph to repair the PG.
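
A sketch of those steps on the command line, assuming systemd-managed FileStore OSDs, that the corrupt copy is on the primary (osd.2 here), and the illustrative PG ID and object path used earlier:

```
systemctl stop ceph-osd@2                                      # 1. stop the primary OSD
rm /var/lib/ceph/osd/ceph-2/current/0.2b_head/<object file>    # 2. delete the corrupt object
systemctl start ceph-osd@2                                     # 3. restart the OSD
ceph pg repair 0.2b                                            # 4. instruct Ceph to repair the PG
```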