Investigating PGs in a down state

A PG in a down state will not service any client operations, and every object contained within the PG will be unavailable. This causes slow requests to build up across the cluster as clients try to access these objects. The most common reason for a PG to be down is that several OSDs are offline, leaving no valid copy of the PG on any active OSD. To find out exactly why a PG is down, run the following command, replacing x.y with the ID of the PG:

ceph pg x.y query

This will produce a large amount of output; the section we are interested in shows the peering status. The example here was taken from a PG whose pool was set to min_size 1 and had data written to it when only OSD 0 was up and running. OSD 0 was then stopped and OSDs 1 and 2 were started.

We can see that the peering process is being blocked, as Ceph knows that the PG has newer data that was written to OSD 0. It has probed OSDs 1 and 2 for the data but did not find anything it needed. It wants to poll OSD 0 as well, but it can't because that OSD is down, hence the message "starting or marking this osd lost may let us proceed".
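Because the query output is long, it can help to pull out just the peering details with a short script. The following is a minimal sketch rather than a definitive tool. It assumes that the command prints JSON and that the peering information sits in a recovery_state list using field names such as blocked, probing_osds, down_osds_we_would_probe, and peering_blocked_by; these names are assumptions based on typical output and may differ between Ceph releases. The helper names query_pg and peering_blockers are hypothetical.

```python
import json
import subprocess


def query_pg(pgid):
    """Run `ceph pg <pgid> query` and return the parsed output.

    Needs the ceph CLI and a reachable cluster; assumes the output is JSON.
    """
    result = subprocess.run(
        ["ceph", "pg", pgid, "query"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


def peering_blockers(pg_query):
    """Collect the peering details from a parsed PG query.

    The field names used here are assumptions about the output layout;
    missing fields fall back to None or an empty list instead of failing.
    """
    blockers = []
    for state in pg_query.get("recovery_state", []):
        # Skip recovery states that do not report a peering block.
        if "blocked" not in state and "peering_blocked_by" not in state:
            continue
        blockers.append({
            "state": state.get("name"),
            "reason": state.get("blocked"),
            "probing_osds": state.get("probing_osds", []),
            "down_osds": state.get("down_osds_we_would_probe", []),
            "blocked_by": [
                (entry.get("osd"), entry.get("comment"))
                for entry in state.get("peering_blocked_by", [])
            ],
        })
    return blockers
```

On a live cluster, peering_blockers(query_pg("x.y")) would return one entry per blocked recovery state. For the scenario described above, the blocked_by list would be expected to name OSD 0 together with the comment about starting the OSD or marking it lost.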
