Troubleshooting the 2147483647 error

This small section is included within the erasure coding chapter rather than the troubleshooting section of this book, as it's commonly seen with erasure-coded pools and so is very relevant to this chapter. An example of this error is shown in the following screenshot, when running the ceph health detail command:

If you see 2147483647 listed as one of the OSDs for an erasure-coded pool, this normally means that CRUSH was unable to find a sufficient number of OSDs to complete the PG peering process. This is normally due to the number of K+M shards being larger than the number of hosts in the CRUSH topology. However, in some cases this error can still occur even when the number of hosts is equal to or greater than the number of shards. In this scenario it's important to understand how CRUSH picks OSDs as candidates for data placement. When CRUSH is used to find a candidate OSD for a PG, it applies the CRUSH map to find an appropriate location in the CRUSH topology. If the result comes back as the same as a previously selected OSD, Ceph will retry to generate another mapping by passing slightly different values into the CRUSH algorithm. In some cases, if there is a similar number of hosts to the number of erasure shards, CRUSH may run out of attempts before it can suitably find the correct OSD mappings for all the shards. Newer versions of Ceph have mostly fixed these problems by increasing the CRUSH tunable choose_total_tries.

Table of Contents for Troubleshooting the 2147483647 error

Create new playlist

Sign In

Sign Up

Table of Contents for
Troubleshooting the 2147483647 error