Example assert

The following assert was taken from the Ceph users mailing list (ceph-users):

2017-03-02 22:41:32.338290 7f8bfd6d7700 -1 osd/ReplicatedPG.cc: In function 'void ReplicatedPG::hit_set_trim(ReplicatedPG::RepGather*, unsigned int)' thread 7f8bfd6d7700 time 2017-03-02 22:41:32.335020

osd/ReplicatedPG.cc: 10514: FAILED assert(obc)


ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0xbddac5]
2: (ReplicatedPG::hit_set_trim(ReplicatedPG::RepGather*, unsigned int)+0x75f) [0x87e48f]
3: (ReplicatedPG::hit_set_persist()+0xedb) [0x87f4ab]
4: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>&)+0xe3a) [0x8a0d1a]
5: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x68a) [0x83be4a]
6: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x405) [0x69a5c5]
7: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x333) [0x69ab33]
8: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x86f) [0xbcd1cf]
9: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xbcf300]
10: (()+0x7dc5) [0x7f8c1c209dc5]
11: (clone()+0x6d) [0x7f8c1aceaced]

The top part of the assert shows the function in which the assert was triggered, along with the file and line number where the assert can be found. In this example, the hit_set_trim function appears to be the cause of the assert. We can look at the ReplicatedPG.cc file around line 10,514 to try to understand what might have happened. Note the version of the Ceph release (0.94.7): the line numbers on GitHub will only match if you are looking at the source for that same version.
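
Pulling the file, line number and version out of the assert header by hand is easy for one crash but tedious across many. The following is a minimal Python sketch, not a Ceph tool: it assumes the header lines look exactly like the ones above, and it assumes that Ceph release tags on GitHub are named v<version> and that the file paths in the assert are relative to the repository's src/ directory. The name locate_assert is hypothetical.

```python
import re

# Header lines copied from the assert shown above.
ASSERT_LOG = """\
osd/ReplicatedPG.cc: 10514: FAILED assert(obc)
ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432)
"""

# Matches "<file>: <line>: FAILED assert(<expression>)".
ASSERT_RE = re.compile(
    r"^(?P<file>\S+): (?P<line>\d+): FAILED assert\((?P<expr>.*)\)$", re.M
)
# Matches "ceph version <x.y.z> (<git sha1>)".
VERSION_RE = re.compile(
    r"^ceph version (?P<version>[\d.]+) \((?P<sha>[0-9a-f]+)\)", re.M
)


def locate_assert(log: str) -> dict:
    """Hypothetical helper: extract file, line, expression and version."""
    a = ASSERT_RE.search(log)
    v = VERSION_RE.search(log)
    if a is None or v is None:
        raise ValueError("no assert or version header found in log")
    info = {
        "file": a.group("file"),
        "line": int(a.group("line")),
        "expr": a.group("expr"),
        "version": v.group("version"),
        "sha": v.group("sha"),
    }
    # Assumption: tags are named v<version> and paths are relative to src/.
    info["url"] = (
        f"https://github.com/ceph/ceph/blob/v{info['version']}/src/"
        f"{info['file']}#L{info['line']}"
    )
    return info


if __name__ == "__main__":
    for key, value in locate_assert(ASSERT_LOG).items():
        print(f"{key}: {value}")
```

For the assert above, the sketch builds the link https://github.com/ceph/ceph/blob/v0.94.7/src/osd/ReplicatedPG.cc#L10514, which should point at the line for the matching release if the tag and path assumptions hold. Using the commit hash from the version line in place of the tag is an alternative that pins the exact build.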

From looking at the code, it appears that the value returned by the get_object_context function call is passed directly to the assert. If that value is null, indicating that the object containing the hit set to be trimmed could not be found, the OSD will assert. With this information, one option is to investigate why the object is missing and try to recover it. Alternatively, the assert could be commented out, and the OSD rebuilt, to see whether this allows the OSD to continue functioning. In this example, allowing the OSD to continue processing will likely not cause an issue, but in other cases an assert may be the only thing stopping more serious corruption from occurring. If you don't fully understand why something is causing an assert, and the impact of any potential change you might make, seek help before continuing.
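
To make the trade-off concrete, here is a minimal Python sketch of the two choices. It is not Ceph's code: get_object_context, trim_strict and trim_tolerant are hypothetical stand-ins, and a dictionary takes the place of the OSD's object store. The strict version mirrors the assert by stopping as soon as a hit-set object is missing; the tolerant version keeps going and returns the names of the missing objects so they can be investigated.

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for the objects that hold archived hit sets.
ObjectStore = Dict[str, bytes]


def get_object_context(store: ObjectStore, oid: str) -> Optional[bytes]:
    """Hypothetical lookup: returns None when the object cannot be found."""
    return store.get(oid)


def trim_strict(store: ObjectStore, oids: List[str]) -> None:
    """Mirrors the assert: a missing object stops all further processing."""
    for oid in oids:
        obc = get_object_context(store, oid)
        if obc is None:
            # Raised explicitly so the check is not stripped by `python -O`.
            raise AssertionError(f"FAILED assert(obc) for {oid}")
        del store[oid]


def trim_tolerant(store: ObjectStore, oids: List[str]) -> List[str]:
    """Skips missing objects and reports them instead of stopping."""
    missing = []
    for oid in oids:
        if get_object_context(store, oid) is None:
            missing.append(oid)  # candidate for investigation or recovery
            continue
        del store[oid]
    return missing
```

For example, trimming ["hs_1", "hs_2", "hs_3"] from a store that only holds hs_1 and hs_3 makes trim_tolerant return ["hs_2"] after deleting the other two, while trim_strict deletes hs_1 and then raises on hs_2. Skipping looks harmless in this sketch only because a missing hit set leaves nothing to trim; the same change in code that writes or repairs data could hide exactly the kind of corruption the assert exists to catch.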
