Erasure coding is implemented by creating a Ceph pool of type erasure. This pool is based on an erasure code profile that defines the erasure-coding characteristics. We will first create an erasure code profile, and then we will create an erasure-coded pool based on this profile.
# ceph osd erasure-code-profile set EC-profile ruleset-failure-domain=osd k=3 m=2
# ceph osd erasure-code-profile ls
# ceph osd erasure-code-profile get EC-profile
# ceph osd pool create EC-pool 16 16 erasure EC-profile
Check the status of your newly created pool; you should find that the pool's size is 5 (k + m = 3 + 2). Hence, each object's chunks will be written to five different OSDs:
# ceph osd dump | grep -i EC-pool
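The relationship between k, m, the pool size, and the raw-space cost can be sketched with a little arithmetic (an illustrative Python sketch, not Ceph output):

```python
# Erasure profile from this recipe: k data chunks, m coding chunks.
k, m = 3, 2

pool_size = k + m                 # each object is spread across k + m = 5 OSDs
tolerated_failures = m            # up to m OSDs may fail without losing data
raw_overhead = (k + m) / k        # raw space consumed per byte of user data

print(f"size={pool_size}, tolerates {tolerated_failures} failures, "
      f"overhead={raw_overhead:.2f}x (a 3-replica pool would be 3.00x)")
```

This is the main attraction of erasure coding: the same two-failure tolerance as 3-way replication for roughly half the raw space.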
Create a sample file, hello.txt, and add this file to the EC-pool, for example:
# echo "Hello Ceph" > hello.txt
# rados -p EC-pool put object1 hello.txt
Next, check the OSD map for the EC-pool and object1:
# ceph osd map EC-pool object1
If you observe the above output, you will notice that object1 is stored in the placement group 47.c, which in turn belongs to the EC-pool. You will also notice that the placement group is stored on five OSDs, that is, osd.5, osd.3, osd.2, osd.8, and osd.0. If you go back to Step 1, you will recall that we created the erasure code profile with k=3 and m=2; this is why object1 is stored on five (k + m) OSDs.
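The chunk placement described above can be sketched as follows: the object is split into k data chunks (padded if necessary), m coding chunks are computed from them, and each of the k + m chunks lands on a distinct OSD. This is a hypothetical illustration; real chunk sizes depend on the profile's stripe settings, and which OSD holds a data versus a coding chunk is not fixed:

```python
k, m = 3, 2
obj = b"Hello Ceph, you are awesome"          # illustrative object contents

# Pad so the object divides evenly into k data chunks.
chunk_len = -(-len(obj) // k)                 # ceiling division
padded = obj.ljust(k * chunk_len, b"\0")
data_chunks = [padded[i * chunk_len:(i + 1) * chunk_len] for i in range(k)]

# The m coding chunks are computed from the data chunks by the profile's
# erasure-code plugin; a placeholder stands in for them here.
coding_chunks = [b"?" * chunk_len for _ in range(m)]

# One chunk per OSD -- the acting set seen in the recipe's osd map output.
acting_set = [5, 3, 2, 8, 0]
placement = dict(zip(acting_set, data_chunks + coding_chunks))
```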
At this stage, we have finished setting up an erasure-coded pool in the Ceph cluster. Now, we will deliberately bring down OSDs to see how the erasure pool behaves when they are unavailable.
We will break osd.5 and osd.3, one by one. First, bring down osd.5 and check the OSD map for the EC-pool and object1. You should notice that osd.5 has been replaced by 2147483647, a placeholder value indicating that no OSD currently holds that chunk; in other words, osd.5 is no longer available for this pool:
# ssh ceph-node2 service ceph stop osd.5
# ceph osd map EC-pool object1
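The value 2147483647 that appears in the OSD map is not random: it is 2^31 - 1, the largest signed 32-bit integer, which Ceph's CRUSH code uses as a sentinel (CRUSH_ITEM_NONE) for a chunk that has no OSD assigned. A small sketch (the acting sets are taken from this recipe's example output):

```python
# 2**31 - 1 is the sentinel CRUSH uses for "no OSD assigned to this shard".
ITEM_NONE = 2147483647
assert ITEM_NONE == 2**31 - 1

acting_before = [5, 3, 2, 8, 0]            # from the earlier osd map output
acting_after  = [ITEM_NONE, 3, 2, 8, 0]    # after osd.5 goes down

# Positions holding the sentinel are shards with no live OSD.
missing = [i for i, osd in enumerate(acting_after) if osd == ITEM_NONE]
print("shards with no OSD:", missing)      # shard 0 has lost its OSD
```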
Next, bring down osd.3 and check the OSD map for the EC-pool and object1. You will notice that, like osd.5, osd.3 has also been replaced by the placeholder 2147483647, which means that osd.3 is also no longer available for this pool:
# ssh ceph-node2 service ceph stop osd.3
# ceph osd map EC-pool object1
Now, the EC-pool is running on three OSDs, which is the minimum requirement for this erasure pool setup. As discussed earlier, the EC-pool requires any three chunks out of five in order to read the data. We have only three chunks left, on osd.2, osd.8, and osd.0, yet we can still access the data. Let's verify this by reading the object back:
# rados -p EC-pool ls
# rados -p EC-pool get object1 /tmp/object1
# cat /tmp/object1
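The "any k chunks out of k + m" property that this read relies on can be demonstrated with a toy systematic code over the prime field GF(257). This is only a sketch of the principle, not Ceph's actual jerasure plugin: three data symbols are encoded into five chunk symbols, and any three surviving chunks reconstruct the original data by solving a small linear system:

```python
from itertools import combinations

P = 257            # prime field size (fits one byte per symbol)
K, M = 3, 2        # analogue of the recipe's k=3, m=2 profile

# Generator matrix: K identity rows (the systematic data chunks)
# followed by M Vandermonde rows (the coding chunks).
G = [[1 if j == i else 0 for j in range(K)] for i in range(K)] \
    + [[pow(x, j, P) for j in range(K)] for x in (1, 2)]

def encode(data):
    """Encode K data symbols into K + M chunk symbols."""
    return [sum(g * d for g, d in zip(row, data)) % P for row in G]

def solve(A, b):
    """Solve A*x = b over GF(P) by Gauss-Jordan elimination (Python 3.8+)."""
    n = len(A)
    A = [row[:] + [v] for row, v in zip(A, b)]       # augmented matrix
    for col in range(n):
        piv = next(r for r in range(col, n) if A[r][col])
        A[col], A[piv] = A[piv], A[col]
        inv = pow(A[col][col], -1, P)                # modular inverse
        A[col] = [a * inv % P for a in A[col]]
        for r in range(n):
            if r != col and A[r][col]:
                A[r] = [(a - A[r][col] * c) % P for a, c in zip(A[r], A[col])]
    return [row[n] for row in A]

data = [72, 105, 33]           # one stripe of data: 3 symbols
chunks = encode(data)          # 5 chunks, one per OSD

# Losing any M = 2 chunks (OSDs) still leaves enough to rebuild the stripe.
for alive in combinations(range(K + M), K):
    rows = [G[i] for i in alive]
    vals = [chunks[i] for i in alive]
    assert solve(rows, vals) == data
print("all", len(list(combinations(range(K + M), K))),
      "3-of-5 chunk subsets recover the data")
```

Losing a third OSD would leave fewer than k chunks, and the linear system would be underdetermined; that is exactly why this pool cannot tolerate more than m failures.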
The erasure code feature benefits greatly from Ceph's robust architecture. When Ceph detects the unavailability of any failure zone, it starts its basic recovery operation. During recovery, erasure pools rebuild themselves by decoding the failed chunks onto new OSDs, after which all the chunks become available again automatically.
Recall that we brought down osd.5 and osd.3. After a while, Ceph will start recovery and will regenerate the missing chunks onto different OSDs. Once the recovery operation is complete, check the OSD map for the EC-pool and object1. You will be amazed to see new OSD IDs in place of the failed ones, osd.7 and osd.4. And thus, an erasure pool becomes healthy without any administrative input.