Data management inside a Ceph cluster involves all the components that we have discussed so far. The coordination between these components gives Ceph the power to provide a reliable and robust storage system. Data management starts as soon as a client writes data to a Ceph pool. Once the client writes data to a pool, the data is first written to the primary OSD, based on the pool's replication size. The primary OSD replicates the same data to its secondary and tertiary OSDs and waits for their acknowledgement. As soon as the secondary and tertiary OSDs complete writing the data, they send an acknowledgement signal to the primary OSD, and finally, the primary OSD returns an acknowledgement to the client, confirming the completion of the write operation.
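The acknowledgement flow just described can be sketched as a toy shell script. This is only an illustration of the ordering of events, not anything Ceph actually runs; the function names and OSD numbers are made up for the example:

```shell
# A toy sketch (no Ceph involved) of the acknowledgement flow described above:
# the client's write completes only after every replica OSD has acknowledged.
write_to_replica() {           # hypothetical helper: stands in for one replica write
    echo "replica $1: ack"
}

client_write() {
    echo "primary osd.0: writing object"
    write_to_replica "osd.6"   # secondary OSD
    write_to_replica "osd.3"   # tertiary OSD
    echo "client: write acknowledged"
}

client_write
```

Note that the client only ever talks to the primary OSD; replication to the secondary and tertiary OSDs happens between the OSDs themselves.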
In this way, Ceph consistently stores each client write operation and provides data availability from its replicas in the event of failures. Let's now see how data is stored in a Ceph cluster:
First, create a test file and a pool, HPC_Pool, and set the pool's replication size so that the data is stored as 3 copies:

# echo "Hello Ceph, You are Awesome like MJ" > /tmp/helloceph
# ceph osd pool create HPC_Pool 128 128
# ceph osd pool set HPC_Pool size 3
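The 128 in the pool creation command is the placement group count (pg_num and pgp_num). As a rough check that this value is sensible, a common rule of thumb from Ceph's general guidance (an assumption here, not something the text states) is (number of OSDs × 100) / replication size, rounded up to the next power of two:

```shell
# Rule-of-thumb PG count (assumed heuristic, not from the text):
# (OSDs * 100) / replicas, rounded up to the next power of two.
osds=3                             # a small three-node test cluster
replicas=3
raw=$(( osds * 100 / replicas ))   # 100
pg_num=1
while [ "$pg_num" -lt "$raw" ]; do
    pg_num=$(( pg_num * 2 ))
done
echo "$pg_num"                     # 128, matching the value used above
```

A power-of-two PG count is preferred because it lets objects map evenly onto placement groups.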
Now, put the test file into the pool as an object and list the pool's contents:

# rados -p HPC_Pool put object1 /tmp/helloceph
# rados -p HPC_Pool ls
Next, check the OSD map for this object:

# ceph osd map HPC_Pool object1

This command shows the OSD map for object1, which is inside HPC_Pool:
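The original screenshot of this output is not reproduced here; based on the values discussed next, it is roughly the following single line (a hypothetical reconstruction, and the exact format varies between Ceph versions). The fields can be sliced out with standard text tools:

```shell
# Sample `ceph osd map` output, reconstructed from the values discussed in the
# text -- your cluster will print different IDs:
out="osdmap e566 pool 'HPC_Pool' (10) object 'object1' -> pg 10.bac5debc (10.3c) -> up [0,6,3] acting [0,6,3]"

# Pull out just the acting set:
echo "$out" | grep -o "acting \[[0-9,]*\]"
```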
Let's discuss the output of this command:
osdmap e566: This is the OSD map version ID, that is, OSD epoch 566.

pool 'HPC_Pool' (10): These are the Ceph pool name and pool ID.

object 'object1': This is the object name.

pg 10.bac5debc (10.3c): This is the placement group number; that is, object1 belongs to PG 10.3c.

up [0,6,3]: This is the OSD up set, which contains osd.0, osd.6, and osd.3. Since the pool has a replication size of 3, each PG is stored on three OSDs. This also means that all the OSDs holding PG 10.3c are up. It is the ordered list of OSDs responsible for a particular PG at a particular epoch, as per the CRUSH map. This is usually the same as the acting set.

acting [0,6,3]: osd.0, osd.6, and osd.3 are in the acting set, where osd.0 is the primary OSD, osd.6 is the secondary OSD, and osd.3 is the tertiary OSD. The acting set is the ordered list of OSDs responsible for a particular PG. These OSDs reside on the ceph-node1, ceph-node3, and ceph-node2 hosts, respectively.

Here, object1 is stored at PG 10.3c of ceph-node2, on partition sdb1, which is osd.3; note that these PG and OSD IDs might differ on your setup:

# ssh ceph-node2
# df -h | grep -i ceph-3
# cd /var/lib/ceph/osd/ceph-3/current
# ls -l | grep -i 10.3c
# cd 10.3c_head/
# ls -l
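The relationship between the two PG numbers shown earlier, 10.bac5debc and 10.3c, can be checked by hand. When pg_num is a power of two, Ceph's placement reduces to masking the object's hash with pg_num - 1 (Ceph actually uses a "stable mod" operation, which is equivalent to this mask for power-of-two pg_num):

```shell
# 10.bac5debc is the pool ID (10) plus the hash of object1's name; with
# pg_num = 128, the PG index is that hash masked by (pg_num - 1).
hash=0xbac5debc
pg_num=128
printf '10.%x\n' $(( hash & (pg_num - 1) ))   # prints 10.3c
```

This is why the OSD map reports the same object under both 10.bac5debc (the full hash) and 10.3c (the actual placement group).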
In this way, Ceph stores each data object in a replicated manner across different failure domains. This intelligent mechanism is the core of Ceph's data management.