Ceph data management

Data management inside a Ceph cluster involves all the components that we have discussed so far. The coordination between these components enables Ceph to provide a reliable and robust storage system. Data management starts as soon as a client writes data to a Ceph pool. The data is first written to a primary OSD, and the pool replication size determines how many copies are kept. The primary OSD replicates the same data to its secondary and tertiary OSDs and waits for their acknowledgement. As soon as the secondary and tertiary OSDs complete the data write, they send an acknowledgement signal to the primary OSD, and finally, the primary OSD returns an acknowledgement to the client, confirming the completion of the write operation.
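The number of copies involved in this write path is controlled by the pool's size attribute, while min_size defines the minimum number of replicas that must be available for the pool to keep serving I/O. As a quick, optional check, you can query both attributes for any existing pool; the pool name rbd used here is only an example and may not exist on your cluster:

    # ceph osd pool get rbd size
    # ceph osd pool get rbd min_size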

In this way, Ceph consistently stores every client write and can serve the data from its replicas in the event of a failure. Let's now see how data is stored in a Ceph cluster:

  1. We will first create a test file, a Ceph pool, and set the pool replication to 3 copies:
    # echo "Hello Ceph, You are Awesome like MJ" > /tmp/helloceph
    # ceph osd pool create HPC_Pool 128 128
    # ceph osd pool set HPC_Pool size 3
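
    To confirm that the pool was created with the intended settings, you can grep it out of the OSD map; the line for HPC_Pool should report replicated size 3 and pg_num 128:
    # ceph osd dump | grep -i HPC_Pool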
    
  2. Put some data in this pool and verify its contents:
    # rados -p HPC_Pool put object1 /tmp/helloceph
    # rados -p HPC_Pool ls
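
    You can also stat the object to confirm the write; this prints the object's size and modification time:
    # rados -p HPC_Pool stat object1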
    
  3. The file has now been stored in a Ceph pool. As you know, everything in Ceph gets stored in the form of objects; each object belongs to a placement group, and each placement group is mapped to multiple OSDs. Now, let's see this concept in practice:
    # ceph osd map HPC_Pool object1
    

    This command will show you the OSD map for object1, which is stored inside HPC_Pool:

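    A representative output line looks like the following; the values match the field-by-field discussion below, and your epoch, pool ID, PG ID, and OSD IDs will differ:
    osdmap e566 pool 'HPC_Pool' (10) object 'object1' -> pg 10.bac5debc (10.3c) -> up [0,6,3] acting [0,6,3]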

    Let's discuss the output of this command:

    • osdmap e566: This is the OSD map version ID, also known as the OSD epoch, which is 566 here.
    • pool 'HPC_Pool' (10): This is a Ceph pool name and pool ID.
    • object 'object1': This is an object name.
    • pg 10.bac5debc (10.3c): This is the placement group number; that is, object1 belongs to PG 10.3c.
    • up [0,6,3]: This is the OSD up set, which contains osd.0, osd.6, and osd.3. Since the pool has a replication size of 3, each PG is stored on three OSDs. This also means that all the OSDs holding PG 10.3c are up. The up set is the ordered list of OSDs responsible for a particular PG at a particular epoch, as per the CRUSH map. It is usually the same as the acting set.
    • acting [0,6,3]: osd.0, osd.6, and osd.3 are in the acting set, where osd.0 is the primary OSD, osd.6 is the secondary OSD, and osd.3 is the tertiary OSD. The acting set is the ordered list of OSDs that is currently responsible for a particular PG.
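
    If you prefer to query this mapping by PG ID rather than by object name, the same up and acting sets can be retrieved directly; the PG ID 10.3c comes from the example above and will differ on your cluster:
    # ceph pg map 10.3c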
  4. Check the physical location of each of these OSDs. You will find that OSDs 0, 6, and 3 are physically separated on the ceph-node1, ceph-node3, and ceph-node2 hosts, respectively.
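    A convenient way to verify this is to print the CRUSH hierarchy, which lists every OSD under the host it runs on; ceph osd find shows the location of a single OSD. These commands are suggestions for verification rather than part of the original walkthrough:
    # ceph osd tree
    # ceph osd find 3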
  5. Now, log in to any of these nodes and check where the real data resides on the OSD. You will observe that object1 is stored in PG 10.3c on ceph-node2, on the partition sdb1 that backs osd.3; note that the PG ID and OSD ID might differ in your setup:
    # ssh ceph-node2
    # df -h | grep -i ceph-3
    # cd /var/lib/ceph/osd/ceph-3/current
    # ls -l | grep -i 10.3c
    # cd 10.3c_head/
    # ls -l
    
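    Because these OSDs use the FileStore backend, the object's data is kept as an ordinary file inside the PG directory, with a filename that starts with the object name (the exact suffix will vary). As a final, optional check, you can read the data back directly and see the string written in step 1:
    # ls -l | grep object1
    # cat object1*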

In this way, Ceph stores each data object in a replicated manner across different failure domains. This intelligent mechanism is the core of Ceph's data management.
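The failure domain itself is defined by the pool's CRUSH rule; with the default rule, replicas are placed on different hosts, which is why the three copies of object1 ended up on three separate nodes. If you want to inspect the rule in use, the following generic command dumps all CRUSH rules:

    # ceph osd crush rule dump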
