In this chapter, we will cover the following topics:
- Storage provisioning with Ceph
- The RADOS block device
- Setting up your first Ceph client
- RBD snapshots and clones
Storage provisioning is the primary and most important task of a storage system administrator. It is the process of assigning storage space or capacity to both physical and virtual servers in the form of blocks, files, or objects. Typical computer systems and servers come with limited local storage that is not enough for your data storage needs. Storage solutions such as Ceph provide virtually unlimited storage capacity to these servers, making them capable of storing all your data and ensuring that you do not run out of space.
In addition to providing extra storage, a centralized storage system offers numerous other benefits.
Ceph can provision storage capacity in a unified way, covering block, file, and object storage. Depending on your use case, you can select one or more of these storage types, as shown in the following diagram. Now, let's discuss these storage types in detail and implement them on our test cluster.
The Ceph block device, formerly known as the RADOS block device (RBD), provides block-based persistent storage to Ceph clients, which use it as an additional disk. The client has the flexibility to use the disk as required, either as a raw device or by formatting it with a filesystem and mounting it. A RADOS block device makes use of the librbd library and stores blocks of data in sequential form, striped over multiple OSDs in a Ceph cluster. RBD is backed by the RADOS layer of Ceph, and thus every block device is spread over multiple Ceph nodes, delivering high performance and excellent reliability. RBD is rich with enterprise features such as thin provisioning, dynamic resizing, snapshots, copy-on-write cloning, and caching, among others. The RBD protocol is fully supported in Linux as a mainline kernel driver; it also supports virtualization platforms such as KVM, Qemu, and libvirt, allowing virtual machines to take advantage of a Ceph block device. All these features make RBD an ideal candidate for cloud platforms such as OpenStack and CloudStack. We will now learn how to create a Ceph block device and make use of it:
Create a RADOS block device named ceph-client1-rbd1 of size 10240 MB:
# rbd create ceph-client1-rbd1 --size 10240
To list RBD images, issue the following command:
# rbd ls
To check the details of an RBD image, use the following command:
# rbd --image ceph-client1-rbd1 info
Have a look at the following screenshot to see the preceding command in action:
By default, RBD images are created under the rbd pool of the Ceph cluster. You can specify other pools by using the -p parameter with the rbd command. The following command will give you the same output as the last one, but here we specify the pool name explicitly with the -p parameter. Similarly, you can create RBD images on other pools with the -p parameter:
# rbd --image ceph-client1-rbd1 info -p rbd
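For instance, an image could be created in a different pool as follows; the pool name test-pool here is hypothetical and must already exist (a pool can be created with the ceph osd pool create command) before these commands are run:
# rbd create test-image --size 1024 -p test-pool
# rbd ls -p test-pool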
Ceph is a storage system; to store your data on a Ceph cluster, you will require a client machine. Once storage space is provisioned from the Ceph cluster, the client maps or mounts the Ceph block device or filesystem and allows us to store data onto the Ceph cluster. For object storage, clients access the Ceph cluster over HTTP to store data. A typical production-class Ceph cluster consists of two different networks: a frontend network and a backend network, also known as the public network and the cluster network, respectively. The frontend network is the client network over which Ceph serves data to its clients; all Ceph clients interact with the cluster using it. Clients do not have access to the backend network, which Ceph mainly uses for replication and recovery.
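A minimal ceph.conf sketch of this network split looks like the following; the public subnet matches our host-only network, while the cluster subnet shown is purely illustrative:
[global]
# network that clients use to reach the cluster
public network = 192.168.57.0/24
# network used internally for replication and recovery (hypothetical subnet)
cluster network = 192.168.58.0/24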
We will now set up our first Ceph client virtual machine, which we will use throughout this book. During the setup process, we will create a new client virtual machine as we did in Chapter 2, Ceph Instant Deployment:
# VBoxManage createvm --name ceph-client1 --ostype RedHat_64 --register
# VBoxManage modifyvm ceph-client1 --memory 1024 --nic1 nat --nic2 hostonly --hostonlyadapter2 vboxnet1
# VBoxManage storagectl ceph-client1 --name "IDE Controller" --add ide --controller PIIX4 --hostiocache on --bootable on
# VBoxManage storageattach ceph-client1 --storagectl "IDE Controller" --type dvddrive --port 0 --device 0 --medium /downloads/CentOS-6.4-x86_64-bin-DVD1.iso
# VBoxManage storagectl ceph-client1 --name "SATA Controller" --add sata --controller IntelAHCI --hostiocache on --bootable on
# VBoxManage createhd --filename OS-ceph-client1.vdi --size 10240
# VBoxManage storageattach ceph-client1 --storagectl "SATA Controller" --port 0 --device 0 --type hdd --medium OS-ceph-client1.vdi
# VBoxManage startvm ceph-client1 --type gui
Once the operating system is installed, set the hostname of the machine to ceph-client1 and configure its network as follows:
Edit the /etc/sysconfig/network-scripts/ifcfg-eth0 file and add the following:
ONBOOT=yes
BOOTPROTO=dhcp
Edit the /etc/sysconfig/network-scripts/ifcfg-eth1 file and add the following:
ONBOOT=yes
BOOTPROTO=static
IPADDR=192.168.57.200
NETMASK=255.255.255.0
Edit the /etc/hosts file and add the following:
192.168.57.101 ceph-node1
192.168.57.102 ceph-node2
192.168.57.103 ceph-node3
192.168.57.200 ceph-client1
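After saving these files, restarting the network service and pinging a monitor node is a quick sanity check; these verification commands are our addition, not part of the original walkthrough:
# service network restart
# ping -c 3 ceph-node1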
Earlier in this chapter, we created an RBD image on a Ceph cluster; in order to use this block device image, we need to map it to the client machine. Let's see how the mapping operation works.
Ceph support was added to the Linux kernel starting with version 2.6.32. For client machines that need native access to Ceph block devices and filesystems, it is recommended to use Linux kernel release 2.6.34 or later.
Check the Linux kernel version and RBD support using the modprobe command. Since this client runs an older Linux kernel release, it does not support Ceph natively:
# uname -r
# modprobe rbd
Have a look at the following screenshot:
In order to add support for Ceph, we need to upgrade the Linux kernel version.
# rpm -Uvh http://www.elrepo.org/elrepo-release-6-6.el6.elrepo.noarch.rpm
# yum --enablerepo=elrepo-kernel install kernel-ml
Edit the /etc/grub.conf file, update default=0, and then gracefully reboot the machine. Once the machine is rebooted, check the Linux kernel version and its RBD support, as we did earlier.
To grant a client permission to access the Ceph cluster, we need to add the keyring and the Ceph configuration file to it. Client-to-cluster authentication is based on the keyring. The admin user of Ceph has full access to the Ceph cluster, so for security reasons, you should not distribute the admin keyring to hosts that do not need it. As a best practice, you should create separate users with limited capabilities to access the Ceph cluster, and use their keyrings to access RBD. In the upcoming chapters, we will discuss Ceph users and keyrings in more detail. For now, we will make use of the admin user keyring.
From ceph-node1, install the Ceph binaries on ceph-client1 and push the ceph.conf and ceph.client.admin.keyring files to it:
# ceph-deploy install ceph-client1
# ceph-deploy admin ceph-client1
Once the Ceph configuration file and admin keyring are in place on the ceph-client1 node, you can query the Ceph cluster for RBD images.
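For example, listing and inspecting the image we created earlier should now work directly from the client:
# rbd ls
# rbd --image ceph-client1-rbd1 info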
Map the RBD image ceph-client1-rbd1 to the ceph-client1 machine. Since RBD is now natively supported by the Linux kernel, you can execute the following command from the ceph-client1 machine to map the image:
# rbd map --image ceph-client1-rbd1
Alternatively, you can specify the pool name of the RBD image and achieve the same result. In our case, the pool name is rbd, as explained earlier in this chapter:
# rbd map rbd/ceph-client1-rbd1
You can find out the operating system device name used for this mapping, as follows:
# rbd showmapped
The following screenshot shows this command in action:
Once RBD is mapped to the OS, we should create a filesystem on it to make it usable; it can then be used as an additional disk or block device:
# fdisk -l /dev/rbd0
# mkfs.xfs /dev/rbd0
# mkdir /mnt/ceph-vol1
# mount /dev/rbd0 /mnt/ceph-vol1
Have a look at the following screenshot:
Put some data on Ceph RBD:
# dd if=/dev/zero of=/mnt/ceph-vol1/file1 count=100 bs=1M
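You can then verify the used and available space on the mounted volume; this check is our addition to the walkthrough:
# df -h /mnt/ceph-vol1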
Ceph supports thin-provisioned block devices; that is, physical storage space is not occupied until you actually begin storing data on the block device. Ceph RADOS block devices are very flexible; you can increase or decrease the size of an RBD on the fly from the Ceph storage end. However, the underlying filesystem must support resizing. Advanced filesystems such as XFS, Btrfs, EXT, and ZFS support filesystem resizing to a certain extent. Refer to the filesystem-specific documentation to learn more about resizing.
To increase or decrease the size of a Ceph RBD image, use the --size <New_Size_in_MB> parameter with the rbd resize command; this sets the new size of the RBD image. The original size of the ceph-client1-rbd1 RBD image was 10 GB; the following command will increase its size to 20 GB:
# rbd resize rbd/ceph-client1-rbd1 --size 20480
Now that the Ceph RBD image has been resized, you need to grow the filesystem from the client machine so that it can make use of the increased storage space. From the client's perspective, capacity resizing is a feature of the OS filesystem; you should read your filesystem's documentation before resizing any partition. The XFS filesystem supports online resizing:
# xfs_growfs -d /mnt/ceph-vol1
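Once the filesystem has grown, df should report the increased capacity; this verification step is our addition:
# df -h /mnt/ceph-vol1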
Ceph extends full support to snapshots, which are point-in-time, read-only copies of an RBD image. You can preserve the state of a Ceph RBD image by creating snapshots and restoring them to get the original data.
To test the snapshot functionality of Ceph RBD, let's create a file on RBD:
# echo "Hello Ceph This is snapshot test" > /mnt/ceph-vol1/snaptest_file
Now our filesystem has two files. Let's create a snapshot of the Ceph RBD using the rbd snap create <pool-name>/<image-name>@<snap-name> syntax, as follows:
# rbd snap create rbd/ceph-client1-rbd1@snap1
To list the snapshots of an image, use the rbd snap ls <pool-name>/<image-name> syntax, as follows:
# rbd snap ls rbd/ceph-client1-rbd1
To test the snapshot restore functionality of Ceph RBD, let's delete files from the filesystem:
# cd /mnt/ceph-vol1
# rm -f file1 snaptest_file
We will now roll back the Ceph RBD snapshot to get back the files that we deleted in the last step. The syntax for this is rbd snap rollback <pool-name>/<image-name>@<snap-name>. The following is the command:
# rbd snap rollback rbd/ceph-client1-rbd1@snap1
Once the snapshot rollback operation is completed, remount the Ceph RBD filesystem to refresh the filesystem state. You should be able to get your deleted files back:
# umount /mnt/ceph-vol1
# mount /dev/rbd0 /mnt/ceph-vol1
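Listing the mount point should now show both files again; this verification step is our addition:
# ls -l /mnt/ceph-vol1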
When you no longer need snapshots, you can remove a specific snapshot using the rbd snap rm <pool-name>/<image-name>@<snap-name> syntax. Deleting a snapshot will not delete your current data on the Ceph RBD image:
# rbd snap rm rbd/ceph-client1-rbd1@snap1
If you have multiple snapshots of an RBD image and you wish to delete all of them with a single command, you can make use of the purge subcommand. The syntax for it is rbd snap purge <pool-name>/<image-name>:
# rbd snap purge rbd/ceph-client1-rbd1
To remove an RBD image, use the rbd rm <RBD_image_name> -p <Image_pool_name> syntax, as follows:
# rbd rm ceph-client1-rbd1 -p rbd
The Ceph storage cluster is capable of creating copy-on-write (COW) clones from RBD snapshots. This is also known as snapshot layering in Ceph. The layering feature allows clients to create multiple instant clones of a Ceph RBD image, which is extremely useful for cloud and virtualization platforms such as OpenStack, CloudStack, and Qemu/KVM. These platforms usually protect Ceph RBD images containing OS/VM images in the form of a snapshot; the snapshot is then cloned multiple times to spin up new virtual machines or instances. Snapshots are read-only, but COW clones are fully writable; this gives Ceph great flexibility and makes it extremely useful for cloud platforms. The following diagram shows the relationship between a RADOS block device, an RBD snapshot, and a COW snapshot clone. In the upcoming chapters of this book, we will use COW clones to spawn OpenStack instances.
Every cloned image (child image) stores a reference to its parent snapshot in order to read image data; hence, the parent snapshot must be protected before it can be used for cloning. When data is written to a COW-cloned image, the clone stores the new data itself. COW-cloned images are as good as regular RBD images: they are writable, resizable, can have snapshots of their own, and can be cloned further.
The format of an RBD image determines the features it supports. In Ceph, an RBD image comes in two formats: format-1 and format-2. The RBD snapshot feature is available on both format-1 and format-2 images. However, the layering feature, that is, COW cloning, is available only for format-2 RBD images. Format-1 is the default RBD image format.
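As a quick way to check an image's format, the rbd info output we used earlier includes a format field; the grep filter here is simply illustrative:
# rbd --image ceph-client1-rbd1 info | grep format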
For demonstration purposes, we will first create a format-2 RBD image, create its snapshot, protect its snapshot, and finally, create COW clones out of it:
Create a format-2 RBD image:
# rbd create ceph-client1-rbd2 --size 10240 --image-format 2
Create a snapshot of this RBD image:
# rbd snap create rbd/ceph-client1-rbd2@snapshot_for_clone
Protect the snapshot so that it can be used for cloning:
# rbd snap protect rbd/ceph-client1-rbd2@snapshot_for_clone
Next, clone the snapshot. The syntax for this is rbd clone <pool-name>/<parent-image>@<snap-name> <pool-name>/<child-image-name>. The command to be used is as follows:
# rbd clone rbd/ceph-client1-rbd2@snapshot_for_clone rbd/ceph-client1-rbd3
Creating a clone is a quick process. Once it is completed, check the information of the new image; you will notice its parent pool, image, and snapshot details:
# rbd --pool rbd --image ceph-client1-rbd3 info
At this point, you have a cloned RBD image, which is dependent upon its parent image snapshot. To make the cloned RBD image independent of its parent, we need to flatten the image, which involves copying the data from a parent snapshot to a child image. The time it takes to complete the flattening process depends upon the size of data present in the parent snapshot. Once the flattening process is completed, there is no dependency between the cloned RBD image and its parent snapshot. Let's perform this flattening process practically:
# rbd flatten rbd/ceph-client1-rbd3
After the completion of the flattening process, if you check the image information again, you will notice that the parent image/snapshot reference is gone, making the cloned image independent.
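For instance, rerunning the info command from earlier should no longer list a parent:
# rbd --pool rbd --image ceph-client1-rbd3 info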
If you no longer need the parent snapshot, you can remove it. Before removing it, you first have to unprotect it:
# rbd snap unprotect rbd/ceph-client1-rbd2@snapshot_for_clone
Once the snapshot is unprotected, you can remove it:
# rbd snap rm rbd/ceph-client1-rbd2@snapshot_for_clone