A Ceph cluster can comprise anywhere from 10 to several thousand physical disks that provide its storage capacity. As the number of disks in the cluster grows, so does the frequency of disk failures, and replacing a failed drive can become a routine task for a Ceph storage administrator. In this recipe, we will learn about the disk replacement process for a Ceph cluster.
Verify the cluster health; with no failed disks, it should report HEALTH_OK:
# ceph status
Simulate a disk failure by powering ceph-node1 down, detaching a disk, and powering up the VM. Execute the following commands from your HOST machine:
# VBoxManage controlvm ceph-node1 poweroff
# VBoxManage storageattach ceph-node1 --storagectl "SATA" --port 1 --device 0 --type hdd --medium none
# VBoxManage startvm ceph-node1
The following screenshot will be your output:
ceph-node1 now contains a failed disk, osd.0, which should be replaced:
# ceph osd tree
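On a cluster with many OSDs, scanning the tree by eye becomes tedious. The following is a minimal sketch of a filter that prints only the OSDs reported as down. The function name list_down_osds is hypothetical, and the sketch assumes that the status word immediately follows the osd.N name in each row, as in typical `ceph osd tree` layouts; check this against the output of your release.

```shell
# list_down_osds: read `ceph osd tree` text on stdin and print the name of
# every OSD whose status column says "down".
# Assumption: the status word directly follows the osd.N name column.
list_down_osds() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i ~ /^osd\.[0-9]+$/ && $(i + 1) == "down")
                print $i
    }'
}
```

It would typically be used as `ceph osd tree | list_down_osds`.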
You will also notice that osd.0 is DOWN; however, it is still marked as IN. As long as its status is IN, the Ceph cluster will not trigger data recovery for this drive. By default, the Ceph cluster takes 300 seconds to mark a down disk as OUT and then triggers data recovery. The reason for this timeout is to avoid unnecessary data movement due to short-term outages, for example, a server reboot. You can increase or decrease this timeout value if you prefer.
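As a rough sketch of how the timeout might be changed at runtime, the helper below injects a new value into the running monitors. It assumes the timeout is controlled by the monitor option mon_osd_down_out_interval and that `ceph tell ... injectargs` is available on your release; the function name set_down_out_interval is hypothetical. An injected value does not survive a monitor restart, so a permanent change would also need to go into the cluster configuration.

```shell
# set_down_out_interval SECONDS
# Sketch: change how long the monitors wait before marking a down OSD OUT.
# Assumes the option is named mon_osd_down_out_interval and that runtime
# injection via `ceph tell ... injectargs` works on your release.
set_down_out_interval() {
    case "$1" in
        ''|*[!0-9]*)
            # Reject empty or non-numeric input before touching the cluster.
            echo "usage: set_down_out_interval SECONDS" >&2
            return 1
            ;;
    esac
    ceph tell 'mon.*' injectargs "--mon-osd-down-out-interval $1"
}
```

For example, `set_down_out_interval 600` would ask the monitors to wait ten minutes instead of five.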
Since the failed disk must be replaced, mark it as OUT manually:
# ceph osd out osd.0
As soon as the OSD is marked OUT, the Ceph cluster will initiate a recovery operation for the PGs that were hosted on the failed disk. You can watch the recovery operation using the following command:
# ceph status
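Instead of rerunning the status command by hand, recovery can be polled from a script. The sketch below uses `ceph health` and waits until its output starts with HEALTH_OK. The function name wait_for_health_ok and the default interval and attempt count are illustrative assumptions, not values from this recipe.

```shell
# wait_for_health_ok [INTERVAL_SECONDS] [MAX_ATTEMPTS]
# Poll `ceph health` until its output starts with HEALTH_OK.
# The defaults (10 s, 360 attempts) are arbitrary illustrative values.
wait_for_health_ok() {
    interval=${1:-10}
    attempts=${2:-360}
    while [ "$attempts" -gt 0 ]; do
        status=$(ceph health 2>/dev/null)
        case "$status" in
            HEALTH_OK*) echo "$status"; return 0 ;;
        esac
        # Report progress on stderr so stdout carries only the final status.
        echo "waiting: ${status:-no response}" >&2
        attempts=$((attempts - 1))
        sleep "$interval"
    done
    return 1
}
```

Calling `wait_for_health_ok 30 120` would check every 30 seconds for up to an hour and return a non-zero status if the cluster never becomes healthy in that time.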
# ceph osd crush rm osd.0
# ceph auth del osd.0
# ceph osd rm osd.0
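The three removal commands above (CRUSH map entry, authentication key, OSD entry) can be wrapped in one helper so that they always run in the same order and stop at the first failure. This is only a sketch: the function name remove_osd is hypothetical, and it assumes the OSD has already been marked OUT and its data recovered. Newer Ceph releases also offer `ceph osd purge`, which may be a simpler alternative where available.

```shell
# remove_osd ID
# Run the three removal steps for osd.ID in order, stopping at the first
# failure. Assumes the OSD is already OUT and its PGs have been recovered.
remove_osd() {
    case "$1" in
        ''|*[!0-9]*)
            # Accept only the numeric ID, e.g. 0 for osd.0.
            echo "usage: remove_osd NUMERIC_ID" >&2
            return 1
            ;;
    esac
    osd="osd.$1"
    ceph osd crush rm "$osd" || return 1   # drop it from the CRUSH map
    ceph auth del "$osd" || return 1       # delete its authentication key
    ceph osd rm "$osd"                     # remove it from the OSD list
}
```

For the failed disk in this recipe, the call would be `remove_osd 0`.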
Once the recovery operation is complete, the cluster will return to HEALTH_OK:
# ceph -s
# ceph osd stat
# VBoxManage controlvm ceph-node1 poweroff
# VBoxManage storageattach ceph-node1 --storagectl "SATA" --port 1 --device 0 --type hdd --medium ceph-node1_disk2.vdi
# VBoxManage startvm ceph-node1
# ceph-deploy disk list ceph-node1
Before adding the new disk to the cluster, perform a disk zap:
# ceph-deploy disk zap ceph-node1:sdb
Create an OSD on the disk; Ceph will add it as osd.0:
# ceph-deploy --overwrite-conf osd create ceph-node1:sdb
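The zap and create steps can be combined so that the OSD is only created if the zap succeeds. The sketch below uses the HOST:DISK argument form shown in this recipe; other ceph-deploy releases may expect a different argument form, so check `ceph-deploy --help` before relying on it. The function name replace_osd_disk and the CEPH_DEPLOY override variable are hypothetical and exist only to allow a dry run.

```shell
# replace_osd_disk HOST DISK
# Zap DISK on HOST and create a fresh OSD on it, using the HOST:DISK form
# from this recipe. CEPH_DEPLOY is a hypothetical override variable (not a
# ceph-deploy feature); setting it to "echo" prints the commands instead.
replace_osd_disk() {
    if [ "$#" -ne 2 ]; then
        echo "usage: replace_osd_disk HOST DISK" >&2
        return 1
    fi
    deploy=${CEPH_DEPLOY:-ceph-deploy}
    # Zapping destroys the partition table, so stop here if it fails.
    $deploy disk zap "$1:$2" || return 1
    $deploy --overwrite-conf osd create "$1:$2"
}
```

Setting `CEPH_DEPLOY=echo` and then running `replace_osd_disk ceph-node1 sdb` prints the two commands without executing them, which is a cheap way to confirm the target disk before zapping it.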
Ceph will perform a recovery operation onto the new OSD; once it completes, the cluster will be HEALTH_OK again:
# ceph -s
# ceph osd stat