Maintaining a Ceph cluster

As a Ceph storage administrator, maintaining your Ceph cluster will be one of your top priorities. Ceph is a distributed system designed to grow from tens of OSDs to several thousand of them. One of the key tasks in maintaining a Ceph cluster is managing its OSDs. In this recipe, we will cover the Ceph subcommands for OSDs and PGs that will help you during cluster maintenance and troubleshooting.

How to do it…

To understand the need for these commands better, let's assume a scenario where you want to add a new node to your production Ceph cluster. One way is to simply add the new node with several disks to the Ceph cluster, and the cluster will start backfilling and shuffling data onto the new node. This is fine for a test cluster.

However, the situation becomes critical in a production setup, where you should set some of the ceph osd flags mentioned below, such as noin and nobackfill, before adding a new node to the cluster. This ensures that the cluster does not start the backfilling process the moment the new node comes in. You can then unset these flags during non-peak hours, and the cluster will take its time to rebalance:

  1. Using these flags is as simple as setting and unsetting them. For example, to set a flag, use the following command lines:
    # ceph osd set <flag_name>
    # ceph osd set noout
    # ceph osd set nodown
    # ceph osd set norecover
    
  2. Now, to unset the same flags, use the following command lines:
    # ceph osd unset <flag_name>
    # ceph osd unset noout
    # ceph osd unset nodown
    # ceph osd unset norecover
    
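  3. Before unsetting the flags, you can confirm which ones are currently in effect. Both of the commands below are standard Ceph CLI calls; the cluster status output lists the active flags in its health section, and the OSD map records them on its flags line (the exact output wording can vary between Ceph releases):
    # ceph -s
    # ceph osd dump | grep flags
    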

How it works…

We will now learn what these flags are and why they are used.

  • noout: This forces the Ceph cluster to not mark any OSD as out of the cluster, irrespective of its status. It makes sure all the OSDs remain inside the cluster (a typical use is shown in the example after this list).
  • nodown: This forces the Ceph cluster to not mark any OSD as down, irrespective of its status. It makes sure that all the OSDs remain UP and none of them is marked DOWN.
  • noup: This forces the Ceph cluster to not mark any down OSD as UP. So, any OSD that is marked DOWN can only come UP after this flag is unset. This also applies to new OSDs that are joining the cluster.
  • noin: This forces the Ceph cluster to not mark any booting OSD as in, so new OSDs do not start receiving data until the flag is unset. This is quite useful if you are adding several OSDs at once and don't want them to join the cluster automatically.
  • norecover: This forces the Ceph cluster to not perform cluster recovery.
  • nobackfill: This forces the Ceph cluster to not perform backfilling. This is quite useful when you are adding several OSDs at once and don't want Ceph to perform automatic data placement on the new node.
  • norebalance: This forces the Ceph cluster to not perform cluster rebalancing.
  • noscrub: This forces Ceph to not perform OSD scrubbing.
  • nodeep-scrub: This forces Ceph to not perform OSD deep scrubbing.
  • notieragent: This disables the cache pool tiering agent.
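For example, a common maintenance pattern, sketched here with the standard ceph osd commands, is to set noout before rebooting an OSD node so that its OSDs are not marked out while they are briefly down and no recovery traffic is triggered:

    # ceph osd set noout
    # ceph osd unset noout

Reboot the node between the two commands and wait for its OSDs to report UP again (ceph -s will show them) before unsetting the flag.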

In addition to these flags, you can also use the following commands to repair OSDs and PGs:

  • ceph osd repair: This repairs a specified OSD (see the example usage after this list).
  • ceph pg repair: This repairs a specified PG. Use this command with caution; depending on your cluster state, it can impact user data if not used carefully.
  • ceph pg scrub: This scrubs a specified PG.
  • ceph pg deep-scrub: This performs deep-scrubbing on a specified PG.
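
For example, assuming a hypothetical OSD ID of 5 and a hypothetical PG ID of 3.1f (the IDs in your cluster will differ; ceph pg dump or ceph health detail can help you find the PGs that need attention), these commands are invoked as follows:

    # ceph osd repair 5
    # ceph pg scrub 3.1f
    # ceph pg deep-scrub 3.1f
    # ceph pg repair 3.1f

These commands only instruct the relevant OSDs to schedule the scrub or repair; the work itself runs asynchronously in the background.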

The Ceph CLI is quite powerful for end-to-end cluster management. You can get more information at http://ceph.com/docs/master/man/8/ceph/.
