Monitoring Ceph clusters

In this recipe, we will learn commands that are used to monitor the overall Ceph cluster.

How to do it…

Here is how we go about monitoring the Ceph cluster. The steps are explained topic-wise as follows.

Checking the cluster's health

To check the health of your cluster, use the ceph command followed by health as the command option:

# ceph health

The output of this command is divided into several sections separated by semicolons:


The first section of the output shows that your cluster is in the warning state, HEALTH_WARN, because 64 placement groups (PGs) are degraded. The second section shows that 1408 PGs are not clean, and the third section shows that cluster recovery is in progress for 1 out of 5744 objects and that the cluster is 0.017% degraded. If your cluster is healthy, you will receive the output HEALTH_OK.

To find out more details about your cluster's health, use the ceph health detail command. This command lists all the PGs that are not active and clean, that is, all the PGs that are unclean, inconsistent, or degraded, together with their details. If your cluster is healthy, this command also returns HEALTH_OK.
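
For example, run it in the same way as the basic health check; on a healthy cluster it simply returns HEALTH_OK, while on a degraded cluster it lists every problematic PG:

# ceph health detail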


Monitoring cluster events

You can monitor cluster events using the ceph command with the -w option. This command displays all cluster event messages, including information (INF), warning (WRN), and error (ERR) messages, in real time. The output of this command is a continuous, live stream of cluster changes; press Ctrl + C to return to the shell:

# ceph -w

There are other options that can be used with the ceph command to gather different types of event details. They are as follows:

  • --watch-debug: to watch debug events
  • --watch-info: to watch info events
  • --watch-sec: to watch security events
  • --watch-warn: to watch warning events
  • --watch-error: to watch error events
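
For example, to follow only warning-level events, you can combine the watch option with one of these flags (a minimal sketch; the exact set of supported flags can vary between Ceph releases):

# ceph -w --watch-warn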

The cluster utilization statistics

To check your cluster's space utilization statistics, use the ceph command with the df option. This command shows the total cluster size, the available size, the used size, and the percentage used. It also displays pool information, such as the pool name, ID, utilization, and the number of objects in each pool:

# ceph df
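
If you need per-pool I/O details in addition to the summary, many Ceph releases also support a more verbose variant of the same command (availability may depend on your Ceph version):

# ceph df detail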


Checking the cluster's status

Checking the cluster's status is the most common and frequent operation when managing a Ceph cluster. You can check the status of your cluster using the ceph command with status as the option:

# ceph status

Instead of the status subcommand, you can also use a shorter version, -s, as an option:

# ceph -s


This command will dump a lot of useful information for your Ceph cluster:

  • cluster: The unique Ceph cluster ID.
  • health: The cluster health status.
  • monmap: The monitor map epoch version, monitor information, monitor election epoch version, and monitor quorum status.
  • mdsmap: The mdsmap epoch version and the mdsmap status.
  • osdmap: The osdmap epoch and the total, UP, and IN OSD counts.
  • pgmap: The pgmap version, total number of PGs, pool count, capacity in use for a single copy, and total objects. It also displays information about cluster utilization, including the used size, free size, and total size. Finally, it displays the PG status.

    Tip

    To view the cluster status in real time, you can use ceph status with the Linux watch command to get continuous output:

    # watch ceph -s

The cluster authentication entries

Ceph uses a key-based authentication system. All cluster components authenticate with one another using keys before they can interact. You can use the ceph command with the auth list subcommand to get a list of all the keys:

# ceph auth list
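
If you are interested in a single entity rather than the full list, the auth subcommand can also fetch one key at a time; the following is a sketch that assumes the default client.admin user exists on your cluster:

# ceph auth get client.admin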

Note

To know more about a command's operation, use the --help option with the subcommand. For instance, run # ceph auth --help and use the command as directed in the help.

Monitoring Ceph MON

Usually, a Ceph cluster is deployed with more than one MON for high availability. Since there are multiple monitors, they must form a quorum for the cluster to function properly.

How to do it…

We will now focus on Ceph commands for MON monitoring. The steps will be explained topic-wise as follows:

Checking the MON status

To display the cluster's MON status and MON map, use the ceph command with either the mon stat or the mon dump subcommand:

# ceph mon stat
# ceph mon dump
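
If you have shell access to a monitor node, similar information is also available through the monitor's admin socket; this is a sketch in which mon.ceph-node1 is a hypothetical monitor ID that you should replace with your own:

# ceph daemon mon.ceph-node1 mon_status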


Checking the MON quorum status

To maintain a quorum, more than half of the monitors in the Ceph cluster must be available. Checking the quorum status is very useful when troubleshooting MONs. You can check the quorum status using the ceph command with the quorum_status subcommand:

# ceph quorum_status -f json-pretty


The quorum status displays election_epoch, which is the election version number, and quorum_leader_name, which denotes the hostname of the quorum leader. It also displays the MON map epoch and the cluster ID. Each cluster monitor is allocated a rank. For I/O operations, clients first connect to the quorum leader monitor; if the leader MON is unavailable, the client connects to the next-ranked monitor.

Note

To generate the formatted output for Ceph commands, use the -f json-pretty option.
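
For example, the same formatting option can be appended to most of the status commands shown in this recipe:

# ceph osd tree -f json-pretty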

Monitoring Ceph OSDs

Monitoring OSDs is a crucial task that requires a lot of attention, as there are many OSDs to monitor and take care of. The bigger your cluster, the more OSDs it has, and the more rigorous the monitoring it requires. Generally, Ceph clusters host a lot of disks, so the chance of facing an OSD failure is quite high.

How to do it…

We will now focus on Ceph commands for OSD monitoring. The steps will be explained topic-wise as follows:

OSD tree view

The OSD tree view is quite useful for knowing OSD statuses such as IN or OUT and UP or DOWN. It displays each node with all its OSDs and their location in the CRUSH map. You can check the tree view of the OSDs using the following command:

# ceph osd tree

This command displays various useful pieces of information for Ceph OSDs, such as their weight, UP/DOWN status, and IN/OUT status. The output is neatly formatted according to your Ceph CRUSH map. If you maintain a big cluster, this format makes it easy to locate OSDs and their hosting servers in a long list.

OSD statistics

To check OSD statistics, use the following command; it will help you get the OSD map epoch, the total OSD count, and their IN and UP statuses:

# ceph osd stat

To get detailed information about the Ceph cluster and OSD, execute the following command:

# ceph osd dump

This is a very useful command that outputs the OSD map epoch and the pool details, including the pool ID, pool name, pool type (replicated or erasure), CRUSH ruleset, and number of PGs. It also displays information about each OSD, such as the OSD ID, status, weight, last clean interval epoch, and so on. All this information is extremely helpful for cluster monitoring and troubleshooting.
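
If you are only interested in the pool section of the dump, you can filter the output with a simple shell pipe; this is just standard grep, not a dedicated Ceph option:

# ceph osd dump | grep pool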

Ceph also maintains a blacklist of clients that are prevented from connecting to OSDs. It is mostly used to prevent a lagging metadata server from making bad changes to data on the OSDs. Blacklists are usually maintained by Ceph itself and shouldn't need manual intervention, but it's good to know about them.

To display blacklisted clients, execute the following command:

# ceph osd blacklist ls
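
Should you ever need to manage the blacklist manually, the same subcommand also supports add and rm operations; the client address below is purely hypothetical and follows Ceph's ip:port/nonce notation:

# ceph osd blacklist add 192.168.1.10:0/3214567
# ceph osd blacklist rm 192.168.1.10:0/3214567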

Checking the crush map

We can query the CRUSH map directly using the ceph osd crush commands. This command-line access to the CRUSH map can save the system administrator a lot of time compared to the conventional way of extracting, decompiling, viewing, and editing the CRUSH map:

  • To view the crush map, execute the following command:
    # ceph osd crush dump
    
  • To view the crush map rules, execute the following command:
    # ceph osd crush rule list
    
  • To view the detailed crush rule, execute the following command:
    # ceph osd crush rule dump <crush_rule_name>
    

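As a concrete example, the following dumps the default replicated rule; the rule name replicated_ruleset is an assumption that matches the default on many older Ceph releases, so substitute a name returned by the rule list on your cluster:

# ceph osd crush rule dump replicated_ruleset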

If you are managing a large Ceph cluster with several hundred OSDs, it can be difficult to find the location of a specific OSD in the CRUSH map, especially if the CRUSH map contains multiple bucket hierarchies. You can use ceph osd find to search for an OSD and its location in the CRUSH map:

# ceph osd find <Numeric_OSD_ID>
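
For instance, to locate a single OSD, pass its numeric ID; the ID 1 below is just a hypothetical example:

# ceph osd find 1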

Monitoring PGs

OSDs store PGs, and each PG contains objects. The overall health of a cluster depends largely on the PGs. The cluster remains in a HEALTH_OK status only if all the PGs are in the active + clean state. If your Ceph cluster is not healthy, chances are that some PGs are not active + clean. Placement groups can exhibit multiple states, and even combinations of states. The following are some of the states a PG can be in:

  • Creating: The PG is being created.
  • Peering: The process of bringing all of the OSDs that store a PG into agreement about the state of all the objects, including their metadata, in that PG.
  • Active: Once the peering operation is completed, Ceph marks the PG as active. In the active state, the data in the PG is available on the primary PG and its replicas for I/O operations.
  • Clean: A clean state means that the primary and secondary OSDs have successfully peered, no PGs have moved away from their correct location, and the PGs are replicated the correct number of times.
  • Down: This means that a replica with the necessary data is down, so the PG is offline.
  • Degraded: Once an OSD is DOWN, Ceph changes the state of all the PGs assigned to that OSD to DEGRADED. After the OSD comes UP, it has to peer again to make the degraded PGs clean. If the OSD remains DOWN and OUT for more than 300 seconds (a window controlled by the mon osd down out interval option; see the example after this list), Ceph recovers all the degraded PGs from their replica PGs to maintain the replication count. Clients can perform I/O even while PGs are in the degraded state.
  • Recovering: When an OSD goes DOWN, the contents of the PGs on that OSD fall behind the contents of the replica PGs on other OSDs. Once the OSD comes UP, Ceph initiates a recovery operation on the PGs to bring them up to date with the replica PGs on other OSDs.
  • Backfilling: As soon as a new OSD is added to the cluster, Ceph tries to rebalance the data by moving some PGs from other OSDs to the new OSD; this process is known as backfilling. Once backfilling is completed for the PGs, the OSD can participate in client I/O.
  • Remapped: Whenever there is a change in a PG's acting set, data migration happens from the old acting set OSDs to the new acting set OSDs. This operation can take some time, depending on the amount of data being migrated to the new OSDs. During this time, the old primary OSD of the old acting set serves client requests. As soon as the data migration completes, Ceph uses the new primary OSD of the acting set.

    Note

    An acting set refers to the group of OSDs responsible for a PG. The first OSD in the acting set is known as the primary OSD, and it is responsible for the peering operation of each of its PGs with the secondary/tertiary OSDs. It also serves write operations from clients. An OSD that is up remains in the acting set. Once the primary OSD is DOWN, it is first removed from the up set; the secondary OSD is then promoted to primary OSD.

  • Stale: Ceph OSDs report their statistics to the Ceph monitors every 0.5 seconds. If the primary OSD of a PG's acting set fails to report its statistics to the monitors, or if other OSDs report the primary OSD as down, the monitors will consider those PGs stale.
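
The 300-second window mentioned in the Degraded state above is governed by the mon osd down out interval setting. As a sketch, you can inspect its current value through a monitor's admin socket; mon.ceph-node1 is a hypothetical monitor ID:

# ceph daemon mon.ceph-node1 config show | grep mon_osd_down_out_interval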

You can monitor PGs using the following commands:

  • To get the PG status, run # ceph pg stat:

    The output of the pg stat command will display a lot of information in a specific format: vNNNN: X pgs: Y active+clean; R MB data, U MB used, F GB / T GB avail.

    Where the variables are defined as follows:

    • vNNNN: This is the PG map version number
    • X: The total number of PGs
    • Y: The number of PGs that have an active+clean state
    • R: The amount of data stored (a single copy)
    • U: The raw storage consumed after replication
    • F: The free capacity remaining
    • T: The total capacity
  • To get the PG list, execute the following:
    # ceph pg dump -f json-pretty
    

    This command will generate a lot of essential information with respect to PGs, such as the PG map version, PG ID, PG state, acting set, acting set primary, and so on. The output of this command can be huge depending on the number of PGs in your cluster.

  • To query a particular PG for detailed information, execute the following command, which has the syntax as ceph pg <PG_ID> query:
    # ceph pg 2.7d query
    
  • To list stuck PGs, execute the following command, which has the syntax ceph pg dump_stuck < unclean | inactive | stale >:
    # ceph pg dump_stuck unclean
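
In addition, to see which OSDs a particular PG maps to, you can query its mapping; the PG ID 2.7d below is the same example ID used earlier:

# ceph pg map 2.7d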
    

Monitoring Ceph MDS

Metadata servers (MDS) are used only by CephFS, which is not production-ready as of this writing. The metadata server has several states, such as UP, DOWN, ACTIVE, and INACTIVE. While monitoring an MDS, you should make sure its state is UP and ACTIVE. The following commands will help you get information related to the Ceph MDS.

How to do it…

  1. Check the CephFS filesystem list:
    # ceph fs ls
    
  2. Check the MDS status:
    # ceph mds stat
    
  3. Display the details of the metadata server:
    # ceph mds dump
    
