Custom Ceph collectd plugins

Although the standard collectd Ceph plugin does a good job of collecting all of Ceph's performance counters, it falls short of collecting all the data required for a complete view of your cluster's health and performance. This section demonstrates how to use additional custom collectd plugins to collect PG states, per-pool performance stats, and more realistic latency figures:

  1. Jump on to one of your mon nodes via SSH and clone the following git repository:
       git clone https://github.com/grinapo/collectd-ceph
  2. Create a ceph directory under the collectd/plugins directory:
       sudo mkdir -p /usr/lib/collectd/plugins/ceph
  3. Copy the plugin files into /usr/lib/collectd/plugins/ceph using the following command:
       sudo cp -a collectd-ceph/plugins/* /usr/lib/collectd/plugins/ceph/
  4. Now create a new collectd configuration file to enable the plugins:
       sudo nano /etc/collectd/collectd.conf.d/ceph2.conf
  5. Place the following configuration inside it and save the new file:
       <LoadPlugin "python">
           Globals true
       </LoadPlugin>

       <Plugin "python">
           ModulePath "/usr/lib/collectd/plugins/ceph"

           Import "ceph_pool_plugin"
           Import "ceph_pg_plugin"
           Import "ceph_latency_plugin"

           <Module "ceph_pool_plugin">
               Verbose "True"
               Cluster "ceph"
               Interval "60"
           </Module>
           <Module "ceph_pg_plugin">
               Verbose "True"
               Cluster "ceph"
               Interval "60"
           </Module>
           <Module "ceph_latency_plugin">
               Verbose "True"
               Cluster "ceph"
               Interval "60"
               TestPool "rbd"
           </Module>
       </Plugin>

The latency plugin determines cluster latency by actually running RADOS bench, which means it writes real data to your cluster. The TestPool parameter sets the target pool for the RADOS bench command, so on a production cluster it is recommended that a separate small pool is created for this purpose.
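
A small dedicated pool could be created as follows; the pool name latency-test and the placement group count of 8 are only illustrative, so pick values that suit your cluster, and then point the TestPool parameter in the preceding configuration at that pool instead of rbd:
       sudo ceph osd pool create latency-test 8 8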

If you are trying to use these extra plugins on the Kraken or newer releases of Ceph, you will need to edit the ceph_pg_plugin.py file and change the variable name fs_perf_stat on line 71 to perf_stat.
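
If you prefer not to edit the file by hand, a one-line sed substitution restricted to line 71 should achieve the same result, assuming the plugin was copied to the path used above:
       sudo sed -i '71s/fs_perf_stat/perf_stat/' /usr/lib/collectd/plugins/ceph/ceph_pg_plugin.py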
  6. Restart the collectd service:
       sudo service collectd restart
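
If any of the plugins fail to load, collectd will log an error on startup; the exact log location varies by distribution, but on Ubuntu something like the following should reveal any Python plugin tracebacks:
       sudo tail -n 50 /var/log/syslog | grep collectd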

The average cluster latency can now be obtained with the following query:

       collectd.mon1.ceph-ceph.cluster.gauge.avg_latency

This figure is based on 64 KB writes, so unlike the OSD metrics, it will not vary with the average client I/O size.
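
If you want to consume this figure outside of a dashboard, Graphite's render API can return the series as JSON; the hostname graphite.example.com below is a placeholder for your own Graphite server:
       curl 'http://graphite.example.com/render?target=collectd.mon1.ceph-ceph.cluster.gauge.avg_latency&format=json'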
