Monitoring the broker

The last task required to ensure a smooth production ride with RabbitMQ is the same as for any other system: proper monitoring and alerting should be put in place in order to stay abreast of what's happening in the running brokers. In essence, the two questions you need to ask yourself are: what to monitor and how to monitor it? Let's take the time to answer these two questions in the context of Clever Coney Media. We won't be discussing the monitoring of the machines (physical or virtual) on which the brokers run, but will be focusing on the RabbitMQ specifics only.

Let's tackle the "how" first. There are two main ways to retrieve live information from a RabbitMQ broker: via the rabbitmqctl command-line tool and via the REST API exposed over HTTP by the management console. Any decent monitoring system will be able to use one or the other in order to collect metrics and report them to its central aggregation, charting, and alerting engine.
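
To give a sense of the command-line route, the same kind of metric can be pulled with rabbitmqctl; the following sketch is shown purely for comparison and must run on the broker's machine as a user allowed to use rabbitmqctl:

    # List each queue's name and ready-message count on the production vhost
    rabbitmqctl list_queues -p ccm-prod-vhost name messages_ready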

Note

An experimental SNMP monitoring plugin has been developed for RabbitMQ. I have successfully used it in the past, but its development has unfortunately been abandoned.

Since you've installed the management console at CCM, you're opting to use its rich and well-documented REST API over the command-line tool. The documentation of this API is available at http://localhost:15672/api/ on any RabbitMQ node where the plugin is installed.
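
As a quick smoke test that the API is reachable, you can fetch the broker overview document; this is just a sketch (rabbitmq_version is one of several fields returned by this endpoint, and the password is masked as elsewhere in this chapter):

    # Fetch the overview document and extract the broker version as a sanity check
    curl -s http://ccm-admin:******@localhost:15672/api/overview | jq '.rabbitmq_version'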

Tip

Keep in mind that the management console is backed by the API, so anything you see and do with your browser can be done via the API.

CCM uses Zabbix as its monitoring tool of choice, so you'll be writing single-line shell commands to gather metrics locally and send them to the Zabbix server. All in all, the monitoring architecture will be as represented in the following diagram:

[Figure: RabbitMQ monitoring architecture at CCM]

Note

You can learn more about Zabbix by reading Mastering Zabbix from Packt Publishing. You can get more information at http://www.packtpub.com/monitor-large-information-technology-environment-by-using-zabbix/book.
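
To illustrate how a metric travels from a broker to Zabbix, here is a minimal sketch that pushes a check result with zabbix_sender; the item key rabbitmq.aliveness and the server name zabbix.ccm.internal are hypothetical placeholders, not values from this chapter:

    # Gather the metric locally, then push it to a Zabbix trapper item;
    # -s must match the host name as registered in Zabbix
    value=$(curl -s http://ccm-admin:******@localhost:15672/api/aliveness-test/ccm-prod-vhost | grep -c "ok")
    zabbix_sender -z zabbix.ccm.internal -s "$(hostname -s)" -k rabbitmq.aliveness -o "$value"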

Let's now detail the "what". Here are the different checks and metrics you've decided to implement and their related commands (a small helper that factors out the repeated curl boilerplate is sketched right after this list):

  • Node liveness: Check whether RabbitMQ is performing its basic duties by hitting the aliveness-test endpoint, which declares the aliveness-test queue, publishes to it, and consumes from it. Set the alarm to fire if the command outputs 0, as follows:
    curl -s http://ccm-admin:******@localhost:15672/api/aliveness-test/ccm-prod-vhost | grep -c "ok"
  • Cluster size: Check each clustered node's view of the active cluster size (views can differ in case of a network partition). Set the alarm to fire if the size is less than the healthy cluster size, which is 2 in your case, as follows:
    curl -s http://ccm-admin:******@localhost:15672/api/nodes | grep -o "contexts" | wc -l
    
  • Federation status: Count the active upstream links on the central log aggregation broker and raise an alarm if the count is less than the expected number (5 in your case), as follows:
    curl -s http://ccm-admin:******@localhost:15672/api/federation-links/ccm-prod-vhost | grep -o "running" | wc -l
    
  • Queues high-watermarks: Ensure the number of messages ready for delivery in a queue stays below a certain threshold. In your case, you'll verify that both the user-dlq and authentication-service queues hold fewer than 25 messages; otherwise, an alarm will be raised to indicate that the consumers are either down or too slow, and that more of them need to be provisioned. Thanks to curl's -f flag, the commands fail gracefully if the queues don't exist:
    curl -s -f http://ccm-admin:******@localhost:15672/api/queues/ccm-dev-vhost/user-dlq | jq '.messages_ready'
    curl -s -f http://ccm-admin:******@localhost:15672/api/queues/ccm-dev-vhost/authentication-service | jq '.messages_ready'
    
  • Overall message throughput: Monitor the intensity of the messaging traffic on a particular broker. You won't set any alarm on this metric for now (you may have to add one if a certain throughput proves to be the upper limit of what one of your brokers can withstand). The following command retrieves this metric:
    curl -s http://ccm-admin:******@localhost:15672/api/vhosts/ccm-prod-vhost | jq '.messages_details.rate'
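
All of these one-liners repeat the same credentials and base URL. The following wrapper, referenced before the list, is a hypothetical convenience (rmq_api is not part of RabbitMQ or Zabbix) that keeps each Zabbix item definition shorter:

    # Hypothetical helper: wraps the management API base URL and credentials
    rmq_api() {
        curl -s -f "http://ccm-admin:******@localhost:15672/api/$1"
    }
    # Example usage, equivalent to the user-dlq high-watermark check above
    rmq_api "queues/ccm-dev-vhost/user-dlq" | jq '.messages_ready'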

Some metrics come with hard upper limits whose values are also available from the API. For these, you'll raise an alarm whenever 80 percent of the upper limit is reached. The following commands return false when the alarm must be raised. Let's detail them:

  • File descriptors: The performance of the message persistence on the disk can be affected if not enough descriptors are available.
    curl -s http://ccm-admin:******@localhost:15672/api/nodes/rabbit@${host} | jq '.fd_used<.fd_total*.8'
    
  • Socket descriptors: RabbitMQ will stop accepting new connections if these descriptors are exhausted.
    curl -s http://ccm-admin:******@localhost:15672/api/nodes/rabbit@${host} | jq '.sockets_used<.sockets_total*.8'
    
  • Erlang processes: There is an upper limit to the number of processes that can be created in an Erlang VM. Although this limit is very high (around a million), it is worth keeping an eye on:
    curl -s http://ccm-admin:******@localhost:15672/api/nodes/rabbit@${host} | jq '.proc_used<.proc_total*.8'
    
  • Memory and disk space: If any of these system resources get exhausted, RabbitMQ will not be able to work properly.
    curl -s http://ccm-admin:******@localhost:15672/api/nodes/rabbit@${host} | jq '.mem_used<.mem_limit*.8'
    curl -s http://ccm-admin:******@localhost:15672/api/nodes/rabbit@${host} | jq '.disk_free_limit<.disk_free*.8'
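
If you'd rather configure a single Zabbix item per node than five, these predicates can be folded into one jq expression; this is a sketch under the same 80 percent convention, returning false as soon as any threshold is breached:

    # Combined node-health check: false means at least one alarm condition is met
    curl -s http://ccm-admin:******@localhost:15672/api/nodes/rabbit@${host} | jq '(.fd_used < .fd_total*.8) and (.sockets_used < .sockets_total*.8) and (.proc_used < .proc_total*.8) and (.mem_used < .mem_limit*.8) and (.disk_free_limit < .disk_free*.8)'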
    

On top of that, the presence of the following two processes must be checked:

  • rabbitmq-server: This is obvious but should not be forgotten!
  • epmd: The Erlang Port Mapper Daemon plays a critical role in the clustering mechanism and, as such, should be carefully monitored.
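
A minimal presence check for both could look like the following sketch, which prints 1 when the process is found and 0 otherwise (pgrep -f matches on the full command line, which is needed for rabbitmq-server since the Erlang VM process itself is named beam.smp):

    pgrep -f rabbitmq-server > /dev/null && echo 1 || echo 0
    pgrep -x epmd > /dev/null && echo 1 || echo 0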

Finally, the occurrence of ERROR REPORT entries in the main RabbitMQ logfile needs to be monitored as well. This logfile is typically located at /var/log/rabbitmq/rabbit@<hostname>.log.
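
A simple way to feed this into Zabbix is to track the entry count and alarm whenever it increases; this sketch assumes the default logfile location and the node's short hostname:

    # Count ERROR REPORT entries in the current node's logfile
    grep -c "ERROR REPORT" "/var/log/rabbitmq/rabbit@$(hostname -s).log"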

You now have the means to gather a holistic view of your RabbitMQ brokers all across your network in order to be proactive and stay on top of issues before they become too problematic.
