The last task required to ensure a smooth production ride with RabbitMQ is the same as for any other system: proper monitoring and alerting should be put in place in order to stay abreast of what's happening in the running brokers. In essence, the two questions you need to ask yourself are what to monitor and how to monitor it. Let's take time to answer these two questions in the context of Clever Coney Media. We won't be discussing the monitoring of the machines (hardware or virtual) on which the brokers run, but will be focusing on the RabbitMQ specifics only.
Let's tackle the "how" first. There are two main ways to retrieve live information from a RabbitMQ broker: via the rabbitmqctl command-line tool and via the REST API exposed over HTTP by the management console. Any decent monitoring system will be able to use one or the other in order to collect metrics and report them to its central aggregation, charting, and alerting engine.
Since you've installed the management console at CCM, you're opting to use its rich and well-documented REST API over the command-line tool. The documentation of this API is available at http://localhost:15672/api/ on any RabbitMQ node where the plugin is installed.
CCM uses Zabbix as its monitoring tool of choice, so you'll be writing single-line shell commands to gather metrics locally and send them to the Zabbix server. All in all, the monitoring architecture will be as represented in the following diagram:
You can learn more about Zabbix by reading Mastering Zabbix from Packt Publishing. You can get more information at http://www.packtpub.com/monitor-large-information-technology-environment-by-using-zabbix/book.
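As a rough illustration of the "send them to the Zabbix server" half of these one-liners, here is a minimal sketch that assumes metrics are pushed with the zabbix_sender command-line tool. The server name zabbix.ccm.example, the item key rabbitmq.aliveness, and the DRY_RUN switch are hypothetical names introduced for this example only; CCM's actual setup may just as well rely on the Zabbix agent instead.

```shell
#!/bin/sh
# Hypothetical defaults: adjust to the real Zabbix server and monitored host.
ZABBIX_SERVER="${ZABBIX_SERVER:-zabbix.ccm.example}"
MONITORED_HOST="${MONITORED_HOST:-$(hostname)}"

# send_metric KEY VALUE
# Pushes one value to Zabbix with zabbix_sender.
# When DRY_RUN=1 (an assumption of this sketch), the command is only printed.
send_metric() {
  key="$1"
  value="$2"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "zabbix_sender -z $ZABBIX_SERVER -s $MONITORED_HOST -k $key -o $value"
  else
    zabbix_sender -z "$ZABBIX_SERVER" -s "$MONITORED_HOST" -k "$key" -o "$value"
  fi
}

# Example (not executed here): wrap any metric-gathering command.
# send_metric rabbitmq.aliveness "$(some_metric_command)"
```

Keeping the collection command and the sending step separate makes it possible to try each metric by hand before wiring it into Zabbix.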
Let's now detail the "what". Here are the different checks and metrics you've decided to implement and their related commands:
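The aliveness test of the API checks that a broker performs its basic duties on a virtual host. The following command counts the occurrences of "ok" in the response, so the alarm must be raised if it returns 0:
curl -s http://ccm-admin:******@localhost:15672/api/aliveness-test/ccm-prod-vhost | grep -c "ok"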
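The following command counts the occurrences of contexts in the list of nodes returned by the API, which indicates how many nodes are reporting in the cluster:
curl -s http://ccm-admin:******@localhost:15672/api/nodes | grep -o "contexts" | wc -l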
The following command counts the federation links of ccm-prod-vhost that are reported as running:
curl -s http://ccm-admin:******@localhost:15672/api/federation-links/ccm-prod-vhost | grep -o "running" | wc -l
The user-dlq and authentication-service queues must have fewer than 25 messages in them. Otherwise, an alarm will be raised to indicate that the consumers are either down or too slow, in which case more of them would need to be provisioned. The scripts have to be written to fail gracefully if the queues don't exist:
curl -s -f http://ccm-admin:******@localhost:15672/api/queues/ccm-dev-vhost/user-dlq | jq '.messages_ready'
curl -s -f http://ccm-admin:******@localhost:15672/api/queues/ccm-dev-vhost/authentication-service | jq '.messages_ready'
The following command reads the overall message rate of ccm-prod-vhost:
curl -s http://ccm-admin:******@localhost:15672/api/vhosts/ccm-prod-vhost | jq '.messages_details.rate'
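The requirement that the queue checks fail gracefully can be sketched as a small helper around the value printed by jq. This is only one possible reading: the function name queue_depth_ok is hypothetical, and treating a missing queue (empty or null output) as "no alarm" is an assumption rather than something CCM has specified.

```shell
#!/bin/sh
# queue_depth_ok DEPTH [LIMIT]
# Succeeds (exit status 0) when DEPTH is below LIMIT (25 by default).
# Assumption of this sketch: an empty, null, or non-numeric DEPTH means the
# queue does not exist, which is treated as "no alarm".
queue_depth_ok() {
  depth="$1"
  limit="${2:-25}"
  case "$depth" in
    ''|null|*[!0-9]*) return 0 ;;  # queue absent or value unusable
  esac
  [ "$depth" -lt "$limit" ]
}

# Example (not executed here):
# depth=$(curl -s -f http://ccm-admin:******@localhost:15672/api/queues/ccm-dev-vhost/user-dlq | jq '.messages_ready')
# queue_depth_ok "$depth" || echo "user-dlq is above its high-watermark"
```

With the -f option, curl prints nothing when the API answers with an error such as 404, so jq has nothing to parse and the helper receives an empty string.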
Some metrics come with related rigid upper limits whose values are also available from the API. For these, you'll raise an alarm whenever a threshold of 80 percent of the upper limit is reached. Each of the following commands returns false when the alarm must be raised. Let's detail them:
File descriptors:
curl -s http://ccm-admin:******@localhost:15672/api/nodes/rabbit@${host} | jq '.fd_used<.fd_total*.8'
Sockets:
curl -s http://ccm-admin:******@localhost:15672/api/nodes/rabbit@${host} | jq '.sockets_used<.sockets_total*.8'
Erlang processes:
curl -s http://ccm-admin:******@localhost:15672/api/nodes/rabbit@${host} | jq '.proc_used<.proc_total*.8'
Memory:
curl -s http://ccm-admin:******@localhost:15672/api/nodes/rabbit@${host} | jq '.mem_used<.mem_limit*.8'
Disk space:
curl -s http://ccm-admin:******@localhost:15672/api/nodes/rabbit@${host} | jq '.disk_free_limit<.disk_free*.8'
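To make the 80 percent rule concrete, the comparison performed by these jq expressions can be reproduced with plain integer arithmetic. The helper name below_threshold is hypothetical; the sketch assumes both values are non-negative integers, as is the case for the counters and byte sizes used above.

```shell
#!/bin/sh
# below_threshold USED LIMIT
# Prints "true" when USED is strictly below 80 percent of LIMIT and "false"
# otherwise, mirroring jq expressions such as '.fd_used<.fd_total*.8'.
# USED*10 < LIMIT*8 is the same test without floating-point arithmetic.
below_threshold() {
  used="$1"
  limit="$2"
  if [ $((used * 10)) -lt $((limit * 8)) ]; then
    echo true
  else
    echo false
  fi
}

# Example (not executed here): fetch the node once, then test several pairs.
# node_json=$(curl -s "http://ccm-admin:******@localhost:15672/api/nodes/rabbit@${host}")
# below_threshold "$(echo "$node_json" | jq '.fd_used')" "$(echo "$node_json" | jq '.fd_total')"
```

For instance, 79 used file descriptors out of 100 yields true, whereas 80 out of 100 yields false and should raise the alarm. The disk check has the same shape, with disk_free_limit passed as the first argument and disk_free as the second.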
On top of that, the presence of the following two processes must be checked:
Finally, the occurrence of ERROR REPORT entries in the main RabbitMQ logfile needs to be monitored as well. This logfile is typically located at /var/log/rabbitmq/rabbit@<hostname>.log.
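One simple way to watch for these entries is to count them with grep, as in the following sketch. The function name count_error_reports is hypothetical, and reporting 0 for an unreadable logfile is an assumption of this example; a production check would more likely track only the entries added since the previous run.

```shell
#!/bin/sh
# count_error_reports LOGFILE
# Prints the number of lines containing "ERROR REPORT" in LOGFILE.
# Assumption of this sketch: a missing or unreadable file counts as 0.
count_error_reports() {
  logfile="$1"
  if [ -r "$logfile" ]; then
    # grep -c prints 0 but exits with status 1 when nothing matches,
    # hence the "|| true" to keep the function's status clean.
    grep -c "ERROR REPORT" "$logfile" || true
  else
    echo 0
  fi
}

# Example (not executed here):
# count_error_reports "/var/log/rabbitmq/rabbit@$(hostname).log"
```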
You now have the means to gather a holistic view of your RabbitMQ brokers all across your network in order to be proactive and stay on top of issues before they become too problematic.