Now that you have MCollective working, and have an idea of how powerful MCollective can be, let’s go over some of the steps involved in maintaining and debugging MCollective.
If you have a firewall or flow-tracking switch (e.g. Juniper) between your servers and your middleware, you may need to tweak the settings to ensure the connections remain open.
MCollective’s STOMP sessions are idle unless a client is actively issuing requests. MCollective does set the keep-alive flag on the TCP session, but many operating systems send the first keep-alive packet long after most firewalls have dropped the session from their active table. The server will not be aware that the session has been cut, and the middleware will not learn of it until it tries to forward a message from a client.
To keep the sessions alive, configure the server to send updated registration information at an interval shorter than the firewall’s session timeout. In most situations, every 10 minutes is more than sufficient.
# server.cfg
# interval in seconds
registerinterval = 600
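To see why registration is needed here, it can help to compare the host’s keep-alive timer with the firewall’s idle timeout. The following is a minimal sketch and not part of MCollective: both function names are hypothetical, and halving the firewall timeout is just one conservative way to pick an interval.

```shell
# Hypothetical helpers, not shipped with MCollective.

# Succeeds if the first keep-alive probe (sent after $1 idle seconds) would
# arrive before a firewall with an idle timeout of $2 seconds drops the session.
keepalive_sufficient() {
    [ "$1" -lt "$2" ]
}

# Suggest a registerinterval (in seconds) of half the firewall idle timeout.
suggest_registerinterval() {
    echo $(( $1 / 2 ))
}
```

On Linux the keep-alive timer can be read from /proc/sys/net/ipv4/tcp_keepalive_time and defaults to 7200 seconds, so keepalive_sufficient 7200 1800 fails for a firewall with a 30 minute timeout, while suggest_registerinterval 1800 prints 900.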
The default registration agent is AgentList, which only sends a list of the installed server plugins. You can create your own registration agents to send other information, as we’ll document in Chapter 14.
After any server or agent configuration change you’ll need to restart mcollectived before the changes will be visible.
$ sudo service mcollective restart
Shutting down mcollective: [ OK ]
Starting mcollective: [ OK ]
You can send the mcollectived daemon a USR1 signal to make it reload agent plugins. On most platforms you can do this with pkill. The -x option requires an exact match on the process name, which ensures you don’t signal any other program with a partial name match. The following command will cause mcollectived to reload the agents, and will also report any failure of the pkill command itself.
$ sudo pkill -USR1 -x mcollectived || echo "pkill failed: $?"
Note that this won’t report back any failures from mcollective. For that purpose you’d have to read the log files:
$ tail -20 /var/log/mcollective.log
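Rather than reading the whole tail by eye, you can filter it for problems. This is a minimal sketch which assumes the file logger writes standard Ruby Logger lines containing a severity such as "ERROR -- :"; the function name recent_log_problems is hypothetical, so adjust the pattern if your log format differs.

```shell
# Hypothetical helper: print WARN, ERROR and FATAL entries found in the
# last N lines (default 20) of a log file.
# Assumes Ruby Logger style lines, e.g. "E, [timestamp #pid] ERROR -- : message".
recent_log_problems() {
    logfile=$1
    lines=${2:-20}
    tail -n "$lines" "$logfile" | grep -E '(WARN|ERROR|FATAL) -- '
}
```

For example, recent_log_problems /var/log/mcollective.log 50 run shortly after sending the USR1 signal shows only the warnings and errors among the last 50 log lines. It prints nothing, and returns a nonzero status, when none are found.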
In addition to the list of agents available on a server, MCollective also reports back a fair number of statistics from the inventory request.
$ mco inventory heliotrope
Inventory for heliotrope:
Server Statistics:
Version: 2.5.0
Start Time: Mon Apr 14 23:27:32 -0700 2014
Config File: /etc/mcollective/server.cfg
Collectives: mcollective
Main Collective: mcollective
Process ID: 29427
Total Messages: 5
Messages Passed Filters: 5
Messages Filtered: 0
Expired Messages: 0
Replies Sent: 4
Total Processor Time: 2.66 seconds
System Time: 3.65 seconds
Agents:
discovery filemgr nettest
package puppet rpcutil
service
...several hundred other lines of output
As the output of inventory is very verbose, I rather like using awk to stop after the first blank line that follows the Server Statistics heading.
$ mco inventory heliotrope | awk '/Server/,/^$/'
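The same idea extends to pulling out a single statistic, which is handy when feeding a monitoring system. This is a minimal sketch which assumes the "Name: value" layout shown in the inventory output above; inventory_stat is a hypothetical helper name, not an mco subcommand.

```shell
# Hypothetical helper: read `mco inventory` output on stdin and print the
# value of the first line whose label matches $1 (e.g. "Total Messages").
inventory_stat() {
    awk -v key="$1" '
        {
            line = $0
            sub(/^[ \t]+/, "", line)            # drop leading indentation
            if (index(line, key ":") == 1) {    # label must start the line
                value = substr(line, length(key) + 2)
                sub(/^[ \t]+/, "", value)       # drop padding after the colon
                print value
                exit
            }
        }
    '
}
```

With the inventory shown above, mco inventory heliotrope | inventory_stat 'Replies Sent' would print 4.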
The following logging defaults are used unless overridden in the server.cfg file.
logger_type = file
loglevel = info
logfile = /var/log/mcollective.log
keeplogs = 5
max_log_size = 2097152
logfacility = user
In this configuration mcollectived writes its own logs to disk, and does its own log rotation. It keeps five logs on disk, and rotates when each log reaches 2 MB.
This may work well on physical hardware with capacity to spare, but it is less than ideal where storage is expensive or the systems are virtualized. Personally, I prefer to use the existing logging and analysis infrastructure, and recommend the following settings:
logger_type = syslog
loglevel = debug
logfacility = daemon
These settings are documented in detail at http://docs.puppetlabs.com/mcollective/configure/server.html#logging and http://docs.puppetlabs.com/mcollective/configure/client.html#logging.
There are two ways to monitor that MCollective servers are alive: actively, and passively.
An active check would be to issue a call to an agent available on every node, and validate the results. This could be something as simple as mco ping, which is a low-level connectivity test that doesn’t require authentication or authorization. Or you could test a specific plugin, e.g. an NRPE check. We provide a script to do this in Creating a Standalone Client.
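As a rough illustration of an active check, the responders reported by mco ping can be compared against a list of servers you expect to be alive. This is a minimal sketch: missing_from_ping and the expected hosts file are hypothetical, and it assumes each reply line from mco ping has the form "hostname time=NN.NN ms".

```shell
# Hypothetical helper: read `mco ping` output on stdin and print every host
# listed in the file $1 (one name per line) that did not reply.
# Assumes reply lines look like "hostname   time=126.19 ms".
missing_from_ping() {
    responders=$(awk '/ time=/ { print $1 }')
    while read -r host; do
        [ -n "$host" ] || continue
        # -x matches the whole line, -F treats the name as a fixed string
        printf '%s\n' "$responders" | grep -qxF "$host" || printf '%s\n' "$host"
    done < "$1"
}
```

For example, mco ping | missing_from_ping /etc/mcollective/expected-hosts would print the names of servers that failed to answer, where the file path is whatever list of hosts you maintain.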
A passive check would be to listen to the registration agent topic and look for servers which haven’t checked in recently. We discuss how to build a registration agent in Registration Collector. An example of how to check this with Nagios can be found on the Puppet Labs wiki under AgentRegistrationMonitor.
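The passive approach can be sketched as follows, assuming a registration collector that touches one file per server, named after the server, each time a registration message arrives. The directory layout and the name stale_servers are assumptions for illustration, not something MCollective provides.

```shell
# Hypothetical helper: list servers whose registration file in directory $1
# has not been updated in more than $2 minutes.
# Assumes the collector touches one file per server on every registration.
stale_servers() {
    dir=$1
    max_age_minutes=$2
    find "$dir" -type f -mmin +"$max_age_minutes" -exec basename {} \;
}
```

With a registerinterval of 600 seconds, a threshold such as stale_servers /var/tmp/mcollective 25 tolerates one missed registration before a server is reported; the directory shown is only an example.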