Monitoring control services

The control tier of an OpenStack cloud has the most moving parts to monitor. A few services need at least a basic connection check. They include, but are not limited to, MySQL, RabbitMQ, and MongoDB. Monitoring can certainly go beyond simple connection checks to cover connection counts, queue sizes, and other service statistics. For now, we'll just add a connection check to make sure that these services are running:

define service { 
check_command check_mysql!nagios!nagios_password
host_name control 
service_description MySQL Health check 
use generic-service 
}
define service { 
check_command check_nrpe!check_rabbitmq_aliveness 
host_name control 
service_description RabbitMQ service check 
use generic-service 
}
define service { 
check_command check_nrpe!check_mongod_connect
host_name control 
service_description MongoDB service check 
use generic-service 
}

You can get the scripts for MongoDB and RabbitMQ from https://github.com/mzupan/nagios-plugin-mongodb and https://github.com/jamesc/nagios-plugins-rabbitmq, respectively.
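To make the idea of a connection check concrete, here is a minimal sketch of a Nagios-style check written in Python. It is not how check_mysql or the linked plugins work (those log in to the service); it only attempts a TCP connection and maps the result to the standard Nagios plugin exit codes. The function name check_tcp and the timeout value are assumptions made for illustration.

```python
import socket

# Standard Nagios plugin exit codes: 0 = OK, 2 = CRITICAL.
OK, CRITICAL = 0, 2


def check_tcp(host, port, timeout=5.0):
    """Try a plain TCP connect and return (exit_code, status_line).

    This only proves that something is listening on the port; it does
    not authenticate or run a query against the service.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return OK, f"OK - {host}:{port} accepted a TCP connection"
    except OSError as exc:
        # Covers refused connections, timeouts, and name resolution errors.
        return CRITICAL, f"CRITICAL - {host}:{port} unreachable: {exc}"


# Example use inside a plugin script (default ports shown as comments):
#   MySQL 3306, RabbitMQ (AMQP) 5672, MongoDB 27017
#   code, line = check_tcp("control", 3306)
#   print(line); raise SystemExit(code)
```

A real plugin would print the status line and exit with the returned code, which is what Nagios reads to decide the state of the service.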

Next, we get into checking OpenStack services. We are going to add API checks to make sure that each service is running and that it is not in an error state. Packstack includes scripts that cover most of the API services; a few of the checks here are additions to what Packstack provides. Let's add the Nagios service stanzas for the API calls:

define service {
check_command keystone-user-list
host_name control
normal_check_interval 5
service_description number of keystone users
use generic-service
}
define service {
check_command neutron-net-list
host_name network
service_description Neutron Server service check
use generic-service
}
define service {
check_command nova-list
host_name control
normal_check_interval 5
service_description number of nova instances
use generic-service
}
define service {
check_command glance-index
host_name control
normal_check_interval 5
service_description number of glance images
use generic-service
}
define service {
check_command cinder-list
host_name control
normal_check_interval 5
service_description number of cinder volumes
use generic-service
}
define service {
check_command heat-stack-list
host_name control
normal_check_interval 5
service_description number of heat stacks for admin
use generic-service
}
define service {
check_command ceilometer-resource-list
host_name control
normal_check_interval 5
service_description number of ceilometer resources
use generic-service
}
define service {
check_command swift-list
host_name control
normal_check_interval 5
service_description number of swift containers for admin
use generic-service
}
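Each of the check commands above ultimately runs a client call that lists resources and reports whether the call succeeded. The following Python sketch shows one way such a check could be structured. It is not the script that Packstack ships; the function name api_list_check is hypothetical, and it assumes that the client exits with a non-zero status when the API call fails and that credentials are already available in the environment.

```python
import subprocess

# Standard Nagios plugin exit codes.
OK, CRITICAL, UNKNOWN = 0, 2, 3


def api_list_check(command, label, timeout=30):
    """Run a list-style client command and return (exit_code, status_line)."""
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout
        )
    except FileNotFoundError:
        # The client binary is not installed on this host.
        return UNKNOWN, f"UNKNOWN - {label}: client not found: {command[0]}"
    except subprocess.TimeoutExpired:
        return CRITICAL, f"CRITICAL - {label}: no answer within {timeout}s"

    if result.returncode != 0:
        # Report the last line of stderr, or a placeholder if it is empty.
        detail = (result.stderr.strip().splitlines() or ["no error output"])[-1]
        return CRITICAL, f"CRITICAL - {label}: {detail}"

    # Count non-empty output lines as a rough indicator of a real answer.
    rows = [line for line in result.stdout.splitlines() if line.strip()]
    return OK, f"OK - {label}: {len(rows)} lines of output"


# Example use (assumes the nova client is installed and OS_* variables are set):
#   code, line = api_list_check(["nova", "list"], "nova instances")
#   print(line); raise SystemExit(code)
```

Separating a missing client (UNKNOWN) from a failing API call (CRITICAL) keeps a packaging problem on the monitoring host from being reported as an OpenStack outage.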

With these basic checks in place, a set of successful checks in Nagios shows that the services are up and running and that the API services are healthy enough to list the resources they manage. Some services on the control node are not API services. For these, a service status check is usually enough to make sure they are running. Let's add a service status check for each of these non-API services. You will want to add a configuration stanza that looks like this for each service:

define service {
check_command check_nrpe!check_service_name
host_name 10.100.0.4
service_description Service Name service check
use generic-service
}

Do that for each of the following services, replacing service_name and Service Name with the actual service names:

openstack-ceilometer-alarm-evaluator
openstack-ceilometer-alarm-notifier
openstack-ceilometer-central
openstack-ceilometer-collector
openstack-ceilometer-notification
openstack-cinder-backup
openstack-cinder-scheduler
openstack-cinder-volume
openstack-glance-registry
openstack-heat-api-cfn
openstack-heat-engine
openstack-nova-cert
openstack-nova-conductor
openstack-nova-consoleauth
openstack-nova-novncproxy
openstack-nova-scheduler

Remember that each of these service definitions points to a corresponding NRPE command, so that command must be defined on every host where the service runs.
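Writing sixteen near-identical stanzas by hand is error-prone, so one option is to generate them. The Python sketch below produces both the Nagios service definitions and the matching nrpe.cfg command lines from the service list above. The host address is the one used in the example stanza; the plugin path passed to nrpe_commands is a hypothetical placeholder, since the text does not say which plugin performs the status check.

```python
# Non-API services on the control node, as listed above.
SERVICES = [
    "openstack-ceilometer-alarm-evaluator",
    "openstack-ceilometer-alarm-notifier",
    "openstack-ceilometer-central",
    "openstack-ceilometer-collector",
    "openstack-ceilometer-notification",
    "openstack-cinder-backup",
    "openstack-cinder-scheduler",
    "openstack-cinder-volume",
    "openstack-glance-registry",
    "openstack-heat-api-cfn",
    "openstack-heat-engine",
    "openstack-nova-cert",
    "openstack-nova-conductor",
    "openstack-nova-consoleauth",
    "openstack-nova-novncproxy",
    "openstack-nova-scheduler",
]

# Doubled braces are literal braces in str.format templates.
SERVICE_TEMPLATE = """define service {{
    check_command       check_nrpe!check_{name}
    host_name           {host}
    service_description {name} service check
    use                 generic-service
}}
"""


def nagios_stanzas(services, host):
    """Return one Nagios service definition per service name."""
    return "\n".join(SERVICE_TEMPLATE.format(name=n, host=host) for n in services)


def nrpe_commands(services, plugin):
    """Return the nrpe.cfg lines that the stanzas above rely on."""
    return "\n".join(f"command[check_{n}]={plugin} {n}" for n in services)


# Example use; the plugin path is an assumption, not a known file:
#   print(nagios_stanzas(SERVICES, "10.100.0.4"))
#   print(nrpe_commands(SERVICES, "/usr/lib64/nagios/plugins/check_service"))
```

Generating both outputs from the same list keeps the command name in each Nagios stanza in step with the command name defined in nrpe.cfg on the monitored host.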
