Chapter 5. Troubleshooting OpenStack Compute

Nova is one of the central services of OpenStack and one of the largest in terms of lines of code. It is also one of the oldest OpenStack projects and has seen a lot of change over the years. Nova interacts with many of the other OpenStack services, so isolating and troubleshooting Nova problems can be challenging. In this chapter, we will give you the tips you need to be successful.

When troubleshooting Nova, it helps to follow a series of steps as you seek to isolate the problems you may encounter. In this chapter, we will work through each of the following topics step by step as we troubleshoot Nova:

  • Checking the services
  • Checking the database
  • Checking the authentication settings
  • Checking the Glance integration
  • Checking the Neutron integration

Checking the services

A successful Nova deployment will have multiple Nova services running, and, in addition, there will be multiple supporting services at play as well. A good first step when troubleshooting is to make sure that each of the services has been successfully initiated. We can check the various Nova services by running this command:

ps -aux | grep nova-

Be sure to include a dash (-), as the Nova services are prefixed with nova-. There are a lot of Nova processes, and in the following sections, we will look at each of these processes. The processes that we will explore are as follows:

  • nova-api
  • nova-scheduler
  • nova-conductor
  • nova-compute
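
To check all four processes in one pass, a small helper like the following can be used. This is only a sketch: the function name check_nova_services is our own, and it simply looks for each service name in a process listing read from standard input.

```shell
# Report which Nova services appear in a process listing read from stdin.
# The service names are the four processes listed above.
check_nova_services() {
    listing=$(cat)
    for svc in nova-api nova-scheduler nova-conductor nova-compute; do
        if printf '%s\n' "$listing" | grep -q -- "$svc"; then
            echo "$svc: running"
        else
            echo "$svc: NOT running"
        fi
    done
}

# Typical use: drop the grep process itself from the listing first.
# ps aux | grep -v grep | check_nova_services
```

Remember that nova-compute normally runs on the compute nodes, so a NOT running result for it on a controller-only node is expected.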

nova-api

The Nova API service is usually run on the controller node. Nova supports an OpenStack API, which is the default, in addition to an AWS EC2 API. A request to port 8774 will be handled by the OpenStack API. A request to port 8773 will be handled by the AWS EC2 API. Nova also supports a metadata service, which will listen on port 8775.
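
A quick way to see whether anything is answering on these ports is to attempt a TCP connection. The following sketch assumes bash (it relies on the /dev/tcp pseudo-device) and uses a helper name, probe_port, and a host name, controller, that are placeholders of our own.

```shell
# Print "open" if a TCP connection to host:port succeeds, "closed" otherwise.
# Relies on bash's /dev/tcp pseudo-device, so run it with bash, not sh.
probe_port() {
    if (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null; then
        echo "open"
    else
        echo "closed"
    fi
}

# Ports taken from the paragraph above; "controller" is a placeholder host.
# for port in 8774 8773 8775; do
#     echo "port $port: $(probe_port controller "$port")"
# done
```

An open port only shows that something is listening. It does not prove that the listener is nova-api, so treat this as a first check rather than a diagnosis.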

In order to confirm that the nova-api service is running, execute the following command:

ps -aux | grep nova-api

When nova-api is running as expected, the output from this command will look like the output shown here:

[Screenshot: ps output listing the running nova-api processes]

If the Nova API is not running, the preceding ps -aux command that you ran will come back with just the grep command and no nova-api processes. Also, when you attempt to use the command-line client with Nova, you may encounter an error like the following one:

[Screenshot: command-line client error returned when nova-api is not running]

This is your first clue to any problems with the nova-api service. Your first course of action will be to attempt to start the nova-api service. On Ubuntu systems that use upstart, you can run this command:

start nova-api

Once you start the service, make sure that it has actually started and keeps running. Run ps -aux | grep nova-api again and confirm that the nova-api process is still up.

If the nova-api process isn't returned in the output, try starting the process manually. When you start an OpenStack process manually, that is, without the init scripts, any errors during startup will be printed to the console. If you are dealing with a process that fails on startup, your log files will most likely be empty. Starting the process manually will provide you with the clues that you need to troubleshoot further. To start the nova-api service manually, execute the following command:

sudo -u nova nova-api --config-file=/etc/nova/nova.conf

As the preceding command executes, you will see the startup values printed to the console. Toward the end of this output, look for the lines indicating that your APIs have started up:

[Screenshot: console output showing the API services starting up]

If you do not see lines similar to the lines shown in the preceding code snippet, you will most likely be staring at an error somewhere in the output. The good news is that this error will provide sufficient information to determine what is stopping the service from starting. While we cannot cover every potential cause within the confines of this book, we will take a look at the following few potential causes.

Address already in use

Suppose that, when starting the nova-api service manually, you see an error like the one shown here:

ERROR nova error: [Errno 98] Address already in use

This error means that there is something else running on port 8774, which Nova uses for the API service. You can further troubleshoot this issue by running this command:

lsof -i :8774

This command will tell you what is running on port 8774. Once you clear this port, you can attempt to start the nova-api service again by running start nova-api. As always, we want to check whether the nova-api process has started successfully by running the ps -aux | grep nova-api command. If the API has not started successfully, we can attempt to start it manually, as we did before, and look for the error output.
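
When several file descriptors share the port, the lsof output can be noisy. The following sketch condenses it to one line per process. The helper name summarize_listeners is our own, and it assumes the default lsof column layout, in which the first two columns are COMMAND and PID.

```shell
# Reduce `lsof -i` output (read from stdin) to unique "COMMAND PID" pairs.
# Skips the header row that lsof prints first.
summarize_listeners() {
    awk 'NR > 1 { print $1, $2 }' | sort -u
}

# Typical use on the controller node (root is needed to see sockets
# owned by other users):
# sudo lsof -i :8774 | summarize_listeners
```

If the process holding the port turns out to be a stale nova-api instance, stopping it is usually enough to clear the port.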

The permission error

Suppose that, when you attempt to start the nova-api process manually, you receive an error like the one shown here:

[Screenshot: permission error printed when starting nova-api manually]

An error like the preceding one points to a permission or ownership problem with the Nova configuration file, typically located at /etc/nova/nova.conf. The following commands reset both:

chmod 644 /etc/nova/nova.conf 
chown nova:nova /etc/nova/nova.conf 

The Nova configuration file needs to be readable by the Nova user. The preceding chmod and chown commands will set the proper permissions and ownership for this configuration file. After this fix, you can attempt to start the nova-api service again and verify that it is running successfully. If it doesn't start successfully, remember to check the nova-api.log file for clues.
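
Before and after applying the fix, it can help to print the current mode and ownership of the file. The following is a minimal sketch that assumes GNU stat, as found on Ubuntu. The helper names conf_perms and conf_perms_ok are our own.

```shell
# Print "<octal mode> <owner>:<group>" for a file, using GNU stat.
conf_perms() {
    stat -c '%a %U:%G' "$1"
}

# Succeed only if the file matches the mode and ownership set by the
# chmod and chown commands above.
conf_perms_ok() {
    [ "$(conf_perms "$1")" = "644 nova:nova" ]
}

# Typical use:
# conf_perms /etc/nova/nova.conf
# conf_perms_ok /etc/nova/nova.conf || echo "permissions need fixing"
```

A more direct test of readability is sudo -u nova test -r /etc/nova/nova.conf, which succeeds only if the nova user can actually read the file.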

nova-scheduler

The Nova scheduler service is responsible for selecting the compute node that will host a particular instance. If this service is not operating as expected, you will notice problems when trying to create new instances. To check whether the nova-scheduler service is running, we can use the following command:

ps -aux | grep nova-scheduler

The output of this command should have a line similar to the following one:

[Screenshot: ps output line for the nova-scheduler process]

If the Nova scheduler service does not start properly, there are a couple of things you should check. The first troubleshooting step should be attempting to start the nova-scheduler service manually. You can do this by running the following command:

sudo -u nova /usr/bin/python /usr/local/bin/nova-scheduler --config-file=/etc/nova/nova.conf

Any errors returned from this command should give you clues as to why the nova-scheduler service isn't starting. One error that you may see here is as follows:

[Screenshot: nova-scheduler startup error caused by an unreadable configuration file]

As we've seen before, the Nova configuration file located at /etc/nova/nova.conf needs to be readable by the Nova user. This error will cause problems with several Nova services, including the Nova scheduler. This problem can be resolved if you make sure that the configuration file is readable by the Nova user.

Once you run the nova-scheduler service successfully, you may still discover problems with the service. Your troubleshooting process should continue by looking at the nova-scheduler log for clues. The nova-scheduler log is typically located at /var/log/nova/nova-scheduler.log. It is helpful to grep this log for errors by using a command like the one shown here:

grep 'ERROR' /var/log/nova/nova-scheduler.log

The output of this command will list any errors captured in the scheduler log files. There are a few errors to look out for in particular. To operate correctly, the Nova scheduler requires access to the OpenStack message broker and the Nova database.
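
A plain grep for ERROR also matches lines that merely mention the word in their message. As a sketch of a stricter filter, the following helper prints only lines whose log-level field is ERROR. The name errors_in_log is our own, and it assumes the usual Nova log layout of date, time, process ID, and level as the first four fields.

```shell
# Print only the lines whose log-level field is ERROR.
# Assumes the layout: date, time, PID, level, module, message.
errors_in_log() {
    awk '$4 == "ERROR"' "$1"
}

# Typical use: show the twenty most recent errors.
# errors_in_log /var/log/nova/nova-scheduler.log | tail -n 20
```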

[Screenshot: AMQP connection error in nova-scheduler.log]

The preceding error indicates that the nova-scheduler service is not able to connect to the AMQP server. In this instance, you want to make sure that the message broker is running and accessible. If you are using RabbitMQ, you can check its status by running this command:

rabbitmqctl status

When the RabbitMQ service is not running, the output of this command will look similar to the output shown here:

[Screenshot: rabbitmqctl status output when RabbitMQ is not running]

The fix for this problem is to start your message broker. For RabbitMQ, you can use the following command to start the message broker:

service rabbitmq-server start

You can confirm that RabbitMQ has started successfully by running the rabbitmqctl status command again. If RabbitMQ starts successfully, you will see an output similar to the following one:

[Screenshot: rabbitmqctl status output when RabbitMQ is running]

In the nova-scheduler.log file, you should also see a confirmation that the scheduler was able to successfully connect to the message broker. Look for the log lines like the ones shown in this code snippet:

2015-09-27 23:52:28.248 2355 INFO oslo.messaging._drivers.impl_rabbit [req-0c95b20c-a70d-40c8-bb90-deeec2f0cd47 - - - - -] Reconnected to AMQP server on myrabbitserver:5672
2015-09-27 23:52:28.249 2355 INFO oslo.messaging._drivers.impl_rabbit [req-0c95b20c-a70d-40c8-bb90-deeec2f0cd47 - - - - -] Connected to AMQP server on myrabbitserver:5672
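
To find the most recent confirmation without paging through the whole log, a helper along these lines can be used. It is only a sketch: the name amqp_connected is our own, and it matches the two message forms shown in the preceding log lines.

```shell
# Print the most recent "Connected/Reconnected to AMQP server" line from a
# log file. Returns non-zero if the log contains no such line.
amqp_connected() {
    last=$(grep -E '(Connected|Reconnected) to AMQP server' "$1" | tail -n 1)
    [ -n "$last" ] && printf '%s\n' "$last"
}

# Typical use:
# amqp_connected /var/log/nova/nova-scheduler.log || echo "no AMQP connection logged"
```

Compare the timestamp of the line it prints with the time at which you restarted the message broker to confirm that the scheduler reconnected afterward.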

nova-compute

The Nova compute service needs to be running on each of the compute nodes. You can check whether the service is running by executing the following command:

ps -aux | grep nova-compute

If you find that the nova-compute service is not running, you can start the service by executing the following command on Ubuntu:

start nova-compute

After you attempt to start the nova-compute service, make sure that it is running successfully using the ps -aux command again. If the service does not start and remain running, you should try to start the service manually to check whether there are any errors printed out to the console. Use the following command to start the nova-compute service manually:

sudo -u nova nova-compute --config-file=/etc/nova/nova.conf --config-file=/etc/nova/nova-compute.conf 

After executing this command, there will be several log lines containing the startup information for the service. Be on the lookout for any errors or traces printed in this output. If something pops up, use the details of the error to troubleshoot further.

The compute service is responsible for interacting with the underlying hypervisor and plays a critical role when manipulating instances in OpenStack. If there are problems with the Nova compute service, this can result in multiple issues or errors. For example, if you attempt to launch a new instance without running the nova-compute service, the instance may eventually end up with an ERROR status. When you run nova list in that situation, you may see an output like this:

[Screenshot: nova list output showing an instance with the ERROR status]

You will also notice that the status of the instance is ERROR. As demonstrated earlier, the first step is to make sure that the nova-compute service is running. If it is and you are still experiencing problems, there are several reasons why you may find your instance in this state. To find more clues about the root cause, we should begin looking through the Nova logs. When troubleshooting an instance with an ERROR state, you will want to look for errors in any of the following log files:

  • /var/log/nova/nova-compute.log
  • /var/log/nova/nova-scheduler.log
  • /var/log/nova/nova-conductor.log
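
To see at a glance which of these logs deserves attention first, the following sketch counts ERROR-level lines in each file. The helper name count_nova_errors is our own, and it assumes that the log level is the fourth whitespace-separated field on each line.

```shell
# Count ERROR-level lines in each of the three logs under a log directory
# (default: /var/log/nova), skipping files that do not exist on this node.
count_nova_errors() {
    dir=${1:-/var/log/nova}
    for name in nova-compute nova-scheduler nova-conductor; do
        log="$dir/$name.log"
        [ -f "$log" ] || continue
        count=$(awk '$4 == "ERROR" { n++ } END { print n + 0 }' "$log")
        echo "$name.log: $count"
    done
}

# Typical use:
# count_nova_errors
```

Because the services may run on different nodes, you may need to run this on both the controller and the compute host.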

If the nova-compute service is indeed the root cause of the issue, you are likely to find an error in the nova-conductor.log file, similar to the error shown here:

[Screenshot: error in nova-conductor.log]

Remember that the nova-compute service is what abstracts the hypervisor in OpenStack. You can run the nova hypervisor-list command to see which hypervisors are available, and this can also give you clues about your compute hosts. For example, if we run nova hypervisor-list when the nova-compute service is down, we may see an output similar to this:

[Screenshot: nova hypervisor-list output with the hypervisor state shown as down]

As you can see in this output, the state of the hypervisor is down. This would indicate that we need to look at the nova-compute service to ensure that it is functioning properly.

When the nova-compute service is running, but instances are still ending up with an error state, you can continue troubleshooting by looking at a few more potential causes.

[Screenshot: virt_type error in nova-compute.log]

An error like the preceding one in the nova-compute.log file is an indication that you may have a configuration problem on your compute host. Specifically, this error points to the setting of virt_type in /etc/nova/nova-compute.conf. The fix here would be to change virt_type to a value accepted by your hypervisor. In this example, we would change this value to qemu. Remember to restart the nova-compute service whenever you make a change to the nova-compute.conf configuration file.
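
One common reason to prefer qemu over kvm is a compute host whose CPU does not expose hardware virtualization extensions. This check is not part of the original steps, so take it as an assumption about your environment. The following sketch looks for the vmx (Intel) or svm (AMD) CPU flags and suggests a value accordingly. The helper name suggest_virt_type is our own.

```shell
# Suggest a virt_type from a cpuinfo file (default: /proc/cpuinfo).
# Prints "kvm" when the CPU advertises the vmx (Intel) or svm (AMD) flag,
# and "qemu" otherwise.
suggest_virt_type() {
    if grep -Eqw 'vmx|svm' "${1:-/proc/cpuinfo}"; then
        echo "kvm"
    else
        echo "qemu"
    fi
}

# Typical use on the compute host:
# suggest_virt_type
```

The result is only a hint. Inside a virtual machine without nested virtualization, for example, the flags are typically absent and qemu is the value to use.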

nova-conductor

One of the purposes of the Nova conductor service is to handle all the database interactions on behalf of the compute nodes. This allows us to have a more secure installation, as the compute hosts don't have direct access to the database. To check whether nova-conductor is running, you can use the following command:

ps -aux | grep nova-conductor

If nova-conductor is running, the preceding command will return output similar to the following:

[Screenshot: ps output for the nova-conductor process]

If the service is not running, you can start it by running a command similar to the following one:

start nova-conductor

Once you attempt to start the service, be sure to confirm that it is running successfully using the ps -aux command, as we did earlier. If you find that this service has not started correctly, you should attempt to start the service manually and check the console for errors:

sudo -u nova nova-conductor --config-file=/etc/nova/nova.conf

When running the service manually, keep an eye on the console for traces or errors. These errors can help you identify issues that are preventing the Nova conductor from starting.

Since nova-compute has a dependency on nova-conductor, the order in which these services start is important. If you try to start nova-compute before nova-conductor, you will most likely see the following warning in your nova-compute.log file:

[Screenshot: warning in nova-compute.log when nova-conductor is not yet running]

The fix for the preceding issue is to simply start nova-conductor and then start nova-compute. If your nova-conductor service is not running, you will have issues when trying to launch a new instance, as you can see in the following screenshot:

[Screenshot: instance stuck in the BUILD status with a scheduling task state]

In the preceding example, the instance is stuck in the BUILD status with scheduling as the task state. A quick look at the nova-compute.log file will reveal the root of the issue, as shown in the following screenshot:

[Screenshot: error in nova-compute.log indicating that nova-conductor is not running]

As the error message indicates, nova-conductor is not running. The fix here is to start the nova-conductor service. Nova can typically recover from this sort of error, allowing your instance to eventually build successfully:

[Screenshot: instance with the ACTIVE status and the Running power state]

Note that the status of the preceding instance is ACTIVE and the power state is Running. This indicates that the instance has been successfully created. While there are various issues that may cause an instance to build unsuccessfully, you can typically determine the root cause by following these points:

  • Making sure that all the Nova services are running
  • Checking the various Nova log files for clues