Chapter 10. OpenStack Performance, Availability, and Reliability

We have covered a lot of errors and problems that you need to troubleshoot in a typical OpenStack installation. In this final chapter, we want to cover some of the chronic issues that might be early signs of trouble. This chapter is more about prevention and aims to help you avoid emergency troubleshooting as much as possible.

We will be looking at the following topics in this chapter:

  • Databases
  • RabbitMQ
  • Services
  • Community resources

Databases

As we have seen throughout this book, many OpenStack services make heavy use of databases. Production deployments typically use MySQL or Postgres as a backend database server. As you have learned, a failing or misconfigured database will quickly lead to trouble in your OpenStack cluster. Database problems can also present more subtle concerns that may grow into huge problems if neglected.

Availability

The database server becomes a single point of failure if it is not deployed in a highly available configuration. OpenStack does not require a high-availability installation of your database, and as a result, many installations skip this step. However, production deployments of OpenStack should ensure that their database can survive the failure of a single database server.

MySQL with Galera Cluster

For installations that use the MySQL database engine, there are several options that can be used to cluster your installation. One popular method is to leverage Galera Cluster (http://galeracluster.com/). Galera Cluster for MySQL leverages synchronous replication and provides a multi-master cluster, which offers high availability for your OpenStack databases.

Postgres

Installations that use the Postgres database engine have several options for high availability, load balancing, and replication. These include block device replication with DRBD, log shipping, trigger-based master-standby replication, statement-based replication, and asynchronous multi-master replication. For details, refer to the Postgres high-availability guide (http://www.postgresql.org/docs/current/static/high-availability.html).

Performance

Database performance is one of those metrics that can degrade gradually over time. If administrators do not pay attention, small problems in this area can eventually become large ones. A wise administrator will monitor database performance continuously, watching for slow queries, high database load, and other indications of trouble.

MySQL

There are several options to monitor your MySQL server, some of which are commercial and many that are open source. Administrators should evaluate the options available and select a solution that fits their current set of tools and operating environment. There are several performance metrics you will want to monitor. Some of them are discussed in the following sections.

Show status

The MySQL SHOW STATUS statement can be executed from the mysql command prompt. Its output is server status information, with over 300 variables reported. To narrow down this information, you can add a LIKE clause that matches on the variable name (for example, SHOW STATUS LIKE 'Aborted%';) to display only the variables you are interested in. Here is an abbreviated list of the output returned by SHOW STATUS:

mysql> SHOW STATUS;
+------------------------------------------+-------------+
| Variable_name                            | Value       |
+------------------------------------------+-------------+
| Aborted_clients                          | 29          |
| Aborted_connects                         | 27          |
| Binlog_cache_disk_use                    | 0           |
| Binlog_cache_use                         | 0           |
| Binlog_stmt_cache_disk_use               | 0           |
| Binlog_stmt_cache_use                    | 0           |
| Bytes_received                           | 614         |
| Bytes_sent                               | 33178       |
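
Output in this form is easy to feed into a monitoring script. The following Python sketch shows one way to do that: it parses the table printed by the mysql client into a dictionary and reports the counters that exceed a limit. The function names and the threshold values are assumptions made for this example, not recommended settings.

```python
def parse_status_table(text):
    """Parse mysql client table output ("| name | value |") into a dict."""
    status = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip borders, prompts, and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) != 2 or cells[0] == "Variable_name":
            continue  # skip the header row and malformed rows
        status[cells[0]] = cells[1]
    return status


def counters_over(status, thresholds):
    """Return the names of numeric counters that exceed their threshold."""
    flagged = []
    for name, limit in thresholds.items():
        value = status.get(name)
        if value is not None and value.isdigit() and int(value) > limit:
            flagged.append(name)
    return flagged


sample = """
+------------------+-------+
| Variable_name    | Value |
+------------------+-------+
| Aborted_clients  | 29    |
| Aborted_connects | 27    |
| Bytes_received   | 614   |
"""

status = parse_status_table(sample)
# Hypothetical alert thresholds chosen only for this example.
print(counters_over(status, {"Aborted_clients": 25, "Aborted_connects": 50}))
```

With the sample values, only Aborted_clients is above its example threshold. In practice, the change in a counter between two samples is usually more informative than its absolute value.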

Mytop

Mytop is a command-line utility inspired by the Linux top command. Mytop retrieves data from the MySQL SHOW PROCESSLIST and SHOW STATUS commands. Data from these commands is refreshed, processed, and displayed in the output of the Mytop command. The Mytop output includes a header, which contains summary data, followed by a thread section.

The Mytop header section

Here is an example of the header output from the Mytop command:

MySQL on localhost (5.5.46)                                                                                                                    load 1.01 0.85 0.79 4/538 23573 up 5+02:19:24 [14:35:24]
 Queries: 3.9M     qps:    9 Slow:     0.0         Se/In/Up/De(%):    49/00/08/00 
 Sorts:     0 qps now:   10 Slow qps: 0.0  Threads:   30 (   1/   4) 40/00/12/00 
 Cache Hits: 822.0 Hits/s:  0.0 Hits now:   0.0  Ratio:  0.0%
 Ratio now:  0.0% 
 Key Efficiency: 97.3%  Bps in/out:  1.7k/ 3.1k   Now in/out:  1.0k/ 3.9k

As demonstrated in the preceding output, the header section for the Mytop command includes the following information:

  • The hostname and MySQL version
  • The server load
  • The MySQL server uptime
  • The total number of queries
  • The average number of queries
  • Slow queries
  • The percentage of Select, Insert, Update, and Delete queries
  • Queries per second
  • Threads
  • Cache hits
  • Key efficiency
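
The header values can be cross-checked against each other. As a small worked example, the following Python sketch recomputes the average queries per second from the total query count and the uptime shown in the sample header. It assumes that the average is simply the total number of queries divided by the uptime in seconds; the function names are our own.

```python
def uptime_seconds(uptime):
    """Convert a Mytop uptime string such as "5+02:19:24" into seconds."""
    days, _, clock = uptime.rpartition("+")
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return int(days or 0) * 86400 + hours * 3600 + minutes * 60 + seconds


def average_qps(total_queries, uptime):
    """Average queries per second since the server started."""
    return total_queries / uptime_seconds(uptime)


# Values read from the sample header: 3.9M queries, up 5+02:19:24.
print(uptime_seconds("5+02:19:24"))
print(round(average_qps(3_900_000, "5+02:19:24")))
```

The uptime works out to 440,364 seconds, and 3.9 million queries over that period is roughly 9 queries per second, which is consistent with the qps value in the sample header.
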

The Mytop thread section

The Mytop thread section will list as many threads as can be displayed. The threads are ordered by the Time column, which displays each thread's idle time:

       Id      User         Host/IP         DB       Time    Cmd    State Query                                                                                                                        
       --      ----         -------         --       ----    ---    ----- ----------                                                                                                                   
     3461   neutron  174.143.201.98    neutron       5680  Sleep                                                                                                                                       
     3477    glance  174.143.201.98     glance       1480  Sleep                                                                                                                                       
     3491      nova  174.143.201.98       nova        880  Sleep                                                                                                                                       
     3512      nova  174.143.201.98       nova        281  Sleep                                                                                                                                       
     3487  keystone  174.143.201.98   keystone        280  Sleep                                                                                                                                       
     3489    glance  174.143.201.98     glance        280  Sleep                                                                                                                                       
     3511  keystone  174.143.201.98   keystone        280  Sleep                                                                                                                                       
     3513   neutron  174.143.201.98    neutron        280  Sleep                                                                                                                                       
     3505  keystone  174.143.201.98   keystone        279  Sleep                                                                                                                                       
     3514  keystone  174.143.201.98   keystone        141  Sleep                                                                                                                                       
     ...

The Mytop thread section displays the ID of each thread, followed by the user and host. It then shows the database, the idle time, the command, the state, and the query being executed. Mytop will allow you to keep an eye on the performance of your MySQL database server.
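
The same thread data can be examined in a script. The following Python sketch takes a few rows from the sample output and picks out connections that have been sleeping for a long time, then counts connections per user. The tuple layout, the function names, and the one-hour threshold are assumptions made for this example.

```python
from collections import Counter

# (thread id, user, database, idle seconds, command) from the sample output.
threads = [
    (3461, "neutron", "neutron", 5680, "Sleep"),
    (3477, "glance", "glance", 1480, "Sleep"),
    (3491, "nova", "nova", 880, "Sleep"),
    (3512, "nova", "nova", 281, "Sleep"),
    (3487, "keystone", "keystone", 280, "Sleep"),
]


def long_sleepers(rows, max_idle_seconds):
    """Return ids of sleeping threads idle longer than max_idle_seconds."""
    return [
        thread_id
        for thread_id, _user, _db, idle, command in rows
        if command == "Sleep" and idle > max_idle_seconds
    ]


def connections_per_user(rows):
    """Count open connections per database user."""
    return Counter(user for _id, user, _db, _idle, _cmd in rows)


# 3600 seconds is an arbitrary example threshold, not a recommendation.
print(long_sleepers(threads, 3600))
print(connections_per_user(threads)["nova"])
```

A steadily growing number of long-idle connections for one service can be an early hint that its connection pool settings deserve a look.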

Percona Toolkit

Percona Toolkit is a very useful set of command-line tools that are used to perform MySQL operations and system tasks. The toolkit can be downloaded from Percona at https://www.percona.com/downloads/percona-toolkit/. The output from these tools can be fed into your monitoring system, allowing you to effectively monitor your MySQL installation.

Postgres

Like MySQL, the Postgres database also has a series of tools, which can be leveraged to monitor database performance. In addition to standard Linux troubleshooting tools, such as top and ps, Postgres also offers its own collection of statistics.

The PostgreSQL statistics collector

The statistics collector in Postgres allows you to collect data related to a server's activity. The statistics it collects are varied, and may be helpful for troubleshooting or system monitoring. In order to leverage the statistics collector, you must turn on the functionality in the postgresql.conf file. The settings are commented out by default in the RUNTIME STATISTICS section of the configuration file. Uncomment the lines in the Query/Index Statistics Collector subsection:

#------------------------------------------------------------------------------
# RUNTIME STATISTICS
#------------------------------------------------------------------------------

# - Query/Index Statistics Collector -

track_activities = on
track_counts = on
track_io_timing = off
track_functions = none                 # none, pl, all
track_activity_query_size = 1024       # (change requires restart)
update_process_title = on
stats_temp_directory = 'pg_stat_tmp'

Once the statistics collector is configured, restart the database server or run pg_ctl reload for the configuration to take effect. Note that settings marked "change requires restart" in the file, such as track_activity_query_size, only take effect after a full restart. With the collector enabled, you can query a series of views whose names begin with the prefix pg_stat for relevant statistics about the Postgres database server.
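
As one example of what these views offer, the pg_stat_database view reports, for each database, how many blocks were read from disk (blks_read) and how many were found in the buffer cache (blks_hit). The following Python sketch computes a cache hit ratio from those two counters. The sample rows are invented for illustration, and the function names and the 0.9 threshold are assumptions rather than recommended values.

```python
# Rows as they might be returned by:
#   SELECT datname, blks_read, blks_hit FROM pg_stat_database;
# The numbers below are invented for illustration.
rows = [
    ("nova", 1_200, 98_800),
    ("keystone", 5_000, 15_000),
    ("glance", 0, 0),
]


def cache_hit_ratio(blks_read, blks_hit):
    """Fraction of block requests served from cache, or None if no activity."""
    total = blks_read + blks_hit
    return blks_hit / total if total else None


def low_hit_databases(rows, minimum):
    """Return names of databases whose cache hit ratio is below minimum."""
    flagged = []
    for name, blks_read, blks_hit in rows:
        ratio = cache_hit_ratio(blks_read, blks_hit)
        if ratio is not None and ratio < minimum:
            flagged.append(name)
    return flagged


# 0.9 is an example threshold only.
print(low_hit_databases(rows, 0.9))
```

Databases with no recorded activity are skipped rather than reported, since a ratio cannot be computed for them.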

Database backups

A diligent operator will be sure to back up the database for each OpenStack project. Since most OpenStack services make heavy use of the database to persist things such as state and metadata, corruption or loss of data could render your OpenStack cloud unusable. Current database backups can help rescue you from this fate. MySQL users can use the mysqldump utility to back up all OpenStack databases:

mysqldump --opt --all-databases > all_openstack_dbs.sql

Similarly, Postgres users can back up all OpenStack databases with a command similar to the one shown here:

pg_dumpall > all_openstack_dbs.sql

Your cadence for backups will depend on your environment and tolerance for data corruption or loss. You should store these backups in a safe place and occasionally deploy test restores from the data to ensure that they are working as expected.
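
If you run backups on a schedule, it helps to give each dump a timestamped name so that a new backup never overwrites an older one. The following Python sketch wraps the two commands shown above. The file naming scheme and the function names are assumptions for this example, and the sketch assumes that client credentials are already configured for the user running it.

```python
import subprocess
from datetime import datetime

# Dump commands from the text; output is captured from stdout, as with ">".
DUMP_COMMANDS = {
    "mysql": ["mysqldump", "--opt", "--all-databases"],
    "postgres": ["pg_dumpall"],
}


def backup_filename(engine, when):
    """Build a timestamped name so that older backups are not overwritten."""
    return f"all_openstack_dbs_{engine}_{when:%Y%m%d_%H%M%S}.sql"


def run_backup(engine, when=None):
    """Run the dump for the given engine and return the file it wrote."""
    when = when or datetime.now()
    target = backup_filename(engine, when)
    with open(target, "wb") as out:
        # check=True raises CalledProcessError if the dump command fails.
        subprocess.run(DUMP_COMMANDS[engine], stdout=out, check=True)
    return target


# Only the file name is built here; call run_backup("mysql") to take a dump.
print(backup_filename("mysql", datetime(2016, 1, 15, 2, 30, 0)))
```

Because run_backup raises an exception when the dump command exits with an error, a scheduler or monitoring system can detect a failed backup instead of silently keeping a truncated file.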

Monitoring

Monitoring is often your early warning system that something is going wrong in your cluster. Your monitoring system can also be a rich source of information when the time comes to troubleshoot issues with the cluster. There are multiple options available to monitor OpenStack. Many of your current application monitoring platforms will handle OpenStack just as well as any other Linux system. Regardless of the tool you select to do your monitoring, there are several parts of OpenStack you should focus on.

Resource monitoring

OpenStack is typically deployed on a series of Linux servers. Monitoring the resources on those servers is essential. A set-it-and-forget-it attitude is a recipe for disaster. Things you may want to monitor on your host servers include the following:

  • CPU
  • Disk
  • Memory
  • Log file size
  • Network I/O
  • Database
  • Message broker
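
A dedicated monitoring platform will normally collect these metrics for you, but a few of them can be sampled with nothing more than the Python standard library. The following sketch checks disk usage, load average, and log file sizes on a Linux host. The thresholds, the function names, and the /var/log/nova path are example values chosen for illustration.

```python
import os
import shutil


def disk_used_fraction(path="/"):
    """Fraction of the filesystem holding path that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def load_per_cpu():
    """1-minute load average divided by the number of CPUs."""
    one_minute, _five, _fifteen = os.getloadavg()
    return one_minute / (os.cpu_count() or 1)


def oversized_logs(directory, max_bytes):
    """Return .log files in directory that are larger than max_bytes."""
    try:
        entries = list(os.scandir(directory))
    except OSError:
        return []  # directory is missing or unreadable
    return sorted(
        entry.path
        for entry in entries
        if entry.is_file()
        and entry.name.endswith(".log")
        and entry.stat().st_size > max_bytes
    )


# Thresholds and the log directory are example values only.
if disk_used_fraction("/") > 0.90:
    print("disk usage above 90%")
if load_per_cpu() > 1.0:
    print("load average exceeds CPU count")
for path in oversized_logs("/var/log/nova", 1024 ** 3):
    print("large log file:", path)
```

Checks like these are a stopgap; feeding the same measurements into your existing monitoring system gives you history and alerting as well.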

OpenStack quotas

OpenStack operators have the option of setting usage quotas for each tenant/project. As an administrator, it is helpful to monitor each project's usage against these quotas. Once users reach a quota, they may not be able to deploy additional resources. Users may misinterpret this as an error in the system and report it to you as such. By keeping an eye on the quotas, you can proactively warn users as they approach their limits, or you can decide to increase the quotas as appropriate. Some of the services have client commands that can be used to retrieve quota statistics. As an example, take a look at the nova absolute-limits command here:

nova absolute-limits
+--------------------+------+-------+
| Name               | Used | Max   |
+--------------------+------+-------+
| Cores              | 1    | 20    |
| FloatingIps        | 0    | 10    |
| ImageMeta          | -    | 128   |
| Instances          | 1    | 10    |
| Keypairs           | -    | 100   |
| Personality        | -    | 5     |
| Personality Size   | -    | 10240 |
| RAM                | 512  | 51200 |
| SecurityGroupRules | -    | 20    |
| SecurityGroups     | 1    | 10    |
| Server Meta        | -    | 128   |
| ServerGroupMembers | -    | 10    |
| ServerGroups       | 0    | 10    |
+--------------------+------+-------+

The absolute-limits command in Nova is nice because it displays the project's current usage along with the quota maximum, making it easy to note that a project/tenant is close to the limit.
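
Because the output is a simple table, it is straightforward to check it automatically. The following Python sketch parses the table into used and maximum values and lists the resources at or above a given fraction of their quota. The function names are our own, and the 10% threshold is deliberately low so that the sample data produces a result; a real alert threshold would typically be much higher.

```python
def parse_limits(text):
    """Parse `nova absolute-limits` table output into {name: (used, max)}."""
    limits = {}
    for line in text.splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if len(cells) != 3 or cells[0] == "Name":
            continue  # skip borders, blank lines, and the header row
        name, used, maximum = cells
        if used.isdigit() and maximum.isdigit():
            limits[name] = (int(used), int(maximum))  # "-" rows report no usage
    return limits


def near_quota(limits, threshold):
    """Return resources whose usage is at or above threshold (0.0 to 1.0)."""
    return sorted(
        name
        for name, (used, maximum) in limits.items()
        if maximum and used / maximum >= threshold
    )


sample = """
+-----------+------+-------+
| Name      | Used | Max   |
+-----------+------+-------+
| Cores     | 1    | 20    |
| ImageMeta | -    | 128   |
| Instances | 1    | 10    |
| RAM       | 512  | 51200 |
+-----------+------+-------+
"""

# 0.10 is used only so that the sample data yields a match.
print(near_quota(parse_limits(sample), 0.10))
```

With the sample rows, only Instances (1 of 10) reaches the example threshold; Cores and RAM are well below it, and ImageMeta is skipped because no usage is reported for it.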
