High availability

While HAProxy has health checks built into it to monitor each host, these checks only determine whether or not traffic should be sent to that host. HAProxy has no capability to recover a host or service from failure.

To make the control tier highly available, Pacemaker is added to the cluster to manage services, filesystems, networking resources, and other resources for which load balancing alone is not sufficient; they need to be made highly available. Pacemaker can move services from node to node within a Pacemaker cluster, and it monitors the nodes so that it knows when action must be taken to recover a particular resource, or even an entire node. Triple-O will install and configure a highly available control tier with the options passed to overcloud deploy that were shown earlier in this chapter.
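Once the deployment completes, the state of the cluster that Triple-O configured can be inspected from any controller node. The following is a minimal sketch; the exact resource names depend on the Triple-O release, and newer pcs releases replace pcs resource show with pcs resource status:

    $ sudo pcs status            # nodes, quorum state, and every resource Pacemaker manages
    $ sudo pcs resource show     # just the resources and where they are running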

There are two major infrastructure considerations that go into designing a Pacemaker cluster, both related to installing Pacemaker and preparing it to manage the resources that you would like to be highly available. First, at least three nodes are needed to properly configure Pacemaker and establish a quorum. With only two nodes, if communication between them is lost, they can enter a state called split brain, in which both nodes believe they should be the primary node because neither can reach the other. In that case, resources that should only reside on one server, for example, the Virtual IPs (VIPs), can be started in two places and cause conflict. We will discuss the VIPs a little more in just a moment. With more than two nodes, the nodes cast votes before an action takes place. For example, if one node loses communication with the others, a majority of the remaining nodes must vote to fence it. Once the majority agrees on the action, the power to the node is cut to reboot it.
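As a sketch of establishing a three-node cluster and checking its quorum state with pcs, the hostnames below are placeholders, and older pcs releases (0.9) instead use the pcs cluster setup --name overcloud ... syntax:

    $ sudo pcs cluster setup overcloud ctrl-0 ctrl-1 ctrl-2   # create a three-node cluster
    $ sudo pcs cluster start --all                            # start corosync and pacemaker on all nodes
    $ sudo corosync-quorumtool -s                             # report total votes and whether this partition is quorate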

Second, fencing must be configured for a proper Pacemaker installation. Fencing is the ability of the nodes in the cluster to control each other's power. In the event that one node loses communication with the others, the other nodes must be able to do something about it. Since they cannot communicate with the node, the remaining nodes first agree that none of them can reach it, and then the node is fenced. To fence a node, its power is cut to power-cycle it, forcing a reboot in the hope that a fresh boot will restore its communication with the cluster.
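The following is a hedged sketch of configuring an IPMI-based fence device with pcs; the device name, host, address, and credentials are placeholders, and newer fence-agents releases rename the parameters to ip, username, and password:

    $ sudo pcs stonith create ipmi-ctrl-0 fence_ipmilan \
          pcmk_host_list="ctrl-0" ipaddr="192.0.2.10" \
          login="admin" passwd="secret" lanplus=1 \
          op monitor interval=60s
    $ sudo pcs property set stonith-enabled=true   # do not manage resources without working fencing
    $ sudo pcs stonith fence ctrl-0                # fence a node by hand to verify the device works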

Once a Pacemaker cluster is set up and running, it is ready to have resources configured within it to be made highly available. An example set of resources that should be made highly available is HAProxy and the VIP that it listens on. HAProxy is a single point of failure for all of the API traffic being passed through it. By adding it as a resource to Pacemaker, it will be monitored to ensure that the IP address is always reachable on one of Pacemaker's nodes and that HAProxy is listening on that IP address to receive incoming traffic. VIPs are not persisted across boots on the control node because, once they are added to Pacemaker as a resource, Pacemaker handles the configuration and health of that IP address for you.
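As an illustration, the following pcs commands create a VIP resource and an HAProxy clone, then constrain them so that the VIP is only placed on a node where HAProxy is running. The address and resource names here are examples, not the names Triple-O generates:

    $ sudo pcs resource create control-vip ocf:heartbeat:IPaddr2 \
          ip=192.0.2.100 cidr_netmask=32 op monitor interval=30s
    $ sudo pcs resource create haproxy systemd:haproxy --clone   # run HAProxy on every node
    $ sudo pcs constraint colocation add control-vip with haproxy-clone
    $ sudo pcs constraint order start control-vip then haproxy-clone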

Almost all of the other OpenStack services can be made highly available. Most of them can be added to Pacemaker in what is called a cloned configuration, which means that Pacemaker expects to run them on more than one node but will still monitor their health and restart them if they go down. This is the configuration Triple-O uses for the services that are being load-balanced by HAProxy.
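For example, a systemd-managed API service can be added as a cloned resource so that it runs on every controller while Pacemaker monitors and restarts it. The service shown here, openstack-glance-api, is just one illustration of a service managed this way:

    $ sudo pcs resource create openstack-glance-api systemd:openstack-glance-api \
          op monitor interval=60s --clone interleave=true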
