Implementing high availability on a web server

Now that you know all the software components in play, it's time to go deep into a web server HA configuration. The proposed design foresees Apache, bound to a virtual IP address, on top of two nodes. In this design, HTTPD, or better, Apache, sits on top of an active/passive cluster that is managed by Corosync/Pacemaker.

It is quite an easy task to provide a highly available configuration for the Zabbix GUI because the web application is well defined and does not produce or generate data or any kind of file on the web server. This allows you to have two nodes deployed on two different servers (if possible, in two distant locations), implementing a highly available, fault-tolerant, disaster-recovery setup. In this configuration, since the web content will be static, in the sense that it will not change (apart from the case of a system upgrade), you don't need filesystem replication between the two nodes. The only other component that is needed is a resource manager that will detect the failure of the primary node and coordinate the failover to the secondary node. The resource manager that will be used is Pacemaker/Corosync.

The installation will follow this order:

  1. Installing the HTTPD server on both nodes.
  2. Installing Pacemaker.
  3. Deploying the Zabbix web interface on both nodes.
  4. Configuring Apache to bind to the VIP.
  5. Configuring Corosync/Pacemaker.
  6. Configuring the Zabbix GUI to access the RDBMS (on the PostgreSQL VIP).

The following diagram explains the proposed infrastructure:

[Diagram: Implementing high availability on a web server]

Configuring HTTPD HA

Pacemaker is a sophisticated and widely used cluster resource manager with a rich feature set. To set up Pacemaker, you need to:

  • Install Corosync
  • Install Pacemaker
  • Configure and start Corosync

It is time to spend a couple of lines on this part of the architecture. Corosync is a software layer that provides the messaging service between servers within the same cluster.

Corosync allows any number of servers to be part of the cluster using different fault-tolerant configurations, such as Active/Active, Active/Passive, and N+1. Among its tasks, Corosync checks whether Pacemaker is running and bootstraps all the processes that are needed.

To install this package, you can run the following command:

$ yum install pacemaker corosync

Yum will resolve all dependencies for you; once everything is installed, you can configure Corosync. The first thing to do is copy the sample configuration file available at the following location:

$ cp /etc/corosync/corosync.conf.example /etc/corosync/corosync.conf

To configure Corosync, you need to choose an unused multicast address and a port:

$ export MULTICAST_PORT=4000
$ export MULTICAST_ADDRESS=226.94.1.1
# derive the network address of the interface that carries the cluster traffic
$ export BIND_NET_ADDRESS=`ip addr | grep "inet " | grep brd | tail -n1 | awk '{print $4}' | sed s/255/0/`

$ sed -i.bak "s/.*mcastaddr:.*/mcastaddr: $MULTICAST_ADDRESS/g" /etc/corosync/corosync.conf
$ sed -i.bak "s/.*mcastport:.*/mcastport: $MULTICAST_PORT/g" /etc/corosync/corosync.conf
$ sed -i.bak "s/.*bindnetaddr:.*/bindnetaddr: $BIND_NET_ADDRESS/g" /etc/corosync/corosync.conf

Note

Please take care to allow the multicast traffic through port 4000 by running this command as root:

iptables -I INPUT -p udp -m state --state NEW -m multiport --dports 4000 -j ACCEPT

Then make the firewall rule persistent by running:

service iptables save

Now you need to tell Corosync to add the Pacemaker service; to do so, create the /etc/corosync/service.d/pcmk file with the following content:

service {
        # Load the Pacemaker Cluster Resource Manager
        name: pacemaker
        # ver: 1 means Pacemaker runs as its own daemon and must be started separately
        ver: 1
}

At this point, you need to propagate the files you just configured to node2:

/etc/corosync/corosync.conf
/etc/corosync/service.d/pcmk
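Assuming the second node is reachable as node2 (a placeholder hostname), a couple of scp commands run as root from the first node are enough to copy them over:

$ scp /etc/corosync/corosync.conf node2:/etc/corosync/corosync.conf
$ scp /etc/corosync/service.d/pcmk node2:/etc/corosync/service.d/pcmk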

After that, you can start Corosync and Pacemaker on both nodes:

$ /etc/init.d/corosync start
$ /etc/init.d/pacemaker start
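If you want the cluster stack to come back up after a reboot, you will probably also want to enable both services at boot time; on a SysV-init system such as the one assumed by the init scripts above, this is simply:

$ chkconfig corosync on
$ chkconfig pacemaker on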

Check the cluster status using the following command:

$ crm_mon

Examine the configuration using the following command:

$ crm configure show

Understanding Pacemaker and STONITH

Shoot The Other Node In The Head (STONITH) can introduce a weak point in this configuration; it can cause a split-brain scenario, especially if the servers are in two distant locations where there are numerous causes that can prevent communication between them. A split-brain scenario happens when each node believes that the other is broken: the first node shoots the second; then, when the second node reboots, it shoots the first, and so on. This is also known as the STONITH death match.

There are basically three issues that can cause one node to STONITH the other:

  • The nodes are alive but unable to communicate with each other
  • A node is dead
  • An HA resource failed to stop

The first cause can be avoided by ensuring redundant communication paths and by handling multicast traffic properly. This involves the whole network infrastructure; if you buy network service from a vendor, you cannot simply trust that multicast will be well managed. The second cause is obvious, and a genuinely dead node is unlikely to start a STONITH death match.

The third cause is not easy to understand, so let's clarify it with an example. Basically, an HA resource is started on a node. If it starts successfully, the resource is monitored indefinitely; if the start fails, the resource is stopped and then restarted, either on the current node or on the second node. When the resource needs to be stopped and the stop succeeds, the resource is restarted on the other node. If the stop fails, however, the node is fenced (STONITH) because that is considered the safe thing to do.

Note

If the HA resource can't be stopped and the node is fenced, the worst-case action is killing the whole node. This can cause data corruption on your node, especially if there is ongoing transactional activity, and this needs to be avoided. It is less dangerous if the HA service is a resource such as an HTTP server that serves web pages (without transactional activity involved); however, even this is not completely safe.

There are different ways to avoid the STONITH death match, but we want the proposed design to be as easy as possible to implement, maintain, and manage, so the proposed architecture will live without STONITH, which can introduce issues if it is not well configured and managed.

Note

Pacemaker is distributed with STONITH enabled, but STONITH is not really necessary on a two-node cluster setup.

To disable STONITH, use the following command:

$ crm configure property stonith-enabled="false"

Pacemaker – is Quorum really needed?

Quorum refers to the concept of voting; it means each node can vote with regard to what can happen. This is similar to democracy, where the majority wins and implements decisions. For example, if you have a three-node (or more) cluster and one of the nodes in the pool fails, the majority can decide to fence the failed node.

With the Quorum configuration, you can also decide on a no-Quorum policy; this policy determines what the cluster does when Quorum is lost:

  • Ignore: No action is taken if a Quorum is lost
  • Stop (default option): This stops all resources on the affected cluster node
  • Freeze: This continues running all the existing resources but doesn't start the stopped ones
  • Suicide: This can fence all nodes on the affected partition

All these considerations are valid if you have a configuration with three or more nodes. Quorum is enabled by default on most configurations, but it can't be applied to two-node clusters because there is no majority to elect a winner and reach a decision.

On a two-node cluster, Quorum therefore needs to be effectively disabled; run the following command to apply the ignore policy:

$ crm configure property no-quorum-policy=ignore
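To double-check that both cluster-wide properties are now in place, you can filter them out of the configuration dump (a quick sketch; the grep simply matches the two properties set previously):

$ crm configure show | grep -E "stonith-enabled|no-quorum-policy"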

Pacemaker – the stickiness concept

It is obviously highly desirable to prevent healthy resources from being moved around the cluster. Moving a resource always requires a period of downtime that can't be accepted for a critical service (such as the RDBMS), especially if the resource is healthy. To address this, Pacemaker introduces a parameter that expresses how much a service prefers to stay running where it is currently located. This concept is called stickiness. Every downtime has a cost, and it is not only the expense tied to the short downtime period needed to switch the resource to the other node.

Pacemaker doesn't calculate the cost associated with moving resources and, by default, will move them around to achieve what it considers the optimal resource placement.

Note

On a two-node cluster, it is important to specify the stickiness; this will simplify all the maintenance tasks, because otherwise Pacemaker may decide to move a resource back onto a node that is under maintenance, disrupting the service.

Note that Pacemaker's idea of optimal resource placement does not always agree with what you would choose yourself. To avoid this movement of resources, you can specify a default stickiness for your resources:

$ crm configure property default-resource-stickiness="100"

Tip

It is possible to use INFINITY instead of a number in the stickiness properties. This will force a resource to stay on the node where it is running until that node dies; once the failed node comes back up, the resources will not automatically migrate back to it:

$ crm configure property default-resource-stickiness="INFINITY"

Pacemaker – configuring Apache/HTTPD

The Pacemaker resource manager needs to access the Apache server-status page to know the status of HTTPD. To enable access to the server status, you need to change the /etc/httpd/conf/httpd.conf file as follows:

<Location /server-status>
   SetHandler server-status
   Order deny,allow
   Deny from all
   Allow from 127.0.0.1 <YOUR-NETWORK-HERE>/24
</Location>

Note

For security reasons, it makes sense to deny access to this virtual location and permit only your network and the localhost (127.0.0.1).

Once this is done, you need to restart Apache by running the following command as root:

$ service httpd restart
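To verify that the status page is actually reachable from the node itself (which is what the Pacemaker apache resource agent relies on to monitor the service), a quick local check might look like the following; this is only a sketch and assumes curl is installed and Apache is listening on port 80:

$ curl -s http://127.0.0.1/server-status | head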

This kind of configuration foresees two web servers that will be called www01 and www02 to simplify the proposed example. Again, to keep the example as simple as possible, you can consider the following addresses:

  • www01 (eth0 192.168.1.50 eth1 10.0.0.50)
  • www02 (eth0 192.168.1.51 eth1 10.0.0.51)

Now the first step to perform is to configure the virtual address using the following commands:

$ crm configure
# please note that 10.0.0.100 is the virtual IP address that Pacemaker will manage
crm(live)configure# primitive vip ocf:heartbeat:IPaddr2 \
> params ip="10.0.0.100" \
> nic="eth1" \
> cidr_netmask="24" \
> op start interval="0s" timeout="50s" \
> op monitor interval="5s" timeout="20s" \
> op stop interval="0s" timeout="50s"
crm(live)configure# show
# make sure the output matches the following:
node www01.domain.example.com
node www02.domain.example.com
primitive vip ocf:heartbeat:IPaddr2 
        params ip="10.0.0.100" nic="eth1" cidr_netmask="24" 
        op start interval="0s" timeout="50s" 
        op monitor interval="5s" timeout="20s" 
        op stop interval="0s" timeout="50s"
property $id="cib-bootstrap-options" 
        dc-version="1.1.2-f059ec7cedada865805490b67ebf4a0b963bccfe" 
        cluster-infrastructure="openais" 
        expected-quorum-votes="2" 
        no-quorum-policy="ignore" 
        stonith-enabled="false"
rsc_defaults $id="rsc-options" 
        resource-stickiness="INFINITY" 
        migration-threshold="1"

crm(live)configure# commit
crm(live)configure# exit

The commit command makes the configuration active. Now, to be sure that everything went fine, you can check the cluster status using the following command:

$ crm_mon

You should get an output similar to the following one:

============
Last updated: Fri Jul 10 10:59:16 2015
Stack: openais
Current DC: www01.domain.example.com - partition WITHOUT quorum
Version: 1.1.2-f059ec7cedada865805490b67ebf4a0b963bccfe
2 Nodes configured, unknown expected votes
1 Resources configured.
============

Online: [ www01.domain.example.com www02.domain.example.com ]

vip     (ocf::heartbeat:IPaddr2):       Started www01.domain.example.com

To be sure that the VIP is up and running, you can simply ping it:

$ ping 10.0.0.100

PING 10.0.0.100 (10.0.0.100) 56(84) bytes of data.
64 bytes from 10.0.0.100: icmp_seq=1 ttl=64 time=0.012 ms
64 bytes from 10.0.0.100: icmp_seq=2 ttl=64 time=0.011 ms
64 bytes from 10.0.0.100: icmp_seq=3 ttl=64 time=0.008 ms
64 bytes from 10.0.0.100: icmp_seq=4 ttl=64 time=0.021 ms

Now you have the VIP up and running. To configure Apache in the cluster, you need to go back to the CRM configuration and tell the cluster that you have a new service, your HTTPD daemon, and that it must be grouped with the VIP. This group will be called webserver.

This configuration will tie the VIP and HTTPD together, so both will always be up and running on the same node. You will configure the httpd resource and the group using the following commands:

$ crm configure
crm(live)configure# primitive httpd ocf:heartbeat:apache \
> params configfile="/etc/httpd/conf/httpd.conf" \
> port="80" \
> op start interval="0s" timeout="50s" \
> op monitor interval="5s" timeout="20s" \
> op stop interval="0s" timeout="50s"
crm(live)configure# group webserver vip httpd 
crm(live)configure# commit 
crm(live)configure# exit 

Now you can check your configuration using the following command:

$ crm_mon
============
Last updated: Fri Jul 10 10:59:16 2015
Stack: openais
Current DC: www01.domain.example.com - partition WITHOUT quorum
Version: 1.1.2-f059ec7cedada865805490b67ebf4a0b963bccfe
2 Nodes configured, unknown expected votes
1 Resources configured.
============

Online: [ www01.domain.example.com www02.domain.example.com ]

Resource Group: webserver
     vip        (ocf::heartbeat:IPaddr2):       Started www01.domain.example.com
     httpd      (ocf::heartbeat:apache):        Started www01.domain.example.com

Note

Note that since you are not using Quorum, the partition WITHOUT quorum and unknown expected votes messages in the crm_mon output are normal.
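At this point, it is worth verifying that the failover really works. A simple test, sketched here with the same crm shell used above, is to put the active node into standby, check with crm_mon that the webserver group (VIP plus httpd) has moved to the other node, and then bring the node back online:

$ crm node standby www01.domain.example.com
$ crm_mon -1
# the webserver group should now be reported as Started on www02.domain.example.com
$ crm node online www01.domain.example.com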
