Chapter 5. High Availability

Even though messaging allows for very loosely coupled communication, in many scenarios prolonged downtime or message loss is not acceptable, especially when delivery must be guaranteed. In the previous chapter, we described how RabbitMQ supports clustering and how that clustering focuses on queue scalability rather than on high availability. In this chapter, we will explore mechanisms for establishing high availability at the level of the message broker.

Topics covered in the chapter:

  • Benefits of high availability
  • High availability support in RabbitMQ
  • Client high availability
  • Case Study: Introducing high availability in CSN

Benefits of high availability

When we design and develop large systems that need to be up and running most of the time, we need to consider what happens when a single component fails. The cause could be a hardware failure, a network failure, or any other type of failure. Some systems, for example, have an SLA (service level agreement) that specifies 99.99 percent uptime. High availability should therefore be considered for every component that could turn out to be a single point of failure, including the message broker. It not only helps you meet the SLAs defined for your system, which increases confidence in its reliability; it also limits the impact of failures that would otherwise keep the system down until someone intervenes manually to bring it back up. Such downtime carries the risk of losing money: the more users are affected by a system failure, the more likely it is that your SLAs oblige you to pay out. There are general-purpose solutions that provide high availability clusters for systems that have no built-in support for creating such clusters. Fortunately, RabbitMQ provides its own mechanisms for that, as we will discover later in this chapter.
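To make the 99.99 percent figure concrete, the following minimal sketch converts an uptime target into a downtime budget. The function name `allowed_downtime_minutes` and the 30-day and 365-day periods are assumptions chosen for illustration; a real SLA defines its own measurement window.

```python
def allowed_downtime_minutes(uptime_percent: float, period_days: float) -> float:
    """Return the downtime budget, in minutes, for an uptime target over a period."""
    period_minutes = period_days * 24 * 60
    # The budget is the fraction of the period that is NOT covered by the target.
    return period_minutes * (1 - uptime_percent / 100)


if __name__ == "__main__":
    # Illustrative periods only; an actual SLA states its own window.
    for label, days in (("30-day month", 30), ("365-day year", 365)):
        minutes = allowed_downtime_minutes(99.99, days)
        print(f"{label}: {minutes:.2f} minutes of downtime allowed")
```

Under these assumptions the budget is roughly 4.32 minutes per 30-day month and 52.56 minutes per year, which leaves little room for a failure that waits on manual intervention.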

Moreover, we may want to perform upgrades without disrupting users of our system, or back up data while the system is running.

High availability needs to address the following cases:

  • A connection fails (for example, due to a network or node failure). In that case, your client, whether a publisher or a consumer, must be able to reconnect automatically to the cluster. You can use a load balancer that is capable of detecting node failures, or extend your client with support for reconnecting to the cluster.
  • A node fails. In that case, other nodes in the cluster should be able to take over the processing of messages. Various cluster topologies allow high availability to be implemented in a cluster. One is an active/active topology, where all nodes can take over the load of a failed node. Another is an active/passive topology, where some passive nodes can become active and take over the load of a failed node. Other variations derive from these two and differ in the number of passive nodes available or in how nodes are passivated when failed nodes become available again.
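The first case above, a client that reconnects by itself, can be sketched independently of any particular client library. In the example below, the function `connect_with_failover`, its parameters, and the host names `rabbit1` and `rabbit2` are all hypothetical. The `connect` argument stands in for whatever call your RabbitMQ client library uses to open a connection, and we assume it raises `ConnectionError` when a node is unreachable.

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def connect_with_failover(
    hosts: Sequence[str],
    connect: Callable[[str], T],
    max_rounds: int = 3,
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Try each cluster node in turn; back off and retry the list on failure."""
    last_error = None
    for round_number in range(max_rounds):
        for host in hosts:
            try:
                # Hypothetical hook: delegate to the real client library here.
                return connect(host)
            except ConnectionError as error:
                last_error = error  # remember the failure, try the next node
        # Every node was unreachable in this round; wait before the next one,
        # doubling the delay each time (no wait after the final round).
        if round_number < max_rounds - 1:
            sleep(delay_seconds * 2 ** round_number)
    raise ConnectionError(
        f"no broker node reachable after {max_rounds} rounds"
    ) from last_error


if __name__ == "__main__":
    def fake_connect(host: str) -> str:
        # Simulated cluster: the first node is down, the second accepts.
        if host == "rabbit1":
            raise ConnectionError("node down")
        return f"connected to {host}"

    print(connect_with_failover(["rabbit1", "rabbit2"], fake_connect))
```

Passing `connect` and `sleep` as parameters keeps the retry policy separate from the client library and makes it testable without a running broker. A load balancer in front of the cluster would move the node-selection part of this logic out of the client.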