Chapter 8. Advanced Stack

In this chapter, we will learn to build and manipulate a fault-tolerant, high-performance foundation for our PostgreSQL clusters. We will cover the following recipes in this chapter:

  • Preparing systems for the stack
  • Starting with the Linux Volume Manager
  • Adding block-level replication
  • Incorporating the second LVM layer
  • Verifying a DRBD filesystem
  • Correcting a DRBD split brain
  • Formatting an XFS filesystem
  • Tweaking XFS performance
  • Maintaining an XFS filesystem
  • Using LVM snapshots
  • Switching live stack systems
  • Detaching a problematic node
  • Building and attaching a new node

Introduction

Thus far in this book, we've discussed quite an array of functionality and methodology dedicated to keeping PostgreSQL systems online. By now, we have a burgeoning menagerie of replication utilities, system monitoring tools, connection pooling layers, and even a handful of troubleshooting tips. What could we possibly cover next?

As it turns out, it is possible to install PostgreSQL on a server too early. Even presuming that we have all of the hardware and software we discussed earlier, our servers are still missing the following three things:

  • The ability to synchronize data to two servers simultaneously
  • The capacity to freeze data to prevent changes for backup purposes
  • A durable filesystem designed for multiprocess I/O

There are several solutions for each of these missing elements, yet we've settled on three in particular: DRBD, LVM, and XFS. Let's explore a bit about each of these technologies and why we've chosen them to represent what we've deemed our Advanced Stack.

Why DRBD?

DRBD stands for Distributed Replicated Block Device. DRBD is meant to operate below the filesystem layer, mirroring the contents of one server's storage to another at the block level. This means the operating system doesn't even know that its data is located on another server as well. Having trouble imagining how it works? We hope the following diagram will help:

[Figure: Why DRBD?]

As we can see here, DRBD acts as an abstraction from the disk device that normally hosts our PostgreSQL database. The primary benefit we gain from this situation is that data is located on at least two servers at all times. If one server crashes and its storage is rendered unusable, we have a backup available.

Why not use streaming replication instead? Even PostgreSQL synchronous streaming replication only guarantees that transactions are written to the standby, not that they have been replayed within the actual database. As we've already discussed, synchronous replication also means that the master node will halt on commit unless at least one replica is available. With DRBD, the other server has a copy that is identical in all aspects. Any block written to one server is always available on the other.
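To make the idea of block-level mirroring more concrete, here is a minimal conceptual sketch in Python. It is not how DRBD is implemented (DRBD is a kernel module that replicates over the network); the class names BlockDevice and MirroredDevice are hypothetical and exist only for this illustration. What it shows is that the layer above sees an ordinary device, while every write lands on both nodes.

```python
class BlockDevice:
    """A toy disk: a fixed number of fixed-size blocks held in memory."""

    def __init__(self, block_count, block_size=4096):
        self.block_size = block_size
        self.blocks = [bytes(block_size) for _ in range(block_count)]

    def write(self, index, data):
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block long")
        self.blocks[index] = bytes(data)

    def read(self, index):
        return self.blocks[index]


class MirroredDevice:
    """Offers the same read/write interface, but copies every write to a peer.

    Hypothetical stand-in for the role DRBD plays; a real peer would be
    reached over the network rather than called directly.
    """

    def __init__(self, local, peer):
        self.local = local
        self.peer = peer
        self.block_size = local.block_size

    def write(self, index, data):
        self.local.write(index, data)
        self.peer.write(index, data)

    def read(self, index):
        # Reads are served locally; the peer only receives writes.
        return self.local.read(index)


node_a = BlockDevice(block_count=8)
node_b = BlockDevice(block_count=8)
device = MirroredDevice(local=node_a, peer=node_b)

page = b"x" * 4096
device.write(3, page)  # the caller never mentions node_b
```

Anything written through device, whether it comes from a filesystem or from PostgreSQL above it, ends up in both node_a and node_b without the caller doing anything special.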

Why LVM?

LVM is the Logical Volume Manager for Linux. Like DRBD, LVM is another abstraction layer that sits between the filesystem and the underlying disk devices. Why is this necessary? LVM allows us to dynamically manage disk devices as one single continuous piece of storage that we can arbitrarily extend, group, freeze, or reorganize, all while remaining online.

Have you ever wanted to simply add storage to a filesystem without messy symbolic links or a server reboot? What about moving data from one device to another after an upgrade? With LVM, all of this is easy. Using a modern server with hot-swappable disks or a SAN, we never even have to reboot the server to completely reconfigure its disk devices.

Through the entire process of almost any LVM change, PostgreSQL can remain online and serve requests. This is the ultimate in high availability.
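The following toy model, written in Python, sketches why adding a disk does not disturb what sits on top of LVM. It is only an illustration under simplifying assumptions: the VolumeGroup class and the device names sdb and sdc are hypothetical, and real LVM tracks far more metadata than a single list. The point is that the logical address space simply grows while existing addresses keep pointing at the same physical location.

```python
class VolumeGroup:
    """Toy model: pools extents from several devices into one logical space."""

    def __init__(self):
        # Position in this list is the logical extent number; the value is
        # the (device name, physical extent) that backs it.
        self.extents = []

    def add_device(self, name, extent_count):
        self.extents.extend((name, i) for i in range(extent_count))

    def size(self):
        return len(self.extents)

    def locate(self, logical_extent):
        return self.extents[logical_extent]


vg = VolumeGroup()
vg.add_device("sdb", 100)
size_before = vg.size()

# Adding a second disk extends the same logical space; nothing already
# mapped has to move.
vg.add_device("sdc", 50)
```

Logical extent 99 still resolves to the first disk after the second one is added, and extent 100 is the first extent of the new disk.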

Why XFS?

XFS stands for eXtended File System. Some may consider this a somewhat controversial selection, given that ext4 performs perfectly well and is the current default for many of the major Linux distributions. Both XFS and ext4 are journaling filesystems; they provide online growth, LVM freezing, and numerous maintenance and repair tools.

However, XFS still has something that ext4 does not: allocation groups. ext4, like all of its predecessors, has a single file allocation table for the entire formatted device. XFS, on the other hand, can split the allocation table into several segments so that multiple independent CPU processes can write to the disk simultaneously. The end result is that large servers with many CPUs and a random-write workload, such as a PostgreSQL database server, will perform better on an XFS-formatted device.
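The benefit of allocation groups can be sketched with a small Python model. This is a deliberately simplified illustration, not a description of the XFS on-disk format: the AllocationGroup and ToyFilesystem classes are hypothetical, and choosing a group by writer number is an assumption made only to keep the example short. The idea it captures is that each group guards its own free space with its own lock, so writers working in different groups never wait on one another.

```python
import threading


class AllocationGroup:
    """One independent slice of the device with its own lock and free pointer."""

    def __init__(self, first_block, block_count):
        self.lock = threading.Lock()
        self.next_free = first_block
        self.end = first_block + block_count

    def allocate(self, count):
        with self.lock:  # contention is limited to this group
            if self.next_free + count > self.end:
                raise RuntimeError("allocation group is full")
            start = self.next_free
            self.next_free += count
            return start


class ToyFilesystem:
    def __init__(self, total_blocks, group_count):
        size = total_blocks // group_count
        self.groups = [AllocationGroup(i * size, size) for i in range(group_count)]

    def allocate(self, writer_id, count):
        # Assumption for this sketch: spread writers across groups by number.
        group = self.groups[writer_id % len(self.groups)]
        return group.allocate(count)


fs = ToyFilesystem(total_blocks=4000, group_count=4)
results = {}


def writer(writer_id):
    results[writer_id] = [fs.allocate(writer_id, 10) for _ in range(5)]


threads = [threading.Thread(target=writer, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With a single lock for the whole device, all four writers would queue behind each other. Here, each writer allocates from its own region, much as several PostgreSQL backends might write at the same time.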

Tip

If you are using Red Hat Enterprise Linux (RHEL) and have a support contract with Red Hat, be wary of using XFS. Red Hat considers XFS enterprise-grade storage and distributes it separately as a paid extension. If this becomes a problem, please feel free to use ext4 and ignore the XFS-related sections of this chapter.

The stack

At the end of this chapter, we will have a software stack that looks like the following:

[Figure: The stack]

Each of the following layers represents one enhancement necessary for the best long-term high availability:

  • The first LVM layer (starting at the storage) protects DRBD from inheriting device-specific block sizes and allows for online resizing or migration to new devices
  • The DRBD layer replicates data to another server for immediate use
  • The second LVM layer provides snapshot capabilities and other potentially useful LVM functionality to the filesystem
  • The XFS layer is the last element where data resides and is available for direct manipulation by programs such as PostgreSQL

The recipes we provide in this chapter should make this stack easier to understand, despite its advanced nature.
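As a rough preview of the order in which the layers are built, the following Python sketch assembles the kind of commands involved, from the storage device upward. It only builds and prints a list of strings; it does not run anything. The device /dev/sdb, the DRBD resource name pg, the DRBD device /dev/drbd0, and the volume names VG_DRBD, VG_POSTGRES, and LV_DATA are all hypothetical placeholders, and the sketch assumes a DRBD resource file already exists. The recipes in this chapter, not this outline, describe the actual procedure.

```python
def stack_commands(disk, resource, drbd_device, size):
    """Return illustrative shell commands for each layer, bottom to top."""
    return [
        # Layer 1: LVM directly on the storage device.
        f"pvcreate {disk}",
        f"vgcreate VG_DRBD {disk}",
        f"lvcreate -L {size} -n LV_DATA VG_DRBD",
        # Layer 2: DRBD on top of the first logical volume. Assumes the
        # resource is already defined in the DRBD configuration.
        f"drbdadm create-md {resource}",
        f"drbdadm up {resource}",
        # Forcing primary is only done on one node, for the initial sync.
        f"drbdadm primary --force {resource}",
        # Layer 3: a second LVM layer on the replicated DRBD device.
        f"pvcreate {drbd_device}",
        f"vgcreate VG_POSTGRES {drbd_device}",
        "lvcreate -l 100%FREE -n LV_DATA VG_POSTGRES",
        # Layer 4: the XFS filesystem that PostgreSQL will use.
        "mkfs.xfs /dev/VG_POSTGRES/LV_DATA",
    ]


plan = stack_commands("/dev/sdb", "pg", "/dev/drbd0", "500G")
for command in plan:
    print(command)
```

Reading the output from top to bottom mirrors the list of layers above: storage-level LVM first, then DRBD, then the second LVM layer, and finally XFS.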

Note

These layers in our stack do come at a cost. Since each is an abstraction above the raw storage device, performance will decrease slightly. We believe this tradeoff is worth the security and availability the stack provides. The makers of DRBD provide a good summary of how storage speed is affected at http://blogs.linbit.com/p/469/843-random-writes-faster/.
