Mitigating the impact of hardware failure

Software can have bugs, and PostgreSQL is no exception; yet bugs in the database software rarely, if ever, lead directly to data corruption. Hardware failures are another matter, and their effects are not always so straightforward.

Disk, CPU, or memory failures don't always cause the server to crash. In fact, such failures can persist for weeks or even months before the monitoring infrastructure detects them. Disk failures are generally abstracted away by RAID or SAN devices, and these arrays are designed to handle online rebuilds gracefully. Other types of failure are far more subtle.

CPU or memory problems can manifest in several different ways. For PostgreSQL to function, data must be read from disk into memory and processed by the CPU. At any point during these transitions, a bad CPU or RAM module can inject an invalid checksum or a data value that is inconsistent with the rest of the database. PostgreSQL, however, generally assumes that the database is consistent and that the transaction logs have been faithfully recorded and applied.

When running a dual-node cluster, where one node is always connected to and synchronized with the other, a failure like this can corrupt data on both nodes nearly simultaneously. When both nodes contain invalid data, our promise of a highly available system is broken: there is no backup to switch to, nor an alternate node to host the database while we repair the problem. Data corruption can require intricate investigation and mitigation efforts, which are much harder to carry out while the database is online.

The only reasonable way to prevent this type of scenario is to exercise extreme caution and do some extra preparation work.

Getting ready

We need to cover a few different scenarios here. One of the things we want to do is transfer files from one server to another. A popular way to do this is with the rsync command. On Debian or Ubuntu systems, we can install it as a root-capable user this way:

sudo apt-get install rsync

We also need rsync configured as a daemon on the server that will receive the files. Create a file named /etc/rsyncd.conf and fill it with this content:

[archive]
    path = /db/wal_archive
    comment = Archived Transaction Logs
    uid = postgres
    gid = postgres
    read only = false
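
To confirm that the daemon is serving the new archive module, we can start it and list the module locally. This is only a quick sanity check, and it assumes rsync is launched manually rather than by a service manager; the listing will succeed once the /db/wal_archive directory exists, which we create in the next section:

sudo rsync --daemon
rsync localhost::archive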

We're now ready to protect our data from hardware problems.

How to do it...

The first thing we need to do is secure the WAL stream. Follow these steps to build a semipermanent copy of archived WAL data in the /db/wal_archive directory:

  1. On the primary node, modify the postgresql.conf file to include the following setting:
    archive_command = 'cp -an %p /db/wal_archive/%f'
  2. Create the /db/wal_archive directory as a root-capable user using the following commands:
    sudo mkdir -p -m 0700 /db/wal_archive
    sudo chown -R postgres /db/wal_archive
    
  3. Reload the PostgreSQL service using the following command:
    pg_ctl -D /path/to/database reload
    
  4. As a root-capable user, create a script named del_archives in the /etc/cron.daily directory and fill it with this content as a single line:
    find /db/wal_archive -name '0000*' -type f -mtime +2 -delete
  5. Make sure that the script is executable using the following command:
    chmod a+x /etc/cron.daily/del_archives
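
At this point, it's worth confirming that archiving actually works before relying on it. One quick check (a sketch, assuming a superuser connection on a 9.x server where pg_switch_xlog() is available and that there has been some write activity since the last segment switch) is to force a WAL segment switch and then look for the resulting file in the archive directory:

psql -U postgres -c "SELECT pg_switch_xlog();"
ls -l /db/wal_archive | tail

If archive_command is working, a newly archived WAL segment should appear in the listing within a few seconds.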
    

Next, we should set up a copy on a remote location. In this case, let's assume that the database is at 192.168.1.10 and we have another server set up specifically for WAL storage at 192.168.1.100. Impose an hour's delay by following these steps:

  1. On 192.168.1.100, create a /db/wal_archive directory as a root-capable user with these commands:
    sudo mkdir -p -m 0700 /db/wal_archive
    sudo chown -R postgres /db/wal_archive
    
  2. Ensure that the server at 192.168.1.100 has the /etc/rsyncd.conf file we discussed earlier.
  3. As a root-capable user on 192.168.1.10, create a script named sync_archives in the /etc/cron.d directory and fill it with this content as a single line:
    * * * * * postgres find /db/wal_archive -name '0000*' -type f -mmin +60 | xargs -I{} rsync {} 192.168.1.100::archive
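
To test the job without waiting for cron, the same pipeline can be run once by hand as the postgres user and the remote module listed afterwards. This is just a manual spot check that mirrors the cron entry above:

sudo -u postgres sh -c "find /db/wal_archive -name '0000*' -type f -mmin +60 | xargs -I{} rsync {} 192.168.1.100::archive"
rsync 192.168.1.100::archive | tail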

How it works...

To ensure that WAL data is available for recovery or emergency restore, we need to secure it on a tertiary location away from the primary or secondary server. We start this by telling PostgreSQL to store the old WAL files instead of deleting them. The cp command we used to copy the files will not overwrite the existing archives due to the -n setting. This prevents accidentally corrupting the existing transaction logs.
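
For reference, these archived files are read back during recovery through a matching restore_command. A minimal sketch for recovery.conf, assuming the recovering server sees the archive at the same path, would be:

restore_command = 'cp /db/wal_archive/%f %p'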

Then, we need to create the directory where the files will reside. The mkdir command does this, and the chown command ensures that the PostgreSQL server can write to that directory. Once the directory is in place, we need to reload the server because we changed archive_command.

Once a WAL file is no longer needed by PostgreSQL, it's stored in our /db/wal_archive directory until it gets deleted. This is why we create the del_archives script. We only really need two or three days' worth of live WAL files. This allows us to send very old files to tape, while newer files remain available for Point In Time Recovery (PITR) or restore. Once we make the script executable with the chmod command, we no longer have to worry about accidentally filling the disks with WAL files.

The final steps might be the most important of all. We create a directory on a completely different server rather than on any of our existing database nodes. Once this directory is there, we create an automated rsync job on the database master that will run every minute and copy all WAL files older than 1 hour to the new storage area. Why only an hour? Current versions of PostgreSQL don't have the ability to delay the replay stream, so if we encounter a hardware problem, corrupt data will immediately synchronize to our spare server. This gives us up to an hour for monitors, maintenance, and logs to discover the problem before the corrupted WAL files pollute the tertiary storage server.
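
The delay itself is nothing more than the -mmin +60 test in the cron entry, so it is easy to tune. For example, a more cautious three-hour window (a hypothetical value, traded against a longer lag before files reach the tertiary server) would use -mmin +180 instead:

* * * * * postgres find /db/wal_archive -name '0000*' -type f -mmin +180 | xargs -I{} rsync {} 192.168.1.100::archive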

Tip

We could use PITR instead at this point. However, the imposed one-hour delay gives us live access to databases that obtain their WAL files from the tertiary server. Otherwise, we would have to restore from a backup and apply WAL files to reach our desired point in time.

There's more...

In securing the WAL stream, there are a few other options available to us.

Copy WAL files more easily

If we are running PostgreSQL 9.2 or above, there is a newer command that, much like pg_basebackup, uses the replication mechanism for a new purpose. Assuming PostgreSQL is configured as described in the Configuration – getting it right the first time recipe, there should be five available replication streams. As we're smart and have a dual-node cluster, we are already using at least one of them to create a copy of the database.

The next step would be to dedicate another of these streams to a copy of the WAL files alone, as they are critical to PITR. Instead of using rsync to copy them between nodes, we can simply pull them directly from the primary node. With 192.168.1.30 as the virtual database IP address and rep_user as the name of the replication user, we could use the following command to obtain WAL data:

pg_receivexlog -h 192.168.1.30 -U rep_user -D /db/wal_archive

This command acts like a service. This means it will only copy from the replication stream while it is actually running. To use pg_receivexlog effectively, it needs to be started as a background service and should be restarted if the virtual IP is moved or the server it's running on is ever restarted.
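
One simple way to keep it running in the background (a sketch only; in production, a proper init script or process supervisor is preferable) is to launch it as the postgres user with nohup:

sudo -u postgres nohup pg_receivexlog -h 192.168.1.30 -U rep_user -D /db/wal_archive >/dev/null 2>&1 &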

Add compression

PostgreSQL WAL files are very compressible. As such, we can save quite a bit of space while storing them for long periods of time. Since PostgreSQL archive_command can be anything we wish, we can incorporate compression right into the process. For example, we could use this postgresql.conf setting instead:

archive_command = 'gzip -qc %p > /db/wal_archive/%f'

Now, whenever PostgreSQL moves a WAL file into the archive, it also compresses it.
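
Keep in mind that anything consuming these archives must decompress them again. The plain restore_command shown earlier would then become something like this, assuming the same archive path:

restore_command = 'gunzip -qc /db/wal_archive/%f > %p'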

Secondary delay

We have already discussed maintenance in the previous sections. What we never covered was a self-imposed archival delay. If we're performing maintenance, or if the primary node crashes, it is a very good idea to either remove the /etc/cron.d/sync_archives script or comment out the rsync command itself until the maintenance is complete. The hour-long barrier helps avoid propagating corrupt data, but there's no reason to take excess risks.
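
A quick, reversible way to disable the job (one possible approach; on Debian-style systems, cron ignores files in /etc/cron.d whose names contain a dot) is to move the file aside:

sudo mv /etc/cron.d/sync_archives /etc/cron.d/sync_archives.disabled

Renaming it back to sync_archives once maintenance is complete re-enables the synchronization.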

Some environments have another pair of servers in a different data center that acts as disaster recovery. If this is our setup, any running server on the disaster-recovery side should be stopped while we modify or rebuild the primary or secondary servers. The reasoning is the same: if there is a problem with the maintenance, we have an untainted copy of everything.

Feel free to re-enable all the synchronization after verifying that crash recovery or maintenance hasn't introduced invalid data.

See also
