Chapter 6. Monitoring Your Setup

In the previous chapters of this book, you learned about the different kinds of replication available and how to configure a wide range of scenarios. Now it is time to make your setup more reliable by adding monitoring.

In this chapter you will learn what to monitor and how to implement reasonable monitoring policies. The topics covered are:

  • Checking your XLOG archive
  • Checking the pg_stat_replication system view
  • Checking for replication-related processes on the OS level

By the end of this chapter, you should be able to monitor any kind of replication setup properly.

Checking your archive

If you are planning to use Point-In-Time Recovery (PITR), or if you want to use an XLOG archive to assist your streaming setup, various things can go wrong; for example:

  • Pushing the XLOG might fail
  • Cleaning up the archive might fail

Checking the archive_command

A failing archive_command can be one of the greatest showstoppers in your setup. The idea behind archive_command is to push XLOG files to an archive and store the data there safely. But what happens if those XLOG files cannot be pushed for some reason?
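Before answering that, it helps to see what a typical configuration looks like. The following postgresql.conf excerpt is modeled on the example given in the PostgreSQL documentation; the path /mnt/server/archivedir is merely a placeholder for your own archive location:

    # postgresql.conf (excerpt)
    archive_mode = on
    # Copy each completed XLOG segment to the archive, but never
    # overwrite a file that already exists there.
    archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'

The detail that matters for monitoring is the exit code: whenever archive_command returns a non-zero status, PostgreSQL considers the segment unarchived and will keep retrying it.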

The answer to the question above is quite simple: the master has to keep those XLOG files to ensure that no XLOG is lost. There must always be an uninterrupted sequence of XLOG files; if a single file in the sequence is missing, your slave won't be able to recover anymore. If your network has failed, for example, the master will accumulate those files and keep them. Logically, this cannot go on forever, so at some point you will face a disk space shortage on your master server.
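Fortunately, a growing archiving backlog is easy to detect: every XLOG segment still waiting to be archived is marked by a .ready file in pg_xlog/archive_status. Here is a minimal monitoring sketch; the data directory path and the alert threshold are assumptions you will have to adapt to your own setup:

    import os
    import sys

    # Assumed values; adapt these to your environment.
    PGDATA = "/var/lib/pgsql/data"
    STATUS_DIR = os.path.join(PGDATA, "pg_xlog", "archive_status")
    MAX_PENDING = 10  # alert if more segments than this await archiving

    # Each segment waiting for archive_command carries a .ready marker.
    pending = [f for f in os.listdir(STATUS_DIR) if f.endswith(".ready")]

    if len(pending) > MAX_PENDING:
        print("WARNING: %d XLOG segments waiting to be archived" % len(pending))
        sys.exit(1)
    print("OK: %d XLOG segments pending" % len(pending))

A steadily growing backlog almost always means that archive_command is failing; PostgreSQL also logs every failed archiving attempt, so the server log is the place to look for the root cause.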

Running out of disk space is dangerous because once the disk is full, there is no way to keep writing to the database. While reads might still be possible, most writes will definitely fail and cause serious disruption to your system. PostgreSQL itself will survive a full disk without corrupting your data, and your instance will be intact once space has been freed; but, as stated before, your service will be interrupted in the meantime.

To prevent this from happening, you should monitor your pg_xlog directory and check for:

  • An unusually high number of XLOG files
  • Free disk space on the partition hosting pg_xlog

The core question here is: what is a reasonable number to check for? According to the PostgreSQL documentation, a standard configuration should normally not use more XLOG files than (2 + checkpoint_completion_target) * checkpoint_segments + 1, or checkpoint_segments + wal_keep_segments + 1, whichever is larger. If the number of XLOG files starts to grow far beyond this bound, you can expect some kind of problem, and a failing archive_command is the prime suspect.
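The following minimal sketch turns this rule of thumb into an automated check; the data directory path is an assumption, and the three configuration values should be taken from your actual postgresql.conf:

    import os
    import sys

    # Assumed values; take these from your postgresql.conf.
    PG_XLOG = "/var/lib/pgsql/data/pg_xlog"
    CHECKPOINT_SEGMENTS = 3
    CHECKPOINT_COMPLETION_TARGET = 0.5
    WAL_KEEP_SEGMENTS = 0

    # Upper bound on the number of XLOG files, as documented.
    limit = max(
        (2 + CHECKPOINT_COMPLETION_TARGET) * CHECKPOINT_SEGMENTS + 1,
        CHECKPOINT_SEGMENTS + WAL_KEEP_SEGMENTS + 1,
    )

    # Count only real XLOG segments: 24 hexadecimal characters.
    segments = [f for f in os.listdir(PG_XLOG)
                if len(f) == 24 and all(c in "0123456789ABCDEF" for c in f)]

    if len(segments) > limit:
        print("WARNING: %d XLOG files, expected at most %d"
              % (len(segments), int(limit)))
        sys.exit(1)
    print("OK: %d XLOG files" % len(segments))

In practice, you would allow some slack above the theoretical limit before alerting, since the number of segments can temporarily overshoot during heavy write bursts.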

Make sure that the archive_command works properly.

If you perform these checks properly, nothing bad can happen on this front—if you fail to check these parameters, however, you are risking doomsday.

Monitoring the transaction log archive

The master is not the only place that can run out of space; the very same thing can happen in your archive, so it is suggested that you monitor disk space there as well.

Apart from disk space, which has to be monitored anyway, there is one more thing you should keep on your radar: you need a sound policy for handling base backups. Remember, you are only allowed to delete XLOG files that are older than the oldest base backup you want to keep around. This seemingly small rule can undermine your disk space monitoring: if you are obliged to keep a certain amount of data around, knowing that you are running out of disk space is useful, but there is nothing you can do about it on short notice. It is therefore highly recommended to make sure that your archive always has enough spare capacity, especially in case your database system suddenly has to write a lot of transaction log.
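A simple safeguard is to alert well before the archive actually fills up. The sketch below shows one possible approach; the archive path and the headroom ratio are assumptions to be tuned to your own write rate and retention policy:

    import shutil
    import sys

    # Assumed values; adapt to your archive location and write rate.
    ARCHIVE_DIR = "/mnt/server/archivedir"
    MIN_FREE_RATIO = 0.25  # alert once less than 25% of the partition is free

    total, used, free = shutil.disk_usage(ARCHIVE_DIR)

    if free / total < MIN_FREE_RATIO:
        print("WARNING: archive partition only %.0f%% free" % (100.0 * free / total))
        sys.exit(1)
    print("OK: archive partition %.0f%% free" % (100.0 * free / total))

The same check can, of course, be pointed at the partition hosting pg_xlog on the master.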
