Logging checkpoints properly

Checkpoints are an integral part of a PostgreSQL server. Table data is not modified on disk during query execution; instead, modified rows, index pages, and other structures are first committed to the Write-Ahead Log (WAL). WAL files are also known as checkpoint segments. When the number of these segments reaches checkpoint_segments, or the time since the last checkpoint exceeds checkpoint_timeout, the data files are modified to reflect the changes.
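
Both thresholds are ordinary postgresql.conf settings. The values below are the usual defaults for the 9.x series and are shown purely for illustration; a busy server will almost certainly need higher settings:

checkpoint_segments = 3       # removed in 9.5, where max_wal_size takes over this role
checkpoint_timeout = 5min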

This decoupled writing ensures database integrity at the cost of doubling the necessary disk writes. This is the main reason why experienced PostgreSQL DBAs interested in performance move the WAL location to a separate storage device. However, even moving the WAL files to another device may not sufficiently reduce write pressure. Database activity is variable in nature, and checkpoints only happen every few minutes or after a threshold of data modifications.

As PostgreSQL tries to avoid overwhelming the operating system, writes necessary to satisfy a checkpoint are spread evenly over the checkpoint interval. Unfortunately, the operating system may choose to buffer these writes unevenly, resulting in unexpected write spikes. A busy database might have already saturated its disk bandwidth, leaving few resources available for writing these data modifications.
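
The even spreading described here is governed by the checkpoint_completion_target setting, which tells PostgreSQL how much of the interval between checkpoints to spend writing. The value below is the usual 9.x default and is only illustrative; many DBAs raise it to spread writes even further:

checkpoint_completion_target = 0.5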

The way we combat this is by logging all checkpoints and analyzing the output of our log for checkpoint activity. We may need to leverage tablespaces, storage improvements, or application revisions to really address resource collisions like this, so it's in our best interest to be proactive.

Getting ready

You need to know where to find PostgreSQL logs. We usually suggest a few specific modifications to the postgresql.conf file for logging, including the following:

log_directory = 'pg_log'
log_checkpoints = on

This means logs will be found within our PostgreSQL data directory, in a subdirectory named pg_log. Some distributions use /var/log/postgresql instead. Regardless, find where the logs are kept. To ensure access, examine these as the postgres user, who should either own the logs directly or have the necessary read access.
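
If you are not sure where the logs end up, one quick way to check is to ask the server itself. This sketch assumes we can connect as the postgres user; note that a relative log_directory is resolved against the data directory:

psql -U postgres -c "SHOW data_directory;"
psql -U postgres -c "SHOW log_directory;"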

How to do it...

Assuming our logs are located at /db/pgdata/pg_log, follow these steps to examine the checkpoint activity:

  1. Execute this command to find the most recent logfile:
    ls -lt /db/pgdata/pg_log/postgres*.log | head -n 1
    
  2. If the latest log is named postgresql-2014-02-02.log, view all the checkpoints in this log with the following command:
    grep checkpoint /db/pgdata/pg_log/postgresql-2014-02-02.log
    
  3. Execute the following command to obtain the five longest disk syncs:
    grep 'checkpoint complete:' \
        /db/pgdata/pg_log/postgresql-2014-02-02.log \
        | sed 's/.* sync=/sync=/; s/total=.*; //;' \
        | sort -n | tail -n 5
    

How it works...

First, we need to find the most recent logfile. The ls command's -t parameter sorts the files by their last modified time, and the head command limits the output to a single line of results. Distributions that provide PostgreSQL may adhere to a log-rotation scheme instead. In these cases, the latest logfile will reside in /var/log/postgresql and always have the same name, while older logs have a number appended until the retention period passes.
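
On such a system, the equivalent of our first step might look like the line below; the path is distribution dependent, so adjust it to match where your packages place the logs:

ls -lt /var/log/postgresql/*.log | head -n 1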

No matter how we locate the most recent logfile, we use two relatively simple commands to examine its contents. These logfiles can be extremely useful; however, for now, we will focus on the checkpoint activity. Of those two commands, the first simply isolates all the checkpoint data in the order it occurred. One complete checkpoint will resemble these lines:

2014-02-02 19:54:02 CST LOG:  checkpoint starting: time
2014-02-02 20:00:36 CST LOG:  checkpoint complete: wrote 129631 buffers (24.7%); 0 transaction log file(s) added, 0 removed, 2 recycled; write=392.875 s, sync=1.789 s, total=394.667 s; sync files=203, longest=1.004 s, average=0.008 s

This data is helpful in determining the time period of the checkpoint. Combined with other troubleshooting tools such as sar, we can correlate the checkpoint with disk activity. In this example, we wrote 24.7 percent of a 4 GB buffer pool, which is quite a bit of data. However, these writes were spread over more than six minutes, reducing contention.
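
We can verify that percentage with a little arithmetic. Assuming the default 8 KB block size, the buffer count from the log line converts to roughly 1 GB, which is about 24.7 percent of a 4 GB shared_buffers setting:

echo $((129631 * 8 / 1024))    # 129631 buffers x 8 KB each, expressed in MB; prints 1012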

As useful as the raw log lines are, we can apply a few filters and a sort to expose the disk synchronization time. Our last command uses grep to isolate the checkpoints, sed to remove excess data, sort to focus on the longest syncs, and tail to restrict the output to the top five. Of these, the sed command is the most complex; however, it merely strips everything before the first sync field and removes the total field, leaving only the data related to disk synchronization. Then, our top five most expensive checkpoints look like this:

sync=0.891 s, sync files=87, longest=0.470 s, average=0.010 s
sync=1.203 s, sync files=129, longest=0.302 s, average=0.009 s
sync=1.789 s, sync files=203, longest=1.004 s, average=0.008 s
sync=2.004 s, sync files=187, longest=1.031 s, average=0.010 s
sync=5.083 s, sync files=104, longest=3.076 s, average=0.048 s

The first four could be improved, but the last example is clearly much larger than we would normally expect or desire. Relatively few files were synchronized, yet the longest sync of over 3 seconds would likely adversely affect query performance. Disk synchronization times exhibited here indicate a high level of contention. If we were to execute sar for the time periods indicated by the longest checkpoint, we would most likely see 100 percent disk utilization.
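
As a sketch of that follow-up, something like the command below would pull per-device statistics for the window covered by our worst checkpoint. The history file path (here, a Red Hat style file for the second day of the month) and available flags vary between distributions and sysstat versions, so treat both as assumptions:

sar -d -f /var/log/sa/sa02 -s 19:54:00 -e 20:01:00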

If this utilization is primarily caused by data reads and the checkpoint occurred outside of operational hours, we may be able to ignore it. In such cases, the cause is probably maintenance or voluminous batch jobs. Otherwise, we should expand our investigation to track down the source of the disk activity until all the checkpoints fall below a desirable threshold.

There's more...

Some checkpoint data is stored in a PostgreSQL view named pg_stat_bgwriter. This is more of a summary view of the checkpoint activity, but it is available to any user who can execute SQL statements in the database. Within this view, there are three fields related to this recipe that directly concern us; we will query them right after this list:

  • checkpoints_timed: This column provides the number of checkpoints that occur based on a schedule. These are normal checkpoints and indicate regular operation.
  • checkpoints_req: This column stores the number of checkpoints that PostgreSQL has forced to occur in order to keep up with write activity. If there are too many of these, database performance can suffer greatly, and disk contention can have other adverse effects.
  • checkpoint_sync_time: This column describes the total amount of time that the checkpoint system has spent in sync status, in milliseconds. This is basically a sum of all of the sync columns for all the checkpoints since the statistics were last reset. This is a good value to graph if you are monitoring the database, as a sudden spike in the elapsed sync time can indicate trouble.
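
Sampling these counters takes nothing more than a quick query, which any user who can execute SQL statements may run. The following is only a minimal sketch:

psql -c "SELECT checkpoints_timed, checkpoints_req, checkpoint_sync_time FROM pg_stat_bgwriter;"

Capturing this output every few minutes and graphing the differences between samples makes a sudden climb in checkpoints_req or checkpoint_sync_time easy to spot.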

See also

The WAL is integral to how PostgreSQL operates. We strongly recommend that you learn as much about its functionality as possible. The PostgreSQL documentation provides a great deal of depth in its explanation of how the WAL really works; please make use of it.
