Defusing cache poisoning

Not every DBA has experienced disk cache poisoning. Those who have recognize it as a bane to any critical OLTP system and a source of constant stress in a highly available environment.

When the operating system fetches disk blocks into memory, it also applies arbitrary aging, promotion, and purging heuristics. Several of these can invalidate cached data in the presence of an originating process change such as a database crash or restart. Any memory stored by PostgreSQL in shared memory is also purged upon database shutdown.

Perhaps the worst thing a DBA can do following a database crash or a restart is to immediately make the database available to applications and users. Unless storage is based on SSD or a very capable SAN, random read performance will drop by two or three orders of magnitude as data is being supplied by slow disks instead of by memory. As a result, all subsequent queries will greatly over-saturate the available disk bandwidth. This delays query results and slows down the cache rebuild, potentially multiplying query execution times for several hours.

In a highly available system, we cannot ignore this kind of risk. Saturated disk bandwidth means random reads are spread very thin. We need to figure out how to reinstate the disk cache and possibly, the PostgreSQL shared buffers before declaring that the database is usable. Otherwise, the claim turns out to be false. Queries can often become so slow that applications will ignore results and return errors to users.

Getting ready

We recommend that you check the PostgreSQL documentation for system administration functions and views maintained by the statistics collector. We will be using the pg_relation_filepath function and the pg_stat_user_tables view.

We will also make use of a contributed utility named pgFincore. This utility is not included with standard PostgreSQL but is often packaged for popular Linux distributions. To install it on an Ubuntu server along with the PostgreSQL server, use this command:

sudo apt-get install postgresql-9.3-pgfincore

Afterwards, activate it in the database with this query:

CREATE EXTENSION pgfincore;

How to do it...

First, follow these steps to create a static table that stores the top 20 active tables and indexes:

  1. Execute the following query as a superuser and ignore any errors:
    DROP TABLE IF EXISTS active_snap;
  2. Next, recreate the snapshot table by running this query as a superuser:
    CREATE TABLE active_snap AS
    (SELECT t.relid AS objrelid,
            s.setting || '/' || 
            pg_relation_filepath(t.relid) AS file_path
       FROM pg_stat_user_tables t, pg_settings s
      WHERE s.name = 'data_directory'
      ORDER BY coalesce(idx_scan, 0) DESC
      LIMIT 20)
    UNION
    (SELECT t.indexrelid AS objrelid,
            s.setting || '/' ||
            pg_relation_filepath(t.indexrelid) AS file_path
       FROM pg_stat_user_indexes t, pg_settings s
      WHERE s.name = 'data_directory'
      ORDER BY coalesce(idx_scan, 0) DESC
      LIMIT 20);

To restore the disk cache to the operating system easily, follow these steps:

  1. As a superuser in the database connected with psql, execute the following query in the critical OLTP database before shutting down the database:
    COPY active_snap (file_path) TO '/tmp/frequent_tables.txt';
  2. Shut down PostgreSQL.
  3. Perform maintenance, updates, or recovery.
  4. Execute these commands from the command-line:
    for x in $(tac /tmp/frequent_tables.txt); do
        for y in $x*; do
            dd if=$y of=/dev/null bs=8192
            dd if=$y of=/dev/null bs=8192
        done
    done
  5. Restart PostgreSQL.

If we're not comfortable with Unix commands, this pure SQL method will work as well. Follow these steps instead:

  1. Shut down PostgreSQL.
  2. Perform maintenance, updates, or recovery.
  3. Restart the database.
  4. As a superuser in the database, execute the following SQL query in the critical OLTP database:
    UPDATE pg_database
       SET datallowconn = FALSE
     WHERE datname != 'template1';
  5. Next, execute the entire contents of this SQL block:
    DO $$
    DECLARE
        obj_oid oid;
    BEGIN
        FOR obj_oid IN SELECT objrelid FROM active_snap
        LOOP
            PERFORM pgfadvise_willneed(obj_oid::regclass);
        END LOOP;
    END;
    $$ LANGUAGE plpgsql;
  6. Finally, execute the following query to re-enable connections:
    UPDATE pg_database SET datallowconn = TRUE;

How it works...

The first part of this recipe has two steps. We could perform this work at any time, so the table may have existed from our previous work. Therefore, the first step is to drop the active_snap table. None of the steps following this one remove this table, because in the case of a crash, we want its contents as a starting point for restoring the cache contents.

After dropping the active_snap table, we recreate it with the top 20 tables and top 20 indexes that are sorted by how often they're used in selects. This is only a close approximation based on the collected database statistics, but it's better than leaving the data entirely uncached.

Note

Though it is not available at the time of writing this book, PostgreSQL 9.4 introduces the pg_prewarm extension. It can load database objects into the operating system cache or PostgreSQL shared buffers.

After creating the list of the most accessed tables and indexes, we have one of two paths. In the first and simplest one, we merely preserve the file_path contents of the active_snap table, as this tells us exactly where the files are located. After preserving the table, we can do anything we want, including restarting the database server.

After we're done with maintenance or crash recovery, we can actually restore the file cache before starting the PostgreSQL service. To do this, we use an imposing block of shell scripting. While it looks complex, it's actually just two loops to get a full list of every file that has a name similar to the ones we identified. As PostgreSQL objects exist in 1 GB chunks, there can be several of these that we may have to find. Then, we use the dd utility to read the file into memory, twice. We do it twice because the first time, it loads the data into memory, and the second time, it encourages marking of the blocks as frequently used so that the OS is less likely to purge them.

Afterward, we can start PostgreSQL and enjoy a database that is much less likely to have problems retrieving frequently used data. If we don't have command-line access to the system where PostgreSQL runs, this process is a little more complicated but still manageable.

In the second scenario, we actually stop the database first. Any of our cache recovery must come after the database is restarted. Until that time, we're free to perform any activity necessary to get the server or database contents in order. After we start the database, the fun begins.

We need to reject user connections while we load the database cache. The easiest way to do this without complicated scripts is to simply reject all connections that don't target the template1 database. It's extremely unlikely that applications or users will use this, as it generally contains nothing and they have no permissions within it. For our use, it allows us to reconnect and re-enable connections from template1 if we get disconnected for some reason.

Then, we can use the contents of our previously initialized active_snap table to tell the pgFincore module to load all of those tables and indexes into memory. After this is complete, we re-enable database connections and our work is finished.

Note

Our active_snap table is pretty handy, but it depends on the existence of statistical data that might not be available in the case of a system crash. Be wary of using this approach if statistical information is not trustworthy or is missing.

See also

The tools discussed in this section have a lot of documentation and examples. Please refer to these sites for more information:

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.21.46.92