Chapter 9: Regular Maintenance

In these busy times, many people believe that if it ain't broke, don't fix it. I believe that too, but it isn't an excuse for not taking action to maintain your database servers and make sure that nothing will break.

Database maintenance is about making your database run smoothly.

PostgreSQL prefers regular maintenance, so please read the Planning maintenance recipe for more information.

We recognize that you're here for a reason and are looking for a quick solution to your needs. You're probably thinking – Fix the problem first, and I'll plan later. So, off we go!

PostgreSQL provides a utility command named VACUUM, which is a reference to a garbage collector that sweeps up all of the bad things and fixes them – or at least most of them. That's the single most important thing you need to remember to do – I say single because closely connected to that is the ANALYZE command, which collects statistics for the SQL optimizer. It's possible to run VACUUM and ANALYZE as a single joint command, VACUUM ANALYZE. These actions are automatically executed for you when appropriate by autovacuum, a special background process that runs as part of the PostgreSQL server.

VACUUM performs a range of cleanup activities, some of them too complex to describe without a long digression into their internals. VACUUM has been heavily optimized over 30 years to take the minimum required lock levels on tables and to execute its work in the most efficient manner possible, skipping all of the unnecessary work and using CPU cache optimizations when work is required.

Many experienced PostgreSQL DBAs prefer to execute their VACUUM commands manually, though autovacuum now provides a fine degree of control which, if enabled and tuned, can save you much time. Using both manual and automatic vacuuming gives you control and a safety net.

In this chapter, we will cover the following recipes:

  • Controlling automatic database maintenance
  • Avoiding auto-freezing and page corruptions
  • Removing issues that cause bloat
  • Removing old prepared transactions
  • Actions for heavy users of temporary tables
  • Identifying and fixing bloated tables and indexes
  • Monitoring and tuning a vacuum
  • Maintaining indexes
  • Finding unused indexes
  • Carefully removing unwanted indexes
  • Planning maintenance

Controlling automatic database maintenance

autovacuum is enabled by default in PostgreSQL and mostly does a great job of maintaining your PostgreSQL database. We say mostly because it doesn't know everything you do about the database, such as the best time to perform maintenance actions. Let's explore the settings that can be tuned so that you can use vacuums efficiently.

Getting ready

Exercising control requires some thinking about what you want:

  • What are the best times of day to do things? When are system resources more available?
  • Which days are quiet, and which are not?
  • Which tables are critical to the application, and which are not?

How to do it…

Perform the following steps:

  1. The first thing you must do is make sure that autovacuum is switched on, which is the default. Check that you have the following parameters enabled in your postgresql.conf file:

    autovacuum = on
    track_counts = on

  2. PostgreSQL controls autovacuum with more than 40 individually tunable parameters that provide a wide range of options, though this can be a little daunting. The following are the relevant parameters that can be set in postgresql.conf to tune the VACUUM command:

    vacuum_cleanup_index_scale_factor
    vacuum_cost_delay
    vacuum_cost_limit
    vacuum_cost_page_dirty
    vacuum_cost_page_hit
    vacuum_cost_page_miss
    vacuum_defer_cleanup_age
    vacuum_failsafe_age
    vacuum_freeze_min_age
    vacuum_freeze_table_age
    vacuum_multixact_freeze_min_age
    vacuum_multixact_freeze_table_age

  3. There are also postgresql.conf parameters that apply specifically to autovacuum:

    autovacuum
    autovacuum_analyze_scale_factor
    autovacuum_analyze_threshold
    autovacuum_freeze_max_age
    autovacuum_max_workers
    autovacuum_multixact_freeze_max_age
    autovacuum_naptime
    autovacuum_vacuum_cost_delay
    autovacuum_vacuum_cost_limit
    autovacuum_vacuum_insert_threshold
    autovacuum_vacuum_insert_scale_factor
    autovacuum_vacuum_scale_factor
    autovacuum_vacuum_threshold
    autovacuum_work_mem
    log_autovacuum_min_duration
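A quick way to review the live values of all these settings is to query the standard pg_settings view; this is a sketch and the LIKE patterns simply catch both parameter families:

```sql
-- List current vacuum and autovacuum settings, and where each was set from
SELECT name, setting, source
FROM pg_settings
WHERE name LIKE 'autovacuum%' OR name LIKE 'vacuum%'
ORDER BY name;
```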

  4. The preceding parameters apply to all tables at once. Individual tables can be controlled by storage parameters, which are set using the following command:

    ALTER TABLE mytable SET (storage_parameter = value);

  5. The storage parameters that relate to maintenance are as follows:

    autovacuum_enabled
    autovacuum_analyze_scale_factor
    autovacuum_analyze_threshold
    autovacuum_freeze_min_age
    autovacuum_freeze_max_age
    autovacuum_freeze_table_age
    autovacuum_multixact_freeze_max_age
    autovacuum_multixact_freeze_min_age
    autovacuum_multixact_freeze_table_age
    autovacuum_vacuum_cost_delay
    autovacuum_vacuum_cost_limit
    autovacuum_vacuum_insert_threshold
    autovacuum_vacuum_insert_scale_factor
    autovacuum_vacuum_scale_factor
    autovacuum_vacuum_threshold
    vacuum_truncate (no equivalent postgresql.conf parameter)
    log_autovacuum_min_duration
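As a sketch, suppose a hypothetical table named queue_events churns quickly and deserves more aggressive vacuuming than the global defaults; the table name and the value shown here are illustrative only:

```sql
-- Vacuum queue_events once 5% of it has changed, instead of the 20% default
ALTER TABLE queue_events SET (autovacuum_vacuum_scale_factor = 0.05);

-- Revert to the global default later with RESET
ALTER TABLE queue_events RESET (autovacuum_vacuum_scale_factor);
```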

  6. The toast tables can be controlled with the following parameters. Note that these parameters are set on the main table and not on the toast table (which gives an error):

    toast.autovacuum_enabled
    toast.autovacuum_analyze_scale_factor
    toast.autovacuum_analyze_threshold
    toast.autovacuum_freeze_min_age
    toast.autovacuum_freeze_max_age
    toast.autovacuum_freeze_table_age
    toast.autovacuum_multixact_freeze_max_age
    toast.autovacuum_multixact_freeze_min_age
    toast.autovacuum_multixact_freeze_table_age
    toast.autovacuum_vacuum_cost_delay
    toast.autovacuum_vacuum_cost_limit
    toast.autovacuum_vacuum_insert_threshold
    toast.autovacuum_vacuum_insert_scale_factor
    toast.autovacuum_vacuum_scale_factor
    toast.autovacuum_vacuum_threshold
    toast.vacuum_truncate
    toast.log_autovacuum_min_duration

How it works…

If autovacuum is enabled, then it will wake up every autovacuum_naptime seconds (don't modify that) and decide whether to run VACUUM, ANALYZE, or both.

There will never be more than autovacuum_max_workers maintenance processes running at any time. As these autovacuum workers perform I/O, they accumulate cost points until they hit the autovacuum_vacuum_cost_limit value, after which they sleep for an autovacuum_vacuum_cost_delay period. This is designed to throttle the resource utilization of autovacuum so that it never uses all of the available disk I/O bandwidth. Increasing autovacuum_vacuum_cost_delay will slow down each VACUUM to reduce the impact on user activity, but the general advice is: don't do that. autovacuum will run ANALYZE when the number of inserted, updated, or deleted rows exceeds autovacuum_analyze_threshold plus autovacuum_analyze_scale_factor times the number of rows in the table.

autovacuum will run VACUUM when the number of updated or deleted rows exceeds autovacuum_vacuum_threshold plus autovacuum_vacuum_scale_factor times the number of rows in the table.
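The trigger condition above can be sketched as a simple inequality; the following Python snippet is illustrative only, using the default values of 50 and 0.2:

```python
def autovacuum_vacuum_due(dead_tuples, reltuples,
                          threshold=50, scale_factor=0.2):
    """VACUUM is due once dead tuples exceed
    autovacuum_vacuum_threshold
    + autovacuum_vacuum_scale_factor * reltuples."""
    return dead_tuples > threshold + scale_factor * reltuples

# A 1,000,000-row table needs more than 200,050 dead tuples:
print(autovacuum_vacuum_due(150_000, 1_000_000))  # False
print(autovacuum_vacuum_due(250_000, 1_000_000))  # True
```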

The autovacuum_* parameters only change vacuums and analyze operations that are executed by autovacuum. User-initiated VACUUM and ANALYZE commands are affected by vacuum_cost_delay and other vacuum_* parameters.

If you set log_autovacuum_min_duration, then any autovacuum process that runs for longer than this value will be logged to the server log, like so:

2019-04-19 01:33:55 BST (13130) LOG:  automatic vacuum of table "postgres.public.pgbench_accounts": index scans: 1
      pages: 0 removed, 3279 remain
      tuples: 100000 removed, 100000 remain
      system usage: CPU 0.19s/0.36u sec elapsed 19.01 sec
2019-04-19 01:33:59 BST (13130) LOG:  automatic analyze of table "postgres.public.pgbench_accounts"
      system usage: CPU 0.06s/0.18u sec elapsed 3.66 sec

Most of the preceding global parameters can also be set at the table level. For example, the default autovacuum_vacuum_cost_delay is 2 ms, but if you want big_table to be vacuumed more quickly, then you can set the following:

ALTER TABLE big_table SET (autovacuum_vacuum_cost_delay = 0);

It's also possible to set parameters for toast tables. A toast table is where oversized column values get placed, which the documentation refers to as supplementary storage tables. If there are no oversized values, then the toast table will occupy little space. Tables with very wide values often have large toast tables. The Oversized Attribute Storage Technique (TOAST) is optimized for UPDATE: if you have a heavily updated table, the toast table is often untouched, so it may make sense to turn off autovacuuming for the toast table, as follows:

ALTER TABLE pgbench_accounts
  SET (toast.autovacuum_enabled = off);

Note

Autovacuuming the toast table is performed completely separately from the main table, even though you can't ask for an explicit include or exclude of the toast table yourself when running VACUUM.

Use the following query to display reloptions for tables and their toast tables:

postgres=# SELECT n.nspname
, c.relname
, array_to_string(
    c.reloptions ||
    ARRAY(
      SELECT 'toast.' || x
      FROM unnest(tc.reloptions) AS x
    ), ', ') AS relopts
FROM pg_class c
LEFT JOIN pg_class tc ON c.reltoastrelid = tc.oid
JOIN pg_namespace n ON c.relnamespace = n.oid
WHERE c.relkind = 'r'
AND nspname NOT IN ('pg_catalog', 'information_schema');

An example of the output of this query is shown here:

 nspname |     relname      |                        relopts
---------+------------------+--------------------------------------------------------
 public  | pgbench_accounts | fillfactor=100, autovacuum_enabled=on,
         |                  | autovacuum_vacuum_cost_delay=20
 public  | pgbench_tellers  | fillfactor=100
 public  | pgbench_branches | fillfactor=100
 public  | pgbench_history  |
 public  | text_archive     | toast.autovacuum_enabled=off

Managing parameters for many different tables becomes difficult with tens, hundreds, or thousands of tables. We recommend using these per-table settings with caution, and only when you have good evidence that they are worthwhile; parameter settings that are left undocumented will cause problems later.

Note that when multiple workers are running, the autovacuum cost delay parameters are "balanced" among all the running workers, so that the total I/O impact on the system is the same regardless of the number of workers running. However, if you set the per-table storage parameters for autovacuum_vacuum_cost_delay or autovacuum_vacuum_cost_limit, then those tables are not considered in the balancing algorithm.

VACUUM allows insertions, updates, and deletions while it runs, but it prevents DDL commands such as ALTER TABLE and CREATE INDEX. autovacuum can detect whether a user has requested a conflicting lock on the table while it runs, and it will cancel itself if it is getting in the user's way. VACUUM doesn't cancel itself since we expect that the DBA would not want it to be canceled.

From PostgreSQL 13 onward, autovacuum can be triggered by insertions, so you may see more vacuum activity than before in some workloads, but this is likely to be a good thing and nothing to worry about.

Note that VACUUM does not shrink a table when it runs, unless there is a large run of empty space at the end of the table and nobody is accessing the table while we try to shrink it. If you want to prevent VACUUM from trying to shrink the table, you can turn this off with the following setting:

ALTER TABLE pgbench_accounts
  SET (vacuum_truncate = off);

To shrink a table properly, you'll need VACUUM FULL, but this locks up the whole table for a long time and should be avoided if possible. The VACUUM FULL command will rewrite every row of the table and completely rebuild all indexes. This process is faster than it used to be, though it still takes a long time for larger tables, as well as needing up to twice the current space for the sort and new copy of the table.

There's more…

The postgresql.conf file also allows include directives, which look as follows:

include 'autovacuum.conf'

These specify another file that will be read at that point, just as if those parameters had been included in the main file.

This can be used to maintain multiple sets of files for the autovacuum configuration. Let's say we have a website that is busy mainly during the daytime, with some occasional nighttime use. We decide to have two profiles – one for daytime, when we want less aggressive autovacuuming, and another for nighttime, where we can allow more aggressive vacuuming:

  1. You need to add the following lines to postgresql.conf:

    autovacuum = on
    autovacuum_max_workers = 3
    include 'autovacuum.conf'

  2. Remove all other autovacuum parameters.
  3. Then, create a file named autovacuum.conf.day that contains the following parameters:

    autovacuum_analyze_scale_factor = 0.1
    autovacuum_vacuum_cost_delay = 5
    autovacuum_vacuum_scale_factor = 0.2

  4. Then, create another file, named autovacuum.conf.night, that contains the following parameters:

    autovacuum_analyze_scale_factor = 0.05
    autovacuum_vacuum_cost_delay = 0
    autovacuum_vacuum_scale_factor = 0.1

  5. To swap profiles, simply do the following:

    $ ln -sf autovacuum.conf.night autovacuum.conf
    $ pg_ctl reload

The latter command reloads the server configuration, and it must be customized depending on your platform.

This allows us to switch profiles twice per day without needing to edit the configuration files. You can also easily tell which is the active profile simply by looking at the full details of the linked file (using ls -l). The exact details of the schedule are up to you. Night and day was just an example, which is unlikely to suit everybody.
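For instance, the profile swap could be scheduled with cron; the paths, times, and use of $PGDATA below are assumptions you would adapt to your installation:

```shell
# Hypothetical crontab entries: day profile at 08:00, night profile at 22:00
0 8  * * *  ln -sf autovacuum.conf.day   $PGDATA/autovacuum.conf && pg_ctl -D $PGDATA reload
0 22 * * *  ln -sf autovacuum.conf.night $PGDATA/autovacuum.conf && pg_ctl -D $PGDATA reload
```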

See also

The autovacuum_freeze_max_age parameter is explained in the next recipe, Avoiding auto-freezing and page corruptions, as are the more complex table-level parameters.

Avoiding auto-freezing and page corruptions

In the life cycle of a row, there are two routes that a row can take in PostgreSQL – a row version dies and needs to be removed by VACUUM, or a row version gets old enough and needs to be frozen, a task that is also performed by the VACUUM process. The removal of dead rows is easy to understand, while the second seems strange and surprising.

PostgreSQL uses internal transaction identifiers that are 4 bytes long, so we only have 2^32 transaction IDs (about four billion). PostgreSQL starts again from the beginning when that wraps around, allocating new identifiers in a circular manner. The reason we do this is that moving to an 8-byte identifier has various other negative effects and costs that we would rather not pay, so we keep the 4-byte transaction identifier. The impact is that we need to do regular sweeps of the entire database to mark old tuples as frozen, meaning they are visible to all users; that's why this procedure is known as freezing. Once frozen, tuples don't need to be touched again, though they can still be updated or deleted later if desired.
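The circular ordering of transaction IDs can be sketched in a few lines of Python; this mimics, but is not, PostgreSQL's internal C implementation:

```python
# Sketch of circular XID comparison modulo 2**32: xid a precedes b
# if the difference (a - b), interpreted as a signed 32-bit value,
# is negative.
def xid_precedes(a, b):
    diff = (a - b) & 0xFFFFFFFF
    return diff > 0x7FFFFFFF  # i.e., (a - b) < 0 as signed 32-bit

print(xid_precedes(100, 200))       # True: 100 is older
print(xid_precedes(2**32 - 10, 5))  # True: wrapped around, yet still older
```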

How to do it…

Why do we care? Suppose that we load a table with 100 million rows, and everything is fine. When those rows have been there long enough to begin being frozen, the next VACUUM operation on that table will rewrite all of them to freeze their transaction identifiers. Put another way, autovacuum will wake up and start using lots of I/O to perform the freezing.

The most obvious way to forestall this problem is to explicitly vacuum a table after a major load. Of course, that doesn't remove the problem entirely, because vacuuming doesn't freeze all the rows immediately and so some will remain for later vacuums.

The knee-jerk reaction for many people is to turn off autovacuum because it keeps waking up at the most inconvenient times. A better way, controlling when maintenance runs, is described in the Controlling automatic database maintenance recipe.

Freezing takes place when a transaction identifier on a row becomes more than vacuum_freeze_min_age transactions older than the current next value, measured in xid values, not time. Normal VACUUM operations will perform a small amount of freezing as you go, and in most cases, you won't notice that at all. As explained in the previous example, large transactions leave many rows with the same transaction identifiers, so those might cause problems when it comes to freezing.

The VACUUM command is normally optimized to only look at the chunks of a table that require cleaning, both for normal vacuum and freezing operations.

If you fiddle with the vacuum parameters to try to forestall heavy VACUUM operations, then you'll notice that the autovacuum_freeze_max_age parameter controls when the table will be scanned by a forced VACUUM command. To put this another way, you can't turn off the need to freeze rows, but you can defer it to a more convenient time. The mistake comes from deferring it completely and then finding that PostgreSQL executes an aggressive, uncancellable vacuum to remedy the lack of freezing. My advice is to control autovacuum, as we described in the previous recipe, or perform explicit VACUUM operations at a time of your choosing, rather than wait for the inevitable emergency freeze operation.

The VACUUM command is also an efficient way to confirm the absence of page corruptions, so it is worth scanning the whole database, block by block, from time to time. To do this, you can run the following command on each of your databases:

VACUUM (DISABLE_PAGE_SKIPPING);

You can do this table by table as well. There's nothing special about running whole-database VACUUM operations anymore; in earlier versions of PostgreSQL this was important, so you may still read on the web that it is a good idea.

You can focus on only the tables that most need freezing by using the vacuumdb utility with the --min-xid-age and --min-mxid-age options. When those options are set, vacuumdb only processes tables whose main table or toast table has a relfrozenxid (or relminmxid) older than the specified age, so if you choose the values carefully, it will skip tables that don't need freezing yet. (There are no corresponding options on the VACUUM command, unlike in most other cases.)
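For example, a command along these lines would freeze only tables whose oldest XID is at least 100 million transactions old; the threshold is an arbitrary illustration:

```shell
vacuumdb --all --freeze --min-xid-age=100000000
```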

If you've never had a corrupt block, then you may only need to scan every 2 to 3 months. If you start to get corrupt blocks, then you may want to increase the scan rate to confirm that everything is OK. Corrupt blocks are usually hardware induced, though they show up as database errors. It's possible but rare that the corruption was from a PostgreSQL bug instead.

There's no easy way to fix page corruption at present. There are, however, ways to investigate and extract data from corrupt blocks, for example, by using the pageinspect contrib utility that Simon wrote. You can also detect them automatically by creating the whole cluster using the following code:

initdb --data-checksums

This command initializes the data directory with data block checksums enabled. From then on, every time something changes in a block, PostgreSQL computes a new checksum and stores it in that same block, so that a simple check can detect corruption.
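You can confirm whether an existing cluster was initialized this way by querying the read-only setting:

```sql
SHOW data_checksums;  -- 'on' if the cluster was created with --data-checksums
```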

Removing issues that cause bloat

Bloat can be caused by long-running queries or long-running write transactions that execute alongside write-heavy workloads. Resolving that is mostly down to understanding the workloads that are running on the server.

Getting ready

Look at the age of the oldest snapshots that are running, like this:

postgres=# SELECT now() -
  CASE
    WHEN backend_xid IS NOT NULL THEN xact_start
    ELSE query_start
  END AS age
, pid
, backend_xid AS xid
, backend_xmin AS xmin
, state
FROM pg_stat_activity
WHERE backend_type = 'client backend'
ORDER BY 1 DESC;

       age       |  pid  |   xid    |   xmin   |        state
-----------------+-------+----------+----------+---------------------
 00:00:25.791098 | 27624 |          | 10671262 | active
 00:00:08.018103 | 27591 |          |          | idle in transaction
 00:00:00.002444 | 27630 | 10703641 | 10703639 | active
 00:00:00.001506 | 27631 | 10703642 | 10703640 | active
 00:00:00.000324 | 27632 | 10703643 | 10703641 | active
 00:00:00        | 27379 |          | 10703641 | active

The preceding example shows an update workload of three sessions, alongside one session waiting in an idle in transaction state and two other sessions that are only reading data.

How to do it…

If you have sessions stuck in the idle in transaction state, then you may want to consider setting the idle_in_transaction_session_timeout parameter so that such sessions are terminated once the timeout expires. The default is zero, meaning no timeout.
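A sketch of setting it cluster-wide; the 10-minute value is an arbitrary example you would tune to your workload:

```sql
ALTER SYSTEM SET idle_in_transaction_session_timeout = '10min';
SELECT pg_reload_conf();
```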

If not, try running shorter transactions or shorter queries.

If that is not an option, then consider setting old_snapshot_threshold. This parameter sets a time delay, after which dead rows are at risk of being removed. If a query attempts to read data that has been removed, then we cancel the query. All queries executing in less time than the old_snapshot_threshold parameter will be safe. This is a very similar concept to the way Hot Standby works (see Chapter 12, Replication and Upgrades).

How it works…

VACUUM cannot remove dead rows until they are invisible to all users. The earliest piece of data that's visible to a session is defined by its oldest snapshot's xmin value, or if that is not set, then by the backend's xid value.

There's more…

A session that is not running any query is in the idle state if it's outside of a transaction, or in the idle in transaction state if it's inside a transaction; that is, between a BEGIN and the corresponding COMMIT. Recall the Writing a script that either succeeds entirely or fails entirely recipe in Chapter 7, Database Administration, which was about how BEGIN and COMMIT can be used to wrap several commands into one transaction.

The reason to distinguish between these two states is that locks are released at the end of a transaction. Hence, an idle in transaction session is not currently doing anything, but it might be preventing other queries, including VACUUM, from accessing some tables.

Removing old prepared transactions

You may have been routed here from other recipes, so you might not even know what prepared transactions are, let alone what an old prepared transaction looks like.

The good news is that prepared transactions don't just happen at random; they happen in certain situations. If you don't know what I'm talking about, that's OK! You don't need to, and better still, you probably don't have any prepared transactions either.

Prepared transactions are part of the two-phase commit feature, also known as 2PC. A transaction commits in two stages rather than one, allowing multiple databases to have synchronized commits. Its typical use is to combine multiple so-called resource managers using the XA protocol, which is usually provided by a Transaction Manager (TM), as used by the Java Transaction API (JTA) and others. If none of this means anything to you, then you probably don't have any prepared transactions.

Getting ready

First, check the setting of max_prepared_transactions:

SHOW max_prepared_transactions;

If your setting is more than zero, check whether you have any prepared transactions. As an example, you may find something like the following:

postgres=# SELECT * FROM pg_prepared_xacts;
-[ RECORD 1 ]------------------------------
transaction | 459812
gid         | prep1
prepared    | 2017-04-11 13:21:51.912374+01
owner       | postgres
database    | postgres

Here, gid (the global identifier) will usually be automatically generated.
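If you've never seen one, a prepared transaction is created like this; the table name and gid are hypothetical, and this only works when max_prepared_transactions is greater than zero:

```sql
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
PREPARE TRANSACTION 'prep1';
-- The session detaches from the transaction, which persists (even across a
-- crash) and appears in pg_prepared_xacts until resolved from any session.
```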

How to do it…

Removing a prepared transaction is also referred to as resolving in-doubt transactions. The transaction is stuck between committing and aborting. The database or TM may have crashed, leaving the transaction midway through the two-phase commit process.

If you have a connection pool of 100 active connections and something crashes, you'll probably find 1 to 20 transactions stuck in the prepared state, depending on how long your average transaction is.

To resolve the transaction, we need to decide whether we want that change or not. The best way to do this is to check what happened externally to PostgreSQL. That should help you decide.

If you need further help, look at the There's more... section of this recipe.

If you wish to commit these changes, then use the following command:

COMMIT PREPARED 'prep1';

If you want to roll back these changes, then use the following command:

ROLLBACK PREPARED 'prep1';

How it works…

Prepared transactions are persistent across crashes, so you can't just do a fast restart to get rid of them. They have both an internal transaction identifier and an external global identifier. Either of these can be used to locate locked resources and help you decide how to resolve the transactions.

There's more…

If you're not sure what the prepared transaction did, you can go and look, but this is time-consuming. The pg_locks view shows locks that are held by prepared transactions. You can get a full report of what is being locked by using the following query:

postgres=# SELECT l.locktype, x.database, l.relation, l.page, l.tuple,
  l.classid, l.objid, l.objsubid, l.mode, x.transaction, x.gid, x.prepared, x.owner
FROM pg_locks l
JOIN pg_prepared_xacts x
  ON l.virtualtransaction = '-1/' || x.transaction::text;

The documentation mentions that you can join pg_locks to pg_prepared_xacts, but it doesn't mention that joining directly on the transaction ID only reveals the transaction lock itself, unless there are some row-level locks; the table locks are listed as being held by a virtual transaction. A simpler query is the following:

postgres=# SELECT DISTINCT x.database, l.relation
FROM pg_locks l
JOIN pg_prepared_xacts x
  ON l.virtualtransaction = '-1/' || x.transaction::text
WHERE l.locktype != 'transactionid';

 database | relation
----------+----------
 postgres |    16390
 postgres |    16401
(2 rows)

This tells you which relations in which databases have been touched by the remaining prepared transactions. We don't know their names because we would need to connect to those databases to check.

It is much harder to check which rows have been changed by a transaction until it is committed, and even then, deleted rows will be invisible.

Actions for heavy users of temporary tables

If you are a heavy user of temporary tables in your applications, then there are some additional actions that you may need to perform.

How to do it…

There are four main things to check, which are as follows:

  • Make sure you run VACUUM on system tables or enable autovacuum so that it will do this for you.
  • Monitor running queries to see how many temporary files are active and how large they are.
  • Tune the memory parameters. Think about increasing the temp_buffers parameter, but be careful not to over-allocate memory.
  • Separate the temp table's I/O. In a query-intensive system, you may find that reads/writes to temporary files exceed reads/writes on permanent data tables and indexes. In this case, you should create new tablespace(s) on separate disks, and ensure that the temp_tablespaces parameter is configured to use the additional tablespace(s).
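The last point might look like this in practice; the tablespace name and path are assumptions for illustration:

```sql
-- Put temporary objects and temp files on a separate fast disk
CREATE TABLESPACE temp_ts LOCATION '/fastdisk/pg_temp';
ALTER SYSTEM SET temp_tablespaces = 'temp_ts';
SELECT pg_reload_conf();
```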

How it works…

When we create a temporary table, we insert entries into the pg_class, pg_type, and pg_attribute catalog tables. These catalog tables and their indexes begin to grow and bloat – an issue that will be covered in later recipes. To control that growth, you can either vacuum those tables manually or let autovacuum do its work. You cannot run ALTER TABLE against system tables, so it is not possible to set specific autovacuum settings for any of these tables.

If you vacuum the system catalog tables manually, make sure that you get all of the system tables. You can get the full list of tables to vacuum and a list of their indexes by using the following query:

postgres=# SELECT relname, pg_relation_size(oid)
FROM pg_class
WHERE relkind IN ('i','r')
  AND relnamespace = 'pg_catalog'::regnamespace
ORDER BY 2 DESC;

This results in the following output:

             relname             | pg_relation_size
---------------------------------+------------------
 pg_proc                         |           450560
 pg_depend                       |           344064
 pg_attribute                    |           286720
 pg_depend_depender_index        |           204800
 pg_depend_reference_index       |           204800
 pg_proc_proname_args_nsp_index  |           180224
 pg_description                  |           172032
 pg_attribute_relid_attnam_index |           114688
 pg_operator                     |           106496
 pg_statistic                    |           106496
 pg_description_o_c_o_index      |            98304
 pg_attribute_relid_attnum_index |            81920
 pg_proc_oid_index               |            73728
 pg_rewrite                      |            73728
 pg_class                        |            57344
 pg_type                         |            57344
 pg_class_relname_nsp_index      |            40960
 ...(partial listing)

The preceding values are for a newly created database. These tables can become very large if they're not properly maintained, with values of 11 GB for one index being witnessed in one unlucky installation.

Identifying and fixing bloated tables and indexes

PostgreSQL implements Multiversion Concurrency Control (MVCC), which allows users to read data at the same time as writers make changes. This is an important feature for concurrency in database applications as it can allow the following:

  • Better performance because of fewer locks
  • Greatly reduced deadlocking
  • Simplified application design and management

Bloated tables and indexes are a natural consequence of the MVCC design in PostgreSQL. Bloat is caused mainly by updates, as we must retain both the old and new row versions for a certain period. Since these extra row versions are required to provide MVCC, some amount of bloat is normal and acceptable. Tuning to remove bloat completely isn't useful and is probably a waste of time.

Bloating results in increased disk consumption, as well as performance loss – if a table is twice as big as it should be, scanning it takes twice as long. VACUUM is one of the best ways of removing bloat.

Many users execute VACUUM far too frequently, while at the same time complaining about the cost of doing so. This recipe is all about understanding when you need to run VACUUM by estimating the amount of bloat in tables and indexes.

Getting ready

MVCC is a core part of PostgreSQL and cannot be turned off, nor would you want it to be. The internals of MVCC have some implications for the DBA that need to be understood: each row represents a row version, so it has two system columns – xmin and xmax – indicating the identifiers of the transactions that created and deleted that version, respectively. The xmax value is zero (the invalid transaction ID) if that version has not been deleted yet.

The general idea is that, instead of removing row versions, we alter their visibility by changing their xmin and/or xmax values. To be more precise, when a row is inserted, its xmin value is set to the "XID" or transaction ID of the creating transaction, while xmax is emptied; when a row is deleted, xmax is set to the number of the deleting transaction, without actually removing the row. An UPDATE operation is treated similarly to a DELETE operation, followed by INSERT; the deleted row represents the older version, and the row that's been inserted is the newer version. Finally, when rolling back a transaction, all of its changes are made invisible by marking that transaction ID as aborted.
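You can observe these system columns directly; here is a minimal sketch on a throwaway table (note that an undeleted row shows xmax as 0, the invalid transaction ID, rather than a SQL NULL):

```sql
CREATE TABLE mvcc_demo (x int);
INSERT INTO mvcc_demo VALUES (1);
SELECT xmin, xmax, x FROM mvcc_demo;  -- xmax is 0 while the row is live
UPDATE mvcc_demo SET x = 2;
SELECT xmin, xmax, x FROM mvcc_demo;  -- a new row version with a fresh xmin
```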

In this way, we get faster DELETE, UPDATE, and ROLLBACK statements, but the price of these benefits is that the SQL UPDATE command can cause tables and indexes to grow in size because it leaves behind dead row versions. Rows removed by DELETE and rows inserted by aborted transactions also take up space, which must be reclaimed by garbage collection. VACUUM is the command we use to reclaim space in a batch operation, though there is another internal feature named Heap-Only Tuples (HOT), which allows us to clean data blocks one at a time as we scan them, where possible. HOT also reduces index bloat, since not all updates require index maintenance.
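The xmin/xmax bookkeeping can be illustrated with a toy visibility check – plain Python, not PostgreSQL internals, with all names invented for illustration:

```python
# Toy model of MVCC row versions: each version carries xmin (the
# creating transaction's ID) and xmax (the deleting transaction's ID,
# or 0 if the version has not been deleted).

def visible(version, snapshot_xid, committed):
    """Crude visibility check: the version was created by a committed
    transaction no later than the snapshot, and either never deleted
    or deleted by a transaction the snapshot cannot yet see."""
    xmin, xmax = version["xmin"], version["xmax"]
    created = xmin in committed and xmin <= snapshot_xid
    deleted = xmax != 0 and xmax in committed and xmax <= snapshot_xid
    return created and not deleted

# An UPDATE is a DELETE of the old version plus an INSERT of a new one.
rows = [
    {"xmin": 100, "xmax": 105, "value": "old"},   # older version
    {"xmin": 105, "xmax": 0,   "value": "new"},   # newer version
]
committed = {100, 105}

# A snapshot taken before transaction 105 sees the old version...
assert [r["value"] for r in rows if visible(r, 104, committed)] == ["old"]
# ...while a later snapshot sees only the new one.
assert [r["value"] for r in rows if visible(r, 110, committed)] == ["new"]
```

A real snapshot test is considerably more subtle (in-progress transactions, subtransactions, hint bits), but the shape is the same: old versions stay in place and simply become invisible.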

How to do it…

The best way to understand this is to look at things the same way that autovacuum does, by using a view that's been created with the following query:

CREATE OR REPLACE VIEW av_needed AS

SELECT N.nspname, C.relname

, pg_stat_get_tuples_inserted(C.oid) AS n_tup_ins

, pg_stat_get_tuples_updated(C.oid) AS n_tup_upd

, pg_stat_get_tuples_deleted(C.oid) AS n_tup_del

, CASE WHEN pg_stat_get_tuples_updated(C.oid) > 0

       THEN pg_stat_get_tuples_hot_updated(C.oid)::real

          / pg_stat_get_tuples_updated(C.oid)

       END

  AS HOT_update_ratio

, pg_stat_get_live_tuples(C.oid) AS n_live_tup

, pg_stat_get_dead_tuples(C.oid) AS n_dead_tup

, C.reltuples AS reltuples

, round(COALESCE(threshold.custom, current_setting('autovacuum_vacuum_threshold'))::integer

        + COALESCE(scale_factor.custom, current_setting('autovacuum_vacuum_scale_factor'))::numeric

       * C.reltuples)

  AS av_threshold

, date_trunc('minute',

    greatest(pg_stat_get_last_vacuum_time(C.oid),

             pg_stat_get_last_autovacuum_time(C.oid)))

  AS last_vacuum

, date_trunc('minute',

    greatest(pg_stat_get_last_analyze_time(C.oid),

             pg_stat_get_last_autoanalyze_time(C.oid)))

  AS last_analyze

, pg_stat_get_dead_tuples(C.oid) >

  round( current_setting('autovacuum_vacuum_threshold')::integer

       + current_setting('autovacuum_vacuum_scale_factor')::numeric

       * C.reltuples)

  AS av_needed

, CASE WHEN reltuples > 0

       THEN round(100.0 * pg_stat_get_dead_tuples(C.oid) / reltuples)

       ELSE 0 END

  AS pct_dead

FROM pg_class C

LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)

NATURAL LEFT JOIN LATERAL (

   SELECT (regexp_match(unnest,'^[^=]+=(.+)$'))[1]

   FROM unnest(reloptions)

   WHERE unnest ~ '^autovacuum_vacuum_threshold='

) AS threshold(custom)

NATURAL LEFT JOIN LATERAL (

   SELECT (regexp_match(unnest,'^[^=]+=(.+)$'))[1]

   FROM unnest(reloptions)

   WHERE unnest ~ '^autovacuum_vacuum_scale_factor='

) AS scale_factor(custom)

WHERE C.relkind IN ('r', 't', 'm')

  AND N.nspname NOT IN ('pg_catalog', 'information_schema')

  AND N.nspname NOT LIKE 'pg_toast%'

ORDER BY av_needed DESC, n_dead_tup DESC;

We can then use this to look at individual tables, as follows:

postgres=# \x

postgres=# SELECT * FROM av_needed WHERE nspname = 'public' AND relname = 'pgbench_accounts';

We will get the following output:

-[ RECORD 1 ]----+------------------------

nspname          | public

relname          | pgbench_accounts

n_tup_ins        | 100001

n_tup_upd        | 117201

n_tup_del        | 1

hot_update_ratio | 0.123454578032611

n_live_tup       | 100000

n_dead_tup       | 0

reltuples        | 100000

av_threshold     | 20050

last_vacuum      | 2010-04-29 01:33:00+01

last_analyze     | 2010-04-28 15:21:00+01

av_needed        | f

pct_dead         | 0

How it works…

We can compare the number of dead row versions, shown as n_dead_tup, against the required threshold, av_threshold.

The preceding query doesn't take into account table-specific autovacuum thresholds. It could do so if you need it, but the main purpose of the query is to give us information to understand what is happening, and then set the parameters accordingly – not the other way around.
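The condition that the view's av_needed and av_threshold columns reproduce is simple arithmetic; here is a Python sketch using the default settings (autovacuum_vacuum_threshold = 50, autovacuum_vacuum_scale_factor = 0.2):

```python
# Autovacuum triggers a VACUUM on a table roughly when:
#   n_dead_tup > autovacuum_vacuum_threshold
#               + autovacuum_vacuum_scale_factor * reltuples
# (table-level reloptions override the global settings).

def av_threshold(reltuples, threshold=50, scale_factor=0.2):
    # Defaults match autovacuum_vacuum_threshold = 50 and
    # autovacuum_vacuum_scale_factor = 0.2.
    return round(threshold + scale_factor * reltuples)

def av_needed(n_dead_tup, reltuples, **overrides):
    return n_dead_tup > av_threshold(reltuples, **overrides)

# The pgbench_accounts example: 100,000 rows gives a threshold of
# 20,050 (matching the av_threshold column in the output above),
# so with 0 dead tuples no vacuum is needed yet.
print(av_threshold(100_000))          # 20050
print(av_needed(0, 100_000))          # False
print(av_needed(25_000, 100_000))     # True
```

Note how the scale factor dominates on large tables: at 100 million rows, the fixed threshold of 50 is negligible next to the 20 million dead rows the scale factor allows.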

Notice that the table query shows insertions, updates, and deletions so that you can understand your workload better. There is also something named hot_update_ratio. This shows the fraction of updates that take advantage of the HOT feature, which allows a table to self-vacuum as the table changes. If that ratio is high, then you may avoid VACUUM activities altogether or at least for long periods. If the ratio is low, then you will need to execute VACUUM commands or autovacuum more frequently. Note that the ratio never reaches 1.0, so if you have it above 0.95, then that is very good and you need not think about it further.

HOT updates take place when the UPDATE statement does not change any of the column values that are indexed by any index, and there is enough free space in the disk page where the updated row is located. If you change even one column that is indexed by just one index, then it will be a non-HOT update, and there will be a performance hit. So, carefully selecting indexes can improve update performance and reduce the need for maintenance. Also, if HOT updates do occur, though not often enough for your liking, you might want to try to decrease the fillfactor storage parameter for the table to make more space for them. Remember that this will only be important on your most active tables. Seldom touched tables don't need much tuning.
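The ratio itself is trivial to compute from the counters; here is a Python sketch (the HOT update count of 14,469 is back-derived from the ratio shown in the example output, so treat it as illustrative):

```python
def hot_update_ratio(n_tup_upd, n_tup_hot_upd):
    """Fraction of updates that avoided index maintenance (HOT)."""
    return n_tup_hot_upd / n_tup_upd if n_tup_upd else None

# From the pgbench_accounts output above: 117,201 updates, of which
# roughly 14,469 were HOT, giving the ratio shown (~0.123).
ratio = hot_update_ratio(117_201, 14_469)
print(round(ratio, 3))   # 0.123

# A low ratio suggests lowering fillfactor (leaving free space in each
# page for HOT updates) or reviewing indexes on frequently updated
# columns; per the rule of thumb above, anything over 0.95 is very good.
if ratio is not None and ratio < 0.95:
    print("consider tuning fillfactor or index choices")
```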

To recap, non-HOT updates cause indexes to bloat. The following query is useful in investigating the index size and how it changes over time. It runs fairly quickly and can be used to monitor whether your indexes are changing in size over time:

SELECT nspname, relname,
  round(100 * pg_relation_size(indexrelid) /
              pg_relation_size(indrelid)) / 100
        AS index_ratio,
  pg_size_pretty(pg_relation_size(indexrelid))
        AS index_size,
  pg_size_pretty(pg_relation_size(indrelid))
        AS table_size
FROM pg_index I
LEFT JOIN pg_class C ON (C.oid = I.indexrelid)
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE
  nspname NOT IN ('pg_catalog', 'information_schema', 'pg_toast') AND
  C.relkind = 'i' AND
  pg_relation_size(indrelid) > 0;

Another route is to use the pgstattuple contrib extension, which provides very detailed statistics on tables and indexes:

CREATE EXTENSION pgstattuple;

You can scan tables using pgstattuple(), as follows:

test=> SELECT * FROM pgstattuple('pg_catalog.pg_proc');

The output will look as follows:

-[ RECORD 1 ]------+-------

table_len          | 458752

tuple_count        | 1470

tuple_len          | 438896

tuple_percent      | 95.67

dead_tuple_count   | 11

dead_tuple_len     | 3157

dead_tuple_percent | 0.69

free_space         | 8932

free_percent       | 1.95

The downside of pgstattuple is that it derives exact statistics by scanning the whole table and counting everything. If you have time to scan the table, you may as well vacuum the whole table anyway. So, a better idea is to use pgstattuple_approx(), which is much, much faster, and yet is still fairly accurate. It works by accessing the table's visibility map first and then only scanning the pages that need VACUUM, so I recommend that you use it in all cases for checking tables (there is no equivalent for indexes since they don't have a visibility map):

postgres=# select * from pgstattuple_approx('pgbench_accounts');

-[ RECORD 1 ]--------+-----------------

table_len            | 268591104

scanned_percent      | 0

approx_tuple_count   | 1001738

approx_tuple_len     | 137442656

approx_tuple_percent | 51.1717082037088

dead_tuple_count     | 0

dead_tuple_len       | 0

dead_tuple_percent   | 0

approx_free_space    | 131148448

approx_free_percent  | 48.8282917962912

You can also scan indexes using pgstatindex(), as follows:

postgres=> SELECT * FROM pgstatindex('pg_cast_oid_index');

-[ RECORD 1 ]------+------

version            | 2

tree_level         | 0

index_size         | 8192

root_block_no      | 1

internal_pages     | 0

leaf_pages         | 1

empty_pages        | 0

deleted_pages      | 0

avg_leaf_density   | 50.27

leaf_fragmentation | 0

There's more…

You may want to set up monitoring for the bloated tables and indexes. Look at the Nagios plugin called check_postgres_bloat, which is a part of the check_postgres plugins.

It provides some flexible options to assess bloat. Unfortunately, it's not that well documented, but if you've read this, it should make sense. You'll need to play with it to get the thresholding correct anyway, so that shouldn't be a problem.

Also, note that the only way to know for certain the exact bloat of a table or index is to scan the whole relation. Anything else is just an estimate and may lead to you running maintenance either too early or too late.

Monitoring and tuning a vacuum

This recipe covers both the VACUUM command and autovacuum, which I refer to collectively as vacuums (non-capitalized).

If you're currently waiting for a long-running vacuum (or autovacuum) to finish, go straight to the How to do it... section.

If you've just had a long-running vacuum complete, then you may want to think about setting a few parameters for next time, so read the How it works… section.

Getting ready

Let's watch what happens when we run a large VACUUM. Don't run VACUUM FULL, because it runs for a long time while holding an AccessExclusiveLock on the table. Ouch.

First, locate which process is running this VACUUM by using the pg_stat_activity view to identify the specific pid (34399 is just an example).

How to do it…

Repeatedly execute the following query to see the progress of the VACUUM command, specifying the pid of the process you wish to monitor:

postgres=# SELECT * FROM pg_stat_progress_vacuum WHERE pid = 34399;

The next section explains what this all means.

How it works…

VACUUM works in various phases:

  1. The first phase is initializing, but it is over so quickly that you'll never see it.
  2. The first main phase is scanning heap, which performs about 90% of the cleanup of data blocks in the heap. The heap_blks_scanned column will increase from 0 up to the value of heap_blks_total. The number of blocks that have been vacuumed is shown as heap_blks_vacuumed, and the resulting rows to be removed are shown as num_dead_tuples. During this phase, by default, VACUUM will skip blocks that are currently pinned by other users – the DISABLE_PAGE_SKIPPING option controls that behavior. If num_dead_tuples reaches max_dead_tuples, then we move straight to the next phase, though we will return later to continue scanning:

    pid                | 34399

    datid              | 12515

    datname            | postgres

    relid              | 16422

    phase              | scanning heap

    heap_blks_total    | 32787

    heap_blks_scanned  | 25207

    heap_blks_vacuumed | 0

    index_vacuum_count | 0

    max_dead_tuples    | 9541017

    num_dead_tuples    | 537600

  3. After this, we switch to the second main phase, where we start vacuuming indexes. Recent releases can skip the index scans entirely when there are few dead tuples, so you may find that vacuuming is faster. You can control whether indexes are vacuumed by setting the vacuum_cleanup_index_scale_factor parameter, which can also be set at the table level if needed, though the default value seems good.

While this phase is happening, the progress data doesn't change until it has vacuumed all of the indexes. This phase can take a long time; more indexes increase the time that is required unless you specify parallelism (more on this later). After this phase, we increment index_vacuum_count. Note that this does not refer to the number of indexes on the table, only how many times we have scanned all the indexes:

pid                | 34399

datid              | 12515

datname            | postgres

relid              | 16422

phase              | vacuuming indexes

heap_blks_total    | 32787

heap_blks_scanned  | 32787

heap_blks_vacuumed | 0

index_vacuum_count | 0

max_dead_tuples    | 9541017

num_dead_tuples    | 999966

  4. Once the indexes have been vacuumed, we move on to the third main phase, where we return to vacuuming the heap. In this phase, we scan through the heap, skipping any blocks that did not contain dead tuples, and completely remove the old tuple item pointers.
  5. If num_dead_tuples reaches the limit of max_dead_tuples, then we repeat the scanning heap, vacuuming indexes, and vacuuming heap phases until the whole table has been scanned. Each iteration will further increment index_vacuum_count. The value of max_dead_tuples is controlled by the setting of maintenance_work_mem: PostgreSQL needs 6 bytes of memory for each dead row pointer, so it's a good idea to set maintenance_work_mem high enough to avoid multiple iterations, since these can take lots of extra time:

    pid                | 34399

    datid              | 12515

    datname            | postgres

    relid              | 16422

    phase              | vacuuming heap

    heap_blks_total    | 32787

    heap_blks_scanned  | 32787

    heap_blks_vacuumed | 25051

    index_vacuum_count | 1

    max_dead_tuples    | 9541017

    num_dead_tuples    | 999966

  6. If the indexes were vacuumed, we then clean up the indexes, which is a short phase where various pieces of metadata are updated.
  7. If there are many empty blocks at the end of the table, VACUUM will attempt to take an AccessExclusiveLock on the table. Once that is acquired, it will truncate the end of the table, showing a phase of truncating the heap. Truncation does not occur every time, because PostgreSQL will only attempt it if the gain is significant and there is no conflicting lock; when it does occur, it can last a long time, because VACUUM reads backward from the end of the table to find the truncation point. (Note that AccessExclusiveLock is passed through to physical replication standby servers and can cause replication conflicts, so you may wish to avoid truncation by using the TRUNCATE off option. You can also set the vacuum_truncate storage parameter to off on a table to ensure that autovacuum doesn't attempt truncation. However, there is no function to specifically request truncation of a table as an individual action.)
  8. Once a table has been vacuumed, we vacuum its TOAST table by default. This behavior is controlled by the PROCESS_TOAST option. It isn't shown as a separate phase in the progress view; vacuuming the TOAST table appears as a separate vacuum.

To make VACUUM run in minimal time, maintenance_work_mem should be set to anything up to 1 GB, according to how much memory you can allocate to this task at this time. This will minimize the number of times indexes are scanned. If you avoid running vacuums, then more dead rows will be collected when it runs, which may cause an overflow of max_dead_tuples, thus causing the vacuum to take longer to run.
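The relationship between maintenance_work_mem and the number of index-scan passes can be sketched in Python, using the 6 bytes per dead row pointer mentioned above (a simplification of the real accounting):

```python
import math

BYTES_PER_DEAD_TUPLE = 6  # size of one dead row pointer in the vacuum workspace

def max_dead_tuples(maintenance_work_mem_mb):
    """How many dead row pointers fit in the vacuum workspace."""
    return maintenance_work_mem_mb * 1024 * 1024 // BYTES_PER_DEAD_TUPLE

def index_scan_passes(dead_tuples, maintenance_work_mem_mb):
    """Each time the dead-tuple array fills up, every index on the
    table must be scanned once, so fewer fills means less work."""
    return max(1, math.ceil(dead_tuples / max_dead_tuples(maintenance_work_mem_mb)))

# With the default 64 MB, about 11.1 million dead row pointers fit,
# so 50 million dead rows force 5 passes over every index...
print(index_scan_passes(50_000_000, 64))    # 5
# ...while 1 GB holds them all, allowing a single pass.
print(index_scan_passes(50_000_000, 1024))  # 1
```

This is why letting dead rows pile up between vacuums is doubly expensive: the heap scan grows, and the index phases may have to repeat.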

Using the INDEX_CLEANUP OFF option allows you to request that steps after "scanning heap" will be skipped, which will then make VACUUM go much faster. This is not an option with autovacuum.

If your filesystem supports it, you may also be able to set maintenance_io_concurrency to an optimal value for running ANALYZE and VACUUM.

VACUUM can be blocked while waiting for table-level locks by other DDL statements such as a long-running ALTER TABLE or CREATE INDEX. If that happens, the lock waits are not shown in the progress view, so you may also want to look in the pg_stat_activity or pg_locks views. You can request that locked tables be skipped with the SKIP_LOCKED option.

You can request multiple options for a VACUUM command, as shown in these examples, both of which do the same thing:

VACUUM (DISABLE_PAGE_SKIPPING, SKIP_LOCKED, VERBOSE) my_table;

VACUUM (DISABLE_PAGE_SKIPPING ON, SKIP_LOCKED ON, VERBOSE ON, ANALYZE OFF) my_table;

There's more…

VACUUM doesn't run in parallel on a single table. However, if you have more than one index on a table, the index scanning phases can be conducted in parallel, if specifically requested by the user – autovacuum never does this. To use this feature, add the PARALLEL option and specify the number of workers, which will be limited to the number of indexes, the value of max_parallel_maintenance_workers, and whether we exceed min_parallel_index_scan_size.

If you want to run multiple VACUUMs at once, you can do this by, for example, running four vacuums, each job with up to two parallel workers to scan indexes, scanning all databases:

$ vacuumdb --jobs=4 --parallel=2 --all

If you run multiple VACUUM commands at once, you'll use more memory and I/O, so be careful.

Vacuums can be slowed down by raising vacuum_cost_delay or lowering vacuum_cost_limit. Setting vacuum_cost_delay too high is counterproductive. VACUUM is your friend, not your enemy, so delaying it until it doesn't happen at all just makes things worse. Be careful.

Each vacuum sleeps when the work it has performed takes it over its limit, so the processes running VACUUM do not all sleep at the same time.

VACUUM commands use the value of vacuum_cost_limit as their limit.

For autovacuum workers, their limit is a share of the total autovacuum_vacuum_cost_limit, so the total amount of work that's done is the same no matter what the setting of autovacuum_max_workers.
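A minimal Python sketch of that sharing (illustrative only):

```python
def worker_cost_limit(autovacuum_vacuum_cost_limit, active_workers):
    """Each active autovacuum worker gets an equal share of the cost
    budget, so adding workers does not add total vacuuming throughput -
    it only spreads the same budget across more tables at once."""
    return autovacuum_vacuum_cost_limit / active_workers

# autovacuum_vacuum_cost_limit defaults to -1, which means it falls
# back to vacuum_cost_limit (200). With three workers active, each
# gets about 67 cost units; with one, it gets the whole 200.
print(round(worker_cost_limit(200, 3)))   # 67
print(worker_cost_limit(200, 1))          # 200.0
```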

autovacuum_max_workers should always be set to more than 2 to ensure that all the tables can begin vacuuming when they need it. Setting it too high may not be very useful, so you need to be careful.

If you need to change the settings to slow down or speed up a running process, then vacuums will pick up any new default settings when you reload the postgresql.conf file.

If you do choose to run VACUUM FULL, the progress for that is available in PostgreSQL 12+ via the pg_stat_progress_cluster catalog view, which also covers the CLUSTER command. Note that you can have multiple jobs running VACUUM FULL, but you should not specify parallel workers when using FULL to avoid deadlocks.

PostgreSQL 13+ allows ANALYZE progress reporting via pg_stat_progress_analyze. ANALYZE ignores any parallel workers that have been set.

Maintaining indexes

Just as tables can become bloated, so can indexes. However, reusing space in indexes is much less effective. In the Identifying and fixing bloated tables and indexes recipe, you saw that non-HOT updates can cause bloated indexes. Non-primary key indexes are also prone to some bloat from normal INSERT commands, as is common in most relational databases. Indexes can become a problem in many database applications that involve a high proportion of INSERT and DELETE commands.

autovacuum does not detect bloated indexes, nor does it do anything to rebuild indexes. Therefore, we need to look at other ways to maintain indexes.

Getting ready

PostgreSQL supports commands that will rebuild indexes for you. The client utility, reindexdb, allows you to execute the REINDEX command conveniently from the operating system:

$ reindexdb

This executes the SQL REINDEX command on every table in the default database. If you want to reindex all your databases, then use the following command:

$ reindexdb -a

That's what the manual says, anyway. My experience is that many indexes don't need rebuilding, so you should probably be more selective of what you rebuild.

Also, REINDEX puts a full table lock (AccessExclusiveLock) on the table while it runs, preventing even SELECTs against the table. You don't want to run that on your whole database!

So, I recommend that you rebuild individual indexes or all the indexes on one table at a time.

Try these steps instead:

  1. First, let's create a test table with two indexes – a primary key and an additional index – as follows:

    DROP TABLE IF EXISTS test;
    CREATE TABLE test
    (id INTEGER PRIMARY KEY
    ,category TEXT
    ,value TEXT);
    CREATE INDEX ON test (category);

  2. Now, let's look at the internal identifier of the tables, oid, and the current file number (relfilenodes), as follows:

    SELECT oid, relname, relfilenode

    FROM pg_class

    WHERE oid in (SELECT indexrelid

                          FROM pg_index

                          WHERE indrelid = 'test'::regclass);

              oid  |      relname      | relfilenode

            -------+-------------------+-------------

             16639 | test_pkey         |       16639

             16641 | test_category_idx |       16641

            (2 rows)

How to do it…

PostgreSQL supports a command known as REINDEX CONCURRENTLY, which rebuilds an index without taking a painful AccessExclusiveLock:

REINDEX INDEX CONCURRENTLY test_category_idx;

When we check our internal identifiers again, we get the following:

SELECT oid, relname, relfilenode

FROM pg_class

WHERE oid in (SELECT indexrelid

                FROM pg_index

                WHERE indrelid = 'test'::regclass);

  oid  |      relname      | relfilenode

-------+-------------------+-------------

16639 | test_pkey         |       16639

16642 | test_category_idx |       16642

(2 rows)

Here, we can see that test_category_idx is now a completely new index.

This seems pretty good, and it works on primary keys too.

If you do choose to use the reindexdb tool, make sure that you use these options to reindex one table at a time, concurrently, with some useful output:

$ reindexdb --concurrently -t test --verbose

INFO:  index "public.test_category_idx" was reindexed

INFO:  index "public.test_pkey" was reindexed

INFO:  index "pg_toast.pg_toast_16414_index" was reindexed

INFO:  table "public.test" was reindexed

DETAIL:  CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.02 s.

How it works…

The REINDEX INDEX CONCURRENTLY statement allows the INSERT, UPDATE, and DELETE commands to be used while the index is being rebuilt. It cannot be executed inside a transaction block, and only one index per table can be rebuilt concurrently at any time.

If you perform REINDEX TABLE CONCURRENTLY, then each index will be recreated one after the other. However, each index can be built in parallel, as discussed shortly.

REINDEX will also work on partitioned tables, from PostgreSQL 14+.

You can also now use REINDEX to move indexes to a different tablespace while rebuilding them, via the TABLESPACE option.

Also new in PostgreSQL 14+ is the ability to use VACUUM to ignore long-running transactions that execute REINDEX on other tables, making it even more practical to use on production database servers.

There's more…

CREATE INDEX and REINDEX for B-tree indexes can be run in parallel from PostgreSQL 11+. The amount of parallelism is directly controlled by the setting of a table's parallel_workers storage parameter. Be careful, since setting this at the table level affects all queries, not just the index build/rebuild. If the table-level parameter is not set, then the maintenance_work_mem and max_parallel_maintenance_workers parameters will determine how many workers are used; the defaults are 64 MB for maintenance_work_mem and 2 for max_parallel_maintenance_workers. Increase both to get further gains in performance and/or concurrency. Note that these workers are shared across all users, so be careful not to over-allocate jobs; otherwise, there won't be enough workers to let everybody run in parallel.

If you are fairly new to database systems, you may think that rebuilding indexes for performance is something that only PostgreSQL needs to do. Other DBMSes require this as well – they just don't say so.

Indexes are designed for performance and, in all databases, deleting index entries causes contention and loss of performance. PostgreSQL does not remove index entries for a row when that row is deleted, so an index can be filled with dead entries. PostgreSQL attempts to remove dead entries when a block becomes full, but that doesn't stop a small number of dead entries from accumulating in many data blocks.

Finding unused indexes

Selecting the correct set of indexes for a workload is known to be a hard problem. It usually involves trial and error by developers and DBAs to get a good mix of indexes.

Tools for identifying slow queries exist and many SELECT statements can be improved by adding an index.

What many people forget is to check whether the mix of indexes remains valuable over time, which is something for the DBA to investigate and optimize.

How to do it…

PostgreSQL keeps track of each access against an index. We can view that information and use it to see whether an index is unused, as follows:

postgres=# SELECT schemaname, relname, indexrelname, idx_scan

FROM pg_stat_user_indexes ORDER BY idx_scan;

 schemaname |     relname      |       indexrelname       | idx_scan
------------+------------------+--------------------------+----------
 public     | pgbench_accounts | pgbench_accounts_bid_idx |        0
 public     | pgbench_branches | pgbench_branches_pkey    |    14575
 public     | pgbench_tellers  | pgbench_tellers_pkey     |    15350
 public     | pgbench_accounts | pgbench_accounts_pkey    |   114400
(4 rows)

As shown in the preceding code, there is one unused index, alongside others that have some usage. You now need to decide whether unused means that you should remove the index. That is a more complex question, so we need to explain how it works.

How it works…

The PostgreSQL statistics accumulate various pieces of useful information. These statistics can be reset to zero using an administrator function. Also, as the data accumulates over time, we usually find that objects that have been there for longer periods have higher apparent usage. So, if we see a low number for idx_scan, then it may be that the index was newly created (as was the case in my preceding demonstration), or that the index is only used by a part of the application that runs only at certain times of the day, week, month, and so on.

Another important consideration is that the index may be a unique constraint index that exists specifically to safeguard against duplicate INSERT commands. An INSERT operation does not show up as idx_scan, even if the index was used while checking the uniqueness of the newly inserted values, whereas UPDATE or DELETE may show up because they have to locate the row first. So, a table that only has INSERT commands against it will appear to have unused indexes.

Here is an updated version of the preceding query, which excludes unique constraint indexes:

SELECT schemaname

, relname

, indexrelname

, idx_scan

FROM pg_stat_user_indexes i

LEFT JOIN pg_constraint c

   ON i.indexrelid = c.conindid

WHERE c.contype IS NULL

ORDER BY idx_scan DESC;

Also, some indexes that show usage might be showing historical usage, and there is no further usage. Alternatively, it might be the case that some queries use an index where they could just as easily and almost as cheaply use an alternative index. Those things are for you to explore and understand before you take action. A very common approach is to regularly monitor such numbers to gain knowledge by examining their evolution over time, both on the master database and any replicated Hot Standby nodes.
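A simple way to act on this advice is to diff two snapshots of idx_scan taken some time apart; here is a Python sketch with invented snapshot data:

```python
def unused_since(before, after):
    """Return the names of indexes whose idx_scan counter did not move
    between two snapshots taken from pg_stat_user_indexes."""
    return sorted(name for name, scans in after.items()
                  if scans == before.get(name, 0))

# Hypothetical snapshots, e.g. captured by a cron job into a table.
snapshot_monday = {"pgbench_accounts_bid_idx": 0,
                   "pgbench_accounts_pkey": 114_400}
snapshot_friday = {"pgbench_accounts_bid_idx": 0,
                   "pgbench_accounts_pkey": 131_255}

print(unused_since(snapshot_monday, snapshot_friday))
# ['pgbench_accounts_bid_idx']
```

Comparing deltas rather than absolute counts avoids the trap described above, where long-lived indexes accumulate high historical numbers that no longer reflect current usage.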

In the end, you may decide that you want to remove an index. If only there was a way to try removing an index and then put it back again quickly, in case you cause problems! Rebuilding an index may take hours on a big table, so these decisions can be a little scary. No worries! Just follow the next recipe, Carefully removing unwanted indexes.

Carefully removing unwanted indexes

Carefully removing? Do you mean pressing Enter gently after typing DROP INDEX? Err, no!

The reasoning is that it takes a long time to build an index and a short time to drop it.

What we want is a way of removing an index so that if we discover that removing it was a mistake, we can put the index back again quickly.

Getting ready

The following query will list all invalid indexes, if any:

SELECT ir.relname AS indexname

, it.relname AS tablename

, n.nspname AS schemaname

FROM pg_index i

JOIN pg_class ir ON ir.oid = i.indexrelid

JOIN pg_class it ON it.oid = i.indrelid

JOIN pg_namespace n ON n.oid = it.relnamespace

WHERE NOT i.indisvalid;

Take note of these indexes so that, later, you can tell whether a given index is invalid because we marked it so during this recipe (in which case it can safely be marked as valid again) or because it was already invalid for other reasons.

How to do it…

Here, we will describe a procedure that allows us to deactivate an index without actually dropping it so that we can appreciate what its contribution was and possibly reactivate it:

  1. First, create the following function:

    CREATE OR REPLACE FUNCTION trial_drop_index(iname TEXT)
    RETURNS VOID
    LANGUAGE SQL AS $$
    UPDATE pg_index
    SET indisvalid = false
    WHERE indexrelid = $1::regclass;
    $$;

  2. Then run it to perform a trial of dropping the index.
  3. If you experience performance issues after dropping the index, then use the following function to undrop the index:

    CREATE OR REPLACE FUNCTION trial_undrop_index(iname TEXT)
    RETURNS VOID
    LANGUAGE SQL AS $$
    UPDATE pg_index
    SET indisvalid = true
    WHERE indexrelid = $1::regclass;
    $$;

    Note

    Be careful to avoid undropping any index that was detected by the query in the Getting ready section; if it wasn't marked as invalid when applying this recipe, then it may be unusable because it isn't valid.

How it works…

This recipe also uses some inside knowledge. When we create an index using CREATE INDEX CONCURRENTLY, it is a two-stage process. The first phase builds the index and then marks it as invalid. The INSERT, UPDATE, and DELETE statements now begin maintaining the index, but we perform a further pass over the table to see whether we missed anything, before declaring the index valid. User queries don't use the index until it is marked valid.

Once the index has been built and the valid flag has been set, if we set the flag to invalid, the index will still be maintained. It's just that it will not be used by queries. This allows us to turn the index off quickly, though with the option to turn it on again if we realize that we do need the index after all. This makes it practical to test whether dropping the index will alter the performance of any of your most important queries.

Planning maintenance

Monitoring systems are not a substitute for good planning. They alert you to unplanned situations that need attention. The more unplanned things you respond to, the greater the chance that you will need to respond to multiple emergencies at once. And when that happens, something will break. Ultimately, that is your fault. If you wish to take your responsibilities seriously, you should plan for this.

How to do it…

This recipe is all about planning, so we'll provide discussion points rather than portions of code. We'll cover the main points that should be addressed and provide a list of points as food for thought, around which the actual implementation should be built:

  • Let's break a rule: If you don't have a backup, take one now. I mean now – go on, off you go! Then, let's talk some more about planning maintenance. If you already have, well done! It's hard to keep your job as a DBA if you lose data because of missing backups, especially today, when everybody's grandmother knows to keep their photos backed up.
  • First, plan your time: Decide on a regular date to perform certain actions. Don't allow yourself to be a puppet of your monitoring system, running up and down every time the lights change. If you keep getting dragged off on other assignments, then you must understand that you need to get a good handle on the database maintenance to make sure that it doesn't bite you.
  • Don't be scared: It's easy to worry about what you don't know, and either overreact or underreact. Your database probably doesn't need to be inspected daily, but regular inspection is never a bad practice.

How it works…

Build a regular cycle of activity around the following tasks:

  • Capacity planning: Observe long-term trends in system performance and keep track of the growth of database volumes. Plan to schedule any new data feeds and new projects that increase the rates of change. This is best done monthly so that you can monitor what has happened and what will happen.
  • Backups, recovery testing, and emergency planning: Organize regular reviews of written plans and test scripts. Check the tape rotation, confirm that you still have the password to the off-site backups, and so on. Some sysadmins run a test recovery every night so that they always know that successful recovery is possible.
  • Vacuum and index maintenance: Do this to reduce bloat, as well as to collect optimizer statistics through the ANALYZE command. Also, regularly check index usage, drop unused indexes, and rebuild indexes concurrently as needed. Consider VACUUM again in the context of the less frequent freezing process, which also needs to be managed. This is listed as a separate task so that you don't ignore it and let it bite you later!
  • Server log file analysis: How many times has the server restarted? Are you sure you know about each incident?
  • Security and intrusion detection: Has your database already been hacked? What did they do?
  • Understanding usage patterns: If you don't know much about what your database is used for, then I'll wager it is not very well tuned or maintained.
  • Long-term performance analysis: I'm often asked to come and tune a slow system. What usually happens is that a database server gets slower over a very long period. Nobody noticed any particular day when it got slow – it just got slower over time. Keeping records of response times can help you confirm whether everything is as good now as it was months or years earlier. This is also the activity in which you may reconsider your current index choices.
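Trends are much easier to spot in the capacity-planning task if you record sizes on a regular schedule. A minimal sketch using standard catalog queries (the first query reports every database in the cluster; the second reports only the current database):

```sql
-- Database sizes across the cluster, largest first
SELECT datname,
       pg_size_pretty(pg_database_size(datname)) AS size
  FROM pg_database
 ORDER BY pg_database_size(datname) DESC;

-- Ten largest tables in the current database,
-- including their indexes and TOAST data
SELECT relname,
       pg_size_pretty(pg_total_relation_size(oid)) AS total_size
  FROM pg_class
 WHERE relkind = 'r'
 ORDER BY pg_total_relation_size(oid) DESC
 LIMIT 10;
```

Capturing this output monthly – into a history table or your monitoring system – is what turns these snapshots into a growth trend you can plan against.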

Many of these activities are mentioned in this chapter or throughout the rest of this cookbook. Some are not because they aren't very technical and are more about planning and understanding your environment.
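For the vacuum and index maintenance task above, here is a hedged sketch of the kinds of commands involved; mytable and mytable_idx are placeholder names, and REINDEX ... CONCURRENTLY requires PostgreSQL 12 or later:

```sql
-- Reduce bloat and refresh optimizer statistics in one pass
VACUUM (ANALYZE, VERBOSE) mytable;

-- Find indexes that have never been scanned -- candidates for dropping.
-- Check usage on replicas too before acting, since these counters
-- are tracked per server.
SELECT schemaname, relname, indexrelname, idx_scan
  FROM pg_stat_user_indexes
 WHERE idx_scan = 0
 ORDER BY pg_relation_size(indexrelid) DESC;

-- Rebuild a bloated index without blocking writes
REINDEX INDEX CONCURRENTLY mytable_idx;
```

Remember that pg_stat_user_indexes counters accumulate since the last statistics reset, so a zero idx_scan only means "unused since then", not "unused forever".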

There's more…

You may also find time to consider the following:

  • Data quality: Is the content of the database accurate and meaningful? Could the data be enhanced?
  • Business intelligence: Is the data being used for everything that can bring value to the organization?