Cleaning up the database

Often, a database can contain several unused objects or very old data. Cleaning up these objects helps administrators perform a backup of images more quickly. From the development point of view, unused objects are similar to quiet noise because they affect the refactoring process.

Getting ready

In database applications, one needs to keep the database clean as database objects might hinder quick development due to the objects' dependencies. To clean the database, one needs to identify the unused database objects, including tables, views, indexes, and functions.

A recipe for bloated tables and indexes will not be introduced here; you can take a look at the bucardo check_postgres Nagios plugin code at https://bucardo.org/wiki/Check_postgres to understand how bloats in tables and indexes can be calculated.

How to do it…

Table statistics, such as the number of live rows, index scans, and sequential scans, can help identify empty and unused tables. Note that the following queries are based on statistics, so the result needs to be validated. The pg_stat_user_tables function provides this information, and the following query shows only two empty tables:

car_portal=# SELECT relname FROM pg_stat_user_tables WHERE n_live_tup= 0 limit 2;
        relname
-----------------------
 logged_actions
 advertisement_picture
(1  rows)

To find the empty columns, one can have a look at the null_fraction attribute of the pg_stats table, as follows:

car_portal=# SELECT schemaname, tablename, attname FROM pg_stats WHERE null_frac= 1 and schemaname NOT IN ('pg_catalog', 'information_schema') limit 1;
   schemaname   | tablename | attname
----------------+-----------+---------
 car_portal_app | log       | old_row
(1 row)

To find the useless indexes, two techniques can be applied: the first technique is to determine whether an index is duplicated or overlapped with another index, and the second technique is to assess whether the index is used based on the index statistics.

The following query can be used to assess whether an index is used excluding constraint indexes—the unique constraints and primary keys—based on the catalog statistics:

SELECT schemaname, relname, indexrelname FROM pg_stat_user_indexes s JOIN pg_index i ON s.indexrelid = i.indexrelid WHERE idx_scan=0 AND NOT indisunique AND NOT indisprimary;

Overlapping index attributes can be used to identify duplicate indexes, as shown in the following example:

WITH index_info AS
(SELECT pg_get_indexdef(indexrelid) AS index_def, indexrelid::regclass index_name , indrelid::regclass table_name, array_agg(attname) AS index_att
FROM
  pg_index i JOIN
  pg_attribute a ON i.indexrelid = a.attrelid
GROUP BY pg_get_indexdef(indexrelid), indrelid,  indexrelid
)
SELECT DISTINCT
  CASE WHEN a.index_name > b.index_name THEN a.index_def ELSE b.index_def END AS index_def,
  CASE WHEN a.index_name > b.index_name THEN a.index_name ELSE b.index_name END AS index_name,
  CASE WHEN a.index_name > b.index_name THEN b.index_def ELSE a.index_def END AS overlap_index_def,
  CASE WHEN a.index_name > b.index_name THEN b.index_def ELSE a.index_def END AS overlap_index_name,
  a.table_name
FROM
  index_info a INNER JOIN index_info b ON (a.index_name != b.index_name AND a.table_name = b.table_name AND a.index_att && b.index_att );

There's more…

Cleaning up unused views and functions is a little bit tricky. By default, PostgreSQL collects statistics about indexes and tables but not functions. To enable statistics collection on functions, the track_functions setting needs to be enabled. The statistics on functions can be found in the pg_stat_user_functions table.

For views, there are no statistics collected unless the views are materialized. In order to assess whether a view is used, we need to do this manually. This can be done by rewriting the view and joining it with a function with a certain side effect, such as updating a table and increasing the number of times the view is accessed or raising a certain log message. To test this technique, let's write a simple function that raises a log, as follows:

CREATE OR REPLACE FUNCTION monitor_view_usage (view_name TEXT) RETURNS BOOLEAN AS $$
BEGIN
  RAISE LOG 'The view % is used on % by % ', view_name, current_time, session_user;
  RETURN TRUE;
END;
$$
LANGUAGE plpgsql
cost .001;

Now, let's assume that we want to drop the following view; however, there is uncertainty regarding whether an application depends on it:

CREATE OR REPLACE VIEW sport_car AS
SELECT car_id, number_of_owners,regestration_number
FROM car_portal_app.car
WHERE number_of_doors = 3;

To ensure that the view is not used, the view should be rewritten as follows, where the monitor_view_usage function is used, and the log files should be monitored for a certain period of time. The following query can help:

CREATE OR REPLACE VIEW sport_car AS SELECT car_id, number_of_owners, regestration_number
FROM car_portal_app.car CROSS JOIN monitor_view_usage ('sport_car')
WHERE number_of_doors = 3;

If the view is used, an entry in the log file should appear, as follows:

2015-06-03 17:55:04 CEST LOG:  The view sport_car is used on 17:55:04.571+02 by postgres.
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.188.143.21