Chapter 2. Database Administration

This chapter will cover what we feel are the most common activities for basic administration of a PostgreSQL server; namely: role management, database creation, add-on installation, backup, and restore. We’ll assume you’ve already installed PostgreSQL and have one of the administration tools at your disposal.

Configuration Files

Three main configuration files control basic operations of a PostgreSQL server instance. These files are all located in the default PostgreSQL data folder. You can edit them using your text editor of choice, or using the admin pack that comes with pgAdmin (Editing postgresql.conf and pg_hba.conf from pgAdmin).

  • postgresql.conf controls general settings, such as how much memory to allocate, default storage location for new databases, which IPs PostgreSQL listens on, where logs are stored, and so forth.

  • pg_hba.conf controls security. It manages access to the server, dictating which users can login into which databases, which IPs or groups of IPs are permitted to connect and the authentication scheme expected.

  • pg_ident.conf is the mapping file that maps an authenticated OS login to a PostgreSQL user. This file is used less often, but it allows you to map a server account to a PostgreSQL account. For example, people sometimes map the OS root account to the postgres superuser account. Each authentication line in pg_hba.conf can reference a different map defined in pg_ident.conf; a short sketch follows this list.
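As a hedged sketch (the map name admins is hypothetical), a pg_ident.conf entry mapping the OS root account to postgres, together with a pg_hba.conf line that applies it, might look like this:

# pg_ident.conf
# MAPNAME   SYSTEM-USERNAME   PG-USERNAME
admins      root              postgres

# pg_hba.conf line that uses the map
host   all   all   127.0.0.1/32   ident map=admins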

If you are ever unsure where these files are located, run the Example 2-1 query as a super user while connected to any of your databases.

Example 2-1. Location of configuration files

SELECT name, setting
FROM pg_settings 
WHERE category = 'File Locations';
name              | setting
------------------+-----------------------------
config_file       | E:/PGData91/postgresql.conf
data_directory    | E:/PGData91
external_pid_file |
hba_file          | E:/PGData91/pg_hba.conf
ident_file        | E:/PGData91/pg_ident.conf

The postgresql.conf File

postgresql.conf controls the core settings of the PostgreSQL server instance as well as default settings for new databases. Many settings, such as sorting memory, can be overridden at the database, user, session, and even function levels for PostgreSQL versions higher than 8.3.

Details on how to tune this can be found at Tuning Your PostgreSQL Server.

An easy way to check the current settings you have is to query the pg_settings view, as we demonstrate in Example 2-2. Details of the various columns of information and what they mean are described in pg_settings.

Example 2-2. Key Settings

SELECT name, context, unit
 , setting, boot_val, reset_val
FROM pg_settings
WHERE name IN ('listen_addresses', 'max_connections', 'shared_buffers',
    'effective_cache_size', 'work_mem', 'maintenance_work_mem')
ORDER BY context, name;
name                 |  context   | unit | setting | boot_val  | reset_val
---------------------+------------+------+---------+-----------+-----------
listen_addresses     | postmaster |      | *       | localhost | *
max_connections      | postmaster |      | 100     | 100       | 100
shared_buffers       | postmaster | 8kB  | 4096    | 1024      | 4096
effective_cache_size | user       | 8kB  | 16384   | 16384     | 16384
maintenance_work_mem | user       | kB   | 16384   | 16384     | 16384
work_mem             | user       | kB   | 1024    | 1024      | 1024
context

If context is set to postmaster, it means changing this parameter requires a restart of the postgresql service. If context is set to user, changes require a reload to take effect globally. Furthermore, user context settings can be overridden at the database, user, session, or function levels.

unit

unit tells you the unit of measurement that the setting is reported in. This is very important for memory settings since, as you can see, some are reported in 8 kB and some in kB. In postgresql.conf, usually you explicitly set these to a unit of measurement you want to record in, such as 128 MB. You can also get a more human-readable display of a setting by running the statement: SHOW effective_cache_size;, which gives you 128 MB, or SHOW maintenance_work_mem;, which gives you 16 MB for this particular case. If you want to see everything in friendly units, use SHOW ALL.

setting, boot_val, reset_val

setting is the currently running setting in effect; boot_val is the default setting; reset_val is the new value if you were to restart or reload. You want to make sure that after any change you make to postgresql.conf the setting and reset_val are the same. If they are not, it means you still need to do a reload.
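Following that advice, a quick sanity check after editing and reloading is to list any parameters where the two values still differ (a minimal sketch):

SELECT name, setting, reset_val
FROM pg_settings
WHERE setting <> reset_val;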

We point out the following parameters as ones you should pay attention to in postgresql.conf. Changing their values requires a service restart:

  • listen_addresses tells PostgreSQL which IPs to listen on. This usually defaults to localhost, but many people change it to *, meaning all available IPs.

  • port defaults to 5432. This is often set in a different file in some distributions, which overrides the setting here. For instance, if you are on Red Hat or CentOS, you can override it by setting a PGPORT value in /etc/sysconfig/pgsql/your_service_name_here.

  • max_connections is the maximum number of concurrent connections allowed.

  • shared_buffers defines the amount of memory you have shared across all connections to store recently accessed pages. This setting has the most effect on query performance. You want this to be fairly high, probably at least 25% of your on-board memory.

The following three settings are also important. They take effect without requiring a restart, but they do require at least a reload, as described in Reload the Configuration Files. A sample postgresql.conf excerpt covering all of these settings follows the list.

  • effective_cache_size is an estimate of how much memory you expect to be available in the OS and PostgreSQL buffer caches. It has no effect on actual allocation; it is used only by the PostgreSQL query planner to figure out whether the plans under consideration would fit in RAM. If it's set too low, indexes may be underutilized. If you have a dedicated PostgreSQL server, setting this to half or more of your on-board memory is a good start.

  • work_mem controls the maximum amount of memory allocated for each operation such as sorting, hash join, and others. The optimal setting really depends on the kind of work you do, how much memory you have, and if your server is a dedicated database server. If you have many users connecting, but fairly simple queries, you want this to be relatively low. If you do lots of intensive processing, like building a data warehouse, but few users, you want this to be high. How high you set this also depends on how much motherboard memory you have. A good article to read on the pros and cons of setting work_mem is Understanding postgresql.conf work_mem.

  • maintenance_work_mem is the total memory allocated for housekeeping activities like vacuuming (getting rid of dead records). This shouldn’t be set higher than about 1 GB.
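To tie these together, here is a minimal postgresql.conf excerpt covering the settings discussed above. The values are purely illustrative and should be tuned to your own hardware and workload:

listen_addresses = '*'
port = 5432
max_connections = 100
shared_buffers = 512MB
effective_cache_size = 1GB
work_mem = 8MB
maintenance_work_mem = 128MB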

The above settings can also be set at the database, function, or user level. For example, you might want to set work_mem higher for a power user who runs sophisticated queries. Similarly, if you have a sort-intensive function, you could raise the work_mem just for it.
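For example, using hypothetical role, database, and function names, such per-object overrides might look like this:

ALTER DATABASE mydb SET work_mem = '16MB';                 -- default for new sessions to mydb
ALTER ROLE power_user SET work_mem = '64MB';               -- a power user who runs heavy queries
ALTER FUNCTION build_report(int) SET work_mem = '128MB';   -- a sort-intensive function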

I edited my postgresql.conf and now my server is broken.

The easiest way to figure out what you did wrong is to look at the log file, which is located in the root of the data folder, or in the subfolder pg_log. Open up the latest file and read what the last line says. The error notice is usually self-explanatory.

A common culprit is that you set the shared_buffers too high. Another common cause of failure is an old postmaster.pid left behind from a failed shutdown. You can safely delete this file, which is located in the data cluster folder, and try restarting again.

The pg_hba.conf File

The pg_hba.conf file controls which users can connect to which PostgreSQL databases and how they must authenticate. Changes to pg_hba.conf require a reload or a server restart to take effect. A typical pg_hba.conf looks like this:

# TYPE    DATABASE     USER      ADDRESS          METHOD
# IPv4 local connections:
host      all          all       127.0.0.1/32     ident
# IPv6 local connections:
host      all          all       ::1/128          trust
host      all          all       192.168.54.0/24  md5
hostssl   all          all       0.0.0.0/0        md5
# Allow replication connections from localhost, by a user with the
# replication privilege.
#host     replication  postgres  127.0.0.1/32     trust
#host     replication  postgres  ::1/128          trust
The METHOD column determines the authentication method. ident, trust, md5, and password are the most common and always available. PostgreSQL 9.1 introduced the additional peer authentication. The ident and peer options are only available on Linux/Unix-like systems (not Windows). Others, such as gss, radius, ldap, and pam, may not always be installed.

The ::1/128 entry is the IPv6 syntax for localhost. It only applies to servers with IPv6 support and may prevent the configuration file from loading if you include it on a machine without IPv6. For example, on a Windows XP or Windows 2003 machine, you shouldn't have this line.

The 192.168.54.0/24 entry is the IPv4 syntax for defining a network range. The first part, in this case 192.168.54.0, is the network address; the /24 is the bit mask. In this example, we are allowing anyone in our 192.168.54.0 subnet to connect as long as they provide a valid md5-hashed password.

The hostssl line requires users to connect through SSL. In our example, we allow anyone to connect to our server as long as they connect using SSL and have a valid md5-encrypted password.

The commented-out replication lines define a range of IPs allowed to replicate with this server. Replication entries are new in PostgreSQL 9.0. In this example, we have the lines remarked out.

For each connection request, the postgres service checks the pg_hba.conf file from the top down. As soon as a rule matching the connection is encountered, it is applied and checking stops. Should the end of the file be reached without any matching rule, the connection is denied. A common mistake people make is to put the rules in the wrong order. For example, if you put 0.0.0.0/0 reject before 127.0.0.1/32 trust, local users won't be able to connect, even though you have a rule allowing them to do so.
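For instance, a sketch of a correctly ordered pair of rules that trusts local connections and rejects everything else:

host   all   all   127.0.0.1/32   trust
host   all   all   0.0.0.0/0      reject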

I edited my pg_hba.conf and now my database server is broken.

This occurs quite frequently, but it's easily recoverable. The error is generally caused by typos or by adding an unavailable authentication scheme. When the postgres service can't parse pg_hba.conf, it'll block all access or won't even start up. The easiest way to figure out what you did wrong is to read the log file, located in the root of the data folder or in the subfolder pg_log. Open up the latest file and read the last line. The error message is usually self-explanatory. If you're prone to slippery fingers, consider backing up the file prior to editing.

Authentication Methods

PostgreSQL has many methods for authenticating users, probably more than any other database. Most people stick with the four main ones: trust, ident, md5, and password. There is also a fifth one, reject, which performs an immediate deny. Authentication methods stipulated in pg_hba.conf serve as gatekeepers to the entire server. Users or devices must still satisfy individual role and database access restrictions after connecting.

We list the most commonly used authentication methods below. For more information on the various authentication methods, refer to PostgreSQL Client Authentication.

  • trust is the least secure of the authentication schemes and means you allow people to state who they are and don’t care about the passwords, if any, presented. As long as they meet the IP, user, and database criteria, they can connect. You really should use this only for local connections or private network connections. Even then it’s possible to have IPs spoofed, so the more security-minded among us discourage its use entirely. Nevertheless, it’s the most common for PostgreSQL installed on a desktop for single user local access where security is not as much of a concern. The user name defaults to the logged in user if not specified.

  • md5 is the most common and means an md5-encrypted password is required.

  • password means clear text password authentication.

  • ident uses the pg_ident.conf to see if the OS account of the user trying to connect has a mapping to a PostgreSQL account. Password is not checked.

You can have multiple authentication methods, even for the same database; just keep in mind the top to bottom checking of pg_hba.conf.

Reload the Configuration Files

Many, but not all, changes to configuration files require restarting the postgres service; many take effect simply by reloading the configuration. Reloading doesn't affect active connections. To reload, open a command line and run:

pg_ctl reload -D your_data_directory_here

If you have PostgreSQL installed as a service on Red Hat Enterprise Linux or CentOS, you can instead do:

service postgresql-9.1 reload

where postgresql-9.1 is the name of your service.

You can also log in as a super user on any database and run this SQL statement:

SELECT pg_reload_conf();

You can also do this from pgAdmin, refer to Editing postgresql.conf and pg_hba.conf from pgAdmin.

Setting Up Groups and Login Roles (Users)

In PostgreSQL, there is really only one kind of account, and that is a role. Some roles can log in; when they have login rights, they are called users. Roles can be members of other roles, and when we have this kind of relationship, the containing roles are called groups. It wasn't always this way, though: prior to version 8.1, users and groups were distinct entities, but the model was changed to be role-centric to better conform to the ANSI SQL specs.

For backward compatibility, there are still CREATE USER and CREATE GROUP commands. For the rest of this discussion, we'll be using the more generic CREATE ROLE, which is used to create both users and groups.

If you look at fairly ANSI-SQL-standard databases such as Oracle and later versions of SQL Server, you'll notice they also have a CREATE ROLE statement, which works similarly to the PostgreSQL one.

Creating an Account That Can Log In

postgres is an account that is created when you first initialize the PostgreSQL data cluster. It has a companion database called postgres. Before you do anything else, you should log in as this user via psql or pgAdmin and create other users. pgAdmin has a graphical section for creating user roles, but if you were to do it using standard SQL data control language (DCL), you would execute an SQL command as shown in Example 2-3.

Example 2-3. User with login rights that can create database objects

CREATE ROLE leo LOGIN PASSWORD 'lion!king'
CREATEDB VALID UNTIL 'infinity';

The 'infinity' is optional and assumed if not specified. You could instead put in a valid date at which you want the account to expire.

If you wanted to create a user with superuser rights, meaning one who can cause major destruction to your database cluster and can create what we call untrusted language functions, you would create such a user as shown in Example 2-4. You can only create a superuser if you are a superuser yourself.

Example 2-4. User with login and superuser rights

CREATE ROLE regina LOGIN PASSWORD 'queen!penultimate' 
 SUPERUSER VALID UNTIL '2020-10-20 23:00';

As you can see, we don’t really want our queen to reign forever, so we put in a timestamp when her account will expire.

Creating Group Roles

Group roles are generally roles that have no login rights but have other roles as members. This is merely a convention. There is nothing stopping you from creating a role that can both log in and contain other roles.

We can create a group role with this SQL DCL statement:

CREATE ROLE jungle INHERIT;

And add a user or other group role to the group with this statement:

GRANT jungle TO leo;

Roles Inheriting Rights

One quirky thing about PostgreSQL is the ability to define a role that doesn’t allow its member roles to inherit its rights. The concept comes into play when you define a role to have member roles. You can designate that members of this role don’t inherit rights of the role itself. This is a feature that causes much confusion and frustration when setting up groups, as people often forget to make sure that the group role is marked to allow its permissions as inheritable.
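As a sketch, using the leo role from earlier and a hypothetical hr schema, an inheritable group might be set up like this:

CREATE ROLE hr_readers INHERIT;                          -- group role; no login
GRANT USAGE ON SCHEMA hr TO hr_readers;                  -- hr is a hypothetical schema
GRANT SELECT ON ALL TABLES IN SCHEMA hr TO hr_readers;
GRANT hr_readers TO leo;                                 -- leo (created with the default INHERIT) now picks up these rights automatically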

Non-Inheritable Rights

Some permissions can’t be inherited. For example, while you can create a group role that you mark as super user, this doesn’t make its member roles super users; however, those users can impersonate their parent role, thus gaining super power rights for a brief period.
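A minimal sketch of that impersonation, using a hypothetical sysadmins group and the leo role from earlier:

CREATE ROLE sysadmins NOLOGIN SUPERUSER;   -- the superuser attribute is not inherited by members
GRANT sysadmins TO leo;
-- connected as leo:
SET ROLE sysadmins;                        -- temporarily assume the group's superuser rights
-- ... do privileged work ...
RESET ROLE;                                -- drop back to leo's own rights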

Databases and Management

The simplest CREATE DATABASE statement you can write is:

CREATE DATABASE mydb;

The new database will be owned by the logged-in user and will be a copy of the template1 database.
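If you want the database owned by someone other than the logged-in user, you can say so explicitly; a sketch using the leo role created earlier:

CREATE DATABASE mydb OWNER leo;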

Creating and Using a Template Database

A template database is, as the name suggests, a database that serves as a template for other databases. In actuality, you can use any database as a template for another, but PostgreSQL also allows you to specifically flag certain databases as templates. The main difference is that a database marked as a template can't be deleted and can be used by any user having CREATEDB rights (not just superusers) as a template for a new database. More details about template databases are described in the PostgreSQL manual section Managing Template Databases.

The template1 database, which is used by default when no template is specified, doesn't allow you to change encodings. So if you want to create a database with an encoding and collation different from your default, or if you installed extensions in template1 that you don't want in the new database, use template0 as the template instead:

CREATE DATABASE mydb TEMPLATE template0;

If we wanted to make our new database a template, we would run this SQL statement as a super user:

UPDATE pg_database SET datistemplate=true WHERE datname='mydb';

This would allow other users with CREATEDB rights to use this as a template. It will also prevent the database from being deleted.
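Any role with CREATEDB rights could then base a new database on it; for example (the new database name is arbitrary):

CREATE DATABASE mydb_clone TEMPLATE mydb;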

Organizing Your Database Using Schemas

Schemas are a logical way of partitioning your database into mini-containers. You can divide schemas by functionality, by users, or by any other attribute. Aside from logical partitioning, they provide an easy way of doling out rights. One common practice is to install all contribs and extensions, covered in Extensions and Contribs, into a separate schema and grant rights on that schema to all users of a database.

To create a schema called contrib in a database, we connect to the database and run this SQL:

CREATE SCHEMA contrib;

The default search_path defined in postgresql.conf is "$user",public. This means that if there is a schema with the same name as the logged-in user, non-schema-qualified objects are looked up first in that schema and then in the public schema. You can override this behavior at the user level or the database level. For example, if we wanted all objects in contrib to be accessible without schema qualification, we would alter our database as follows:

ALTER DATABASE mydb SET search_path="$user",public,contrib;
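The user-level equivalent, shown here with the leo role from earlier, would look like this:

ALTER ROLE leo SET search_path="$user",public,contrib;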

Note

The SET search_path change will not take effect for existing connections. You'll need to reconnect to your database to see the change.

Schemas are also used for simple abstraction. A table name only needs to be unique within its schema, so many applications exploit this by creating same-named tables in different schemas; depending on who logs in, users get their own version of each table based on which schema comes first in their search_path.

Permissions

Permissions are one of the trickiest things to get right in PostgreSQL. This is one feature that we find more difficult to work with than in other databases. Permission management became a lot easier with PostgreSQL 9.0, which introduced default privileges: these let you set permissions on all objects of a particular schema or database, as well as on specific types of objects. More details on permissions management are in the manual, in the sections ALTER DEFAULT PRIVILEGES and GRANT.

Getting back to our contrib schema, let's suppose we want all users of our database to have EXECUTE and SELECT access to any tables and functions we will create in the contrib schema. We can define default permissions as shown in Example 2-5:

Example 2-5. Defining default permissions on a schema

GRANT USAGE ON SCHEMA contrib TO public;
ALTER DEFAULT PRIVILEGES IN SCHEMA contrib
    GRANT SELECT, REFERENCES ON TABLES
    TO public;

ALTER DEFAULT PRIVILEGES IN SCHEMA contrib
    GRANT SELECT, UPDATE ON SEQUENCES
    TO public;

ALTER DEFAULT PRIVILEGES IN SCHEMA contrib
    GRANT EXECUTE ON FUNCTIONS
    TO public;

ALTER DEFAULT PRIVILEGES IN SCHEMA contrib
    GRANT USAGE ON TYPES
    TO public;

If you already have your schema set up with all its tables and functions, you can retroactively set permissions on each object separately, or do it for all existing tables, functions, and sequences with a GRANT .. ALL .. IN SCHEMA, as shown in Example 2-6.

Example 2-6. Set permissions on existing objects of a type in a schema

GRANT USAGE ON SCHEMA contrib TO public;
GRANT SELECT, REFERENCES, TRIGGER 
	ON ALL TABLES IN SCHEMA contrib 
	TO public;

GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA contrib TO public;
GRANT SELECT, UPDATE ON ALL SEQUENCES IN SCHEMA contrib TO public;

Note

If you find this all overwhelming for setting permissions, just use pgAdmin for permission management. pgAdmin provides a great interface for setting default permissions, as well as retroactively granting bulk permissions of selective objects. We’ll cover this feature in Creating Databases and Setting Permissions.

Extensions and Contribs

Extensions and contribs are add-ons that you can install in a PostgreSQL database to extend functionality beyond the base offerings. They exemplify the best feature of open source software: people collaborating, building, and freely sharing new features. Prior to PostgreSQL 9.1, these add-ons were called contribs. Since PostgreSQL 9.1, add-ons are easily installed using the new PostgreSQL extension model, and the term extension has come to replace the term contrib. For the sake of consistency, we'll refer to all of them by the newer name of extension, even if they can't be installed using the newer extension model.

The first thing to know about extensions is that they are installed separately in each database. You can have one database with the fuzzy text support extension and another that doesn’t. If you want all your databases to have a certain set of extensions installed in a specific schema, you can set up a template database as discussed in Creating and Using a Template Database with all these installed, and then create all your databases using that template.

To see which extensions you have already installed, plus any available extensions whose description mentions the word string, run the query in Example 2-7:

Example 2-7. List extensions installed

SELECT * 
FROM pg_available_extensions 
WHERE comment LIKE '%string%' OR installed_version IS NOT NULL 
ORDER BY name;
name          | default_version | installed_version |    comment
--------------+-----------------+-------------------+------------------------
citext        | 1.0             |                   | data type for case-insen..
fuzzystrmatch | 1.0             | 1.0               | determine simil.. and dist..
hstore        | 1.0             | 1.0               | data type for .. (key, value) ..
pg_trgm       | 1.0             | 1.0               | text similarity measur..index sear..
plpgsql       | 1.0             | 1.0               | PL/pgSQL procedural language
postgis       | 2.0.0           | 2.0.0             | geometry, geography,..raster ..
temporal      | 0.7.1           | 0.7.1             | temporal data type ..

To get details about a particular installed extension, enter the following command from psql:

\dx+ fuzzystrmatch

Or run this query:

SELECT pg_catalog.pg_describe_object(d.classid, d.objid, 0) AS description 
FROM pg_catalog.pg_depend AS D 
  INNER JOIN pg_extension AS E ON D.refobjid = E.oid 
WHERE D.refclassid = 'pg_catalog.pg_extension'::pg_catalog.regclass 
 AND deptype = 'e' AND E.extname = 'fuzzystrmatch';

This outputs what is packaged in the extension:

description
----------------------------------------------------------------------------
function dmetaphone_alt(text)
function dmetaphone(text)
function difference(text,text)
function text_soundex(text)
function soundex(text)
function metaphone(text,integer)
function levenshtein_less_equal(text,text,integer,integer,integer,integer)
function levenshtein_less_equal(text,text,integer)
function levenshtein(text,text,integer,integer,integer)
function levenshtein(text,text)

Installing Extensions

Regardless of how you install an extension in your database, you’ll need to have gathered all the dependent libraries in your PostgreSQL bin and lib, or have them accessible via your system path. For small extensions, most of these libraries already come prepackaged with your PostgreSQL install so you don’t have to worry. For others, you’ll either need to compile your own, get them with a separate install, or copy the files from another equivalent setup.

The Old Way

Prior to PostgreSQL 9.1, the only way to install an extension was to manually run the requisite SQL scripts in your database. Many extensions still can only be installed this way.

By convention, add-on scripts are dumped into the contrib folder of your PostgreSQL install if you use an installer. Where you find this folder depends on your particular OS and distro. As an example, on CentOS running 9.0, to install the pgAdmin pack, one would run the following from the command line:

psql -p 5432 -d postgres -f /usr/pgsql-9.0/share/contrib/adminpack.sql

The New Way

With PostgreSQL 9.1 and above, you can use the CREATE EXTENSION command. The two big benefits are that you don't have to figure out where the extension files are kept (they live in the share/extension folder), and you can uninstall just as easily with DROP EXTENSION. Most of the common extensions are packaged with PostgreSQL already, so you really don't need to do more than run the command. To retrieve extensions not packaged with PostgreSQL, visit the PostgreSQL Extension Network. Once you have downloaded, compiled, and installed the new extension (installing just copies the scripts and .control file to share/extension and the respective binaries to bin and lib), run CREATE EXTENSION extension_name to install it in a specific database. Since CREATE EXTENSION is part of PostgreSQL's SQL language, you aren't limited to psql; just connect to the database you want the extension in and run the SQL command. Here is how we would install the fuzzystrmatch extension in PostgreSQL 9.1+:

CREATE EXTENSION fuzzystrmatch;

If you wanted all your extensions installed in a schema called my_extensions, you would first create the schema and then install the extension into it:

CREATE SCHEMA my_extensions;
CREATE EXTENSION fuzzystrmatch SCHEMA my_extensions;

Upgrading from Old to New

If you’ve been using a version of PostgreSQL before 9.1 and restored your old database into a 9.1 during a version upgrade, all add-ons should continue to work untouched. For maintainability, you’ll probably want to upgrade your old extensions in the contrib folder to use the new extensions approach. Many extensions, especially the ones that come packaged with PostgreSQL, have ability to upgrade pre-extension installs. Let’s suppose you had installed the tablefunc extension (which provides cross tabulation functions) to your PostgreSQL 9.0 in a schema called contrib, and you’ve just restored your database to a PostgreSQL 9.1 server. Run the following command to upgrade the extension:

CREATE EXTENSION tablefunc SCHEMA contrib FROM unpackaged;

You’ll notice that the old functions are still in the contrib schema, but moving forward they will no longer be backed up and your backups will just have a CREATE EXTENSION .. clause.

Common Extensions

Many extensions come packaged with PostgreSQL, but are not installed by default. Some past extensions have gained enough traction to become part of the PostgreSQL core, so if you’re upgrading from an ancient version, you may not even have to worry about extensions.

Old Extensions Absorbed into PostgreSQL

Prior to PostgreSQL 8.3, the following extensions weren’t part of core:

  • PL/PgSQL wasn’t always installed by default in every database. In old versions, you had to run CREATE LANGUAGE plpgsql; in your database. From around 8.3 on, it’s installed by default, but you retain the option of uninstalling it.

  • tsearch is a suite for supporting full-text searches by adding indexes, operators, custom dictionaries, and functions. It became part of PostgreSQL core in 8.3. You don't have the option to uninstall it. If you're still relying on the old behavior, you can install the tsearch2 extension, which retains old functions that are no longer available in the newer version. A better approach would be just to update the places where you're using the old functions, because compatibility with the old tsearch could end at any time.

  • xml is an extension that adds support for the XML data type and related functions and operators. As of version 8.3, XML became an integral part of PostgreSQL, in part to meet the ANSI SQL XML standard. The old extension, now dubbed xml2, can still be installed and contains functions that didn't make it into the core. In particular, you need this extension if you relied on the xslt_process() function for processing XSL templates. There are also a couple of old XPath functions not found in the core.

Popular Extensions

In this section, we'll list and quickly describe the most popular, and some might say must-have, extensions that aren't part of the current core.

  • postgis elevates PostgreSQL to a state-of-the-art spatial database outrivaling all commercial options. If you deal with standard OGC GIS data, demographic statistics data, or geocoding, you don’t want to be without this one. You can learn more about PostGIS in our book, PostGIS in Action. Part of the book’s proceeds will help fund the PostGIS project itself. PostGIS is a whopper of an extension, weighing in at over 800 functions, types, and spatial indexes.

  • fuzzystrmatch is a lightweight extension with functions like soundex, levenshtein, and metaphone for fuzzy string matching. We discuss its use in Where is Soundex and Other Warm and Fuzzy Things.

  • hstore is an extension that adds key-value pair storage and index support well-suited for storing pseudo-normalized data. If you are looking for a comfortable medium between relational and NoSQL, check out hstore.

  • pg_trgm (trigram) is another fuzzy string search library, often used in conjunction with fuzzystrmatch. In PostgreSQL 9.1, it takes on another special role in that it makes ILIKE searches indexable by creating a trigram index. A trigram index can also serve wildcard searches of the form LIKE '%something%'; we show a short sketch after this list. Refer to Teaching ILIKE and LIKE New Tricks for further discussion.

  • dblink is a module that allows you to query other PostgreSQL databases. This is currently the only supported mechanism of cross-database interaction for PostgreSQL. In PostgreSQL 9.3, a foreign data wrapper for PostgreSQL is expected to hit the scene.

  • pgcrypto provides various encryption tools including the popular PGP. We have a quick primer on using it available here: Encrypting Data with pgcrypto.
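Here is the sketch promised in the pg_trgm item above; the customers table and name column are hypothetical:

CREATE EXTENSION pg_trgm;
CREATE INDEX idx_customers_name_trgm
    ON customers USING gist (name gist_trgm_ops);
-- queries like this can now use the trigram index:
SELECT * FROM customers WHERE name ILIKE '%smith%';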

As of 9.1, less used procedural languages (PLs), index types, and foreign data wrappers (FDW) are also packaged as extensions.

Backup

PostgreSQL comes with two utilities for backup: pg_dump and pg_dumpall. You'll find both in the bin folder. You use pg_dump to back up specific databases and pg_dumpall to back up all databases plus server globals. pg_dumpall needs to run under a postgres superuser account so that it has access to back up all databases. You will notice that most of the options for these tools have both long names and equivalent short switches; you can use them interchangeably, even in the same command. We'll be covering just the basics here; for a more in-depth discussion, refer to the PostgreSQL Backup and Restore section of the official manual.

Note

We often specify the port and host in these commands because we often run them via scheduled jobs from a different machine, or we have several instances of PostgreSQL running on the same box, each listening on a different port. Sometimes specifying the -h or --host switch may cause problems if your service is set to listen only on localhost. You can safely leave it out if you are running the command from the server itself.

You may also want to employ a ~/.pgpass file, since none of these command lines give you the option of specifying a password.
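A pgpass file is a plain-text file (~/.pgpass on Unix, %APPDATA%\postgresql\pgpass.conf on Windows) with one line per connection in the form host:port:database:user:password. A sketch with illustrative values only; on Unix, the file must be readable only by you (chmod 600) or it will be ignored:

localhost:5432:*:postgres:not_my_real_password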

Selective Backup Using pg_dump

For day-to-day backup, pg_dump is generally more expeditious than pg_dumpall because it can selectively back up tables, schemas, and databases. pg_dump can back up to plain SQL as well as compressed and TAR formats. Compressed and TAR backups can take advantage of the parallel restore feature introduced in 8.4. Refer to Database Backup: pg_dump for a listing of pg_dump command options.

Example 2-8 shows a few common backup scenarios and the corresponding pg_dump switches. These examples should work for any version of PostgreSQL.

Example 2-8. pg_dump usage

Creates a compressed, single database backup:

pg_dump -h localhost -p 5432 -U someuser -F c -b -v -f mydb.backup mydb

Creates a plain-text single database backup, including a CREATE DATABASE statement:

pg_dump -h localhost -p 5432 -U someuser -C -F p -b -v -f mydb.backup mydb

Creates a compressed backup of tables with a name that starts with payments in any schema:

pg_dump -h localhost -p 5432 -U someuser -F c -b -v -t "*.payments*" -f payment_tables.backup mydb

Creates a compressed backup of all objects in hr and payroll schemas:

pg_dump -h localhost -p 5432 -U someuser -F c -b -v -n hr -n payroll -f hr_payroll_schemas.backup mydb

Creates a compressed backup of all objects in all schemas, excluding the public schema:

pg_dump -h localhost -p 5432 -U someuser -F c -b -v -N public -f all_schema_except_public.backup mydb

Creates a plain-text SQL backup of select tables, useful for porting to lower versions of PostgreSQL or other database systems:

pg_dump -h localhost -p 5432 -U someuser -F p --column-inserts -f select_tables.backup mydb

Note

If you have spaces in your file paths, you’ll want to wrap the file path in double quotes: "/path with spaces/mydb.backup". As a general rule, you can always use double quotes if you aren’t sure.

The directory format option was introduced in PostgreSQL 9.1. This option backs up each table as a separate file in a folder and gets around potential file system limitations on the size of a single file. It is the only pg_dump backup format that generates multiple files, as shown in Example 2-9. The directory backup first creates the directory to put the files in and errors out if that directory already exists.

Example 2-9. Directory format backup

This creates the folder a_directory containing a separate gzipped file for each table plus a file listing all the structures:

pg_dump -h localhost -p 5432 -U someuser -F d -f /somepath/a_directory mydb

Systemwide Backup Using pg_dumpall

The pg_dumpall utility is what you would use to backup all databases into a single plain-text file, along with server globals such as tablespace definitions and users. Refer to Server Backup: pg_dumpall for listing of available pg_dumpall command options.

It’s a good idea to backup globals such as roles and tablespace definitions on a daily basis. Although you can use pg_dumpall to backup databases as well, we generally don’t bother or do it—at most, once a month—since it would take much longer to restore the plain text backup for large databases.

To backup roles and tablespaces:

pg_dumpall -h localhost -U postgres --port=5432 -f myglobals.sql --globals-only

If you only care about backing up roles and not tablespaces, use the --roles-only option:

pg_dumpall -h localhost -U postgres --port=5432 -f myroles.sql --roles-only

Restore

There are two ways of restoring in PostgreSQL:

  • Using psql to restore plain text backups generated with pg_dumpall or pg_dump

  • Using the pg_restore utility to restore compressed, TAR, and directory backups created with pg_dump

Terminating Connections

Before you can perform a full drop and restore of a database, or restore a particular table that's in use, you'll need to kill connections. Every once in a while, someone else (never you) will execute a query that he or she didn't mean to and end up wasting resources. You could also run into a query that's taking much longer than you have the patience for. Should these things happen, you'll either want to cancel the query on the connection or kill the connection entirely. To cancel running queries or to terminate connections, you'll use three administrative functions.

  • pg_stat_activity (SELECT * FROM pg_stat_activity;) is a view that lists currently active connections and their process IDs. It also provides details of the active query running on each connection, the connected user (usename), the database (datname) in use, and the start time of the running query. You need this view to obtain the process IDs of the connections you wish to terminate.

  • pg_cancel_backend(procid) (SELECT pg_cancel_backend(procid);) cancels the active query on a connection, but doesn't terminate the connection itself.

  • pg_terminate_backend(procid) (SELECT pg_terminate_backend(procid);) will kill a specific connection. All running queries will automatically cancel. This will be your weapon of choice prior to a restore to prevent an eager user from immediately restarting a cancelled query.

PostgreSQL, unlike some other databases, lets you embed functions that perform actions within a regular SELECT query. This means that although pg_terminate_backend() and pg_cancel_backend() act on only one connection at a time, you can hit multiple connections by wrapping the calls in a SELECT. For example, let's suppose a user (Regina) was hogging resources with 100 open connections. We can kill all of her connections by running this command:

Before 9.2:

SELECT pg_terminate_backend(procpid) FROM pg_stat_activity WHERE usename = 'regina';

9.2 and after:

SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE usename = 'regina';

pg_stat_activity changed considerably in PostgreSQL 9.2, with renamed and additional columns. For example, procpid is now pid. More details about the changes and enhancements are described in PostgreSQL 9.2 Monitoring Enhancements.

Using psql to Restore Plain Text SQL Backups

A plain SQL backup is nothing more than a text file containing a huge SQL script. It's the least convenient of backups to have, but it's portable across different database systems. With a SQL backup, you must execute the entire script; there's no partial restore unless you're willing to manually edit the file. Since there are no options to worry about, these backups are simple to restore using the psql -f switch, as shown in Example 2-10. They are also useful if you need to load the data into another DBMS, with some editing.

Example 2-10. Restores plain text SQL backups

Restores a full backup and ignores errors:

psql -U postgres -f myglobals.sql

Restores and stops on first error:

psql -U postgres --set ON_ERROR_STOP=on -f myglobals.sql

Restores a partial backup to a specific database:

psql -U postgres -d mydb -f select_objects.sql

Using pg_restore

If you backed up using pg_dump, and specified a non-plain text backup format like tar, custom, or directory, you can use the versatile pg_restore utility for the restore. pg_restore provides you with a dizzying array of options for restoration and far surpasses any restoration utility found in other database systems. Here are some of its outstanding features:

  • As of 8.4, you can do parallel restores using the -j switch to control the number of threads to use. This allows each thread to restore a separate table simultaneously, which significantly speeds up restores.

  • You can generate a plain-text table of contents from your backup file to confirm what has been backed up. You can edit this table of contents and use the revised list to control which database objects are restored; see the sketch after this list.

  • Just as pg_dump allows you to do selective backups of objects to save time, pg_restore allows you to do selective restores even from a backup that contains a full database.

  • For the most part, pg_restore and pg_dump are backward compatible: you can back up a database on an older version and restore it using a newer version.
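A rough sketch of the table-of-contents workflow mentioned in the list above (file names are arbitrary): dump the listing, comment out the entries you don't want by prefixing them with a semicolon, then feed the edited list back in.

pg_restore -l mydb.backup > mydb_toc.txt
# edit mydb_toc.txt; lines starting with ; are skipped during restore
pg_restore -L mydb_toc.txt --dbname=mydb --jobs=4 mydb.backup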

Refer to Database Backup: pg_restore for a listing of pg_restore command options.

For a basic restore of a compressed or TAR backup, first create the database in SQL:

CREATE DATABASE mydb;

and then restore:

pg_restore --dbname=mydb --jobs=4 --verbose mydb.backup

If you want the restored database to have the same name as the one you backed up, you can create and restore it in one step with the following:

pg_restore --dbname=postgres --create --jobs=4 --verbose mydb.backup

Note

If you use the --create switch, the --dbname switch needs to be different from the database being created, since you can’t really run anything within the context of a database that has yet to be created. The downside of using --create is that the database name is always the name of the one you backed up and you can’t change it during the restore.

If you are running 9.2, you can take advantage of the --section switch to restore just the table structure without the actual data. This is useful if you want to use an existing database as a template for a new one. To do so, we would first create the target database using psql or pgAdmin:

CREATE DATABASE mydb2;

and then use pg_restore:

pg_restore --dbname=mydb2 --section=pre-data --jobs=4 mydb.backup

Managing Disk Space with Tablespaces

PostgreSQL uses tablespaces to ascribe logical names to physical locations on disk. Initializing a PostgreSQL cluster automatically begets two tablespaces: pg_default, which stores all user data, and pg_global, which stores all system data. These are located in the same folder as your default data cluster. You're free to create tablespaces at will and house them on any server disk. You can explicitly assign default tablespaces for new objects by database, and you can also move existing database objects to new tablespaces.

Creating Tablespaces

To create a tablespace, you just need to denote a logical name and a physical folder. The postgres service account needs to have full access to this folder. If you are on a Windows server, use the following command (note the use of Unix-style slashes):

CREATE TABLESPACE secondary LOCATION 'C:/pgdata91_secondary';

For Unix-based systems, you first have to create the folder (or define an fstab mount location) and then use this command:

CREATE TABLESPACE secondary LOCATION '/usr/data/pgdata91_secondary';
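Once a tablespace exists, you can also make it the default home for new objects in a particular database, as mentioned at the start of this section; a minimal sketch:

ALTER DATABASE mydb SET default_tablespace = 'secondary';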

Moving Objects Between Tablespaces

You can shuffle database objects among different tablespaces. To move all objects in the database to our secondary tablespace:

ALTER DATABASE mydb SET TABLESPACE secondary;

To move just a table:

ALTER TABLE mytable SET TABLESPACE secondary;

Note

Moving a table to another tablespace locks it for the duration of the move.

Verboten

We have seen so many ways that people manage to break their PostgreSQL server that we thought it best to end this chapter by itemizing the most common mistakes. For starters, if you don't know what you did wrong, the log file could provide clues. Look for the pg_log folder in your PostgreSQL data folder, or in the root of the PostgreSQL data folder, for the log files. It's also quite possible that your server shut down before a log entry could be written, in which case the log won't help you. Should your server fail to restart, try starting it from the command line:

path/to/your/bin/pg_ctl -D your_postgresql_data_folder start

Deleting PostgreSQL Core System Files and Binaries

When people run out of disk space, the first thing they do is panic and start deleting files from the PostgreSQL data cluster folder because it’s so big. Part of the reason why this mistake happens so frequently is that some folders such as pg_log, pg_xlog, and pg_clog sound like logging folders that you expect to build up and be safe to delete. There are some files you can safely delete, and some that will destroy your data if you do.

The pg_log folder, often found in your data folder, tends to build up files, especially if you have logging enabled. Files in this folder can always be safely deleted. In fact, many people just schedule jobs to delete them.

Files in the other folders should never be deleted, even if they sound like logs, with the partial exception of pg_xlog, discussed next. In particular, don't even think of touching pg_clog, the active commit log, unless you want to get into serious trouble.

pg_xlog stores transaction logs. Some systems we've seen are configured to move processed transaction logs into a subfolder called archive. You'll often have an archive folder somewhere (not necessarily as a subfolder of pg_xlog) if you are running synchronous replication, doing continuous archiving, or just keeping logs around in case you need to revert to a different point in time. Deleting files in the root of pg_xlog will destroy data; deleting files in the archive folder, however, will just prevent you from performing point-in-time recovery or, if a slave server hasn't yet played back the logs, prevent it from fetching them. If you aren't concerned about any of these scenarios, then it's safe to delete or move files in the archive folder.

Be wary of overzealous antivirus programs, especially on Windows. We've seen cases where AV software removed important binaries in the PostgreSQL bin folder. Should PostgreSQL fail to start on a Windows system, the event viewer is the first place to look for clues as to why.

Giving Full Administrative Rights to the Postgres System (Daemon) Account

Many people are under the misconception that the postgres account needs to have full administrative rights to the server. In fact, depending on your PostgreSQL version, if you give the postgres account full administrative rights to the server, your database server may not even start.

The postgres system account should always be created as a regular system user in the OS with just rights to the data cluster and additional tablespace folders. Most installers will set up the correct permissions for postgres. Don’t try to do postgres any favors by giving it more rights than it needs. Granting unnecessary rights leaves your system vulnerable should you fall under an SQL injection attack. There are cases where you’ll need to give the postgres account write/delete/read rights to folders or executables outside of the data cluster. With scheduled jobs that execute batch files, this need often arises. We advise you to practice restraint and only grant the minimum rights necessary to get the job done.

Setting shared_buffers Too High

Loading up your server with RAM doesn't mean you can set shared_buffers as high as you'd like. Try it and your server may crash or refuse to start. If you are running PostgreSQL on 32-bit Windows, setting it higher than 512 MB often results in instability. With 64-bit PostgreSQL on Windows, you can push the envelope a bit higher and even exceed 1 GB without any issues. On some Linux systems, the kernel SHMMAX setting is low and shared_buffers can't be set higher than it allows. Details on how to remedy this are in the manual, in the section Kernel Resources.

Trying to Start PostgreSQL on a Port Already in Use

If you do this, you'll see errors in your pg_log files telling you that the port is already in use and asking whether another postmaster is already running. Here are the common reasons why this happens:

  • You’ve already started postgres service.

  • You are trying to run it on a port already in use by another service.

  • Your postgres service had a sudden shutdown and you have an orphaned postmaster.pid file in the data folder. Just delete the file and try to start again.

  • You have an orphaned PostgreSQL process. When all else fails, kill all running PostgreSQL processes and then start again.
