This chapter will cover what we feel are the most common activities for basic administration of a PostgreSQL server; namely: role management, database creation, add-on installation, backup, and restore. We’ll assume you’ve already installed PostgreSQL and have one of the administration tools at your disposal.
Three main configuration files control basic operations of a PostgreSQL server instance. These files are all located in the default PostgreSQL data folder. You can edit them using your text editor of choice, or using the admin pack that comes with pgAdmin (Editing postgresql.conf and pg_hba.conf from pgAdmin).
postgresql.conf controls general settings, such as how much memory to allocate, default storage location for new databases, which IPs PostgreSQL listens on, where logs are stored, and so forth.
pg_hba.conf controls security. It manages access to the server, dictating which users can login into which databases, which IPs or groups of IPs are permitted to connect and the authentication scheme expected.
pg_ident.conf is the mapping file that maps an authenticated OS login to a PostgreSQL user. This file is used less often, but allows you to map a server account to a PostgreSQL account. For example, people sometimes map the OS root account to the postgres superuser account. Each authentication line in pg_hba.conf can reference a different mapping defined in pg_ident.conf.
If you are ever unsure where these files are located, run the Example 2-1 query as a super user while connected to any of your databases.
Example 2-1. Location of configuration files
SELECT name, setting FROM pg_settings WHERE category = 'File Locations';
       name        |           setting
-------------------+-----------------------------
 config_file       | E:/PGData91/postgresql.conf
 data_directory    | E:/PGData91
 external_pid_file |
 hba_file          | E:/PGData91/pg_hba.conf
 ident_file        | E:/PGData91/pg_ident.conf
postgresql.conf controls the core settings of the PostgreSQL server instance as well as default settings for new databases. Many settings, such as sorting memory, can be overridden at the database, user, session, and even function levels for PostgreSQL versions higher than 8.3.
Details on how to tune this can be found at Tuning Your PostgreSQL Server.
An easy way to check the current settings you have is to query the pg_settings view, as we demonstrate in Example 2-2. Details of the various columns of information and what they mean are described in pg_settings.
Example 2-2. Key Settings
SELECT name, context, unit, setting, boot_val, reset_val
FROM pg_settings
WHERE name IN ('listen_addresses', 'max_connections', 'shared_buffers',
  'effective_cache_size', 'work_mem', 'maintenance_work_mem')
ORDER BY context, name;
         name         |  context   | unit | setting | boot_val  | reset_val
----------------------+------------+------+---------+-----------+-----------
 listen_addresses     | postmaster |      | *       | localhost | *
 max_connections      | postmaster |      | 100     | 100       | 100
 shared_buffers       | postmaster | 8kB  | 4096    | 1024      | 4096
 effective_cache_size | user       | 8kB  | 16384   | 16384     | 16384
 maintenance_work_mem | user       | kB   | 16384   | 16384     | 16384
 work_mem             | user       | kB   | 1024    | 1024      | 1024
If context is set to postmaster, changing this parameter requires a restart of the PostgreSQL service. If context is set to user, changes require only a reload to take effect globally. Furthermore, user context settings can be overridden at the database, user, session, or function levels.
unit tells you the unit of measurement that the setting is reported in. This is very important for memory settings since, as you can see, some are reported in units of 8 kB and some in kB. In postgresql.conf, you usually set these explicitly in a unit of measurement you want to record in, such as 128 MB. You can also get a more human-readable display of a setting by running the statement SHOW effective_cache_size;, which gives you 128 MB, or SHOW maintenance_work_mem;, which gives you 16 MB for this particular case. If you want to see everything in friendly units, use SHOW ALL.
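For instance, here is how you might check a few of these settings from psql; the values shown in the comments are illustrative and depend entirely on your own configuration:

```sql
SHOW effective_cache_size;   -- reported in friendly units, e.g. 128MB
SHOW maintenance_work_mem;   -- e.g. 16MB
SHOW ALL;                    -- every setting, each in friendly units
```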
setting is the currently running setting in effect; boot_val is the default setting; reset_val is the new value if you were to restart or reload. After any change you make to postgresql.conf, you want to make sure that setting and reset_val are the same. If they are not, it means you still need to do a reload.
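A quick way to spot changed settings that haven't taken effect yet is to compare the two columns directly, using the same pg_settings view:

```sql
-- Settings whose running value differs from reset_val still need a reload
SELECT name, setting, reset_val
FROM pg_settings
WHERE setting <> reset_val;
```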
We point out the following parameters as ones you should pay attention to in postgresql.conf. Changing their values requires a service restart:
listen_addresses tells PostgreSQL which IPs to listen on. This usually defaults to localhost, but many people change it to *, meaning all available IPs.
port defaults to 5432. Again, this is often set in a different file in some distributions, which overrides this setting. For instance, if you are on Red Hat or CentOS, you can override the setting by setting a PGPORT value in /etc/sysconfig/pgsql/your_service_name_here.
max_connections is the maximum number of concurrent connections allowed.
shared_buffers defines the amount of memory shared across all connections to store recently accessed pages. This setting has the most effect on query performance. You want this to be fairly high, probably at least 25% of your on-board memory.
The following three settings are important, too, and take effect without requiring a restart, but require at least a reload, as described in Reload the Configuration Files.
effective_cache_size is an estimate of how much memory you expect to be available in the OS and PostgreSQL buffer caches. It has no effect on actual allocation; it is used only by the PostgreSQL query planner to figure out whether plans under consideration would fit in RAM. If it's set too low, indexes may be underutilized. If you have a dedicated PostgreSQL server, setting this to half or more of your on-board memory would be a good start.
work_mem controls the maximum amount of memory allocated for each operation, such as sorting and hash joins. The optimal setting really depends on the kind of work you do, how much memory you have, and whether your server is a dedicated database server. If you have many users connecting but fairly simple queries, you want this to be relatively low. If you do lots of intensive processing, like building a data warehouse, but have few users, you want this to be high. How high you set it also depends on how much motherboard memory you have. A good article to read on the pros and cons of setting work_mem is Understanding postgresql.conf work_mem.
maintenance_work_mem is the total memory allocated for housekeeping activities like vacuuming (getting rid of dead records). This shouldn't be set higher than about 1 GB.
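Pulling these together, a postgresql.conf fragment for a hypothetical dedicated server with 8 GB of RAM might look like the following. The values are illustrative starting points only, not recommendations for your hardware:

```
listen_addresses = '*'          # listen on all available IPs
max_connections = 100
shared_buffers = 2GB            # roughly 25% of on-board memory
effective_cache_size = 4GB      # about half of on-board memory
work_mem = 16MB                 # per sort/hash operation, per connection
maintenance_work_mem = 512MB    # for vacuum and other housekeeping
```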
The above settings can also be set at the database, function, or user level. For example, you might want to set work_mem higher for a power user who runs sophisticated queries. Similarly, if you have a sort-intensive function, you could raise work_mem just for it.
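As a sketch of both overrides, assuming a role named leo and a function named report_builder() already exist (both names are hypothetical):

```sql
-- Raise work_mem for one power user; applies to his new sessions
ALTER ROLE leo SET work_mem = '64MB';

-- Raise work_mem for a single sort-intensive function
ALTER FUNCTION report_builder() SET work_mem = '128MB';
```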
The easiest way to figure out what you did wrong is to look at the log file, which is located in the root of the data folder, or in the subfolder pg_log. Open up the latest file and read what the last line says. The error notice is usually self-explanatory.
A common culprit is setting shared_buffers too high. Another common cause of failure is an old postmaster.pid left over from a failed shutdown. You can safely delete this file, which is located in the data cluster folder, and try to restart again.
The pg_hba.conf file controls which users can connect to PostgreSQL databases and how. Changes to pg_hba.conf require a reload or a server restart to take effect. A typical pg_hba.conf looks like this:
# TYPE   DATABASE     USER      ADDRESS          METHOD
# IPv4 local connections:
host     all          all       127.0.0.1/32     ident
# IPv6 local connections:
host     all          all       ::1/128          trust
host     all          all       192.168.54.0/24  md5
hostssl  all          all       0.0.0.0/0        md5
# Allow replication connections from localhost, by a user with the
# replication privilege.
#host    replication  postgres  127.0.0.1/32     trust
#host    replication  postgres  ::1/128          trust
Authentication method. ident, trust, md5, and password are the most common and are always available. PostgreSQL 9.1 introduced the additional peer authentication method. The ident and peer options are only available on Linux/Unix-like systems (not Windows). Others, such as gss, radius, ldap, and pam, may not always be installed.
IPv4 syntax for defining a network range. The first part, in this case 192.168.54.0, is the network address; the /24 is the bit mask. In this example, we are allowing anyone in our 192.168.54.0 subnet to connect as long as they provide a valid md5-hashed password.
IPv6 syntax for defining localhost. This only applies to servers with IPv6 support; including the line on a server without IPv6 may prevent the configuration file from loading. For example, on a Windows XP or Windows 2003 machine, you shouldn't have this line.
Users must connect through SSL. In our example, we allow anyone to connect to our server as long as they connect using SSL and have a valid md5-encrypted password.
Defines a range of IPs allowed to replicate with this server. Replication entries like this are new in PostgreSQL 9.0. In this example, we have the lines commented out.
For each connection request, the postgres service checks the pg_hba.conf file in order from the top down. Once a matching rule is encountered, processing stops and that rule decides how the connection is handled. Should the end of the file be reached without any matching rule, the connection is denied. A common mistake people make is to not put the rules in order. For example, if you put 0.0.0.0/0 reject before 127.0.0.1/32 trust, local users won't be able to connect, even though you have a rule allowing them to do so.
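For example, this hypothetical fragment allows local connections while rejecting everyone else, because the specific trust rule is matched before the broad reject rule:

```
# Order matters: the specific allow rule must come before the broad reject
host  all  all  127.0.0.1/32  trust
host  all  all  0.0.0.0/0     reject
```

Reversing the two lines would lock out local users as well, since the reject rule would match them first.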
This occurs quite frequently, but it's easily recoverable. The error is generally caused by typos or by specifying an unavailable authentication scheme. When the postgres service can't parse the pg_hba.conf file, it'll block all access or won't even start up. The easiest way to figure out what you did wrong is to read the log file, located in the root of the data folder or in the subfolder pg_log. Open up the latest file and read the last line; the error message is usually self-explanatory. If you're prone to slippery fingers, consider backing up the file prior to editing.
PostgreSQL has many methods for authenticating users, probably more than any other database. Most people stick with the four main ones: trust, ident, md5, and password. There is also a fifth: reject, which performs an immediate deny.
Authentication methods stipulated in pg_hba.conf serve as gatekeepers to the entire server. Users or devices must still satisfy individual role and database access restrictions after connecting.
We list the most commonly used authentication methods below. For more information on the various authentication methods, refer to PostgreSQL Client Authentication.
trust is the least secure of the authentication schemes and means you allow people to state who they are and don't care about the passwords, if any, presented. As long as they meet the IP, user, and database criteria, they can connect. You really should use this only for local connections or private network connections. Even then it's possible to have IPs spoofed, so the more security-minded among us discourage its use entirely. Nevertheless, it's the most common for PostgreSQL installed on a desktop for single-user local access where security is not as much of a concern. The user name defaults to the logged-in user if not specified.
md5 is the most common and means an md5-encrypted password is required.
password means clear text password authentication.
ident uses the pg_ident.conf to see if the OS account of the user trying to connect has a mapping to a PostgreSQL account. Password is not checked.
You can have multiple authentication methods, even for the same database; just keep in mind the top to bottom checking of pg_hba.conf.
Many, but not all, changes to configuration files require restarting the postgres service; many take effect with just a reload of the configuration. Reloading doesn't affect active connections. Open up a command line and run this command to reload:
pg_ctl reload -D your_data_directory_here
If you have PostgreSQL installed as a service in Red Hat EL or CentOS, you can instead do:
service postgresql-9.1 reload

where postgresql-9.1 is the name of your service.
You can also log in as a super user on any database and run this SQL statement:
SELECT pg_reload_conf();
You can also do this from pgAdmin, refer to Editing postgresql.conf and pg_hba.conf from pgAdmin.
In PostgreSQL, there is really only one kind of account, and that is a role. Some roles can log in; when they have login rights, they are called users. Roles can be members of other roles, and when we have this kind of relationship, the containing roles are called groups. It wasn't always this way, though: pre-8.0, users and groups were distinct entities, but the model was changed to be role-centric to better conform to the ANSI SQL specs.
For backward compatibility, there are still CREATE USER and CREATE GROUP commands. For the rest of this discussion, we'll be using the more generic CREATE ROLE, which is used to create both users and groups.
If you look at fairly ANSI SQL standard databases such as Oracle and later versions of SQL Server, you'll notice they also have a CREATE ROLE statement, which works similarly to the PostgreSQL one.
postgres is an account that is created when you first initialize the PostgreSQL data cluster. It has a companion database, also called postgres. Before you do anything else, you should log in as this user via psql or pgAdmin and create other users. pgAdmin has a graphical section for creating user roles, but if you were to do it using standard SQL data control language (DCL), you would execute an SQL command as shown in Example 2-3.
Example 2-3. User with login rights that can create database objects
CREATE ROLE leo LOGIN PASSWORD 'lion!king' CREATEDB VALID UNTIL 'infinity';
The 'infinity' is optional and assumed if not specified. You could instead put in a valid date at which you want the account to expire.
If you wanted to create a user with superuser rights, meaning one who can cause major destruction to your database cluster and can create what we call untrusted language functions, you would create such a user as shown in Example 2-4. You can only create a superuser if you are a superuser yourself.
Example 2-4. User with login rights and superuser privileges
CREATE ROLE regina LOGIN PASSWORD 'queen!penultimate' SUPERUSER VALID UNTIL '2020-10-20 23:00';
As you can see, we don’t really want our queen to reign forever, so we put in a timestamp when her account will expire.
Group roles are generally roles that have no login rights but have other roles as members. This is merely a convention. There is nothing stopping you from creating a role that can both login and can contain other roles.
We can create a group role with this SQL DCL statement:
CREATE ROLE jungle INHERIT;
And add a user or other group role to the group with this statement:
GRANT jungle TO leo;
One quirky thing about PostgreSQL is the ability to define a group role that doesn't allow its member roles to inherit its rights. When you define a role that will have member roles, you can designate that members of this role don't inherit the rights of the role itself. This feature causes much confusion and frustration when setting up groups, as people often forget to make sure the group role is marked to allow its permissions to be inheritable.
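A short sketch of the difference, using hypothetical role names:

```sql
-- Members of jungle inherit its rights automatically
CREATE ROLE jungle INHERIT;

-- Members of safari do NOT automatically get its rights
CREATE ROLE safari NOINHERIT;
GRANT safari TO leo;

-- leo only exercises safari's rights after explicitly switching:
SET ROLE safari;
```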
The simplest create database statement to write is:
CREATE DATABASE mydb;
The owner of the new database will be the logged-in user, and the database will be a copy of the template1 database.
A template database is, as the name suggests, a database that serves as a template for other databases. In actuality, you can use any database as a template for another, but PostgreSQL allows you to specifically flag certain databases as templates. The main difference is that a database marked as a template can't be deleted and can be used by any user having CREATEDB rights (not just superusers) as a template for their new database. More details about template databases are described in the PostgreSQL manual section Managing Template Databases.
The template1 database, which is used as the default when no template is specified, doesn't allow you to change encodings. As such, if you want to create a database with an encoding and collation different from your default, or if you installed extensions in template1 that you don't want in this database, you may want to use template0 instead.
CREATE DATABASE mydb TEMPLATE template0;
If we wanted to make our new database a template, we would run this SQL statement as a super user:
UPDATE pg_database SET datistemplate=true WHERE datname='mydb';
This would allow other users with CREATEDB rights to use mydb as a template. It will also prevent the database from being deleted.
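Should you later need to drop such a database, you must first unmark it as a template, again running as a superuser:

```sql
-- A template database can't be dropped until datistemplate is cleared
UPDATE pg_database SET datistemplate = false WHERE datname = 'mydb';
DROP DATABASE mydb;
```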
Schemas are a logical way of partitioning your database into mini-containers. You can divide schemas by functionality, by users, or by any other attribute. Aside from logical partitioning, they provide an easy way of doling out rights. One common practice is to install all contribs and extensions, covered in Extensions and Contribs, into a separate schema and give rights on it to all users of a database.
To create a schema called contrib in a database, we connect to the database and run this SQL:
CREATE SCHEMA contrib;
The default search_path defined in postgresql.conf is "$user",public. This means that if there is a schema with the same name as the logged-in user, then all non-schema-qualified object references will first check that schema and then the public schema. You can override this behavior at the user level or the database level. For example, if we wanted all objects in contrib to be accessible without schema qualification, we would change our database as follows:
ALTER DATABASE mydb
SET search_path="$user",public,contrib;
The search_path change will not take effect for existing connections. You'll need to reconnect to your database to experience the change.
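The same override can be made at the user level instead. A sketch, assuming a role named leo (a hypothetical name):

```sql
-- Per-role search_path; applies to leo's new sessions
ALTER ROLE leo SET search_path = "$user", public, contrib;

-- Verify what the current session is actually using
SHOW search_path;
```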
Schemas are also used for simple abstraction. A table name only needs to be unique within its schema, so many applications exploit this by creating identically named tables in different schemas; depending on who logs in, users get their own version of each table based on which schema is first in their search path.
Permissions are one of the trickiest things to get right in PostgreSQL; this is one feature we find more difficult to work with than in other databases. Permission management became a lot easier with the advent of PostgreSQL 9.0, which introduced default privileges: the ability to set permissions on all objects of a particular schema or database, as well as on specific types of objects. Permissions management is detailed in the manual sections ALTER DEFAULT PRIVILEGES and GRANT.
Getting back to our contrib schema, let's suppose we want all users of our database to have EXECUTE and SELECT access to any tables and functions we will create in the contrib schema. We can define permissions as shown in Example 2-5:
Example 2-5. Defining default permissions on a schema
GRANT USAGE ON SCHEMA contrib TO public;
ALTER DEFAULT PRIVILEGES IN SCHEMA contrib GRANT SELECT, REFERENCES ON TABLES TO public;
ALTER DEFAULT PRIVILEGES IN SCHEMA contrib GRANT SELECT, UPDATE ON SEQUENCES TO public;
ALTER DEFAULT PRIVILEGES IN SCHEMA contrib GRANT EXECUTE ON FUNCTIONS TO public;
ALTER DEFAULT PRIVILEGES IN SCHEMA contrib GRANT USAGE ON TYPES TO public;
If you already have your schema set up with all the tables and functions, you can retroactively set permissions on each object separately, or do this for all existing tables, functions, and sequences with GRANT .. ALL .. IN SCHEMA statements.
Example 2-6. Set permissions on existing objects of a type in a schema
GRANT USAGE ON SCHEMA contrib TO public;
GRANT SELECT, REFERENCES, TRIGGER ON ALL TABLES IN SCHEMA contrib TO public;
GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA contrib TO public;
GRANT SELECT, UPDATE ON ALL SEQUENCES IN SCHEMA contrib TO public;
If you find this all overwhelming for setting permissions, just use pgAdmin for permission management. pgAdmin provides a great interface for setting default permissions, as well as retroactively granting bulk permissions of selective objects. We’ll cover this feature in Creating Databases and Setting Permissions.
Extensions and contribs are add-ons that you can install in a PostgreSQL database to extend functionality beyond the base offerings. They exemplify the best feature of open source software: people collaborating, building, and freely sharing new features. Prior to PostgreSQL 9.1, the add-ons were called contribs. Since PostgreSQL 9.1, add-ons are easily installed using the new PostgreSQL extension model, and the term extension has come to replace the term contrib. For the sake of consistency, we'll be referring to all of them by the newer name of extension, even those that can't be installed using the newer extension model.
The first thing to know about extensions is that they are installed separately in each database. You can have one database with the fuzzy text support extension and another that doesn’t. If you want all your databases to have a certain set of extensions installed in a specific schema, you can set up a template database as discussed in Creating and Using a Template Database with all these installed, and then create all your databases using that template.
To see which extensions you have already installed, run the query in Example 2-7:
Example 2-7. List extensions installed
SELECT * FROM pg_available_extensions WHERE comment LIKE '%string%' OR installed_version IS NOT NULL ORDER BY name;
     name      | default_version | installed_version |            comment
---------------+-----------------+-------------------+---------------------------------
 citext        | 1.0             |                   | data type for case-insen..
 fuzzystrmatch | 1.0             | 1.0               | determine simil.. and dist..
 hstore        | 1.0             | 1.0               | data type for .. (key, value) ..
 pg_trgm       | 1.0             | 1.0               | text similarity measur..index sear..
 plpgsql       | 1.0             | 1.0               | PL/pgSQL procedural language
 postgis       | 2.0.0           | 2.0.0             | geometry, geography,..raster ..
 temporal      | 0.7.1           | 0.7.1             | temporal data type ..
To get details about a particular installed extension, enter the following command from psql:
\dx+ fuzzystrmatch
Or run this query:
SELECT pg_catalog.pg_describe_object(D.classid, D.objid, 0) AS description
FROM pg_catalog.pg_depend AS D
INNER JOIN pg_extension AS E ON D.refobjid = E.oid
WHERE D.refclassid = 'pg_catalog.pg_extension'::pg_catalog.regclass
  AND deptype = 'e'
  AND E.extname = 'fuzzystrmatch';
Which outputs what is packaged in the extension:
                                description
----------------------------------------------------------------------------
 function dmetaphone_alt(text)
 function dmetaphone(text)
 function difference(text,text)
 function text_soundex(text)
 function soundex(text)
 function metaphone(text,integer)
 function levenshtein_less_equal(text,text,integer,integer,integer,integer)
 function levenshtein_less_equal(text,text,integer)
 function levenshtein(text,text,integer,integer,integer)
 function levenshtein(text,text)
Regardless of how you install an extension in your database, you’ll need to have gathered all the dependent libraries in your PostgreSQL bin and lib, or have them accessible via your system path. For small extensions, most of these libraries already come prepackaged with your PostgreSQL install so you don’t have to worry. For others, you’ll either need to compile your own, get them with a separate install, or copy the files from another equivalent setup.
Prior to PostgreSQL 9.1, the only way to install an extension was to manually run the requisite SQL scripts in your database. Many extensions still can only be installed this way.
By convention, add-on scripts are automatically dumped into the contrib folder of your PostgreSQL install if you use an installer. Where you'd find this folder will depend on your particular OS and distro. As an example, on CentOS running 9.0, to install the pgAdmin admin pack, one would run the following from the command line:
psql -p 5432 -d postgres
-f /usr/pgsql-9.0/share/contrib/adminpack.sql
With PostgreSQL 9.1 and above, you can use the CREATE EXTENSION command. The two big benefits are that you don't have to figure out where the extension files are kept (they are kept in the folder share/extension), and you can uninstall just as easily with DROP EXTENSION. Most of the common extensions are packaged with PostgreSQL already, so you really don't need to do more than run the command. To retrieve extensions not packaged with PostgreSQL, visit the PostgreSQL Extension Network. Once you have downloaded, compiled, and installed the new extension (install just copies the scripts and .control files to share/extension, and the respective binaries to bin and lib), run CREATE EXTENSION to install it in a specific database. This new way no longer requires psql, since CREATE EXTENSION is part of PostgreSQL's SQL language. Just connect to the database you want to install the extension in and run the SQL command. Here is how we would install the fuzzystrmatch extension in PostgreSQL 9.1+:
CREATE EXTENSION fuzzystrmatch;
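Uninstalling is just as simple:

```sql
-- Removes the extension; adding CASCADE would also drop dependent objects
DROP EXTENSION fuzzystrmatch;
```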
If you wanted all your extensions installed in a schema called my_extensions, you would first create the schema and then install each extension into it:
CREATE EXTENSION fuzzystrmatch SCHEMA my_extensions;
If you've been using a version of PostgreSQL before 9.1 and restored your old database into 9.1 during a version upgrade, all add-ons should continue to work untouched. For maintainability, you'll probably want to upgrade your old contrib-folder installs to use the new extension approach. Many extensions, especially the ones that come packaged with PostgreSQL, have the ability to upgrade pre-extension installs. Let's suppose you had installed the tablefunc extension (which provides cross-tabulation functions) in your PostgreSQL 9.0 in a schema called contrib, and you've just restored your database to a PostgreSQL 9.1 server. Run the following command to upgrade the extension:
CREATE EXTENSION tablefunc SCHEMA contrib FROM unpackaged;
You'll notice that the old functions are still in the contrib schema, but moving forward they will no longer be backed up individually; your backups will just have a CREATE EXTENSION .. clause.
Many extensions come packaged with PostgreSQL, but are not installed by default. Some past extensions have gained enough traction to become part of the PostgreSQL core, so if you’re upgrading from an ancient version, you may not even have to worry about extensions.
Prior to PostgreSQL 8.3, the following extensions weren’t part of core:
PL/pgSQL wasn't always installed by default in every database. In old versions, you had to run CREATE LANGUAGE plpgsql; in your database. From around 8.3 on, it's installed by default, but you retain the option of uninstalling it.
tsearch is a suite for supporting full-text searches by adding indexes, operators, custom dictionaries, and functions. It became part of PostgreSQL core in 8.3, and you don't have the option to uninstall it. If you're still relying on the old behavior, you can install the tsearch2 extension, which retains old functions that are no longer available in the newer version. A better approach would be just to update the places where you're using those functions, because compatibility with the old tsearch could end at any time.
xml is an extension that adds support for the XML data type and related functions and operators. As of version 8.3, XML became an integral part of PostgreSQL, in part to meet the ANSI SQL XML standard. The old extension, now dubbed xml2, can still be installed and contains functions that didn't make it into the core. In particular, you need this extension if you relied on the xslt_process() function for processing XSL templates. There are also a couple of old XPath functions not found in the core.
In this section, we'll list and quickly describe the most popular, and some might say must-have, extensions that aren't part of the current core.
postgis elevates PostgreSQL to a state-of-the-art spatial database outrivaling all commercial options. If you deal with standard OGC GIS data, demographic statistics data, or geocoding, you don’t want to be without this one. You can learn more about PostGIS in our book, PostGIS in Action. Part of the book’s proceeds will help fund the PostGIS project itself. PostGIS is a whopper of an extension, weighing in at over 800 functions, types, and spatial indexes.
fuzzystrmatch is a lightweight extension with functions like soundex, levenshtein, and metaphone for fuzzy string matching. We discuss its use in Where is Soundex and Other Warm and Fuzzy Things.
hstore is an extension that adds key-value pair storage and index support well-suited for storing pseudo-normalized data. If you are looking for a comfortable medium between relational and NoSQL, check out hstore.
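A quick taste of hstore; the key-value pairs here are made up for illustration:

```sql
CREATE EXTENSION hstore;

-- The -> operator pulls a single value out of an hstore
SELECT 'color=>blue, size=>L'::hstore -> 'color';  -- returns blue
```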
pg_trgm (trigram) is another fuzzy string search library, often used in conjunction with fuzzystrmatch. In PostgreSQL 9.1, it takes on another special role: it makes ILIKE searches indexable by creating a trigram index. Trigram indexes can also support wildcard searches of the form LIKE '%something%'. Refer to Teaching ILIKE and LIKE New Tricks for further discussion.
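A minimal sketch of a trigram index, assuming a hypothetical table named users with a text column name:

```sql
CREATE EXTENSION pg_trgm;

-- A GIN trigram index lets the planner use an index for ILIKE and
-- LIKE '%..%' searches instead of a sequential scan
CREATE INDEX idx_users_name_trgm ON users USING gin (name gin_trgm_ops);

SELECT * FROM users WHERE name ILIKE '%ann%';
```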
dblink is a module that allows you to query other PostgreSQL databases. This is currently the only supported mechanism of cross-database interaction for PostgreSQL. In PostgreSQL 9.3, foreign data wrapper for PostgreSQL is expected to hit the scene.
pgcrypto provides various encryption tools including the popular PGP. We have a quick primer on using it available here: Encrypting Data with pgcrypto.
As of 9.1, less used procedural languages (PLs), index types, and foreign data wrappers (FDW) are also packaged as extensions.
PostgreSQL comes with two utilities for backup: pg_dump and pg_dumpall. You'll find both in the bin folder. You use pg_dump to back up specific databases, and pg_dumpall to back up all databases and server globals. pg_dumpall needs to run under a postgres superuser account so it has access to back up all databases. You will notice that most of the options for these tools have both long names and equivalent short switches; you can use them interchangeably, even in the same command. We'll be covering just the basics here, but for a more in-depth discussion, refer to the PostgreSQL Backup and Restore section of the official manual.
We often specify the port and host in these commands because we often run them via scheduled jobs from a different machine, or we have several instances of PostgreSQL running on the same box, each on a different port. Specifying the -h or --host switch may cause problems if your service is set to listen only on local connections. You can safely leave it out if you are running the command from the server itself.
You may also want to employ a ~/.pgpass file, since none of these command lines give you the option of specifying a password.
For day-to-day backup, pg_dump is generally more expeditious than pg_dumpall because it can selectively back up tables, schemas, and databases. pg_dump can back up to plain SQL as well as compressed and TAR formats. Compressed and TAR backups can take advantage of the parallel restore feature introduced in 8.4. Refer to Database Backup: pg_dump for a listing of pg_dump command options.
In this example, we’ll show a few common backup scenarios and corresponding pg_dump switches. These examples should work for any version of PostgreSQL.
Example 2-8. pg_dump usage
Creates a compressed, single database backup:
pg_dump -h localhost -p 5432 -U someuser -F c -b -v -f mydb.backup mydb
Creates a plain-text single database backup, including a CREATE DATABASE statement:
pg_dump -h localhost -p 5432 -U someuser -C -F p -b -v -f mydb.backup mydb
Creates a compressed backup of tables with a name that starts with payments in any schema:
pg_dump -h localhost -p 5432 -U someuser -F c -b -v -t "*.payments*" -f payment_tables.backup mydb
Creates a compressed backup of all objects in hr and payroll schemas:
pg_dump -h localhost -p 5432 -U someuser -F c -b -v -n hr -n payroll -f hr_payroll_schemas.backup mydb
Creates a compressed backup of all objects in all schemas, excluding the public schema:
pg_dump -h localhost -p 5432 -U someuser -F c -b -v -N public -f all_schema_except_public.backup mydb
Creates a plain-text SQL backup of select tables, useful for porting to lower versions of PostgreSQL or other database systems:
pg_dump -h localhost -p 5432 -U someuser -F p --column-inserts -f select_tables.backup mydb
If you have spaces in your file paths, you'll want to wrap the file path in double quotes: "/path with spaces/mydb.backup". As a general rule, you can always use double quotes if you aren't sure.
The directory format option was introduced in PostgreSQL 9.1. This option backs up each table as a separate file in a folder and gets around file system limitations on the size of a single file. It is the only pg_dump backup format that generates multiple files. An example of this is shown in Example 2-8. The directory backup first creates the directory to put the files in and errors out if the directory already exists.
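A directory-format backup uses the -F d switch; as a minimal sketch, assuming the same hypothetical someuser and mydb names used above (the target folder must not already exist):

```
pg_dump -h localhost -p 5432 -U someuser -F d -f /somepath/mydb_dir_backup mydb
```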
The pg_dumpall utility is what you would use to back up all databases into a single plain-text file, along with server globals such as tablespace definitions and users. Refer to Server Backup: pg_dumpall for a listing of available pg_dumpall command options.
It's a good idea to back up globals such as roles and tablespace definitions on a daily basis. Although you can use pg_dumpall to back up databases as well, we generally don't bother, or do it at most once a month, since restoring a plain-text backup of a large database takes much longer.
To backup roles and tablespaces:
pg_dumpall -h localhost -U postgres --port=5432 -f myglobals.sql --globals-only
If you only care about backing up roles and not tablespaces, use the roles-only option:
pg_dumpall -h localhost -U postgres --port=5432 -f myroles.sql --roles-only
There are two ways of restoring in PostgreSQL:
Using psql to restore plain text backups generated with pg_dumpall or pg_dump
Using the pg_restore utility to restore compressed, TAR, and directory backups created with pg_dump
Before you can perform a full drop and restore of a database, or restore a particular table that's in use, you'll need to kill connections. Every once in a while, someone (never you) will execute a query he or she didn't mean to and end up wasting resources. You could also run into a query that's taking much longer than you have the patience for. Should these things happen, you'll either want to cancel the query on the connection or kill the connection entirely. To cancel running queries or terminate connections, you enlist three administrative functions.
pg_stat_activity (SELECT * FROM pg_stat_activity;) is a view that lists currently active connections and their process IDs. It also provides details of the active query running on each connection, the connected user (usename), the database in use (datname), and the start time of the currently running query. You need this view to obtain the process IDs of the connections you wish to terminate.
pg_cancel_backend(procid) (SELECT pg_cancel_backend(procid);) will cancel all active queries on a connection, but doesn't terminate the connection itself.
pg_terminate_backend(procid) (SELECT pg_terminate_backend(procid);) will kill a specific connection; all its running queries automatically cancel. This will be your weapon of choice prior to a restore, to prevent an eager user from immediately restarting a cancelled query.
PostgreSQL, unlike some other databases, lets you embed functions that perform actions within a regular SELECT query. This means that although pg_terminate_backend() and pg_cancel_backend() can only act on one connection at a time, you can affect multiple connections by wrapping them in a SELECT. For example, suppose a user (Regina) was hogging resources with 100 open connections. We can kill all her connections by running this command:
Before 9.2:
SELECT pg_terminate_backend(procpid) FROM pg_stat_activity WHERE usename = 'regina';
9.2 and after:
SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE usename = 'regina';
pg_stat_activity has changed considerably in PostgreSQL 9.2, with columns renamed and new ones added. For example, procpid is now pid. The changes and enhancements are detailed in PostgreSQL 9.2 Monitoring Enhancements.
A plain SQL backup is nothing more than a text file containing a huge SQL script. It's the least convenient of backups to have, but it's portable across different database systems. With a SQL backup, you must execute the entire script; there's no partial restore unless you're willing to manually edit the file. Since there are no options, these backups are simple to restore using the -f psql switch, as shown in Example 2-10. They are also useful if you need to load data into another DBMS with some editing.
Example 2-10. Restores plain text SQL backups
Restores a full backup, ignoring errors:
psql -U postgres -f myglobals.sql
Restores and stops on first error:
psql -U postgres --set ON_ERROR_STOP=on -f myglobals.sql
Restores a partial backup to a specific database:
psql -U postgres -d mydb -f select_objects.sql
If you backed up using pg_dump and specified a non-plain-text format such as TAR, custom, or directory, you can use the versatile pg_restore utility for the restore. pg_restore provides a dizzying array of restore options and far surpasses any restoration utility found in other database systems. Here are some of its outstanding features:
As of 8.4, you can do parallel restores using the -j switch to control the number of threads. This allows each thread to restore a separate table simultaneously, which significantly speeds up restores.
You can generate a plain text table of contents from your backup file to confirm what has been backed up. You have the ability to edit this table of contents and use the revision to control which database objects will be restored.
Just as pg_dump allows you to do selective backups of objects to save time, pg_restore allows you to do selective restores even from a backup that contains a full database.
For the most part, pg_restore and pg_dump are backward compatible: you can back up a database on an older version and restore it using a newer version.
Refer to Database Backup: pg_restore for a listing of pg_restore command options.
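The table-of-contents workflow mentioned above can be sketched as follows, using the real -l and -L switches with hypothetical file names:

```
pg_restore -l mydb.backup > mydb.toc
# edit mydb.toc, commenting out the objects you don't want restored, then:
pg_restore -L mydb.toc --dbname=mydb mydb.backup
```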
A basic restore of a compressed or TAR backup is to first create the database in SQL:
CREATE DATABASE mydb;
and then restore:
pg_restore --dbname=mydb --jobs=4 --verbose mydb.backup
If the database is the same as the one you backed up, you can create and restore the database in one step with the following:
pg_restore --dbname=postgres --create --jobs=4 --verbose mydb.backup
If you use the --create switch, the --dbname switch needs to name a database different from the one being created, since you can't run anything within the context of a database that has yet to be created. The downside of using --create is that the database name is always the name of the one you backed up; you can't change it during the restore.
If you are running 9.2, you can take advantage of the --section switch to restore just the table structure without the actual data. This is useful if you want to use an existing database as a template for a new one. To do so, first create the target database using psql or pgAdmin:
CREATE DATABASE mydb2;
and then use pg_restore:
pg_restore --dbname=mydb2 --section=pre-data --jobs=4 mydb.backup
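The --section switch also accepts data and post-data, so as a sketch with the same hypothetical backup file, you could later load just the data into the structure created above:

```
pg_restore --dbname=mydb2 --section=data --jobs=4 mydb.backup
```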
PostgreSQL uses tablespaces to ascribe logical names to physical locations on disk. Initializing a PostgreSQL cluster automatically begets two tablespaces: pg_default, which stores all user data, and pg_global, which stores all system data. These are located in the same folder as your default data cluster. You're free to create tablespaces at will and house them on any server disk. You can explicitly assign default tablespaces for new objects by database, and you can move existing database objects to new ones.
To create a tablespace, you just need to denote a logical name and a physical folder. The postgres service account needs to have full access to this folder. If you are on a Windows server, use the following command (note the use of Unix-style slashes):
CREATE TABLESPACE secondary LOCATION 'C:/pgdata91_secondary';
For Unix-based systems, you first have to create the folder or define an fstab location, then use this command:
CREATE TABLESPACE secondary LOCATION '/usr/data/pgdata91_secondary';
You can shuffle database objects among different tablespaces. To move all objects in the database to our secondary tablespace:
ALTER DATABASE mydb SET TABLESPACE secondary;
To move just a table:
ALTER TABLE mytable SET TABLESPACE secondary;
Moving a table to another tablespace locks it for the duration of the move.
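You can also assign a default tablespace for new objects rather than moving existing ones. As a sketch, using the hypothetical secondary tablespace and mydb database from above:

```
ALTER DATABASE mydb SET default_tablespace = secondary;
```

New tables created in mydb without an explicit TABLESPACE clause will then land in secondary.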
We have seen so many ways that people manage to break their PostgreSQL server that we thought it best to end this chapter by itemizing the most common mistakes. For starters, if you don't know what you did wrong, the log files could provide clues. Look for the log files in the pg_log folder of your PostgreSQL data folder, or at the root of the data folder. It's also quite possible that your server shut down before a log entry could be written, in which case the log won't help you. Should your server fail to restart, try starting it from the command line:
path/to/your/bin/pg_ctl -D your_postgresql_data_folder
When people run out of disk space, the first thing they do is panic and start deleting files from the PostgreSQL data cluster folder because it's so big. Part of the reason this mistake happens so frequently is that some folders, such as pg_log, pg_xlog, and pg_clog, sound like logging folders that you'd expect to build up and be safe to delete. There are some files you can safely delete, and some that will destroy your data if you do.
The pg_log folder, often found in your data folder, tends to build up files, especially if you have logging enabled. Files in this folder can always be safely deleted; in fact, many people just schedule jobs to delete them.
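Such a cleanup job can be sketched as follows on a Unix system with GNU find; the pg_log location and the 7-day retention period are assumptions, and a throwaway directory stands in for the real path here:

```shell
# Stand-in for your real /path/to/data/pg_log folder
PG_LOG=$(mktemp -d)
touch -d '10 days ago' "$PG_LOG/postgresql-old.log"   # simulate an old log
touch "$PG_LOG/postgresql-new.log"                    # simulate a recent log
# The actual cleanup: delete log files older than 7 days
find "$PG_LOG" -name '*.log' -mtime +7 -delete
ls "$PG_LOG"
```

In a real crontab entry you would replace $PG_LOG with your server's pg_log path.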
Files in the other folders, except for pg_xlog, should never be deleted, even if they sound like logs. In particular, you can't touch pg_clog, the active commit log, without getting into trouble.
pg_xlog stores transaction logs. Some systems we've seen are configured to move processed transaction logs into a subfolder called archive. You'll often have an archive folder somewhere (not necessarily a subfolder of pg_xlog) if you are running synchronous replication, doing continuous archiving, or just keeping logs around in case you need to revert to a different point in time. Deleting files in the root of pg_xlog will destroy data; deleting files in the archive folder will just prevent you from performing a point-in-time recovery or, if a slave server hasn't yet played back the logs, prevent it from fetching them. If you aren't concerned about either scenario, then it's safe to delete or move files in the archive folder.
Be wary of overzealous antivirus programs, especially on Windows. We've seen cases where antivirus software removed important binaries in the PostgreSQL bin folder. Should PostgreSQL fail to start on a Windows system, the Event Viewer is the first place to look for clues as to why.
Many people are under the misconception that the postgres account needs to have full administrative rights to the server. In fact, depending on your PostgreSQL version, if you give the postgres account full administrative rights to the server, your database server may not even start.
The postgres system account should always be created as a regular system user in the OS with just rights to the data cluster and additional tablespace folders. Most installers will set up the correct permissions for postgres. Don’t try to do postgres any favors by giving it more rights than it needs. Granting unnecessary rights leaves your system vulnerable should you fall under an SQL injection attack. There are cases where you’ll need to give the postgres account write/delete/read rights to folders or executables outside of the data cluster. With scheduled jobs that execute batch files, this need often arises. We advise you to practice restraint and only grant the minimum rights necessary to get the job done.
Loading up your server with RAM doesn't mean you can set shared_buffers as high as you'd like. Try it and your server may crash or refuse to start. If you are running PostgreSQL on 32-bit Windows, setting it higher than 512 MB often results in instability. On 64-bit Windows, you can push the envelope a bit higher and even exceed 1 GB without issues. On some Linux systems, the compiled SHMMAX variable is low and shared_buffers can't be set higher. Details on how to remedy this are in the Kernel Resources section of the manual.
If you do this, you'll see errors in your pg_log files telling you to make sure PostgreSQL is not already running. Here are the common reasons why this happens:
You've already started the postgres service.
You are trying to run it on a port already in use by another service.
Your postgres service had a sudden shutdown and left an orphaned postmaster.pid file in the data folder. Just delete the file and try starting again.
You have an orphaned PostgreSQL process. When all else fails, kill all running PostgreSQL processes and then start again.