In addition to the run-of-the-mill data table, PostgreSQL offers several kinds of tables that are rather unique: temporary, unlogged (demonstrated in Example 6-3), inherited (demonstrated in Example 6-2), and typed tables (demonstrated in Example 6-4).
In this section, we’ll demonstrate some common table creation examples. Most are similar to or exactly what you’ll find in other databases.
serial
type is the data type you use when you want an incrementing
auto number. It creates a companion sequence
object and defines the new
column as an integer with the default value set to the next value of the sequence
object. It is often used as a primary key.
varchar
is a variable length string similar to what you will find
used in other databases. It can also be written as character varying(50)
.
If you don’t specify a size, the size is unconstrained.
text
is an unconstrained string. It’s never followed with a
size.
timestamp with time zone
is a date and time data type always stored in UTC.
It will, by default, display the date and time in the server’s own time zone unless
you tell it otherwise. It’s often written using the short-hand
timestamptz
. There is a companion data type called
timestamp
, which lacks the time zone. As a result, the value of
timestamp
will not change if your server’s time zone changes.
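To tie these types together, here is a minimal sketch of a table using all four (the table and column names are illustrative, not part of the chapter's running examples):

```sql
-- serial creates a companion sequence and defaults the column to its next value
CREATE TABLE logins (
    login_id serial PRIMARY KEY,          -- auto-incrementing integer
    user_name varchar(50),                -- variable-length, max 50 characters
    notes text,                           -- unconstrained string; never takes a size
    login_ts timestamptz DEFAULT now()    -- stored in UTC, displayed in server time zone
);
```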
PostgreSQL is the only database that we know of that offers table inheritance. When you specify that a child table inherit from a parent, the child is created with all the columns of the parent in addition to its own columns. All structural changes made to the parent automatically propagate to its child tables. To save you even more time, whenever you query the parent, all rows in the children are included as well. Not every trait of the parent passes down to the child, notably indexes and primary key constraints.
We defined a check constraint to limit data to just the year 2011 in our local time zone. Since we didn’t specify a time zone, our timestamps default to the server’s time zone. Having the check constraint in place allows the query planner to completely skip over inherited tables that cannot satisfy a query condition.
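Since Example 6-2 isn’t reproduced here, the parent/child setup being described might look like the following sketch (table and constraint names are our own, not necessarily those of the original example):

```sql
CREATE TABLE logs (
    log_id serial,
    user_name varchar(50),
    description text,
    log_ts timestamptz NOT NULL DEFAULT now()
);

-- child inherits all parent columns; the check constraint limits it to 2011
CREATE TABLE logs_2011 (
    CONSTRAINT chk_y2011
        CHECK (log_ts >= '2011-01-01' AND log_ts < '2012-01-01')
) INHERITS (logs);
```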
For ephemeral data that could be rebuilt in the event of a disk failure, or that doesn’t need to be restored after a crash, you might prefer speed over redundancy. In 9.1, the UNLOGGED modifier allows you to create unlogged tables. These tables are not written to the write-ahead logs. Should you accidentally unplug the power cord on the server, when you restore power, all data in your unlogged tables will be wiped clean during crash recovery. You can find more examples and gotchas in Depesz: Waiting for 9.1 Unlogged Tables.
Example 6-3. Unlogged table creation
CREATE UNLOGGED TABLE web_sessions (
    session_id text PRIMARY KEY,
    add_ts timestamptz,
    upd_ts timestamptz,
    session_state xml
);
The benefit of unlogged tables is that they can be about 15 times faster to write to than logged tables.
Unlogged tables are always truncated during crash recovery, so don’t use them for data that is not derivable or ephemeral. They also don’t support GiST indexes, and are therefore unsuitable for exotic data types that require such an index for speedy access. GIN indexes are supported, though.
PostgreSQL 9.0+ provides another way of table creation whereby the column structure is defined by a composite data type. When using this method, you can’t add additional columns directly to the table. The advantage of this approach is that if you have many tables sharing the same structure and you need to alter the column, you can do so by simply changing the underlying type.
We’ll demonstrate by first creating a type with the definition:
CREATE TYPE app_user AS (user_name varchar(50), email varchar(75), pwd varchar(50));
We can then create a table that has rows that are instances of this type:
Example 6-4. Typed table creation
CREATE TABLE super_users OF app_user (
    CONSTRAINT pk_super_users PRIMARY KEY (user_name)
);
Let’s say we now need to add a phone number to all our tables. We simply have to run the following command to alter the underlying type:
ALTER TYPE app_user ADD ATTRIBUTE main_phone varchar(18) CASCADE;
Normally, you can’t change the definition of a type if tables depend on that type. The
CASCADE
modifier allows you to override this restriction.
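To illustrate the shared-structure advantage, suppose (hypothetically) we create a second table of the same type; the ALTER TYPE ... CASCADE command above would then add the new column to both tables at once:

```sql
-- a second table sharing the app_user structure (illustrative name)
CREATE TABLE regular_users OF app_user (
    CONSTRAINT pk_regular_users PRIMARY KEY (user_name)
);

-- after: ALTER TYPE app_user ADD ATTRIBUTE main_phone varchar(18) CASCADE;
-- both super_users and regular_users gain the main_phone column
```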
PostgreSQL syntax pretty much abides by the ANSI SQL standard for adding data, but it does have some lagniappes not always found in many other databases, one of which is a multirow constructor that can be used to insert more than one record at a time. The multirow constructor has existed since 8.2. The constructor can in fact be used anywhere SQL expects a table and behaves exactly like a table.
Example 6-5. Using multirow constructor to insert data
INSERT INTO logs_2011 (user_name, description, log_ts)
VALUES ('robe', 'logged in', '2011-01-10 10:15 AM EST'),
       ('lhsu', 'logged out', '2011-01-11 10:20 AM EST');
It’s much like a single row INSERT VALUES()
syntax except you can add
more than one row at a time.
For this next section, we’ll load the data collected in Example 3-5 into production-grade tables replete with proper
indexes and constraints. We’ll be using the new DO
command, which allows you to
write a piece of procedural language code on the fly. Don’t worry if you can’t follow the
code in this section. As long as you run the code, you’ll have the tables you need to
continue with our ongoing examples.
The first step we’re going to take is to create a new schema. Think of schemas as
another level of organization for database objects. We use them liberally to group our
tables into logical units. You can have two tables with the same name as long as they are
in separate schemas. To distinguish between them, you must prepend the schema name. To
avoid the hassle of always having to tag on the schema in front of table names, you can
set the search_path
either on a per-session or permanent basis. You list
schemas in the order you would like the SQL parser to search for tables. For example,
suppose you have two tables named my_table, one in a schema called s1 and another in s2,
and you set search_path=s2, s1;. When you then refer to my_table in a query without
prefixing the schema name, you’ll get the one in s2. Schemas are not limited to organizing
tables; you can also place functions, types, views, and many other objects into separate
schemas.
Example 6-6. Creating a new schema, setting search_path, and populating lu_tracts
CREATE SCHEMA census;
set search_path=census;
CREATE TABLE lu_tracts (
    tract_id varchar(11),
    tract_long_id varchar(25),
    tract_name varchar(150),
    CONSTRAINT pk_lu_tracts PRIMARY KEY (tract_id)
);
INSERT INTO lu_tracts (tract_id, tract_long_id, tract_name)
SELECT geo_id2, geo_id, geo_display
  FROM staging.factfinder_import
 WHERE geo_id2 ~ '^[0-9]+';
The next two examples take advantage of the new DO
command and the
procedural language PL/pgSQL to generate a series of INSERT INTO SELECT
statements. The SQL also performs an unpivot operation converting columnar data into
rows.
Example 6-7. Insert using DO to generate dynamic SQL
set search_path=census;
CREATE TABLE lu_fact_types (
    fact_type_id serial,
    category varchar(100),
    fact_subcats varchar(255)[],
    short_name varchar(50),
    CONSTRAINT pk_lu_fact_types PRIMARY KEY (fact_type_id)
);
DO language plpgsql
$$
DECLARE var_sql text;
BEGIN
    var_sql := string_agg(
        'INSERT INTO lu_fact_types(category, fact_subcats, short_name)
         SELECT ''Housing''
              , array_agg(s' || lpad(i::text,2,'0') || ') As fact_subcats
              , ' || quote_literal('s' || lpad(i::text,2,'0')) || ' As short_name
           FROM staging.factfinder_import
          WHERE s' || lpad(i::text,2,'0') || ' ~ ''^[a-zA-Z]+'' ', ';')
      FROM generate_series(1,51) As i;
    EXECUTE var_sql;
END
$$;
Example 6-8. Adding data to facts table
set search_path=census;
CREATE TABLE facts (
    fact_type_id int,
    tract_id varchar(11),
    yr int,
    val numeric(12,3),
    perc numeric(6,2),
    CONSTRAINT pk_facts PRIMARY KEY (fact_type_id, tract_id, yr)
);
DO language plpgsql
$$
DECLARE var_sql text;
BEGIN
    var_sql := string_agg(
        'INSERT INTO facts(fact_type_id, tract_id, yr, val, perc)
         SELECT ' || f.fact_type_id::text || ', geo_id2, 2010
              , s' || lpad(i::text,2,'0') || '::integer As val
              , CASE WHEN s' || lpad(i::text,2,'0') || '_perc LIKE ''(X%'' THEN NULL
                     ELSE s' || lpad(i::text,2,'0') || '_perc END::numeric(5,2) As perc
           FROM staging.factfinder_import
          WHERE s' || lpad(i::text,2,'0') || ' ~ ''^[0-9]+'' ', ';')
      FROM generate_series(1,51) As i
           INNER JOIN lu_fact_types AS f
           ON ('s' || lpad(i::text,2,'0') = f.short_name);
    EXECUTE var_sql;
END
$$;
PostgreSQL constraints are the most advanced (and most complex) of any database we’ve worked with. Not only can you create constraints, you can also control all facets of how they handle existing data, cascade options, how matching is performed, which indexes to incorporate, conditions under which a constraint can be violated, and so forth. On top of it all, you even get to pick your own name for each constraint. For the full treatment, we suggest you review the official documentation. You’ll find comfort in knowing that accepting the default settings usually works out fine. We’ll start with something familiar to most relational folks: foreign key, unique, and check constraints, before moving on to exclusion constraints, introduced in 9.0.
PostgreSQL follows the same convention as most databases you may have worked with that support referential integrity. It supports the ability to define cascade update and delete rules. We’ll experiment with that in Example 6-9.
Example 6-9. Building FK constraints and covering indexes
set search_path=census,public;
ALTER TABLE facts ADD CONSTRAINT fk_facts_lu_fact_types
    FOREIGN KEY (fact_type_id)
    REFERENCES lu_fact_types (fact_type_id)
    ON UPDATE CASCADE ON DELETE RESTRICT;
CREATE INDEX fki_facts_lu_fact_types ON facts (fact_type_id);
ALTER TABLE facts ADD CONSTRAINT fk_facts_lu_tracts
    FOREIGN KEY (tract_id)
    REFERENCES census.lu_tracts (tract_id)
    ON UPDATE CASCADE ON DELETE RESTRICT;
CREATE INDEX fki_facts_lu_tracts ON census.facts (tract_id);
In this first constraint, we define a foreign key relationship between our facts and fact_types table. This prevents us from introducing fact types not already present in our fact types lookup table.
We also define a cascade rule that automatically updates the fact_type_id in our
facts table should we renumber our fact types. We restrict deletes from our lookup
table if any values are in use. Although RESTRICT is already the default
behavior, we add it for clarity.
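The cascade rules can be exercised with statements like the following (the renumbering value is arbitrary, for illustration only):

```sql
-- ON UPDATE CASCADE: renumbering a fact type propagates to facts.fact_type_id
UPDATE lu_fact_types SET fact_type_id = 100 WHERE fact_type_id = 1;

-- ON DELETE RESTRICT: this fails with a foreign key violation
-- if any rows in facts still reference the fact type
DELETE FROM lu_fact_types WHERE fact_type_id = 100;
```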
Unlike primary key and unique constraints, PostgreSQL doesn’t automatically create an index for foreign key constraints; you need to do this yourself.
Each table can have no more than a single primary key. Should you need to enforce uniqueness on other columns, you must resort to unique constraints. Adding a unique constraint automatically creates an associated unique index. Unlike a primary key column, a column with a unique constraint can still be populated with NULLs. A column with a unique constraint can also serve as the target of a foreign key. Adding a unique constraint is simple.
ALTER TABLE logs_2011 ADD CONSTRAINT uq_us_log UNIQUE (user_name, log_ts);
Check constraints are conditions that must be met for a field or set of fields in
each row. The PostgreSQL query planner also uses them for what is called constraint
exclusion: if a check constraint on a table guarantees that it can’t
satisfy the filter condition of a query, the planner can skip scanning the table. We
saw an example of a check constraint in Example 6-2. That
particular example was used to prevent the planner from having to scan log tables that
don’t satisfy the date range of a query. You can define additional constraints; for
example, you may require all user names input into logs tables to be lowercase with this
check constraint:
ALTER TABLE logs ADD CONSTRAINT chk_lusername CHECK (user_name = lower(user_name));
The other noteworthy thing about check constraints is that, unlike primary key, foreign
key, and unique key constraints, they can be inherited from parent tables. So this
particular check constraint we put on logs gets inherited by all child tables of logs.
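A quick way to see the inherited constraint in action (a sketch; it assumes the logs/logs_2011 inheritance setup described earlier):

```sql
-- the lowercase check added on the parent is enforced on the child too;
-- this insert fails with a check constraint violation on chk_lusername
INSERT INTO logs_2011 (user_name, description, log_ts)
VALUES ('Robe', 'logged in', '2011-02-01 10:15 AM EST');
```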
Introduced in PostgreSQL 9.0, exclusion constraints allow you to incorporate
additional operators to enforce a kind of uniqueness that can’t be expressed with
equality. Exclusion constraints are really useful in problems involving scheduling. If a
room is booked for a certain period of time, additional overlapping bookings won’t be
allowed. To enforce this rule, create a scheduling table using the period data type[1] and then add an exclusion constraint using the overlaps
(&&) operator. PostgreSQL 9.2 introduces range data types that
are perfect for use in exclusion constraints. In 9.2, the period extension is obsolete,
supplanted by the new built-in tstzrange data type. An example of using 9.2
ranges with exclusion constraints is demonstrated in Waiting
for 9.2 Range Data Types.
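A minimal sketch of such a constraint using the 9.2 tstzrange type (the table name is illustrative; the btree_gist extension is needed so the = operator on room can participate in the GiST index):

```sql
CREATE EXTENSION btree_gist;

CREATE TABLE room_bookings (
    room int,
    booked_during tstzrange,
    -- reject any two rows with the same room and overlapping time ranges
    EXCLUDE USING gist (room WITH =, booked_during WITH &&)
);
```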
PostgreSQL comes with a superbly flexible index framework. At the time of writing, PostgreSQL has at least four types of indexes. Should you find these insufficient, you can define new index operators and modifiers to work on top of them. If still unsatisfied, you’re free to create your own index type. PostgreSQL also allows you to mix index types in the same table and count on the planner to take advantage of them all through its bitmap index scan strategy. So, for instance, one column could use a B-tree index and an adjacent column a GiST index, and both indexes can be utilized in the same query.
To take full advantage of all that PostgreSQL has to offer, you’ll want to understand the various types of indexes and what they can and can’t be used for. The various types of indexes PostgreSQL currently has built-in are listed next.
Index Types
B-tree is the index type you’ll find most commonly in any relational database. B-tree is designed to be a general-purpose index. You can usually get by with this one alone if you don’t want to experiment with additional types. If PostgreSQL creates an index for you automatically, or you don’t bother picking the type, B-tree is chosen. It is currently the only index type allowed for primary key and unique indexes.
Generalized Search Tree (GiST) is an index type optimized for full text search, spatial data, astronomical data, and hierarchical data. You can’t use it to enforce uniqueness; however, you can use it in exclusion constraints.
Generalized Inverted Index (GIN) is an index type commonly used for the built-in full text search of PostgreSQL and the trigram extensions. GIN is a descendant of GiST, but it’s not lossy. GIN indexes are generally faster to search than GiST but slower to update. You can see an example at Waiting for Faster LIKE/ILIKE.
Space-Partitioned Generalized Search Tree (SP-GiST) is an index type introduced in PostgreSQL 9.2. Its use is similar to that of GiST, but it is generally faster for certain kinds of data distributions. The PostGIS 2.1 spatial extension has planned support for it. The only built-in types that currently support it are the PostgreSQL geometry types, such as point and box, and text. Other GiST-dependent extensions also have planned support for it.
Hash is an index type that was popular before GiST and GIN came along. The general consensus is that GiST and GIN outperform hash and are more transaction safe. PostgreSQL has relegated hash to legacy status. You may encounter this index type in other databases, but it’s best avoided in PostgreSQL.
Should you want to go beyond the index types that PostgreSQL installs by default, either out of need or curiosity, you should start perusing the list of additional index types available as extensions.
Custom Index Types
btree_gist
This index is useful when you’re trying to combine different types in a single index. It’s an excellent choice for cases where you have a simple type, like a number, alongside a more complex type, like a point. It’s also used to leverage GiST-only features, such as KNN operators and exclusion constraints, for basic types, since these features can only be used with GiST- and GIN-indexable operators.
btree_gin
This index is a cross-breeding of B-tree and GIN. It supports the indexable specialty operators of GIN, but also offers the indexable equality found in the B-tree index and not available with standard GIN. It’s most useful when you want to create a compound index composed of one column of a type normally serviced by B-tree operators, such as text or a number, and another column, such as a hierarchical ltree type or a full-text vector, supported by GIN. By using a btree_gin index, you can have both columns as part of the compound index and still have an indexable equality check that can use the index for the text/integer column.
You can install any of the index types in Custom Index Types, using the following:
CREATE EXTENSION btree_gist;
Indexes for each data type have operator classes (a.k.a. opclass
).
Operator classes are detailed in Operator Classes. Operator classes support a given set
of operators. In short, an operator can utilize an index only if the operator class for that
index supports it. For example, many B-tree operator classes support =
,
>=
but not pattern operations like ~>~
. Each data type
comes with a default operator class. For instance, the default opclass for
varchar
is varchar_ops
, which includes operators
such as =
, >
, and <
but no support for pattern
operations. If you create an index without being explicit about the opclass(es) to be used,
the default for the data type(s) being indexed is picked automatically. The index will
then be useful only when you’re using operators within that opclass. Refer to
Why is My Index Not Used? for more information.
You shouldn’t always accept the default. For instance, varchar_ops
doesn’t include the LIKE
operators, so none of your like searches can use an
index of that opclass
. If you’re going to be performing wildcard
searches on a varchar
column, you’d be better off choosing the
varchar_pattern_ops
opclass
for your index. To specify the opclass, just append the opclass
after the column name, as in:
CREATE INDEX idx_bt_my_table_description_varchar_pattern ON my_table USING btree (description varchar_pattern_ops);
For PostgreSQL 9.1+, you’d do even better with a GiST index and the companion
pg_trgm
extension packaged with the gist_trgm_ops
operator class. This particular opclass
is highly optimized for wildcard
searches. You can learn more about trigrams in our article at Teaching LIKE and ILIKE New Tricks. With the extension
installed, you can create your index as follows:
CREATE INDEX idx_gist_my_table_description_gist_trgm_ops ON my_table USING gist (description gist_trgm_ops);
You’ll then see your standard ILIKE and LIKE searches being able to take advantage of indexing.
PostgreSQL has a feature called functional indexes, which you won’t often find in other databases. A more common parallel you’ll see in other databases, like Microsoft SQL Server or MySQL, is computed columns and the ability to place indexes on computed columns. PostgreSQL didn’t buy into the idea of computed columns, since views are more appropriate places for them. To still reap the speed advantage of indexes, PostgreSQL lets you place indexes on functions of columns. A classic example where you’d want to employ a functional index is for dealing with mixed-case text. PostgreSQL is a case-sensitive database, so to be able to search using an index when casing doesn’t matter, you can create an index as follows:
CREATE INDEX idx_featnames_ufullname_varops ON featnames_short USING btree (upper(fullname) varchar_pattern_ops);
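A query can then use the index as long as it applies the same function and a pattern-compatible operator (the lookup value here is hypothetical):

```sql
-- matches the functional index on upper(fullname) with varchar_pattern_ops
SELECT fullname
  FROM featnames_short
 WHERE upper(fullname) LIKE 'MAIN%';
```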
Partial indexes (read more about them here: http://www.postgresql.org/docs/current/interactive/indexes-partial.html) are indexes that index only the portion of data fitting a
specific WHERE condition. This is pretty much synonymous with the SQL Server 2008+
filtered index, but PostgreSQL had this feature even in pre-8.0 versions. If you have a
table of one million rows but only ever query a fixed set of 10,000, you’re better off
creating a partial index, because of the disk savings and the smaller, more efficient
index to scan. The main caveat with partial indexes is that your query must use the same
WHERE condition used when creating the index in order to activate the
index. An easy way to ensure that your partial index will always be used is to use a view
when querying the data. For example, let’s say we have a table of newspaper subscribers,
and we want to ensure that each user has only one active subscription. We might define
the table like this:
CREATE TABLE allsubscribers (id serial PRIMARY KEY, user_name varchar(50) NOT NULL , deactivate timestamptz);
We can then add our partial index to guarantee uniqueness only for active subscribers:
CREATE UNIQUE INDEX uqidx_1 ON allsubscribers USING btree (lower(user_name)) WHERE deactivate IS NULL;
To ensure our index is always used for active subscriptions, we can create a view with the built-in condition and always use this view when querying active subscriptions:
CREATE OR REPLACE VIEW vw_active_subscribers AS SELECT id, lower(user_name) As user_name FROM allsubscribers WHERE deactivate IS NULL;
To ensure index usage, we always query against our view as follows:
SELECT * FROM vw_active_subscribers WHERE user_name = 'sandy';
You can inspect the query plan to verify that our index was indeed used.
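For example, with EXPLAIN (the exact plan output depends on table size and statistics, so treat this as a sketch):

```sql
EXPLAIN SELECT * FROM vw_active_subscribers WHERE user_name = 'sandy';
-- with enough rows, the plan should show an Index Scan using uqidx_1
```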
PostgreSQL, like many other databases, supports compound indexes, a.k.a. multicolumn indexes. Compound indexes allow you to combine multiple columns or functions on columns into one index. Prior to 9.0, there wasn’t a compelling reason to use compound indexes apart from primary key and unique key indexes, because PostgreSQL supports bitmap index scans, which allow the planner to utilize multiple indexes in a query. Using a compound index may speed up certain kinds of searches if you always search exactly those multiple columns together.
In 9.0 and even more so in 9.2, compound indexes serve an important role in exclusion constraints. In PostgreSQL 9.2, index-only scans were introduced, which makes the use of compound indexes even more relevant since the planner can just scan the index and use data from the index without ever needing to check the underlying table.
Here is an example of a multicolumn index:
CREATE INDEX idx_sometable_cmpd ON sometable USING btree(type_id, upper(fullname) varchar_pattern_ops);