Chapter 6. Of Tables, Constraints, and Indexes

Tables

In addition to the run-of-the-mill data table, PostgreSQL offers several kinds of tables that are rather unique: temporary, unlogged (demonstrated in Example 6-3), inherited (demonstrated in Example 6-2), and typed tables (demonstrated in Example 6-4).

Table Creation

In this section, we’ll demonstrate some common table creation examples. Most are similar to or exactly what you’ll find in other databases.

Example 6-1. Basic table creation

CREATE TABLE logs(
 log_id serial PRIMARY KEY 1, user_name varchar(50) 2
 , description text 3
 , log_ts timestamp with time zone NOT NULL DEFAULT current_timestamp); 4
CREATE INDEX idx_logs_log_ts ON logs USING btree(log_ts);
1

serial is the data type you use when you want an incrementing auto number. It creates a companion sequence object and defines the new column as an integer with the default value set to the next value of the sequence object. It is often used as a primary key.

2

varchar is a variable-length string similar to what you will find used in other databases. It can also be written as character varying(50). If you don't specify a size, the size is unconstrained.

3

text is an unconstrained string. It’s never followed with a size.

4

timestamp with time zone is a date and time data type, always stored in UTC. By default, it displays the date and time in the server's own time zone unless you tell it otherwise. It's often written using the shorthand timestamptz. There is a companion data type called timestamp, which lacks the time zone. As a result, the displayed value of a timestamp will not change if your server's time zone changes.
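
A quick illustration of the display behavior (the time zone names here are just examples):

SET timezone = 'US/Eastern';
SELECT '2011-01-10 10:15 AM EST'::timestamptz;  -- 2011-01-10 10:15:00-05
SET timezone = 'UTC';
SELECT '2011-01-10 10:15 AM EST'::timestamptz;  -- same instant, displayed as 2011-01-10 15:15:00+00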

PostgreSQL is the only database that we know of that offers table inheritance. When you specify that a child table inherit from a parent, the child will be created with all the columns of the parent in addition to its own columns. All structural changes made to the parent will automatically propagate to its child tables. To save you even more time, whenever you query the parent, all rows in the children are included as well. Not every trait of the parent passes down to the child, notably indexes and primary key constraints.

Example 6-2. Inherited table creation

CREATE TABLE logs_2011(PRIMARY KEY(log_id)) INHERITS (logs);
CREATE INDEX idx_logs_2011_log_ts ON logs_2011 USING btree(log_ts);
ALTER TABLE logs_2011 
  ADD CONSTRAINT chk_y2011 
   CHECK (log_ts >= '2011-01-01'::timestamptz AND log_ts < '2012-01-01'::timestamptz); 1
1

We defined a check constraint to limit data to just year 2011 for our time zone. Since we didn’t specify a time zone, our timestamp will default to the server’s time zone. Having the check constraint in place allows the query planner to completely skip over inherited tables that do not satisfy a query condition.
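
You can watch this at work with EXPLAIN; under the default constraint_exclusion = partition setting (8.4+), the plan for a query outside the 2011 range should skip logs_2011 entirely:

EXPLAIN SELECT * FROM logs WHERE log_ts >= '2013-01-01'::timestamptz;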

For ephemeral data that could be rebuilt in the event of a disk failure or that doesn't need to be restored after a crash, you might prefer having more speed over redundancy. In 9.1, the UNLOGGED modifier allows you to create unlogged tables. These tables will not be part of any write-ahead logs. Should you accidentally unplug the power cord on the server, when you restore power, all data in your unlogged tables will be wiped clean during crash recovery. You can find more examples and gotchas in Depesz: Waiting for 9.1 Unlogged Tables.

Example 6-3. Unlogged table creation

CREATE UNLOGGED TABLE web_sessions(session_id text PRIMARY KEY, add_ts timestamptz
 , upd_ts timestamptz, session_state xml);

The benefit of unlogged tables is that they can be 15 times or more faster to write to than logged tables.

Warning

Unlogged tables are always truncated during crash recovery, so use them only for data that is ephemeral or can be rederived. They also don't support GiST indexes, and are therefore unsuitable for exotic data types that require such an index for speedy access. GIN indexes are supported, though.

PostgreSQL 9.0+ provides another way of table creation, whereby the column structure is defined by a composite data type. When using this method, you can't add additional columns directly to the table. The advantage of this approach is that if you have many tables sharing the same structure and you need to alter a column, you can do so by simply changing the underlying type.

We’ll demonstrate by first creating a type with the definition:

CREATE TYPE app_user AS (user_name varchar(50), email varchar(75), pwd varchar(50));

We can then create a table that has rows that are instances of this type:

Example 6-4. Typed table creation

CREATE TABLE super_users OF app_user(CONSTRAINT pk_super_users PRIMARY KEY (user_name));

Let’s say we now need to add a phone number to all our tables. We simply have to run the following command to alter the underlying type:

ALTER TYPE app_user ADD ATTRIBUTE main_phone varchar(18) CASCADE;

Normally, you can’t change the definition of a type if tables depend on that type. The CASCADE modifier allows you to override this restriction.
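
You can confirm that the new attribute propagated to the typed table with a quick catalog query (a sanity check, not from the book):

SELECT column_name, data_type 
 FROM information_schema.columns 
 WHERE table_name = 'super_users' ORDER BY ordinal_position;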

Multi-Row Insert

PostgreSQL syntax pretty much abides by the ANSI SQL standard for adding data, but it does have some lagniappes not always found in many other databases, one of which is a multi-row constructor that can be used to insert more than one record at a time. The multi-row constructor has been in existence since 8.2. The constructor can in fact be used in any SQL and behaves exactly like a table.

Example 6-5. Using multi-row constructor to insert data

INSERT INTO logs_2011(user_name, description, log_ts)
 VALUES ('robe', 'logged in', '2011-01-10 10:15 AM EST')
 , ('lhsu', 'logged out', '2011-01-11 10:20 AM EST');

It's much like the single-row INSERT ... VALUES() syntax, except that you can add more than one row at a time.
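
Because the constructor behaves like a table, you can use it anywhere a table is allowed, such as in a FROM clause:

SELECT * 
 FROM (VALUES ('robe', 'logged in'), ('lhsu', 'logged out')) As t(user_name, description);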

An Elaborate Insert

For this next section, we'll load the data collected in Example 3-5 into production-grade tables replete with proper indexes and constraints. We'll be using the new DO command, which allows you to write a piece of procedural language code on the fly. Don't worry if you can't follow the code in this section. As long as you run the code, you'll have the tables you need to continue with our ongoing examples.

The first step we're going to take is to create a new schema. Think of schemas as another level of organization for database objects. We use them liberally to group our tables into logical units. You can have two tables with the same name as long as they are in separate schemas. To distinguish between them, you must prepend the schema name. To avoid the hassle of always having to tag the schema in front of table names, you can set the search_path on either a per-session or a permanent basis. You list schemas in the order you would like the SQL parser to search for tables. For example, if you have two tables named my_table, one in a schema called s1 and another in s2, and you set search_path=s2, s1;, then when you refer to my_table in a query without prefixing the schema name, you'll get the one in s2. Schemas are not limited to organizing tables; you can place functions, types, views, and many other objects into separate schemas.
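
For example, you can set the search path for just the current session, or permanently for a database or role (mydb and myrole here are hypothetical names):

SET search_path = s2, s1;                        -- this session only
ALTER DATABASE mydb SET search_path = s2, s1;    -- default for new connections to mydb
ALTER ROLE myrole SET search_path = s2, s1;      -- default for a particular login role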

Example 6-6. Creating a new schema, setting search_path, and populating lu_tracts

CREATE SCHEMA census; 1
set search_path=census; 2
CREATE TABLE lu_tracts(tract_id varchar(11), tract_long_id varchar(25)
 , tract_name varchar(150) 
 , CONSTRAINT pk_lu_tracts PRIMARY KEY (tract_id));
INSERT INTO lu_tracts( tract_id, tract_long_id, tract_name)
SELECT geo_id2, geo_id, geo_display 
FROM staging.factfinder_import 
WHERE geo_id2 ~ '^[0-9]+'; 3
1

Create a schema called census to house our new data.

2

set search_path allows us to designate the default schemas to search in for this session.

3

We want to insert only census tract rows, so we designate a regex that searches for strings starting with one or more numbers.
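
You can sanity-check the pattern directly; ~ is PostgreSQL's regular expression match operator (the literals here are just illustrative):

SELECT '25001010100' ~ '^[0-9]+' As keep_row,  -- true: starts with digits
       'Geography' ~ '^[0-9]+' As skip_row;    -- false: starts with a letter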

The next two examples take advantage of the new DO command and the procedural language PL/pgSQL to generate a series of INSERT INTO ... SELECT statements. The SQL also performs an unpivot operation, converting columnar data into rows.

Example 6-7. Insert using DO to generate dynamic SQL

set search_path=census;
CREATE TABLE lu_fact_types(fact_type_id serial, category varchar(100)
 , fact_subcats varchar(255)[] 1, short_name varchar(50)
 , CONSTRAINT pk_lu_fact_types PRIMARY KEY (fact_type_id));

DO language plpgsql 
$$ 
DECLARE var_sql text;
BEGIN
  var_sql := string_agg('INSERT INTO lu_fact_types(category, fact_subcats, short_name) 
  SELECT ''Housing''
   , array_agg(s' || lpad(i::text,2,'0') || ') As fact_subcats, ' || quote_literal('s' || lpad(i::text,2,'0')) || ' As short_name 
  FROM staging.factfinder_import 
  WHERE s' || lpad(i::text,2,'0') || ' ~ ''^[a-zA-Z]+'' ', ';') FROM generate_series(1,51) As i;
  EXECUTE var_sql;
END
$$;
1

An array of strings each with maximum length of 255 characters.
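
If you want to inspect the SQL that gets built before it runs, replace EXECUTE var_sql with RAISE NOTICE '%', var_sql;. Each aggregated element is an INSERT along these lines (shown here for i = 1):

INSERT INTO lu_fact_types(category, fact_subcats, short_name) 
SELECT 'Housing', array_agg(s01) As fact_subcats, 's01' As short_name 
FROM staging.factfinder_import 
WHERE s01 ~ '^[a-zA-Z]+';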

Example 6-8. Adding data to facts table

set search_path=census;
CREATE TABLE facts(fact_type_id int, tract_id varchar(11), yr int
 , val numeric(12,3), perc numeric(6,2),
CONSTRAINT pk_facts PRIMARY KEY (fact_type_id, tract_id, yr));
DO language plpgsql 
$$
DECLARE var_sql text;
BEGIN
  var_sql := string_agg('INSERT INTO facts(fact_type_id, tract_id, yr, val, perc)
  SELECT ' || ft.fact_type_id::text || ', geo_id2, 2010, s' || lpad(i::text,2,'0') || '::integer As val
   , CASE WHEN s' || lpad(i::text,2,'0') || '_perc LIKE ''(X%'' THEN NULL ELSE s' || lpad(i::text,2,'0') || '_perc END::numeric(5,2) As perc
  FROM staging.factfinder_import 
  WHERE s' || lpad(i::text,2,'0') || ' ~ ''^[0-9]+'' ', ';')
  FROM generate_series(1,51) As i INNER JOIN lu_fact_types AS ft ON ('s' || lpad(i::text,2,'0') = ft.short_name);
	EXECUTE var_sql;
END$$;

Constraints

PostgreSQL constraints are the most advanced (and most complex) of any database we've worked with. Not only can you create constraints, but you can also control all facets of how they'll handle existing data, cascade options, how to perform the matching, which indexes to incorporate, conditions under which a constraint can be violated, and more. On top of it all, you even get to pick your own name for the constraint. For the full treatment, we suggest you review the official documentation. You'll find comfort in knowing that accepting the default settings usually works out fine. We'll start off with something familiar to most relational folks: foreign key, unique, and check constraints, before moving on to exclusion constraints introduced in 9.0.

Foreign Key Constraints

PostgreSQL follows the same convention as most databases you may have worked with that support referential integrity. It supports the ability to define cascade update and delete rules. We’ll experiment with that in Example 6-9.

Example 6-9. Building FK constraints and covering indexes

set search_path=census,public;
ALTER TABLE facts
ADD CONSTRAINT fk_facts_lu_fact_types
FOREIGN KEY (fact_type_id) REFERENCES lu_fact_types (fact_type_id) 1
ON UPDATE CASCADE ON DELETE RESTRICT;  2
CREATE INDEX fki_facts_lu_fact_types ON facts(fact_type_id); 3

ALTER TABLE facts
ADD CONSTRAINT fk_facts_lu_tracts
FOREIGN KEY (tract_id)
REFERENCES census.lu_tracts (tract_id)
ON UPDATE CASCADE ON DELETE RESTRICT;
CREATE INDEX fki_facts_lu_tracts ON census.facts(tract_id);
1

In this first constraint, we define a foreign key relationship between our facts and fact_types table. This prevents us from introducing fact types not already present in our fact types lookup table.

2

We also define a cascade rule that automatically updates the fact_type_id in our facts table should we renumber our fact types. We restrict deletes from our lookup table if any values are in use. Although RESTRICT is already the default behavior, we add it for clarity.

3

Unlike with primary key and unique constraints, PostgreSQL doesn't automatically create an index for foreign key constraints; you need to do this yourself.

Unique Constraints

Each table can have no more than a single primary key. Should you need to enforce uniqueness on other columns, you must resort to unique constraints. Adding a unique constraint automatically creates an associated unique index. Unlike a primary key column, a column with a unique constraint can still be populated with NULLs. A uniquely constrained column can also serve as the target of a foreign key relationship, just as a primary key can. Adding a unique constraint is simple:

ALTER TABLE logs_2011 ADD CONSTRAINT uq_us_log UNIQUE (user_name, log_ts);
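 
As a quick illustration of the NULL behavior (not from the book): both of these rows insert successfully even though their user_name and log_ts values would otherwise collide, because NULLs are never considered equal to each other:

INSERT INTO logs_2011(description, log_ts)
 VALUES ('first', '2011-06-01 10:00 AM EST'), ('second', '2011-06-01 10:00 AM EST');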

Check Constraints

Check constraints are conditions that must be met for a field or set of fields for each row. The PostgreSQL query planner also uses them for what is called constraint exclusion, which means that if a check constraint on a table guarantees that it can't service the filter condition of a query, the planner can skip scanning the table. We saw an example of a check constraint in Example 6-2. That particular example was used to prevent the planner from having to scan log tables that don't satisfy the date range of a query. You can define additional constraints; for example, you might require all user names input into logs tables to be lowercase with this check constraint:

ALTER TABLE logs ADD CONSTRAINT chk_lusername 
 CHECK (user_name = lower(user_name));
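
With the constraint in place, an insert with mixed casing now fails with a check constraint violation (a quick test):

INSERT INTO logs(user_name) VALUES ('RobE');  -- fails: violates chk_lusername
INSERT INTO logs(user_name) VALUES ('robe');  -- succeeds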

The other noteworthy thing about check constraints is that, unlike primary key, foreign key, and unique key constraints, they can be inherited from parent tables. So you'll see that this particular check constraint we put on the logs table gets inherited by all child tables of logs.

Exclusion Constraints

Introduced in PostgreSQL 9.0, exclusion constraints allow you to incorporate additional operators to enforce a certain kind of uniqueness that can't be satisfied by equality. Exclusion constraints are especially useful in scheduling problems: if a room is booked for a certain period of time, overlapping bookings won't be allowed. To enforce this rule, create a scheduling table using the period data type[1] and then add an exclusion constraint using the overlaps (&&) operator. PostgreSQL 9.2 introduces range data types that are perfect for use in exclusion constraints. In 9.2, the period extension is obsolete, supplanted by the new built-in tstzrange range data type. An example of using 9.2 ranges with exclusion constraints is demonstrated in Waiting for 9.2 Range Data Types.
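
Here's a minimal sketch of the scheduling scenario using the 9.2 tstzrange type; the table and constraint names are our own invention:

CREATE EXTENSION btree_gist;  -- needed so the plain integer column can join the GiST index
CREATE TABLE room_bookings(room int, booked_during tstzrange
 , CONSTRAINT ex_room_bookings EXCLUDE USING gist (room WITH =, booked_during WITH &&));
INSERT INTO room_bookings VALUES (1, '[2012-09-01 10:00, 2012-09-01 11:00)');  -- succeeds
INSERT INTO room_bookings VALUES (1, '[2012-09-01 10:30, 2012-09-01 11:30)');  -- fails: overlaps same room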

Indexes

PostgreSQL comes with a superbly flexible index framework. At the time of writing, PostgreSQL comes with at least four types of indexes. Should you find these insufficient, you can define new index operators and modifiers to work on top of these. If still unsatisfied, you're free to create your own index type. PostgreSQL also allows you to use different index types on different columns of the same table and count on the planner to take advantage of them all through its bitmap index scan strategy. So, for instance, one column could use a B-tree index, the adjacent column a GiST index, and both indexes can be utilized in the same query.

PostgreSQL Stock Indexes

To take full advantage of all that PostgreSQL has to offer, you’ll want to understand the various types of indexes and what they can and can’t be used for. The various types of indexes PostgreSQL currently has built-in are listed next.

Index Types

B-tree

B-tree is the index type you'll find most commonly in any relational database. B-tree is designed to be a general-purpose index. You can usually get by with this one alone if you don't want to experiment with additional types. If PostgreSQL automatically creates an index for you, or if you don't bother picking the type, B-tree will be chosen. It is currently the only index type allowed for primary key and unique constraints.

GiST

Generalized Search Tree (GiST) is an index type optimized for full text search, spatial data, astronomical data, and hierarchical data. You can't use it to enforce uniqueness; however, you can use it in exclusion constraints.

GIN

Generalized Inverted Index (GIN) is an index type commonly used for the built-in full text search of PostgreSQL and the trigram extensions. GIN is a descendant of GiST, but it's not lossy. GIN indexes are generally faster to search than GiST, but slower to update. You can see an example at Waiting for Faster LIKE/ILIKE.

SP-GiST

Space-Partitioning Trees Generalized Search Tree (SP-GiST) is an index type introduced in PostgreSQL 9.2. Its use is similar to that of GiST, but it is generally faster for certain kinds of data distribution. The only built-in types that currently support it are the PostgreSQL geometry types (such as point and box) and text. The PostGIS 2.1 spatial extension has planned support for it, as do other GiST-dependent extensions.

hash

Hash is an index type that was popular before GiST and GIN came along. The general consensus is that GiST and GIN outperform hash and are more transaction-safe. PostgreSQL has relegated hash to legacy status. You may encounter this index type in other databases, but it's best avoided in PostgreSQL.

Should you want to go beyond the index types that PostgreSQL installs by default, either out of need or curiosity, you should start perusing the list of additional index types available as extensions.

Custom Index Types

btree_gist

This index is useful when you're trying to group different types into a single index. It's an excellent choice for cases where you have a simple type like a number alongside a more complex type like a point. It's also used to make GiST-only features, such as KNN operators and exclusion constraints, available to basic types, since those features can only be used with GiST- and GIN-indexable operators.

btree_gin

is a cross-breeding of B-tree and GIN. It supports the indexable specialty operators of GIN, but also offers the indexable equality found in the B-tree index that's not available with standard GIN. It's most useful when you want to create a compound index composed of one column with a data type like text or number, normally serviced by B-tree operators, and another column, such as a hierarchical ltree type or a full-text vector, supported by GIN. By using a btree_gin index, you can have both columns as part of the compound index and still have an indexable equality check for the text/integer column.

You can install any of the index types in Custom Index Types, using the following:

CREATE EXTENSION btree_gist;

Operator Class

Indexes for each data type have operator classes (a.k.a. opclasses). Operator classes are detailed in Operator Classes. An operator class supports a given set of operators. In short, an operator can utilize an index only if the operator class for that index supports it. For example, many B-tree operator classes support = and >= but not pattern operations like ~>~. Each data type comes with a default operator class. For instance, the default opclass for varchar is varchar_ops, which includes the =, >, and < operators but no support for pattern operations. If you create an index without being explicit about the opclass(es) to be used, the default for the data type(s) being indexed is automatically picked. The index will then be useful only when you're using operators within the opclass of the index. Refer to Why is My Index Not Used? for more information.
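
If you're curious which opclasses are available for a given index method, you can ask the system catalogs (a sketch; output varies by version and installed extensions):

SELECT am.amname, opc.opcname, opc.opcdefault 
 FROM pg_opclass As opc INNER JOIN pg_am As am ON am.oid = opc.opcmethod 
 WHERE am.amname = 'btree' ORDER BY opc.opcname;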

You shouldn't always accept the default. For instance, varchar_ops doesn't include the LIKE operators, so none of your LIKE searches can use an index of that opclass. If you're going to be performing wildcard searches on a varchar column, you'd be better off choosing the varchar_pattern_ops opclass for your index. To specify the opclass, just append it after the column name, as in:

CREATE INDEX idx_bt_my_table_description_varchar_pattern ON my_table 
 USING btree (description varchar_pattern_ops);

For PostgreSQL 9.1+, you'd do even better with a GiST index and the companion pg_trgm extension, which packages the gist_trgm_ops operator class. This particular opclass is highly optimized for wildcard searches. You can learn more about trigrams in our article Teaching LIKE and ILIKE New Tricks. With the extension installed, you can create your index as follows:

CREATE INDEX idx_gist_my_table_description_gist_trgm_ops ON my_table 
 USING gist (description gist_trgm_ops);

You’ll then see your standard ILIKE and LIKE searches being able to take advantage of indexing.
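
For example, even a search with a leading wildcard, which a plain B-tree index could never help with, can use the trigram index (my_table and description are the hypothetical names from above):

SELECT description FROM my_table WHERE description ILIKE '%jack%';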

Functional Indexes

PostgreSQL has a feature called functional indexes, which you won't often find in other databases. A more common parallel you'll see in other databases like Microsoft SQL Server or MySQL is computed columns and the ability to place indexes on them. PostgreSQL didn't buy into the idea of computed columns, since views are more appropriate places for them. To still reap the speed advantage of indexes, PostgreSQL lets you place indexes on functions of columns. A classic example where you'd want to employ a functional index is for dealing with mixed-case text. PostgreSQL is a case-sensitive database, so to be able to search using an index when casing doesn't matter, you can create an index as follows:

CREATE INDEX idx_featnames_ufullname_varops ON featnames_short 
 USING btree (upper(fullname) varchar_pattern_ops);
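
Searches must use the same expression for the index to kick in; for example:

SELECT fullname FROM featnames_short WHERE upper(fullname) LIKE 'S%';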

Partial Indexes

Partial indexes (read more about them at http://www.postgresql.org/docs/current/interactive/indexes-partial.html) are indexes that index only the portion of data fitting a specific WHERE condition. This is pretty much synonymous with the SQL Server 2008+ filtered index, but PostgreSQL has had this feature since before 8.0. If you have a table of one million rows but you only ever query a fixed set of 10,000, you're better off creating a partial index, because of the disk savings and the smaller, more efficient index to scan. The main caveat with partial indexes is that your query must use the same WHERE condition that was used when creating the index in order to activate the index. An easy way to ensure that your partial index will always be used is to query the data through a view that has the condition built in.

Let's suppose we have a newspaper subscription table and want to ensure that each user has only one active subscription. We might create the table like this:

CREATE TABLE allsubscribers (id serial PRIMARY KEY, user_name varchar(50) NOT NULL
 , deact_dt timestamptz);

We can then add our partial index to guarantee uniqueness only for active subscribers:

CREATE UNIQUE INDEX uqidx_1 ON allsubscribers 
 USING btree (lower(user_name)) WHERE deact_dt IS NULL;

To ensure our index is always used for active subscriptions, we can create a view with the built-in condition and always use this view when querying active subscriptions:

CREATE OR REPLACE VIEW vw_active_subscribers AS 
SELECT id, lower(user_name) As user_name 
 FROM allsubscribers WHERE deact_dt IS NULL;

To ensure index usage, we always query against our view as follows:

SELECT * FROM vw_active_subscribers WHERE user_name = 'sandy';

You can run the query through EXPLAIN to confirm that our index was indeed used.
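
For instance, a plan like the following should show an index scan on uqidx_1 (exact output varies by PostgreSQL version and table size):

EXPLAIN SELECT * FROM vw_active_subscribers WHERE user_name = 'sandy';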

Multicolumn Indexes

PostgreSQL, like many other databases, supports compound indexes, a.k.a. multicolumn indexes. Compound indexes allow you to combine multiple columns, or functions on columns, into one index. Prior to 9.0, there wasn't a compelling reason to use compound indexes apart from primary key and unique key indexes, because PostgreSQL supports bitmap index scans, which allow the planner to utilize multiple indexes in a query. A compound index may speed up certain kinds of searches if you always search exactly those multiple columns together.

In 9.0 and even more so in 9.2, compound indexes serve an important role in exclusion constraints. In PostgreSQL 9.2, index-only scans were introduced, which makes the use of compound indexes even more relevant since the planner can just scan the index and use data from the index without ever needing to check the underlying table.

Here is an example of a multicolumn index:

CREATE INDEX idx_sometable_cmpd ON sometable 
 USING btree(type_id, upper(fullname) varchar_pattern_ops);
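
A query filtering on both columns with the same expressions can then be serviced by the single compound index (sometable is the hypothetical table from the example above):

SELECT * FROM sometable WHERE type_id = 5 AND upper(fullname) LIKE 'FIELD%';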


[1] You’ll need to install the period extension to be able to use this data type.
