Chapter 3

Tables

Abstract

A discussion of the types of tables and how to design the columns.

Keywords

CREATE TABLE

TEMPORARY

LOCAL

GLOBAL

ON COMMIT clause

default clause

column constraint

NOT NULL

Data types

CHECK() constraints

UNIQUE and PRIMARY KEY constraints

REFERENCES clause

Computed column

DEFERRABLE constraints

CREATE DOMAIN statement

CREATE SEQUENCE statement

Character sets

Unicode

CREATE TRANSLATION

Conceptually, a table is a set of zero or more rows, and a row is a set of one or more columns. This hierarchy is important; actions apply at either the schema level, table level, row level, or column level. For example, the DELETE FROM statement removes whole rows, not columns, and leaves the base table in the schema. You cannot delete just a single column from a row. Likewise, DROP TABLE is a schema level command that removes a table completely.

I am starting with tables because eventually, everything comes back to tables. It is the only data structure in SQL. We do not have files, arrays, lists, pointers, and so forth. This is a set-oriented model of data and is as simple and implementation independent as we could make it.

The tables have no particular ordering in the schema; they are referenced by table names.

The rows have no particular ordering in their table; they are referenced by a key. Each row has, for each column, exactly one value or NULL in the data type of that column.

The columns have no particular ordering in the row; they are referenced by column names. Each column has a name and a data type.

Each column has a specific data type and constraints that make up an implementation of an abstract domain. The way a table is physically implemented does not matter, because you access it only with SQL. The database engine handles all the details for you and you never worry about the internals as you would with a physical file. In fact, almost no two SQL products use the same internal structures.

There are two common conceptual errors made by programmers who are accustomed to file systems. The first is thinking that a table is a file; the second is thinking that a table is a spreadsheet. Tables do not behave like either one of these, and you will get surprises if you do not understand the basic concepts.

It is easy to make these conceptual errors. When data moves from SQL to a host language, it has to be converted into host language data types and data structures to be displayed and used. The host languages have file systems built into them.

Another major difference is that tables and columns can have constraints attached to them. A constraint is a rule that defines what must be true about the database after each transaction. In this sense, a database is more like a collection of objects than a traditional passive file system.

3.1 CREATE TABLE Statements

The CREATE TABLE statement does all the hard work. The basic syntax for base table declarations looks like this:

CREATE TABLE < table name > (< table element list >)
< table element list > ::=
 < table element > | < table element >, < table element list >
< table element > ::=
 < column definition > | < table constraint definition >

The table definition includes data in the column definitions and rules for handling that data in the table constraint definitions. This means that a table acts more like an object (with its data and methods) than just a simple, passive file. We will get into the details shortly.

3.1.1 Base Tables

A base table is a table that has an actual physical existence in some kind of persistent storage. Virtual tables are eventually built from the base tables.

3.1.2 [GLOBAL | LOCAL] TEMPORARY Tables

The temporary tables can be used to hold intermediate results rather than requerying or recalculating them over and over. The syntax for creating a TEMPORARY TABLE is

CREATE [GLOBAL | LOCAL] TEMP[ORARY] TABLE < table name >
 (< table element list >)
ON COMMIT [PRESERVE | DELETE] ROWS;

This is just like the usual CREATE TABLE statement with the addition of two pieces of syntax. The scope option for the table is given between the keywords CREATE and TEMPORARY. The behavior of the data when a COMMIT is performed is shown at the end of the declaration.

The GLOBAL option on a TEMPORARY table means that one copy of the table is available to all the modules of the application program in which it appears. The GLOBAL TEMPORARY TABLE is generally used to pass shared data between sessions.

The LOCAL option means that one copy of the table is available to each module of the application program in which the temporary table appears. The LOCAL TEMPORARY TABLE is generally used as a “scratch table” by the user within a single module. If more than one user accesses the same LOCAL TEMPORARY TABLE, they each get a private copy of the table, initially empty, for their session or within the scope of the module that uses it.

If you have trouble imagining multiple tables in the schema with the same name (a violation of a basic rule of SQL about uniqueness of schema objects), then imagine a single table created as declared, but with an extra phantom column which contains a user identifier. What the users are then seeing is an updatable VIEW on the LOCAL TEMPORARY TABLE which shows them only the rows where this phantom column is equal to their user identifier, but not the phantom column itself. New rows are added to the LOCAL TEMPORARY TABLE with a DEFAULT of CURRENT_USER.
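This phantom column model is conceptual, not something any product actually declares, but it can be sketched in ordinary DDL. All of the names here are invented for the illustration:

```sql
-- Conceptual model only: what the engine maintains behind the scenes
CREATE TABLE Scratch_Base
(phantom_user CHAR(20) DEFAULT CURRENT_USER NOT NULL,
 note VARCHAR(100) NOT NULL);

-- What each user conceptually sees and updates
CREATE VIEW Scratch_Pad (note)
AS SELECT note
     FROM Scratch_Base
    WHERE phantom_user = CURRENT_USER;
```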

Since this is a table in the schema, you can get rid of it with a DROP TABLE < table name > statement and you can change it with the usual INSERT INTO, DELETE FROM, and UPDATE statements. The differences are at the start and end of a session or module.

The ON COMMIT [PRESERVE | DELETE] ROWS clause describes the action taken when a COMMIT statement is executed successfully. The PRESERVE option means that the next time this table is used, the rows will still be there and will be deleted only at the end of the session. The DELETE option means that the rows will be deleted whenever a COMMIT statement is executed during the session. In both cases, the table will be cleared out at the end of the session or module.
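As a sketch, assuming a product that follows the standard syntax, a scratch table for intermediate totals might look like this (the table and column names are invented for the example):

```sql
-- Shared intermediate results; rows vanish at every COMMIT
CREATE GLOBAL TEMPORARY TABLE Working_Totals
(dept_nbr INTEGER NOT NULL,
 total_amt DECIMAL(12,2) NOT NULL)
ON COMMIT DELETE ROWS;
```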

3.2 Column Definitions

Beginning SQL programmers often fail to take full advantage of the options available to them, and they pay for it with errors or extra work in their applications. A column is not like a simple passive field in a file system. It has more than just a data type associated with it.

< column definition > ::=
 < column name > < data type >
 [< default clause >]
 [< column constraint>. . .]
< column constraint > ::= NOT NULL
 | < check constraint definition >
 | < UNIQUE specification >
 | < references specification >

The first important thing to notice here is that each column must have a data type, which it keeps unless you ALTER the table. The SQL standard offers many data types, because SQL must work with many different host languages. The data types fall into three major categories: numeric, character, and temporal. We will discuss the data types and their rules of operation in other sections; they are fairly obvious, so not knowing the details will not stop you from reading the examples that follow.

3.2.1 DEFAULT Clause

The default clause is an underused feature, whose syntax is

< default clause > ::=
 [CONSTRAINT < constraint name >] DEFAULT < default option >
< default option > ::= < literal > | < system value > | NULL
< system value > ::= CURRENT_DATE | CURRENT_TIME | CURRENT_TIMESTAMP | SYSTEM_USER | SESSION_USER | CURRENT_USER | NEXT VALUE FOR < sequence name >

Whenever the system does not have an explicit value to put into this column, it will look for its DEFAULT clause and use that value. The default option can be a literal value of the relevant data type, or something provided by the system, such as the current timestamp, current date, current user identifier, the next value from a SEQUENCE, and so forth. If you do not provide a DEFAULT clause and the column is NULL-able, the system will provide a NULL as the default. If all that fails, you will get an error message about missing data.

This is a good way to make the database do a lot of work that you would otherwise have to code into all the application programs. The most common tricks are to use a zero in numeric columns, a string to encode a missing value (‘unknown’) or a true default (‘same address’) in character columns, and the system timestamp to mark transactions.
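Putting those tricks together, here is a sketch; the table and column names are hypothetical:

```sql
CREATE TABLE Complaints
(complaint_nbr INTEGER NOT NULL PRIMARY KEY,
 complaint_ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL, -- marks the transaction
 entered_by CHAR(20) DEFAULT CURRENT_USER NOT NULL,         -- who filed it
 customer_address VARCHAR(50) DEFAULT 'same address' NOT NULL, -- true default
 call_cnt INTEGER DEFAULT 0 NOT NULL);                      -- zero in a numeric column
```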

3.2.2 NOT NULL Constraint

The most important column constraint is the NOT NULL, which forbids the use of NULLs in a column. Use this constraint automatically and then remove it only when you have good reason. It will help you avoid the complications of NULL values when you make queries against the data.

The NULL is a special value in SQL that belongs to all data types. SQL is the only language that has such a creature; if you can understand how it works, you will have a good grasp of SQL. In Relational Theory, the NULL has no data type, but in SQL, we have to allocate storage for a column which has a data type. This means we can write “CAST (NULL AS < data type >)” in our code.

A NULL means that we have a missing, unknown, miscellaneous, or inapplicable value in the data.

The problem is that exactly which of these four possibilities the NULL indicates depends on how it is used. To clarify this, imagine that I am looking at a carton of Easter eggs and I want to know their colors. If I see an empty hole, I have a missing egg, which I hope will be provided later. If I see a foil-wrapped egg, I have an unknown color value in my set. If I see a multicolored egg, I have a miscellaneous value in my set. If I see a cue ball, I have an inapplicable value in my set. The way you handle each situation is a little different. The best solution is to design encodings with values that cover as many situations as you can have. Much like clothing catalogs we can have “multicolored,” “camouflage,” (better: “desert camouflage,” “jungle camouflage,” etc.) “animal print” (better: “leopard,” “tiger,” “zebra,” etc.). And “bald” is a perfectly good hair color in my world.

When you use NULLs in math calculations, they propagate in the results so that the answer is another NULL. When you use them in logical expressions or comparisons, they return a logical value of UNKNOWN and give SQL its strange three-valued logic. They sort either always high or always low in the collation sequence. They group together for some operations but not for others. In short, NULLs cause a lot of irregular features in SQL, which we will discuss later. Your best bet is just to memorize the situations and the rules for NULLs when you cannot avoid them.
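A quick illustration of the propagation rules, using a hypothetical Personnel table with a NULL-able bonus_amt column:

```sql
SELECT emp_name, salary_amt + bonus_amt -- NULL propagates; the sum is NULL
  FROM Personnel;

SELECT emp_name
  FROM Personnel
 WHERE bonus_amt = NULL; -- always UNKNOWN, so no row ever qualifies

SELECT emp_name
  FROM Personnel
 WHERE bonus_amt IS NULL; -- the correct test for a NULL
```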

3.2.3 CHECK() Constraint

The check constraint tests the rows of the table against a logical expression, which SQL calls a search condition, and rejects rows whose search condition returns FALSE. However, the constraint accepts rows when the search condition returns TRUE or UNKNOWN. This is not the same rule as the WHERE clause which rejects rows that test UNKNOWN. The reason for this “benefit-of-the-doubt” feature is so that it will be easy to write constraints on NULL-able columns.

< check constraint definition > ::=
 [CONSTRAINT < constraint name >] CHECK (< search condition >)

The usual technique is to do simple range checking, such as CHECK (rating BETWEEN 1 AND 10), or to verify that a column’s value is in an enumerated set, such as CHECK (sex_code IN (0, 1, 2, 9)), with this constraint. Remember that the sex column could also be set to NULL, unless a NOT NULL constraint is also added to the column’s declaration. While it is optional, it is a really good idea to use a constraint name. Without it, most SQL implementations will create a huge, ugly, unreadable random string for the name since they need to have one in the schema tables. If you provide your own, you can drop the constraint more easily and understand the error messages when the constraint is violated.

For example, you can use a single check clause to enforce the rule that a firm does not hire anyone under 21 years of age for a job that requires a liquor-serving license by checking the birth date and their hire date. However, you cannot put the current system date into the CHECK() clause logic for obvious reasons— it is always changing.
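One way to sketch that liquor-license rule, assuming a product that supports INTERVAL arithmetic on dates (products without it have their own date functions); the table and constraint names are invented:

```sql
CREATE TABLE Bartenders
(emp_name VARCHAR(30) NOT NULL PRIMARY KEY,
 birth_date DATE NOT NULL,
 hire_date DATE NOT NULL,
 CONSTRAINT over_21_at_hire -- both dates are fixed, so the test never drifts
 CHECK (hire_date >= birth_date + INTERVAL '21' YEAR));
```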

The real power of the CHECK() clause comes from writing complex expressions that verify relationships with other rows, with other tables, or with constants. Before SQL-92, the CHECK() constraint could only reference columns in the table in which it was declared. In Full SQL-92 and beyond, the CHECK() constraint can reference any schema object. However, you will have a hard time finding it in a SQL implementation.

3.2.4 UNIQUE and PRIMARY KEY Constraints

The UNIQUE constraint says that no duplicate values are allowed in the column. It comes in two forms. The syntax is

< UNIQUE specification > ::= UNIQUE | PRIMARY KEY

There are some subtle differences between UNIQUE and PRIMARY KEY. There can be only one PRIMARY KEY per table but many UNIQUE columns. A PRIMARY KEY is automatically declared to have a NOT NULL constraint on it, but a UNIQUE column can have a NULL in a column unless you explicitly add a NOT NULL constraint. Adding the NOT NULL whenever possible is a good idea, as it makes the column into a proper relational key. I also add NOT NULL to PRIMARY KEY to document the table and to be sure it stays there when the key changes.

File system programmers understand the concept of a PRIMARY KEY, but for the wrong reasons. Their mindset is a sequential file, which can have only one key because that key is used to determine the physical order of the records within the file. There is no ordering in a table; the term PRIMARY KEY in SQL has to do with defaults in referential actions, which we will discuss later.

There is also a multiple-column form of the < UNIQUE specification >, which is usually written at the end of the column declarations. It is a list of columns in parentheses after the proper keyword; it means that the combination of those columns is UNIQUE. For example, I might declare PRIMARY KEY (city, department), so I can be sure that though I have offices in many cities and many identical departments in those offices, there is only one personnel department in Chicago.
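The personnel example as a sketch, with invented names and arbitrary column sizes:

```sql
CREATE TABLE Departments
(city_name VARCHAR(20) NOT NULL,
 dept_name VARCHAR(20) NOT NULL,
 budget_amt DECIMAL(12,2) NOT NULL,
 PRIMARY KEY (city_name, dept_name)); -- one personnel dept per city
```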

3.2.5 REFERENCES Clause

The < references specification > is the simplest version of a referential constraint definition, which can be quite tricky. For now, let us just consider the simplest case:

< references specification > ::=
 [CONSTRAINT < constraint name >]
  REFERENCES < referenced table name>[(< reference column >)]

This relates two tables together, so it is different from the other options we have discussed so far. What this says is that the value in this column of the referencing table must appear somewhere in the referenced table’s column that is named in the constraint. Notice the terms referencing and referenced. This is not the same as the parent and child terms used in network databases. Those terms were based on pointer chains that were traversed in one direction; that is, you cannot find a path back to the parent from a child node in the network. Another difference is that the referencing and referenced tables can be the same table. Self-references can be a useful trick.

Furthermore, the referenced column must have a UNIQUE constraint. For example, you can set up a rule that the Orders table will have orders only for goods that appear in the Inventory table.

If no < reference column > is given, then the PRIMARY KEY column of the referenced table is assumed to be the target. This is one of those places where the PRIMARY KEY is important, but you can always play it safe and explicitly name a column. There is no rule to prevent several columns from referencing the same target column. For example, we might have a table of flight crews that has pilot and copilot columns that both reference a table of certified pilots.
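The flight crew example as a sketch; the Pilots table and its key are invented for illustration:

```sql
CREATE TABLE Pilots
(pilot_name VARCHAR(30) NOT NULL PRIMARY KEY,
 license_nbr CHAR(10) NOT NULL UNIQUE);

CREATE TABLE Flight_Crews
(flight_nbr INTEGER NOT NULL PRIMARY KEY,
 pilot_name VARCHAR(30) NOT NULL
   REFERENCES Pilots (pilot_name),
 copilot_name VARCHAR(30) NOT NULL
   REFERENCES Pilots (pilot_name), -- same target column, referenced twice
 CHECK (pilot_name <> copilot_name));
```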

A circular reference is a relationship in which one table references a second table, which in turn references the first table. The old gag about “you cannot get a job until you have experience, and you cannot get experience until you have a job!” is the classic version of this.

3.2.6 Referential Actions

The REFERENCES clause can have two subclauses that take actions when a database event changes the referenced table. This feature took a while to be implemented in most SQL products. The two database events are updates and deletes, and the subclauses look like this:

< referential triggered action > ::=
  < update rule > [< delete rule >] | < delete rule > [< update rule >]
< update rule > ::= ON UPDATE < referential action >
< delete rule > ::= ON DELETE < referential action >
< referential action > ::= CASCADE | SET NULL | SET DEFAULT | NO ACTION

When the referenced table is changed, one of the referential actions is set in motion by the SQL engine.

(1) The CASCADE option will change the values in the referencing table to the new value in the referenced table. This is a very common method of DDL programming that allows you to set up a single table as the trusted source for an identifier. This way the system can propagate changes automatically.
This removes one of the arguments for nonrelational system-generated surrogate keys. In early SQL products that were based on a file system for their physical implementation, the values were repeated both in the referenced and referencing tables. Why? The tables were regarded as separate units, like files.
Later, SQL products regarded the schema as a whole. The referenced values appeared once in the referenced table, and the referencing tables obtained them by following pointer chains to that one occurrence in the schema. The results are much faster update cascades, a physically smaller database, faster joins, and faster aggregations.

(2) The SET NULL option will change the values in the referencing table to a NULL. Obviously, the referencing column needs to be NULL-able.

(3) The SET DEFAULT option will change the values in the referencing table to the default value of that column. Obviously, the referencing column needs to have some DEFAULT declared for it, but each referencing column can have its own default in its own table.

(4) The NO ACTION option explains itself. Nothing is changed in the referencing table and it is possible that some error message about reference violation will be raised. If a referential constraint does not specify any ON UPDATE or ON DELETE rule, then NO ACTION is implicit.

You will also see the reserved word RESTRICT in some products. This option can act as a “place holder” during the design phase while you are waiting for final specifications. Full ANSI/ISO Standard SQL has more options about how matching is done between the referenced and referencing tables. Full SQL also has deferrable constraints. This lets the programmer turn a constraint off during a transaction so that the table can be put into a state that would otherwise be illegal. However, at the end of the transaction, all the constraints are enforced. Many SQL products have implemented these options, and they can be quite handy, but I will not mention them anymore.
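The Inventory and Orders rule mentioned earlier might be sketched like this; all names and sizes are invented for the example:

```sql
CREATE TABLE Inventory
(gtin CHAR(15) NOT NULL PRIMARY KEY,
 onhand_qty INTEGER DEFAULT 0 NOT NULL);

CREATE TABLE Orders
(order_nbr CHAR(10) NOT NULL PRIMARY KEY,
 gtin CHAR(15) NOT NULL
   REFERENCES Inventory (gtin)
   ON UPDATE CASCADE    -- renumbered goods propagate to their orders
   ON DELETE NO ACTION); -- cannot drop goods that are still on order
```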

All the constraints can be defined as equivalent to some CHECK constraint, for example:

PRIMARY KEY = CHECK (UNIQUE (SELECT < key columns > FROM < table >)
     AND (< key columns >) IS NOT NULL)
 UNIQUE = CHECK (UNIQUE (SELECT < key columns > FROM < table >))
 NOT NULL = CHECK (< column > IS NOT NULL)

These predicates can be reworded in terms of other predicates and subquery expressions and then passed on to the optimizer.

3.2.6.1 Nested UNIQUE Constraints

One of the basic tricks in SQL is representing a one-to-one or many-to-many relationship with a table that references the two (or more) entity tables involved by their primary keys. This third table has several popular names such as “junction table,” “Associative Entity,” or “join table,” but we know that it is a relationship. The term “junction table” comes from the pointer structures of Network databases, not from RDBMS. For example, given two tables,

CREATE TABLE Boys
(boy_name VARCHAR(30) NOT NULL PRIMARY KEY,
 . . .);
CREATE TABLE Girls
(girl_name VARCHAR(30) NOT NULL PRIMARY KEY,
 . . .);

Yes, I know using names for a key is a bad practice, but it will make my examples easier to read. There are a lot of different relationships that we can make between these two tables. If you don’t believe me, just watch the Jerry Springer Show sometime. The simplest relationship table looks like this:

CREATE TABLE Couples
(boy_name VARCHAR(30) NOT NULL
    REFERENCES Boys (boy_name)
    ON UPDATE CASCADE
    ON DELETE CASCADE,
 girl_name VARCHAR(30) NOT NULL
     REFERENCES Girls(girl_name)
     ON UPDATE CASCADE
     ON DELETE CASCADE);

The Couples table allows us to insert rows like this:

INSERT INTO Couples
VALUES
('Joe Celko', 'Miley Cyrus'),
('Joe Celko', 'Lady GaGa'),
('Alec Baldwin', 'Lady GaGa'),
('Joe Celko', 'Miley Cyrus');

Oops! I am shown twice with “Miley Cyrus” because the Couples table does not have its own compound key. This is an easy mistake to make, but fixing it is not an obvious thing.

CREATE TABLE Orgy
(boy_name VARCHAR(30) NOT NULL
    REFERENCES Boys (boy_name)
    ON DELETE CASCADE
    ON UPDATE CASCADE,
 girl_name VARCHAR(30) NOT NULL
    REFERENCES Girls(girl_name)
    ON UPDATE CASCADE
    ON DELETE CASCADE,
 PRIMARY KEY (boy_name, girl_name)); -- compound key

The Orgy table gets rid of the duplicated rows and makes this a proper table. The primary key for the table is made up of two or more columns and is called a compound key because of that fact. These are valid rows now.

('Joe Celko', 'Miley Cyrus')
('Joe Celko', 'Lady GaGa')
('Alec Baldwin', 'Lady GaGa')

But the only restriction on the couples is that they appear only once. Every boy can be paired with every girl, much to the dismay of the traditional marriage advocates. I think I want to make a rule that guys can have as many gals as they want, but the gals have to stick to one guy.

The way I do this is to use a NOT NULL UNIQUE constraint on the girl_name column, which makes it a key. It is a simple key since it is only one column, but it is also a nested key because it appears as a subset of the compound PRIMARY KEY.

CREATE TABLE Playboys
(boy_name VARCHAR(30) NOT NULL
    REFERENCES Boys (boy_name)
    ON UPDATE CASCADE
    ON DELETE CASCADE,
 girl_name VARCHAR(30) NOT NULL UNIQUE -- nested key
    REFERENCES Girls(girl_name)
    ON UPDATE CASCADE
    ON DELETE CASCADE,
 PRIMARY KEY (boy_name, girl_name)); -- compound key

The Playboys table is a proper table, without duplicated rows, but it also enforces the condition that I get to play around with one or more ladies, thus:

('Joe Celko', 'Miley Cyrus')
('Joe Celko', 'Lady GaGa')

The ladies might want to go the other way and keep company with a series of men.

CREATE TABLE Playgirls
(boy_name VARCHAR(30) NOT NULL UNIQUE -- nested key
    REFERENCES Boys (boy_name)
    ON UPDATE CASCADE
    ON DELETE CASCADE,
 girl_name VARCHAR(30) NOT NULL
     REFERENCES Girls(girl_name)
     ON UPDATE CASCADE
     ON DELETE CASCADE,
 PRIMARY KEY (boy_name, girl_name)); -- compound key

The Playgirls table would permit these rows from our original set.

('Joe Celko', 'Lady GaGa')
('Alec Baldwin', 'Lady GaGa')

Think about all of these possible keys for a minute. The compound PRIMARY KEY is now redundant. If each boy appears only once in the table or each girl appears only once in the table, then each (boy_name, girl_name) pair can appear only once. However, the redundancy can be useful in searching the table because the SQL engine can use it to optimize queries.

The traditional marriage advocates can model their idea of stable couples with this code:

CREATE TABLE Marriages
(boy_name VARCHAR(30) NOT NULL UNIQUE -- nested key
    REFERENCES Boys (boy_name)
    ON UPDATE CASCADE
    ON DELETE CASCADE,
 girl_name VARCHAR(30) NOT NULL UNIQUE -- nested key
     REFERENCES Girls(girl_name)
     ON UPDATE CASCADE
     ON DELETE CASCADE,
 PRIMARY KEY (boy_name, girl_name)); -- redundant compound key

The Marriages table allows us to insert these rows from the original set.

('Joe Celko', 'Miley Cyrus')
('Alec Baldwin', 'Lady GaGa')

Making special provisions for the primary key in the SQL engine is not a bad assumption because the REFERENCES clause uses the PRIMARY KEY of the referenced table as the default. Many new SQL programmers are not aware that a FOREIGN KEY constraint can also reference any UNIQUE constraint in the same table or in another table. The following nightmare will give you an idea of the possibilities. The multiple column versions follow the same syntax.

CREATE TABLE Bar -- declared first so that Foo's references resolve
(bar_key INTEGER NOT NULL PRIMARY KEY,
 other_key INTEGER NOT NULL UNIQUE,
 . . .);

CREATE TABLE Foo
(foo_key INTEGER NOT NULL PRIMARY KEY,
   . . .
 self_ref INTEGER NOT NULL
   REFERENCES Foo(foo_key),
 outside_ref_1 INTEGER NOT NULL
    REFERENCES Bar(bar_key),
 outside_ref_2 INTEGER NOT NULL
      REFERENCES Bar(other_key),
 . . .);

3.2.6.2 Overlapping Keys

But getting back to the nested keys, just how far can we go with them? My favorite example is a teacher’s schedule kept in a table like this (I am leaving off reference clauses and CHECK() constraints):

CREATE TABLE Schedule -- skeleton table. WRONG!
(teacher_name VARCHAR(15) NOT NULL,
 class_title CHAR(15) NOT NULL,
 room_nbr INTEGER NOT NULL,
 period_nbr INTEGER NOT NULL,
 PRIMARY KEY (teacher_name, class_title, room_nbr, period_nbr));

That choice of a primary key is the most obvious one—use all the columns. Typical rows would look like this:

 ('Mr. Celko', 'Database 101', 222, 6)

The rules we want to enforce are

(1) A teacher is in only one room each period.

(2) A teacher teaches only one class each period.

(3) A room has only one class each period.

(4) A room has only one teacher in each period.

Stop reading and see what you come up with for an answer. Okay, now consider using one constraint for each rule in the list, thus.

CREATE TABLE Schedule_1 -- version one, WRONG!
(teacher_name VARCHAR(15) NOT NULL,
 class_title CHAR(15) NOT NULL,
 room_nbr INTEGER NOT NULL,
 period_nbr INTEGER NOT NULL,
 UNIQUE (teacher_name, room_nbr, period_nbr), -- rule #1
 UNIQUE (teacher_name, class_title, period_nbr), -- rule #2
 UNIQUE (class_title, room_nbr, period_nbr), -- rule #3
 UNIQUE (teacher_name, class_title, room_nbr), -- rule #4
 PRIMARY KEY (teacher_name, class_title, room_nbr, period_nbr));

We know that there are four ways to pick three things from a set of four things; it is called a combination.

I could drop the PRIMARY KEY as redundant if I have all four of these constraints in place. But what happens if I drop the PRIMARY KEY and then one of the constraints?

CREATE TABLE Schedule_2 -- still wrong
(teacher_name VARCHAR(15) NOT NULL,
 class_title CHAR(15) NOT NULL,
 room_nbr INTEGER NOT NULL,
 period_nbr INTEGER NOT NULL,
 UNIQUE (teacher_name, room_nbr, period_nbr), -- rule #1
 UNIQUE (teacher_name, class_title, period_nbr), -- rule #2
 UNIQUE (class_title, room_nbr, period_nbr)); -- rule #3

I can now insert these rows in the second version of the table:

 ('Mr. Celko', 'Database 101', 222, 6)
 ('Mr. Celko', 'Database 102', 223, 6)

This gives me a very tough sixth period teaching load since I have to be in two different rooms at the same time. Things can get even worse when another teacher is added to the schedule:

 ('Mr. Celko', 'Database 101', 222, 6)
 ('Mr. Celko', 'Database 102', 223, 6)
 ('Ms. Shields', 'Database 101', 223, 6)

Ms. Shields and I are both in room 223, trying to teach different classes at the same time. Matthew Burr looked at the constraints and the rules and came up with this analysis.

CREATE TABLE Schedule_3 -- corrected version
(teacher_name VARCHAR(15) NOT NULL,
 class_title CHAR(15) NOT NULL,
 room_nbr INTEGER NOT NULL,
 period_nbr INTEGER NOT NULL,
 UNIQUE (teacher_name, period_nbr), -- rules #1 and #2
 UNIQUE (room_nbr, period_nbr)); -- rules #3 and #4

If a teacher is in only one room each period, then given a period and a teacher I should be able to determine only one room, i.e., room is functionally dependent upon the combination of teacher and period. Likewise, if a teacher teaches only one class each period, then class is functionally dependent upon the combination of teacher and period. The same thinking holds for the last two rules: class is functionally dependent upon the combination of room and period, and teacher is functionally dependent upon the combination of room and period.

With the constraints that were provided in the first version, you will find that the rules are not enforced. For example, I could enter the following rows:

 ('Mr. Celko', 'Database 101', 222, 6)

 ('Mr. Celko', 'Database 102', 223, 6)

These rows violate rule #1 and rule #2.

However, the UNIQUE constraints provided in Schedule_2 do not capture this violation and will allow the rows to be entered.

The constraint

  UNIQUE (teacher_name, room_nbr, period_nbr)

is checking the complete combination of teacher, room, and period, and since (‘Mr. Celko’, 222, 6) is different from (‘Mr. Celko’, 223, 6), the DDL does not find any problem with both rows being entered, even though that means that Mr. Celko is in more than one room during the same period.

  UNIQUE (teacher_name, class_title, period_nbr)

doesn’t catch its associated rule either since (‘Mr. Celko’, ‘Database 101’, 6) is different from (‘Mr. Celko’, ‘Database 102’, 6), and so, Mr. Celko is able to teach more than one class during the same period, thus violating rule #2. It seems that we’d also be able to add the following row:

('Ms. Shields', 'Database 103', 222, 6)

which violates rules #3 and #4.

Try to imagine enforcing this with procedural code. This is why I say that most of the work in SQL is done in the DDL.

3.3 Computed Columns

A computed column calculates a value from other columns in the same row. Each vendor has different syntax, but the two options are virtual and persisted. The virtual option is like a local VIEW that computes the expression’s value in a new column when the row is accessed. The persisted option actually creates and stores the value in the table. Here is the syntax:

< column name > < data type > [GENERATED ALWAYS] AS (< expression >) [VIRTUAL | PERSISTENT]

The column name obeys the same rules as a regular column. It is always best to specify the storage option. As an example, here are the classic order details with a line extension computation.

CREATE TABLE Order_Details
(order_nbr CHAR(15) NOT NULL
   REFERENCES Orders(order_nbr),
 gtin CHAR(15) NOT NULL,
 order_qty INTEGER NOT NULL,
 unit_price DECIMAL(10,2) NOT NULL,
 line_tot DECIMAL(15,2) GENERATED ALWAYS AS (order_qty * unit_price) VIRTUAL,
 PRIMARY KEY (order_nbr, gtin));
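Since line_tot is generated, you supply only the base columns and let the engine do the arithmetic. A usage sketch, with invented values:

```sql
INSERT INTO Order_Details (order_nbr, gtin, order_qty, unit_price)
VALUES ('2016-443', '00012345678905', 3, 19.99);

SELECT order_nbr, line_tot -- computed as (order_qty * unit_price)
  FROM Order_Details;
```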

3.4 [NOT] DEFERRABLE Constraints

The one firm rule about constraints is that they must all be valid at the end of a transaction. This lets you control what happens during the transaction. First, you decide whether the constraint can be deferred during a transaction at all. The declaration also sets the initial state of the constraint at the start of each transaction (INITIALLY DEFERRED or INITIALLY IMMEDIATE).

CONSTRAINT < constraint name >
 [INITIALLY DEFERRED | INITIALLY IMMEDIATE]
 [[NOT] DEFERRABLE]
 [NOT] ENFORCED

Some of these options are obviously contradictory and are disallowed.


The default is INITIALLY IMMEDIATE NOT DEFERRABLE ENFORCED. These options are invoked by the SET CONSTRAINTS statement.

SET CONSTRAINTS < constraint name list > {DEFERRED | IMMEDIATE}

< constraint name list > ::= ALL | < constraint name > [{< comma > < constraint name >}. . .]

The constraints named in the < constraint name list > have to be deferrable. Here is an example of how to use this.

Consider this self-reference trick to prevent gaps in a timeline of events. We have made sure that the starting and ending dates of an event are in the right order and that the previous event is contiguous to this event.

CREATE TABLE Events
(event_id CHAR(10) NOT NULL,
 previous_event_end_date DATE NOT NULL
  REFERENCES Events (event_end_date),
 event_start_date DATE NOT NULL,
 event_end_date DATE UNIQUE, -- null means event in progress
 PRIMARY KEY (event_id, event_start_date),
 CONSTRAINT Event_Order_Valid
 CHECK (event_start_date <= event_end_date),
 CONSTRAINT Chained_Dates
 CHECK (previous_event_end_date
   + INTERVAL '1' DAY = event_start_date)
 INITIALLY IMMEDIATE DEFERRABLE
);

First we defer the Chained_Dates constraint so that we can start a chain of dates.

SET CONSTRAINTS Chained_Dates DEFERRED;

While the constraint is deferred, insert a starter row:

INSERT INTO Events (event_id, previous_event_end_date, event_start_date, event_end_date)
VALUES ('Foo Fest', '2016-01-01', '2016-01-02', '2016-01-05');

Now set the constraint back to immediate checking:

SET CONSTRAINTS Chained_Dates IMMEDIATE;
-- this works
INSERT INTO Events (event_id, previous_event_end_date, event_start_date, event_end_date)
VALUES ('Glob Week', '2016-01-05', '2016-01-06', '2016-01-10');
-- this fails
INSERT INTO Events (event_id, previous_event_end_date, event_start_date, event_end_date)
VALUES ('Snoob day', '2016-01-09', '2016-01-11', '2016-01-15');

3.5 CREATE DOMAIN and CREATE SEQUENCE

There are two constructs that are used in tables but live at the schema level. They are kinds of shorthand for what would otherwise require hugely expensive stored procedures. They make programming much easier, so they are worth learning.

3.5.1 CREATE DOMAIN

A domain schema object is shorthand for a column declaration which can be used in several table declarations. For whatever reason, this never became popular and it is not widely implemented.

CREATE DOMAIN < domain name > [AS] < predefined type >
[< default clause >]
[< domain constraint>. . .]
[< collate clause >]

The term < predefined type > is one of the built-in types in SQL; you cannot nest domains. If no < constraint characteristics > is specified for a domain constraint, then INITIALLY IMMEDIATE NOT DEFERRABLE is implicit.
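As a sketch of how a domain is used (the domain name and table here are hypothetical; among major products, PostgreSQL implements CREATE DOMAIN):

```sql
-- A domain for two-letter uppercase country codes.
CREATE DOMAIN country_code AS CHAR(2)
 DEFAULT '??'
 CHECK (VALUE SIMILAR TO '[A-Z][A-Z]' OR VALUE = '??');

-- The domain is then used like a data type in any number of tables,
-- so the default and the CHECK() do not have to be repeated.
CREATE TABLE Suppliers
(supplier_id CHAR(10) NOT NULL PRIMARY KEY,
 supplier_country country_code NOT NULL);
```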

3.5.2 CREATE SEQUENCE

A sequence schema object is a generator for an ordered list of whole numbers. It is at the schema level and you can have several of them. But there are some subtle things to watch out for.

A series is not a sequence. A series is a mathematical construct based on summing terms; a series can converge to a limit at infinity. We do not need to do that very often in an RDBMS.

A SEQUENCE is not an IDENTITY (a count of physical insertions to disk), an auto-increment (a counter attached to a table), or a row_id (a physical location on a disk). It is a logical concept, not related to physical storage. When a set of rows is inserted, these proprietary features pick some permutation of the rows and number them in that order. The result is nondeterministic.

Whole numbers are not identifiers; we use them for math. What you want for an identifier is a tag number: a string, usually of fixed length, made up of digits. You keep tag numbers in sorted order so you can spot gaps, sort them easily, and generate them. Think of a serial number on a manufactured product. The most common example is a vehicle identification number (VIN). This is a unique code used by the automotive industry to identify individual motor vehicles, as defined in ISO 3833. Positions 12-17 are a sequential number, prefixed by codes for the manufacturer identifier, some vehicle attributes, a check digit, the model year, and the plant code.

The check digit is a major reason we like tag numbers. This is a topic in itself, but the idea is that we can take each digit in the tag number and run it through a formula as an integer. We get a result, usually another digit, and we attach it to the tag number. For example, the Luhn algorithm is a common method defined by ISO/IEC 7812-1. We like it because it is simple to put in hardware.

1. Starting from the rightmost digit and moving left, double every second digit; if a doubled digit exceeds 9, subtract 9 from it.

2. Compute the sum of all the digits.

3. Take the units digit from this total, subtract it from 10, and keep the units digit of the result as the check digit.

It is not a great check digit, but it catches most of the common input errors—missing digits, extra digits, wrong digits, and pairwise transposes.
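A worked example, using the standard test value 7992739871: doubling every second digit from the right gives 2, 7, 6, 4, 9, the untouched digits are 7, 9, 7, 9, 7, and the total is 67. The check digit is therefore (10 - 7) = 3. Spelled out as a single SQL expression (MOD() is the standard spelling; some products write %):

```sql
-- Luhn check digit for 7992739871. Doubled digits that exceed 9
-- have 9 subtracted (e.g. 8 doubles to 16, which becomes 7).
SELECT MOD(10 - MOD((1 * 2) + 7 + (8 * 2 - 9) + 9 + (3 * 2)
                  + 7 + (2 * 2) + 9 + (9 * 2 - 9) + 7, 10), 10)
       AS check_digit;  -- returns 3
```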

The CREATE SEQUENCE Statement is easier to explain with a physical model. Imagine you are in the butcher store. You walk in and pull a service ticket number from a roll of tickets on the counter. Sequence numbers are generated outside the scope of the current transaction, just like the tickets. The numbers are consumed whether the transaction using the ticket number is actually served (committed) or they walk out (rolled back). Here is the BNF for CREATE SEQUENCE.

CREATE SEQUENCE < sequence_name >
 [AS < built_in_integer_type >]
 [START WITH < constant >]
 [INCREMENT BY < constant >]
 [{MINVALUE < constant >} | {NO MINVALUE}]
 [{MAXVALUE < constant >} | {NO MAXVALUE}]
 [CYCLE | {NO CYCLE}]

Let’s go through the BNF in detail. The < sequence_name > and its qualifiers are self-explanatory. This is pure SQL. Sequence numbers are generated outside the scope of the current transaction. They are consumed whether the transaction using the sequence number is committed or rolled back. Think of the butcher shop ticket that gets dropped on the floor.

A sequence can be defined as any integer type. That means TINYINT, SMALLINT, INTEGER, and BIGINT. But it also allows DECIMAL (s, 0) and NUMERIC(s, 0) data types. You can also use a user-defined data type that is based on one of the allowed types. Do not do that; it destroys portability.

If no data type is provided, BIGINT is the default. As with most defaults, this is the largest, safest value in the problem space, but do not use it unless you really expect to burn through more than the roughly 9.2 quintillion values a BIGINT can hold.

START WITH < constant >

This is the first value returned by the sequence. The START value must be a value between the minimum and maximum values. The default start value for a new sequence is the minimum value for an ascending sequence and the maximum value for a descending sequence. Note this is a constant, no function calls.

INCREMENT BY < constant >

Value used to increment (or decrement if negative) the value of the sequence for each call to the NEXT VALUE FOR. If the increment is a negative value, the sequence is descending; otherwise, it is ascending. The increment cannot be 0, obviously. The default increment for a new sequence is 1. Again, this is a constant and not a function call.

[MINVALUE < constant > | NO MINVALUE]

Specifies the bounds for the sequence. The default minimum value for a new sequence is the minimum value of the data type of the sequence. This is zero for the TINYINT data type and a negative number for all other data types. NO MINVALUE is probably a really bad design choice. You probably wanted zero or one.

[MAXVALUE < constant > | NO MAXVALUE]

Specifies the bounds for the sequence. The default maximum value for a new sequence is the maximum value of the data type of the sequence.

NO MAXVALUE is probably another bad design. If you are going to use this to create a tag number, you have to worry about overflowing the length of your string.

If you are casting the numeric data type to strings for tag numbers, you will want to be sure that the numbers do not overflow in the conversion. And you probably do not need the upper limit of a data type.

[CYCLE | NO CYCLE]

Property that specifies whether the sequence should restart from the minimum value (or maximum for descending sequences) or throw an exception when its minimum or maximum value is exceeded. The default cycle option for new sequences is NO CYCLE. Note that cycling restarts from the minimum or maximum value, not from the start value.

3.5.2.1 Using the SEQUENCE

How do you use this feature? You simply call the next value with “NEXT VALUE FOR < sequence name>” wherever a numeric value of the appropriate type would go. This returns one value.

Remember earlier in this chapter when I mentioned that IDENTITY and other such features are not deterministic? SEQUENCE has a way to fix this problem: you can order the set that uses the sequence by attaching an OVER (ORDER BY . . .) clause to the NEXT VALUE FOR function. Note that if another process is accessing the sequence object at the same time, the numbers returned can still have gaps.

An OVER clause applied to the NEXT VALUE FOR function does not support the PARTITION BY or the [ROW | RANGE] subclauses for obvious reasons. The following additional rules apply when using the NEXT VALUE FOR function with the OVER clause.

Multiple calls to the NEXT VALUE FOR function for the same sequence generator in a single statement must all use the same OVER clause definition. Again, this is obvious.

Multiple calls to the NEXT VALUE FOR function that references different sequence generators in a single statement can have different OVER clause definitions. They are separate schema objects.

If all calls to the NEXT VALUE FOR function in a SELECT statement specify the OVER clause, an ORDER BY clause may be used in the SELECT statement. Again, the SEQUENCE is a separate schema object.

The OVER clause is allowed with the NEXT VALUE FOR function when used in a SELECT statement or INSERT … SELECT … statement. The NEXT VALUE FOR function is not allowed in UPDATE or MERGE statements.
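As a sketch (SQL Server implements this OVER clause on NEXT VALUE FOR; the Ticket_Seq sequence and the Arrivals and Service_Queue tables here are hypothetical):

```sql
-- Number the arriving customers deterministically, in arrival order,
-- instead of letting the engine pick an arbitrary permutation.
INSERT INTO Service_Queue (ticket_nbr, customer_name)
SELECT NEXT VALUE FOR Ticket_Seq OVER (ORDER BY arrival_time),
       customer_name
  FROM Arrivals;
```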

To get a feel for how this works, create a simple sequence and play with it.

CREATE SEQUENCE Invoice_Seq
 AS INTEGER
 START WITH 1
 INCREMENT BY 1
 MINVALUE 1
 MAXVALUE 99
 NO CYCLE;

Now play with it: run “SELECT NEXT VALUE FOR Invoice_Seq;” a few times. When you hit 99 and invoke the next value, you will get an error message. When you hit the limit, you can reset things.

ALTER SEQUENCE < sequence name > RESTART [WITH < constant >];

The WITH option lets you assign the new starting value. If you do not give a specific value, the default is the original starting value. In this example, we can use

 ALTER SEQUENCE Invoice_Seq RESTART WITH 3;

 SELECT NEXT VALUE FOR Invoice_Seq;

This will return 3. Most of the situations where you want to restart a sequence can be done with the CYCLE clause.

A cycling sequence can let you build groupings. Create a sequence that cycles through groups of 10:

CREATE SEQUENCE Ten_Seq
 AS INTEGER
 START WITH 1
 INCREMENT BY 1
 MINVALUE 1
 MAXVALUE 10
 CYCLE;
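Calling it repeatedly shows the cycle; a quick sketch:

```sql
SELECT NEXT VALUE FOR Ten_Seq;  -- returns 1
SELECT NEXT VALUE FOR Ten_Seq;  -- returns 2
-- . . . eight more calls return 3 through 10 . . .
SELECT NEXT VALUE FOR Ten_Seq;  -- returns 1 again; the cycle has restarted
```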

A fun example is using the same sequence in more than one table. Let’s go back to the original service ticket example. When you come into the shop, you pull a ticket.

CREATE SEQUENCE Service_Ticket_Seq
 AS INTEGER
 START WITH 1
 INCREMENT BY 1
 MINVALUE 1
 MAXVALUE 100
 CYCLE;

We have two departments in this delicatessen, meats and fish. If you have been to a Kosher market, you know that these two areas are kept separate. A ticket can be used in only one department.

CREATE TABLE Meats
(ticket_seq INTEGER NOT NULL PRIMARY KEY,
 meat_type VARCHAR(15) NOT NULL);
CREATE TABLE Fish
(ticket_seq INTEGER NOT NULL PRIMARY KEY,
 fish_type VARCHAR(15) NOT NULL);
CREATE PROCEDURE Ticket_Service
(IN market_code CHAR(1),
 IN product_name VARCHAR(15))
BEGIN
DECLARE local_ticket_seq INTEGER;
SET local_ticket_seq = NEXT VALUE FOR Service_Ticket_Seq;
IF market_code = 'M'
THEN INSERT INTO Meats
     VALUES (local_ticket_seq, product_name);
ELSEIF market_code = 'F'
THEN INSERT INTO Fish
     VALUES (local_ticket_seq, product_name);
END IF;
END;

Now, let’s call the procedure a few times:

CALL Ticket_Service ('M', 'Brisket');
CALL Ticket_Service ('F', 'Lox');
CALL Ticket_Service ('M', 'Chicken');
CALL Ticket_Service ('M', 'Brisket');
CALL Ticket_Service ('F', 'Sturgeon');
CALL Ticket_Service ('F', 'Haddock');

And now let’s see how this works.

SELECT * FROM Meats;

1  Brisket
3  Chicken
4  Brisket

SELECT * FROM Fish;

2  Lox
5  Sturgeon
6  Haddock

If I UNION ALL the two tables, I get a complete sequence and I am sure their intersection is empty. Doing this without a SEQUENCE is a lot harder. But I did resort to if-then-else flow control because I have two different tables. UGH!

I can put a SEQUENCE in the DEFAULT clause of the DDL for a table:

CREATE TABLE Service_Tickets
(ticket_nbr INTEGER DEFAULT NEXT VALUE FOR Service_Ticket_Seq,
 department_code CHAR(1) NOT NULL
 CHECK (department_code IN ('M', 'F')));

Now play with this code.

INSERT INTO Service_Tickets (department_code)
VALUES ('M');
SELECT * FROM Service_Tickets;

That is cool! But do not stop here. Let’s redo the Meats and Fish tables with this feature:

CREATE TABLE Meats
(ticket_seq INTEGER DEFAULT NEXT VALUE FOR Service_Ticket_Seq
 PRIMARY KEY,
 meat_type VARCHAR(15) NOT NULL);
CREATE TABLE Fish
(ticket_seq INTEGER DEFAULT NEXT VALUE FOR Service_Ticket_Seq
 PRIMARY KEY,
 fish_type VARCHAR(15) NOT NULL);

Now to get you started, try these statements. I dropped the Kosher theme:

INSERT INTO Meats (meat_type) VALUES ('pig');
INSERT INTO Fish (fish_type) VALUES ('squid');
SELECT * FROM Meats;
SELECT * FROM Fish;

There are no guarantees as to how the sequence numbers will be assigned; it is pretty much first-come, first-served in the system. Did you notice that the sequences are the PRIMARY KEY?

INSERT INTO Meats VALUES (8, 'Cat');

will give us a PRIMARY KEY violation later, when the sequence itself reaches that value.

3.6 Character Set Related Constructs

There are several schema level constructs for handling characters. You can create a named set of characters for various language or special purposes, define one or more collation sequences for them, and translate one set into another.

Today, the Unicode Standard and vendor features are what are actually used. Most of the characters in actual use already have Unicode names and collations defined. For example, SQL text is written in Latin-1, as defined by ISO 8859-1. This is the set used for HTML and it consists of 191 characters from the Latin alphabet. It is the most commonly used character set in the Americas, Western Europe, Oceania, and Africa, and for standard Romanizations of East-Asian languages.

Since 1991, the Unicode Consortium has been working with ISO and IEC to develop the Unicode Standard and ISO/IEC 10646, the Universal Character Set (UCS), in tandem. Unicode and ISO/IEC 10646 currently assign about 100,000 characters to a code space consisting of over a million code points, and they define several standard encodings that are capable of representing every available code point. The standard encodings of Unicode and the UCS use sequences of one to four 8-bit code values (UTF-8), sequences of one or two 16-bit code values (UTF-16), or one 32-bit code value (UTF-32 or UCS-4). There is also an older encoding that uses one 16-bit code value (UCS-2), capable of representing one-seventeenth of the available code points. Of these encoding forms, only UTF-8’s byte sequences are in a fixed order; the others are subject to platform-dependent byte ordering issues that may be addressed via special codes or indicated via out-of-band means.

3.6.1 CREATE CHARACTER SET

You will not find this syntax in many SQL products. Vendors usually default to a system-level character set based on the local language settings.

< character set definition > ::=
CREATE CHARACTER SET < character set name > [AS]
 < character set source > [< collate clause >]
< character set source > ::= GET < character set specification >

The < collate clause > is usually defaulted also, but you can use named collations.

3.6.2 CREATE COLLATION

< collation definition > ::=
CREATE COLLATION < collation name >
 FOR < character set specification >
 FROM < existing collation name > [< pad characteristic >]
< pad characteristic > ::= NO PAD | PAD SPACE

The < pad characteristic > option has to do with how strings will be compared to each other. If the collation for the comparison has the NO PAD characteristic and the shorter value is equal to some prefix of the longer value, then the shorter value is considered less than the longer value. If the collation for the comparison has the PAD SPACE characteristic, for the purposes of the comparison, the shorter value is effectively extended to the length of the longer by concatenation of < space>s on the right. SQL normally pads a shorter string with spaces on the end and then matches them, letter for letter, position by position.
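A quick sketch of the difference, using string literals:

```sql
-- Under a PAD SPACE collation (the usual default), the shorter string
-- is blank-extended before comparing, so this predicate is TRUE:
'Smith' = 'Smith   '
-- Under a NO PAD collation, 'Smith' is a proper prefix of the longer
-- value, so the equality is FALSE and this predicate is TRUE instead:
'Smith' < 'Smith   '
```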

3.6.3 CREATE TRANSLATION

This statement defines how one character set can be mapped into another character set. The important part is that it gives this mapping a name.

< transliteration definition > ::=
CREATE TRANSLATION < transliteration name >
 FOR < source character set specification >
 TO < target character set specification >
 FROM < transliteration source >
< source character set specification > ::= < character set specification >
< target character set specification > ::= < character set specification >
< transliteration source > ::= < existing transliteration name > | < transliteration routine >
< existing transliteration name > ::= < transliteration name >
< transliteration routine > ::= < specific routine designator >

Notice that I can use a simple mapping, which behaves much like a bunch of nested REPLACE() function calls, or a routine that can do some computation. The reason for giving these transliterations a name is that I can then use them in the TRANSLATE() function instead of that bunch of nested REPLACE() calls. The syntax is simple:

TRANSLATE (< character value expression > USING < transliteration name >)

DB2 and other implementations generalize TRANSLATE() to allow for target and replacement strings so that you can do a lot of edit work in a single expression. We will get to that when we get to string functions.
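A sketch of the generalized form as DB2 spells it, TRANSLATE(< string >, < to-string >, < from-string >); check your product’s manual, since some products (Oracle, for example) reverse the last two arguments and differ on how unmatched characters are handled:

```sql
-- Each character of the from-string is mapped, position by position,
-- to the corresponding character of the to-string.
TRANSLATE ('2016-01-05', '/', '-')     -- yields '2016/01/05'
TRANSLATE ('ABCDE', 'xy', 'AB')        -- yields 'xyCDE'
```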
