CHAPTER 5

5.1 In some ways a tuple does resemble a record and an attribute a field—but these resemblances are only approximate. A relvar shouldn’t be regarded as just a file, but rather as a “file with discipline,” as it were. The discipline in question is one that results in a considerable simplification in the structure of the data as seen by the user, and hence in a corresponding simplification in the operators needed to deal with that data, and indeed in the user interface in general. What is that discipline? Well, it’s that there’s no top to bottom ordering to the records; and no left to right ordering to the fields; and no duplicate records; and no nulls; and no repeating groups; and no pointers; and no anonymous fields (and on and on). Partly as a consequence of these facts, it really is much better to think of a relvar like this: The heading represents some predicate (or some intension), and the body at any given time represents the extension of that predicate at that time.

5.2 Loosely, the specified remark means the UPDATE operation in question “updates the STATUS attribute in tuples for suppliers in London.” But tuples (and, a fortiori, attribute values within tuples) are values and simply can’t be updated, by definition. Here’s a more precise version of the remark:

  • Let relation s be the current value of relvar S.

  • Let ls be that restriction of s for which the CITY value is London.

  • Let ls′ be that relation that’s identical to ls except that the STATUS value in each tuple is the new value as specified in the given UPDATE operation.

  • Let s′ be the relation denoted by the expression (s MINUS ls) UNION ls′.

  • Then s′ is assigned to S.

5.3 Because relational operations are fundamentally set level and SQL’s “positioned update” operations are necessarily tuple level (or row level, rather), by definition. Although set level operations for which the set in question is of cardinality one are sometimes—perhaps even frequently—acceptable, they can’t always work. In particular, tuple level update operations might work for a while and then cease to work when integrity constraint support is improved.

5.4 It’s defined in terms of EXCEPT ALL. Consider the SQL DELETE statement:

     DELETE FROM T WHERE bx ;

Let temp denote the result of the expression SELECT * FROM T WHERE bx. Note that if row r appears exactly n times in temp, it also appears exactly n times in T. Then the effect of the specified DELETE statement is to assign the result of the expression

     SELECT * FROM T EXCEPT ALL SELECT * FROM temp

to table T. (Note that EXCEPT DISTINCT would have the additional effect of eliminating duplicates from T that don’t appear in temp.)

5.5 The statements aren’t equivalent. The source for the first is the table t1 denoted by the specified table subquery; the source for the second is the table t2 containing just the row denoted by the specified row subquery (i.e., the VALUES argument). If table S does include a row for supplier S6, then t1 and t2 are identical. But if table S doesn’t include such a row, then t1 is empty while t2 contains a row of all nulls.

As for Tutorial D analogs of the two SQL statements: Well, note first of all that in order for such analogs even to exist, it’s necessary to assume that table SS doesn’t permit duplicates, because “relvars that permit duplicates” aren’t supported in Tutorial D (in fact, they’re a contradiction in terms). Under this assumption, however, a Tutorial D analog of the first statement is reasonably straightforward:

     INSERT SS ( S WHERE SNO = 'S6' ) ;

As for the second statement, the closest we can get in Tutorial D is:

     INSERT SS RELATION { TUPLE FROM ( S WHERE SNO = 'S6' ) } ;

Recall from Chapter 3 that the expression TUPLE FROM rx extracts the single tuple from the relation denoted by the relational expression rx (that relation must have cardinality one). So if relvar S does contain a (necessarily unique) tuple for supplier S6, the foregoing INSERT will behave more or less as its SQL counterpart. But if relvar S doesn’t contain such a tuple, then the INSERT will fail (more precisely, the TUPLE FROM invocation will fail), whereas the SQL analog will as already noted insert a row of all nulls. Subsidiary exercise: Which behavior do you think is more reasonable (or more useful)—Tutorial D’s or SQL’s?

5.6 The Assignment Principle states that after assignment of the value v to the variable V, the comparison V = v must evaluate to TRUE. SQL violates this principle if “v is null”; it also violates it on certain character string assignments; and it certainly also violates it for any type for which the “=” operator isn’t defined, including type XML in particular, and possibly certain user defined types as well. Negative consequences: Too many to list here.

5.7 As in the body of the chapter, I assume the availability of certain user defined types in the following definitions. For simplicity, I also choose to overlook the fact that some of the column names I’ve chosen (which?) are in fact reserved words in SQL.

     CREATE TABLE TAX_BRACKET
       ( LOW        MONEY   NOT NULL ,
         HIGH       MONEY   NOT NULL ,
         PERCENTAGE INTEGER NOT NULL ,
         UNIQUE ( LOW ) ,
         UNIQUE ( HIGH ) ,
         UNIQUE ( PERCENTAGE ) ) ;

     CREATE TABLE ROSTER
       ( DAY   DAY_OF_WEEK NOT NULL ,
         TIME  TIME_OF_DAY NOT NULL ,
         GATE  GATE        NOT NULL ,
         PILOT NAME        NOT NULL ,
         UNIQUE ( DAY , TIME , GATE ) ,
         UNIQUE ( DAY , TIME , PILOT ) ) ;

     CREATE TABLE MARRIAGE
       ( SPOUSE_A         NAME NOT NULL ,
         SPOUSE_B         NAME NOT NULL ,
         DATE_OF_MARRIAGE DATE NOT NULL ,
         UNIQUE ( SPOUSE_A , DATE_OF_MARRIAGE ) ,
         UNIQUE ( DATE_OF_MARRIAGE , SPOUSE_B ) ,
         UNIQUE ( SPOUSE_B , SPOUSE_A ) ) ;

5.8 Because keys imply constraints; constraints apply to variables, not values; and relations are values, not variables. (That said, it’s certainly possible, and sometimes useful, to think of some subset k of the heading of relation r as if it were “a key for r” if it’s unique and irreducible with respect to the tuples of r. But thinking this way is strictly incorrect, and potentially confusing, and certainly much less useful than thinking about keys for relvars as opposed to relations.)

5.9 Here’s one: Suppose relvar A has a “reducible key” consisting of the disjoint union of K and X, say, where K and X are both subsets of the heading of A and K is a genuine key. Then the functional dependency KX holds in relvar A. Suppose now that relvar B has a foreign key referencing that “reducible key” in A. Then the functional dependency KX holds in B as well. As a result, B probably displays some redundancy; in fact, it’s probably not in Boyce/Codd normal form.[200]

5.10 Keys are sets of attributes—in fact, every key is a subset of the pertinent heading—and key values are thus tuples by definition, even when the tuples in question have exactly one attribute. Thus, for example, the key for the parts relvar P is {PNO} and not just PNO, and the key value for the parts tuple for part P1 is TUPLE {PNO ‘P1’} and not just ‘P1’.

5.11 Let m be the smallest integer greater than or equal to n/2. R will have the maximum possible number of keys if either (a) every distinct set of m attributes is a key or (b) n is odd and every distinct set of m−1 attributes is a key. Either way, it follows that the maximum number of keys in R is n!/(m!*(n-m)!).[201] Relvars TAX_BRACKET and MARRIAGE—see the answer to Exercise 5.7 above—are examples of relvars with the maximum possible number of keys; so is any relvar of degree zero. (If n = 0, the formula becomes 0!/(0!*0!), and 0! is 1.)

5.12 A superkey is a subset of the heading with the uniqueness property; a key is a superkey with the irreducibility property. All keys are superkeys, but “most” superkeys aren’t keys.

The concept of a subkey can be useful in studying normalization. Here’s a definition: Let X be a subset of the heading of relvar R; then X is a subkey for R if and only if there exists some key K for R such that X is a subset of K. For example, the following are all of the subkeys for relvar SP: {SNO,PNO}, {SNO}, {PNO}, and {} (note that the empty set {} is necessarily a subkey for all possible relvars R). By way of illustration, here’s a definition of third normal form that makes use of the subkey concept: Relvar R is in third normal form, 3NF, if and only if, for every nontrivial functional dependency X ® Y to which R is subject, X is a superkey or Y is a subkey. (A nontrivial functional dependency is one for which the right side isn’t a subset of the left side.)

5.13 Sample data:

EMP

ENO

MNO

E4

E2

E3

E2

E2

E1

E1

E1

I’m using the trick here of pretending that a certain employee (namely, employee E1) acts as his or her own manager, which is one way of avoiding the use of nulls in this kind of situation. Another and probably better way is to separate the reporting structure relationships out into a relvar of their own, excluding from that relvar any employee who has no manager:

image with no caption

Subsidiary exercise: What are the predicates for relvar EM and the two versions of relvar EMP here? Thinking carefully about this question should serve to reinforce the suggestion that the second design is preferable.

5.14 Because it doesn’t need to, on account of the fact that column correspondences are established in SQL (in this context, at least, though not in all contexts) on the basis of ordinal position rather than name. See the discussion in the body of the chapter.

5.15 Note first that such a situation must represent a one to one relationship, by definition. One obvious case arises if we split some relvar “vertically,” as in the following example (suppliers):

     VAR SNT BASE RELATION
       { SNO CHAR , SNAME CHAR , STATUS INTEGER }
       KEY { SNO }
       FOREIGN KEY { SNO } REFERENCES SC ;

     VAR SC BASE RELATION
       { SNO CHAR , CITY CHAR }
       KEY { SNO }
       FOREIGN KEY { SNO } REFERENCES SNT ;

One implication is that we probably need a mechanism for updating two or more relvars at the same time, and probably a mechanism for defining two or more relvars at the same time as well. See the discussion of multiple assignment in Chapter 8.

5.16 Tutorial D definitions (in accordance with the paper “Toward an Industrial Strength Dialect of Tutorial D”—see Appendix G—I assume here that Tutorial D supports the self-explanatory referential actions CASCADE and NO CASCADE):

     VAR P BASE RELATION { PNO ... , ... } KEY { PNO } ;
     VAR PP BASE RELATION { MAJOR_PNO ... , MINOR_PNO ... , QTY ... }
         KEY { MAJOR_PNO , MINOR_PNO }
         FOREIGN KEY { MAJOR_PNO } REFERENCES P
                 RENAME { PNO AS MAJOR_PNO } ON DELETE CASCADE
         FOREIGN KEY { MINOR_PNO } REFERENCES P
                 RENAME { PNO AS MINOR_PNO } ON DELETE NO CASCADE ;

With these definitions, deleting a part p will cascade to delete parts that are components of p but not parts of which p is a component.

SQL definitions:

     CREATE TABLE P ( PNO ... , ... , UNIQUE ( PNO ) ) ;
     CREATE TABLE PP ( MAJOR_PNO ... , MINOR_PNO ... , QTY ... ,
                       UNIQUE ( MAJOR_PNO , MINOR_PNO ) ,
                       FOREIGN KEY ( MAJOR_PNO ) REFERENCES P ( PNO )
                                                 ON DELETE CASCADE ,
                       FOREIGN KEY ( MINOR_PNO ) REFERENCES P
                                                 ON DELETE RESTRICT ) ;

Regarding the specification ON DELETE RESTRICT here, see the answer to the next exercise.

Note: In this example, the two foreign keys in table PP both refer to the same key in table P. Now, in the body of the chapter, I said that in such a case “you might want to ensure ... that one of the foreign keys has the same column names as [the target key], even though the other one doesn’t (and can’t).” As you can see, however, I haven’t adopted my own suggestion in the case at hand; instead, I’ve opted for a more symmetric design, in which each of the foreign key columns has a name consisting of the corresponding target column name prefixed with a kind of role name (MAJOR_ and MINOR_, respectively).

5.17 It’s obviously not possible to give a definitive answer to this exercise. I’ll just mention the referential actions supported by the standard, which are NO ACTION (the default), CASCADE, RESTRICT, SET DEFAULT, and SET NULL. Subsidiary exercise: What do you think the difference is (if any) between NO ACTION and RESTRICT? Does it make sense? Is it useful?

5.18 Loosely, a predicate is a truth valued function, and a proposition is a predicate with an empty set of parameters. See the body of the chapter for some examples, and Chapter 10 for more examples and an extended discussion of these concepts in general.

5.19 Relvar P: Part PNO is used in the enterprise, is named PNAME, has color COLOR and weight WEIGHT, and is stored in city CITY. Relvar SP: Supplier SNO supplies part PNO in quantity QTY.

5.20 The intension of relvar R is the intended interpretation of R; the extension of relvar R at a given time is the set of tuples appearing in R at that time. In other words, the intension corresponds to the heading and the extension to the body.

5.21 No answer provided.

5.22 The Closed World Assumption says (loosely) that everything stated or implied by the database is true and everything else is false. And The Open World Assumption—yes, there is such a thing—says that everything stated or implied by the database is true and everything else is unknown. (Loosely speaking, in other words, The Closed World Assumption says tuple t appears in relvar R if and only if t satisfies the predicate for R; The Open World Assumption says tuple t appears in relvar R only if t satisfies the predicate for R.)

What are the implications of the foregoing? Well, first let’s agree to abbreviate Closed World Assumption and Open World Assumption to CWA and OWA, respectively. Now consider the query “Is supplier S6 under contract?” Of course, the system has no understanding of what it means for a “supplier” to be “under contract,” and so we have to formulate the query a little more precisely, thus: “Does there exist a tuple for supplier S6 in relvar S?” Given our usual sample data values, the answer is no, and under the CWA that no is interpreted as meaning supplier S6 isn’t under contract. Under the OWA, however, that same no is interpreted as meaning it’s unknown whether supplier S6 is under contract. Now consider the inverse query “Is it not the case that supplier S6 is under contract?”—more precisely, “Is it not the case that there exists a tuple for supplier S6 in relvar S?” The answer is yes, which is interpreted as meaning supplier S6 isn’t under contract under the CWA but as meaning, again, that it’s unknown whether supplier S6 is under contract under the OWA. In this example, therefore, yes and no apparently mean the same thing, under the OWA! The net of this discussion is that the OWA can’t properly deal with negation, and further that it’s likely to lead to a requirement for three-valued logic, and hence that it’s strongly deprecated for these very reasons. The paper “The Closed World Assumption” (see Appendix G) gives more information. See also Appendix C.

5.23 To say relvar R has an empty key is to say R can never contain more than one tuple. Why? Because every tuple has the same value for the empty set of attributes—namely, the empty tuple (see the answer to Exercise 3.16, elsewhere in this appendix); thus, if R had an empty key, and if R were to contain two or more tuples, we would have a key uniqueness violation on our hands. And, yes, constraining R never to contain more than one tuple could certainly be useful. I’ll leave finding an example of such a situation as a subsidiary exercise.

5.24 See the answer to Exercise 5.18 above.

5.25 The question certainly makes sense, insofar as every relvar has an associated predicate. However, just what the predicate is for some given relvar is in the mind of the definer of that relvar (and in the user’s mind too, I trust). For example, if I define a relvar C as follows—

     VAR C BASE RELATION { CITY CHAR } KEY { CITY } ;

—the corresponding predicate might be almost anything! It might, for example, be CITY is a city in California; or CITY is a city in which at least one supplier is located; or CITY is a city that’s the capital of some country;[202] and so on. In the same way, the predicate for a relvar of degree zero—

     VAR Z BASE RELATION { } KEY { } ;

—might also be “almost anything,” except that (since the relvar has no attributes and the corresponding predicate therefore has no parameters) the predicate in question must in fact degenerate to a proposition. That proposition will evaluate to TRUE when the value of Z is TABLE_DEE and FALSE when the value is TABLE_DUM.

By the way, observe that relvar Z has an empty key. It’s obvious that every degree zero relvar must have an empty key; however, you shouldn’t conclude that degree zero relvars are the only ones with empty keys (see the answer to Exercise 5.23 above).

5.26 Of course not. In fact, “most” relations aren’t values of some relvar. As a trivial example, the relation denoted by S{CITY}, the projection of the current value of relvar S on {CITY}, isn’t a value of any relvar in the suppliers-and-parts database. Note, therefore, that throughout this book, when I talk about some relation, I don’t necessarily mean a relation that’s the value of some relvar.

5.27 There are two cases to consider: (a) The relation depicted is a sample value for some relvar R; (b) the relation depicted is a sample value for some relational expression rx, where rx is something other than a simple relvar reference (recall that a relvar reference is basically just the pertinent relvar name). In the first case, double underlining simply indicates that a primary key PK has been declared for R and the pertinent attribute is part of PK. In the second case, you can think of rx as the defining expression for some temporary relvar R (think of it as a view defining expression and R as the corresponding view, if you like); then double underlining indicates that a primary key PK could in principle be declared for R and the pertinent attribute is part of PK.



[200] Details of Boyce/Codd normal form and other normal forms higher than 1NF are beyond the scope of this book. However, I’m sure you know something about them anyway, so I’ll feel free to mention them from time to time in the present appendix without further apology. For a detailed tutorial treatment of such topics, see the book Normal Forms and All That Jazz (referenced in Appendix G).

[201] Recall from Chapter 3 that the expression n! (which is read as either “n factorial” or “factorial n” and is often pronounced “n bang”) is defined as the product n * (n−1) * ... * 2 * 1.

[202] Or even CITY is the name of somebody’s favorite teddy bear. There’s nothing in the relvar definition to say that CITY has to denote a city.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.191.111.30