Chapter 13

Database Design II : Keys and Related Constraints

Like the concept of database design in general, the familiar concept of a key in the relational sense also involves a number of complications in the temporal context that don’t arise in the conventional (nontemporal) context. This state of affairs arises in part because of the redundancy and circumlocution problems, first discussed in Chapter 5, that can arise in the temporal context. This chapter proposes two new kinds of constraints, PACKED ON constraints and WHEN / THEN constraints, that can be used to help address such problems. Together, these constraints lead to a generalization of the familiar key notion, referred to in this book as “U_keys.”

Keywords

redundancy problem; circumlocution problem; contradiction problem; PACKED ON constraint; WHEN / THEN constraint; “temporal” key; current relvar; historical relvar

Two men say they’re Jesus—one of them must be wrong.

—Dire Straits:

“Industrial Disease” (1982)

In this chapter and the next, we turn our attention to the question of the constraints that might apply to temporal data. The present chapter considers key (or “key like”) constraints in particular, and Chapter 14 then addresses temporal constraints in general. Now, in Chapter 5, we saw how difficult it was, in the absence of proper interval support, even to formulate such constraints correctly; in these two chapters, we’ll see how the concepts introduced in Chapters 6-11 can help to alleviate the problem, somewhat. As we’ll also see, however, the solutions aren’t always as straightforward as they might be. This topic is surprisingly tricky!

Simplifying the Running Example

We assume for the purposes of this chapter that the database has been designed in accordance with the recommendations of the section “Both Since and During Relvars” in Chapter 12, and our database therefore consists of the following seven relvars:

S_SINCE  SP_SINCE
S_DURING  SP_DURING
S_NAME_DURING
S_STATUS_DURING
S_CITY_DURING

(We leave discussion of designs involving since relvars only or during relvars only to the next chapter.) With respect to the foregoing “combination” design, however, it should be clear that:

■ The attribute pairs {SNAME,SNAME_SINCE}, {STATUS,STATUS_SINCE}, and {CITY,CITY_SINCE} within the since relvar S_SINCE will all exhibit similar behavior.

■ Likewise, the during relvars S_NAME_DURING, S_STATUS_DURING, and S_CITY_DURING will also all exhibit similar behavior.

Without loss of generality, therefore, (a) we can ignore the attribute pairs {SNAME,SNAME_SINCE} and {CITY,CITY_SINCE} in relvar S_SINCE, and (b) we can ignore relvars S_NAME_DURING and S_CITY_DURING in their entirety. For the purposes of this chapter, therefore, and indeed—please note—for the remainder of this book (barring explicit statements to the contrary, of course), we can simplify the database still further. In other words, we can take the database to be as defined, in outline, in Fig. 13.1 below (“in outline” meaning here among other things that all constraints other than conventional KEY and FOREIGN KEY constraints have been omitted, because of course we’re not yet in a position to say what those additional constraints should look like).

image
Fig. 13.1 Simplified database design (outline)

Here for purposes of subsequent reference are informal statements of the predicates for this simplified design. First the since relvars:

■ S_SINCE: Supplier SNO has been under contract since SNO_SINCE and has had status STATUS since STATUS_SINCE.

■ SP_SINCE: Supplier SNO has been able to supply part PNO since SINCE.

Now the during relvars:

■ S_DURING: DURING denotes a maximal interval throughout which supplier SNO was under contract.

■ S_STATUS_DURING: DURING denotes a maximal interval throughout which supplier SNO had status STATUS.

■ SP_DURING: DURING denotes a maximal interval throughout which supplier SNO was able to supply part PNO.

Figs. 13.2 and 13.3—which repay very careful study—show some sample values for these relvars.

image
Fig. 13.2 Relvars S_SINCE and SP_SINCE–sample values
image
Fig. 13.3 Relvars S_DURING, S_STATUS_DURING, and SP_DURING–sample values

Note: The specific values shown in these figures deliberately don’t correspond exactly to our usual sample values as shown in Fig. 6.1 or Fig. 9.1, though they’re close. Note too that little in the present chapter depends on those specific values. As already suggested, however, a careful study of the figures should help you understand the semantics of the database in general.

Keys and Foreign Keys

The keys that apply to the database of Fig. 13.1 are as indicated in that figure, and in the case of the since relvars, at least, there’s not much more to say about them. By contrast, there certainly is more to say in the case of the during relvars, and those discussions form the bulk of the rest of this chapter. As for foreign keys, in fact there’s only one such in our simplified database, and that’s {SNO} in the current relvar SP_SINCE, which is a foreign key referencing the sole key {SNO} in the current relvar S_SINCE:

FOREIGN KEY { SNO } REFERENCES S_SINCE

This specification, which appears as part of the definition of relvar SP_SINCE, reflects the fact that if a supplier is currently able to supply some part, then that supplier must be currently under contract. (Of course, we’re appealing here to the informal notion that since relvars contain “current data” only.) As pointed out in Chapter 5, however, in connection with what was there called Constraint XST1, this foreign key constraint is insufficient by itself; what we really need is something more general, in order to enforce the constraint that whenever a supplier is, was, or will be able to supply some part, then that supplier is, was, or will be under contract at that same time. This more general constraint isn’ t a foreign key constraint as such, however, and we therefore defer further discussion of it for now. We’ll come back to it in the next chapter.

We now proceed to examine, in the next several sections, three general problems that might arise in connection with temporal databases like that of Fig. 13.1, in the absence of suitable constraints: namely, the redundancy problem, the circumlocution problem, and the contradiction problem. (We mentioned these terms in passing in Chapter 5, as you might recall.)

Note: Relvars S_SINCE and SP_SINCE aren’t discussed any further in this chapter (they’re shown again in Fig. 13.4, but that’s all). We’ll come back to them in Chapter 14.

image
Fig. 13.4 Revised version of Fig. 13.1

The Redundancy Problem

Consider relvar S_STATUS_DURING. Because {SNO,DURING} is a key, a Tutorial D definition for that relvar might look as follows:

VAR S_STATUS_DURING BASE RELATION
 { SNO SNO , STATUS INTEGER , DURING INTERVAL_DATE }
  KEY { SNO , DURING } ; /* Warning: Inadequate! */

As the comment indicates, however, the KEY constraint here, though it is logically correct, is also inadequate, in a sense. To be specific, it fails to prevent the relvar from containing (for example) both of the following tuples at the same time:

image

As you can see, these two tuples display a certain redundancy, inasmuch as the status for supplier S4 on day 6 is effectively stated twice. Clearly, it would be better if we were to replace the two tuples shown by the following single tuple:

image

Observe now that if the two original tuples were the only tuples in some relation and we were to pack that relation on DURING, we would wind up with a relation containing only the tuple just shown. Loosely speaking, therefore, we might say the tuple just shown is a “packed” tuple, obtained by packing the two original tuples on DURING (“loosely speaking” because, of course, the operation of packing really applies to relations, not tuples). So what we want to do is replace those two original tuples by that “packed” tuple. Indeed, as pointed out in Chapter 5, not performing that replacement—i.e., permitting both original tuples to appear—would be as bad as permitting duplicate tuples to appear, in a sense. What’s more, if both original tuples did appear, the relvar would be in violation of its own predicate! Consider the tuple containing the interval [d06:d07]. That tuple says among other things that supplier S4 did not have status 25 on the day immediately before day 6. But then the other tuple says among other things that supplier S4 did have status 25 on day 5, and of course day 5 is the day immediately before day 6.1

The Circumlocution Problem

The KEY constraint shown in the previous section for S_STATUS_DURING is inadequate in another way also. To be specific, it fails to prevent the relvar from containing (for example) both of the following tuples at the same time:

image

Here there’s no redundancy as such, but there’s a certain circumlocution, inasmuch as we’re taking two tuples to say what could be better said with just a single “packed” tuple (the same one as before, in fact):

image

In fact, not replacing the two original tuples by that single “packed” tuple would mean, again, that the relvar would be in violation of its own predicate, as you can easily confirm.

PACKED ON

Now, it should be clear that in order to avoid redundancies and circumlocutions like the ones we’ve been discussing, what we need to do is enforce a constraint—let’s call it Constraint A—along the following lines:

■ Constraint A: If at any given time relvar S_STATUS_DURING contains two tuples that are identical except for their DURING values i1 and i2, then i1 MERGES i2 must be false.

Recall that MERGES is basically the logical OR of OVERLAPS and MEETS. Replacing MERGES by OVERLAPS in Constraint A gives the constraint we need to enforce in order to avoid the redundancy problem; replacing it by MEETS gives the constraint we need to enforce in order to avoid the circumlocution problem.

It should be clear too that there’s a very simple way to enforce Constraint A: namely, by keeping the relvar packed at all times on DURING. We therefore propose a new PACKED ON specification, with syntax PACKED ON (ACL)—ACL standing as usual for a commalist of attribute names2—that can appear in a relvar definition, as here:

VAR S_STATUS_DURING BASE RELATION
 { SNO SNO , STATUS INTEGER , DURING INTERVAL_DATE }
  PACKED ON ( DURING ) /* Warning: Still */
  KEY { SNO , DURING } ; /* inadequate! */

The PACKED ON (DURING) specification here implies that any attempt to update the relvar in such a way that the result is not packed on DURING must be rejected. That is, the relvar must at all times be kept packed on DURING, and hence must at all times be identical to the result of the expression PACK S_STATUS_DURING ON (DURING)—implying among other things that this latter expression can always be replaced by the much simpler one S_STATUS_DURING, an observation that could be of interest to the optimizer. This special syntax thus suffices to solve the redundancy and circumlocution problems; in other words, it solves the problems exemplified by the constraint we referred to in Chapter 5 as Constraint XFT1.3

Note: An argument might possibly be made for providing two separate syntactic extensions, one for solving the redundancy problem and another for solving the circumlocution problem.4 Indeed, we’ll give an example later in this chapter of a relvar, TERM, that might be cited in support of such a position. However, we’ve yet to see a truly convincing argument for such a position; for now, therefore, our preference is to kill two birds with one stone and solve both problems at once.

The Contradiction Problem

We continue to focus on relvar S_STATUS_DURING specifically. Unfortunately, the PACKED ON and KEY constraints together are still not quite adequate, because they fail to prevent the relvar from containing (for example) both of the following tuples at the same time:

image

Here supplier S4 is shown as having a status of both 10 and 25 on days 5 and 6—clearly an impossible state of affairs. In other words, we have a contradiction on our hands; in fact, the relvar is in violation of its own predicate once again, because of course (as you’ll recall) any given supplier is supposed to have exactly one status on any given day.

Note: To say any given supplier is supposed to have exactly one status on any given day is to say, more formally, that if we were to unpack S_STATUS_DURING on DURING, thereby producing a result in which every DURING value consists of a unit interval, then the functional dependency {SNO,DURING} → {STATUS} would hold in that result. (Or, rather, it’s to say the specified functional dependency would still hold in that result—it already holds in S_STATUS_DURING as such, of course, thanks to the specification KEY {SNO,DURING} for that relvar.) We’ll have more to say regarding such matters in the next section, also in the section “Neither PACKED ON nor WHEN / THEN” later in the chapter.

WHEN / THEN

It should be clear that, in order to avoid contradictions like the ones discussed in the previous section, what we need to do is enforce a constraint—let’s call it Constraint B—along the following lines:

■ Constraint B: If at any given time relvar S_STATUS_DURING contains two tuples that have the same SNO value and different STATUS values, then their DURING values i1 and i2 must be such that i1 OVERLAPS i2 is false.

Note carefully that, as we’ve already seen, the fact that the relvar is kept packed on DURING isn’t sufficient to enforce Constraint B. Nor, of course, is the fact that {SNO,DURING} is a key. But suppose the relvar were kept unpacked on DURING (ignore for the moment the fact that this supposition is an impossibility, given that, as just mentioned, the relvar is actually kept packed on DURING). Then:

■ All DURING values in that unpacked form would be unit intervals and would thus effectively correspond to individual time points.

■ The sole key for that unpacked form would still be {SNO,DURING}, because any given supplier under contract at any given time has exactly one status at that time. In other words, as noted in the previous section, the functional dependency {SNO,DURING} → {STATUS} would still hold in that unpacked form.

■ Thus, if the original relvar did contain the contradictory tuples shown earlier—

image


—then the unpacked form would contain (among other things) these two tuples:

image


And these two tuples would together violate the functional dependency {SNO,DURING} → {STATUS} that’s supposed to hold in the unpacked form. Equivalently, they would violate the constraint that {SNO,DURING} is supposed to be a key for that unpacked form.

It follows that if we were to enforce the constraint that {SNO,DURING} is a key for the unpacked form UNPACK S_STATUS_DURING ON (DURING), then we’d be enforcing Constraint B a fortiori. We therefore propose another new specification, WHEN / THEN, that can appear in a relvar definition and has syntax as illustrated in this example:

VAR S_STATUS_DURING BASE RELATION
 { SNO SNO , STATUS INTEGER , DURING INTERVAL_DATE }
  PACKED ON ( DURING )
  WHEN UNPACKED ON ( DURING ) THEN KEY { SNO , DURING }
  KEY { SNO , DURING } ;

The specification WHEN UNPACKED ON (DURING) THEN KEY {SNO,DURING} implies that any attempt to update the relvar in such a way that the unpacked form of the result violates the functional dependency {SNO,DURING} → {STATUS} must be rejected. This special syntax thus suffices to solve the contradiction problem. Note: For reasons that should become clear later, given the WHEN / THEN specification WHEN UNPACKED ON (ACL) THEN KEY {BCL}, where ACL and BCL are both commalists of attribute names, every attribute mentioned in ACL must also be mentioned in BCL.

It follows from the foregoing discussion that the WHEN / THEN, PACKED ON, and KEY specifications are together sufficient—at last—to fix all of the problems we’ve been discussing in this chapter so far. However, it can’t be denied that those specifications do seem to be a little clumsy, or cumbersome, in combination. In particular, the specifications WHEN UNPACKED ON (DURING) THEN KEY {SNO,DURING} and KEY {SNO,DURING}, taken together, certainly look as if they might somehow be saying the same thing twice. It therefore seems worth considering the possibility of simplifying the syntax somewhat, and we will. But there are several other topics we need to get out of the way first before we can consider such a possibility in any detail. Those topics are addressed in the next four sections. So we’ll come back to the syntax issue later, in the section “Keys Revisited.”

Combining Specifications

We’ve now met three kinds of constraints that can be specified in a relvar definition: KEY constraints, PACKED ON constraints, and WHEN / THEN constraints. At first sight, therefore, there appear to be eight possible combinations that might be specified for any given relvar. But do all of those combinations make sense?

Well, we can simplify our investigation by first stipulating—until further notice, at any rate—that at least one explicit KEY constraint is always required, since we know from Chapter 3 that every relvar certainly does have at least one key. This stipulation reduces the possibilities from eight to four. Now let R be a relvar with an interval attribute. Then we already know from previous discussions that R might need both a PACKED ON and a WHEN / THEN constraint. So the possibilities we still need to examine are:

■ PACKED ON without WHEN / THEN

■ WHEN / THEN without PACKED ON

■ Neither PACKED ON nor WHEN / THEN

These three possibilities are the subject of the next three sections.

PACKED ON without WHEN / THEN

It should be readily apparent without the need for detailed analysis that relvar S_DURING, with its attributes SNO and DURING, is susceptible to redundancy and circumlocution problems analogous to those discussed earlier for relvar S_STATUS_DURING. However, it’s not susceptible to a corresponding contradiction problem. Why not? Answer: Because it’s all key—i.e., its sole key is the attribute combination {SNO,DURING}—and so it can’t possibly contain two tuples that contradict each other.5 Thus, PACKED ON (DURING) does apply, but WHEN UNPACKED ON (DURING) THEN KEY {SNO,DURING} is unnecessary. The following thus serves as an adequate definition for this relvar:

VAR S_DURING BASE RELATION
 { SNO SNO , DURING INTERVAL_DATE }
  PACKED ON ( DURING )
  KEY { SNO , DURING } ;

So a WHEN / THEN constraint isn’t needed for this relvar. However, it wouldn’t be wrong to specify one, as here:

VAR S_DURING BASE RELATION
 { SNO SNO , DURING INTERVAL_DATE }
  PACKED ON ( DURING )
  WHEN UNPACKED ON ( DURING ) THEN KEY { SNO , DURING }
  KEY { SNO , DURING } ;

To repeat, the WHEN / THEN constraint here isn’t wrong, but it might lead to some inefficiency in implementation if the system tried to enforce it too blindly. (As we’ll see in the next section, however—and as in fact should be fairly obvious—if some relvar definition does specify both WHEN … THEN KEY {K} and KEY {K}, then enforcing the specified WHEN / THEN constraint will enforce the specified KEY constraint automatically.)

Remarks analogous to the foregoing apply to relvar SP_DURING as well and effectively dispose of that case also. Thus, the following is a possible definition for that relvar:

VAR SP_DURING BASE RELATION
 { SNO SNO , PNO PNO , DURING INTERVAL_DATE }
  PACKED ON ( DURING )
  KEY { SNO , PNO , DURING } ;

From these examples, we see that if the relvar is all key, then no WHEN / THEN is required. But it doesn’t follow that if no WHEN / THEN is required, then the relvar is all key! A counterexample (relvar INFLATION) is given in the section after next, and another, perhaps slightly more convincing, is given in the answer to Exercise 13.8 at the end of the chapter.

WHEN / THEN without PACKED ON

Suppose we’re given a relvar TERM representing U.S. presidential terms, with attributes DURING and PRESIDENT and sole key {DURING}. Here’s a sample value (perhaps we should say rather, here’s part of the value that’s current at the time of writing):

image

Aside: This example raises a number of points that are somewhat tangential to the main topic of this chapter but are worth mentioning in passing nonetheless. Here are some of them:

■ In practice, presidential terms are usually stated in the form of overlapping intervals—Ford, 1974-1977; Carter, 1977-1981; and so on—instead of the way we’ve shown them above. But the reason for this state of affairs is that the granularity of those intervals as usually stated is wrong; presidential terms really stretch from one Inauguration Day (normally a day in January) to the next, and so, e.g., Ford was president for the first few days of 1977 and Carter was president for the rest of that year.

■ President last names aren’t necessarily unique—think of Roosevelt, for example, or Adams—but we choose to overlook this fact for the purposes of this chapter.

■ There are quite a few additional constraints that apply to relvar TERM, over and above the key and “key like” constraints that are the primary focus of the present chapter. For example:

a. There’s exactly one president at any given time (this is an example of what in Chapter 14 we’ll be calling a denseness constraint).

b. Nobody is allowed to serve as president for more than two terms (at least not since 1951, when the 22nd Amendment to the U.S. Constitution was ratified).

c. No presidential term is allowed to exceed four years.


And so on, possibly.
We leave it as an exercise for the reader to ponder the implications of such considerations. End of aside.

Back to the main thread of our discussion. It should be clear that PACKED ON (DURING) must not be specified for relvar TERM, because (with reference to the sample value shown above) such a constraint would cause the two Reagan tuples to be packed into one and the two Clinton tuples likewise. At the same time, it should be clear that a WHEN / THEN specification is needed, in order to avoid the possibility that the relvar might contain, e.g., both of the following tuples at the same time:

image

In other words, without such a specification—or something equivalent to such a specification, of course—relvar TERM would be susceptible to the contradiction problem. Here then is a possible relvar definition:6

VAR TERM BASE RELATION
 { DURING INTERVAL_… , PRESIDENT NAME }
  WHEN UNPACKED ON ( DURING ) THEN KEY { DURING }
  KEY { DURING } ;

Several further points arise in connection with this example, however. First, note that we certainly do want to avoid the possibility that the relvar might contain, e.g., both of the following tuples at the same time:

image

(an example of the redundancy problem). At the same time, we don’t want to avoid the possibility that the relvar might contain, e.g., both of the following tuples at the same time:

image

On the face of it, therefore, it looks as if TERM is an example of a relvar for which we might want to avoid the redundancy problem but not the circumlocution problem (a possibility touched on, briefly, earlier in the chapter).

That said, we have to say too that in our opinion the example intuitively fails, because the circumlocution it displays—if it truly is circumlocution—is intentional. To be specific, if the two Clinton tuples were to be replaced by a single packed tuple, then information would actually be lost (namely, the information that one of the DURING intervals corresponds to Clinton’s first term and the other to his second). In other words, the real problem is that relvar TERM has no explicit “term number” attribute. If we introduce such an attribute, the problem goes away:

image

(Well, the problem doesn’t entirely go away; the relvar might still contain a pair of tuples that would better be replaced by a single tuple. For example, it might contain two tuples for Carter’s single term, one with a DURING value of [1977:1978] and the other with a DURING value of [1979:1980], thus:

image

This state of affairs genuinely is an example of the circumlocution problem. But an exactly analogous situation could occur without the TERMNO attribute! The trouble is, before we introduced that attribute, the system couldn’t tell the difference between a genuine circumlocution like this Carter example and a “false circumlocution” like the Clinton example shown earlier.)

Of course, the key constraint KEY {DURING} still holds in this revised version of relvar TERM (with its TERMNO attribute). On the face of it, however, the relvar is now susceptible to all three of our usual problems (redundancy, circumlocution, and contradiction). Redundancy would occur if the relvar were to contain, e.g., both of the following tuples at the same time:

image

As already noted, circumlocution would occur if the relvar were to contain, e.g., both of the following tuples at the same time:

image

And contradiction would occur if the relvar were to contain, e.g., both of the following tuples at the same time:

image

It therefore might appear as if this revised version of TERM does need both PACKED ON and WHEN / THEN, as follows:

VAR TERM BASE RELATION
 { DURING INTERVAL_… , PRESIDENT NAME, TERMNO INTEGER }
  PACKED ON ( DURING )
  WHEN UNPACKED ON ( DURING ) THEN KEY { DURING }
  KEY { DURING } ;

But observe now that there’s no good reason for this relvar ever to have two distinct tuples with the same values for PRESIDENT and TERMNO. The following constraint thus applies:

KEY { PRESIDENT , TERMNO }

And this specification makes the PACKED ON specification superfluous!—not actually wrong, as it was with the original version of the relvar, but certainly unnecessary. The reason is that, since {PRESIDENT,TERMNO} is a key, packing the relvar on DURING can’t possibly have any effect.7 (More generally, the operation PACK R ON (ACL) can’t possibly have any effect if R has a key that doesn’t include the set of attributes mentioned in ACL. Thus, there’s no point in specifying the constraint PACKED ON (ACL) for such a relvar.) So our final definition for relvar TERM (revised version) looks like this:

VAR TERM BASE RELATION
 { DURING INTERVAL_… , PRESIDENT NAME, TERMNO INTEGER }
  WHEN UNPACKED ON ( DURING ) THEN KEY { DURING }
  KEY { DURING }
  KEY { PRESIDENT , TERMNO } ;

In other words, relvar TERM, even with the addition of attribute TERMNO, still serves as an example of a relvar for which we would probably want to specify WHEN / THEN and not PACKED ON.

Aside: It’s only fair to point out that the introduction of attribute TERMNO, while it does solve some problems, also introduces others. Obviously, there’s the problem of guaranteeing the uniqueness of a second key. Then there’s the problem of ensuring that, for a given president, a tuple with TERMNO = 2 exists only if a tuple also exists with TERMNO = 1, and ensuring moreover that BEGIN (DURING) in the tuple with TERMNO = 2, if it exists, must be greater than END (DURING) in the tuple with TERMNO = 1. For such reasons, it might be argued that the original design without TERMNO is preferable. But for definiteness we’ll stay with the revised design from this point forward, barring explicit statements to the contrary. End of aside.

Does WHEN / THEN Imply KEY?

Here again is the definition of relvar TERM, but now with certain specifications highlighted:

VAR TERM BASE RELATION
 { DURING INTERVAL_… , PRESIDENT NAME , TERMNO INTEGER }
  WHEN UNPACKED ON ( DURING ) THEN KEY { DURING }
  KEY { DURING }

  KEY { PRESIDENT , TERMNO } ;

As noted in the section “WHEN / THEN,” the highlighted specifications here certainly look as if they might be saying the same thing twice. So let’s take a closer look. The WHEN / THEN specification means that if the relvar were kept unpacked on DURING, then {DURING} would still be a key. But if that relvar were kept unpacked on DURING, then each DURING value would be a unit interval; and if {DURING} were a key for that unpacked form, then each such unit interval would appear in that unpacked form in exactly one tuple, associated therefore with exactly one combination of PRESIDENT and TERMNO values. It follows that if we were now to (re)pack that unpacked form on DURING, any given DURING value in the result, regardless of whether it’s a unit interval or not, would also appear in exactly one tuple and be associated with exactly one combination of PRESIDENT and TERMNO values. Within that result, in other words, {DURING} would certainly satisfy the key uniqueness property. And since it obviously satisfies the irreducibility property as well, it appears that the WHEN / THEN specification in this example does imply that {DURING} is a key for TERM.

Before we try to generalize from the foregoing analysis, let’s take another look at relvar S_STATUS_DURING, with definition as follows:

VAR S_STATUS_DURING BASE RELATION
 { SNO SNO , STATUS INTEGER , DURING INTERVAL_DATE }
  PACKED ON ( DURING )
  WHEN UNPACKED ON ( DURING ) THEN KEY { SNO , DURING }
  KEY { SNO , DURING } ;

This example is a little more general than the TERM example, in the following sense. Recall that in the specification WHEN UNPACKED ON (ACL) THEN KEY {BCL}, every attribute mentioned in ACL must also be mentioned in BCL. In the TERM example, ACL and BCL were identical. In the S_STATUS_DURING example, by contrast, BCL contains an attribute, SNO, not mentioned in ACL. And if we perform the same kind of analysis on S_STATUS_DURING as we did a moment ago for TERM, we’ll see that the WHEN / THEN specification certainly implies that any given {SNO,DURING} value appearing in S_STATUS_DURING does appear in exactly one tuple and is associated with exactly one STATUS value. So {SNO,DURING} certainly satisfies the uniqueness property that a key for S_STATUS_DURING is required to satisfy. And since it’s clear that neither {SNO} nor {DURING} by itself satisfies that property, it follows that {SNO,DURING} satisfies the irreducibility property as well, from which we can conclude that it’s actually a key, and hence that the WHEN/THEN specification does imply the KEY specification.

The conclusion to be drawn from the foregoing discussion is this: If a relvar definition does specify both WHEN … THEN KEY {K} and KEY {K}, then enforcing the specified WHEN / THEN constraint will enforce the specified KEY constraint automatically. Thus, we might reasonably invent a syntax rule to say that the KEY specification can be omitted in such a situation. We choose not to adopt such a rule, however, for reasons that should become clear in the section “Keys Revisited,” later.

Aside: We’ve seen that the answer to the question in the title of this subsection (“Does WHEN / THEN imply KEY?”) is yes. But there’s a sense in which the answer to the inverse question is yes, too. Suppose suppliers are subject to the (unlikely, but possible) additional constraints that (a) the status of a given supplier never changes and (b) no supplier is ever allowed to be under contract during two disjoint intervals (i.e., once a supplier’s contract terminates, that supplier is never placed under contract again). Then the definition for relvar S_STATUS_DURING can be simplified to just:

VAR S_STATUS_DURING BASE RELATION

 { SNO SNO , STATUS INTEGER , DURING INTERVAL_DATE }

  KEY { SNO } ;

And the fact that SNO values are unique in this relvar clearly implies that {SNO,DURING} values are unique in the corresponding unpacked form; in other words, the constraint KEY {SNO} in this example clearly implies the constraint WHEN UNPACKED ON (DURING) THEN KEY {SNO,DURING}. End of aside.

One final point to close this section: Just as a relvar can have two or more keys, so a relvar can also be subject to two or more WHEN / THEN constraints. By way of example, consider the following relvar (the semantics are meant to be obvious):

VAR MARRIAGE BASE RELATION

 { SPOUSE1 NAME , SPOUSE2 NAME , DURING INTERVAL_… }
  PACKED ON ( DURING )
  WHEN UNPACKED ON ( DURING ) THEN KEY { SPOUSE1 , DURING }
  WHEN UNPACKED ON ( DURING ) THEN KEY { SPOUSE2 , DURING }
  KEY { SPOUSE1 , DURING }
  KEY { SPOUSE2 , DURING } ;

Neither PACKED ON nor WHEN / THEN

Suppose we’re given a relvar INFLATION representing inflation rates for a certain country during certain intervals of time. The attributes are DURING and RATE, and the sole key is {DURING}. Here’s a sample value, showing that the inflation rate was 18 percent for the first three months of the year, went up to 20 percent for the next three months, stayed at 20 again for the next three months (but went up to 25 percent in month 7), …, and averaged out at 20 percent for the year as a whole:

image

Note: This relvar is perhaps not very well designed (or so it could be argued, at any rate), since it contains monthly, quarterly, and annual information all mixed together; that is, it might be better to replace it by three separate relvars. But bad or not, the design is certainly a possible one, and we’ll stay with it for the sake of the discussion.

It should be clear, then, that this relvar must not be kept packed on DURING—i.e., PACKED ON (DURING) must not be specified—because if it were, then (in terms of the sample data shown above) the three tuples with RATE = 20 would be packed into one, and we would lose the information that the inflation rate for months 4-6 and months 7-9 was 20 percent (as it was also for the year overall). What’s more, unpacking the relvar on DURING also causes information to be lost, in general (try it on the sample data above if you don’t immediately see why). We could if we like specify the following:

WHEN UNPACKED ON ( DURING ) THEN KEY { DURING , RATE }

But this specification merely says, in effect, that the result of UNPACK INFLATION ON (DURING) is all key and thus never contains any duplicate tuples, and this “constraint” is thus no constraint at all, because no relation ever contains any duplicate tuples.

So relvar INFLATION does seem to be an example of a relvar for which it doesn’t really seem to make sense to specify either PACKED ON or WHEN / THEN. The relvar definition thus becomes simply:

VAR INFLATION BASE RELATION
 { DURING INTERVAL_… , RATE INTEGER }
  KEY { DURING } ;

In fact, this relvar is subject to none of our usual redundancy, circumlocution, and contradiction problems—or rather, to state the matter more precisely, it’s impossible to tell the difference between (a) a value of the relvar that does suffer from such problems and (b) a value of the relvar that looks as if it suffers from such problems but in fact is correct and doesn’t.

INFLATION vs. TERM

Now, relvar INFLATION and relvar TERM from the previous section both seem not to need a PACKED ON constraint. So how do these two relvars differ from each other? Or, rather, do they differ, at least in any significant respect?

Well, they certainly resemble each other inasmuch as they both have the same key, {DURING}; thus, the functional dependency {DURING} → {PRESIDENT,TERMNO} holds in TERM, and the functional dependency {DURING} → {RATE} holds in INFLATION. However, they also differ in an important respect. To be specific, suppose we unpack them both on DURING. Then:

■ In the unpacked form of TERM, the functional dependency {DURING} → {PRESIDENT,TERMNO} still holds.

■ In the unpacked form of INFLATION, the functional dependency {DURING} → {RATE} does not still hold.

Intuitively speaking, in other words, a given RATE value in relvar INFLATION is a property of the corresponding DURING interval taken as a whole—it’s not a property of the individual time points that go to make up that interval. To put it another way, just because the inflation rate for, e.g., the interval [m07:m09] was 20 percent, we can’t infer that the inflation rate for, e.g., the interval [m07:m07] was also 20 percent—and indeed it wasn’t, according to the sample value shown.

Keys Revisited

Despite—or perhaps because of!—the discussions of the last few sections, it does seem likely in practice that relvars with interval attributes will often be subject to both a PACKED ON constraint and a WHEN / THEN constraint (not to mention the required KEY constraint). For that reason, it seems desirable to come up with some syntax that combines the functionality of all three. We therefore propose another shorthand; to be specific, we propose that the definition of any given relvar R should be allowed to contain a specification of the following form:

USING ( ACL ) : KEY { K }

Here:

■ ACL and K are both commalists of attribute names, and every attribute mentioned in ACL must also be mentioned in K.

■ The specification is defined to be shorthand for the combination of all three of the following:

PACKED ON ( ACL )
WHEN UNPACKED ON ( ACL ) THEN KEY { K }
KEY { K }

We refer to {K} here as a U_key for short (but see further discussion below). Using this shorthand, the definition of relvar S_STATUS_DURING, for example, might be simplified to just:

VAR S_STATUS_DURING BASE RELATION
 { SNO SNO , STATUS INTEGER , DURING INTERVAL_DATE }
  USING ( DURING ) : KEY { SNO , DURING } ;

Suppose now that the commalist of attribute names ACL within a given U_key specification is empty (i.e., contains no attribute names at all), thus:

USING ( ) : KEY { K }

By definition, this specification is shorthand for:

PACKED ON ( )
WHEN UNPACKED ON ( ) THEN KEY { K }
KEY { K }

So:

■ First, the pertinent relvar must be kept packed on no attributes at all. But packing a relation r on no attributes at all simply returns r, so the implied PACKED ON specification has no effect.

■ Second, the pertinent relvar must be such that if it’s unpacked on no attributes at all, then {K} is a key for the result. But unpacking a relation r on no attributes at all simply returns r, so the implied WHEN / THEN specification simply means that {K} is a key for the pertinent relvar, and the implied regular KEY constraint is thus redundant.

It follows that we can take a regular KEY specification of the form KEY {K} to be shorthand for a certain U_key specification—namely, one of the form USING ( ) : KEY {K}. In other words, regular KEY constraints are essentially just a special case of our proposed new syntax! Thus, if we redefine the syntax of a regular KEY specification as follows—

[ USING ( ACL ) : ] KEY { K }

—and if we allow the USING specification and colon separator to be omitted if and only if ACL is empty, then we no longer have any need to talk about U_keys as such at all; all keys effectively become U_keys, and we can generalize the meaning of “key” accordingly. And so we will.

Note: It’s occasionally useful to refer to PACKED ON and WHEN / THEN constraints that have, and can have, no logical effect as trivial constraints. In particular, PACKED ON ( ) and WHEN UNPACKED ON ( ) THEN … are both trivial in this sense. So too are:

■ PACKED ON (ACL), if the set of attributes of the relvar in question not included in ACL is a superkey for that relvar

■ WHEN … THEN KEY {K}, if K is the entire heading of the relvar in question

Foreign U_keys

Note: This topic is discussed in more detail in the next chapter. We include a brief and very incomplete discussion of it here only because you might feel the chapter is missing something without it. Note too—this is important!—that it’s necessary to assume for the purposes of this short discussion that the database contains during relvars only, not a mixture of since and during relvars.

Consider relvars S_DURING and SP_DURING, whose definitions currently look like this:8

VAR S_DURING BASE RELATION
 { SNO SNO , DURING INTERVAL_DATE }
  USING ( DURING ) : KEY { SNO , DURING } ;

VAR SP_DURING BASE RELATION

 { SNO SNO , PNO PNO , DURING INTERVAL_DATE }
  USING ( DURING ) : KEY { SNO , PNO , DURING } ;

Observe now that (as the implied WHEN / THEN constraint in fact asserts) if S_DURING were actually kept unpacked on DURING, then {SNO,DURING} would be a key for that unpacked form. Furthermore, if relvar SP_DURING were also kept unpacked on DURING, then {SNO,DURING} would be a matching foreign key in that unpacked form.9 Thus, if {SNO,DURING} is regarded as a U_key for S_DURING, then {SNO,DURING} might reasonably be regarded as a matching foreign U_key in SP_DURING. We therefore propose another shorthand. To be specific, we propose that the definition of any given relvar R2 should be allowed to contain a specification of the form:

USING ( ACL ) : FOREIGN KEY { FK } REFERENCES R1

As we’ve more or less indicated already, the semantics are that if R1 and R2 were both kept unpacked on the attributes specified in ACL, then FK in R2 would be a foreign key matching the key in R1 that’s implied by the corresponding U_key definition for R1. We skip the details here, except to note that—as by now you should surely be expecting—regular FOREIGN KEY constraints are basically just a special case of this proposed new syntax.

Putting it all Tgether

We close this chapter by showing the overall effect of the syntactic simplifications discussed in the previous section on our running example (see Fig. 13.4). Note: We revert now to our preferred “combination” design (recall that in the subsection “Foreign U_keys” at the end of the previous section, we were assuming a design that consisted of during relvars only). Observe in particular in that combination design that, contrary to what you might have expected, relvar SP_DURING doesn’t have a foreign U_key referencing S_DURING, nor do relvars S_DURING and S_STATUS_DURING have foreign U_keys referencing each other. We’ll have more to say about such matters in the next chapter.

Exercises

13.1 (This exercise subsumes Exercise 5.2 from Chapter 5.) Explain the redundancy, circumlocution, and contradiction problems in your own words.

13.2 Explain in your own words:

a. KEY constraints

b. PACKED ON constraints

c. WHEN / THEN constraints

d. U_key constraints

13.3 Give an example of a relvar that has at least one interval attribute and requires:

a. Both a PACKED ON and a WHEN / THEN constraint

b. A PACKED ON but no WHEN / THEN constraint

c. A WHEN / THEN but no PACKED ON constraint

d. Neither a PACKED ON nor a WHEN / THEN constraint


(where by constraint we mean a nontrivial one throughout).

13.4 Explain how regular keys can be regarded as a special case of U_keys.

13.5 Use Tutorial D to formulate as many reasonable constraints as you can think of that might apply to the revised version of relvar TERM (i.e., the version with the TERMNO attribute).

13.6 Consider relvar MARRIAGE from the very end of the section “WHEN / THEN without PACKED ON” in the body of the chapter. Its definition can be simplified to:

VAR MARRIAGE BASE RELATION

 { SPOUSE1 NAME , SPOUSE2 NAME , DURING INTERVAL_… }

  PACKED ON ( DURING )

  USING ( DURING ) : KEY { SPOUSE1 , DURING }

  USING ( DURING ) : KEY { SPOUSE2 , DURING } ;


Observe now that the two U_keys here both have the same USING specification. What it would mean (if anything) for a relvar to have U_keys with different USING specifications?

13.7 Consider the revised version of courses-and-students from Exercise 12.8c in Chapter 12, with the following as an appropriate definition for relvar COURSE_OFFERING:

VAR COURSE_OFFERING BASE RELATION

 { COURSENO COURSENO , OFFERINGNO INTEGER , QUOTA INTEGER , OFFERED_DURING INTERVAL_DATE }

 KEY { COURSENO , OFFERINGNO } ;


The predicate (slightly simplified as usual) is: Offering OFFERINGNO of course COURSENO has quota QUOTA and took place, or is scheduled to take place, during interval OFFERED_DURING. Revise the database definition again to show all such PACKED ON, WHEN / THEN, and/or U_key constraints as you think necessary.

13.8 Suppose a given relvar has attributes A, B, and DURING, and predicate DURING is a maximal interval of days throughout which person A was dating person B. Assume it’s possible for the interval during which A was dating B to overlap with the interval during which A was dating some distinct person B’. However, assume also that, in such a case, at least it’s not allowed for the intervals in question to be identical (one does have to have some standards). Give an appropriate relvar definition.

Answers

13.1 The redundancy, circumlocution, and contradiction problems are all problems that can arise in connection with relvars having interval attributes. We explain them in terms of relvars S_DURING and S_STATUS_DURING:

■ For relvar S_DURING, the redundancy problem occurs if two or more tuples can have the same SNO value and overlapping DURING values. For example, if supplier S4 is shown in one tuple to be under contract from d05 to d06 and in another to be under contract from d06 to d07, then the information that S4 is under contract on d06 is represented twice.

■ For relvar S_DURING again, the circumlocution problem occurs if two or more tuples can have the same SNO value and DURING values that meet or “abut.” For example, if supplier S4 is shown in one tuple to be under contract from d05 to d05 and in another to be under contract from d06 to d07, then the information that S4 is under contract from d05 to d07 is represented in a roundabout way, using two tuples when one would clearly be enough.

■ For relvar S_STATUS_DURING, the contradiction problem—which can occur only in a relvar that has at least one nonkey attribute10—arises if two or more tuples can have the same SNO value, overlapping DURING values, and different STATUS values. For example, suppose supplier S4 is shown in one tuple as having status 10 from d04 to d06 and in another as having status 25 from d05 to d07. Then supplier S1 is represented as having both status 10 and status 25 on d05 and d06, thereby violating the constraint that a supplier has just one status at one time.

13.2 

a. A KEY constraint specifies that a given subset of the heading of a given relvar constitutes a superkey for that relvar. The subset in question is generally assumed, but can’t be guaranteed, to be a key as such (i.e., it’s assumed to be irreducible). Note: A superkey that’s not a key as such is said to be a proper superkey.

b. A PACKED ON constraint specifies that the relation that’s the current value of the pertinent relvar at any given time must be in the specified packed form.

c. A WHEN / THEN constraint specifies that a given subset of the heading of a given relvar constitutes a superkey for the specified unpacked form of that relvar. The subset in question is generally assumed, but can’t be guaranteed, to be a key as such for that unpacked form (i.e., it’s assumed to be irreducible).

d. A U_key constraint specifies a commalist ACL of attribute names and a commalist K of attribute names. It’s shorthand for the combination of (a) the PACKED ON constraint PACKED ON (ACL), (b) the KEY constraint KEY {K}, and (c) the WHEN / THEN constraint WHEN UNPACKED ON (ACL) THEN KEY {K}.

13.3 

a. S_STATUS_DURING (redundancy, circumlocution, and contradiction must all be addressed)

b. SP_DURING (contradiction can’t arise)

c. TERM with no TERMNO attribute (“circumlocution”—sort of—must be permitted)

d. INFLATION

13.4 A U_key constraint specifying KEY {K} and an empty USING commalist is equivalent to a regular KEY constraint specifying KEY {K}.

13.5 The following constraints are over and above those shown explicitly in the body of the chapter.

a. No president has more than two terms:
CONSTRAINT C135a IS_EMPTY ( TERM WHERE TERMNO ≠ 1 AND TERMNO ≠ 2 ) ;

b. No presidential term lasts more than four years:
CONSTRAINT C135b IS_EMPTY ( TERM WHERE COUNT ( DURING ) > 4 ) ;

c. For a given president, a tuple with TERMNO = 2 exists only if a tuple also exists with TERMNO = 1, and BEGIN (DURING) in the tuple with TERMNO = 2, if it exists, must be greater than END (DURING) in the tuple with TERMNO = 1:
CONSTRAINT C135c IS_EMPTY
( ( ( TERM RENAME { DURING AS D1 , TERMNO AS T1 } ) JOIN
   ( TERM RENAME { DURING AS D2 , TERMNO AS T2 } ) )
     WHERE T1 < T2 AND BEGIN ( D2 ) ≤ END ( D1 ) ) ;
Note: This formulation assumes Constraint C135a is in effect.

13.6 Such a state of affairs is impossible, except in certain pathological cases (e.g., if the relvar is constrained never to contain more than one tuple). For if USING (ACL) … and USING (BCL) … are both specified, then—among other things—the relvar is apparently to be kept packed on both ACL and BCL at the same time.

13.7 For STUDENT_HISTORY:

USING ( REG_DURING ) : KEY { STUDENTNO , REG_DURING }


For COURSE_OFFERING:

USING ( OFFERED_DURING ) : KEY { COURSENO , OFFERED_DURING }

13.8 This is an example of a relvar for which PACKED ON applies but WHEN / THEN doesn’t (even though, unlike, e.g., relvars S_DURING and SP_DURING, it’s not all key). For example, the following is a possible sample value:

image


Unpacking this relation on A and DURING yields among things the following pair of tuples:

image


Thus, {A,DURING} is clearly not a key for the unpacked form. So here’s a possible relvar definition:

VAR ABC BASE RELATION

  { A NAME , B NAME , DURING INTERVAL_DATE }

    PACKED ON ( A , DURING )

    KEY { A , DURING } ;


Note: If we assume that A is dating B at a given time if and only if B is dating A at the same time, we might want to impose a constraint on this relvar to the effect that A < B. The sample value shown above assumes such a constraint is in effect.

As a subsidiary exercise, what changes would be required to the foregoing definition if we wanted to insist that the interval during which A was dating B didn’t properly include the interval during which A was dating some distinct person B’?


1As pointed out in Chapter 5, therefore, the two original tuples don’t just suffer from redundancy, they actually imply a certain contradiction. In this chapter, however, when we talk of “the contradiction problem,” we’re using—perhaps usurping is the mot juste—that term to refer specifically to the kind of situation to be discussed in the section after next. Note: These remarks apply also to the example to be discussed in the section “The Circumlocution Problem,” mutatis mutandis.

2Of course, all of the attributes mentioned must be attributes of the relvar in question and must be interval valued (and a similar remark applies to all of the other commalists of attribute names mentioned in this chapter, unless the context demands otherwise).

3Well … it solves those problems by replacing them by another: namely, the problem of ensuring that updates don’t violate the PACKED ON constraint. In Chapter 16 we’ll introduce some new operators that can help with this latter task. Note: Analogous remarks apply to the solution we’ll be proposing to the contradiction problem also (see the next two sections).

4We remark in passing that SQL does provide syntax that addresses—we don’t say solves!—the redundancy problem (also the contradiction problem, in fact) but not the circumlocution problem. See Part IV for further explanation.

5At least, not in the special sense in which we’re using “contradict” here.

6Here and elsewhere in this chapter we use the syntax “INTERVAL_…” to denote an interval type that we don’t want to get sidetracked into discussing in detail at this juncture.

7Indeed, as you might have realized, the fact that {PRESIDENT,TERMNO} is a key is sufficient to avoid the redundancy and circumlocution problems—though not the contradiction problem—as described on the previous page.

8Specifying explicit U_keys for these two relvars means we’ve implicitly specified certain WHEN / THEN constraints, of course. Note, however, that these latter constraints are trivial ones, since the relvars are both all key.

9This is where we rely on the fact that the database contains during relvars only; the claim we’re making here wouldn’t be valid for our preferred “combination” design (why not?).

10A nonkey attribute is an attribute of a given relvar that isn’t part of any key of that relvar. (Of course, a key attribute is an attribute of a given relvar that is part of some key of that relvar.)

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.21.244.137