Chapter 12

Database Design I: Structure

This chapter is the first of three devoted to the topic of logical database design in the temporal context. Database design in that context has the potential to be a much more complicated matter than its analog in the conventional (nontemporal) context. There are several reasons for this state of affairs, including (a) the need to deal with the fact that different “properties” of the same “entity” tend to vary at different rates and (b) the need to deal with the concept of “until further notice”—i.e., the need to be able to record the fact that a given “property” of a given “entity” has a given value right now and will continue to have that same value until some unknown time in the future. The chapter proposes some new design techniques (in particular, a new normal form) for dealing with such matters.

Keywords

temporal database design; horizontal decomposition; vertical decomposition; sixth normal form; 6NF; current relvar; historical relvar; “the moving point now

How dangerous it always is to reason from insufficient data.

—Arthur Conan Doyle:

The Adventure of the Speckled Band (1891)

Database design isn’t just a matter of pinning down relvars and their attributes—the associated constraints are crucial too. But there’s so much to say about constraints in general in the temporal context that to include that topic in this chapter would make the chapter far too long and unwieldy. For that reason, we’ve spread our discussion of design issues across three chapters, as follows:

■ The present chapter is concerned mainly just with structure, or in other words with relvars and their attributes—though even here there’ll unavoidably be quite a lot of discussion of constraints as well, especially in the section “A New Normal Form.”

■ Chapters 13 and 14 then discuss constraints as such: Chapter 13 discusses keys and what might be called “key like” constraints, and Chapter 14 discusses general constraints.

Note: For obvious reasons, the unqualified term design should be taken throughout this chapter and the next two as referring to temporal database design specifically, unless the context demands otherwise.

The Running Example Revisited

Up to this point, our sample relvars S_DURING and SP_DURING have served us well, clearly demonstrating the need for interval types and the desirability of special operators for dealing with interval data. Obviously enough, however, those relvars are extremely simple in structure—and S_DURING, at least, is really too simple to serve as a basis for a proper investigation into design issues. So let’s now go all the way back to our original nontemporal relvars S and SP from Chapter 1. Here again are the original relvar definitions:

VAR S BASE RELATION
  { SNO SNO , SNAME NAME , STATUS INTEGER , CITY CHAR }
  KEY { SNO } ;
VAR SP BASE RELATION
  { SNO SNO , PNO PNO }
  KEY { SNO , PNO }
FOREIGN KEY { SNO } REFERENCES S ;

Now let’s concentrate on relvar S specifically until further notice (we’ll come back to SP at the very end of the chapter). Just to remind you, the predicate for relvar S is:

Supplier SNO is under contract, is named SNAME, has status STATUS, and is located in city CITY.

However, noting that relvar S is subject to the constraint that {SNO} is a key—and hence that there’s a functional dependency from {SNO} to each of {SNAME}, {STATUS}, and {CITY}—we can observe that the following more precise statement is in fact the case:

The supplier identified by SNO is under exactly one contract, has exactly one name SNAME, has exactly one status STATUS, and is located in exactly one city CITY.

In what follows we won’t always bother to spell matters out quite so carefully, but we’ll certainly rely on the fact that such more careful statements do in fact apply, and hence that in the case at hand, for example, a given supplier does have exactly one name, one status, and one city at any one time.

Now suppose we want to design a temporal analog of this nontemporal relvar. How can we do this? The most obvious approach, of course, is just to add an appropriate timestamp attribute—a “since” attribute if we merely want to “semitemporalize” the design, or a “during” attribute if we want to temporalize it fully, as follows (semitemporal on the left, fully temporal on the right):

VAR SSSC_SINCE BASE RELATION
  { SNO SNO , SNAME NAME , STATUS INTEGER , CITY CHAR , SINCE DATE }
  KEY { SNO } ;
VAR SSSC_DURING BASE RELATION
  { SNO SNO , SNAME NAME , STATUS INTEGER , CITY CHAR , DURING INTERVAL_DATE }
  KEY { SNO , DURING } ;

Note the revised relvar names and, in the case of relvar SSSC_DURING, the revised KEY specification as well.1 Terminology: We’ll refer to SSSC_SINCE and SSSC_DURING as a “since relvar” and a “during relvar,” respectively; loosely speaking, a since relvar is one that contains current information, while a during relvar is one that contains historical information. Please note immediately that these characterizations are, as indicated, both fairly loose; we’ll tighten them up later, in the sections “Since Relvars Only” and “During Relvars Only,” respectively. However, we do find them—the characterizations, that is—useful from an intuitive point of view, and we’ll appeal to them from time to time in what follows.

Now, it’s easy to see (and subsequent sections will show) that relvars SSSC_SINCE and SSSC_DURING are in fact both very badly designed; that is, simply adding a timestamp attribute of some kind to a nontemporal relvar is not a good way to do temporal design.2 So what would a good design look like? Well, the answer to this question depends on whether we want a fully temporal design or merely a semitemporal one:

■ If a semitemporal design is all we want, then all we need to do, in terms of our example, is revise the structure of relvar SSSC_SINCE appropriately. But it turns out that the revisions in question are quite straightforward, and they’re discussed in the next section.

■ Alternatively, if a fully temporal design is our target, then we need to revise the structure of relvar SSSC_DURING instead—and here it turns out that the revisions we need are rather more drastic. To be specific, we propose that, in general, such relvars be subjected to

a. “Vertical” decomposition, to deal with the fact that different “properties” of the same “entity” typically vary at different rates,
and preferably also to

b. “Horizontal” decomposition, to deal with the fact that there’s a logical difference between current and historical information.

But first things first. Here’s the plan for the chapter. In the next section (“Since Relvars Only”), we consider relvars like SSSC_SINCE and make some recommendations for dealing with that comparatively simple case. Then, in the subsequent section (“During Relvars Only”), we consider relvars like SSSC_DURING; that discussion leads to the proposed vertical decomposition, which we consider in detail in a section of its own (“A New Normal Form”). We then go on to discuss “the moving point now” and the special problems caused by that construct. As we’ll see, those problems can occur in connection with during relvars even if vertical decomposition is done, and it’s that fact that leads to the proposed horizontal decomposition, which is considered in detail in the section “Both Since and During Relvars.”

One further preliminary point: The wording of the previous paragraph might be taken to suggest that a given database has to follow one of the various design approaches exclusively (since relvars only, during relvars only, or a mixture of both). In practice, of course, any combination can be used; that is, some “entities” and their corresponding “properties” might be represented by since relvars only, others by during relvars only, and others by a mixture. For simplicity, however, we ignore this nicety from this point forward. We remark, however, that most of the research reported in the literature—on all aspects of temporal data, not just design issues—does seem to concentrate on the case of during or “historical” relvars only.

Since Relvars Only

We’ve used the qualifier since several times already to refer to a relvar like SSSC_SINCE that’s merely semitemporal and thereby contains “current information.” To repeat, however, this latter characterization, such as it is, is actually pretty loose. The fact is, such a relvar contains not only current information as such but also, at least implicitly, information about both the past and the future, as we’ll see. What’s more, it also contains explicit information about the past—again as we’ll see—and it might even contain explicit information about the future! So to say just that such a relvar contains “current information” is only a very rough approximation to the truth.

Be that as it may, here again, repeated from the previous section but now shown only in outline, is the definition of relvar SSSC_SINCE:

SSSC_SINCE { SNO , SNAME , STATUS , CITY, SINCE } KEY { SNO }

The predicate for this relvar, in detail, is as follows:

Ever since day SINCE,3 all four of the following have been true:

a. Supplier SNO has been under contract.

b. Supplier SNO has been named SNAME.

c. Supplier SNO has had status STATUS.

d. Supplier SNO has been located in city CITY.

And it should be clear right away, from this formulation of the predicate, that the relvar isn’t very well designed. To see why not, suppose it contains the following tuple:

image

Suppose too that today is day 10 and that, starting from today, the status of supplier S1 is to be changed to 30, and so we replace the tuple just shown by this one:

image

And now we’ve lost (among other things) the information that supplier S1 has been located in London since day 4!

Well, it should be clear from this simple example that relvar SSSC_SINCE as so far defined is incapable of representing any supplier information that predates the time of the most recent update to that supplier (speaking a little loosely). But this problem is easily fixed: We simply replace the existing “since” attribute by four such attributes, one for each of the “nonsince” attributes, thus (in outline):

SSSC_SINCE { SNO , SNO_SINCE , SNAME , SNAME_SINCE , STATUS , STATUS_SINCE , CITY , CITY_SINCE }
   KEY { SNO }

The predicate for this revised version of SSSC_SINCE is:

Supplier SNO has been under contract ever since SNO_SINCE, has been named SNAME ever since SNAME_SINCE, has had status STATUS ever since STATUS_SINCE, and has been located in city CITY ever since CITY_SINCE.

Here’s a typical tuple for this revised design:

image

This tuple shows among other things that supplier S1 has had status 30 since day 10 but has been located in London since day 4, and so we’ve solved the problem.

Aside: To repeat, the problem was that if we were to replace the original tuple for supplier S1 (with STATUS = 20 and SINCE = d04) by one with STATUS = 30 and SINCE = d10, then we’d lose the information that supplier S1 was previously located in London. Now, you might be thinking we could have solved that problem by changing the KEY specification for relvar SSSC_SINCE in such a way as to allow two distinct tuples to appear for supplier S1, one showing S1 with status 20 since day 4 and the other showing S1 with status 30 since day 10. But if you try to state the predicate for this revised version of the relvar, or if you try to write some queries against it, you’ll quickly see why we don’t seriously suggest such an approach—especially when you realize that analogous changes would presumably have to be made in connection with the SNAME and CITY attributes as well. What’s more, even if we did adopt such a design and thus be able, in a sense, to deal with supplier name and status and city histories, we still wouldn’t be able to deal with supplier contract histories. End of aside.

Of course, it’s still the case with this revised design that the relvar can represent “current” information only—it’s still just a since relvar, and the database is still merely semitemporal. Thus, for example, if on day 10 the status of supplier S1 changes to 30, and we therefore replace the tuple shown for supplier S1, with STATUS = 20 and STATUS_SINCE = d04, by another with STATUS = 30 and STATUS_SINCE = d10, then we lose the information that supplier S1 had status 20 from day 4 to day 9 inclusive. In other words, this semitemporal design can’t represent historical information (other than as explained in the paragraph immediately following). However, it could still be the right design in some circumstances. In particular, it could serve as part of a combination design as discussed later in the section “Both Since and During Relvars.”

Note: When we say a relvar such as SSSC_SINCE can’t represent historical information, we mean it can’t represent purely historical information—i.e., information regarding a state of affairs that held in the past but no longer does so. But if today is day 10 and the relvar states that supplier S1 has been located in London ever since day 4, then it clearly does represent historical information of a kind. Thus, it would be more accurate to say that a since relvar can’t represent historical information other than what can be inferred from the since values. Those since values can be thought of as explicit historical information, and information inferred from them can be thought of as implicit historical information.

Further Points

A number of further points arise in connection with our revised design for relvar SSSC_SINCE, which we now proceed to discuss. First of all, of course, it’s clearly not necessary to have a since attribute for every “nonsince” attribute in the relvar. For example, if supplier names never change, or if they do change but we’re simply not interested in knowing about such changes—i.e., we’re interested only in what those names happen to be at the present time—then attribute SNAME_SINCE is clearly not needed.

Second, let the SNO_SINCE value in some SSSC_SINCE tuple be d, and let the value for any of the other since attributes in that same tuple be d'; then it must be the case that d'd.4 In other words, we assume, reasonably enough, that if relvar SSSC_SINCE contains information regarding a contract for some supplier Sx, it doesn’t contain any name, status, or city information for supplier Sx that predates the start of that contract. To put it another way: Although suppliers presumably do have a name and a status and a city before their contract starts, our database isn’t concerned with such matters.

Third, although we’ve described SSSC_SINCE (informally, at any rate) as containing “current” information, it might nevertheless quite legitimately contain information that explicitly pertains to the future. For example, the SNO_SINCE value for some supplier Sx might be some future date d, meaning the indicated supplier will be placed under contract on that date in the future (in which case attribute SNO_SINCE might better be named SNO_FROM, where FROM means “effective from”).

Fourth and last—and this one is important—we note that, even if it contains no information that pertains to the future explicitly, a relvar like SSSC_SINCE does necessarily contain information that pertains to the future implicitly. For example, if that relvar contains a tuple indicating that supplier S1 has been under contract ever since day 4, that “ever since” must be understood to be open ended; that is, the tuple must be understood to mean that, as far as we know, supplier S1 was, is, or will be under contract on every day from day 4 until “the last day.” In other words, the associated “valid time” is, currently, the interval from day 4 to the last day. Of course, if we subsequently learn that supplier Si’s contract actually terminated on some specific day, say day 25, and we update the database accordingly, the associated valid time will then become the interval [d04:d25].

Aside: See Chapter 3 if you need to refresh your memory regarding the concept of valid time, and Chapter 17 for an extended discussion of the same concept. Here we merely remind you that, strictly speaking, valid times are sets of intervals, not just individual intervals per se. In the case at hand, therefore, it would be more accurate to say the valid time is initially {[d04:d99]}—where we assume for the sake of the example that d99 is “the last day”—and subsequently becomes {[d04:d25]} instead. Note, moreover, that because the database is merely semitemporal, this latter valid time won’t actually be recorded in the database at all; rather, all information concerning supplier S1 will simply be deleted from the database when we learn the contract has terminated. More generally, in fact, any valid time that involves an interval in which the end point is anything other than “the end of time,” and/or involves two or more distinct intervals, can’t be represented in a database that’s merely semitemporal. Again, see Chapter 17 for further discussion. End of aside.

During Relvars Only

We began the previous section by examining the since, or “current,” relvar SSSC_SINCE. Now we turn our attention to the during, or “historical,” relvar SSSC_DURING. Note immediately, however, that (as with our earlier use of the term “current”) our use of the term “historical” here is somewhat loose; clearly, a during relvar might contain explicit information pertaining to the future as well as to the past. For example, relvar SSSC_DURING might contain a tuple showing the contract for supplier Sx as extending from di to dj, where dj is some date in the future, and di might be in the future also. (And if dj is in the future but di is in the past, or is the date today, then the relvar contains current information as well.)

By way of another example, consider a relvar VACATION {EMPNO,DURING}, with predicate:

DURING denotes a maximal interval of days throughout which employee EMPNO was, is, or will be on vacation.

Here the end date probably is known, even for vacations that haven’t yet ended—indeed, even for ones that haven’t yet begun. So to say a during relvar contains just “historical information” is, again, only an approximation to the true state of affairs. As noted earlier, however, we find the characterization useful from an intuitive point of view, and we’ll appeal to it from time to time in what follows.

Here then, repeated from the section “The Running Example Revisited” (but now shown in outline only), is the definition of relvar SSSC_DURING:

SSSC_DURING { SNO , SNAME , STATUS , CITY, DURING } KEY { SNO , DURING }

The predicate for this relvar, in detail, is as follows:

DURING denotes a maximal interval of days throughout which all four of the following were true:

a. Supplier SNO was under contract.

b. Supplier SNO was named SNAME.

c. Supplier SNO had status STATUS.

d. Supplier SNO was located in city CITY.

As in the case of the original version of relvar SSSC_SINCE in the previous section, it should be clear right away from this formulation of the predicate that relvar SSSC_DURING isn’t very well designed. To see why not, suppose it contains the following tuple:

image

Apparently, then, supplier S2 was under contract throughout the interval [d02:d04], named Jones throughout that same interval, had status 10 throughout that same interval, and was located in Paris throughout that same interval—a possible state of affairs, perhaps, but in general not a very likely one, surely; surely it would be more likely in general for the intervals during which a given supplier was under contract, had a given name, had a given status, and was located in a given city all to be different. In other words, attribute DURING (the timestamp attribute) “timestamps too much,” as it were; in effect, it timestamps a combination of four separate predicates—supplier is under contract, supplier has name, supplier has status, supplier has city—instead of just a single predicate. So the obvious idea suggests itself: Why not replace the original SSSC_DURING by four separate relvars, each with its own timestamp? The relvars in question would look like this (in outline):

S_DURING { SNO , DURING } KEY { SNO , DURING }
S_NAME DURING { SNO , SNAME , DURING } KEY { SNO , DURING }
S_STATUS_DURING { SNO , STATUS , DURING } KEY { SNO , DURING }
S_CITY_DURING { SNO , CITY , DURING } KEY { SNO , DURING }

Aside: The problems discussed in the previous section with relvar SSSC_SINCE (original version) could be characterized analogously: The timestamp attribute SINCE “timestamped too much.” And we solved that problem by replacing that single SINCE attribute by four separate such attributes. Clearly, however, we can’t solve the problem with relvar SSSC_DURING analogously—that is, we can’t just replace the single DURING attribute by four separate such attributes (but why not?). End of aside.

The example thus illustrates the proposed vertical decomposition, as it applies to relvar SSSC_DURING.5 Observe that, with respect to that decomposition, relvar S_DURING shows which suppliers were under contract when; relvar S_NAME_DURING shows which suppliers had which name when; relvar S_STATUS_DURING shows which suppliers had which status when; and relvar S_CITY_DURING shows which suppliers were located in which city when. Here are some possible sample tuples:

image

Note: Of the four relvars resulting from the foregoing decomposition, relvar S_DURING in particular is, of course, the relvar on which we based a large part of our discussions in Part II of this book. In fact, though, that relvar isn’t actually needed in the “during relvars only” design now under discussion. The reason is that—thanks to a certain constraint that (as we’ll see in Chapter 14) will necessarily be in effect—the information contained in that relvar can always be obtained from any of the other three.6 Nevertheless, we still prefer to include the relvar in our design, partly just for completeness, and partly because a certain degree of awkwardness and arbitrariness would occur if we didn’t.

A New Normal Form

The vertical decomposition proposed in the previous section (of relvar SSSC_DURING into four separate relvars) is very reminiscent in both rationale and effect of classical normalization, and it’s worth taking the time to examine the similarities in some detail. In fact, of course, vertical decomposition is exactly what classical normalization theory has always been concerned with; the decomposition operator in that theory is projection (which is a “vertical” decomposition operator by definition), and the corresponding recomposition operator is join. Indeed, the ultimate normal form with respect to classical normalization theory, fifth normal form or 5NF,7 was originally called projection/join normal form for these very reasons [56].

Now, even before temporal data was studied, some writers—see, e.g., reference [60]—argued in favor of decomposing relvars as far as possible, instead of just as far as classical normalization would suggest. Some even argued that databases should contain binary relvars only. This latter position isn’t really tenable, however. For one thing, unary relvars are sometimes needed. For another, some relvars of degree three or more simply can’t be decomposed into relvars of lower degree by taking projections—or, to be more precise about the matter, they can’t be so decomposed in a nonloss way (it’s crucially important that the decomposition be nonloss, of course). By way of example, consider a ternary relvar SPJ, with attributes SNO, PNO, and JNO, representing a many to many to many relationship among suppliers, parts, and projects (“supplier SNO supplies part PNO to project JNO”). This relvar can’t be nonloss decomposed into projections of lower degree in any way whatsoever.8

Be that as it may, the idea of decomposing relvars as far as possible is motivated by a desire for reduction to the simplest possible terms—in other words, reduction to irreducible components. Now, the argument in favor of such decomposition is perhaps not very strong in the case of a nontemporal relvar like the suppliers relvar from Chapter 1. However, it’s much stronger in the case of a relvar like the fully temporal analog of that relvar (i.e., relvar SSSC_DURING from the previous section). Typically, a supplier’s name, status, and city will vary independently over time. What’s more, they’ll probably vary at different rates as well. For example, it might be that a supplier’s name hardly ever changes, while that same supplier’ s location changes occasionally and the corresponding status changes quite often—and it would be quite annoying, e.g., to have to repeat the name and city information every time the status changes (which we would have to do if we didn’t do the vertical decomposition). Besides, the name history, status history, and city history of a supplier are probably each more interesting and more digestible concepts than the concept of a combined name / status / city history. Hence our proposed vertical decomposition.

There’s another point to be made here, too. With SSSC_DURING as initially defined, we have to use a slightly nontrivial expression to get, e.g., suppliers and their city history:

USING ( DURING ) : SSSC_DURING { SNO , CITY , DURING }

(a U_projection—see Chapter 11). At the same time, the expression to get suppliers and their much less interesting combined name / status / city history is far simpler, consisting as it does of just a simple relvar reference:

SSSC_DURING

By contrast, the suggested vertical decomposition has the effect of making the more interesting queries easier to express and the less interesting ones harder. E.g., to get suppliers and their city history (more interesting):

S_CITY_DURING

And to get suppliers and their combined name / status / city history (less interesting):

USING ( DURING ) : S_NAME_DURING JOIN
   ( USING ( DURING ) : S_STATUS_DURING JOIN S_CITY_DURING )

Or (perhaps a trifle more reasonably, using an n-adic U_join):

USING ( DURING ) : JOIN { S_NAME_DURING, S_STATUS_DURING, S_CITY_DURING }

Join Dependencies and Fifth Normal Form

With the foregoing by way of motivation, we can now take a closer look at what’s really going on here. First let’s review the classical concept of fifth normal form (5NF), which as we said earlier is the ultimate normal form with respect to classical normalization theory. Fifth normal form is based on the concept of join dependencies, which can be defined thus:

Definition: Let H be a heading; then a join dependency (JD) with respect to H is an expression of the form image {X1,X2, …,Xn}—pronounced “star X1,X2, …,Xn”—where X1,X2, …,Xn (the components of the JD) are subsets of H whose union is equal to H. Note: The phrase JD with respect to H can be abbreviated to just JD if H is understood. Note too that different writers use different symbols to denote a JD; we use a special kind of star (“image ”), but the “bow tie” symbol “image” is more often encountered in recent research literature.

Here are some examples of JDs, all of them defined in terms of the heading of the suppliers relvar from Chapter 1:9

image { { SNO , SNAME } , { SNO , STATUS } , { SNO, CITY } }
image { { SNO , SNAME , STATUS } , { SNAME, CITY } }
image { { SNO , SNAME , CITY } , { CITY, STATUS } }

Next we define what it means for a JD to be satisfied by some given relation:

Definition: Let relation r have heading H and let image {X1,X2, …,Xn} be a JD, J say, with respect to H. If r is equal to the join of its projections on X1, X2, …, Xn, then r satisfies J; otherwise r violates J. Note: The term join here must be understood as referring to the n-adic version of that operator, since the JD has n components.

Note carefully that it’s relations, not relvars, that satisfy or violate some given JD. For example, the relation shown as the value of relvar S in Fig. 1.1 in Chapter 1 satisfies the second of the foregoing JDs—

image { { SNO , SNAME , STATUS } , { SNAME, CITY } }

—and violates the third:

image { { SNO , SNAME , CITY } , { CITY, STATUS } }

Finally, we define what it means for a JD to hold in some given relvar:

Definition: Let relvar R have heading H and let J be a JD with respect to H. Then J holds in relvar R if and only if every relation that can validly be assigned to relvar R satisfies J. The JDs that hold in R are the JDs of R.

Note carefully that it’s relvars, not relations, for which some given JD holds or not. For the suppliers relvar S from Chapter 1, for example, the first in our list of three JDs—

image { { SNO , SNAME } , { SNO , STATUS } , { SNO , CITY } }

—does hold, but this one (the second in our list) doesn’t:

image { { SNO , SNAME , STATUS } , { SNAME , CITY } }

Taken together, what the foregoing definitions mean is that relvar R can be nonloss decomposed into its projections on X1,X2, …, Xn if and only if the JD image {X1,X2, …,Xn} holds in R. And now we can define fifth normal form:

Definition: Relvar R is in fifth normal form (5NF)—also known as projection-join normal form (PJ/NF)—if and only if every JD of R is implied by the keys of R.

But what does it mean for a JD to be “implied by keys”? We need another definition:

Definition: Let relvar R have heading H and let J be a JD with respect to H. Then J is implied by the keys of R if and only if every relation r that satisfies R’s key constraints also satisfies J.

However, this definition, though accurate, isn’t much help with the practical question of determining whether or not some given JD J is in fact implied by the keys of some given relvar R. But it turns out there’s an algorithm, the membership algorithm, that does the job. It works like this. Let relvar R have heading H, and let image {X1,X2, …,Xn} be a JD, J say, with respect to H. Then:

1. If two distinct components of J both include the same key K of R, replace them in J by their union.

2. Repeat the previous step until no further replacements are possible.

At the conclusion of this process, the original JD J is implied by the keys of R if and only if the final version of J contains H as a component. (In practice, of course, we don’t need to go all the way to the bitter end and compute that final version of J—we can quit as soon as a component is produced that’s equal to H.)

Here’s a simple example. Consider the nontemporal suppliers relvar S from Chapter 1. Here’s a JD—let’s call it J1—that (as we know) does hold in that relvar:

image { { SNO , SNAME } , { SNO , STATUS } , { SNO , CITY } }

Observe now that the components {SNO,SNAME} and {SNO,STATUS} both include the key {SNO}. Applying the membership algorithm, therefore, we can replace them by their union {SNO,SNAME,STATUS}. J1 now looks like this:

image { { SNO , SNAME , STATUS } , { SNO , CITY } }

Note that (a) this revised version of J1 is itself a JD with respect to the heading of relvar S, and also that (b) the JD in question does in fact hold in relvar S—two facts that together should give some insight as to how and why the membership algorithm works.

Next, the two components of this latter JD both include the key {SNO}, and so we can replace them by their union, to obtain:

image { { SNO , SNAME , STATUS , CITY } }

But the sole component here is the entire heading of S, and so it follows that JD J1 as originally stated is implied by the keys of relvar S.

For a greatly extended discussion of these matters and much more, please see reference [43].

U_Join Dependencies and Sixth Normal Form

Given that, in Chapter 11, we were able to define (among other things) generalized versions of the projection and join operators, which we called U_projection and U_join, respectively, you won’t be surprised to learn that we can define a generalized version of join dependencies too, which we’ll call U_join dependencies. Here are the relevant definitions:

Definition: Let H be a heading, and let ACL be a commalist of attribute names in which every attribute mentioned (a) is one of the attributes in H and (b) is of some interval type. Then a U_join dependency (U_JD) with respect to ACL and H is an expression of the form USING (ACL) : image {X1,X2, …Xn}, where X1,X2, …,Xn (the components of the U_JD) are subsets of H whose union is equal to H. Note: The phrase U_JD with respect to ACL and H can be abbreviated to U_JD with respect to ACL if H is understood; to U_JD with respect to H if ACL is understood; and to just U_JD if ACL and H are both understood.

Definition: Let relation r have heading H and let USING (ACL) : image {X1,X2, …,Xn} be a U_JD, UJ say, with respect to ACL and H. If r is U_equal to the U_join of its U_projections on X1, X2, …, Xn, then r satisfies UJ; otherwise r violates UJ. Note: The U_equality comparison, the U_join, and the U_projections mentioned in this definition must all be with respect to ACL (i.e., they must all have a prefix of the form “USING (ACL):”). Also, the term U_join here must be understood as referring to the n-adic version of that operator, since the U_JD has n components.

Definition: Let relvar R have heading H and let UJ be a U_JD with respect to H. Then UJ holds in relvar R if and only if every relation that can validly be assigned to relvar R satisfies UJ. The U_JDs that hold in R are the U_JDs of R.

By way of example, the following U_JD clearly holds in relvar SSSC_DURING:

USING ( DURING ) : image { S_DURING , S_NAME_DURING , S_STATUS_DURING , S_CITY_DURING }

Explanation: Here, just for the moment, we’re using the names S_DURING, etc., to refer to the corresponding U_projections, thus:

S_DURING

  def¯¯image USING ( DURING ) : SSSC_DURING { SNO , DURING }

S_NAME_DURING

  def¯¯image USING ( DURING ) : SSSC_DURING { SNO , SNAME , DURING }

S_STATUS_DURING

  def¯¯image USING ( DURING ) : SSSC_DURING { SNO , STATUS , DURING }

S_CITY_DURING

  def¯¯image USING ( DURING ) : SSSC_DURING { SNO , CITY , DURING }

What’s more, of course, since the foregoing U_JD does hold, it follows that the decomposition of SSSC_DURING into these four U_projections is nonloss. So our recommended vertical decomposition is certainly safe—and, of course, it enjoys certain advantages (as we saw in the previous section), which is precisely why it’s recommended.

Aside: In fact, the S_DURING component in the foregoing U_JD could be dropped without loss—i.e., the resulting U_JD would still hold in SSSC_DURING—because that component is equal to the U_projection on SNO and DURING of each of the other three components (where the U_projections are, of course, each with respect to DURING).10 It follows that we could if we liked decompose the relvar into just three projections instead of four. However, for reasons explained earlier (see the note at the very end of the section “During Relvars Only”), we prefer to keep the S_DURING projection, even though it’s redundant in the foregoing sense. What’s more, we’ll be recommending later that we switch from the “during relvars only” design to one involving a mixture of both since and during relvars, and then S_DURING will no longer be redundant anyway11—at least, not in the same sense. See Chapter 14 for further explanation of this point. End of aside.

Note, by the way, that classical join dependencies are a special case of the generalized version (i.e., U_join dependencies) as defined above, and so we can legitimately use the same term join dependency to refer to both. To be specific, a classical JD is basically just that special case of a U_JD in which the USING specification mentions no attributes at all (and we’ll allow that specification and colon separator to be omitted in that case). Thus, we no longer need to talk explicitly about U_JDs, as such, at all (and we no longer will, except occasionally for emphasis). Please note carefully, therefore, that throughout the rest of this chapter we’ll take all references to JDs to refer to the generalized version as described above, unless the context demands otherwise. For clarity, however, we’ll occasionally make use of the explicit qualifiers regular (or classical) and generalized, as applicable, and we’ll sometimes use the explicit U_ qualifier for the same reason.

Given all of the above, we can now define a new (“sixth”) normal form:12

Definition: Relvar R is in sixth normal form (6NF) if and only if every JD of R is trivial—where a JD is trivial if and only if one of its components is equal to the pertinent heading in its entirety.

Note that it’s immediate from the definition of “implied by keys” that a JD is certainly—in fact, trivially—implied by the pertinent keys if the JD in question is trivial. As a consequence, it’s immediate from the definition of sixth normal form that every 6NF relvar is also in 5NF. (It’s also immediate that a relvar is in 6NF if and only if it’s irreducible, in the sense of that term explained earlier.) And since regular projection is a special case of U_projection and regular join is a special case of U_join, it really does make sense to think of 6NF as another level of normalization, over and above 5NF. (Indeed, it makes sense, in a way, to think of 6NF as a “tighter” version of projection/join normal form.) And that’s why the name “sixth normal form” is appropriate.

Observe now that relvar SSSC_DURING isn’t in 6NF, because as we saw earlier the following JD (actually a U_JD) holds—

USING ( DURING ) : image { S_DURING , S_NAME_DURING , S_STATUS_DURING , S_CITY_DURING }

—and this JD is certainly nontrivial. However, the relvar is in 5NF, since the only JDs that hold (including the one just shown in particular) are implied by the sole key {SNO,DURING}.13

Aside: As noted (in effect) in an earlier aside, the nontrivial U_JD USING (DURING) : image{SND,STD,SCD}—where SND is an abbreviation for S_NAME_DURING, and similarly for STD and SCD—also holds in SSSC_DURING. In fact, the classical (also nontrivial) JD image{SND,STD,SCD} holds as well! That’s because (a) the sole key is {SNO,DURING}; (b) as explained in Chapter 3, therefore, every attribute of the relvar is functionally dependent on {SNO,DURING}; hence, (c) the relvar can be nonloss decomposed into its (regular) projections on {SNO,SNAME,DURING}, {SNO,STATUS,DURING}, and {SNO,CITY,DURING}. Note, however, that while {SNO,DURING} would be a key for each of those projections, it wouldn’t be a “U_key” (see Chapter 13.) End of aside.

Finally—as you must surely have been expecting—we propose that, as a general rule, a temporal relvar like SSSC_DURING that’s not in 6NF should be nonloss decomposed into U_projections that are. (In our example, of course, the projection relvars S_DURING, S_NAME_DURING, S_STATUS_DURING, and S_CITY_DURING are all indeed in 6NF, as you can easily confirm for yourself.)

“The Moving Point Now

Throughout our discussions prior to this point, we’ve assumed, reasonably enough, that history as conventionally understood starts at “the beginning of time” and continues up until the present time. In particular, we’ve assumed that time intervals in during relvars can stretch from any point b to any point e, where be and e ≤ the present time.14 However, we’ve also assumed that that present time is represented as some explicit value (which we’ve taken to be d10—day 10—whenever we needed to show concrete examples), and that assumption isn’ t reasonable at all. In particular, it suggests that whenever time marches on, so to speak, the database must be updated accordingly; in the case at hand, for example, it suggests that at midnight on day 10 every “present time” appearance of d10 must somehow be replaced by an appearance of d11, instantaneously (because those appearances of d10 didn’t really mean day 10 per se, they really meant until further notice). A different example, involving intervals of finer granularity, might require such updates to be performed as often as every millisecond!

Aside: Actually it should be obvious that using an explicit value such as d10 to stand for until further notice makes no sense, because it’s ambiguous. That is, there’s no way to tell, in general, whether a given appearance of that value really means what it says or whether it’s supposed to be understood as until further notice. End of aside.

Considerations such as the foregoing have led some writers—see, e.g., reference [19]—to propose the use of a special marker, which we’ll call NOW, to denote what in Chapter 4 we called “the moving point now” (in other words, to stand for until further notice). The basic idea is to permit that special marker to appear wherever both (a) a value of the applicable point type is permitted and (b) the desired interpretation is indeed until further notice. Under that scheme, for example, relvar S_DURING might contain a tuple for supplier S1 with a DURING value of [d04:NOW] instead of [d04:d10]. (Of course, we’re assuming here that the appearance of d10 in the original interval [d04:d10] was indeed supposed to represent “the moving point now”—i.e., to denote until further notice—and not to refer to day 10 as such.)

Others, however (the present writers included), regard the introduction of that NOW marker as an incautious departure from sound relational principles. In particular, noting that NOW is really a variable, we observe that the approach involves the very strange—we would say logically indefensible—notion of values (interval values, in the case at hand) containing variables. In fact, the NOW construct bears some resemblance to the “null” construct of SQL, inasmuch as nulls too lead to the notion of values containing something that’s not a value. The present writers among many others have long argued that nulls are and always were a huge mistake (see, e.g., references [24] and [34], also reference [90]), and we certainly don’t want to repeat a mistake of such magnitude if we can help it.

By the way, note that if the DURING value in (let’s assume) the unique tuple for supplier S1 in relvar S_DURING really is [d04:NOW], then the result of the query “When does supplier S1 ’s contract terminate?” must be until further notice (i.e., “NOW”), not whatever the date happens to be today. For if the system actually evaluates that NOW at the time the query is executed and responds with (say) the value d10, then that response is clearly incorrect, since supplier S1 ’s contract has in fact not yet terminated. Furthermore, if the result is indeed NOW, then that NOW must be interpreted as meaning “some indefinite point in the future,” an interpretation that doesn’t fit well with most people’ s intuitive understanding of the word now. What’s more, if the query is issued from within some application program, then that NOW will have to be returned to some program variable—so what exactly will the value of that variable be after that NOW has been assigned to it? And what data type would that variable have to have?

Here are some further examples of the kinds of questions, arising from the notion of NOW, that you might care to ponder:

■ (The creeping delete problem.) Let i be the interval [NOW:d14], let t be a tuple containing i, and let today be day 10. Then tuple t can be thought of as a kind of shorthand for five separate tuples, containing the unit intervals [d10:d10], [d11:d11], [d12:d12], [d13:d13], and [d14:d14], respectively. But when the clock reaches midnight on day 10, the first of these tuples is (in effect) automatically deleted! Likewise for day 11, day 12, and day 13 … and what exactly happens at midnight on day 14?

■ What’s the result of the comparison d10 = NOW? Note: Some might suggest that the result should be unknown (“the third truth value”)—a suggestion that takes us straight back into the nulls mess, of course, a possibility we reject outright.

■ What do the expressions “NOW+1” and “NOW−1” return?

■ If i1 and i2 are the intervals [d01:NOW] and [d06:d07], respectively, do they meet, or overlap? Can we form their union?

■ What’s the result of unpacking a relation containing a tuple in which the interval attribute on which the unpacking is to be done has the value [d04:NOW]? In particular, does the result contain a tuple in which that attribute has the value [NOW:NOW]?

■ What’s the effect of assigning the interval [d01:NOW] to the variable I1? And then (perhaps the next day) assigning it to another variable I2? And then performing an “=” comparison on I1 and I2?

■ What’s the cardinality of the set {[d01:NOW],[d01:d04]}?

And so on (the list is far from exhaustive). We believe it’s impossible to give coherent answers to questions like these; clearly, we would prefer an approach that doesn’t rely on any such suspect notions as the NOW marker and values containing variables.

But, of course, if we limit ourselves to a design consisting of during relvars only, then we must put something in the database to represent until further notice when until further notice is really what we mean. Once again, consider the case of a supplier whose contract hasn’t yet terminated. As explained in the section “Since Relvars Only,” such a supplier can be regarded as having a contract that currently extends all the way into the future, right up to the very last day. Clearly, therefore, we can explicitly specify the END (DURING) value for such a supplier as the last day, and then replace that artificial value by the true value when the true value later becomes known.15 (Of course, we’re assuming here that we don’t currently know when the contract will terminate. If we do know, there’s no problem.) But note that this approach does mean that if “the last day” appears in the result of a query, then the user will—probably, but not necessarily (?)—have to interpret that value to mean until further notice, not the last day per se.

To conclude this section, we’d like to stress that the previous paragraph merely describes one possible approach to the problem of “the moving point now.” Describing is not the same as recommending! In general, we feel it’s a bad idea to record information in the database that’s known to be incorrect—and, of course, an explicit statement to the effect that some contract won’t terminate until “the end of time” is certainly incorrect (or, at least, so one would normally assume). In fact, recording information in the database that’s known to be incorrect could be regarded as a violation of another fundamental principle: namely, the principle that tuples in the database are supposed to represent propositions we believe to be true. Don’t tell lies! As already indicated, however, if our design consists of during relvars only, we might be forced to tell this particular lie on occasion—a fact that in itself might be seen as a good reason to opt for the combination design to be discussed in the section immediately following.

Both Since and During Relvars

So far in this chapter, we’ve described two possible design approaches, one based on since relvars only and one based on during relvars only. And we’ve seen that both of those approaches have problems. To be specific, the first has the problem that it doesn’t let us keep proper historical records, and the second has the problem that it forces us to deal with “the moving point now” in an unpleasantly ad hoc way. So why not combine the two approaches to get the best of both? In this section, we explore this possibility.

We begin with the observation that there’s an obvious logical difference between historical and current information. To be specific, for historical information, the begin and end times are both known; for current information, by contrast, the begin time is known but the end time isn’t. (Actually these claims are both somewhat oversimplified, but we’ll stay with them for the time being.) And this difference suggests rather strongly that there should be two different sets of relvars, one for the current state of affairs and one for the history. (After all, there are certainly two different sets of predicates.) Here in outline is what the design for our running example will look like if we adopt this approach (remember that we’re still considering suppliers only and ignoring shipments):

S_SINCE { SNO , SNO_SINCE , SNAME , SNAME_SINCE , STATUS , STATUS_SINCE , CITY , CITY_SINCE }
    KEY { SNO }
S_DURING { SNO , DURING }
    KEY { SNO , DURING }
S_NAME_DURING { SNO , SNAME, DURING }
    KEY { SNO , DURING }
S_STATUS_DURING { SNO , STATUS , DURING }
    KEY { SNO , DURING }
S_CITY_DURING { SNO , CITY , DURING }
    KEY { SNO , DURING }

To elaborate briefly:

■ Relvar S_SINCE is the sole since relvar. It’s exactly the same as relvar SSSC_SINCE (the revised version with four “since” attributes as discussed in the section “Since Relvars Only” earlier in the chapter), except that we’ve abbreviated the name. Note: Don’t confuse this new S_SINCE relvar with the relvar of the same name from Chapter 5. Also, note that this new S_SINCE relvar is in 5NF but not 6NF.

■ The relvars with DURING in their name are the during relvars, of course, and they’re as discussed in the section “During Relvars Only” earlier in the chapter, except for one crucial difference: They quite definitely do not contain any tuples with artificial end times, to be interpreted as until further notice (thus, they don’t violate the principle that tuples in the database are supposed to represent propositions we believe to be true). Rather, all information that would previously have been represented by such tuples is now represented by tuples in relvar S_SINCE instead.

This suggested division of temporal data into “since” and “during” pieces is the horizontal decomposition process first mentioned in the section “The Running Example” near the beginning of the chapter.16 As you can see, that process isn’t nearly as cut and dried—it’s not as formal—as the vertical decomposition process discussed in earlier sections; in terms of our running example, however, you can think of it as working as follows. Suppose we start with the original fully temporal relvar SSSC_DURING from the section “The Running Example,” which we assume does, in general, contain information about the current state of affairs (and the future) as well as information about the past. Then:

■ First, we introduce a “since” relvar S_SINCE with an attribute for every attribute of SSSC_DURING other than the DURING attribute itself.

■ Second, we associate with each attribute of S_SINCE as just defined a corresponding “since” attribute.

■ Third, relvar SSSC_DURING is now to be understood as a pure “during” relvar (it won’t contain tuples with artifical end times), and we can now go on to decompose that relvar vertically into 6NF projections as discussed earlier in the chapter.

Now we return briefly to those two claims, regarding the difference between historical and current information, that we said earlier in this section were slightly oversimplified. The first was that for historical information the begin and end times are both known. Sometimes, however, we don’t know the end time, or possibly even the begin time, for such information; for example, we might know supplier S1 was once under contract, but not exactly when. Of course, such a state of affairs is basically just a specific example of the general (and generally vexing) problem of missing information. This isn’t the place to get into a detailed discussion of that general problem;17 instead, we content ourselves with invoking Wittgenstein’s famous dictum Wovon man nicht reden kann, darüber muss man schweigen (“Whereof one cannot speak, thereon one must remain silent”), which we interpret to mean, in the context at hand, that another good general principle is that it’s a bad idea to state explicitly in the database that there’s something you don’t know. Record only what you know!

Our second claim was that for current information the begin time is known but the end time isn’t. But sometimes we do know the end time after all, even for current information—see, e.g., the VACATION example mentioned briefly in the section “During Relvars Only.” In such a case, we can adopt the approach described in that section and keep during relvars only, discarding the since relvars entirely.

Considerations such as the foregoing show that the question of which design approach is best in any given situation will depend on circumstances. The combination scheme, involving both since and during relvars, is surely the most general, and is our preferred approach overall. But it’s unfortunately true that the combination scheme does have the potential to complicate constraints, queries, and updates somewhat, possibly quite considerably—essentially because it’ll often be the case that those constraints, queries, and updates have to span both since and during relvars. In the next four chapters, we’ll explain some of those complications and offer some suggestions as to how the difficulties might be alleviated in practice.

Concluding Remarks

This chapter has been concerned rather more than most of its predecessors with temporal data specifically. The principal reason for this state of affairs is that “the moving point now” is a concept that applies to temporal data specifically (other kinds of data for which the concepts of previous chapters apply—the interval concept in particular—typically have nothing analogous). And it’s that concept of “the moving point now” that led us, eventually, to our preferred approach to design as described in the previous section. To review that approach very briefly:

■ We suggest that horizontal decomposition be used to separate “current” and “historical” information. Note, incidentally, that this separation is essentially the separation already found in many installations today between operational data and the data warehouse.

■ We suggest that since relvars be normalized in accordance with classical normalization theory to 5NF.18 We observe, however, that such a relvar will typically require several “since” attributes, and we’ve found it necessary to give a very careful interpretation for such attributes.

■ We suggest that vertical decomposition be used to break during relvars down into irreducible (6NF) components.

A couple of further points:

■ All of our examples of during relvars in this chapter have involved keys that in turn involved an interval attribute. For example, the sole key for relvar S_DURING is the combination {SNO,DURING}. (That relvar happens to be all key.) But suppose suppliers never get a second chance—in other words, suppose that once a supplier’ s contract terminates, that supplier is never placed under contract again. Then the sole key for S_DURING would be just {SNO}, with no interval component.

■ All of our examples of during relvars have also involved at least one attribute that’s not interval valued. But of course it’s possible for a relvar to consist of, and hence to have a key that consists of, interval attributes only. As a simple example, consider a relvar BLACKOUTS with a single attribute that shows intervals during which a certain airline’s frequent flyer program doesn’t allow award travel.

We close this section, and this chapter, with (as promised) a brief word concerning shipments. Without going into too much detail, it should be fairly obvious that our preferred design for shipments will involve one since relvar and one during relvar, thus:

SP_SINCE { SNO , PNO , SINCE }
  KEY { SNO , PNO }
  FOREIGN KEY { SNO } REFERENCES S_SINCE
SP_DURING { SNO , PNO , DURING }
  KEY { SNO , PNO , DURING }

These relvars are already both in 6NF, and so horizontal decomposition does apply as shown, but vertical decomposition doesn’t. Also, note the foreign key from SP_SINCE to S_SINCE, which reflects the fact that any supplier able to supply some part at some time must be under contract at that time. However, perhaps you can see that this constraint, though certainly necessary, is very far from being sufficient! Indeed, we’ll have a lot more to say about such matters in the next two chapters.

Exercises

12.1 We saw in the body of the chapter that the root problem with each of the relvars SSSC_SINCE and SSSC_DURING as originally defined was that their predicates really consisted of four distinct predicates that were ANDed together. But isn’t the same true of the nontemporal suppliers relvar S from Chapter 1?

12.2 Define sixth normal form (6NF). When is 6NF recommended?

12.3 We recommended in the body of the chapter that “historical” (i.e., during) relvars, at least, should usually be in 6NF. But what would it mean for a nontemporal relvar to be in 6NF?

12.4 Explain in your own words the horizontal and vertical decompositions described in the body of the chapter.

12.5 “Current” (i.e., since) relvars can represent information about the past and future as well as the present. Explain this remark.

12.6 “The moving point now” is not a value but a variable. Discuss.

12.7 Give a realistic example of a relvar consisting of interval attributes only.

12.8 Consider the following revised version of the courses-and-students database from earlier chapters (see, e.g., Exercise 6.10 in Chapter 6):

VAR SINCE_COURSE BASE RELATION

  { COURSENO COURSENO , CNAME NAME , AVAILABLE DATE }

  KEY { COURSENO } ;

VAR OLD_COURSE BASE RELATION

  { COURSENO COURSENO , CNAME NAME , AVAILABLE_DURING INTERVAL_DATE }

  KEY { COURSENO } ;

VAR SINCE_STUDENT BASE RELATION

  { STUDENTNO STUDENTNO , SNAME NAME , REGISTERED DATE }

  KEY { STUDENTNO } ;

VAR STUDENT_HISTORY BASE RELATION

  { STUDENTNO STUDENTNO , SNAME NAME , REG_DURING INTERVAL_DATE }

  KEY { STUDENTNO , REG_DURING } ;

VAR ENROLLMENT BASE RELATION

  { COURSENO COURSENO , STUDENTNO STUDENTNO , ENROLLED DATE }

  KEY { COURSENO , STUDENTNO }

  FOREIGN KEY { COURSENO } REFERENCES SINCE_COURSE

  FOREIGN KEY { STUDENTNO } REFERENCES SINCE_STUDENT ;

VAR COMPLETED_COURSE BASE RELATION

  { COURSENO COURSENO , STUDENTNO STUDENTNO , STUDIED_DURING INTERVAL_DATE , GRADE GRADE }

  KEY { COURSENO , STUDENTNO } ;

The predicates are as follows:19

■ SINCE_COURSE: Course COURSENO, named CNAME, has been available since date AVAILABLE.

■ OLD_COURSE: Course COURSENO, named CNAME, was available throughout interval AVAILABLE_DURING.

■ SINCE_STUDENT: Since student STUDENTNO, named SNAME, registered with the university on date REGISTERED.

■ STUDENT_HISTORY: Student STUDENTNO, named SNAME, was registered with the university throughout interval REG_DURING.

■ ENROLLMENT: Student STUDENTNO enrolled on course COURSENO on date ENROLLED.

■ COMPLETED_COURSE: Student STUDENTNO attended course COURSENO throughout interval STUDIED_DURING, achieving grade GRADE.

No course number appears in both SINCE_COURSE and OLD_COURSE.

a. For each relvar, state whether or not it’s in 6NF. If it isn’t, identify any problems that might be solved by decomposing it accordingly.

b. Write a query to obtain a relation showing, for each student, the number of courses completed during each registration interval for that student.

c. Assume that for each course there are zero or more offerings, each taking place over a given interval of time. Distinct offerings of the same course can take place “back to back” or even at the same time, either partly or completely. Some offerings have already taken place; others are currently under way (but have a scheduled completion date); others are scheduled to start at some future time (but, again, have a scheduled completion date). When students enroll on a course, they must specify which offering they’re enrolling for. Each offering has a quota, and the number of students enrolled on that offering mustn’t exceed that quota. Write the predicate and an appropriate relvar definition for a relvar COURSE_OFFERING to reflect these requirements.

d. Write a query to get STUDENTNO-DURING pairs for students who have completed at least one course, where DURING designates a maximal interval during which the specified student was attending at least one course.

Answers

12.1 Yes, it does, and it’s precisely for that kind of reason that (as noted in the body of the chapter) some writers have argued that even nontemporal relvars should be decomposed all the way down to irreducible components. Certainly such a decomposition is logically cleaner, in a sense. On the other hand, nontemporal relvars (and even temporal relvars, so long as they’re “current” and not “historical”) that are in 5NF but not 6NF don’t seem to give rise to the same kinds of—or, at any rate, as many—practical problems as non6NF during relvars do, which is why we advocate the position we do: namely, 6NF for during relvars, 5NF for everything else.

12.2 Relvar R is in 6NF if and only if no JDs hold in R apart from trivial ones. Decomposition to 6NF is generally recommended if all four of the following conditions hold for R (which in practice they usually will, if R is a during relvar that isn’t in 6NF already):

■ R has a key K and at least two additional attributes.

■ Those additional attributes don’t include a key of R.

■ K contains an interval attribute A.

■ We want R to be kept packed on A (see Chapter 13).

12.3 Such a relvar is subject to no JDs at all apart from trivial ones (see the answer to the previous exercise)—but here “JDs” must be understood as purely classical ones, not the generalized U_JDs defined in the body of this chapter. Such a relvar is irreducible, in the sense that it can’t be nonloss decomposed at all, other than trivially. Note: It’s easy to see that such a relvar R is in 6NF if and only if it’s in 5NF, is of degree n, and has no key of degree less than n-1. Observe, therefore, that:

a. Despite what we said in the body of the chapter regarding 5NF, it’s really 6NF, in a sense, that’s the ultimate normal form with respect to normalization as usually understood.

b. As noted in the body of the chapter, every 6NF relvar is in 5NF.

12.4 First let’s consider relations, not relvars. In general, decomposition of a relation r is the act of deriving, from r and r alone, a collection of n relations r1, r2, …, rn such that r1, r2, …, rn together represent exactly the same information as r does (so the decomposition is nonloss).20 Every relation can be “nonloss decomposed”;21 however, trivial cases where some ri isn’t needed in the process of reconstructing r from r1, r2, …, rn—including in particular cases where some ri is equal to r—are normally not considered.
Decomposition is also usefully, and in the design context more importantly, applicable to relvars. If it’s the case that, at all times, the relation assigned to relvar R can be nonloss decomposed in a nontrivial way, then R can be discarded in favor of the relvars R1, R2, …, Rn that correspond to the decomposition in question. Moreover:

■ If each of R1, R2, …, Rn has the same heading as R itself, then the decomposition is horizontal. In this case, the operation to derive R1, R2, …, Rn from R might be restriction, in which case the recomposition operator is union.22 Note: A special form of horizontal decomposition is recommended in the temporal context, to separate “since” and “during” information. In this variation, the heading of the since relvar SR isn’t exactly the same as the heading of the relvar R that’s to be decomposed. To be specific, the interval valued “during” attribute of R is replaced in SR by a point valued “since” attribute, where the declared type of that “since” attribute is the point type of the interval type of the corresponding “during” attribute.23

■ If R1, R2, …, Rn all have different headings (whose union is necessarily the heading of R), then the decomposition is vertical. In this case, the operation to derive R1, R2, …, Rn from R is projection and the recomposition operation is join—where, in the temporal context, the terms projection and join refer to the generalized versions of those operators as defined in Chapter 11. Note: If, in every nonloss vertical decomposition of R, at least one of R1, R2, …, Rn is guaranteed always to be equal to R, then R is said to be irreducible.

12.5 Perhaps the best way to explain how since relvars can represent past and future information is by means of examples:

■ Past information might be represented by an attribute whose value represents the time at which a certain event took place. If it can be assumed that a certain consequential state of affairs has existed ever since that date, then present information and further past information are both implicitly represented by that same attribute. If it can be further assumed that that state of affairs will continue to exist until some date in the future, then future information is also implicitly represented. For example, an employee’s date of hire indicates that the employee in question has been on the company’s payroll ever since that date. If d is some date between the date of hire and the end of time inclusive, then the past, present, or future information that the employee was, is, or will be on the payroll on day d is implicitly represented. By contrast, the information that the employee was on the payroll on the actual date of hiring is explicitly represented.

■ Future information might be explicitly represented by something like a planned retirement date, representing information that on that day (explicitly) and every day after that date (implicitly) the employee will no longer be on the company’s payroll.


In all cases, the since relvar in question represents our current beliefs about the past, present, or future (see Chapter 17 for further discussion of this point).

12.6 “The moving point now” is a variable precisely because it “moves”—it doesn’t represent the same value at all times. If some object x denotes different values at different times, then that object x is a variable by definition. Because now is a variable, it’s a solecism to entertain any notion of a relation (which is a value, by definition) containing it.

12.7 Here’s an example of such a relvar with three interval attributes:

VAR RECOMMENDED_WEIGHTS BASE RELATION { HEIGHTS INTERVAL_HEIGHT , AGES INTERVAL_AGE , WEIGHTS INTERVAL_WEIGHT }

      KEY { HEIGHTS , AGES } ;


The intended interpretation is: HEIGHTS denotes a maximal range of heights, and AGES denotes a maximal range of ages within that range of heights, such that for people whose height is in that range of heights and whose age is in that range of ages, WEIGHTS denotes the maximal range of recommended weights. Note: We’re assuming here that the relvar is kept packed on AGES within HEIGHTS. See Chapter 13 for further explanation.

12.8 

a. Relvar ENROLLMENT is in 6NF, but none of the others is, and that fact can certainly lead to problems. By way of example, consider relvar STUDENT_HISTORY. It should be clear that the timestamp REG_DURING in that relvar does indeed “timestamp too much,” and hence that it would be better to replace the relvar by two 6NF relvars, one showing who was registered when and one showing who had what name when. (If that were done, it might also be a good idea to add a NAME_SINCE attribute to SINCE_STUDENT, analogous to the STATUS_SINCE attribute of relvar S_SINCE.) The situation with each of the other four relvars (SINCE_COURSE, OLD_COURSE, SINCE_STUDENT, and COMPLETED_COURSE) is analogous.

b. WITH ( t1 : = EXTEND SINCE_STUDENT : { REG_DURING := INTERVAL_DATE ( [ REGISTERED : LAST_DATE ( ) ] ) } { ALL BUT REGISTERED } ,
    t2 := t1 UNION STUDENT_HISTORY ,
    t3 := t2 JOIN COMPLETED_COURSE ,
    t4 := t3 WHERE END ( STUDIED_DURING ) ∈ REG_DURING ) :
EXTEND t4 { STUDENTNO, REG_DURING } : { NUMBER_OF_COURSES_COMPLETED := COUNT ( !!t4 ) }

c. Predicate: Offering OFFERINGNO of course COURSENO ran or is scheduled to run from the begin point of DURING to the end point of DURING, inclusive, and was or is restricted to QUOTA students.
Relvar definition:
VAR COURSE_OFFERING BASE RELATION
  { COURSENO COURSENO , OFFERINGNO INTEGER , QUOTA INTEGER , OFFERED_DURING INTERVAL_DATE }
  KEY { COURSENO , OFFERINGNO } ;

d. ( USING ( STUDIED_DURING ) : COMPLETED_COURSE { STUDENTNO, STUDIED_DURING } ) RENAME { STUDIED_DURING AS DURING }


1In order to head off certain questions that might already be occurring to you, we repeat that we’ll have a lot more to say about key constraints and related matters in the next two chapters.

2Which raises an interesting question in passing: namely, how does one draw an E/R (“entity/relationship”) diagram for a temporal database? In this connection, you might like to ponder the following lightly edited remarks from reference [108], which are, as you’ll observe, quite clearly at odds with our own position on this matter: “In the approach we espouse, we initially ignore the time-varying nature of the applications. We focus on capturing the current reality and temporarily ignore any history that may be useful to capture. This selective amnesia somewhat simplifies what is often a highly complex task of capturing the full semantics. An added benefit is that existing design methodologies apply in full. Only after the full design is complete do we augment the E/R schema with time-varying semantics.”

3We remind you from Chapter 4 that we take expressions of the form “ever since day d” to mean “ever since and not immediately before day d.”

4We said at the beginning of this chapter that we would be deferring most consideration of constraints to the next two chapters. However, this particular constraint is such an obvious one that it seemed worth mentioning right away.

5Vertical decomposition is discussed in the literature—under the name “horizontal splitting”!—in reference [57]. The concept seems to have originated in reference [64].

6In other words, relvar S_DURING is redundant in the design currently under discussion. To be specific, it’s subject to the constraint that its value at any time t is equal to the U_projection (using DURING) on {SNO,DURING} of the value of each of the other three relvars at that time t. Note, however, that this constraint will no longer hold when we get to the combination design to be discussed later in the chapter, in the section “Both Since and During Relvars.”

7It’s true that fifth normal form is, as stated, the ultimate normal form with respect to classical normalization theory (but see Exercise 12.3), and it’s certainly sufficient for eliminating the kind of redundancy that classical normalization is intended to eliminate. What’s not true is that it’s necessary for that purpose! Rather, a recently defined and strictly weaker normal form called ETNF (“essential tuple normal form”) is sufficient for the purpose [30]. In practice, however, it would be extremely unusual for a relvar that’s in ETNF not to be in 5NF. For the sake of the present discussion, therefore, we adopt the harmle ss fiction that as far as normalization as conventionally understood is concerned, 5NF is indeed the target to be aimed for.

8Unless a certain rather unlikely “cyclic” constraint happens to be in effect, as described in, e.g., reference [43].

9Note very carefully that we’re not saying here that these JDs are all satisfied by the suppliers relation shown in Fig. 1.1 (in fact, the first two are but the third isn’t, as we’ll see). Nor are we saying that these JDs all hold in relvar S (in fact, the first does but the second and third don’t).

10We made the same point earlier, in footnote 6. Note that we can say “equal” here, rather than “U_equal,” because we’re assuming that relvar S_DURING is kept packed on DURING. See Chapter 13 for further discussion.

11We touched on this point also in footnote 6.

12Sixth normal form was first defined, by the present authors, in this book’s predecessor [53].

13The membership algorithm, defined earlier for testing whether a regular JD is implied by keys, applies to U_JDs as well.

14In other words, we’ve assumed that those during relvars don’t contain any explicit information regarding the future (i.e., no DURING interval [b:e] is such that either b or e is in the future). Please note carefully that this assumption is not valid in general; we make it here only because (a) it simplifies the subsequent discussion and (b) more important, because it’s safe to do so (i.e., it doesn’t invalidate or materially affect in any way the conclusions to be drawn from that discussion).

15So here—to spell out the obvious—we’re dropping our assumption that no DURING interval [b:e] is such that either b or e is in the future.

16Horizontal decomposition is also discussed (under the name “temporal partitioning”) in reference [108]. The concept seems to have originated in reference [5], and an extensive discussion appears in reference [1]. However, the primary emphasis in all of those references seems to be on physical storage issues rather than issues of logical design—for example, the title of reference [1] is “Partitioned Storage Structures for Temporal Databases” (emphasis added). Note: The literature sometimes refers to since and during information as “the current database” and “the temporal pool,” respectively.

17A relational approach to that problem—i.e., one that (unlike SQL’s nulls scheme) doesn’t violate relational principles—is described in reference [27].

18Or at least ETNF.

19In somewhat simplified form. Note in particular that all intervals involved should be understood to be maximal.

20Be aware, however, that the term nonloss decomposition is usually taken to refer, informally, to vertical decomposition specifically.

21For some relations, however, the only possible decompositions might be trivial ones. Note in particular that every relation can be “nonloss decomposed,” trivially, into just its identity projection.

22If the restriction conditions for deriving R1, R2, …, Rn from R ensure that R1, R2, …, Rn are pairwise disjoint, then recording the same information more than once is (desirably) avoided, and the recomposition operation becomes disjoint union.

23If distinct since relvars are produced for the same entity type by such a process, they can be replaced by an appropriate join.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.223.195.97