Chapter 17

Logged Time and Stated Time

This chapter takes a much closer look at two concepts originally introduced in Chapter 4, viz., “valid time” and “transaction time.” Valid time has to do with when we believe (explicitly or implicitly) that some proposition is, was, or will be true. Transaction time has to do with when the database said (explicitly or implicitly) that we believe (again, explicitly or implicitly) that some proposition is, was, or will be true. Valid times are maintained—at least in a relational system—just like any other kind of data; in particular, they’re updatable just like any other kind of data. By contrast, transaction times are maintained by the system; in particular, therefore, they’re read-only (at least from the user’s point of view). The chapter proposes a model for querying transaction times. It also proposes some alternative technology: stated time for valid time, and logged time for transaction time.

Keywords

valid time; transaction time; stated time; logged time

“Why, sometimes I’ve believed as many as six impossible things before breakfast.”

—The White Queen, in Lewis Carroll:

Through the Looking-Glass and What Alice Found There (1871)

In this chapter, we take a much closer look at two concepts we first encountered in Chapter 4, valid time and transaction time. We begin by briefly reviewing some of the ideas originally introduced in connection with those concepts in that earlier chapter.

A Quick Review

Consider the following simple example. Suppose our usual relvar S_DURING currently contains just this single tuple for supplier S2:

image

Then we might say, loosely, that the valid time for the proposition “Supplier S2 was under contract” is the interval from day 2 to day 4.

Suppose further that the foregoing tuple exists in the database, in relvar S_DURING, from time t1 to time t2 (only). Then we might say, again loosely, that the transaction time for the proposition “Supplier S2 was under contract from day 2 to day 4” is the interval from t1 to t2.

Points arising from this example:

■ Valid times refer to something we currently believe to be true (we currently believe the interval during which “Supplier S2 was under contract” was day 2 to day 4). By contrast, transaction times refer to when the database said something was true (the interval during which the database said “Supplier S2 was under contract from day 2 to day 4” was t1 to t2).

■ As the previous bullet item clearly indicates, the two “somethings” in question are, in general, different—that is, they’re two different propositions.

■ Valid times and transaction times are both, more precisely, sets of intervals. In the example, the valid time for the proposition “Supplier S2 was under contract” is the set of intervals {[d02:d04]}. And if relvar S_DURING currently shows that supplier S2 was additionally under contract from day 6 to day 9 (but not at any other time), then the valid time for the proposition “Supplier S2 was under contract” would be the set of intervals {[d02:d04], [d06:d09]}. However, when the set of intervals constituting some particular valid time or transaction time contains just one interval, we often say—for simplicity, albeit rather sloppily—that the valid time or transaction time in question is just that single interval as such.

■ Valid times are updatable, because they represent our beliefs about history, and beliefs can change. By contrast, transaction times represent history as such; hence, they’re not updatable, because history as such is immutable.

■ When we say valid times represent our “beliefs about” history, we mean those beliefs are ones we hold right now … and, of course, what we believe right now can refer to something in the past and/or the present and/or the future. For example, we might believe right now that supplier S2 was under contract from day 2 to day 4, in the past; or we might believe right now that supplier S2 is under contract right now, in the present; or we might believe right now that supplier S2 will be under contract from day 20 to day 40, in the future. Thus, we take the term history, in such phrases as “belief(s) about history,” to include the present and/or future, as applicable, unless the context demands otherwise.

■ By contrast, transaction times can refer only to the past. That is, we can’t possibly say that at some future time the database will say something was true.1 Nor, strictly speaking, can we even sensibly say that right now the database says something is true, because by definition “right now” is always at least a little bit later in time than when we most recently looked to see what the database in fact does say.2

■ For obvious reasons, transaction times also can’t refer to a time in the past earlier than the time when the proposition they apply to was added to the database.

■ Finally, observe that all of the “times” we’ve been dealing with in this book prior to the present chapter (with the exception of those few introductory remarks in Chapter 4) have been valid times specifically.

All of that being said, we now remind you that, as noted in Chapter 4, there’s quite a lot more to be said about these concepts; hence the present chapter. Indeed, as the title of the chapter itself suggests, one of the things we want to do is propose some alternative terms for the concepts, and we’ll do that in the “Terminology” section, later. However, we need to discuss a lot of material of a more fundamental nature first.

One final preliminary remark: The notion of transaction time applies to every relvar in the database (obviously enough).3 By contrast, the notion of valid time applies only to temporal relvars, because—by definition—temporal relvars are the only ones with “valid time” attributes. The rest of this chapter explains just what we mean when we say that either of these two concepts “applies to” some particular relvar!

A Closer Look

Note: The discussions in this section are unavoidably a little complicated, which is why we’ve left them to this late stage in the book. Caveat lector.

Again assume relvar S_DURING contains just this one tuple for supplier S2:

image

Assume as before that:

a. This tuple represents the proposition—let’s call it p—“Supplier S2 was under contract from day 2 to day 4.” In other words, it tells us that the interval from day 2 to day 4 is the valid time for the proposition—let’s call it q—“Supplier S2 was under contract.”

b. The tuple exists in the database from time t1 to time t2 only. In other words, the transaction time for proposition p is the interval from t1 to t2.

Please observe now—important!—that proposition p (“Supplier S2 was under contract from day 2 to day 4”) does not logically imply proposition q (“Supplier S2 was under contract”). Rather, it implies the proposition “Supplier S2 was under contract at some time.”4 Now, if you find what we’ve just said a little difficult to understand, we frankly wouldn’t blame you; but here’s a different example that might make the point more clearly. Let x be the proposition “All politicians are corrupt this year.” Then proposition x does not imply the proposition y “All politicians are corrupt”: certainly not in logic, and possibly not in informal discourse either. (Even if the latter proposition y is in fact true, we still can’t conclude as much, logically, from the truth of proposition x alone.)

To return to proposition p: We’ve said that (a) proposition p doesn’t imply the proposition “Supplier S2 was under contract” but that (b) what it does imply is the proposition “Supplier S2 was under contract at some time,” or (a little more formally) “There exists a time t such that supplier S2 was under contract at time t.” And while an argument might be made from common sense that this latter proposition surely does imply that supplier S2 was under contract, such an argument wouldn’t be a valid logical argument. In fact, proposition q (“Supplier S2 was under contract”) isn’t represented in the database at all, neither explicitly nor implicitly.

Aside: To say some proposition is represented implicitly is to say that (a) no tuple representing that proposition appears in the database explicitly, but that (b) such a tuple can be derived in some way from those tuples that do appear explicitly. For example, a tuple representing the proposition “Supplier S2 was under contract on day 3” can be derived from the tuple representing the proposition “Supplier S2 was under contract from day 2 to day 4.” Thus, if the latter proposition is represented explicitly, then the former proposition is also represented, but implicitly. End of aside.

Now, given that, as we’ve said, the valid time for proposition q is the interval [d02:d04], it makes sense to say the valid time for any proposition that’s implied by q—i.e., that can be derived from q—is also [d02:d04]. For example, since [d02:d04] is indeed the valid time for q (“Supplier S2 was under contract”), then it’s also the valid time for the proposition “There exists some supplier SNO such that supplier SNO was under contract.”

The net of the discussion so far is this: Valid times apply, typically, to propositions that aren’t represented in the database at all!—neither explicitly nor implicitly. We recommend that you make sure you understand this point before reading any further.

Back to the example. Observe now that the statement “The transaction time for proposition p is the interval from t1 to t2” is itself a proposition and can thus be represented as a tuple, like this (using X_DURING to denote transaction time):

image

This tuple, involving as it does two distinct timestamps, one representing valid time and the other transaction time, is an example of what’s sometimes called in the literature a “bitemporal tuple.” Note carefully, however, that the DURING timestamp (valid time) in that tuple applies to one proposition and the XDURING timestamp (transaction time) in that tuple applies to another, different proposition. As the example suggests, therefore, a bitemporal tuple can be thought of (yet again somewhat loosely) as showing the transaction time tt for the valid time vt for some proposition q. See the section “Logged Times” for further discussion.

Now we turn to an example that in some ways is simpler than the ones we’ve been considering so far. Suppose some relvar in the database contains the following tuple:

image

The intended interpretation is, of course, “Triangles have three sides.” Note in particular that this proposition—let’s call it z—has no valid time component at all. But we can certainly think of it as having a valid time! To be precise, it’s certainly the case that we believe right now that proposition z is true right now in the present, always was true in the past, and always will be true in the future. Implicitly, therefore, the valid time for z is always (i.e., “all of time,” or in other words the interval from the beginning of time to the end of time, inclusive).5 And precisely because the valid time is indeed “all of time,” there’s no point in saying so explicitly, by means of an explicit timestamp in the database. Turning this statement around, it’s reasonable to say that, in general, a proposition that’s represented in the database without an explicit valid time has an implicit valid time of “always.”6

Aside: Here’s another example to illustrate the same point. Suppose we want to record the values of various “universal constants”—π, φ, e, c, and so on—in the database; in other words, we want to record propositions of the form “The value of universal constant u is v.” Then it should be clear that, again, explicit valid times make little sense for such propositions. End of aside.

Observe next that every proposition that has ever been recorded in the database, regardless of whether it has or had an explicit valid time, certainly has a transaction time. For example, if the tuple corresponding to proposition z—“Triangles have three sides”—is recorded in the database from time t1 to time t2 (only), then the transaction time for the proposition “z is, was, and always will be true” is the interval from t1 to t2.

Given the foregoing, again consider the following “bitemporal tuple” (let’s call it bt):

image

Just to remind you, the proposition (let’s call it b) represented by this tuple is “The transaction time for proposition p is the interval from t1 to t2,” where proposition p in turn is the proposition “Supplier S2 was under contract from day 2 to day 4.”

Suppose now that tuple bt itself currently appears in the database (which, as we’ll see later, in the section “Logged Times,” it certainly might do). Then the corresponding proposition b has a valid time and a transaction time. The valid time is, implicitly, “always” (certainly it has no explicit valid time). What about the transaction time? Well, presumably tuple bt was—at least conceptually (again see the section “Logged Times”)—inserted into the database at some time t;7 and if tuple bt does indeed, as stated, currently appear in the database, and if it has done so ever since time t, then the transaction time for proposition b is clearly the interval from time t to whatever the time happens to be right now. (Recall that transaction times can’t refer to the future, so there’s no question of saying the transaction time is the interval from time t to “the end of time.”)

With the foregoing examples and discussion providing the necessary intuitive underpinnings, we can now—at last—offer some reasonably precise definitions:

Definition: The transaction time for a proposition p is the set of times t such that, according to what the database stated at time t, p was true. Note that proposition p (a) must have been represented in the database at some time, either explicitly or implicitly, and (b) can itself refer to the past and/or present and/or future.

Definition: The valid time for a proposition q is the set of times t such that, according to what the database currently states (which is to say, according to our current beliefs), q is, was, or will be true at time t. Note that proposition q itself probably isn’t represented in the database, either explicitly or implicitly (though it might be).

In a nutshell:

■ Transaction times are the times in the past when the database said we believed something is, was, or will be true.

■ Valid times are the times (past and/or present and/or future) when, according to what we believe right now, something is, was, or will be true.

Furthermore, we assume for convenience and definiteness that (a) the transaction time for a given proposition p consists of a set of intervals, and that set is effectively kept in packed form; (b) the valid time for a given proposition q likewise consists of a set of intervals, and that set also is effectively kept in packed form. Note: By “effectively kept in packed form” here, we mean that if the set of intervals in question is presented to the user by means of an interval valued attribute A within some relation r, then r is packed on A.

The Database and the Log

Following on from the definitions and explanations of the previous section, we now observe that there’s a significant operational difference between valid times and transaction times, a difference that can be characterized informally as follows:

Valid times are kept in the database, transaction times are kept in the log.

We believe this informal characterization can be a great aid to clear thinking in this potentially confusing area, and we therefore elaborate on it in this section.

First of all, then, note that a database is really a variable; the operation of “updating the database” causes the current value of that variable to be replaced by another value. The values in question are database values, and the variable is a database variable. In other words, the logical difference discussed in Chapter 3 between relation values and relation variables (relvars) applies to databases as well, mutatis mutandis. Here’s a lightly edited but pertinent quote from reference [52]:

The first version of this document drew a distinction between database values and database variables, analogous to the distinction between relation values and relation variables. It also introduced the term dbvar as shorthand for database variable. While we still believe this distinction to be a valid one, we found it had little direct relevance to other aspects of these proposals. We therefore decided, in the interests of familiarity, to revert to more traditional terminology.

Now this bad decision has come home to roost! With hindsight, it would have been much better to “bite the bullet” and adopt the more logically correct terms database value and database variable (or dbvar), despite their lack of familiarity.8

But there’s another way to think about the foregoing. To be specific, a database update can be thought of, not so much as replacing the current database value by another such value, but rather as deriving a “new” database value from the “old” one (thereby making that new value the current value), while at the same time keeping the old value around in the system as well. The “old” database value is, of course, the one that was current immediately prior to the update in question. It follows that the overall database can be thought of as a sequence or stack of database values, where each such value is timestamped with the time of the update that produced it, and the complete sequence or stack is ordered chronologically.9 The most recent value in the sequence is, of course, the current one—it’s the database as it appears right now. And the only kind of update operation we can apply to the overall database, conceptually speaking, is the one that takes the current database value and derives from it a new such value, which then becomes current in turn.

What’s more, that sequence of database values can usefully be thought of as a log; after all, it’s effectively an abstract or idealized version of the recovery log as implemented in real systems, providing as it does a full historical record of every update that has ever been performed on the database. And the most recent log entry—i.e., the most recent database value in the chronological sequence of such values—provides a record of our current beliefs.10

Note: In real systems, of course, log entries typically refer to updates at the level of (e.g.) disk pages, not at the level of the entire database. This state of affairs doesn’t materially affect the discussion, however. Also, we remark in passing that the well known concept of point in time recovery can be seen as a process that makes use of the actual physical log, together with a backup copy of the database, to (re)construct one of the database values that make up our hypothetical imaginary log.

By way of illustration, we return to the example from the section “Valid Time vs. Transaction Time” in Chapter 4, in which a tuple showing supplier S1 as being under contract during some interval was inserted into the database at time t1 and replaced at time t2, and that replacement tuple was then deleted at time t3. Let t2' and t3' be the timestamps of the log entries—i.e., the database values—immediately preceding those for times t2 and t3, respectively. Clearly, then, the log entries for times t1 to t2' will contain the original tuple, while those for times t2 to t3' will contain the replacement tuple, and those from time t3 onward will contain no corresponding tuple at all. (We assume for simplicity that no tuple showing supplier S1 as being under contract appeared in the database either before time t1 or after time t3.) The transaction times for the applicable propositions can thus clearly be obtained from the timestamps associated with the pertinent log entries.

What about the valid times? By now, it should be clear that these times don’t correspond to log entry timestamps at all. Indeed, as we saw in the section “A Closer Look,” if q is a proposition to which the concept of valid time applies, we don’t keep q as such in the database; rather, we keep p, the corresponding timestamped extension of q, in the database. And the timestamps in such timestamped extensions, which denote the applicable valid times, are represented by means of attribute values in the usual relational way. Hence, current valid times—by which we mean, by definition, those times we currently believe to be valid—always appear in the current value of the database, which always corresponds, again by definition, to the most recent entry in the log.

We now proceed to consider a number of further points that arise in connection with the idea that “the log is the real database.” Note: Some of these points have been touched on or at least hinted at previously in this chapter, but it still seems worthwhile to bring them together and spell them out explicitly here.

■ As noted earlier, a proposition p might have a transaction time consisting of several discrete intervals—from t1 to t2, then from t3 to t4, then from t5 to t6, and so on (where t1, t2, etc., are all distinct). That is, times t1, t3, t5, etc., correspond to updates that caused a representation of p to appear in the database (either implicitly or explicitly), while times t2, t4, t6, etc., correspond to updates that caused that representation to disappear again.

■ Likewise, a proposition q might have a current valid time that consists of several discrete intervals. Here, however, the intervals concerned are simply values that currently appear in the database (or are implied by such values). To repeat an earlier example, suppose relvar S_DURING currently contains just the following two tuples for supplier S2:

image


Then the current valid time for the proposition “Supplier S2 was under contract” is the set of intervals {[d02:d04], [d06:d09]}.

■ As our examples and definitions have clearly indicated—note in particular the fact that the definitions are asymmetric—if p is a proposition to which the concept of transaction time applies and q is a proposition to which the concept of valid time applies, then p and q are typically not the same proposition. Certainly it seems to be hard to find a nontrivial proposition that has both a nontrivial associated transaction time and a nontrivial associated valid time.

■ You might be surprised to learn that, in some respects, valid times are less interesting than transaction times. Why so? Well, we’ve seen that valid times are represented by means of attributes in the usual relational way. They can thus certainly be queried. Moreover, when such attributes appear in relvars—as opposed to relations—they can of course be updated as well as queried (loosely speaking); thus, valid times can be changed to reflect changing beliefs. In principle, therefore, valid time attributes aren’t significantly different from attributes of any other kind, at least insofar as their ability to participate in queries and updates (and constraints) is concerned.
What about transaction times? Well, if transaction times really are based on log entry timestamps as suggested above, then it should be clear that, unlike valid times, they’re maintained by the system, not by users. Indeed, we pointed out earlier that transaction times are nonupdatable, meaning that (of course) they certainly can’t be updated by users. However, users do need to be able to query them; for example, we might want to ask—perhaps for audit purposes—“When if ever did the database say that supplier S2 was under contract on day 4?” We might want to reference them in constraints, too.
At least two problems arise immediately from this requirement. First, of course, the system doesn’t really maintain the log in the form we’ve been describing (i.e., as a timestamped sequence of database values); as a consequence, it’s highly unlikely that the system will allow us to formulate queries against the database value at some arbitrary past time t directly. Second, even if we could formulate such a query, the corresponding timestamp t isn’t represented as part of that database value as such but is, rather, a kind of tag that’s associated with that value; as a consequence, we still couldn’t formulate a query—at least, not a relational query—that references that timestamp directly.
For such a query even to be expressible, therefore, the relevant timestamps, along with the information they timestamp, (a) will have to be made available in standard relational form and (b) will have to be made part of the current database value. (They must be made part of the current database value specifically, because, in a sense, the current database value is the only one that “really exists”; certainly it’s the only one that can be directly queried.)11 The section “Logged Times,” later, offers some suggestions as to how these requirements (a) and (b) might be satisfied in practice.

■ It follows from all of the above that—despite what we said in this connection earlier—the imaginary log we’ve been talking about is best thought of not as an abstraction of the usual recovery log, but rather as an audit trail. After all, its purpose is essentially to serve as a history of update events and to allow questions to be asked later about what was done to the database when, and such questions are certainly questions of an auditing nature and have little to do with recovery.

■ We’ve been talking as if updates take effect on the database at the instant at which they’re executed. But that’s not what happens in practice; in practice, updates are effectively applied only if and when the relevant transaction reaches a successful termination (commit point). It follows that the transaction time instant—i.e., the time point—corresponding to a particular update isn’t exactly the time of the update operation per se but is, logically, the time of the corresponding COMMIT operation (suggesting among other things that several distinct updates might all be associated with the same transaction time instant). This state of affairs has a variety of implications and ramifications—some of which aren’t immediately obvious—but the details are beyond the scope of this chapter. See Exercise 19.9 in Chapter 19 (and the answer to that exercise) for further discussion.

Terminology

We’ve now discussed the valid time and transaction time concepts at some length. As indicated in Chapter 4, however, we don’t much care for the conventional terminology in this area—as we said in that chapter, the meanings of the terms can hardly be said to “leap off the page,” as it were—and we’d like to try to come up with some better terms if possible. Such is the purpose of the present section.

First, regarding valid time, we have to say we find little need to refer to the concept at all, precisely because we regard valid times as essentially just regular data. Indeed, as you must surely have noticed, we hardly used the term at all in Chapters 5-16, even though the concept underpinned almost all of the discussions in those chapters.12 These facts notwithstanding, we’d still like to find a term that captures, better than “valid time” does, the essence of what’s really going on, in order that we might have a good term to use on those rare occasions when we do need to refer to the concept explicitly. As for transaction time, here we definitely do need to refer to the concept by name from time to time—pun intended—and so we’d definitely like to find a better term.13

It’s not worth discussing here all of the various alternatives we considered and subsequently discarded. Suffice it to say that, as the title of this chapter suggests, we finally settled on the terms logged time and stated time (for transaction time and valid time, respectively). Thus, to repeat the definitions:

Definition: The logged time for a proposition p is the set of times t such that, according to what the database stated at time t, p was true.

Definition: The stated time—sometimes currently stated time, for emphasis—for a proposition q is the set of times t such that, according to what the database currently states (which is to say, according to our current beliefs), q is, was, or will be true at time t.

However, it’s only fair to warn you that our preferred terms aren’t used much (or at all!) in the temporal database literature in general. Indeed, by way of a kind of postscript to everything we’ve been saying in this chapter so far, we briefly consider some definitions from the literature of the concepts we’ve been discussing. The definitions in question are taken from a document titled, in part, “The Consensus Glossary of Temporal Database Concepts” [62]:

■ “The valid time of a fact is the time when the fact is true in the modeled reality.”

■ “A database fact is stored in a database at some point in time, and after it is stored, it is current until it is logically deleted. The transaction time of a database fact is the time when the fact is current in the database and may be retrieved.”

Here’s our own gloss on these glossary definitions. First, the term fact, which is mentioned several times, doesn’t itself appear in the glossary; in ordinary English, however, facts are usually taken to be true by definition. Indeed, a “false fact” is a contradiction in terms, and so, in the definition of valid time, “when the fact is a fact” might perhaps be more apt than “when the fact is true.” (But of course the fact is a fact is a tautology.) Also, it’s not clear whether any distinction is intended—and if so, what the distinction might be—between the term fact, unqualified (mentioned in the definition of valid time), and the term database fact (mentioned in the definition of transaction time).

Second, the term the modeled reality (also not in the glossary) presumably means what in such contexts is more usually referred to as “the real world.”14 “Valid time” thus apparently means the time when the “fact” was actually true in the real world, not, as we claimed earlier, the time when it was true according to our current beliefs. But if this interpretation is correct, it’s not clear why the term is being defined at all. (We can operate only in terms of what we believe to be true, not in terms of any kind of “absolute” truth, and the definition seems to be saying that valid time does have something to do with some kind of absolute truth.)

Third, the term current (in the transaction time definition) seems to be used in a very strange way. Surely it would be more usual to say that something is “current in the database” if and only if that something was part of the current database value; yet this interpretation is certainly not—it can’t be—what’s intended in the transaction time definition.

Fourth, reference [62] elsewhere says this:

■ “[The] transaction-time lifespan refers to the time when the database object is current in the database.”

Presumably the term database object here is meant as a synonym for what was previously called a database fact (?). But what about transaction time and transaction-time lifespan? Are these terms also synonyms of each other? If they are, then why there are two different terms? And if they aren’t, what’s the difference between them?

All in all, it seems to us that the definitions in reference [62] simply serve to reinforce our earlier contention that the notions of valid time and transaction time need very careful explanation, as well as (preferably) more carefully chosen names. And, of course, the provision of such an explanation and such names has been our aim in this chapter, up to this point. In particular, we feel bound to say that, in our opinion, it’s very hard to come up with such an explanation and such names without first facing up to the idea that a database—more specifically, a database value—is essentially a collection of true propositions, instead of trying to rely on undefined and fuzzy concepts like “database facts” or “database objects.”

Logged Times

Now, we pointed out earlier that even though users can’t update logged times, they certainly must be able to query them. We also pointed out that this requirement implies that:

a. Logged times must be made available (along with the data they refer to) in standard relational form, and

b. They must be made available as part of the current database value.

In this section, we describe one possible way of achieving these objectives in a user friendly fashion.

Essentially, what we propose is that if R is a database relvar, then the definition of R should be allowed to contain a request for the automatic provision and maintenance of an auxiliary relvar that gives the logged time history for that relvar R. For example (note the highlighted text):

VAR S_DURING BASE RELATION
 { SNO SNO , DURING INTERVAL_DATE }
 USING ( DURING ) : KEY { SNO , DURING }
 LOGGED_TIMES_IN ( S_DURING_LOG ) ;

The effect of the LOGGED_TIMES_IN specification is to cause the system to provide a relvar called S_DURING_LOG (the “logged time relvar” for S_DURING) with attributes SNO, DURING, and—let’s agree—X_DURING, and with tuples that together represent the logged times for all of the tuples that have ever appeared, explicitly or implicitly, in S_DURING. (We’ll explain that “explicitly or implicitly” a little later in this section.) For example, suppose today is day 75. Then Fig. 17.1 shows a possible value for S_DURING, together with a possible corresponding value for S_DURING_LOG. Note: We’ve numbered the tuples in the logged time relvar S_DURING_LOG for purposes of subsequent explanation.

image
Fig. 17.1 Relvars S_DURING and S_DURING_LOG–sample values

Explanation:

■ First of all, S_DURING currently says among other things that supplier S2 was under contract from day 2 to day 4. Tuple 1 of S_DURING_LOG tells us that S_DURING also said the same thing at an earlier time, from day 4 to day 7. Tuple 2 of S_DURING_LOG tells us that relvar S_DURING said the same thing again at another earlier time, from day 10 to day 20.

Aside: In a more realistic example, of course, the granularity for X_DURING intervals would almost certainly be, not days, but (for example) microseconds; moreover, no X_DURING interval would ever have an end value exactly equal to “the time right now.” (The most it could ever be is something just slightly earlier than “the time right now.”) We choose to ignore such details here in the interest of simplicity. End of aside.

■ Tuple 3 of S_DURING_LOG tells us that S_DURING has been saying that same thing—i.e., that supplier S2 was under contract from day 2 to day 4—from day 50 to the present day (i.e., day 75). Now, in Chapter 12, in the section “The Moving Point Now,” we argued that representing the present day in the database by an actual value like d75 was a bad idea.15 In particular, we pointed out that it implied that all of those d75’s would somehow have to be replaced by d76’s on the stroke of midnight on day 75. However, those arguments don’t apply in the present situation, because:

a. Relvar S_DURING_LOG doesn’t actually need to exist at all times—it’s sufficient for the system to materialize it when it’s referenced in some query (and even then it surely shouldn’t be necessary to materialize it in its entirety). In other words, a helpful way to think of S_DURING_LOG is as a view—but it’s not a view that’s defined in terms of other relvars in the database; rather, it’s a view that’s defined in terms of the log. Furthermore, of course, the definition of that view in terms of the log is provided by the system, not by some human user.

b. Relvar S_DURING_LOG is updatable by the system but not by ordinary users. Thus, the process of replacing all of those d75’s by d76’s on the stroke of midnight on day 75 is carried out by the system, not by some user (and, of course, that process is probably never carried out physically at all).

c. In any case, the proposition “From day 50 to day 75, the database said that supplier S2 was under contract from day 2 to day 4” is true! Certainly we mustn’t use the artificial “end of time” trick in connection with logged times that we do sometimes use in connection with stated times, because (e.g.) the proposition “From day 50 to the end of time, the database said that supplier S2 was under contract from day 2 to day 4” is false. (Recall once again that logged times can never refer to the future.)

■ S_DURING also currently says that supplier S6 was under contract from day 3 to day 5. By contrast, tuple 4 of S_DURING_LOG tells us that S_DURING said previously (from day 15 to day 25) that supplier S6 was under contract from day 2 to day 5; however, tuple 5 of S_DURING_LOG tells us that ever since day 26, relvar S_DURING has been saying that supplier S6 was under contract from day 3 to day 5.

■ Finally, S_DURING currently has nothing to say about supplier S1 at all. However, tuple 6 of S_DURING_LOG tells us that S_DURING did say from day 20 to day 30 that supplier S1 was under contract, on day 1 only; likewise, tuple 7 tells us that S_DURING also said from day 40 to day 50 that supplier S1 was under contract from day 5 to day 6. (We deduce that all information regarding supplier S1 was deleted from relvar S_DURING on day 31 and again on day 51, presumably because it was discovered to be incorrect.)

Let’s summarize what we’ve learned so far. First, let R be a regular relvar, and let R' be the associated logged time relvar. Let R have interval attributes A1, A2, …, An (only), possibly with other attributes (not interval valued) as well. Then:

■ The heading of R' consists of the heading of R extended with an interval attribute called X_DURING.

■ For every tuple t that has ever appeared in the fully unpacked form of R—see the note immediately following this paragraph for an explanation of what we mean by “fully unpacked” in this context—R' effectively contains m distinct tuples for some m > 0. Each such tuple consists of tuple t extended with an X_DURING value, such that the transaction time for t (or, rather, for the proposition represented by t) is precisely the set containing just those m X_DURING values. R' is kept packed on (A1,A2,…,An,X_DURING). By the way, observe that the packing of R' must be done on attribute X_DURING last, as indicated.

Note: The “fully unpacked form of R” referred to in the foregoing paragraph is the result of UNPACK R ON (A1,A2,…,An). When we said earlier that R’ gives the logged times for all of the tuples that have ever appeared explicitly or implicitly in R, what we were referring to was the set of all tuples t such that t has appeared at some time in that fully unpacked form.

■ Finally, if the constraint USING (A1,A2,…,An) : KEY {K} holds in R, then the constraint USING (A1,A2,…,An,X_DURING) : KEY {K,X_DURING} holds in R'.
As a basis for a second example, we turn to our usual relvar S_SINCE:
VAR S_SINCE BASE RELATION
 { SNO SNO , SNO_SINCE DATE … , STATUS INTEGER , STATUS_SINCE DATE … }
 KEY { SNO }
 LOGGED_TIMES_IN ( S_SINCE_LOG ) ;

Fig. 17.2 shows some sample values. We leave a detailed examination of that figure to you, except that:

■ First, we remind you that we’re assuming that today is day 75.

■ Second, we draw your attention to the S_SINCE_LOG tuple for supplier S2 in which the SNO_SINCE and S_STATUS_SINCE values are both d07, while the X_DURING value is [d06:d75]. What does this combination of values imply? (Answer: On day 6, a tuple was inserted into relvar S_SINCE to say supplier S2 would be placed under contract on day 7—a date in the future, at the time of the INSERT.)

image
Fig. 17.2 Relvars S_SINCE and S_SINCE_LOG–sample values

Queries Involving Logged Time

In this final section of the chapter, we consider a few sample queries involving the logged time relvars S_SINCE_LOG and S_DURING_LOG.

Query X1: When if ever did the database say that supplier S6 was under contract on day 4?

Assume for the sake of this first example that we know the supplier we’re interested in, supplier S6, isn’t currently under contract. Thus, if that supplier was ever under contract at all, it was definitely in the past; in particular, the information that the supplier was under contract on day 4, if currently thought ever to have been true, will appear in S_DURING. Of course, it won’t still appear in S_DURING if it was previously thought to be true but was subsequently discovered not to be, because in that case the pertinent tuple(s) will have been deleted. Either way, however, the pertinent logged time information will certainly be present in S_DURING_LOG. So let’s take a look at the sample value for S_DURING_LOG in Fig. 17.1. Here’s what we get if we restrict that sample value to just the tuples showing supplier S6 as being under contract on day 4:

image

So here’s a suitable formulation of the query:

WITH ( t1 := S_DURING_LOG WHERE SNO = SNO ( 'S6' ) AND d04 ∈ DURING ) : USING ( X_DURING ) : t1 { X_DURING }

And here’s the result:

image

Query X2: When if ever did the database say that supplier S2 was under contract on day 4?

The query has the same form as Query X1—the only difference is in the supplier number. But this time let’s assume we don’t know whether the supplier we’re interested in is currently under contract;16 thus, the information we want (or some of it, at any rate) might be in S_SINCE_LOG. Let’s assume also that we don’t know whether that supplier ever had a previous contract, so the information we want (or, again, some of it) might be in S_DURING_LOG. Given the relation shown as a sample value for S_SINCE_LOG in Fig. 17.2, here’s what we get if we restrict that relation to just the tuples showing supplier S2 as being under contract on day 4:

image

And given the relation shown as a sample value for S_DURING_LOG in Fig. 17.1, here’s what we get if we restrict that relation to just the tuples showing supplier S2 as being under contract on day 4:

image

So here’s a possible formulation of the query:

WITH ( t1 := S_SINCE_LOG WHERE SNO = SNO ( 'S2' ) AND d04 ≥ SNO_SINCE ,
  t2 := t1 { X_DURING } ,
  t3 := S_DURING_LOG WHERE SNO = SNO ( 'S2' ) AND d04 ∈ DURING ,
  t4 := t3 { X_DURING } ) :
USING ( X_DURING ) : t2 UNION t4

And if the sample values for S_SINCE_LOG and S_DURING_LOG are the actual values,17 then here’s the result:

image

Query X3: On day 8, what did the database say was supplier S2’s term of contract?

Note that this query treats logged time as the known value (namely, day 8) and stated time as the unknown and is thus the inverse of the previous examples, in a sense. Here’s a suitable formulation:

WITH ( t1 := S_SINCE_LOG WHERE SNO = SNO ( 'S2' ) AND d08 ∈ X_DURING,
   t2 := EXTEND t1 : { DURING := INTERVAL_DATE ( [ SNO_SINCE : d99 ] ) } ,
   t3 := t2 { DURING } ,
   t4 := S_DURING_LOG WHERE SNO = SNO ( 'S2' ) AND d08 ∈ X_DURING ,
   t5 := t4 { DURING } ) :
USING ( DURING ) : t3 UNION t5

Exercises

17.1 Consider Fig. 17.2 again. Explain in your own words how to interpret that figure.

17.2 Are the sample values in Figs. 17.1 and 17.2 mutually consistent?

17.3 Instead of having just one logged time relvar for S_SINCE as in Fig. 17.2, might it have made more sense to have separate logged time relvars for (a) the projection of S_SINCE on SNO and SNO_SINCE and (b) the projection of S_SINCE on SNO, STATUS, and STATUS_SINCE? If so, why?

17.4 Following on from Exercise 17.3, might it make sense to allow the user to request automatic provision of a logged time relvar for an arbitrary projection (in particular, one not including a key) of an arbitrary relvar R (in particular, one that includes no interval attributes)? If so, why? What syntactic extensions might be needed to the proposals described in this chapter in order to permit such requests to be specified?

17.5 Would there be any point in requesting a logged time relvar for the nullary projection of some relvar R (i.e., the projection of R on no attributes at all)? If so, why? Again, what syntactic extensions (if any) might be needed in order to permit such requests to be specified?

Answers

17.1 Relvar S_SINCE in Fig. 17.2 represents current beliefs concerning suppliers under contract right now (or possibly in the future): when their contracts started, what their current status values are, and when those current status values were assigned. The predicate for S_SINCE is:

Supplier SNO has been under contract ever since day SNO_SINCE (and not on the day immediately before that day) and has had status STATUS ever since day STATUS_SINCE (and not on the day immediately before that day).


Relvar S_SINCE_LOG in Fig. 17.2 is a record of the contents of S_SINCE from the time of creation of this latter relvar up to the time right now (i.e., up to the present day, since we’re assuming, rather unrealistically, a granularity of one day).18 It contains just enough information to enable the value of S_SINCE on any particular day in that interval to be reconstructed. The predicate for S_SINCE_LOG is:

X_DURING denotes a maximal interval throughout which relvar S_SINCE contained a tuple stating that supplier SNO had been under contract ever since day SNO_SINCE and had had status STATUS ever since day STATUS_SINCE.19


See the answer to Exercise 17.3 below for further discussion.

17.2 The sample values in Figs. 17.1 and 17.2 aren’t mutually consistent. A detailed explanation follows.
Consider Fig. 17.2. Using the names “S_SINCE” and “S_SINCE_LOG” to denote the relations shown as values of the relvars with those names in that figure, we now show that at least those two relations are consistent with each other. First, let r be the result of S_SINCE_LOG WHERE END (X_DURING) = d75 and let s be the result of S_SINCE JOIN r. Then the result of s {ALL BUT X_DURING} is equal to both S_SINCE and r {ALL BUT X_DURING}, indicating that:

a. Every tuple in S_SINCE is equal to the projection of some tuple of r on all but X_DURING.20 Thus, wherever S_SINCE shows some information as being current, that information is indeed shown in S_SINCE_LOG as being currently recorded in S_SINCE.

b. The projection of every tuple in r on all but X_DURING is equal to some tuple of S_SINCE. Thus, wherever S_SINCE_LOG shows information as being currently recorded in S_SINCE, that information is indeed currently recorded in S_SINCE.


The only tuples not yet considered are those in result of S_SINCE_LOG MINUS r. As these tuples represent information that was recorded in S_SINCE on days in the past, they can’t possibly fail to be consistent with the other tuples shown in S_SINCE in Fig. 17.2.
The same argument, mutandis mutatis, can be used to show that the sample values shown for S_DURING and S_DURING_LOG in Fig. 17.1 are also mutually consistent.
Further, the values of S_DURING and S_SINCE shown in Figs. 17.1 and 17.2, respectively, are also mutually consistent. The S_DURING value shows supplier S2 as having been under contract on days 2, 3, and 4 but not on days 1 or 5, which is consistent with that supplier having been under contract ever since day 7 but not on day 6, as shown in S_SINCE. Similarly, that same S_DURING value shows supplier S6 as having been under contract on days 3, 4, and 5 but not on days 2 or 6, which is consistent with that supplier not currently being under contract, as shown by omission by S_SINCE. (Note the appeal to The Closed World Assumption here.)
So far, so good. However, it’s easy to see that the value of S_DURING_LOG in Fig. 17.1 isn’t consistent with the value of S_SINCE_LOG in Fig. 17.2. For example, consider tuple 7 in S_DURING_LOG. It shows among other things that on day 40 S_DURING stated that supplier S1 had been under contract on days 5 and 6 but not on days 4 or 7. But “the second tuple” in S_SINCE_LOG (if you’ll pardon the phrase) has it that on day 40 S_SINCE stated that supplier S1 had been under contract ever since day 4, contradicting the information that S1 was not under contract on days 4 or 7. In other words, if we’re to believe the information currently shown in S_SINCE_LOG and S_DURING_LOG, then the database must have been in an inconsistent state on day 40.

17.3 Yes, it probably does make better sense. Take a look at the predicate for that relvar (i.e., relvar S_SINCE_LOG) as stated in the answer to Exercise 17.1. It should be clear from that predicate that the timestamp X_DURING “timestamps too much,” and hence that S_SINCE_LOG isn’t in 6NF, and hence that it probably ought to be decomposed into 6NF projections. For example, if we restrict the sample value shown for the relvar in Fig. 17.2 to just the tuples for supplier S1, we get this result—

image


—which tells us twice that the database once said supplier S1 was under contract on day 4. In other words, relvar S_SINCE_LOG suffers from redundancy.21
Further, those same two tuples for supplier S1 mean the relvar also suffers from circumlocution. To be specific, the relvar is taking two tuples to say what could have been said in one: namely, that supplier S1 was under contract from day 4 to day 75. (By the way, both of these problems (i.e., redundancy and circumlocution) mean that relvar S_SINCE_LOG is in violation of its own predicate!)
Now, it might be argued that because it’s the system that takes all responsibility for the maintenance of S_SINCE_LOG, any “update anomalies”—see reference [43]—arising from the failure to normalize to 6NF are of no real consequence. (We’d certainly need to tidy up the predicate, though.) But failure to decompose to 6NF, while it does make certain queries (basically those requiring joins) easier, doesn’t make all the difficult ones easier, and it does make what should be easy ones harder (see the section “Queries Involving Logged Time” in the body of the chapter).

17.4 First, it clearly does make sense to let the user request a logged time relvar for a relvar without any interval attributes—S_SINCE_LOG (for relvar S_SINCE) is a case in point. For an example of a logged time relvar for an arbitrary projection of such a relvar, consider one for the projection of S_SINCE on STATUS (let’s call it STATUS_LOG). STATUS_LOG would show, for each status value that has ever been assigned to any supplier, the continuous intervals during which it was stated by the database as being assigned to at least one supplier. But that information can easily be derived from S_SINCE_LOG, anyway; thus, it would make sense to ask for automatic provision of STATUS_LOG only if not all of the more detailed information provided by S_SINCE_LOG is required.
As for a syntactic extension, here are some pertinent observations. The LOGGED_TIMES_IN syntax proposed in the body of the chapter appears as part of the definition of some relvar R, and the corresponding logged time relvar is thus implicitly associated with R as such. Now, if we want it to be associated with some projection of R—in particular, a projection that doesn’t retain some key of R—then we might as well let it be associated with some arbitrary relational expression.22 But then a drastic alteration to the syntax proposed in the chapter would be needed. To be specific, instead of a LOGGED_TIMES_IN option on a relvar definition, what would be needed would be a separate definition for the logged time relvar as such, perhaps as in this example:

VAR SUPPLIER_EXISTENCE_LOG BASE RELATION

 { X_DURING INTERVAL_DATE }

 USING ( X_DURING ) : KEY { X_DURING }

 LOGGED_TIMES_FOR ( S_SINCE { } UNION S_DURING { } ) ;


The intent here is to define a logged time relvar showing intervals during which the database has acknowledged the existence at some time of at least one supplier under contract (every X_DURING value in SUPPLIER_EXISTENCE_LOG represents an interval throughout which it was the case that relvars S_SINCE and S_DURING weren’t both empty). In general, the attributes of such a relvar will be the single specified one (X_DURING in the example), which must be of some temporal interval type, together with those of the result of the LOGGED_TIMES_FOR expression.
Note that such a logged time relvar (R’, say) would permit the result of the LOGGED_TIMES_FOR expression as it would have been on any given day d to be obtained by evaluating an expression of the following form:

( R ’ WHERE d ∈ X_DURING ) { ALL BUT X_DURING }

17.5 The answer to Exercise 17.4 includes an example of a logged time relvar for the union of two nullary projections.


1At least, if we do say such a thing, we can’t possibly be certain that what we’re saying is true.

2As the physicist John Bell once observed: “We have no access to the past. We have only our memories and records.” (Quoted in The End of Time by Julian Barbour, Cambridge University Press, 1999.)

3In fact it makes sense to extend that notion to apply to any relation that can be derived from the relvars in the database. See the answer to Exercise 17.4 at the end of the chapter.

4The following might help here. Recall the nontemporal suppliers relation depicted in Fig. 1.1 in Chapter 1. That relation contains among other things the tuple (S1,Smith,20,London). Therefore, the projection of that relation on all but CITY contains among other things the tuple (S1,Smith,20). But the proposition corresponding to this latter tuple isn’t “Supplier S1 is named Smith and has status 20”; rather, it’s “Supplier S1 is named Smith and has status 20 and is located somewhere.” The situation in this example is precisely analogous to the situation under discussion in the main text here.

5Alternatively, we could say proposition z is independent of time.

6Don’t be misled here: The fact that a given proposition has a valid time of “always” doesn’t mean the proposition in question is, was, and always will be true—it means only that we believe right now that the proposition in question is, was, and always will be true. In other words, it’s not impossible that some future change in the state of affairs could cause us to modify that belief. For example, consider the proposition “The angles of a triangle add up to 180 degrees.” This proposition might well be represented in the database at some time with a valid time of “always”. But suppose we then discover that triangles laid out on the surface of the Earth, instead of in a plane, have angles that add up to more than 180 degrees. Then our beliefs have changed, and something must therefore be done to the database to reflect that fact.

7Note that t can’t be less than t2 (why?).

8A much more detailed discussion of these ideas and some of the issues they raise can be found in reference [44].

9Compare the discussion in the section “The Underlying Intuition” in Chapter 11. Note, however, that the sequences discussed in that chapter were timestamped with valid times, whereas the sequence we’re discussing here is timestamped with transaction times.

10In other words,“the database is not the database—the log is the database, and the database is just an optimized access path to the most recent version of the log.” The thought expressed by this aphorism is a piece of folklore that has been circulating in the database community for a long time. It appears to have its origin in a paper by Schueler [100].

11As already pointed out, in effect, in footnote 2.

12We speculate that it’s precisely because the approach to temporal data espoused in so much of the literature—even to some extent in the SQL standard—is, at best, less than fully relational, that the term valid time (or some equivalent term) occurs in that literature so ubiquitously.

13We’re not alone in finding the terms transaction time and valid time unsatisfactory, incidentally. For example, the SQL standard uses the perhaps slightly more reasonable terms system time and application time, respectively. And IBM’s DB2 product agrees with the standard with regard to system time, but uses business time in place of application time.

14Or does it perhaps mean not, as normal English usage would have it, “the reality that’s being modeled” but, rather, “the modeled version of reality”?

15That is, it’s a bad idea if that value doesn’t actually denote the present day but is meant to be interpreted as until further notice.

16Considerations of this kind tend to suggest that the COMBINED_IN syntax, mentioned briefly in Chapter 15 in connection with regular relvars, could be useful in connection with logged time relvars too.

17Actually they can’t be (see Exercise 17.2).

18Note that S_SINCE_LOG presumably continues to exist even if S_SINCE is dropped—a fact that suggests that it might be intuitively preferable not to bundle together the creation of the two relvars, as our proposed LOGGED_TIMES_IN syntax does, but rather to create them separately. For further discussion of this possibility, see the answer to Exercise 17.4.

19Two points here: First, we’re relying on our usual understanding of ever since as meaning ever since and not immediately before; second, we ought really to say X_DURING denotes a maximal interval of days (or, more realistically, of microseconds, or nanoseconds, or whatever the applicable granularity happens to be).

20We’re appealing here to the fact that although (by definition) projection is an operator that applies to relations, it obviously makes sense to talk in terms of projections of individual tuples; in other words, we can “overload” the relational projection operator and define a version that works for tuples as well. Analogous remarks apply to certain other relational operators also. See, e.g., reference [45] for further discussion.

21Of course, the root problem here is that relvar S_SINCE itself isn’t in 6NF. If we were to decompose it into projections (one on SNO and SNO_SINCE and the other on SNO, STATUS, and STATUS_SINCE), each with its own logged time relvar, then the problem under discussion would go away (despite the fact that the second of these projections would still not be in 6NF).

22Indeed, such a feature could be extremely useful. Consider the (fairly nontrivial!) query “When if ever did the database say some supplier could supply all of the parts supplied?” The answer to this query is precisely the logged time for rx, where rx is a relational expression denoting suppliers able to supply all parts supplied.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.12.107.220