Appendix B

Is Ordinality Necessary?

This appendix considers the possibility of dropping the so called discreteness assumption (the assumption, that is, that the timeline consists of a finite number of discrete time points); in other words, it investigates the idea of defining intervals over a point type that’s ordered but not ordinal, and using such intervals in practical applications. Allen’s operators are reexamined in this light and are found still to be applicable, though their definitions need to be refined somewhat. However, UNPACK doesn’t apply, and there are serious problems in even representing such intervals (none of the usual styles—closed:closed, etc.—turns out to be completely satisfactory). The conclusion is that there’s nothing to lose, and much to gain, by staying with the usual discreteness assumption.

Keywords: ordinality; discreteness assumption; UNPACK; Allen’s operators

The ordinality of a table relationship is a message the designer uses to identify two facts about the relationship.

www.cs.kent.edu/~wfan/link/dbapre/dbatest/54903f.htm

Throughout the bulk of this book, we’ve assumed that the point types over which intervals are defined must be ordinal types specifically. In other words, we’ve assumed that any such point type—call it T—must be “isomorphic to type INTEGER,” meaning that:

1. A total ordering exists—i.e., the operator “>” is defined for every pair of values of type T.

2. A first and a last value of type T, denoted FIRST_T ( ) and LAST_T ( ), respectively, exist with respect to that ordering.

3. A successor function, denoted NEXT_T (p) and returning the (unique) immediate successor of p with respect to that ordering, is defined for every value p of type T except for p = LAST_T ( ). Moreover—we didn’t bother to spell these assumptions out before, but now we feel it desirable to do so—(a) if p1p2 then NEXT_T (p1) ≠ NEXT_T (p2), and (b) there’s exactly one value of type T, viz., FIRST_T ( ), that’s not the successor of any p.

Properties 2 and 3 here are what distinguish an ordinal type from one that’s merely ordered. Let’s agree to refer to those two properties, taken together, as ordinality. Now, if ordinality applies to type T, then it’s clear that every interval of type INTERVAL_T consists of a finite sequence of discrete points. But what would happen if ordinality failed to hold? Would it be possible to build an approach to intervals based on a point type for which, say, LAST_T ( ) is undefined? Alternatively, would it be possible to build such an approach based on a point type for which NEXT_T (p) is undefined? Note that questions like these do seem worth exploring, because many languages do support types that are ordered but not ordinal.1 For example, consider the Tutorial D types RATIONAL and CHAR. Both of these types are ordered, and yet Tutorial D has no explicit support for LAST_RATIONAL ( ), NEXT_RATIONAL (p), LAST_CHAR ( ), or NEXT_CHAR (p). As a consequence, a Tutorial D implementation is free not to provide such operators, and—according to the proposals discussed in the body of the book—types RATIONAL and CHAR would then not be usable as point types, a state of affairs that has certainly been criticized by some.

So let’s see, first, what happens if we drop property 2 but keep property 3. By way of illustration, consider what happens if type T is INTEGER. For that type, it would be quite usual for a language to support NEXT_T (p)—if only in the form “p+1”—and yet not support either FIRST_T ( ) or LAST_T ( ).2 But let’s consider the general case:

■ With regard to FIRST_T ( ), we observe that nothing in Parts II and III of this book depended on support for that operator (apart from that operator FIRST_T ( ) itself, of course).

■ With regard to LAST_T ( ), by contrast, we did effectively make use of that operator in our “during relvars only” designs, and we also used it in certain relational expressions (i.e., in certain queries and view definitions)—that is, we frequently made use of intervals of the form [d:LAST_T ( )], to be interpreted as “from day d until further notice.” Note: Here, of course, we’re assuming that T is a temporal type specifically.

Thus, dropping LAST_T ( ) would appear to militate strongly in favor of our horizontal decomposition approach to temporal database design (so, good!). However, certain “temporal” queries, and in particular our proposed COMBINED_IN views, won’t work any longer, and some other solution will have to be found to the issues those queries and views address. But these are hardly insuperable problems.

We turn now to the possibility of dropping property 3 but keeping property 2. Here there’s rather more to say (and rather more to be investigated), inasmuch as we did propose several operators in Parts II and III of this book that depended on—or at least were defined in terms of—that successor function NEXT_T (p) that’s now no longer available.

Aside: Dropping property 3—that is, dropping the assumption that a successor function exists—isn’t usually referred to in the literature in such terms; instead, it’s usually described as “adopting the continuity assumption,” on the grounds that a point type for which no successor function exists behaves like the real number line (which is certainly continuous), and hence that intervals defined on such a point type behave like sections of that line. Note that time in particular certainly feels as if it were continuous in this sense;3 so adopting the continuity assumption, if it were possible, might well be more intuitively attractive, at least in this particular context. We claim, however, that continuity as such isn’t really the issue; in fact, it seems to us a complete red herring. To see why, it’s sufficient to consider just the rational numbers,4 which resemble the reals in that they’re what the mathematicians call “everywhere dense,” but differ from the reals in that they don’t form a continuum. In other words, the rationals have no successor function—if p is a rational number, there’s no rational number p’ that can be considered the immediate successor of p—even though, to repeat, they don’t form a continuum as such. Note too that any interval defined over the rational numbers will in fact contain an infinite number of points (except for the special case of a unit interval, of course).

Now, we’re not suggesting here that a point type consisting of all rational numbers might, or even could, actually be supported. However, you might find it helpful to interpret what follows in terms of a hypothetical “rational numbers” point type (containing, therefore, an infinite number of point values), and hypothetical intervals defined over that hypothetical point type. End of aside.

Before we go any further, we should make it clear that we’re certainly not alone in assuming the existence of a successor function—essentially all of the temporal database literature does the same thing. Nor have we seen a concrete language proposal based on the idea that such an assumption be dropped. We therefore confine ourselves in the remainder of this appendix to what we hope are a few pertinent remarks.

■ With regard to points: As already indicated, the comparison operators (“=”, “>”, etc.) all still work, of course, but NEXT_T (p) and PRIOR_T (p) make no sense. By contrast, FIRST_T ( ) and LAST_T ( ) do make sense—in temporal terms, they return “the beginning of time” and “the end of time,” respectively.

■ With regard to intervals: COUNT clearly no longer applies. By contrast, Allen’s operators do apply, but any definition of such an operator that previously depended on the existence of a successor function will obviously have to be revised. Consider MEETS, for example. We can never validly say that, e.g., intervals i1 = [b1:e1] and i2 = [b2:e2]—both expressed using closed:closed notation, observe—meet, because we can never validly say either that b2 is the immediate successor of e1 or that b1 is the immediate successor of e2. A similar though somewhat more complicated remark applies if both intervals are expressed using open:open notation. By contrast, if i1 uses closed notation at its end and i2 uses open notation at its beginning, or the other way around, then it does make sense to say of i1 and i2 either that they meet or that they don’t. To be specific (and for the record), two intervals meet if and only if they take one of the following forms:

[b:p] and (p:e] [b:p) and [p:e]

[b:p] and (p:e) [b:p) and [p:e)

(b:p] and (p:e] (b:p) and [p:e]

(b:p] and (p:e) (b:p) and [p:e)

We now observe that the foregoing discussion of MEETS touches on a potentially serious anomaly. In Chapter 6, we said the four notations—closed:closed, closed:open, open:closed, open:open—constituted four different possible representations for intervals, implying among other things that those notations were effectively interchangeable. However, we did also point out, in a footnote, that the interval [b:e] can’t be represented in either open:closed or open:open notation if b is “the beginning of time,” nor can it be represented in either closed:open or open:open notation if e is “the end of time.” And we went on to say that the closed:closed notation is actually the only one that’s capable of representing all possible intervals. In other words, if we assume reliance on a successor function, then the closed:closed notation is really the only one that’s one hundred percent valid as a possible representation—a state of affairs that accounts in part for our favoring it throughout Parts II and III of this book, of course.

Now, given that Parts II and III of this book did assume support for a successor function, the fact that the other three notations were subject to the slight weaknesses just explained was perhaps not all that significant. But it’s not at all clear that the same remains true if that assumption is dropped. For suppose we do drop it. Then:

■ First of all, none of the intervals [b:e), (b:e], and (b:e) has an exact closed:closed equivalent. In other words, there are intervals—in fact, an infinite number of them, at least in principle—that can’t be represented in closed:closed notation.

■ At the same time, the interval [b:e] has no exact equivalent in any of the other three notations. Thus, there are also intervals—again, an infinite number of them, in principle—that can be represented only in closed:closed notation. Note: As a particularly important special case of the foregoing, intervals of the form [p:p]—that is, unit intervals—can be represented only in closed:closed notation.

■ In fact, the situation is even worse than just indicated. First, each of the four notations is capable of representing (in principle) an infinite number of intervals that can’t be represented at all in any of the other three. Second, any given interval is in fact capable of representation in just one of the four notations!

It follows that we can never validly say that an expression using any particular one of the four notations denotes the same interval as one using any of the other three. The full implications of this state of affairs are unclear, to say the least, but it does seem at any rate that none of the four can truly be said by itself to constitute a valid possible representation for intervals, absent a successor function. Thus, it looks as if the user is going to have to be extremely careful in choosing the appropriate notation to use in any particular situation, absent such a function; in fact, the user is probably going to be forced into adopting a kind of “mix and match” approach, instead of choosing just one notation and sticking with it as we did most of the time in Parts II and III of this book. What makes matters worse is that every interval will then to have to carry with it some indication as to the notation used to express it; moreover, the system will then have to remember, somehow, which particular notation was used on each particular occasion.

Aside: Alternatively, we could say—as SQL does,5 in effect, even though SQL does assume the existence of a successor function—that just one of the four notations is supported. In that case, the notation in question would have to be either closed:open or open:closed, if MEETS is to be supported. But then certain intervals wouldn’t be expressible at all; in particular, we’d lose the concept of unit intervals altogether!—since, as we’ve seen, unit intervals can be expressed only using closed:closed notation. End of aside.

Well, let’s soldier on … Let’s consider closed:open notation, just to be definite. Then the operators BEGIN (i) and POST (i) are available, but the operators PRE (i) and END (i) aren’t. Of course, if i is the interval [b:e), then the point e is given by POST (i); however, that point isn’t actually contained in i and is thus not truly the end point of that interval, at least not in the sense in which that term was defined in Chapter 6.

Next, regardless of which notation we consider, the UNPACK operator is no longer available. (Why not?) This loss appears to be a matter of some concern, at least at first sight. Although UNPACK is perhaps not all that useful in its own right, our “U_” operators are all defined in terms of it (and so is PACK, if the packing is done on more than one attribute); thus, we would need to revisit all of those operators to see if we could redefine them in such a way as to avoid having to rely on UNPACK. What’s more, we would then have to ensure that those redefined operators were all implementable.

Stop Press: Shortly before completion of the final draft of this book, we were made aware of some recent work by Erwin Smout [102]. In that reference, Smout shows that PACK and the U_ operators—or most of them, at any rate6—can indeed be defined without appealing to UNPACK. Now, we do believe our definition, using UNPACK, is easier to understand from a conceptual point of view; but given Smout’s definition, it does seem that some limited support for intervals over nonordinal types could be provided alongside support for intervals over ordinal types as described in the body of the book. See the annotation to reference [102] in Appendix F for further discussion.

The foregoing remarks notwithstanding, we certainly don’t want to discard reliance on a successor function altogether, for the following reasons among others:

■ First, we’ve seen that certain operators—e.g., COUNT (i)—that can be defined when a successor function exists can’t be defined when it doesn’t (other than as discussed under “One final point” below). Note in particular that not having to depend on some specific notation—closed:open, for example—does convey significant advantages.

■ Second, nobody has yet been able to show us any useful operators that can be defined if we discard the successor function and not if we don’t.

It seems, then, that there’s nothing to lose, and much to gain, by assuming the successor function exists.

One final point: It has been suggested that it might be possible to make operators that depend on a successor function work even if no such function is defined for the point type in question. The idea is that the desired scale—and thereby the desired successor function, in effect—could be specified explicitly when the operator in question is invoked. For example, let i be a temporal interval, and let the underlying temporal point type be ordered but not ordinal. Then we’ve seen that, e.g., the expression COUNT (i) makes no sense. However, if we define j to be the expression—

INTERVAL_DDATE ( [ CAST_TO_DDATE ( BEGIN ( i ) ) , CAST_TO_DDATE ( POST ( i ) ) ) )

—then the expression COUNT (j) clearly does make sense. Explanation: The CAST invocations each effectively convert their argument to a time point that’s accurate to the day. The overall interval selector invocation j thus effectively converts the interval i to one whose contained points are discrete and accurate to the day, and thereby effectively specifies the pertinent scale and successor function.

Our response to the foregoing is as follows: If the assumption that there’s no successor function means having to replace all corresponding intervals by ones for which such a function does apply every time we want to perform certain important operations on them, then there seems little point in adopting that assumption in the first place. Indeed, if we’re correct in saying that adopting that assumption brings with it no useful operators that are unavailable otherwise, then a language that makes that assumption would be isomorphic to one that doesn’t, and the entire argument reduces to a mere matter of syntax. However, we note that not having to specify the scale explicitly every time we want to perform any of the otherwise undefined operations would surely save a great deal of writing.


1As noted in Chapter 18, if by types here we mean types that are supported by some actual implementation of some actual programming language, then every such type is necessarily finite. So if such a type is ordered, it can (at least in principle) be made “isomorphic to type INTEGER,” in which case it will effectively be an ordinal type after all. We choose to overlook this point of detail here.

2SQL is a case in point; as noted in Chapter 19, SQL doesn’t really support FIRST_T ( ) or LAST_T ( ) as such—instead, users have to write out the corresponding literals.

3We realize not everyone agrees with this assertion.

4And here we really do mean rational numbers as defined in mathematics, not just values of Tutorial D’s RATIONAL type.

5In fact, as SQL must do (see Chapter 19).

6Unfortunately the jury is still out regarding U_restrict and U_extend, and when it will come back remains to be seen. Our discussion of these operators in Appendix E is relevant here.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.17.37.161