WHY DATABASE CONSTRAINT CHECKING MUST BE IMMEDIATE

I have at least five reasons for taking the position I do (viz., that database constraints must be satisfied at statement boundaries). The first and biggest one is this: As we know from Chapter 5, a database can be regarded as a collection of propositions, propositions we believe to be true ones. And if that collection is ever allowed to include any inconsistencies, then all bets are off; as I’ll show in the section CONSTRAINTS AND PREDICATES later, we can never trust the answers we get from an inconsistent database. And while it might be true, thanks to the isolation property, that no more than one transaction ever sees any particular inconsistency, the fact remains that that particular transaction does see the inconsistency and can therefore produce wrong answers.

Now, I think this first argument is strong enough to stand on its own, but for completeness I’ll give the other arguments as well. Second, then, I don’t agree that any given inconsistency can be seen by only one transaction, anyway; that is, I don’t really believe in the isolation property. Part of the problem here is that the word isolation doesn’t mean quite the same in the world of transactions as it does in ordinary English—in particular, it doesn’t mean that transactions can’t communicate with one another. For if transaction TX1 produces some result, in the database or elsewhere, that’s subsequently read by transaction TX2, then TX1 and XT2 aren’t truly isolated from each other (and this remark applies regardless of whether TX1 and TX2 run concurrently or otherwise). In particular, therefore, if (a) TX1 sees an inconsistent state of the database and therefore produces an incorrect result, and (b) that result is then seen by TX2, then (c) the inconsistency seen by TX1 has effectively been propagated to TX2. In other words, it can’t be guaranteed that a given inconsistency, if permitted, will be seen by just one transaction, anyway. Note: Similar remarks apply if TX1 (a) sees an inconsistency and therefore assigns an incorrect value to some local variable V and then (b) transmits the value of that variable V to some outside user (since local variables aren’t and can’t possibly be subject to the jurisdiction of the transaction management subsystem).

Third, we surely don’t want every program (or other “code unit”) to have to deal with the possibility that the database might be inconsistent when it’s invoked. There’s a severe loss of orthogonality if some piece of code that assumes consistency can’t be used safely while constraint checking is deferred. In other words, I want to be able to specify code units independently of whether they’re to be executed as a transaction as such or just as part of a transaction. (In fact, I’d like support for nested transactions, but that’s a topic for another day.)

Fourth, The Principle of Interchangeability (of base relvars and views—see the next chapter) implies that the very same constraint might be a single relvar constraint with one design for the database and a multirelvar constraint with another. For example, suppose we have two virtual relvars, or views, with Tutorial D definitions as follows (LS = London suppliers, NLS = non London suppliers):

     VAR LS  VIRTUAL ( S WHERE CITY = 'London' ) ;

     VAR NLS VIRTUAL ( S WHERE CITY ≠ 'London' ) ;

These views are subject to the constraint that no supplier number appears in both. However, there’s no need to state that constraint explicitly, because it’s implied by the fact that every supplier has exactly one city—i.e., the FD {SNO} → {CITY} holds in base relvar S—together with the real world fact that any given city is either equal to London or not equal to London. But suppose we made LS and NLS base relvars and then defined their union as a view called S. Then the constraint would have to be stated explicitly:

image with no caption

Now what was previously a single relvar constraint has become a multirelvar constraint instead. Thus, if we agree that single relvar constraints must be checked immediately, we must surely agree that multirelvar constraints must be checked immediately as well (since, logically, there’s no real difference between the two, as the example demonstrates).

Fifth and last, there’s an optimization technique called semantic optimization (it involves expression transformation, but I deliberately didn’t discuss it in the section of that name in Chapter 6). By way of example, consider the expression (SP JOIN S){PNO}. Now, the join here is based on the correspondence between a foreign key in a referencing relvar, SP, and the relevant candidate key in the referenced relvar, S. As a consequence, every SP tuple does join to some S tuple, and every SP tuple thus does contribute a part number to the projection that’s the overall result. So there’s no need to do the join!—the expression can be simplified to just SP{PNO}. Note carefully, however, that this transformation is valid only because of the semantics of the situation; with join in general, each operand will include some tuples that have no counterpart in the other and so don’t contribute to the overall result, and transformations such as the one just shown therefore won’t be valid. But in the case at hand every SP tuple necessarily does have a counterpart in S, because of the integrity constraint—actually a foreign key constraint—that says that every shipment must have a supplier, and so the transformation is valid after all. A transformation that’s valid only because a certain integrity constraint is in effect is called a semantic transformation, and the resulting optimization is called a semantic optimization.

Now, in principle, any constraint whatsoever can be used in semantic optimization; we’re not limited to foreign key constraints as in the example.[120] For example, suppose the suppliers-and-parts database is subject to the constraint “All red parts must be stored in London,” and consider the query:

Get suppliers who supply only red parts and are located in the same city as at least one of the parts they supply.

This is a fairly complex query; but thanks to the integrity constraint, we see that it can be transformed—by the optimizer, I mean, not by the user—into this much simpler one:

Get London suppliers who supply only red parts.

We could easily be talking about several orders of magnitude improvement in performance here. And so, while commercial products do comparatively little in the way of semantic optimization at the time of writing (as far as I know), I certainly expect them to do more in the future, because the payoff is so dramatic.

To get back to the main thread of the discussion, I now observe that if a given constraint is to be usable in semantic optimization, then that constraint must be satisfied at all times (or rather, and more precisely, at statement boundaries), not just at transaction boundaries. As we’ve just seen, semantic optimization means using constraints to simplify queries in order to improve performance. Clearly, then, if some constraint is violated at some time, then any simplification based on that constraint won’t be valid at that time, and query results based on that simplification will be wrong at that time (in general). Note: Alternatively, we could adopt the weaker position that “deferred constraints” (meaning constraints for which the checking is deferred) can’t be used in semantic optimization—but I think such a position would effectively just mean we’ve shot ourselves in the foot, that’s all.

To sum up: Database constraints must be satisfied—that is, they must evaluate to TRUE, given the values currently appearing in the database—at statement boundaries (or, very informally, “at semicolons”); in other words, they must be checked at the end of any statement that might cause them to be violated. If any such check fails, the effects on the database of the offending statement must be undone and an exception raised.



[120] The constraint must be stated declaratively, however; obviously there’s no way the optimizer can “understand” and exploit constraints that have been specified procedurally (and so we have here another strong reason for requiring declarative constraint support).

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.128.255.24