SOME EQUIVALENCES

I’ll finish up this chapter with a few remarks regarding certain equivalences that might have already occurred to you (indeed, I’ve touched on some of them myself from time to time at earlier points). First of all, recall the IS_EMPTY operator, which I introduced in Chapter 3 and made heavy use of in Chapter 8. If the system supports that operator, then there’s no logical need for it to support the quantifiers, thanks to the following equivalences:

  • EXISTS x ( p ) ≡ NOT ( IS_EMPTY ( X WHERE p ) )

  • FORALL x ( p ) ≡ IS_EMPTY ( X WHERE NOT ( p ) )

(I’m assuming here that the variable x ranges over the set X.)

In fact, SQL’s support for EXISTS—and FORALL, such as it is—is based on exactly the foregoing equivalences. The fact is, SQL’s EXISTS isn’t really a quantifier, as such, at all, because it doesn’t involve any bound variables. Instead, it’s an operator, in the conventional sense of that term: a monadic operator of type BOOLEAN, to be precise. Like any monadic operator invocation, an invocation of the SQL EXISTS operator is evaluated by first evaluating the expression that denotes its sole argument, and then applying the operator per se—in this case EXISTS—to the result of that evaluation. Thus, given the expression EXISTS (tx), where tx is a table expression, the system first evaluates tx to obtain a table t; then it applies EXISTS to t, returning TRUE if t is nonempty and FALSE otherwise. (At least, that’s the conceptual algorithm; numerous optimizations are possible, but they’re irrelevant to the present discussion.)

And now I can explain why SQL doesn’t support FORALL. The reason is that representing the universal quantifier by means of an operator with syntax of the form FORALL(tx)—where tx is again a table expression—couldn’t possibly make any sense. For example, consider the hypothetical expression FORALL (SELECT * FROM S WHERE CITY = ‘Paris’). What could such an expression possibly mean? It certainly couldn’t mean anything like “All suppliers are in Paris,” because—loosely speaking—the argument to which the hypothetical operator is applied isn’t all suppliers, it’s all suppliers in Paris.

In fact, however, we don’t need the quantifiers anyway if the system supports the aggregate operator COUNT, thanks to the following equivalences:

  • EXISTS x ( p ) ≡ COUNT ( X WHERE p ) > 0

  • FORALL x ( p ) ≡ COUNT ( X WHERE p ) = COUNT ( X )

  • UNIQUE x ( p ) ≡ COUNT ( X WHERE p ) = 1

Now, I’m certainly not a fan of the idea of replacing quantified expressions by expressions involving COUNT invocations—though sometimes we have to, if we’re in an algebraic context[151]—but it would be wrong of me not to mention the possibility.

Aside: Although this book generally has little to say on performance, I should at least point out that the foregoing equivalences (the ones involving COUNT, I mean) could lead to performance problems. For example, consider the following expression, which is an SQL formulation of the query “Get suppliers who supply at least one part”:

     SELECT *
     FROM   S
     WHERE  EXISTS
          ( SELECT *
            FROM   SP
            WHERE  SP.SNO = S.SNO )

Now, here’s another formulation that’s logically equivalent to the foregoing:

     SELECT *
     FROM   S
     WHERE  ( SELECT COUNT ( * )
              FROM   SP
              WHERE  SP.SNO = S.SNO ) > 0

But we don’t really want the system to perform the complete count that’s apparently being requested here and then check to see whether that count is greater than zero; rather, we want it to stop counting, for any given supplier, as soon as it finds the first shipment for that supplier. In other words, we’d really like some optimization to be done. Writing code that effectively requires a certain optimization to be done is usually not a good idea! Recommendation: Be careful over the use of COUNT, therefore; in particular, don’t use it where EXISTS would be more logically correct. End of aside.

Relational Completeness

Every operator of the relational algebra has a precise definition in terms of logic. (I didn’t state this fact explicitly before, but it’s easy to see the definitions I gave in Chapter 6 and Chapter 7 for join and the rest can be reformulated in terms of logic as described in the present chapter.) It follows as a direct consequence that, for every expression of the relational algebra, there’s an expression of the relational calculus that’s logically equivalent to—i.e., has the same semantics as—that algebraic expression. In other words, the relational calculus is at least as “powerful” (better: expressive) as the relational algebra: Anything that can be expressed in the algebra can be expressed in the calculus.

Now, it might not be obvious, but actually the opposite is true too; that is, for every expression of the relational calculus, there’s an expression of the relational algebra that’s logically equivalent to that calculus expression. Thus, the algebra is at least as expressive as the calculus, and so the two formalisms are logically equivalent: Both are what’s called relationally complete.[152] Relational completeness is a basic measure of the expressive capability of a language; if a language is relationally complete, it means (among other things, and speaking a trifle loosely) that queries of arbitrary complexity can be formulated without having to resort to iterative loops or branching. In other words, it’s relational completeness that allows end users—at least in principle, though possibly not in practice—to access the database directly, without having to go through the potential bottleneck of the IT department.

The Importance of Consistency

I have a small piece of unfinished business to attend to. Recall my claim in Chapter 8 that any proposition whatsoever (even obviously false ones like 1 = 0) can be shown to be “true” in an inconsistent system. Now I can elaborate on that claim.

I’ll start with a really simple example. Suppose (a) relvar S is currently nonempty; (b) there’s a constraint to the effect that there must always be at least one part; but (c) relvar P is in fact currently empty (there’s the inconsistency). Now consider the relational calculus query:

     { SX } WHERE EXISTS PX ( TRUE )

Or if you prefer SQL:

     SELECT *
     FROM   S
     WHERE  EXISTS
          ( SELECT *
            FROM   P )

Now, if this expression is evaluated directly, the result will be empty. Alternatively, if the system (or the user) observes that there’s a constraint that says that EXISTS PX (TRUE) must evaluate to TRUE—or, in SQL, that SELECT * FROM P must return a nonempty result—the WHERE clause can be reduced to one saying simply WHERE TRUE, and the result will then be all suppliers. At least one of these results must be wrong! In a sense, in fact, they’re both wrong; given an inconsistent database, there simply isn’t—there can’t be—any well defined notion of correctness, and any answer is as good (or bad) as any other. Indeed, this state of affairs should be self-evident: If I tell you some proposition p is both true and false, and then ask you whether some proposition that relies on p in some way is true, there’s simply no right answer you can give me.

In case you’re still not convinced, consider the following slightly more realistic SQL example (under the same assumptions as before):

     SELECT DISTINCT
            CASE WHEN EXISTS ( SELECT * FROM P ) THEN x ELSE y END
     FROM   S

This expression will return either x or y—more precisely, it will return a table containing (a row containing) either x or y—depending, in effect, on whether or not the EXISTS invocation is replaced by just TRUE. Now consider that x and y can each be essentially anything at all ... For example, x might be an SQL expression denoting the total weight of all parts, while y might be the literal 0—in which case executing the query could easily lead to the erroneous conclusion that the total part weight is null instead of zero.



[151] This might not be true. I have in mind here the fact that COUNT is often used in an algebraic context in formulating constraints to the effect that some functional dependency is in effect; for example, to express the fact that the FD {A} → {B} holds in relvar R, we can write COUNT(R{A}) = COUNT(R{A,B}). But—assuming for simplicity that R has no attributes other than A and B—the following will also do the trick: IS_EMPTY ((R JOIN (R RENAME {B AS C})) WHERE BC).

[152] Don’t confuse relational completeness with any other kind of completeness: in particular, with truth functional completeness (mentioned in an earlier footnote).

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.118.31.11