MORE ON QUANTIFICATION

There are a number of further issues I need to discuss regarding quantification in particular.

We Don’t Need Both Quantifiers

It’s easy to see that any predicate that can be expressed in terms of EXISTS can be expressed in terms of FORALL instead and vice versa. By way of example, consider the following predicate once again:

     EXISTS x ( x is taller than Steve )

(“Somebody is taller than Steve”; of course, this predicate is in fact a simple proposition). Another way to say the same thing is:

     NOT ( FORALL x ( NOT ( x is taller than Steve ) ) )

(“It is not the case that nobody is taller than Steve”). More generally, in fact, the predicate

     EXISTS x ( p ( x ) )

is logically equivalent to the predicate

     NOT ( FORALL x ( NOT ( p ( x ) ) ) )

(where the predicate p might legitimately involve other parameters in addition to x). Likewise, the predicate

     FORALL x ( p ( x ) )

is logically equivalent to the predicate

     NOT ( EXISTS x ( NOT ( p ( x ) ) ) )

(where, again, the predicate p might legitimately involve other parameters in addition to x).

It follows from all of the above that a formal language doesn’t need to support both EXISTS and FORALL explicitly. But it’s very desirable to support them both in practice. The reason is that some problems are “more naturally” formulated in terms of EXISTS, while others are “more naturally” formulated in terms of FORALL instead. For example, SQL supports EXISTS but not FORALL, as we know; as a consequence, certain queries are quite awkward to formulate in SQL. Consider again the query “Get suppliers who supply all parts,” which can be expressed in relational calculus quite simply as follows:

     { SX } WHERE FORALL PX ( EXISTS SPX
                            ( SPX.SNO = SX.SNO AND SPX.PNO = PX.PNO ) )

In SQL, by contrast, the query has to look something like this:

     SELECT SX.*
FROM   S AS SX
     WHERE  NOT EXISTS
          ( SELECT *
            FROM   P AS PX
            WHERE  NOT EXISTS
                 ( SELECT *
                   FROM   SP AS SPX
                   WHERE  SX.SNO = SPX.SNO
                   AND    SPX.PNO = PX.PNO ) )

(“Get suppliers SX such that there does not exist a part PX such that there does not exist a shipment SPX linking that supplier SX to that part PX”). Well, single negation is bad enough (users often have trouble with it); double negation, as in this example, is much worse.

Empty Ranges

Consider again the fact that the predicates

     EXISTS x ( p ( x ) )

and

     NOT ( FORALL x ( NOT ( p ( x ) ) ) )

are logically equivalent. As we know, the bound variable x in each of these predicates must range over some set of permissible values. Suppose now that the set in question is empty; it might, for example, be the set of persons over fifty feet tall (or in the database context, more realistically, it might be the set of tuples in a relvar that’s currently empty). Then:

  • The expression EXISTS x (p(x)) evaluates to FALSE, because “there is no x”—i.e., there’s no value available to be substituted for x in order to make the expression true. Note carefully that these remarks are valid regardless of what p(x) happens to be. For example, “There exists a person over fifty feet tall who works for IBM” evaluates to FALSE (unsurprisingly).

  • It follows that the negation NOT EXISTS x (p(x)) evaluates to TRUE—again, regardless of what p(x) happens to be. For example, “There doesn’t exist a person over fifty feet tall who works for IBM”—more colloquially, “No person over fifty feet tall works for IBM”—evaluates to TRUE (again unsurprisingly).

  • But NOT EXISTS x (p(x)) is equivalent to FORALL x (NOT (p(x))), and so this latter expression also evaluates to TRUE—once again, regardless of what p(x) happens to be.

  • But if the predicate p(x) is arbitrary, then so is the predicate NOT (p(x)). And so we have the following possibly surprising result: The expression FORALL x (...) evaluates to TRUE if there are no x’s, regardless of what appears inside the parentheses. For example, “All persons over fifty feet tall do work for IBM” also evaluates to TRUE—because, to say it again, there aren’t any persons over fifty feet tall.

One implication of the foregoing state of affairs is that certain queries produce a result that you might not expect (if you don’t know logic, that is). For example, the query discussed earlier—

     { SX } WHERE FORALL PX ( EXISTS SPX ( SPX.SNO = SX.SNO AND
                                      SPX.PNO = PX.PNO ) )

(“Get suppliers who supply all parts”)—will return all suppliers if there aren’t any parts.

Incidentally, the foregoing example serves as a good illustration of the point that while logic is certainly necessary as a foundation for database systems, it might not be sufficient. For example, how do you think an only child should respond to the question “Are your siblings all boys?” The logically correct answer is, of course, yes (though I observe that yes is the logically correct answer to the question “Are your siblings all girls?” as well). In practice, however, we would surely expect some more informative response, along the lines of “Well, actually I don’t have any siblings.” In other words—and now reverting to database systems as such—it might be nice if the system, as well as simply giving its responses as such, could also explain those responses if it’s asked to do so.

Defining EXISTS and FORALL

As you might have realized, EXISTS and FORALL can be defined as an iterated OR and an iterated AND, respectively. I’ll consider EXISTS first. Let p(x) be a predicate with a parameter x and let x range over the set X = {x1,x2,...,xn}. Then

     EXISTS x ( p ( x ) )

is a predicate, and it’s defined to be equivalent to (and hence shorthand for) the predicate

     p ( x1 ) OR p ( x2 ) OR ... OR p ( xn ) OR FALSE

Observe in particular that this expression evaluates to FALSE if X is empty (equivalently, if n = 0), as we already know. By way of example, let p(x) be “x has a moon” and let X be the set {Mercury, Venus, Earth, Mars}. Then the predicate EXISTS x (p(x)) becomes “EXISTS x (x has a moon),” and it’s shorthand for

     ( Mercury has a moon ) OR ( Venus has a moon ) OR
     ( Earth has a moon )   OR ( Mars has a moon )  OR FALSE

which evaluates to TRUE because, e.g., “Mars has a moon” is true. Similarly,

     FORALL x ( p ( x ) )

is a predicate, and it’s defined to be equivalent to (and hence shorthand for) the predicate

     p ( x1 ) AND p ( x2 ) AND ... AND p ( xn ) AND TRUE

And this expression evaluates to TRUE if X is empty (again, as we already know). By way of example, let p(x) and X be as in the EXISTS example above. Then the predicate FORALL x (p(x)) becomes “FORALL x (x has a moon),” and it’s shorthand for

     ( Mercury has a moon ) AND ( Venus has a moon ) AND
     ( Earth has a moon )   AND ( Mars has a moon )  AND TRUE

which evaluates to FALSE because, e.g., “Venus has a moon” is false.

As an aside, let me remark that, as the examples demonstrate, defining EXISTS and FORALL as iterated OR and AND, respectively, means that every predicate that involves quantification is equivalent to one that doesn’t. Thus, you might be wondering, not without some justification, just what this business of quantification is really all about ... Why all the fuss? The answer is as follows: We can define EXISTS and FORALL as iterated OR and AND only because the sets we have to deal with are—thankfully—always finite (because we’re operating in the realm of computers and computers are finite in turn). In pure logic, where there’s no such restriction, those definitions aren’t valid.[145]

Perhaps I should add that, even though we’re always dealing with finite sets and EXISTS and FORALL are thus merely shorthand, they’re extremely useful shorthand! For my part, I certainly wouldn’t want to have to formulate queries and the like purely in terms of AND and OR, without being able to use the quantifiers. Much more to the point, the quantifiers allow us to formulate queries without having to know the precise content of the database at any given time (which wouldn’t be the case if we always had to use the explicit iterated OR and AND equivalents).[146]

Other Kinds of Quantifiers

While it’s certainly true that EXISTS and FORALL are the most important quantifiers in practice, they aren’t the only ones possible. There’s no a priori reason, for example, why we shouldn’t allow quantifiers of the form

there exist at least three x’s such that

or

a majority of x’s are such that

or

an odd number of x’s are such that

(and so on). One fairly important special case is there exists exactly one x such that. I’ll use the keyword UNIQUE for this one. Here are some examples:

     UNIQUE x ( x is taller than Arnold )

Meaning: Exactly one person is taller than Arnold; probably FALSE.

     UNIQUE x ( x has social security number y )

Meaning: Exactly one person has social security number y (y is a parameter). We can’t assign a truth value to this example because it’s a (monadic) predicate and not a proposition.

     FORALL y ( UNIQUE x ( x has social security number y ) )

Meaning: Everybody has a unique social security number (I’m assuming here that y ranges over the set of all social security numbers actually assigned, not all possible ones). Exercise: Does this predicate—which is in fact a proposition—evaluate to TRUE?

As another exercise, what does the following predicate mean?

     FORALL x ( UNIQUE y ( x has social security number y ) )

Here’s how UNIQUE might be used in the formulation of constraints. Recall the formulation I gave earlier for constraint CX3 (“every supplier has a unique supplier number”):

     CONSTRAINT CX3 FORALL SX ( FORALL SY ( IF SX.SNO = SY.SNO THEN
                                               SX.SNAME = SY.SNAME AND
                                               SX.STATUS = SY.STATUS AND
                                               SX.CITY = SY.CITY ) ) ;

A much better formulation would clearly be as follows:

     CONSTRAINT CX3 FORALL SX ( UNIQUE SY ( SX.SNO = SY.SNO ) ) ;

(“For all suppliers SX, there’s exactly one supplier SY with the same supplier number.”) For example, if SX denotes the tuple for supplier S4, say, then SY must also denote the tuple for supplier S4—in other words, SX and SY must denote the very same tuple—in order for the constraint to be satisfied.

By way of another example, recall the following constraint: “Every supplier number in relvar SP must appear in relvar S.” Here’s the formulation I gave previously:

     CONSTRAINT CX6 FORALL SPX ( EXISTS SX ( SX.SNO = SPX.SNO ) ) ;

However, I hope you can see a more accurate formulation is:

     CONSTRAINT CX6 FORALL SPX ( UNIQUE SX ( SX.SNO = SPX.SNO ) ) ;

In other words, for a given tuple in relvar SP, we want there to be not at least one (EXISTS), but exactly one (UNIQUE), corresponding tuple in relvar S. The previous formulation “works” because there’s an additional constraint in effect: viz., that {SNO} is a key for relvar S. But the revised formulation is closer to what we really want to say.

Now, SQL does support UNIQUE (sort of), though its support is even more indirect than its support for EXISTS is. To be specific, let sq be a subquery; then UNIQUE sq is a boolean expression, and it evaluates to FALSE if the table denoted by sq contains any duplicate rows and TRUE otherwise. Note that it follows from this definition that the operator certainly returns TRUE if its argument table has either just one row or no rows at all.[147] And it further follows that, whereas the logic expression

     UNIQUE x ( p ( x ) )

means “There exists exactly one argument value a corresponding to the parameter x such that p(a) evaluates to TRUE,” the (very approximate!) SQL analog—

     UNIQUE ( SELECT k FROM T AS ... WHERE p ( x ) )

—where k denotes an arbitrary constant value, say the integer 0—means “Given an argument value a corresponding to the parameter x, there exists at most one row in the pertinent table T such that p(a) evaluates to TRUE.” For example, given our usual sample value for relvar S, the SQL expression

     UNIQUE ( SELECT 0 FROM S AS SX WHERE SX.CITY = 'Athens' )

returns TRUE, while the SQL expression

     UNIQUE ( SELECT 0 FROM S AS SX WHERE SX.CITY = 'Paris' )

returns FALSE.[148]

All of that being said, I won’t attempt to give an SQL formulation here for constraint CX6 that uses UNIQUE—I’ll leave it to Chapter 11 (see Example 10 in that chapter).

As you can see, the foregoing examples, in which the only item in the SELECT clause is a simple literal, are designed to exploit the fact that SQL retains duplicates in the result of a SELECT expression if DISTINCT isn’t specified. Of course, I’ve suggested elsewhere in this book—in Chapter 4, to be specific—that DISTINCT should “always” be specified. In contexts like the one under discussion, however, DISTINCT must definitely not be specified (right?).

Now, I don’t mean to suggest that the argument expression in an SQL UNIQUE invocation must always be of the form “SELECT k FROM ...” where k denotes some constant. By way of a counterexample, here repeated from Chapter 8 is one possible SQL formulation of the constraint that distinct suppliers must have distinct supplier numbers:

     CREATE ASSERTION CX3 CHECK ( UNIQUE ( SELECT SNO FROM S ) ) ;

Recall now that SQL also uses the keyword UNIQUE in key constraints. For example, the CREATE TABLE for table S includes the following specification:

     UNIQUE ( SNO )

You can think of this specification as shorthand for the following (which could be part of a more general base table constraint or a CREATE ASSERTION statement):[149]

     CHECK ( UNIQUE ( SELECT SNO FROM S ) )

SQL also uses the keyword UNIQUE in MATCH expressions. Here’s an example (“Get suppliers who supply exactly one part”):[150]

     SELECT SX.*
     FROM   S AS SX
     WHERE  SX.SNO MATCH UNIQUE
          ( SELECT SPX.SNO
            FROM   SP AS SPX )

But this usage too is basically just shorthand. For example, the example just shown is equivalent to the following:

     SELECT SX.*
     FROM   S AS SX
     WHERE  UNIQUE ( SELECT SPX.SNO             /* i.e., there's AT  */
                     FROM   SP AS SPX           /* MOST one shipment */
                     WHERE  SPX.SNO = SX.SNO )  /* for supplier SX   */
AND    EXISTS ( SELECT SPX.SNO             /* ... and there's   */
                     FROM   SP AS SPX           /* also AT LEAST one */
                     WHERE  SPX.SNO = SX.SNO )

Incidentally, note that the UNIQUE invocation here is indeed of the form “UNIQUE (SELECT k FROM ...)” where k denotes a constant value, thanks to the boolean expression in the inner WHERE clause (“SPX.SNO = SX.SNO”)—constant, that is, with respect to “the current row” of table S, and hence with respect to each evaluation of the outer WHERE clause. Of course, it’s a distinct constant value with respect to distinct evaluations of that clause.



[145] To elaborate: Consider by way of example the proposition EXISTS x (p(x)), where p is a predicate with just one parameter, x. If x ranges over an infinite set, then any attempt to use an “iterated OR” algorithm for evaluating the proposition will inevitably be flawed, since the algorithm might never terminate (it might never find the one value of x that satisfies p). Likewise, any attempt to use an “iterated AND” algorithm for FORALL x (p(x)) will also inevitably be flawed, since again the algorithm might never terminate (it might never find the one value of x that fails to satisfy p).

[146] A note on syntax: Recall from Chapter 7 that Tutorial D supports the aggregate operators AND and OR, thereby allowing us to write, e.g., AND(SP, QTY > 0), to express the fact that QTY values in relvar SP must be greater than zero. The discussions of the present section suggest that more “user friendly” names for these operators might well be FORALL and EXISTS, respectively; for example, the expression FORALL(SP, QTY > 0) does read quite well from an intuitive point of view. Likewise, EXISTS(SP, QTY > 250) seems to be an intuitively pleasing way of expressing the fact that at least one QTY value in relvar SP must be greater than 250.

[147] By contrast, the UNIQUE quantifier gives FALSE if its range is empty.

[148] It follows that AT_MOST_ONE (or perhaps NO_DUPS) would be a better name for the SQL operator than UNIQUE, at least in a context like the one under discussion. (Come to that, AT_LEAST_ONE might be a better name than EXISTS, too, both for the existential quantifier as such and for SQL’s analog of that quantifier.)

[149] To repeat something I said (in a footnote) in Chapter 8, what the standard actually says in this connection is as follows (more or less): “The constraint UNIQUE (SNO) is not satisfied if and only if EXISTS (SELECT * FROM S WHERE NOT (UNIQUE (SELECT SNO FROM S))) evaluates to TRUE.” Well, it seems to me this definition could surely be simplified, thus: “The constraint UNIQUE (SNO) is satisfied if and only if UNIQUE (SELECT SNO FROM S) evaluates to TRUE.” Now, I dare say there’s a good reason for the standard’s circumlocution here, but whatever it is certainly escapes me. Perhaps it has to do with nulls, in which case I’m not interested.

[150] Note that in this context, by contrast, UNIQUE does mean exactly one.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.226.98.34