RELATIONAL CALCULUS

Essentially everything I’ve discussed in this chapter so far maps very directly into the relational calculus. Let’s look at a simple example—a relational calculus representation of the query “Get supplier number and status for suppliers in Paris who supply part P2.” Here first for comparison purposes is an algebraic formulation:

     ( S WHERE CITY = 'Paris' ) { SNO , STATUS }
                           MATCHING ( SP WHERE PNO = 'P2' )

And here’s a relational calculus equivalent:

     RANGEVAR SX  RANGES OVER S  ;
     RANGEVAR SPX RANGES OVER SP ;

     { SX.SNO , SX.STATUS }
           WHERE SX.CITY = 'Paris' AND
                 EXISTS SPX ( SPX.SNO = SX.SNO AND SPX.PNO = 'P2' )

Explanation:

  • The first two lines are definitions, defining SX and SPX to be range variables that range over S and SP, respectively. What those definitions mean is that, at any given time, permitted values of SX are tuples in the relation that’s the value of relvar S at that time; likewise, permitted values of SPX are tuples in the relation that’s the value of relvar SP at that time.

  • The remaining lines are the actual query. Observe that they take the following generic form:

    proto tuple WHERE predicate

    This expression overall is the relational calculus version of a relational expression (i.e., an expression that denotes a relation), and it evaluates to a relation containing every possible value of the proto tuple for which the predicate evaluates to TRUE, and no other tuples. (The term proto tuple, standing for “prototype tuple,” is apt but nonstandard; in fact, a standard term for the concept doesn’t seem to exist.) In the example, therefore, the result is a relation of degree two, containing every (SNO,STATUS) pair from relvar S such that (a) the corresponding city is Paris and (b) there exists a shipment in relvar SP with the same supplier number as the one in that (SNO,STATUS) pair and with part number P2.

Note the use of dot qualified names in this example (in both the proto tuple and the predicate); I won’t go into details, because dot qualified names will be familiar to you from SQL. Indeed, SQL has a formulation of the query under discussion that’s very similar in general terms to the foregoing relational calculus formulation:

     SELECT SX.SNO , SX.STATUS
     FROM   S AS SX
     WHERE  SX.CITY = 'Paris'
     AND    EXISTS
          ( SELECT *
            FROM   SP AS SPX
            WHERE  SPX.SNO = SX.SNO
            AND    SPX.PNO = 'P2' )

As this example indicates:

  • First, SQL does support range variables (though it doesn’t usually use that term): The specifications S AS SX and SP AS SPX serve to define such variables, and those variables are then explicitly referenced elsewhere in the overall expression by means of dot qualified names such as SX.SNO, SPX.SO, and so on. Note: In practice, such AS specifications and such explicit range variable references are often omitted, at least in simple queries. I’ll explain exactly how such omissions are possible in Chapter 12, when I discuss range variables in SQL in more detail.

  • Second, and perhaps more important, SQL also supports EXISTS. However, that support is somewhat indirect. To be specific, let sq be a subquery; then EXISTS sq is a boolean expression (and so represents a predicate), and it evaluates to FALSE if the table denoted by sq is empty and TRUE otherwise.[142] Note: The table expression tx in parentheses that constitutes sq will usually, though not invariably, be of the form SELECT * FROM ... WHERE ..., and the WHERE clause will usually, though not invariably, include some reference to some “outer” table, meaning sq will typically be a correlated subquery specifically. In the foregoing example, S is that outer table, and it’s referenced by means of the range variable SX. Again, see Chapter 12 for further explanation.

    Aside: There’s a certain irony here, though. As we saw in Chapter 4, SQL, because it supports nulls, is based on what’s called three-valued logic, 3VL (instead of the conventional two-valued logic I’m discussing in this chapter, which is what the relational model is based on). In 3VL, the existential quantifier can return three different results: TRUE, FALSE, and UNKNOWN (where UNKNOWN is “the third truth value”; again, see Chapter 4). But SQL’s EXISTS operator always returns TRUE or FALSE, never UNKNOWN. For example, EXISTS(tx) will return TRUE, not UNKNOWN, if tx evaluates to a table containing nothing but nulls (I’m speaking a trifle loosely here); yet UNKNOWN is the logically correct result.[143] As a consequence, (a) SQL’s EXISTS isn’t a faithful implementation of the existential quantifier of 3VL, and (b) once again, therefore, SQL queries sometimes return the wrong answer. Example 3 in the next chapter is a case in point. End of aside.

Let’s look at another example—the query “Get supplier names for suppliers who supply all parts.” I’ll assume we have the same range variables SX and SPX as before, but I’ll also define another one (PX) ranging over P:

     RANGEVAR PX RANGES OVER P ;

     { SX.SNAME } WHERE FORALL PX ( EXISTS SPX ( SPX.SNO = SX.SNO AND
                                                 SPX.PNO = PX.PNO ) )

In somewhat stilted natural language: “Get names of suppliers such that, for all parts, there exists a shipment with the same supplier number as the supplier and the same part number as the part.” Note: As you probably know, SQL has no direct support for FORALL. For that reason, I won’t show an SQL analog of this example here—I’ll come back to it later, in the section MORE ON QUANTIFICATION. I will point out, however, that there’s a logical difference between the foregoing calculus expression and this one, where the quantifiers have been switched:

     { SX.SNAME } WHERE EXISTS SPX ( FORALL PX ( SPX.SNO = SX.SNO AND
                                                 SPX.PNO = PX.PNO ) )

Exercise: What does this latter expression mean? And do you think the query is a “sensible” one?

One more example (“Get supplier names for suppliers who supply at least one red part”):

     { SX.SNAME } WHERE EXISTS PX ( PX.COLOR = 'Red' AND
                   EXISTS SPX ( SPX.SNO = SX.SNO AND
                                     SPX.PNO = PX.PNO ) )

I’m assuming here that we have the same range variables available to us as we had in the earlier examples; in fact, I’ll continue to make that same assumption for the rest of this chapter.

By the way, here’s another possible formulation of the foregoing query:

     { SX.SNAME } WHERE EXISTS PX ( EXISTS SPX ( PX.COLOR = 'Red' AND
                                                 SPX.SNO = SX.SNO AND
                                                 SPX.PNO = PX.PNO ) )

In this latter formulation, the predicate in the WHERE clause is in what’s called “prenex normal form,” meaning, loosely, that the quantifiers all appear at the beginning. Here’s a precise definition of this concept:

Definition: A predicate is in prenex normal form (PNF) if and only if (a) it’s quantifier free (i.e., it contains no quantifiers at all) or (b) it’s of the form EXISTS x (p) or FORALL x (p), where p is in PNF in turn. In other words, a PNF predicate takes the form

Q1 x1 ( Q2 x2 ( ... ( Qn xn ( q ) ) ... ) )

where (a) n ≥ 0; (b) each Qi (i = 1, 2, ..., n) is either EXISTS or FORALL; and (c) the predicate q—which is sometimes called the matrix—is quantifier free.

Prenex normal form isn’t more or less correct than any other form, but with a little practice it does tend to become the most natural formulation, and the easiest to write, in many cases (not all).

More on Range Variables

From what I’ve said in this section so far, it should be clear that range variables in the relational calculus serve as the free and bound variables that are required by formal logic. As I mentioned earlier, those variables always have to range over some set of permissible values; in the relational calculus context specifically, that set is always the body of some relation (usually but not necessarily the relation that’s the current value of some relvar). Note: It follows that a given range variable always denotes some tuple. For that reason, the relational calculus is sometimes known more specifically as the tuple calculus, and the variables themselves as tuple variables. This latter usage can be confusing, however, since the term tuple variable already has a somewhat different (and more conventional) meaning—see Chapter 2—and I won’t adopt it in this book.[144]

Now I can say a little more about the syntax of relational calculus expressions:

  • First of all, a proto tuple is a commalist of items enclosed in braces, in which each item is either a range attribute reference—possibly with an associated AS clause to introduce a new attribute name—or a range variable reference. (There are other possibilities too, but I’ll limit my attention to just these cases until further notice. See Example 5 below.) Note: It’s usual to omit the braces if the commalist contains just a single item, but I’ll generally show them even when they’re not actually required, for clarity.

  • A range attribute reference is an expression of the form R.A, where A is an attribute of the relation that range variable R ranges over; SX.SNO is an example. And a range variable reference is just a range variable name, like SX, and it’s shorthand for a commalist of range attribute references, one for each attribute of the relation the range variable ranges over.

  • Let some range attribute reference involving range variable R appear, explicitly or implicitly, within some proto tuple. Then the predicate in the corresponding WHERE clause can, and usually will, contain at least one free range attribute reference involving R—where by “free range attribute reference involving R” I mean a range attribute reference of the form R.A that’s not within the scope of any quantifier in which R is the bound variable.

  • The WHERE clause is optional; omitting it is equivalent to specifying WHERE TRUE.

More Sample Queries

I’ll give a few more examples of relational calculus queries, in order to illustrate a few more points; however, I’m not trying to be exhaustive in my treatment. For simplicity, I’ll omit the RANGEVAR definitions that would be needed in practice and will just assume that SX, SY, etc., have been defined as range variables over S; PX, PY, etc., have been defined as range variables over P; and SPX, SPY, etc., have been defined as range variables over SP. Please note that the formulations shown aren’t the only ones possible, in general. I’ll leave it as an exercise for you to show equivalent SQL formulations in each case.

Example 1: Get all pairs of supplier numbers such that the suppliers concerned are colocated.

     { SA := SX.SNO , SB := SY.SNO } WHERE SX.CITY = SY.CITY
                                       AND SX.SNO < SY.SNO

Note the introduction of result attribute names SA and SB in this example. Incidentally, this example provides a good illustration of the point that some queries are more easily formulated in the calculus than they are in the algebra (an algebraic formulation for this query—rather more complicated than the calculus formulation just shown—was given if you recall in the section FORMULATING EXPRESSIONS ONE STEP AT A TIME in Chapter 6).

Example 2: Get supplier names for suppliers who supply at least one Paris part.

     { SX.SNAME } WHERE EXISTS SPX ( EXISTS PX ( SX.SNO = SPX.SNO AND
                                                 SPX.PNO = PX.PNO AND
                                                 PX.CITY = 'Paris' ) )

Example 3: Get supplier names for suppliers who supply at least one part supplied by supplier S2.

     { SX.SNAME } WHERE EXISTS SPX ( EXISTS SPY ( SX.SNO = SPX.SNO AND
                                                  SPX.PNO = SPY.PNO AND
                                                  SPY.SNO = 'S2' ) )

Example 4: Get supplier names for suppliers who don’t supply part P2.

     { SX.SNAME } WHERE NOT ( EXISTS SPX ( SPX.SNO = SX.SNO AND
                                           SPX.PNO = 'P2' ) )

The outer parentheses in this example (i.e., the ones enclosing the expression following NOT) might not be needed in practice; indeed, I’ll often omit such parentheses in later examples.

Incidentally, the predicate in the foregoing formulation isn’t in prenex normal form. It would be possible to replace it by one that is, like this—

     { SX.SNAME } WHERE FORALL SPX ( SPX.SNO ≠ SX.SNO OR SPX.PNO ≠ 'P2' )

—but I don’t think this alternative formulation is as “natural” as the non PNF version; that is, I think this example illustrates the point that a PNF formulation isn’t always the one that comes most readily to mind.

Example 5: For each shipment, get full shipment details, including total shipment weight.

     { SPX , SHIPWT := PX.WEIGHT * SPX.QTY } WHERE PX.PNO = SPX.PNO

Note the use of a computational expression in the proto tuple here. An algebraic version of this example would involve EXTEND, and probably image relations also.

Example 6: For each part, get the part number and the total shipment quantity.

     { PX.PNO , TOTQ := SUM ( SPX WHERE SPX.PNO = PX.PNO , QTY ) }

This example illustrates the use of an aggregate operator invocation within the proto tuple (it’s also the first example to omit the WHERE clause). Incidentally, note that the following expression, though syntactically legal, would not be a correct formulation of the query (why not?):

     { PX.PNO , TOTQ := SUM ( SPX.QTY WHERE SPX.PNO = PX.PNO ) }

Answer: Because duplicate quantities would be eliminated before the sum is computed.

Example 7: Get part cities that store more than five red parts.

     { PX.CITY }
          WHERE COUNT ( PY WHERE PY.CITY = PX.CITY AND PY.COLOR = 'Red' ) > 5

Sample Constraints

Now I’d like to give some examples of the use of relational calculus to formulate constraints. The first eight are based on, and use the same numbering as, the examples in Chapter 8. I’ll assume the availability of range variables as in the previous subsection. Please note again that the formulations shown aren’t the only ones possible, in general.

Example 1: Status values must be in the range 1 to 100 inclusive.

     CONSTRAINT CX1 FORALL SX ( SX.STATUS ≥ 1 AND SX.STATUS ≤ 100 ) ;

Note: SQL allows a constraint like this one to be simplified by (in effect) eliding both the explicit range variable and, more important, the explicit universal quantification. To be specific, we can specify a base table constraint—see Chapter 8—as part of the definition of base table S that looks like this:

     CONSTRAINT CX1 CHECK ( STATUS >= 1 AND STATUS <= 100 )

Similar remarks apply to subsequent examples also.

Example 2: Suppliers in London must have status 20.

     CONSTRAINT CX2 FORALL SX ( IF SX.CITY = 'London'
                                THEN SX.STATUS = 20 ) ;

Example 3: No two tuples in relvar S have the same supplier number (i.e., {SNO} is a key, or rather a superkey, for relvar S).

     CONSTRAINT CX3 FORALL SX ( FORALL SY ( IF SX.SNO = SY.SNO THEN
                                               SX.SNAME = SY.SNAME AND
                                               SX.STATUS = SY.STATUS AND
                                               SX.CITY = SY.CITY ) ) ;

This formulation isn’t very elegant, to say the least! I’ll come back to this example and give a better formulation of it in the next section.

Example 4: Whenever two tuples in relvar S have the same supplier number, they also have the same city (in other words, the functional dependency {SNO} → {CITY} holds in relvar S).

     CONSTRAINT CX4 FORALL SX ( FORALL SY ( IF SX.SNO = SY.SNO
                                            THEN SX.CITY = SY.CITY ) ) ;

As noted in Chapter 8, this constraint is actually a logical consequence of the fact that {SNO} is a superkey for relvar S. If this latter constraint is stated, therefore, constraint CX4 needn’t be.

Example 5: No supplier with status less than 20 can supply part P6.

     CONSTRAINT CX5 FORALL SX ( IF SX.STATUS < 20 THEN
                                NOT EXISTS SPX ( SPX.SNO = SX.SNO AND
                                                 SPX.PNO = 'P6' ) ) ;

Example 6: Every supplier number in relvar SP must appear in relvar S.

     CONSTRAINT CX6 FORALL SPX ( EXISTS SX ( SX.SNO = SPX.SNO ) ) ;

As with Example 3, I’ll have more to say about this example in the next section.

Example 7: No supplier number appears in both relvar LS and relvar NLS.

     CONSTRAINT CX7 FORALL LX ( FORALL NX ( LX.SNO ≠ NX.SNO ) ) ;

LX and NX range over LS and NLS, respectively.

Example 8: Supplier S1 and part P1 must never be in different cities.

     CONSTRAINT CX8 NOT EXISTS SX ( EXISTS PX ( SX.SNO = 'S1' AND
                                                PX.PNO = 'P1' AND
                                           SX.CITY ≠ PX.CITY ) ) ;

By the way, is this constraint satisfied if there’s no tuple for supplier S1 in relvar S and/or no tuple for part P1 in relvar P? (Answer: Yes, it is.)

Example 9: There must always be at least one supplier. (There’s no counterpart to this example in Chapter 8.)

     CONSTRAINT CX9 EXISTS SX ( TRUE ) ;

The expression EXISTS SX (TRUE) evaluates to FALSE if and only if SX ranges over an empty relation. (By contrast, the expression EXISTS SX (FALSE) always evaluates to FALSE. Conversely, the expression FORALL SX (FALSE) evaluates to TRUE if and only if SX ranges over an empty relation—see the discussion of empty ranges in the next section—while the expression FORALL SX (TRUE) always evaluates to TRUE.)



[142] It might help to point out that SQL’s EXISTS is rather similar to Tutorial D’s IS_NOT_EMPTY (see Chapter 3). See the section SOME EQUIVALENCES, later.

[143] To be a little more precise about the matter: Suppose tx denotes a nonempty restriction of some table T and the restriction condition evaluates to UNKNOWN for every row in T; then EXISTS(tx) ought logically to return UNKNOWN but will actually return TRUE, in SQL.

[144] In practice, the term tuple calculus is used mainly to distinguish the version of the relational calculus discussed in the present chapter from the domain calculus, which is a version of the relational calculus in which the variables range over domains—i.e., types—instead of relations. But there’s no need to discuss the domain calculus in this book; if you want to know more, you can find a detailed explanation in my book An Introduction to Database Systems (see Appendix G).

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.139.70.248