UNION, INTERSECTION, AND DIFFERENCE

Union, intersection, and difference (UNION, INTERSECT, and MINUS in Tutorial D; UNION, INTERSECT, and EXCEPT in SQL) all follow the same general pattern. I’ll start with union.

Union

Definition: Let relations r1 and r2 be of the same type; then their union, r1 UNION r2, is a relation of the same type, with body consisting of all tuples t such that t appears in r1 or r2 or both.

For example (I’ll assume for the sake of all of the examples in this section that parts have an extra attribute called STATUS, of type INTEGER):

image with no caption

As with projection, it’s worth noting explicitly in connection with union that “duplicates are eliminated.” Note that we don’t need to specify DISTINCT in the SQL version in order to achieve this effect; although UNION provides the same options as SELECT does (DISTINCT vs. ALL), the default for UNION is DISTINCT, not ALL (for SELECT it’s the other way around, as you’ll recall from Chapter 4). The result heading has attributes or columns STATUS and CITY—in that order, in SQL. As for the CORRESPONDING specification in the SQL formulation, that specification allows us to ignore the possibility that those columns might appear at different ordinal positions within the operand tables. Recommendations:

  • Make sure every column of the first operand table has the same name and type as some column of the second operand table and vice versa.[82]

  • Always specify CORRESPONDING if possible.[83] If it isn’t—in particular, if the SQL product you’re using doesn’t support it—then make sure columns line up properly, as in this revised version of the example:

         SELECT STATUS , CITY FROM P
         UNION
         SELECT STATUS , CITY FROM S  /* note the reordering */
  • Don’t include the “BY (column name commalist)” option in the CORRESPONDING specification, unless it makes no difference anyway (e.g., specifying BY (STATUS,CITY) would make no difference in the example).[84] Note: This recommendation is perhaps a little debatable. At least the BY option might sometimes save keystrokes (though not always—see the example below). But it’s misleading, because it means the union operands aren’t the specified tables as such but certain projections of those tables; it’s also unnecessary, because those projections could always be specified explicitly anyway. For example, the SQL expression

         SELECT * FROM P
         UNION  CORRESPONDING BY ( CITY )
         SELECT * FROM S

    is logically equivalent to this (shorter!) one:

         SELECT CITY FROM P
         UNION
         SELECT CITY FROM S
  • Never specify ALL. Note: The usual reason for specifying ALL on UNION isn’t that users want to see duplicate rows in the output; rather, it’s that they know there aren’t any duplicate rows in the input—i.e., the union is disjoint (see below)—and so they’re trying to prevent the system from having to do the extra work of trying to eliminate duplicates that they know aren’t there in the first place. In other words, it’s a performance reason. See the discussion of such matters in Chapter 4, in the section AVOIDING DUPLICATES IN SQL.

Tutorial D also supports “disjoint union” (D_UNION), which is a version of union that requires its operands to have no tuples in common. For example:

     S { CITY } D_UNION P { CITY }

Given our usual sample data, this expression will produce a run time error, because supplier cities and part cities aren’t disjoint. SQL has no direct counterpart to D_UNION.

Tutorial D also supports n-adic forms of both UNION and D_UNION. The syntax consists—with one small exception, explained below—of the operator name (i.e., UNION or D_UNION), followed by a commalist in braces of relational expressions r1, r2, ..., rn. The relations denoted by r1, r2, ..., rn must all be of the same type. For example, the foregoing D_UNION example could alternatively be expressed as follows

     D_UNION { S { CITY } , P { CITY } }

Note: The union or disjoint union of a single relation r is just r. The union or disjoint union of no relations at all is the empty relation of the pertinent type—but that type needs to be specified explicitly, since there aren’t any relational expressions from which the type can be inferred. Thus, for example, the expression

     UNION { SNO CHAR , STATUS INTEGER } { }

denotes the empty relation of type RELATION {SNO CHAR, STATUS INTEGER}. Compare the answer to Exercise 3.15 in Chapter 3 (see Appendix F).

Intersection

Definition: Let relations r1 and r2 be of the same type; then their intersection, r1 INTERSECT r2, is a relation of the same type, with body consisting of all tuples t such that t appears in both r1 and r2.

For example:

image with no caption

All comments and recommendations noted under “Union” apply here also, mutatis mutandis. Note: As we’ve already seen, intersect is really just a special case of join. Tutorial D and SQL both support it, however, if only for psychological reasons. As mentioned in a footnote earlier, Tutorial D also supports an n-adic form, but I’ll skip the details here.

Difference

Definition: Let relations r1 and r2 be of the same type; then their difference, r1 MINUS r2 (in that order), is a relation of the same type, with body consisting of all tuples t such that t appears in r1 and not r2.

For example:

image with no caption

All comments and recommendations noted under “Union” apply here also, mutatis mutandis. Note, however, that minus is strictly dyadic—Tutorial D doesn’t support any kind of “n-adic minus” operation (see Exercise 6.17 at the end of the chapter). But it does support “included minus” (I_MINUS), which is a version of minus that requires the second operand to be included in the first (i.e., the second operand mustn’t have any tuples that aren’t also in the first operand). For example:

S { CITY } I_MINUS P { CITY }

Given our usual sample data, this expression will produce a run time error, because there’s at least one part city that isn’t also a supplier city. SQL has no direct counterpart to I_MINUS.



[82] Another SQL question for you: Does SQL in fact allow those corresponding columns to be of different types?

[83] I omitted CORRESPONDING from examples in earlier chapters because at the time it would only have been distracting.

[84] In the interest of completeness, I note that omitting the BY option is actually equivalent to specifying BY (A,B,...,C), where A, B, ..., C are all of the common columns, in the left to right order in which they appear in the first operand table.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.143.22.58