MORE ON CLOSURE

To say it again, the result of every relational operation is a relation. Conversely, any operator that produces a result that isn’t a relation is, by definition, not a relational operator.[73] For example, any operator that produces an ordered result isn’t a relational operator (see the discussion of ORDER BY in the next chapter). And in SQL in particular, the same is true of any operator that produces a result with duplicate rows, or left to right column ordering, or nulls, or anonymous columns, or duplicate column names. Closure is crucial! As I’ve already said, closure is what makes it possible to write nested expressions in the relational model, and (as we’ll see later) it’s also important in expression transformation, and hence in optimization. Strong recommendation: Don’t use any operation that violates closure if you want the result to be amenable to further relational processing.

Now, when I say the result of every algebraic operation is another relation, I hope it’s clear that I’m talking from a conceptual point of view; I don’t mean the system always has to materialize those results in their entirety. For example, consider the following expression (a restriction of a join—Tutorial D on the left and SQL on the right as usual, and I’ve deliberately shown all name qualifications explicitly in the SQL version):[74]

image with no caption

Clearly, as soon as any given tuple of the join is formed, the system can test that tuple right away against the restriction condition PNAME > SNAME (P.PNAME > S.SNAME in the SQL version) to see if it belongs in the final output, discarding it if not. Thus, the intermediate result that’s the output from the join might never have to exist as a fully materialized relation in its own right at all. In practice, in fact, the system tries very hard not to materialize intermediate results in their entirety, for obvious performance reasons. (As an aside, I remark that the process by which tuples of an intermediate result are produced and passed on to another operation one at a time instead of en bloc is sometimes referred to as pipelining.)

The foregoing example raises another point, however. Consider the boolean expression PNAME > SNAME in the Tutorial D version. That expression applies, conceptually, to the result of P JOIN S, and the attribute names PNAME and SNAME in that expression therefore refer to attributes of that result—not to the attributes of the same names in relvars P and S. But how do we know that result has any such attributes? What is the heading of that result? More generally, how do we know what the heading is for the result of any algebraic operation? Clearly, what we need is a set of rules—to be specific, relation type inference rules—such that if we know the headings (and therefore the types) of the input relations for an operation, we can infer the heading (and therefore the type) of the output relation from that operation. And the relational model does include such a set of rules. In the case of join, for example, those rules say the output from P JOIN S is of this type:

     RELATION { PNO CHAR , PNAME CHAR , COLOR CHAR , WEIGHT RATIONAL ,
                CITY CHAR , SNO CHAR , SNAME CHAR , STATUS INTEGER }

In fact, for join, the heading of the output is the union of the headings of the inputs (where by union I mean the regular set theory union, not the special relational union I’ll be discussing later in this chapter). In other words, the output has all of the attributes of the inputs, except that common attributes—just CITY in the example—appear once, not twice, in that output. Of course, those attributes don’t have any left to right order, so I could equally well say the type of the result of P JOIN S is (for example):

     RELATION { SNO CHAR , PNO CHAR , SNAME CHAR , WEIGHT RATIONAL ,
                CITY CHAR , STATUS INTEGER , PNAME CHAR , COLOR CHAR }

Note that type inference rules of some kind are definitely needed in order to support the closure property fully—closure says every result is a relation, and relations have a heading as well as a body; thus, every result must have a proper relational heading as well as a proper relational body.

Now, the RENAME operator mentioned in the previous section is needed in large part because of the foregoing type inference rules; it allows us to perform, e.g., a join, even when the relations involved don’t meet the attribute naming requirements for that operation (speaking a trifle loosely). Here’s the definition:

Definition: Let r be a relation and let A be an attribute of r. Then the (attribute) renaming r RENAME {A AS B} is a relation with (a) heading identical to that of r except that attribute A in that heading is renamed B and (b) body identical to that of r (except that references to A in that body are replaced by references to B, a nicety that can be ignored for present purposes). Note: I assume for simplicity that relation r doesn’t already have an attribute named B.

For example:

image with no caption

Given our usual sample values, the result looks like this (it’s identical to our usual suppliers relation, except that the city attribute is called SCITY):

SNO

SNAME

STATUS

SCITY

S1

Smith

20

London

S2

Jones

10

Paris

S3

Blake

30

Paris

S4

Clark

20

London

S5

Adams

30

Athens

Note: I won’t usually bother to show results explicitly in this chapter unless I think the particular operator I’m talking about might be unfamiliar to you, as in the case at hand.

Important: The foregoing example does not change relvar S in the database! RENAME isn’t like SQL’s ALTER TABLE; the RENAME invocation is only an expression (just as, for example, P JOIN S or N + 2 are only expressions), and like any expression it simply denotes a value. What’s more, since it is an expression, not a statement or “command,” it can be nested inside other expressions. We’ll see plenty of examples of such nesting later.

So how does SQL handle this business of result type inference? The answer is: Not very well. First of all, as we saw in Chapter 3, it doesn’t really have a notion of “relation type” (or table type, rather) anyway. Second, it can produce results with columns that effectively have no name at all (for example, consider SELECT PNO, 2 * WEIGHT FROM P). Third, it can also produce results with duplicate column names (for example, consider SELECT DISTINCT P.CITY, S.CITY FROM P, S). Strong recommendation: Follow the column naming discipline from Chapter 3 wherever necessary to ensure that SQL conforms as far as possible to the relational rules described in this chapter. Just to remind you, that discipline involved using AS specifications to give proper column names to columns that otherwise (a) wouldn’t have a name at all or (b) would have a name that wasn’t unique. My SQL examples in this chapter and the next (indeed, throughout the rest of this book) will all abide by this discipline.

I haven’t finished with the example from the beginning of this section. Here it is again:

image with no caption

As you can see, the counterpart in the SQL version to Tutorial D’s PNAME > SNAME is P.PNAME > S.SNAME (note the “P.” and “S.” qualifiers)—which is curious when you come to think about it, because that expression is supposed to apply to the result of the FROM clause (see the section “Evaluating SQL Expressions,” later), and tables P and S certainly aren’t part of that result! Indeed, it’s quite difficult to explain how references to the names P and S in the WHERE and SELECT clauses (and possibly elsewhere in the overall expression) can make any sense at all in terms of the result of the FROM clause. The SQL standard does explain it, but the machinations it has to go through in order to do so are much more complicated than Tutorial D’s type inference rules—so much so that I won’t even try to explain them here, but will simply rely on the fact that they can be explained if necessary. I justify this omission by appealing to the fact that you’re supposed to be familiar with SQL already. It’s tempting to ask, though, whether you had ever thought about this issue before ... but I won’t.

Now I can go on to describe some other algebraic operators. Please note that I’m not trying to be exhaustive in this chapter (or the next); I won’t be covering “all known operators,” and I won’t even describe all of the operators I do cover in full generality. In most cases, in fact, I’ll just give a careful but somewhat informal definition and show some simple examples.



[73] With one slight exception: Some writers regard relational inclusion (“⊆”) as a relational operation—more specifically, as part of the relational algebra—even though it produces a result that’s a truth value, not a relation. The point isn’t very important, however; certainly it’s not worth fighting over here.

[74] I assume for the sake of the example that the comparison PNAME > SNAME is a sensible one—though if it is, then attributes PNAME and SNAME must presumably represent “the same kind of information,” and in accordance with my own recommendations in Chapter 3 I ought perhaps to have given them the same name.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.220.108.111