EVALUATING SQL TABLE EXPRESSIONS

In addition to natural join, Codd originally defined an operator he called θ-join, where θ denoted any of the usual scalar comparison operators (“=”, “≠”, “<”, and so on). Now, θ-join isn’t primitive; in fact, it’s defined to be a restriction of a product. Here by way of example is the “not equals” join of suppliers and parts on cities (so θ here is “≠”):

image with no caption

Now let’s focus on the SQL formulation specifically. You can think of the expression constituting that formulation as being evaluated in three steps, as follows:

  1. The FROM clause is evaluated and yields the product of tables S and P. Note: If we were doing this relationally, we would have to rename at least one of the CITY attributes before that product could be computed. SQL gets away with renaming them afterward because its tables have a left to right ordering to their columns, meaning it can distinguish the two CITY columns by their ordinal position. For simplicity, let’s ignore this detail.

  2. Next, the WHERE clause is evaluated and yields a restriction of that product by eliminating rows in which the two city values are equal. Note: If θ had been “=” instead of “≠” (or “<>”, rather, in SQL), this step would have been: Restrict the product by retaining just the rows in which the two city values are equal—in which case we would now have formed what’s called the equijoin of suppliers and parts on cities. In other words, an equijoin is a θ-join for which θ is “=”. Exercise: What’s the difference between an equijoin and a natural join?

  3. Finally, the SELECT clause is evaluated and yields a “projection” of that restriction on the columns specified in the SELECT clause—“projection” in quotes, because it won’t actually eliminate duplicates, as true projection does, unless DISTINCT is specified. (Actually it’s doing some renaming as well, in this particular example, and I mentioned earlier in this chapter that SELECT provides other functionality too, in general—but for now I want to ignore these details as well, for simplicity.)

At least to a first approximation, then, the FROM clause corresponds to a product, the WHERE clause to a restriction, and the SELECT clause to a projection; thus, the overall SELECT - FROM - WHERE expression denotes a projection of a restriction of a product. It follows that I’ve just given a loose, but reasonably formal, definition of the semantics of SQL’s SELECT - FROM - WHERE expressions; equivalently, I’ve given a conceptual algorithm for evaluating such expressions. Now, there’s no implication that the implementation has to use exactly that algorithm in order to evaluate such expressions; au contraire, it can use any algorithm it likes, just so long as whatever algorithm it does use is guaranteed to give the same result as the conceptual one. And there are often good reasons—usually performance reasons—for using a different algorithm, thereby (for example) evaluating the clauses in a different order or otherwise rewriting the original query. However, the implementation is free to do such things only if it can be proved that the algorithm it does use is logically equivalent to the conceptual one. Indeed, one way to characterize the job of the optimizer is to find an algorithm that’s guaranteed to be equivalent to the conceptual one but performs better ... which brings us to the next section.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.216.166.101