SUBQUERIES

A subquery in SQL is a table expression, tx say, enclosed in parentheses; if the table denoted by tx is t, the table denoted by the subquery is t also. Note, however, that (as mentioned in Chapter 1 and Chapter 6) the expression tx can’t be an explicit JOIN expression. Thus, for example,

     ( A NATURAL JOIN B )

isn’t a legal subquery.[169] By contrast, the following expression is a legal subquery:

     ( SELECT * FROM A NATURAL JOIN B )

Subqueries fall into three categories (though the syntax is the same in every case). The details, partly repeated from earlier chapters, are as follows:

  • A table subquery is a subquery that’s neither a row subquery nor a scalar subquery.

  • A row subquery is a subquery appearing in a position where a row expression is expected. Let rsq be such a subquery; then rsq must denote a table with just one row. Let the table in question be t, and let the single row in t be r; then rsq behaves as if it denoted that row r (in other words, t is coerced to r). Note: If rsq doesn’t denote a table with just one row, then (a) if it denotes a table with n rows (n > 1), an error is raised; (b) if it denotes a table with no rows at all, then that table is treated as if it contained just one row, where the row in question contains a null in every column position.

  • A scalar subquery is a subquery appearing in a position where a scalar expression is expected. Let ssq be such a subquery; then ssq must denote a table with just one row and just one column. Let the table in question be t, let the single row in t be r, and let the single value in r be v; then ssq behaves as if it denoted that value v (in other words, t is coerced to r, and then r is coerced to v). Note: If ssq doesn’t denote a table with just one row and just one column, then (a) if it denotes a table with m columns (m > 1), an error is raised (probably at compile time); (b) if it denotes a table with one column and n rows (n > 1), an error is raised (probably at run time); (c) if it denotes a table with one column and no rows at all, then that table is treated as if it contained just one row, where the row in question contains a single null.

The following examples involve, in order, a table subquery, a row subquery, and a scalar subquery:

     SELECT SNO
     FROM   S
     WHERE  CITY IN
          ( SELECT CITY             /* table subquery */
            FROM   P
            WHERE  COLOR = 'Red' )

     UPDATE S
     SET  ( STATUS , CITY ) =
          ( SELECT STATUS , CITY    /* row subquery */
            FROM   S
            WHERE  SNO = 'S1' )
     WHERE  CITY = 'Paris' ;

     SELECT SNO
     FROM   S
     WHERE  CITY =
          ( SELECT CITY             /* scalar subquery */
            FROM   P
            WHERE  PNO = 'P1' )

Next, a correlated subquery is a special kind of (table, row, or scalar) subquery; to be specific, it’s a subquery that includes a reference to some “outer” table. In the following example, the parenthesized expression following the keyword IN is a correlated subquery, because it includes a reference to the outer table S (the query is “Get names of suppliers who supply part P1”):

     SELECT DISTINCT S.SNAME
     FROM   S
     WHERE  'P1' IN
           ( SELECT PNO              /* correlated subquery */
             FROM   SP
             WHERE  SP.SNO = S.SNO )

As noted in Chapter 11, correlated subqueries are often contraindicated from a performance point of view, because—conceptually, at any rate—they have to be evaluated once for each row in the outer table instead of just once and for all. (In the example, if the overall expression is evaluated as stated, the subquery will be evaluated n times, where n is the number of rows in table S.) For that reason, it’s a good idea to avoid correlated subqueries if possible. In the case at hand, it’s very easy to reformulate the query to achieve this goal:

     SELECT DISTINCT S.SNAME
     FROM   S
     WHERE  SNO IN
          ( SELECT SNO              /* noncorrelated subquery */
            FROM   SP
            WHERE  PNO = 'P1' )

Finally, a “lateral” subquery is a special kind of correlated subquery. To be specific, it’s a correlated subquery that (a) appears in a FROM clause and (b) includes a reference to an “outer” table that’s defined by a table specification appearing earlier in that same FROM clause. For example, consider the query “For each supplier, get the supplier number and the number of parts supplied by that supplier.” Here’s one possible formulation of that query in SQL:

     SELECT S.SNO , TEMP.PCT
     FROM   S , LATERAL ( SELECT COUNT ( PNO ) AS PCT
                          FROM   SP
                          WHERE  SP.SNO = S.SNO ) AS TEMP

The purpose of the keyword LATERAL is to tell the system that the subquery to which it’s prefixed is correlated with something previously mentioned in the very same FROM clause (in the example, that “lateral” subquery yields exactly one value—namely, the applicable count—for each SNO value in table S). Given the sample values in Figure 1-1 in Chapter 1, the result looks like this:

SNO

PCT

S1

6

S2

2

S3

1

S4

3

S5

0

Now, there’s something going on here that you might be forgiven for finding a bit confusing. The items in a FROM clause are table specifications, and so they denote tables. In the example, though, the particular table specification that begins with the keyword LATERAL—more precisely, what remains of that table specification if the keyword LATERAL is removed—looks more like what might be called a scalar specification, or more precisely a scalar subquery; certainly it could used as such, should the context demand such an interpretation (e.g., in a SELECT clause). In fact, however, it’s a table subquery. The table it denotes, for a given value of S.SNO, is called TEMP; that table has just one column, called PCT, and just one row, and hence in fact contains a single scalar value. Then the expression TEMP.PCT in the SELECT clause causes that scalar value to become the contribution of table TEMP to the applicable result row (just as the expression S.SNO in that same SELECT clause causes the applicable SNO value to become the contribution of table S to that result row).

Following on from the foregoing rather complicated explanation, I feel bound to add that it’s not exactly clear why “lateral” subqueries are needed anyway. Certainly the foregoing example can easily be reformulated in such a way as to avoid the “need” (?) for any such thing:

     SELECT S.SNO , ( SELECT COUNT ( PNO )
                      FROM   SP
                      WHERE  SP.SNO = S.SNO ) AS PCT
     FROM   S

The subquery has moved from the FROM clause to the SELECT clause; it still refers to something else in the same clause (S.SNO, to be specific), but now the keyword LATERAL is no longer needed (?). However, do note what’s happened to the specification AS PCT, which appeared inside the subquery in the LATERAL formulation but has now moved outside (this point is discussed in more detail in an aside in the section SUMMARIZATION in Chapter 7).

Finally: I’ve defined the term subquery; perhaps it’s time to define the term query, too!—even though I’ve used that term ubiquitously throughout previous chapters. So here goes: A query is a retrieval request; in the SQL context, in other words, it’s either a table expression—though such expressions can also be used in contexts other than queries per se—or a statement, such as a SELECT statement in “direct” (i.e., interactive) SQL, that asks for such an expression to be evaluated. Note: The term is sometimes used (though not in this book!) to refer to an update request also. It’s also used to refer to the natural language version of some retrieval or update request.



[169] It was legal in SQL:1992 but became illegal in SQL:2003.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.133.121.209