12.11. SQL: Correlated and Existential Subqueries

This section extends the subquery work in Section 12.8 by considering correlated and existential subqueries. The subqueries discussed earlier were simple, uncorrelated sub-queries. An uncorrelated subquery is computed once, replaced by its result, and the outer query is run. In contrast, a correlated subquery relates its search condition to each row of a table named in the outer query and is effectively recomputed for each outer row (see Figure 12.58).

Figure 12.58. A correlated subquery relates its condition to each row of the outer query.


Because of repeated computation, correlated subqueries can sometimes be slow to run. However, they significantly extend the range of queries that can be expressed as a single SQL query and are well worth mastering. Let us refer to the condition inside the correlated subquery as the correlation condition. This condition typically takes the form:

a ⊖ b

where a is an inner column (listed in the from clause of the subquery), b is an outer column (listed in the from clause of the outer query), and ⊖ is a comparator (e.g., =, >).

To avoid ambiguity, the column names for a or b may need to be qualified by a table name. The three basic cases are summarized in Table 12.15. Since local scope is assumed by default, we may normally leave a unqualified (unless it occurs in two inner tables). However, if a and b have the same unqualified name, we need to distinguish b by qualifying it with its outer table name. If the outer table has the same name as the inner table, we must introduce an alias for it to distinguish it from the inner table.

Table 12.15. Correlation between an inner column a and an outer column b.
CaseSituationDeclaration
1a is the name of a column in inner table only b is the name of a column in outer table onlya ⊖ b
2a, b are the same column name but in different tables T is the base name of the outer tableaT.b
3inner and outer tables have the same base name X is an alias introduced for the outer tablea⊖X.b

For example, consider the table scheme Human (firstname. gender, height). For simplicity, we assume that people in this UoD can be identified by their firstname. A sample population is provided in Figure 12.59, along with a correlated subquery to retrieve the name and height (in centimeters) of those who are above average height for their gender. In English, correlations are often made with a pronoun, such as “their”, “his/her”, “its”, and “that”. In this context, pronouns effectively function as object variables.

Figure 12.59. Correlation between inner and outer tables with the same base name.


Here the same table name “Human” appears in the from clause of both inner and outer queries. So to distinguish the gender column in the inner table from the gender column in the outer table, the alias X is introduced for the outer table. This allows us to specify the correlation condition as “gender = X.gender”, rather than “gender = gender” (which, of course, would always be true). Here X is said to be a correlation variable. You can think of X as an extra copy of the Human table.

Alternatively, you can think of it as a tuple variable that ranges over the rows of the Human table, as depicted in Figure 12.60. Despite its set-oriented syntax, SQL queries are processed one row at a time. Here X is successively assigned each row of the Human table, passing through six states (in this example), in each of which it holds one row. For each of these states, the inner query is evaluated, computing the average height for the gender currently assigned to X. If you’re familiar with programming languages, this should remind you of a nested for loop.

Figure 12.60. In the outer query, X is successively assigned one of the six rows.


Let’s walk through the execution of the query. At the outer level, we start at the top row, so X is assigned the tuple (Ann’, ‘F’, 160). Hence in the outer level, gender = ‘F’ and height = 160. In other words, X.gender = ‘F’ and X.height = 160. In the subquery, the correlation condition “gender = X.gender” now becomes “gender = ‘F’”. So the subquery returns the average height of the females, which for the given data is 165. The condition in the outer query now becomes “160 > 165”, which is false. Since the top row does not satisfy the condition, it is filtered out, so Ann is excluded from the result.

At the outer level, we now move on to the second row, so X is assigned the tuple (‘Bill’, ‘M’, 180). Hence X.gender = ‘M’ and height = 180. In the subquery, the condition “gender = X.gender” becomes “gender = ‘M’”. So the subquery returns the average height of the males, which is 175. The condition in the outer query now becomes “180 > 175”, which is true. So Bill’s firstname and height (‘Bill’, 180) are included in the result.

Similarly we move through rows 3, 4, 5, and 6 at the outer level, recomputing the subquery each time. Carol is accepted (170 > 165 is true) but the others are rejected (because the conditions 175 > 175, 165 > 165, and 170 > 175 are all false). So the final result lists the firstname and height of just Bill and Carol as shown in Figure 12.59.

Now consider the table scheme Employee (emoNr. empName, pay, [bossNr]), where pay stores the weekly pay and bossNr is the employee number of the boss of the employee on that row. The foreign key constraint bossNr references empNr also applies. A sample population is shown in Table 12.16. Now consider the query: Who earns at least 80% of his/her boss’s pay?

Table 12.16. Who earns at least 80% of his/her boss’s pay?


Let’s try answering the question first, using the sample data: Employee 1 is eliminated because he/she has no boss. Employee 2 earns 900/1000 or 90% of his/her boss’s pay, employee 3 earns 70% of his/her boss’s pay, and employee 4 earns 780/900 or 86.67% of his/her boss’s pay. So the result should list employees 2 and 4. How can we formulate the query in SQL to work with any set of data? You might like to try it yourself before reading on.

The inclusion of the pronoun “his/her” in the question is a clue that the query can be formulated with a correlated subquery, as set out later. E is used to denote the employee at the outer level. Although not needed, a second alias B has been used to help indicate that the inner employee is a boss.

select empNr
from Employee as E
where pay > =
    (select 0.8 * pay
    from Employee as B
    where B.empNr = E.bossNr)

As an alternative, you can formulate the query using a self-join as follows:

select E.empNr
from Employee as E join Employee as B
   on E.bossNr = B.empNr
      and E.pay > = 0.8 * B.pay

To help you understand the join approach, the following list shows rows formed from the join before the final projection is made on E.empNr.

E   B   
empNrempNamepaybossNrempNrempNamepaybossNr
----------------------------------------------------------------
2Jones, E90011Davis, J1000?
4Brown, T78022Jones, E9001

As this example suggests, computing a correlated subquery is basically equivalent to computing a join, so some care is needed in the formulation of correlated subqueries to avoid unacceptable performance.

Now let’s consider existential subqueries. In predicate logic, the proposition “There is a frog” may be formulated as “∃x Fx”. Here “∃” is the existential quantifier, with the meaning “there exists at least one”, “x” is an individual variable standing for some object, and “F ...” abbreviates the predicate “... is a frog”. An existential subquery in SQL is somewhat similar. It appears in an existential condition of the form:

exists(subquery)

Here exists is used instead of “∃” for the existential quantifier, and subquery is effectively a tuple variable ranging over rows returned by the subquery. The exists quantifier may be thought of as a Boolean function, returning True if the subquery returns a row and False otherwise. The exists quantifier may be preceded by the logical not operator. The possible cases are shown here. The term “a row” means “at least one row”. Since the system always knows whether or not a row is returned, a simple two-valued logic applies—existential conditions evaluate to True or False, never Unknown.

exists (subquery)True--a row is returned (by the subquery)
 False-- no row is returned
not exists(subquery)True--no row is returned
 False-- a row is returned

Existential subqueries may be used within search conditions in a where clause, leading to overall queries of the following form:

select...
from ...
where [not] exists
 (select*...)

Since the mere existence of a row is all that counts in the subquery result, it is normal to simply use “*” instead of a column list in the select list for the existential sub-query. Prior to SQL-92, existential subqueries were the only subqueries that could return multiple columns, since in SQL-89 both membership and comparison subqueries had to return a single column. Some SQL dialects still have this restriction.

Let’s look at some examples using the populated schema in Figure 12.61. The Member table stores details about members of a martial arts club. The Ranked table indicates what rank (if any) members have achieved in what martial arts. Below black belt level, kyu grades are used, whereas dan grades are used for black belt ranks.

Figure 12.61. Martial arts club members and their ranks (if any).


Consider the query: Who (member number and name) is a black belt in at least one art? We can formulate this in SQL with an existential subquery as shown here. This is also a correlated subquery since the subquery condition refers back to the outer table (Member). Because the outer table has a different name from the inner table (Ranked), there is no need to introduce an alias for it. The existential condition checks to see if the member has a row in the Ranked table with a dan rank.

The same query could have been formulated using a membership query instead, e.g.,

select memberNr, memberName
from Member
where memberNr in
   (select memberNr from Ranked
   where rank like‘% dan’)

To list the members who do not have a black belt, we simply substitute not exists for exists and not in for in in the previous queries. Would the following query also work for this?

select memberNr, memberName

from Member natural join Ranked

where rank not like‘% dan’

As you no doubt realized, this fails because it also returns members with both a dan rank and a nondan rank. As a more complex example, suppose that club members are identified instead by combining their surname and firstname. The populated schema for this is shown in Figure 12.62.

Figure 12.62. Members have a composite reference scheme.


Now consider the query: Who is not ranked in judo? The existential subquery solution is straightforward, matching on both surname and firstname:

Although we can do this with a membership subquery, in SQL-89 the logic becomes more complex because we cannot use a column-name pair (here surname, firstname) as an argument for the in operator. In the following query, the surname column is compared using the in operator and the firstname is matched using a correlation:

select surname, firstname from Member
where surname not in
  (select surname from Ranked
   where firstname = Member.firstname
            and art = ‘judo’)

For such cases, the existential solution is easier to understand.

Yet another solution to this query is to use the concatenate operator to combine the name parts into a single string (recall that some dialects use “+” instead of “|]”):

select surname, firstname from Member
where surname || firstname not in
   (select surname || firstname from Ranked
    where art = ‘judo’)

Further examples of correlated and existential subqueries are discussed in the context of set-comparison queries in the advanced SQL supplement (Appendix C).

Exercise 12.11

  1. A population for the table scheme Pupil pupilNr, surname, firstname. gender, [ iqj) was provided in Figure 12.55. Formulate the following queries in SQL:

    1. Which person has the highest IQ?

    2. Who has the highest IQ for his/her gender?

    3. List each male, female pair where both genders have the same IQ.

  2. For the Member and Ranked tables discussed in Figure 12.62, formulate the following in SQL.

    1. Who holds a kyu rank?

    2. Which genders hold a kyu rank?

    3. Who is not ranked?

    4. Who does not hold a kyu rank?

  3. The following schema refers to part of the student database considered in Exercise 12.1. Specify the following queries in SQL.

    1. List the student number and name of each student who did not score any 7s. (Use a subquery.)

    2. Same question, but use a join instead of a subquery. (Hint: 7 is the highest rating.)

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.147.140.206