Chapter 10. Correlated Subqueries

A correlated subquery is an inner subquery whose information is referenced by the main, outer query such that the inner query may be thought of as being executed repeatedly. In this chapter, we discuss correlated subqueries in detail. We discuss existence queries (EXISTS) and correlation as well as NOT EXISTS. We also take a look at SQL’s universal and existential qualifiers. Before discussing correlated subqueries in detail however, let’s make sure that you understand what constitutes a noncorrelated subquery.

Noncorrelated Subqueries

A noncorrelated subquery is a subquery that is independent of the outer query. In other words, the subquery could be executed on its own. The following is an example of a query that is not correlated:

SELECT  s.sname
FROM    Student s
WHERE   s.stno IN
             (SELECT gr.student_number
             FROM    Grade_report gr
             WHERE   gr.grade = 'A')

The first part of the preceding query (the first three lines) is the main, outer query, and the second part (the part in parentheses) is the subquery (also referred to as an inner, nested, or embedded query). To demonstrate that this subquery is an independent entity, you could run it by itself:

SELECT   gr.student_number
 FROM    Grade_report gr
 WHERE   gr.grade = 'A'

which would produce the following output (17 rows):

student_number
--------------
2
3
8
8
10
14
20
129
142
129
34
49
123
125
126
127
142
 
(17 row(s) affected)

The preceding subquery is thought of as being evaluated first, creating the set of student numbers who have As. Then, the subquery’s result set is used to determine which rows (tuples) in the main query will be SELECTed. So, the full query results in the following output (14 rows):

sname
--------------------
Lineas
Mary
Brenda
Richard
Lujack
Donald
Lynette
Susan
Holly
Sadie
Jessica
Steve
Cedric
Jerry
 
(14 row(s) affected)

Correlated Subqueries

As stated at the beginning of the chapter, a correlated subquery is an inner subquery whose information is referenced by the main, outer query such that the inner query may be thought of as being executed repeatedly.

Correlated subqueries present a different execution scenario to the database manipulation language (DML) than do ordinary, noncorrelated subqueries. The correlated subquery cannot stand alone, as it depends on the outer query; therefore, completing the subquery prior to execution of the outer query is not an option. The efficiency of the correlated subquery varies; it may be worthwhile to test the efficiency of correlated subqueries versus joins or sets.

Tip

One situation in which you cannot avoid correlation is the “for all” query, which is discussed later in this chapter.

To illustrate how a correlated subquery works, the following is an example of the non-correlated subquery from the previous section revised as a correlated subquery:

SELECT  s.sname
FROM    Student s
WHERE   s.stno IN
  (SELECT    gr.student_number
   FROM      Grade_report gr
   WHERE     gr.student_number = s.stno
   AND       gr.grade = 'A')

This query produces the following output (14 rows), which is the same as the output of the noncorrelated subquery (shown earlier):

sname
--------------------
Lineas
Mary
Brenda
Richard
Lujack
Donald
Lynette
Susan
Holly
Sadie
Jessica
Steve
Cedric
Jerry
 
(14 row(s) affected)

In this example, the inner query (the part in parentheses) references the outer one—observe the use of s.stno in the WHERE clause of the inner query. Rather than thinking of this query as creating a set of student numbers that have As, each row from the outer query can be considered to be SELECTed individually and tested against all rows of the inner query one at a time until it is determined whether a given student number is in the inner set and whether that student earned an A.

This query was illustrated with and without correlation. You might think that a correlated subquery is less efficient than doing a simple subquery, because the simple subquery is done once, whereas the correlated subquery is done once for each outer row. However, the internal handling of how the query executes depends on the SQL and the optimizer for that database engine.

The correlated subquery acts like a nested DO loop in a programming language, where the first row from the Student table is SELECTed and tested against all the rows in the Grade_report table, and then the second Student row is SELECTed and tested against all rows in the Grade_report table. The following is the DO loop in pseudocode:

LOOP1: For each row in Student  s  DO
      LOOP2: For each row in Grade_report  gr  DO
             IF (gr.student_number = s.stno) THEN
                   IF (gr.grade = 'B') THEN TRUE
      END LOOP2;
     IF TRUE, THEN Student row is SELECTed
END LOOP1

Existence Queries and Correlation

Correlated queries are often written so that the question in the inner query is one of existence. For example, suppose you want to find the names of students who have taken a computer science (COSC) class and have earned a grade of B in that course. This query can be written in several ways. For example, you can write it as a noncorrelated subquery as follows:

SELECT  s.sname
FROM    Student s
WHERE   s.stno IN
 (SELECT   gr.student_number FROM Grade_report gr, Section
  WHERE    Section.section_id = gr.section_id
  AND         Section.course_num LIKE 'COSC%'
  AND         gr.grade = 'B')

This query produces the following output (17 rows):

sname
--------------------
Lineas
Mary
Brenda
Lujack
Reva
Harley
Chris
Lynette
Hillary
Phoebe
Holly
George
Cramer
Fraiser
Francis
Lindsay
Stephanie
 
(17 row(s) affected)

You can think of this query as first forming the set of student numbers of students who have made Bs in COSC courses—the inner query result set. In the inner query, you must have both the Grade_report table (for the grades) and the Section table (for the course numbers). Once you form this set of student numbers (by completing the inner query), the outer query looks through the Student table and SELECTs only those students who are in the inner query set.

Tip

This query could also be done by creating a double-nested subquery containing two INs, or it could be written using a three-table join.

Had we chosen to write the query with an unnecessary correlation, it might look like this:

SELECT  s.sname
FROM    Student s
WHERE   s.stno IN
 (SELECT      gr.student_number
  FROM        Grade_report gr, Section
  WHERE       Section.section_id = gr.section_id
  AND           Section.course_num LIKE 'COSC%'
  AND           gr.student_number = s.stno
  AND           gr.grade = 'B')

The output of this query would be the same as the previous query. In this case, the use of the Student table in the subquery is unnecessary. Although correlation is unnecessary, this example is included to show the following:

  • When correlation is necessary

  • How to untangle unnecessarily correlated queries

  • How you might migrate your thought process toward correlation, should it be necessary

First, let’s look at situations in which the correlation of a subquery is necessary, and introduce a new predicate: EXISTS.

Using EXISTS

In situations in which the correlation of a subquery is necessary, you can write the correlated subquery with the EXISTS predicate, which looks like this:

SELECT s.sname
FROM   Student s
WHERE EXISTS
 (SELECT 1 FROM Grade_report gr, Section
  WHERE Section.section_id = gr.section_id
  AND     Section.course_num LIKE 'COSC%'
  AND     gr.student_number  = s.stno
  AND     gr.grade = 'B')

The output of this query would be the same as the output (17 rows) of both of the previous queries.

Let’s dissect this query. The EXISTS predicate says, “Choose the row from the Student table in the outer query if the subquery is true (that is, if a row in the subquery exists that satisfies the condition in the subquery WHERE clause).” Because no actual result set is formed, "SELECT 1” is used as a “dummy” result set to indicate that the subquery is true (1 is returned) or false (no rows are returned). In the noncorrelated case, we tied the student number in the Student table to the inner query by the IN predicate as follows:

SELECT   s.stno
FROM     Student s
WHERE    s.stno IN
   (SELECT "student number ...)

When using the EXISTS predicate, we do not use any column of the Student table, but rather are seeking only to find whether the subquery WHERE can be satisfied.

We have indicated that we are using EXISTS with (SELECT 1...). Using the EXISTS predicate, the subquery does not form a result set per se, but rather causes EXISTS to returns true or false. The use of SELECT * in the inner query is common among SQL programmers. However, from an “internal” standpoint, SELECT * causes the SQL engine to check the data dictionary unnecessarily. As the actual result of the inner query is not important, it is strongly suggested that you use SELECT 'X' (or SELECT 1 ...) instead of SELECT * ... so that a constant is SELECTed instead of some “sensible” entry. The SELECT 'X' .. or SELECT 1 ... is simply more efficient.

In the EXISTS case, we do not specify any columns to be SELECTed in the inner query’s result set; rather, we use a dummy result--SELECT 'X' (or we could use SELECT 1). If the subquery WHERE is satisfied, it returns true, and if the inner query is not satisfied, it selects nothing, then the subquery returns false. The EXISTS predicate forces us to correlate the query. To illustrate that correlation is usually necessary with EXISTS, consider the following query:

SELECT   s.sname
FROM     Student s
WHERE EXISTS
 (SELECT 'X' FROM Grade_report gr, Section t
  WHERE   t.section_id = gr.section_id
  AND     t.course_num LIKE 'COSC%'
  AND     gr.grade = 'B')

This query produces 48 rows of output (of which we show the first 20 rows):

sname
--------------------
Lineas
Mary
Zelda
Ken
Mario
Brenda
Romona
Richard
Kelly
Lujack
Reva
Elainie
Harley
Donald
Chris
Jake
Lynette
Susan
Monica
Bill.
.
.
 
(48 row(s) affected)

This query uses EXISTS, but has no correlation. This syntax infers that for each student row, we test the joined Grade_report and Section tables to see whether there is a course number like COSC and a grade of B (which, of course, there is). We unnecessarily ask the subquery question over and over again. The result from this latter, uncorrelated EXISTS query is the same as the following:

SELECT  s.sname
FROM    Student s

The point is that the correlation is usually necessary when we use EXISTS.

Consider another example in which a correlation could be used. Suppose that we want to find the names of all students who have three or more Bs. A first pass at a query might be something like this:

SELECT  s.sname
FROM    Student s WHERE "something" IN
 (SELECT "something"
  FROM    Grade_report
  WHERE  "count of grade = 'B'" > 2)

This query can be done with a HAVING clause, as you saw previously (Chapter 9), but we want to show how to do this in yet another way. Suppose we arrange the subquery to use the student number (stno) from the Student table as a filter and count in the subquery only when a row in the Grade_report table correlates to that student. The query (this time with an implied EXISTS) looks like this:

SELECT   s.sname
FROM     Student s
WHERE 2 < (SELECT COUNT(*)
           FROM     Grade_report gr
           WHERE    gr.student_number = s.stno
           AND      gr.grade = 'B')

which results in the following output (8 rows):

sname
--------------------
Lineas
Mary
Lujack
Reva
Chris
Hillary
Phoebe
Holly
 
(8 row(s) affected)

Although there is no EXISTS in this query, it is implied. The syntax of the query does not allow an EXISTS, but the sense of the query is "WHERE EXISTS a COUNT of 2 which is less than...” In this correlated subquery, we have to examine the Grade_report table for each member of the Student table to see whether the student has more than two Bs. We test the entire Grade_report table for each student row in the outer query.

If it were possible, a subquery without the correlation would be more desirable, because it would appear simpler to understand. The overall query might be as follows:

SELECT  s.sname
FROM    Student s
WHERE   s.stno IN
 (subquery that defines a set of students who have made 3 Bs)

Therefore, we might attempt to write the following query:

SELECT   s.sname
FROM     Student s
WHERE    s.stno IN
  (SELECT   gr.student_number
    FROM    Grade_report gr
    WHERE   gr.grade = 'B')

However, as the following output (27 rows) shows, this query would give us only students who earned at least one B:

sname
--------------------
Lineas
Mary
Zelda
Ken
Mario
Brenda
Kelly
Lujack
Reva
Harley
Chris
Lynette
Hillary
Phoebe
Holly
Sadie
Jessica
Steve
Cedric
George
Cramer
Fraiser
Francis
Smithly
Sebastian
Lindsay
Stephanie
 
(27 row(s) affected)

To get a list of students who have earned at least three Bs, we could try the following query:

SELECT   s.sname
FROM     Student s
WHERE    s.stno IN
 (SELECT    gr.student_number, COUNT(*)
  FROM      Grade_report gr
  WHERE     gr.grade = 'B'
  GROUP BY  gr.student_number
  HAVING    COUNT(*) > 2)

However, this approach does not work, because the subquery cannot have two columns in its result set unless the main query has two columns in the WHERE .. IN.

Here, the subquery must have only gr.student_number to match s.stno. So, we might try to construct an inline view, as shown in the following query:

SELECT  s.sname
FROM    Student s
WHERE   s.stno IN
  (SELECT vi.student_number
   FROM  (SELECT    student_number, ct = COUNT(*)
                 FROM      Grade_report gr
                 WHERE     gr.grade = 'B'
                 GROUP BY student_number
                 HAVING COUNT(*) > 2) AS vi)

This is an example of the inline view, discussed in Chapter 6. This query succeeds in SQL Server 2005, producing the following output (8 rows):

sname
--------------------
Lineas
Mary
Lujack
Reva
Chris
Hillary
Phoebe
Holly
 
(8 row(s) affected)

Tip

This query also works in Oracle, but it may fail in other SQL languages.

As you can see, several ways exist to query the database with SQL. In this case, the correlated subquery may be the easiest to see and perhaps the most efficient.

From IN to EXISTS

A simple example of converting from IN to EXISTS--uncorrelated to correlated (or vice versa)--would be to move the set test in the WHERE .. IN of the uncorrelated subquery to the WHERE of the EXISTS in the correlated query.

As an example, consider the following uncorrelated subquery:

SELECT *
FROM   Student s
WHERE  s.stno  IN
  (SELECT  g.student_number
   FROM    Grade_report g
   WHERE   grade = 'B')

The following is the same query written as a correlated subquery:

SELECT *
FROM   Student s
WHERE EXISTS
  (SELECT   g.student_number
   FROM     Grade_report g
   WHERE    grade = 'B'
    AND     s.stno = g.student_number)

This query produces 27 rows of output (of which we show the first 15 rows):

STNO   SNAME                MAJOR CLASS  BDATE
------ -------------------- ----- ------ -----------------------
2      Lineas               ENGL  1      1980-04-15 00:00:00
3      Mary                 COSC  4      1978-07-16 00:00:00
5      Zelda                COSC  NULL   1978-02-12 00:00:00
6      Ken                  POLY  NULL   1980-07-15 00:00:00
7      Mario                MATH  NULL   1980-08-12 00:00:00
8      Brenda               COSC  2      1977-08-13 00:00:00
13     Kelly                MATH  4      1980-08-12 00:00:00
14     Lujack               COSC  1      1977-02-12 00:00:00
15     Reva                 MATH  2      1980-06-10 00:00:00
19     Harley               POLY  2      1981-04-16 00:00:00
24     Chris                ACCT  4      1978-02-12 00:00:00
34     Lynette              POLY  1      1981-07-16 00:00:00
121    Hillary              COSC  1      1977-07-16 00:00:00
122    Phoebe               ENGL  3      1980-04-15 00:00:00
123        Holly                  POLY  4       1981-01-15 00:00:00.
.
.
 
(27 row(s) affected)

This example gives you a pattern to move from one kind of query to the other kind and to test the efficiency of both kinds of queries. Both of the preceding queries should produce the same output.

NOT EXISTS

As with the IN predicate, which has a NOT IN compliment, EXISTS may also be used with NOT. In some situations, the predicates EXISTS and NOT EXISTS are vital. For example, if we ask a “for all” question, it must be answered by “existence"--actually, the lack thereof (that is, “not existence”). In logic, the statement, “find x for all y” is logically equivalent to “do not find x where there does not exist a y.” Or, there is no x for no y. Or, you cannot find an x when there is no y.

In SQL, there is no “for all” predicate. Instead, SQL uses the idea of “for all” logic with NOT EXISTS. (A word of caution, however—SQL is not simply a logic exercise, as you will see.) In this section, we look at how EXISTS and NOT EXISTS work in SQL. In the following section, we address the “for all” problem.

Consider the following query:

SELECT   s.sname
FROM     Student s
WHERE EXISTS
  (SELECT 'X'
   FROM        Grade_report gr
   WHERE       s.stno = gr.student_number
   AND         gr.grade = 'C')

which produces the following output (24 rows):

sname
--------------------
Zelda
Ken
Mario
Brenda
Richard
Reva
Donald
Jake
Susan
Monica
Bill
Sadie
Jessica
Steve
Alan
Rachel
Smithly
Sebastian
Losmith
Genevieve
Thornton
Gus
Benny
Lionel
 
(24 row(s) affected)

For this correlated subquery, “student names” are SELECTed when:

  • The student is enrolled in a section (WHERE s.stno = gr.student_number)

  • The same student has a grade of C (note the correlation in the WHERE clause in the inner query)

Both statements must be true for the student row to be SELECTed. Recall that we use SELECT 1 or SELECT 'X' in our inner query, because we want the subquery to return something if the subquery is true. The actual value of the “something” does not matter. true means something is returned; false means nothing was returned from the subquery. Therefore, SELECT .. EXISTS “says” SELECT .. WHERE true. The inner query is true if any row is SELECTed in the inner query.

Now consider the preceding query with a NOT EXISTS in it instead of EXISTS for students who do not have a grade of C:

SELECT s.sname
FROM   Student s
WHERE NOT EXISTS
  (SELECT 'X'
   FROM    Grade_report gr
   WHERE   s.stno = gr.student_number
   AND     gr.grade = 'C')

This query produces the following output (24 rows):

sname
--------------------
Lineas
Mary
Romona
Kelly
Lujack
Elainie
Harley
Chris
Lynette
Smith
Hillary
Phoebe
Holly
Brad
Cedric
George
Jerry
Cramer
Fraiser
Harrison
Francis
Lindsay
Stephanie
Jake
 
(24 row(s) affected)

In this query, we are still SELECTing with the pattern SELECT .. WHERE true because all SELECTs with EXISTS work that way. But, the twist is that the subquery has to be false to be SELECTed with NOT EXISTS. If the subquery is false, then NOT EXISTS is true and the outer row is SELECTed.

Now, logic implies that if either s.stno <> gr.student_number or gr.grade <> 'C', then the subquery “fails"--that is, it is false for that student row. As the subquery is false, the NOT EXISTS would return a true for that row. Unfortunately, this logic is not quite what happens. Recall that we characterized the correlated subquery as follows:

LOOP1: For each row in Student  s  DO
      LOOP2: For each row in Grade_report DO
             IF (gr.student_number = s.stno) THEN
                     IF (gr.grade = 'C') THEN TRUE
      END LOOP2;
     IF TRUE, THEN student row is SELECTed
END LOOP1

Note that LOOP2 is completed before the next student is tested. In other words, just because a student number exists that is not equal, it will not cause the subquery to be false. Rather, the entire subquery table is parsed and the logic is more like this:

For the case .. WHERE EXISTS s.stno = gr.student_number ..., is there a gr.grade = 'C'? If, when the student numbers are equal, no C can be found, then the subquery returns no rows—it is false for that student row. So, with NOT EXISTS, we will SELECT students who have student numbers equal in the Grade_report and Student tables, but who have no C in the Grade_report table. The point about “no C in the Grade_report table” can be answered true only by looking at all the rows in the inner query and finding no C for that student.

SQL Universal and Existential Qualifiers

In SQL, “for all” and “for each” are the universal qualifiers, whereas “there exists” is the existential qualifier. As mentioned in the preceding section, SQL does not have a “for all” predicate; however, logically, the following relationship exists:

For all x, WHERE P(x) is true ...

which is logically the same as the following:

There does not exist an x, WHERE P(x) is not true.

A “for all” type SQL query is less straightforward than the other queries we have used, because it involves a double-nested, correlated subquery using the NOT EXISTS predicate. The next section shows an example.

Example 1

To show a “for all” type SQL query, we will use another table in our Student_course database—a table called Cap (for “capability”). This table has names of students who have multiple foreign-language capabilities. We begin by looking at the table by typing the following query:

SELECT *
FROM   Cap
ORDER BY name

This query produces the following output (18 rows):

NAME      LANGU
--------- -------
BRENDA    FRENCH
BRENDA    CHINESE
BRENDA    SPANISH
JOE       CHINESE
KENT      CHINESE
LUJACK    SPANISH
LUJACK    FRENCH
LUJACK    GERMAN
LUJACK    CHINESE
MARY JO   FRENCH
MARY JO   GERMAN
MARY JO   CHINESE
MELANIE   FRENCH
MELANIE   CHINESE
RICHARD   SPANISH
RICHARD   FRENCH
RICHARD   CHINESE
RICHARD   GERMAN
 
(18 row(s) affected)

Suppose that we want to find out which languages are spoken by all students (for which we would ask the question, “For each language, does it occur with all students?”). Although this manual exercise would be very difficult for a large table, for our practice table, we can answer the question by displaying and manually counting in the table ordered by language.

To see how to answer a question of the type--"Which languages are spoken by all students?"--for a much larger table where sorting and examining the result would be tedious, we will construct a query. After showing the query, we will dissect the result. Following is the query to answer our question:

SELECT  name, langu
FROM    Cap x
WHERE NOT EXISTS
             (SELECT 'X'
             FROM Cap y
             WHERE NOT EXISTS
                           (SELECT 'X'
                           FROM Cap z
                           WHERE x.langu = z.langu
                           AND y.name = z.name))

Tip

As you will see, all the for all/for each questions follow this double-nested, correlated NOT EXISTS pattern.

This query produces the following output (7 rows):

name      langu
--------- -------
BRENDA    CHINESE
RICHARD   CHINESE
LUJACK    CHINESE
MARY JO   CHINESE
MELANIE   CHINESE
JOE       CHINESE
KENT      CHINESE
 
(7 row(s) affected)

The way the query works

To SELECT a “language” spoken by all students, the query proceeds as follows:

  1. SELECT a row in Cap (x) (outer query).

  2. For that row, begin SELECTing each row again in Cap (y) (middle query).

  3. For each of the middle query rows, we want the inner query (Cap z) to be true for all cases of the middle query (remember that true is translated to false by the NOT EXISTS). As each inner query is satisfied (it is true), it forces the middle query to continue looking for a match—to look at all cases and eventually conclude false (evaluate to false overall). If the middle query is false, the outer query sees true because of its NOT EXISTS.

    To make the middle query (y) find false, all the inner query (z) occurrences must be true; that is, the languages from the outer query must exist with all names from the middle one (y) in the inner one (z). For an eventual “match,” every row in the middle query for an outer query row must be false (that is, every row in the inner query is true).

These steps are explained in further detail in the next example, in which we use a smaller table, so that the explanation is easier to understand.

Example 2

Suppose that we have the simpler table Cap1 (see Table 10-1) when attempting to answer the question “Which languages are spoken by all students?”

Table 10-1. Cap1

Name         Language
------------ ------------
Joe          Hindi
Mary         Hindi
Mary         French
 
(3 row(s) affected)

Tip

The table Cap1 does not exist in the Student_course database. You will have to create it. Keep the column names and types similar to the table Cap.

The query will be similar to the one used in the previous section:

SELECT name, language
FROM   Cap1 x
WHERE NOT EXISTS
     (SELECT 'X'
       FROM Cap1 y
       WHERE NOT EXISTS
         (SELECT 'X'
              FROM   Cap1 z
              WHERE  x.language = z. language
              AND    y.name = z.name))
ORDER BY language

This query produces the following output:

name         language
------------ ------------
Joe          Hindi
Mary         Hind
 
(2 row(s) affected)

The way this query works

The following is a step-by-step explanation of how this query would work in Table 10-1 (Cap1):

  1. The row <Joe, Hindi> is SELECTed by the outer query (x).

  2. The row <Joe, Hindi> is SELECTed by the middle query (y).

  3. The row <Joe, Hindi> is SELECTed by the inner query (z).

  4. The inner query is true:

        X.LANGUAGE = Hindi
        Z.LANGUAGE = Hindi
        Y.NAME = Joe
        Z.NAME = Joe
  5. Because the inner query returns a row (is true), the NOT EXISTS of the middle query translates this to false and continues with the next row in the middle query. The middle query SELECTs <Mary, Hindi> and the inner query begins again with <Joe, Hindi> seeing:

         X.LANGUAGE = Hindi
         Z.LANGUAGE = Hindi
         Y.NAME = Mary
         Z.NAME = Joe

    This is false, so the inner query SELECTs a second row <Mary, Hindi>:

        X.LANGUAGE = Hindi
        Z.LANGUAGE = Hindi
        Y.NAME = Mary
        Z.NAME = Mary

    This is true, so the inner query is true. (Notice that the X.LANGUAGE has not changed yet; the outer query [X] is still on the first row.)

  6. Because the inner query returns a row (is true), the NOT EXISTS of the middle query translates this to false and continues with the next row in the middle query.

    The middle query now SELECTs <Mary, French> and the inner query begins again with <Joe, Hindi> seeing:

        X.LANGUAGE = Hindi
        Z.LANGUAGE = Hindi
        Y.NAME     = Mary
        Z.NAME     = Joe

    This is false, so the inner query SELECTs a second row <Mary, Hindi>:

        X.LANGUAGE = Hindi
        Z.LANGUAGE = Hindi
        Y.NAME     = Mary
        Z.NAME     = Mary

    This is true, so the inner query is true.

  7. Because the inner query is true, the NOT EXISTS of the middle query again converts this true to false and wants to continue, but the middle query is out of rows. Thus the middle query is false.

  8. Because the middle query is false, and because we are testing

    "SELECT distinct name, language
     FROM Cap1 x
     WHERE NOT EXISTS
        (SELECT 'X' FROM Cap1 y ...",

    the false from the middle query is translated to true for the outer query and the row <Joe, Hindi> is SELECTed for the result set. Note that “Hindi” occurs with both “Joe” and “Mary.”

  9. The second row in the outer query will repeat the previous steps for <Mary, Hindi>. The value “Hindi” will be seen to occur with both “Joe” and “Mary” as <Mary, Hindi> is added to the result set.

  10. The third row in the outer query begins with <Mary, French>. The middle query SELECTs <Joe, Hindi> and the inner query SELECTs <Joe, Hindi>. The inner query sees the following:

       X.LANGUAGE = French
       Z.LANGUAGE = Hindi
       Y.NAME     = Joe
       Z.NAME     = Mary

    This is false, so the inner query SELECTs a second row, <Mary, Hindi>:

       X.LANGUAGE = French
       Z.LANGUAGE = Hindi
       Y.NAME     = Joe
       Z.NAME     = Mary

    This is false, so the inner query SELECTs a third row, <Mary, French>:

       X.LANGUAGE = French
       Z.LANGUAGE = French
       Y.NAME     = Joe
       Z.NAME     = Mary

    This is also false. The inner query returns no rows (fails). The inner query evaluates to false, which causes the middle query to returns rows (see true) because of the NOT EXISTS. Because the middle query sees true, it is finished and evaluated to true. Because the middle query evaluates to true, the NOT EXISTS in the outer query changes this to false and X.LANGUAGE = French fails because X.LANGUAGE = French did not occur with all the values of NAME.

Consider again the “for all” query presented in Example 2:

SELECT name, language
FROM   Cap1 x
WHERE NOT EXISTS
     (SELECT 'X'
       FROM Cap1 y
       WHERE NOT EXISTS
         (SELECT 'X'
              FROM    Cap1 z
              WHERE   x.language = z. language
              AND     y.name = z.name))
ORDER BY language

A clue as to what a query of this kind means can be found in the inner query where the outer query is tested. In the phrase that says WHERE x.language = z. language ..., the x.language is where the query is testing which language occurs for all names.

This query is a SQL realization of a relational division exercise. Relational division is a “for all” operation just like that illustrated earlier. In relational algebra, the query must be set up into a divisor, dividend, and quotient in this pattern:

Quotient (B) ← Dividend(A, B) divided by Divisor (A).

If the question is “What language for all names?” then the Divisor, A, is names, and the Quotient, B, is language. It is most prudent to set up SQL like relational algebra with a two-column table (like Cap or Cap1) for the Dividend and then treat the Divisor and the Quotient appropriately. Our query will have the column for language, x.language, in the inner query, as language will be the quotient. We have chosen to also report name in the result set.

Example 3

Note that the preceding query is completely different from the following query, which asks, “Which students speak all languages?”:

SELECT DISTINCT name, language
FROM   Cap1 x
WHERE NOT EXISTS
             (SELECT 'X'
             FROM Cap1 y
             WHERE NOT EXISTS
                     (SELECT 'X'
                          FROM     Cap1 z
                          WHERE    y.language = z.language
                          AND      x.name = z.name))
ORDER BY language

This query produces the following output:

name         language
------------ ------------
Mary         French
Mary         Hindi
 
(2 row(s) affected)

Note that the inner query contains x.name, which means the question was “Which names occur for all languages?” or, put another way, “Which students speak all languages?” The “all” goes with languages for x.name.

Summary

In this chapter, we discussed the correlated subquery, noncorrelated subquery, EXISTS, and NOT EXISTS. We described situations where the correlation of a subquery is necessary and can be written with the EXISTS predicate, and other times when EXISTS can be used, even with no correlation. We also introduced loops and discussed how the “for all” and “for each” are used in SQL.

Review Questions

  1. What is a noncorrelated subquery?

  2. Which type of subquery can be executed on its own?

  3. Which part of a query is evaluated first, the query or the subquery?

  4. What are correlated subqueries?

  5. What does the EXISTS predicate do?

  6. What are considered universal qualifiers?

  7. Is correlation necessary when we use EXISTS? Why?

  8. Explain how the “for all” type SQL query involves a double-nested correlated subquery using the NOT EXISTS predicate.

Exercises

Unless specified otherwise, use the Student_course database to answer the following questions. Also, use appropriate column headings when displaying your output.

  1. List the names of students who have received Cs. Do this in three ways: (a) as a join, (b) as an uncorrelated subquery, and (c) as a correlated subquery. Show both results and account for any differences.

  2. In section "Existence Queries and Correlation,” you were asked to find the names of students who have taken a computer science class and earned a grade of B. We noted that it could be done in several ways. One query could look like this:

    SELECT   s.sname
    FROM     Student s
    WHERE    s.stno IN
       (SELECT  gr.student_number
        FROM    Grade_report gr, Section
        WHERE   Section.section_id =  gr.section_id
        AND         Section.course_num LIKE 'COSC___  _'
        AND         gr.grade = 'B')

    Redo this query, putting the finding of the COSC course in a correlated subquery. The query should be as follows:

    The Student table uncorrelated subquery to the Grade_report table, correlated EXISTS to the Section table.

  3. In the section "SQL Universal and Existential Qualifiers,” we illustrated both an existence query:

    SELECT  s.sname
    FROM    Student s
    WHERE EXISTS
      (SELECT 'X'
       FROM     Grade_report gr
       WHERE    Student.stno = gr.student_number
       AND      gr.grade = 'C')

    and a NOT EXISTS version:

    SELECT   s.sname
    FROM     Student s
    WHERE NOT EXISTS
      (SELECT 'X'
       FROM     Grade_report gr
       WHERE    Student.stno = gr.student_number
       AND      gr.grade = 'C')

    Show that the EXISTS version is the complement of the NOT EXISTS version—count the rows in the EXISTS result, the rows in the NOT EXISTS result, and the rows in the Student table. Also, devise a query to give the same result with IN and NOT..IN.

  4. Discover whether all students take courses by counting the students, and then count those students whose student numbers are in the Grade_report table and those whose student numbers are not in the table. Use IN and then NOT..IN, and then use EXISTS and NOT EXISTS. How many students take courses and how many students do not?

    1. Find out which students have taken courses but who have not taken COSC courses. Create a set of student names and courses from the Student, Grade_report, and Section tables (use the prefix COSC to indicate computer science courses). Then, use NOT..IN to “subtract” from that set another set of student names of students (who take courses) who have taken COSC courses. For this set difference, use NOT..IN.

    2. Change NOT..IN to NOT EXISTS (with other appropriate changes) and explain the result. The “other appropriate changes” include adding the correlation and the change of the result column in the subquery set.

  5. There exists a table called Plants. List the table and then find out what company or companies have plants in all cities. Verify your result manually.

  6. Run the following query and print the result:

    SELECT distinct name, langu
    FROM Cap x
    WHERE NOT EXISTS
                  (SELECT 'X'
                  FROM Cap y
                  WHERE NOT EXISTS
                              (SELECT 'X'
                              FROM Cap z
                              WHERE X.langu =Z.langu
                              AND Y.name=Z.name))

    Save the query (e.g., save forall) and hand in the result.

    1. Recreate the Cap table (e.g., call it some other name, such as LANG1). To do this, first create the table and then use the INSERT statement with the sub select option (INSERT INTO LANG1 AS SELECT * FROM Cap).

    2. Add a new person to your table who speaks only BENG.

    3. Recall your previous SELECT (get for all).

    4. CHANGE the table from CAP to LANG1 (for all occurrences, use CHANGE/Cap/lang1/ repeatedly, assuming that you called your table LANG1).

    5. Start the new query (the one you just created with LANG1 in it).

    6. How is this result different from the situation in which Newperson was not in LANG1? Provide an explanation of why the query did what it did.

  7. The Department_to_major table is a list of four-letter department codes with the department names. In Chapter 8, Exercise 7 (hereafter referred to as Exercise 8-7), you created a table called Secretary, which should now have data like this:

    Secretary

     

    dCode

    Name

    ACCT

    Beryl

    COSC

    Kaitlyn

    ENGL

    David

    HIST

    Christina

    BENG

    Fred

    Null

    Brenda

    In Exercise 8-7, you did the following:

    1. Create a query that lists the names of departments that have secretaries (use IN and the Secretary table in a subquery). Save this query as q8_7a.

    2. Create a query that lists the names of departments that do not have secretaries (use NOT..IN). Save this query as q8_7b.

    3. Add one more row to the Secretary table that contains <null,'Brenda'>. (This could be a situation in which you have hired Brenda but have not yet assigned her to a department.)

    4. Recall q8_7a and rerun it.

    5. Recall q8_7b and rerun it.

    We remarked in Exercise 8-7 that the NOT..IN predicate has problems with nulls: the behavior of NOT..IN when nulls exist may surprise you. If nulls may exist in the subquery, then NOT..IN should not be used. If you use NOT..IN in a subquery, you must ensure that nulls will not occur in the subquery or you must use some other predicate, such as NOT EXISTS. Perhaps the best advice is to avoid NOT..IN.

    Here, we repeat Exercise 8-7 using NOT EXISTS:

  8. Reword query q8_7a to use EXISTS. You will have to correlate the inner and outer queries. Save this query as q10_7aa.

  9. Reword query q8_7b to use NOT EXISTS. You will have to correlate the inner and outer queries. Save this query as q10_7bb. You should not have a phrase IS NOT NULL in your NOT EXISTS query.

  10. Rerun q8_9a with and without <null, Brenda>.

  11. Rerun q8_9b with and without <null, Brenda>.

    Note the difference in behavior versus the original question. List the names of those departments that do or do not have secretaries. The point here is to encourage you to use NOT EXISTS in a correlated subquery, rather than NOT..IN.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.189.170.134