Chapter 8. Joins Versus Subqueries

The purpose of this chapter is to demonstrate the use of subqueries. Subqueries may often be used as alternatives to joins. There are two main issues to consider when choosing between subqueries and joins (and other techniques for combining tables). First, you must consider how to get the information. By understanding the limitations of joins and subqueries (as well as sets and other table-combining techniques), you will increase your choices as to how to get information from the database. Second, you must also consider performance. You usually a have choice of how to get multi-table information—joins, sets, subqueries, views, and so forth. In larger databases, you need to be flexible and consider other choices if a query performs poorly and/or if the query is done often.

Tip

Although set operations logically are also viable choices for retrieving data from multiple tables, set operations (discussed in Chapter 7) are less common and usually less efficient than joins and subqueries.

Subquery with an IN Predicate

Suppose that a query requests a list of names and numbers of students (which are in the Student table in our Student_course database) who have made As or Bs in any course (grades are in the Grade_report table in our Student_course database). You can complete this query as either a subquery or a join. As a subquery with an IN predicate, it will take the following form:

    SELECT Student.sname, Student.stno
    FROM   Student
    WHERE  "link to Grade_report"
       IN  ("link to Student" - subquery involving Grade_report)

In this format, the part of the query that contains:

    SELECT Student.sname, Student.stno
    FROM   Student
    WHERE  "link to Grade_report"

is said to be the outer query. The part of the query that contains:

    ("link to Student" - subquery involving Grade_report)

is the inner query.

The link between the Student table and the Grade_report table is the student number. In the Student table, the appropriate column is stno, and in the Grade_report table, it is student_number. When linking the tables in the subquery with an IN predicate, the linking columns are all that can be mentioned in the WHERE..IN and in the result set of the subquery. Thus, the statement with a subquery is as follows:

    SELECT Student.sname, Student.stno
    FROM   Student
    WHERE  Student.stno
      IN (SELECT   gr.student_number
          FROM     Grade_report gr
          WHERE    gr.grade = 'B' OR  gr.grade = 'A')
    ORDER BY Student.stno

Tip

The part of the query before the IN is often called the outer query. The part of the query after the IN is called the inner query.

This query produces the following output (31 rows):

    sname                stno
    -------------------- -----
    Lineas               2
    Mary                 3
    Zelda                5
    Ken                  6
    Mario                7
    Brenda               8
    Richard              10
    Kelly                13
    Lujack               14
    Reva                 15
    Harley               19
    Donald               20
    Chris                24
    Lynette              34
    Susan                49
    Hillary              121
    Phoebe               122
    Holly                123
    Sadie                125
    Jessica              126
    Steve                127
    Cedric               129
    George               132
    Jerry                142
    Cramer               143
    Fraiser              144
    Francis              146
    Smithly              147
    Sebastian            148
    Lindsay              155
    Stephanie            157

    (31 row(s) affected)

The Subquery as a Join

An alternative way to perform the preceding query would be to use a join instead of a subquery, as follows:

    SELECT  Student.sname, Student.stno
    FROM    Student, Grade_report gr
    WHERE   Student.stno = gr.student_number
    AND        (gr.grade = 'B' OR gr.grade = 'A')

This query produces 67 rows of output (of which we show the first 15 rows here):

    sname     stno
    --------- ------
    Lineas    2
    Lineas    2
    Lineas    2
    Lineas    2
    Mary      3
    Mary      3
    Mary      3
    Mary      3
    Mary      3
    Mary      3
    Brenda    8
    Brenda    8
    Brenda    8
    Richard   10
    Kelly     13
    .
    .
    .

    (67 row(s) affected)

Now, the question is why the join has 67 rows of output instead of 31 rows of output (produced by the subquery).

When the join version is used to combine tables, any Student-Grade_report row (tuple) that has equal student numbers and a grade of A or B is selected. Thus, you should expect many duplicate names in the output. To get the result without duplicates, add the qualifier DISTINCT to the join query as follows:

    SELECT DISTINCT Student.sname, Student.stno
    FROM    Student, Grade_report AS gr
    WHERE   Student.stno = gr.student_number
    AND       (gr.grade = 'B' OR gr.grade = 'A')

This query produces the following output (31 rows):

    sname                stno
    -------------------- ------
    Lineas               2
    Mary                 3
    Zelda                5
    Ken                  6
    Mario                7
    Brenda               8
    Richard              10
    Kelly                13
    Lujack               14
    Reva                 15
    Harley               19
    Donald               20
    Chris                24
    Lynette              34
    Susan                49
    Hillary              121
    Phoebe               122
    Holly                123
    Sadie                125
    Jessica              126
    Steve                127
    Cedric               129
    George               132
    Jerry                142
    Cramer               143
    Fraiser              144
    Francis              146
    Smithly              147
    Sebastian            148
    Lindsay              155
    Stephanie            157

    (31 row(s) affected)

When DISTINCT is used, internal sorting is performed before the result set is displayed. Such internal sorting may decrease response time for a query.

In the subquery version of the query, duplication of names does not occur in the output. This is so because you are setting up a set (the subquery) from which you will choose names—a given name is either in the subquery set or it is not. Remember that the student number (stno) is unique in the Student table.

Also, the question of which is more efficient, the join or the subquery, depends on which SQL language and database you are using. Without using extra tools, one way to test alternatives is to try the queries on the data or a subset of the data. Database systems such as SQL Server 2005 provide ways (tools) to find out how queries are executed.

When the Join Cannot Be Turned into a Subquery

When a column from a table needs to be in the result set, that table has to be in the outer query. If two tables are being used, and if columns from both tables have to be in the result set, a join is necessary. This type of join cannot be turned into a subquery, because information from both tables has to be in the result set. But if the result set does not need the columns from more than one table, then the join can be turned into a subquery. The other tables can be included such that the filtering conditions can be in the subquery (or inner query), and the table that has the needed result set columns is in the outer query.

Consider this example. Our original query (the first query discussed in this chapter), requested the list of names and student numbers of students who made As or Bs in any course. Student names and numbers are both in the Student table; the Grade_report table is needed only as a filter, so we could write this as a subquery, and also turn it into a join.

Now, if this original query had asked for output from the Grade_report table also, such as, “list the names, numbers, and grades of all students who have made As or Bs,” the query would be asking for information from both the Student and Grade_report tables. In this case, you would have to join the two tables to get the information; you could not just query the Grade_report table, because that table has no names in it. Similarly, the Student table contains no grades. So you would not be able to write this as a subquery. Refer again to the original query example:

    SELECT Student.sname, Student.stno
    FROM   Student
    WHERE  Student.stno
      IN (SELECT   gr.student_number
          FROM     Grade_report gr
          WHERE    gr.grade = 'B' OR  gr.grade = 'A')
    ORDER BY Student.stno

This query asks for information only from the Student table (student names and numbers). Although the query used the Grade_report table, nothing from the Grade_report table was in the outer result set. Again, the Grade_report table is needed only as a filter (to get the student numbers of those who have As and Bs); hence we were able to write this as a subquery.

The following join query asks for information from both the Student and Grade_report tables (a result set that lists both names and grades of all students who have made As or Bs in any course):

    SELECT DISTINCT Student.sname, gr.grade
    FROM   Student, Grade_report gr
    WHERE  Student.stno = gr.student_number
    AND       (gr.grade = 'B' OR gr.grade = 'A')

This query produces 41 rows of output (of which we show the first 25 rows here):

    sname                grade
    -------------------- -----
    Brenda               A
    Brenda               B
    Cedric               A
    Cedric               B
    Chris                B
    Cramer               B
    Donald               A
    Fraiser              B
    Francis              B
    George               B
    Harley               B
    Hillary              B
    Holly                A
    Holly                B
    Jerry                A
    Jessica              A
    Jessica              B
    Kelly                B
    Ken                  B
    Lindsay              B
    Lineas               A
    Lineas               B
    Lujack               A
    Lujack               B
    Lynette              A
    .
    .
    .

    (41 row(s) affected)

As this example demonstrates, if information from a table is needed in a result set, then that table cannot be buried in a subquery—it must be in the outer query.

More Examples Involving Joins and IN

The purpose of this section is to further demonstrate several queries that will and will not allow the use of the subquery. As we have discussed, some joins can be expressed as subqueries whereas others cannot. Further, all subqueries with the IN predicate can be re-formed as a join. Whether you can use a subquery depends on the final, outer result set. Some more examples will help clarify this point.

Example 1

Find the names of all the departments that offer a course with INTRO in the title. To formulate our query, we need to use the Course table (to find the course names) and the Department_to_major table (to find the names of the departments).

Begin by viewing the column names in the tables.

Tip

If you have forgotten how to view the column names of a table, refer to Figure 1-21.

Figure 8-1 gives the column names in the Course table:

Column names of the Course table
Figure 8-1. Column names of the Course table

Figure 8-2 gives the column names of the Department_to_major table:

Column names of the Department_to_major table
Figure 8-2. Column names of the Department_to_major table

Our query needs a department name (dname) from the Department_to_major table. We also need course information from the Course table, because our query depends on a course name; however, no course information appears in the result set. We did not ask for the names of the courses, just that they have INTRO in the title. The result set asks only for department names. We can find this result by using a subquery, with the Department_to_major table as the outer query, because all the information in the result set is contained in the outer query. The query would be as follows:

    SELECT d2m.dname
    FROM   Department_to_major d2m
    WHERE  d2m.dcode
      IN (SELECT     Course.offering_dept
          FROM       Course
          WHERE      Course.course_name LIKE '%INTRO%')

which produces the following output:

    dname
    --------------------
    Computer Science
    Political Science
    Chemistry

    (3 row(s) affected)

Example 2

List the student name, student major code, and section identifier of students who earned Cs in courses taught by Professor Hermano (HERMANO).

First, we determine which tables are needed. We want to find the student name and major code, and a section identifier for courses taken, so we need the Student and Grade_report tables for the result set. We will need to use the Section table for a filter. The instructor does not appear in the result set. Again, it is a good idea to look at the column names in each of the tables first.

Figure 8-3 gives the column names of the Student table.

Columns names of the Student table
Figure 8-3. Columns names of the Student table

Figure 8-4 gives the column names of the Grade_report table.

Column names of the Grade_report table
Figure 8-4. Column names of the Grade_report table

Figure 8-5 gives the column names of the Section table.

After we have determined which tables we need, we have to determine where the columns that are needed in the result set are located. We need to get the names and major codes from the Student table, and the section identifiers from the Grade_report table. So the result set part of the query (the outer query) must contain the Student and Grade_report tables. The rest of the query can contain any other tables that we need to locate the columns. The resulting query may look like this (a combination of a join and a subquery):

Column names of the Section table
Figure 8-5. Column names of the Section table
    SELECT    s.sname, s.major, g.section_id
    FROM      Student s, Grade_report g
    WHERE     g.student_number = s.stno
    AND          g.grade = 'C'
    AND          g.section_id IN
      (SELECT t.section_id
       FROM   Section t
       WHERE  t.instructor LIKE 'HERMANO')

which produces the following output:

    sname                major section_id
    -------------------- ----- ----------
    Richard              ENGL  126

    (1 row(s) affected)

The previous query could also have been done as a three-table join, as follows:

    SELECT  s.sname, s.major, t.section_id
    FROM    Student s, Grade_report g, Section t
    WHERE   s.stno = g.student_number
    AND       g.section_id =t.section_id
    AND       g.grade='C'
    AND       t.instructor LIKE 'HERMANO'

Example 3

List the name and major code of students who earned Cs in courses taught by Professor King (KING).

Again, we first need to determine which tables are needed. We need to collect student names and major codes in the result set and we need the Grade_report and Section tables for filtering conditions. (You viewed the columns available in each of these tables in the preceding example.) Next, we need to determine where the columns that are needed in the result set are located. In this example, they are all in the Student table.

Because the only table needed in the outer query is the Student table, we can structure the query in any of the following ways:

  1. Student join Grade_report join Section [three-table join]

  2. Student subquery (Grade_report join Section) [Student outer, join in subquery]

  3. Student join Grade_report subquery (Section) [similar to Example 2 but with a different result set]

  4. Student (subquery Grade_report (subquery Section)) [a three-level subquery]

Each of these queries produces the same result set with different efficiencies. We’ll study them further in the exercises at the end of the chapter.

Using Subqueries with Operators

In this section, we look at examples that demonstrate the use of subqueries with comparison operators. These examples are based on the Room table, which has the following data:

    BLDG  ROOM  CAPACITY OHEAD
    ----- ----- -------- -----
    13    101   85       Y
    36    123   35       N
    58    114   60       NULL
    79    179   35       Y
    79    174   22       Y
    58    112   40       NULL
    36    122   25       N
    36    121   25       N
    36    120   25       N
    58    110   NULL     Y

    (10 row(s) affected)

In previous chapters, you have seen SELECTs with conditions like the following:

    SELECT *
    FROM   Room
    WHERE  capacity = 25

In this example, 25 is a constant and = is a comparison operator. The constant can be replaced by a subquery, and the operator can be any of the comparison operators (=, <>, <, >, <=, or >=). For example, we could devise a query to tell us which classrooms have a below-average capacity by computing the average in a subquery and using a comparison operator, like this:

    SELECT *
    FROM   Room
    WHERE  capacity <
      (SELECT AVG(capacity)
       FROM   Room)

This query produces the following six rows of output, showing six rooms with below-average capacity:

    BLDG   ROOM   CAPACITY OHEAD
    ------ ------ -------- -----
    36     120    25       N
    36     121    25       N
    36     122    25       N
    36     123    35       N
    79     174    22       Y
    79     179    35       Y

    Warning: Null value is eliminated by an aggregate or other SET operation.

    (6 row(s) affected)

The only problem with using subqueries in this fashion is that the subquery must return only one row. If an aggregate function is applied to a table in the subquery in this fashion, you will always get only one row—even if there is a WHERE clause that excludes all rows, the subquery returns one row with a null value. For example, if we were to change the preceding query to the following and force multiple rows in the subquery,

    SELECT *
    FROM   Room
    WHERE  capacity <
      (SELECT AVG(capacity)
       FROM   Room
       WHERE  bldg = 99)

we would get:

    BLDG   ROOM   CAPACITY OHEAD
    ------ ------ -------- -----

    (0 row(s) affected)

We get no rows selected because there is no bldg = 99. If we were to change the query to the following:

     SELECT *
     FROM   Room
     WHERE  bldg =
        (SELECT  bldg
         FROM    Room
         WHERE   capacity > 10)

we would get the following error message:

    BLDG   ROOM   CAPACITY OHEAD
    ------ ------ -------- -----
    Msg 512, Level 16, State 1, Line 1
    Subquery returned more than 1 value. This is not permitted when the subquery follows
    =, !=, <, <=, >, >= or when the subquery is used as an expression.

When using comparison operators, only single values are acceptable from the subquery. Again, to ensure that we get only one row in the subquery and hence a workable query, we can use an aggregate with no GROUP BY or HAVING (to be discussed in Chapter 9).

Tip

As with all queries, the caveat to audit the result is always applicable.

Summary

In this chapter, we have introduced the subquery. We have given examples of situations in which it would be good to use subqueries, cases where subqueries could be turned into joins, and cases where they cannot be turned into joins. After reading this chapter, you should have a better appreciation for subqueries and joins.

Review Questions

  1. What is a subquery?

  2. Which part of the query/subquery is considered the inner query, and which part is considered the outer query?

  3. Can a subquery always be done as a join? Why or why not?

  4. When writing a query that will have a subquery, how do you determine which table/tables will go in the outer query?

  5. Which predicate can usually be reformulated into a join?

  6. When using operators, are many values acceptable from a result of a subquery?

  7. What can you do to insure a working subquery?

Exercises

Unless specified otherwise, use the Student_course database to answer the following questions. Also, use appropriate column headings when displaying your output.

Use the techniques from this chapter to construct and execute the following queries:

  1. Find the student numbers of students who have earned As or Bs in courses taught in the fall semester. Do this in two ways: first using a subquery, and then using a join.

  2. Find all students who took a course offered by the Accounting department. List the student name and student number, the course name, and the grade in that course. (Hint: Begin with Department_to_major and use an appropriate WHERE.) Note that this task cannot be done with a multilevel subquery. Why?

  3. For every students who is a sophomore (class = 2), find the name and the name of the department that includes the student’s major.

  4. Find the names of the departments that offer courses at the junior or senior levels (either one) but not at the freshman level. The course level is the first digit after the prefix; for example, AAAA3yyy is a junior course, and so on.

    Hint: Begin by creating the outer query—the names of departments that offer courses at the junior or senior levels. Save this query as q8_4. Then, construct the subquery—a list of departments that offer courses at the freshman level. Save the subquery as a view. Examine both lists of departments. When you have the outer query and the subquery results, recall the original query that you saved (q8_4) and add the subquery. Check your result with the department lists you just generated. Redo the last part of the experiment with your view. You should get the same result.

  5. Find the names of courses that are prerequisites for other courses. List the course number and name, and the number and name of the prerequisite.

  6. List the names of instructors who teach courses that have other than three-hour credits. Do the problem in two ways: once with IN and once with NOT..IN.

  7. Create a table called Secretary with the columns dcode (of data type CHAR(4)) for department code and name (of data type VARCHAR(20)) for the secretary name. Populate the table as follows:

    Secretary

    dCode

    name

    ACCT

    Beryl

    COSC

    Kaitlyn

    ENGL

    David

    HIST

    Christina

    BENG

    Fred

    HINDI

    Chloe

    1. Create a query that lists the names of departments that have secretaries (use IN and the Secretary table in a subquery with the Department_to_major table in the outer query). Save this query as q8_7a.

    2. Create a query that lists the names of departments (using the Department_to_major table) that do not have secretaries (use NOT IN). Save this query as q8_7b.

    3. Add one more row to the Secretary table that contains <null,'Brenda'> (which you could see, for example, in a situation in which you have hired Brenda but have not yet assigned her to a department).

    4. Recall q8_7a and rerun it. Recall q87_b and rerun it.

      The behavior of NOT..IN when nulls exist may surprise you. If nulls may exist in the subquery, then NOT..IN either should not be used (Chapter 10 shows how to use another predicate, NOT EXISTS, which is a workaround to this problem), or should include AND whatever IS NOT NULL. If you use NOT..IN in a subquery, you must either ensure that nulls will not occur in the subquery or use some other predicate (such as NOT EXISTS). Perhaps the best advice is to avoid NOT..IN unless you cannot figure out another way to solve a problem.

    5. To see a correct answer, add the phrase WHERE dcode IS NOT NULL to the subquery in the IN and NOT..IN cases and run them again.

    Do not delete the Secretary table, because we will revisit this problem in Chapter 10.

  8. Devise a list of course names that are offered in the fall semester in rooms where the capacity is equal to or above the average room size.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.119.159.178