The purpose of this chapter is to demonstrate the use of subqueries. Subqueries may often be used as alternatives to joins. There are two main issues to consider when choosing between subqueries and joins (and other techniques for combining tables). First, you must consider how to get the information. By understanding the limitations of joins and subqueries (as well as sets and other table-combining techniques), you will increase your choices as to how to get information from the database. Second, you must also consider performance. You usually a have choice of how to get multi-table information—joins, sets, subqueries, views, and so forth. In larger databases, you need to be flexible and consider other choices if a query performs poorly and/or if the query is done often.
Although set operations logically are also viable choices for retrieving data from multiple tables, set operations (discussed in Chapter 7) are less common and usually less efficient than joins and subqueries.
Suppose that a query requests a list of names and numbers of students (which are in the Student
table in our Student_course
database) who have made As or Bs in any course (grades are in the Grade_report
table in our Student_course
database). You can complete this query as either a subquery or a join. As a subquery with an IN
predicate, it will take the following form:
SELECT Student.sname, Student.stno FROM Student WHERE "link to Grade_report" IN ("link to Student" - subquery involving Grade_report)
In this format, the part of the query that contains:
SELECT Student.sname, Student.stno FROM Student WHERE "link to Grade_report"
is said to be the outer query. The part of the query that contains:
("link to Student" - subquery involving Grade_report)
is the inner query.
The link between the Student
table and the Grade_report
table is the student number. In the Student
table, the appropriate column is stno
, and in the Grade_report
table, it is student_number
. When linking the tables in the subquery with an IN
predicate, the linking columns are all that can be mentioned in the WHERE..IN
and in the result set of the subquery. Thus, the statement with a subquery is as follows:
SELECT Student.sname, Student.stno FROM Student WHERE Student.stno IN (SELECT gr.student_number FROM Grade_report gr WHERE gr.grade = 'B' OR gr.grade = 'A') ORDER BY Student.stno
The part of the query before the IN
is often called the outer query. The part of the query after the IN
is called the inner query.
This query produces the following output (31 rows):
sname stno -------------------- ----- Lineas 2 Mary 3 Zelda 5 Ken 6 Mario 7 Brenda 8 Richard 10 Kelly 13 Lujack 14 Reva 15 Harley 19 Donald 20 Chris 24 Lynette 34 Susan 49 Hillary 121 Phoebe 122 Holly 123 Sadie 125 Jessica 126 Steve 127 Cedric 129 George 132 Jerry 142 Cramer 143 Fraiser 144 Francis 146 Smithly 147 Sebastian 148 Lindsay 155 Stephanie 157 (31 row(s) affected)
An alternative way to perform the preceding query would be to use a join instead of a subquery, as follows:
SELECT Student.sname, Student.stno FROM Student, Grade_report gr WHERE Student.stno = gr.student_number AND (gr.grade = 'B' OR gr.grade = 'A')
This query produces 67 rows of output (of which we show the first 15 rows here):
sname stno --------- ------ Lineas 2 Lineas 2 Lineas 2 Lineas 2 Mary 3 Mary 3 Mary 3 Mary 3 Mary 3 Mary 3 Brenda 8 Brenda 8 Brenda 8 Richard 10 Kelly 13 . . . (67 row(s) affected)
Now, the question is why the join has 67 rows of output instead of 31 rows of output (produced by the subquery).
When the join version is used to combine tables, any Student-Grade_report
row (tuple) that has equal student numbers and a grade of A or B is selected. Thus, you should expect many duplicate names in the output. To get the result without duplicates, add the qualifier DISTINCT
to the join query as follows:
SELECT DISTINCT Student.sname, Student.stno FROM Student, Grade_report AS gr WHERE Student.stno = gr.student_number AND (gr.grade = 'B' OR gr.grade = 'A')
This query produces the following output (31 rows):
sname stno -------------------- ------ Lineas 2 Mary 3 Zelda 5 Ken 6 Mario 7 Brenda 8 Richard 10 Kelly 13 Lujack 14 Reva 15 Harley 19 Donald 20 Chris 24 Lynette 34 Susan 49 Hillary 121 Phoebe 122 Holly 123 Sadie 125 Jessica 126 Steve 127 Cedric 129 George 132 Jerry 142 Cramer 143 Fraiser 144 Francis 146 Smithly 147 Sebastian 148 Lindsay 155 Stephanie 157 (31 row(s) affected)
When DISTINCT
is used, internal sorting is performed before the result set is displayed. Such internal sorting may decrease response time for a query.
In the subquery version of the query, duplication of names does not occur in the output. This is so because you are setting up a set (the subquery) from which you will choose names—a given name is either in the subquery set or it is not. Remember that the student number (stno
) is unique in the Student
table.
Also, the question of which is more efficient, the join or the subquery, depends on which SQL language and database you are using. Without using extra tools, one way to test alternatives is to try the queries on the data or a subset of the data. Database systems such as SQL Server 2005 provide ways (tools) to find out how queries are executed.
When a column from a table needs to be in the result set, that table has to be in the outer query. If two tables are being used, and if columns from both tables have to be in the result set, a join is necessary. This type of join cannot be turned into a subquery, because information from both tables has to be in the result set. But if the result set does not need the columns from more than one table, then the join can be turned into a subquery. The other tables can be included such that the filtering conditions can be in the subquery (or inner query), and the table that has the needed result set columns is in the outer query.
Consider this example. Our original query (the first query discussed in this chapter), requested the list of names and student numbers of students who made As or Bs in any course. Student names and numbers are both in the Student
table; the Grade_report
table is needed only as a filter, so we could write this as a subquery, and also turn it into a join.
Now, if this original query had asked for output from the Grade_report
table also, such as, “list the names, numbers, and grades of all students who have made As or Bs,” the query would be asking for information from both the Student
and Grade_report
tables. In this case, you would have to join the two tables to get the information; you could not just query the Grade_report
table, because that table has no names in it. Similarly, the Student
table contains no grades. So you would not be able to write this as a subquery. Refer again to the original query example:
SELECT Student.sname, Student.stno FROM Student WHERE Student.stno IN (SELECT gr.student_number FROM Grade_report gr WHERE gr.grade = 'B' OR gr.grade = 'A') ORDER BY Student.stno
This query asks for information only from the Student
table (student names and numbers). Although the query used the Grade_report
table, nothing from the Grade_report
table was in the outer result set. Again, the Grade_report
table is needed only as a filter (to get the student numbers of those who have As and Bs); hence we were able to write this as a subquery.
The following join query asks for information from both the Student
and Grade_report
tables (a result set that lists both names and grades of all students who have made As or Bs in any course):
SELECT DISTINCT Student.sname, gr.grade FROM Student, Grade_report gr WHERE Student.stno = gr.student_number AND (gr.grade = 'B' OR gr.grade = 'A')
This query produces 41 rows of output (of which we show the first 25 rows here):
sname grade -------------------- ----- Brenda A Brenda B Cedric A Cedric B Chris B Cramer B Donald A Fraiser B Francis B George B Harley B Hillary B Holly A Holly B Jerry A Jessica A Jessica B Kelly B Ken B Lindsay B Lineas A Lineas B Lujack A Lujack B Lynette A . . . (41 row(s) affected)
As this example demonstrates, if information from a table is needed in a result set, then that table cannot be buried in a subquery—it must be in the outer query.
The purpose of this section is to further demonstrate several queries that will and will not allow the use of the subquery. As we have discussed, some joins can be expressed as subqueries whereas others cannot. Further, all subqueries with the IN
predicate can be re-formed as a join. Whether you can use a subquery depends on the final, outer result set. Some more examples will help clarify this point.
Find the names of all the departments that offer a course with INTRO
in the title. To formulate our query, we need to use the Course
table (to find the course names) and the Department_to_major
table (to find the names of the departments).
Begin by viewing the column names in the tables.
If you have forgotten how to view the column names of a table, refer to Figure 1-21.
Figure 8-1 gives the column names in the Course
table:
Figure 8-2 gives the column names of the Department_to_major
table:
Our query needs a department name (dname
) from the Department_to_major
table. We also need course information from the Course
table, because our query depends on a course name; however, no course information appears in the result set. We did not ask for the names of the courses, just that they have INTRO
in the title. The result set asks only for department names. We can find this result by using a subquery, with the Department_to_major
table as the outer query, because all the information in the result set is contained in the outer query. The query would be as follows:
SELECT d2m.dname FROM Department_to_major d2m WHERE d2m.dcode IN (SELECT Course.offering_dept FROM Course WHERE Course.course_name LIKE '%INTRO%')
which produces the following output:
dname -------------------- Computer Science Political Science Chemistry (3 row(s) affected)
List the student name, student major code, and section identifier of students who earned Cs in courses taught by Professor Hermano (HERMANO
).
First, we determine which tables are needed. We want to find the student name and major code, and a section identifier for courses taken, so we need the Student
and Grade_report
tables for the result set. We will need to use the Section
table for a filter. The instructor does not appear in the result set. Again, it is a good idea to look at the column names in each of the tables first.
Figure 8-3 gives the column names of the Student
table.
Figure 8-4 gives the column names of the Grade_report
table.
Figure 8-5 gives the column names of the Section
table.
After we have determined which tables we need, we have to determine where the columns that are needed in the result set are located. We need to get the names and major codes from the Student
table, and the section identifiers from the Grade_report
table. So the result set part of the query (the outer query) must contain the Student
and Grade_report
tables. The rest of the query can contain any other tables that we need to locate the columns. The resulting query may look like this (a combination of a join and a subquery):
SELECT s.sname, s.major, g.section_id FROM Student s, Grade_report g WHERE g.student_number = s.stno AND g.grade = 'C' AND g.section_id IN (SELECT t.section_id FROM Section t WHERE t.instructor LIKE 'HERMANO')
which produces the following output:
sname major section_id -------------------- ----- ---------- Richard ENGL 126 (1 row(s) affected)
The previous query could also have been done as a three-table join, as follows:
SELECT s.sname, s.major, t.section_id FROM Student s, Grade_report g, Section t WHERE s.stno = g.student_number AND g.section_id =t.section_id AND g.grade='C' AND t.instructor LIKE 'HERMANO'
List the name and major code of students who earned Cs in courses taught by Professor King (KING
).
Again, we first need to determine which tables are needed. We need to collect student names and major codes in the result set and we need the Grade_report
and Section
tables for filtering conditions. (You viewed the columns available in each of these tables in the preceding example.) Next, we need to determine where the columns that are needed in the result set are located. In this example, they are all in the Student
table.
Because the only table needed in the outer query is the Student
table, we can structure the query in any of the following ways:
Student
join Grade_report
join Section
[three-table join]
Student
subquery (Grade_report
join Section
) [Student outer, join in subquery]
Student
join Grade_report
subquery (Section
) [similar to Example 2 but with a different result set]
Student
(subquery Grade_report
(subquery Section
)) [a three-level subquery]
Each of these queries produces the same result set with different efficiencies. We’ll study them further in the exercises at the end of the chapter.
In this section, we look at examples that demonstrate the use of subqueries with comparison operators. These examples are based on the Room
table, which has the following data:
BLDG ROOM CAPACITY OHEAD ----- ----- -------- ----- 13 101 85 Y 36 123 35 N 58 114 60 NULL 79 179 35 Y 79 174 22 Y 58 112 40 NULL 36 122 25 N 36 121 25 N 36 120 25 N 58 110 NULL Y (10 row(s) affected)
In previous chapters, you have seen SELECT
s with conditions like the following:
SELECT * FROM Room WHERE capacity = 25
In this example, 25
is a constant and =
is a comparison operator. The constant can be replaced by a subquery, and the operator can be any of the comparison operators (=
, <>
, <
, >
, <=
, or >=
). For example, we could devise a query to tell us which classrooms have a below-average capacity by computing the average in a subquery and using a comparison operator, like this:
SELECT * FROM Room WHERE capacity < (SELECT AVG(capacity) FROM Room)
This query produces the following six rows of output, showing six rooms with below-average capacity:
BLDG ROOM CAPACITY OHEAD ------ ------ -------- ----- 36 120 25 N 36 121 25 N 36 122 25 N 36 123 35 N 79 174 22 Y 79 179 35 Y Warning: Null value is eliminated by an aggregate or other SET operation. (6 row(s) affected)
The only problem with using subqueries in this fashion is that the subquery must return only one row. If an aggregate function is applied to a table in the subquery in this fashion, you will always get only one row—even if there is a WHERE
clause that excludes all rows, the subquery returns one row with a null value. For example, if we were to change the preceding query to the following and force multiple rows in the subquery,
SELECT * FROM Room WHERE capacity < (SELECT AVG(capacity) FROM Room WHERE bldg = 99)
we would get:
BLDG ROOM CAPACITY OHEAD ------ ------ -------- ----- (0 row(s) affected)
We get no rows selected because there is no bldg = 99
. If we were to change the query to the following:
SELECT * FROM Room WHERE bldg = (SELECT bldg FROM Room WHERE capacity > 10)
we would get the following error message:
BLDG ROOM CAPACITY OHEAD ------ ------ -------- ----- Msg 512, Level 16, State 1, Line 1 Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <=, >, >= or when the subquery is used as an expression.
When using comparison operators, only single values are acceptable from the subquery. Again, to ensure that we get only one row in the subquery and hence a workable query, we can use an aggregate with no GROUP BY
or HAVING
(to be discussed in Chapter 9).
In this chapter, we have introduced the subquery. We have given examples of situations in which it would be good to use subqueries, cases where subqueries could be turned into joins, and cases where they cannot be turned into joins. After reading this chapter, you should have a better appreciation for subqueries and joins.
What is a subquery?
Which part of the query/subquery is considered the inner query, and which part is considered the outer query?
Can a subquery always be done as a join? Why or why not?
When writing a query that will have a subquery, how do you determine which table/tables will go in the outer query?
Which predicate can usually be reformulated into a join?
When using operators, are many values acceptable from a result of a subquery?
What can you do to insure a working subquery?
Unless specified otherwise, use the Student_course
database to answer the following questions. Also, use appropriate column headings when displaying your output.
Use the techniques from this chapter to construct and execute the following queries:
Find the student numbers of students who have earned As or Bs in courses taught in the fall semester. Do this in two ways: first using a subquery, and then using a join.
Find all students who took a course offered by the Accounting department. List the student name and student number, the course name, and the grade in that course. (Hint: Begin with Department_to_major
and use an appropriate WHERE
.) Note that this task cannot be done with a multilevel subquery. Why?
For every students who is a sophomore (class
= 2), find the name and the name of the department that includes the student’s major.
Find the names of the departments that offer courses at the junior or senior levels (either one) but not at the freshman level. The course level is the first digit after the prefix; for example, AAAA3yyy
is a junior course, and so on.
Hint: Begin by creating the outer query—the names of departments that offer courses at the junior or senior levels. Save this query as q8_4. Then, construct the subquery—a list of departments that offer courses at the freshman level. Save the subquery as a view. Examine both lists of departments. When you have the outer query and the subquery results, recall the original query that you saved (q8_4) and add the subquery. Check your result with the department lists you just generated. Redo the last part of the experiment with your view. You should get the same result.
Find the names of courses that are prerequisites for other courses. List the course number and name, and the number and name of the prerequisite.
List the names of instructors who teach courses that have other than three-hour credits. Do the problem in two ways: once with IN
and once with NOT..IN
.
Create a table called Secretary
with the columns dcode
(of data type CHAR(4)
) for department code and name
(of data type VARCHAR(20)
) for the secretary name. Populate the table as follows:
Secretary | |
dCode |
name |
ACCT |
Beryl |
COSC |
Kaitlyn |
ENGL |
David |
HIST |
Christina |
BENG |
Fred |
HINDI |
Chloe |
Create a query that lists the names of departments that have secretaries (use IN
and the Secretary
table in a subquery with the Department_to_major
table in the outer query). Save this query as q8_7a.
Create a query that lists the names of departments (using the Department_to_major
table) that do not have secretaries (use NOT IN
). Save this query as q8_7b.
Add one more row to the Secretary
table that contains <null,'Brenda'>
(which you could see, for example, in a situation in which you have hired Brenda but have not yet assigned her to a department).
Recall q8_7a and rerun it. Recall q87_b and rerun it.
The behavior of NOT..IN
when nulls exist may surprise you. If nulls may exist in the subquery, then NOT..IN
either should not be used (Chapter 10 shows how to use another predicate, NOT EXISTS
, which is a workaround to this problem), or should include AND
whatever IS NOT NULL
. If you use NOT..IN
in a subquery, you must either ensure that nulls will not occur in the subquery or use some other predicate (such as NOT EXISTS
). Perhaps the best advice is to avoid NOT..IN
unless you cannot figure out another way to solve a problem.
To see a correct answer, add the phrase WHERE dcode IS NOT NULL
to the subquery in the IN
and NOT..IN
cases and run them again.
Do not delete the Secretary
table, because we will revisit this problem in Chapter 10.
Devise a list of course names that are offered in the fall semester in rooms where the capacity is equal to or above the average room size.
18.119.159.178