• What a subquery is
• The justifications of using subqueries
• Examples of subqueries in regular database queries
• Using subqueries with data manipulation commands
• Embedded subqueries
In this hour, you are introduced to the concept of subqueries. Using subqueries enables you to more easily preform complex queries.
A subquery, also known as a nested query, is a query embedded within the WHERE
clause of another query to further restrict data returned by the query. A subquery returns data that is used in the main query as a condition to further restrict the data to be retrieved. Subqueries are employed with the SELECT
, INSERT
, UPDATE
, and DELETE
statements.
You can use a subquery in some cases in place of a join operation by indirectly linking data between the tables based on one or more conditions. When you have a subquery in a query, the subquery is resolved first, and then the main query is resolved according to the condition(s) resolved by the subquery. The results of the subquery process expressions in the WHERE
clause of the main query. You can use the subquery either in the WHERE
clause or the HAVING
clause of the main query. You can use logical and relational operators, such as =
, >
, <
, <>
,!=, IN
, NOT IN
, AND
, OR
, and so on, within the subquery as well as to evaluate a subquery in the WHERE
or HAVING
clause.
The same rules that apply to standard queries also apply to subqueries. You can use join operations, functions, conversions, and other options within a subquery.
Notice the use of indentation in our examples. The use of indentation is merely for readability. The neater your statements are, the easier it is to read and find syntax errors.
Subqueries must follow a few rules:
• Subqueries must be enclosed within parentheses.
• A subquery can have only one column in the SELECT
clause, unless multiple columns are in the main query for the subquery to compare its selected columns.
• You cannot use an ORDER BY
clause in a subquery, although the main query can use an ORDER BY
clause. You can use the GROUP BY
clause to perform the same function as the ORDER BY
clause in a subquery.
• You can only use subqueries that return more than one row with multiple value operators, such as the IN
operator.
• The SELECT
list cannot include references to values that evaluate to a BLOB
, ARRAY
, CLOB
, or NCLOB
.
• You cannot immediately enclose a subquery in a SET
function.
• You cannot use the BETWEEN
operator with a subquery; however, you can use the BETWEEN
operator within the subquery.
The basic syntax for a subquery is as follows:
SELECT COLUMN_NAME
FROM TABLE
WHERE COLUMN_NAME = (SELECT COLUMN_NAME
FROM TABLE
WHERE CONDITIONS);
The following examples show how you can and cannot use the BETWEEN
operator with a subquery. Here is an example of a correct use of BETWEEN
in the subquery:
SELECT COLUMN_NAME
FROM TABLE_A
WHERE COLUMN_NAME OPERATOR (SELECT COLUMN_NAME
FROM TABLE_B)
WHERE VALUE BETWEEN VALUE)
You cannot use BETWEEN
as an operator outside the subquery. The following is an example of an illegal use of BETWEEN
with a subquery:
SELECT COLUMN_NAME
FROM TABLE_A
WHERE COLUMN_NAME BETWEEN VALUE AND (SELECT COLUMN_NAME
FROM TABLE_B)
Subqueries are most frequently used with the SELECT
statement, although you can use them within a data manipulation statement as well. The subquery, when employed with the SELECT
statement, retrieves data for the main query to use.
The basic syntax is as follows:
SELECT COLUMN_NAME [, COLUMN_NAME ]
FROM TABLE1 [, TABLE2 ]
WHERE COLUMN_NAME OPERATOR
(SELECT COLUMN_NAME [, COLUMN_NAME ]
FROM TABLE1 [, TABLE2 ]
[ WHERE ])
The following is an example:
SELECT E.EMP_ID, E.LAST_NAME, E.FIRST_NAME, EP.PAY_RATE
FROM EMPLOYEE_TBL E, EMPLOYEE_PAY_TBL EP
WHERE E.EMP_ID = EP.EMP_ID
AND EP.PAY_RATE < (SELECT PAY_RATE
FROM EMPLOYEE_PAY_TBL
WHERE EMP_ID = '443679012'),
The preceding SQL statement returns the employee identification, last name, first name, and pay rate for all employees who have a pay rate greater than that of the employee with the identification 443679012
. In this case, you do not necessarily know (or care) what the exact pay rate is for this particular employee; you only care about the pay rate for the purpose of getting a list of employees who bring home more than the employee specified in the subquery.
Subqueries are frequently used to place conditions on a query when the exact conditions are unknown. The pay rate for 220984332
was unknown, but the subquery was designed to do the footwork for you.
The next query selects the pay rate for a particular employee. This query is used as the subquery in the following example.
SELECT PAY_RATE
FROM EMPLOYEE_PAY_TBL
WHERE EMP_ID = '220984332';
PAY_RATE
----------
11
1 row selected.
The previous query is used as a subquery in the WHERE
clause of the following query:
SELECT E.EMP_ID, E.LAST_NAME, E.FIRST_NAME, EP.PAY_RATE
FROM EMPLOYEE_TBL E, EMPLOYEE_PAY_TBL EP
WHERE E.EMP_ID = EP.EMP_ID
AND EP.PAY_RATE > (SELECT PAY_RATE
FROM EMPLOYEE_PAY_TBL
WHERE EMP_ID = '220984332'),
EMP_ID LAST_NAME FIRST_NAME PAY_RATE
--------- --------- ----------- ---------
442346889 PLEW LINDA 14.75
443679012 SPURGEON TIFFANY 15
2 rows selected.
The result of the subquery is 11
(shown in the last example), so the last condition of the WHERE
clause is evaluated as
AND EP.PAY_RATE > 11
You did not know the value of the pay rate for the given individual when you executed the query. However, the main query was able to compare each individual’s pay rate to the subquery results.
Remember to use the COMMIT
and ROLLBACK
commands when using DML commands such as the INSERT
statement.
You can also use subqueries in conjunction with Data Manipulation Language (DML) statements. The INSERT
statement is the first instance you examine. It uses the data returned from the subquery to insert into another table. You can modify the selected data in the subquery with any of the character, date, or number functions.
The basic syntax is as follows:
INSERT INTO TABLE_NAME [ (COLUMN1 [, COLUMN2 ]) ]
SELECT [ *|COLUMN1 [, COLUMN2 ]
FROM TABLE1 [, TABLE2 ]
[ WHERE VALUE OPERATOR ]
The following is an example of the INSERT
statement with a subquery:
INSERT INTO RICH_EMPLOYEES
SELECT E.EMP_ID, E.LAST_NAME, E.FIRST_NAME, EP.PAY_RATE
FROM EMPLOYEE_TBL E, EMPLOYEE_PAY_TBL EP
WHERE E.EMP_ID = EP.EMP_ID
AND EP.PAY_RATE > (SELECT PAY_RATE
FROM EMPLOYEE_PAY_TBL
WHERE EMP_ID = '220984332'),
2 rows created.
This INSERT
statement inserts the EMP_ID
, LAST_NAME
, FIRST_NAME
, and PAY_RATE
into a table called RICH_EMPLOYEES
for all records of employees who have a pay rate greater than the pay rate of the employee with identification 220984332
.
You can use subqueries in conjunction with the UPDATE
statement to update single or multiple columns in a table. The basic syntax is as follows:
UPDATE TABLE
SET COLUMN_NAME [, COLUMN_NAME) ] =
(SELECT ]COLUMN_NAME [, COLUMN_NAME) ]
FROM TABLE
[ WHERE ]
Examples showing the use of the UPDATE
statement with a subquery follow. The first query returns the employee identification of all employees who reside in Indianapolis. You can see that four individuals meet this criterion.
SELECT EMP_ID
FROM EMPLOYEE_TBL
WHERE CITY = 'INDIANAPOLIS';
EMP_ID
---------
442346889
313782439
220984332
443679012
4 rows selected.
The first query is used as the subquery in the following statement; it proves how many employee identifications are returned by the subquery. The following is the UPDATE
with the subquery:
UPDATE EMPLOYEE_PAY_TBL
SET PAY_RATE = PAY_RATE * 1.1
WHERE EMP_ID IN (SELECT EMP_ID
FROM EMPLOYEE_TBL
WHERE CITY = 'INDIANAPOLIS'),
4 rows updated.
As expected, four rows are updated. One important thing to notice is that, unlike the example in the first section, this subquery returns multiple rows of data. Because you expect multiple rows to be returned, you use the IN
operator instead of the equal sign. Remember that IN
compares an expression to values in a list. If you had used the equal sign, an error would have been returned.
You can also use subqueries in conjunction with the DELETE
statement. The basic syntax is as follows:
DELETE FROM TABLE_NAME
[ WHERE OPERATOR [ VALUE ]
(SELECT COLUMN_NAME
FROM TABLE_NAME)
[ WHERE) ]
In the following example, you delete the BRANDON GLASS
record from EMPLOYEE_PAY_TBL
. You do not know Brandon’s employee identification number, but you can use a subquery to get his identification number from EMPLOYEE_TBL
, which contains the FIRST_NAME
and LAST_NAME
columns.
DELETE FROM EMPLOYEE_PAY_TBL
WHERE EMP_ID = (SELECT EMP_ID
FROM EMPLOYEE_TBL
WHERE LAST_NAME = 'GLASS'
AND FIRST_NAME = 'BRANDON'),
1 row deleted.
You must check your particular implementation for limits on the number of subqueries, if any, that you can use in a single statement. It might differ between vendors.
You can embed a subquery within another subquery, just as you can embed the subquery within a regular query. When a subquery is used, that subquery is resolved before the main query. Likewise, the lowest level subquery is resolved first in embedded or nested subqueries, working out to the main query.
The basic syntax for embedded subqueries is as follows:
SELECT COLUMN_NAME [, COLUMN_NAME ]
FROM TABLE1 [, TABLE2 ]
WHERE COLUMN_NAME OPERATOR (SELECT COLUMN_NAME
FROM TABLE
WHERE COLUMN_NAME OPERATOR
(SELECT COLUMN_NAME
FROM TABLE
[ WHERE COLUMN_NAME OPERATOR VALUE ]))
The following example uses two subqueries, one embedded within the other. You want to find out what customers have placed orders in which the quantity multiplied by the cost of a single order is greater than the sum of the cost of all products.
SELECT CUST_ID, CUST_NAME
FROM CUSTOMER_TBL
WHERE CUST_ID IN (SELECT O.CUST_ID
FROM ORDERS_TBL O, PRODUCTS_TBL P
WHERE O.PROD_ID = P.PROD_ID
AND O.QTY + P.COST < (SELECT SUM(COST)
FROM
PRODUCTS_TBL));
CUST_ID CUST_NAME
---------- ------------------
090 WENDY WOLF
232 LESLIE GLEASON
287 GAVINS PLACE
43 SCHYLERS NOVELTIES
432 SCOTTYS MARKET
560 ANDYS CANDIES
6 rows selected.
Do not forget the use of the WHERE
clause with the UPDATE
and DELETE
statements. All rows are updated or deleted from the target table if the WHERE
clause is not used. You can utilize a SELECT
statement with the WHERE
clause first to ensure that you are modifying the correct rows. See Hour 5, “Manipulating Data.”
Six rows that meet the criteria of both subqueries were selected.
The following two examples show the results of each of the subqueries to aid your understanding of how the main query was resolved:
SELECT SUM(COST) FROM PRODUCTS_TBL;
SUM(COST)
----------
138.08
1 row selected.
SELECT O.CUST_ID
FROM ORDERS_TBL O, PRODUCTS_TBL P
WHERE O.PROD_ID = P.PROD_ID
AND O.QTY + P.COST > 138.08;
CUST_ID
-------
43
287
2 rows selected.
In essence, the main query, after the substitution of the second subquery, is evaluated as shown in the following example:
SELECT CUST_ID, CUST_NAME
FROM CUSTOMER_TBL
WHERE CUST_ID IN (SELECT O.CUST_ID
FROM ORDERS_TBL O, PRODUCTS_TBL P
WHERE O.PROD_ID = P.PROD_ID
AND O.QTY + P.COST > 138.08);
The following shows how the main query is evaluated after the substitution of the first subquery:
SELECT CUST_ID, CUST_NAME
FROM CUSTOMER_TBL
WHERE CUST_ID IN (287,43);
The following is the final result:
CUST_ID CUST_NAME
---------- ------------------
43 SCHYLERS NOVELTIES
287 GAVINS PLACE
2 rows selected.
The use of multiple subqueries results in slower response time and might result in reduced accuracy of the results due to possible mistakes in the statement coding.
Correlated subqueries are common in many SQL implementations. The concept of correlated subqueries is discussed as an ANSI-standard SQL topic and is covered briefly in this hour. A correlated subquery is a subquery that is dependent upon information in the main query. This means that tables in a subquery can be related to tables in the main query.
In the following example, the table join between CUSTOMER_TBL
and ORDERS_TBL
in the subquery is dependent on the alias for CUSTOMER_TBL (C)
in the main query. This query returns the name of all customers who have ordered more than 10 units of one or more items.
SELECT C.CUST_NAME
FROM CUSTOMER_TBL C
WHERE 10 < (SELECT SUM(O.QTY)
FROM ORDERS_TBL O
WHERE O.CUST_ID = C.CUST_ID);
CUST_NAME
------------------
SCOTTYS MARKET
SCHYLERS NOVELTIES
MARYS GIFT SHOP
3 rows selected.
You can extract and slightly modify the subquery from the previous statement in the next statement to show you the total quantity of units ordered for each customer, allowing the previous results to be verified:
SELECT C.CUST_NAME, SUM(O.QTY)
FROM CUSTOMER_TBL C,
ORDERS_TBL O
WHERE C.CUST_ID = O.CUST_ID
GROUP BY C.CUST_NAME;
CUST_NAME SUM(O.QTY)
----------------------- ----------
ANDYS CANDIES 1
GAVINS PLACE 10
LESLIE GLEASON 1
MARYS GIFT SHOP 100
SCHYLERS NOVELTIES 25
SCOTTYS MARKET 20
WENDY WOLF 2
7 rows selected.
The GROUP BY
clause in this example is required because another column is being selected with the aggregate function SUM
. This gives you a sum for each customer. In the original subquery, a GROUP BY
clause is not required because SUM
achieves a total for the entire query, which is run against the record for each customer.
Subqueries do have performance implications when used within a query. You must consider those implications prior to implementing them in a production environment. Consider that a subquery must be evaluated prior to the main part of the query, so the time that it takes to execute the subquery has a direct effect on the time it takes for the main query to execute. Let’s look at our previous example:
In the case of a correlated subquery, you must reference the table in the main query before you can resolve the subquery.
SELECT CUST_ID, CUST_NAME
FROM CUSTOMER_TBL
WHERE CUST_ID IN (SELECT O.CUST_ID
FROM ORDERS_TBL O, PRODUCTS_TBL P
WHERE O.PROD_ID = P.PROD_ID
AND O.QTY + P.COST < (SELECT SUM(COST)
FROM
PRODUCTS_TBL));
Imagine what would happen if PRODUCTS_TBL
contained a couple thousand product lines and ORDERS_TBL
contained a few million lines of customer orders. The resulting effect of having to do a SUM
across PRODUCTS_TBL
and then join it with ORDERS_TBL
could slow the query down quite considerably. So always remember to evaluate the effect that using a subquery has on performance when deciding on a course of action to take for getting information out of the database.
By simple definition and general concept, a subquery is a query that is performed within another query to place further conditions on a query. You can use a subquery in an SQL statement’s WHERE
clause or HAVING
clause. Queries are typically used within other queries (Data Query Language), but you can also use them in the resolution of DML statements such as INSERT
, UPDATE
, and DELETE
. All basic rules for DML apply when using subqueries with DML commands.
The subquery’s syntax is virtually the same as that of a standalone query, with a few minor restrictions. One of these restrictions is that you cannot use the ORDER BY
clause within a subquery; you can use a GROUP BY
clause, however, which renders virtually the same effect. Subqueries are used to place conditions that are not necessarily known for a query, providing more power and flexibility with SQL.
Q. In the examples of subqueries, I noticed quite a bit of indentation. Is this necessary in the syntax of a subquery?
A. Absolutely not. The indentation is used merely to break the statement into separate parts, making the statement more readable and easier to follow.
Q. Is there a limit on the number of embedded subqueries that can be used in a single query?
A. Limitations such as the number of embedded subqueries allowed and the number of tables joined in a query are specific to each implementation. Some implementations might not have limits, although the use of too many embedded subqueries could drastically hinder SQL statement performance. Most limitations are affected by the actual hardware, CPU speed, and system memory available, although there are many other considerations.
Q. It seems that debugging a query with subqueries can prove to be confusing, especially with embedded subqueries. What is the best way to debug a query with subqueries?
A. The best way to debug a query with subqueries is to evaluate the query in sections. First evaluate the lowest-level subquery, and then work your way to the main query (the same way the database evaluates the query). When you evaluate each subquery individually, you can substitute the returned values for each subquery to check your main query’s logic. An error with a subquery often results from the use of the operator that evaluates the subquery, such as (=)
, IN
, >
, <
, and so on.
The following workshop is composed of a series of quiz questions and practical exercises. The quiz questions are designed to test your overall understanding of the current material. The practical exercises are intended to afford you the opportunity to apply the concepts discussed during the current hour, as well as build upon the knowledge acquired in previous hours of study. Please take time to complete the quiz questions and exercises before continuing. Refer to Appendix C, “Answers to Quizzes and Exercises,” for answers.
1. What is the function of a subquery when used with a SELECT
statement?
2. Can you update more than one column when using the UPDATE
statement in conjunction with a subquery?
3. Do the following have the correct syntax? If not, what is the correct syntax?
a. SELECT CUST_ID, CUST_NAME
FROM CUSTOMER_TBL
WHERE CUST_ID =
(SELECT CUST_ID
FROM ORDERS_TBL
WHERE ORD_NUM = '16C17'),
b. SELECT EMP_ID, SALARY
FROM EMPLOYEE_PAY_TBL
WHERE SALARY BETWEEN '20000'
AND (SELECT SALARY
FROM EMPLOYEE_ID
WHERE SALARY = '40000'),
c. UPDATE PRODUCTS_TBL
SET COST = 1.15
WHERE CUST_ID =
(SELECT CUST_ID
FROM ORDERS_TBL
WHERE ORD_NUM = '32A132'),
4. What would happen if you ran the following statement?
DELETE FROM EMPLOYEE_TBL
WHERE EMP_ID IN
(SELECT EMP_ID
FROM EMPLOYEE_PAY_TBL);
1. Write the SQL code for the requested subqueries, and compare your results to ours. Use the following tables to complete the exercises:
EMPLOYEE_TBL
EMP_ID VARCHAR(9) NOT NULL primary key
LAST_NAME VARCHAR(15) NOT NULL
FIRST_NAME VARCHAR(15) NOT NULL
MIDDLE_NAME VARCHAR(15)
ADDRESS VARCHAR(30) NOT NULL
CITY VARCHAR(15) NOT NULL
STATE VARCHAR(2) NOT NULL
ZIP INTEGER(5) NOT NULL
PHONE VARCHAR(10)
PAGER VARCHAR(10)
EMPLOYEE_PAY_TBL
EMP_ID VARCHAR(9) NOT NULL primary key
POSITION VARCHAR(15) NOT NULL
DATE_HIRE DATETIME
PAY_RATE DECIMAL(4,2) NOT NULL
DATE_LAST_RAISE DATETIME
CONSTRAINT EMP_FK FOREIGN KEY (EMP_ID_ REFERENCES
EMPLOYEE_TBL (EMP_ID)
CUSTOMER_TBL
CUST_ID VARCHAR(10) NOT NULL primary key
CUST_NAME VARCHAR(30) NOT NULL
CUST_ADDRESS VARCHAR(20) NOT NULL
CUST_CITY VARCHAR(15) NOT NULL
CUST_STATE VARCHAR(2) NOT NULL
CUST_ZIP INTEGER(5) NOT NULL
CUST_PHONE INTEGER(10)
CUST_FAX INTEGER(10)
ORDERS_TBL
ORD_NUM VARCHAR(10) NOT NULL primary key
CUST_ID VARCHAR(10) NOT NULL
PROD_ID VARCHAR(10) NOT NULL
QTY INTEGER(6) NOT NULL
ORD_DATE DATETIME
PRODUCTS_TBL
PROD_ID VARCHAR(10) NOT NULL primary key
PROD_DESC VARCHAR(40) NOT NULL
COST DECIMAL(6,2) NOT NULL
2. Using a subquery, write an SQL statement to update CUSTOMER_TBL
. Find the customer with the order number 23E934, contained
in the field ORD_NUM,
and change the customer name to DAVIDS MARKET
.
3. Using a subquery, write a query that returns the names of all employees who have a pay rate greater than JOHN DOE
, whose employee identification number is 343559876
.
4. Using a subquery, write a query that lists all products that cost more than the average cost of all products.
18.225.57.164