Subqueries are a powerful tool that you can use in all four SQL data statements. In this chapter, I’ll show you how subqueries can be used to filter data, generate values, and construct temporary data sets. After a little experimentation, I think you’ll agree that subqueries are one of the most powerful features of the SQL language.
A subquery is a query contained within another SQL statement (which I refer to as the containing statement for the rest of this discussion). A subquery is always enclosed within parentheses, and it is usually executed prior to the containing statement. Like any query, a subquery returns a result set that may consist of:
A single row with a single column
Multiple rows with a single column
Multiple rows having multiple columns
The type of result set returned by the subquery determines how it may be used and which operators the containing statement may use to interact with the data the subquery returns. When the containing statement has finished executing, the data returned by any subqueries is discarded, making a subquery act like a temporary table with statement scope (meaning that the server frees up any memory allocated to the subquery results after the SQL statement has finished execution).
You already saw several examples of subqueries in earlier chapters, but here’s a simple example to get started:
mysql> SELECT customer_id, first_name, last_name -> FROM customer -> WHERE customer_id = (SELECT MAX(customer_id) FROM customer); +-------------+------------+-----------+ | customer_id | first_name | last_name | +-------------+------------+-----------+ | 599 | AUSTIN | CINTRON | +-------------+------------+-----------+ 1 row in set (0.27 sec)
In this example, the subquery returns the maximum value found in the customer_id
column in the customer
table, and the containing statement then returns data about that customer. If you are ever confused about what a subquery is doing, you can run the subquery by itself (without the parentheses) to see what it returns. Here’s the subquery from the previous example:
mysql> SELECT MAX(customer_id) FROM customer; +------------------+ | MAX(customer_id) | +------------------+ | 599 | +------------------+ 1 row in set (0.00 sec)
The subquery returns a single row with a single column, which allows it to be used as one of the expressions in an equality condition (if the subquery returned two or more rows, it could be compared to something but could not be equal to anything, but more on this later). In this case, you can take the value the subquery returned and substitute it into the righthand expression of the filter condition in the containing query, as in:
mysql> SELECT customer_id, first_name, last_name -> FROM customer -> WHERE customer_id = 599; +-------------+------------+-----------+ | customer_id | first_name | last_name | +-------------+------------+-----------+ | 599 | AUSTIN | CINTRON | +-------------+------------+-----------+ 1 row in set (0.00 sec)
The subquery is useful in this case because it allows you to retrieve information about the customer with the highest ID in a single query, rather than retrieving the maximum customer_id
using one query and then writing a second query to retrieve the desired data from the customer
table. As you will see, subqueries are useful in many other situations as well, and may become one of the most powerful tools in your SQL toolkit.
Along with the differences noted previously regarding the type of result set returned by a subquery (single row/column, single row/multicolumn, or multiple columns), you can use another feature to differentiate subqueries; some subqueries are completely self-contained (called noncorrelated subqueries), while others reference columns from the containing statement (called correlated subqueries). The next several sections explore these two subquery types and show the different operators that you can employ to interact with them.
Now that you have learned about the different types of subqueries and the different operators that you can employ to interact with the data returned by subqueries, it’s time to explore the many ways in which you can use subqueries to build powerful SQL statements. The next three sections demonstrate how you may use subqueries to construct custom tables, to build conditions, and to generate column values in result sets.
Back in Chapter 3, I stated that the from
clause of a select
statement contains the tables to be used by the query. Since a subquery generates a result set containing rows and columns of data, it is perfectly valid to include subqueries in your from
clause along with tables. Although it might, at first glance, seem like an interesting feature without much practical merit, using subqueries alongside tables is one of the most powerful tools available when writing queries. Here’s a simple example:
mysql> SELECT c.first_name, c.last_name, -> pymnt.num_rentals, pymnt.tot_payments -> FROM customer c -> INNER JOIN -> (SELECT customer_id, -> count(*) num_rentals, sum(amount) tot_payments -> FROM payment -> GROUP BY customer_id -> ) pymnt -> ON c.customer_id = pymnt.customer_id; +-------------+--------------+-------------+--------------+ | first_name | last_name | num_rentals | tot_payments | +-------------+--------------+-------------+--------------+ | MARY | SMITH | 32 | 118.68 | | PATRICIA | JOHNSON | 27 | 128.73 | | LINDA | WILLIAMS | 26 | 135.74 | | BARBARA | JONES | 22 | 81.78 | | ELIZABETH | BROWN | 38 | 144.62 | ... | TERRENCE | GUNDERSON | 30 | 117.70 | | ENRIQUE | FORSYTHE | 28 | 96.72 | | FREDDIE | DUGGAN | 25 | 99.75 | | WADE | DELVALLE | 22 | 83.78 | | AUSTIN | CINTRON | 19 | 83.81 | +-------------+--------------+-------------+--------------+ 599 rows in set (0.03 sec)
In this example, a subquery generates a list of customer IDs along with the number of film rentals and the total payments. Here’s the result set generated by the subquery:
mysql> SELECT customer_id, count(*) num_rentals, sum(amount) tot_payments -> FROM payment -> GROUP BY customer_id; +-------------+-------------+--------------+ | customer_id | num_rentals | tot_payments | +-------------+-------------+--------------+ | 1 | 32 | 118.68 | | 2 | 27 | 128.73 | | 3 | 26 | 135.74 | | 4 | 22 | 81.78 | ... | 596 | 28 | 96.72 | | 597 | 25 | 99.75 | | 598 | 22 | 83.78 | | 599 | 19 | 83.81 | +-------------+-------------+--------------+ 599 rows in set (0.03 sec)
The subquery is given the name pymnt
and is joined to the customer
table via the customer_id
column. The containing query then retrieves the customer’s name from the customer
table, along with the summary columns from the pymnt
subquery.
Subqueries used in the from
clause must be noncorrelated1; they are executed first, and the data is held in memory until the containing query finishes execution. Subqueries offer immense flexibility when writing queries, because you can go far beyond the set of available tables to create virtually any view of the data that you desire, and then join the results to other tables or subqueries. If you are writing reports or generating data feeds to external systems, you may be able to do things with a single query that used to demand multiple queries or a procedural language to accomplish.
Along with using subqueries to summarize existing data, you can use subqueries to generate data that doesn’t exist in any form within your database. For example, you may wish to group your customers by the amount of money spent on film rentals, but you want to use group definitions that are not stored in your database. For example, let’s say you want to sort your customers into the groups shown in Table 9-1.
Group name | Lower limit | Upper limit |
---|---|---|
Small Fry |
0 |
$74.99 |
Average Joes |
$75 |
$149.99 |
Heavy Hitters |
$150 |
$9,999,999.99 |
To generate these groups within a single query, you will need a way to define these three groups. The first step is to define a query that generates the group definitions:
mysql> SELECT 'Small Fry' name, 0 low_limit, 74.99 high_limit -> UNION ALL -> SELECT 'Average Joes' name, 75 low_limit, 149.99 high_limit -> UNION ALL -> SELECT 'Heavy Hitters' name, 150 low_limit, 9999999.99 high_limit; +---------------+-----------+------------+ | name | low_limit | high_limit | +---------------+-----------+------------+ | Small Fry | 0 | 74.99 | | Average Joes | 75 | 149.99 | | Heavy Hitters | 150 | 9999999.99 | +---------------+-----------+------------+ 3 rows in set (0.00 sec)
I have used the set operator union all
to merge the results from three separate queries into a single result set. Each query retrieves three literals, and the results from the three queries are put together to generate a result set with three rows and three columns. You now have a query to generate the desired groups, and you can place it into the from
clause of another query to generate your customer groups:
mysql> SELECT pymnt_grps.name, count(*) num_customers -> FROM -> (SELECT customer_id, -> count(*) num_rentals, sum(amount) tot_payments -> FROM payment -> GROUP BY customer_id -> ) pymnt -> INNER JOIN -> (SELECT 'Small Fry' name, 0 low_limit, 74.99 high_limit -> UNION ALL -> SELECT 'Average Joes' name, 75 low_limit, 149.99 high_limit -> UNION ALL -> SELECT 'Heavy Hitters' name, 150 low_limit, 9999999.99 high_limit -> ) pymnt_grps -> ON pymnt.tot_payments -> BETWEEN pymnt_grps.low_limit AND pymnt_grps.high_limit -> GROUP BY pymnt_grps.name; +---------------+---------------+ | name | num_customers | +---------------+---------------+ | Average Joes | 515 | | Heavy Hitters | 46 | | Small Fry | 38 | +---------------+---------------+ 3 rows in set (0.03 sec)
The from
clause contains two subqueries; the first subquery, named pymnt
, returns the total number of film rentals and total payments for each customer, while the second subquery, named pymnt_grps
, generates the three customer groupings. The two subqueries are joined by finding which of the 3 groups each customer belongs to, and the rows are then grouped by the Group Name in order to count the number of customers in each group.
Of course, you could simply decide to build a permanent (or temporary) table to hold the group definitions instead of using a subquery. Using that approach, you would find your database to be littered with small special-purpose tables after awhile, and you wouldn’t remember the reason for which most of them were created. Using subqueries, however, you will be able to adhere to a policy where tables are added to a database only when there is a clear business need to store new data.
Let’s say that you want to generate a report showing each customer’s name, along with their city, the total number of rentals, and the total payment amount. You could accomplish this by joining the payment
, customer
, address
, and city
tables, and then grouping on the customer’s first and last names:
mysql> SELECT c.first_name, c.last_name, ct.city, -> sum(p.amount) tot_payments, count(*) tot_rentals -> FROM payment p -> INNER JOIN customer c -> ON p.customer_id = c.customer_id -> INNER JOIN address a -> ON c.address_id = a.address_id -> INNER JOIN city ct -> ON a.city_id = ct.city_id -> GROUP BY c.first_name, c.last_name, ct.city; +-------------+------------+-----------------+--------------+-------------+ | first_name | last_name | city | tot_payments | tot_rentals | +-------------+------------+-----------------+--------------+-------------+ | MARY | SMITH | Sasebo | 118.68 | 32 | | PATRICIA | JOHNSON | San Bernardino | 128.73 | 27 | | LINDA | WILLIAMS | Athenai | 135.74 | 26 | | BARBARA | JONES | Myingyan | 81.78 | 22 | ... | TERRENCE | GUNDERSON | Jinzhou | 117.70 | 30 | | ENRIQUE | FORSYTHE | Patras | 96.72 | 28 | | FREDDIE | DUGGAN | Sullana | 99.75 | 25 | | WADE | DELVALLE | Lausanne | 83.78 | 22 | | AUSTIN | CINTRON | Tieli | 83.81 | 19 | +-------------+------------+-----------------+--------------+-------------+ 599 rows in set (0.06 sec)
This query returns the desired data, but if you look at the query closely, you will see that the customer
, address
, and city
tables are needed only for display purposes, and that the payment
table has everything needed to generate the groupings (customer_id
and amount
). Therefore, you could separate out the task of generating the groups into a subquery, and then join the other three tables to the table generated by the subquery to achieve the desired end result. Here’s the grouping subquery:
mysql> SELECT customer_id, -> count(*) tot_rentals, sum(amount) tot_payments -> FROM payment -> GROUP BY customer_id; +-------------+-------------+--------------+ | customer_id | tot_rentals | tot_payments | +-------------+-------------+--------------+ | 1 | 32 | 118.68 | | 2 | 27 | 128.73 | | 3 | 26 | 135.74 | | 4 | 22 | 81.78 | ... | 595 | 30 | 117.70 | | 596 | 28 | 96.72 | | 597 | 25 | 99.75 | | 598 | 22 | 83.78 | | 599 | 19 | 83.81 | +-------------+-------------+--------------+ 599 rows in set (0.03 sec)
This is the heart of the query; the other tables are needed only to provide meaningful strings in place of the customer_id
value. The next query joins the previous data set to the other three tables:
mysql> SELECT c.first_name, c.last_name, -> ct.city, -> pymnt.tot_payments, pymnt.tot_rentals -> FROM -> (SELECT customer_id, -> count(*) tot_rentals, sum(amount) tot_payments -> FROM payment -> GROUP BY customer_id -> ) pymnt -> INNER JOIN customer c -> ON pymnt.customer_id = c.customer_id -> INNER JOIN address a -> ON c.address_id = a.address_id -> INNER JOIN city ct -> ON a.city_id = ct.city_id; +-------------+------------+-----------------+--------------+-------------+ | first_name | last_name | city | tot_payments | tot_rentals | +-------------+------------+-----------------+--------------+-------------+ | MARY | SMITH | Sasebo | 118.68 | 32 | | PATRICIA | JOHNSON | San Bernardino | 128.73 | 27 | | LINDA | WILLIAMS | Athenai | 135.74 | 26 | | BARBARA | JONES | Myingyan | 81.78 | 22 | ... | TERRENCE | GUNDERSON | Jinzhou | 117.70 | 30 | | ENRIQUE | FORSYTHE | Patras | 96.72 | 28 | | FREDDIE | DUGGAN | Sullana | 99.75 | 25 | | WADE | DELVALLE | Lausanne | 83.78 | 22 | | AUSTIN | CINTRON | Tieli | 83.81 | 19 | +-------------+------------+-----------------+--------------+-------------+ 599 rows in set (0.06 sec)
I realize that beauty is in the eye of the beholder, but I find this version of the query to be far more satisfying than the big, flat version. This version may execute faster as well, because the grouping is being done on a single numeric column(customer_id
) instead of multiple lengthy string columns (customer.first_name
, customer.last_name
, city.city
).
Common table expressions (a.k.a CTE’s), which are new to MySQL in version 8.0, have been available in other database servers for quite some time. A CTE is a named subquery which appears at the top of a query in a with clause, which can contain multiple CTE’s separated by commas. Along with making queries more understandable, this feature also allows each CTE to refer to any other CTE defined above it in the same with
clause. Here’s an example which includes three CTE’s, where the 2nd refers to the 1st, and the 3rd refers to the 2nd:
mysql> WITH actors_s AS -> (SELECT actor_id, first_name, last_name -> FROM actor -> WHERE last_name LIKE 'S%' -> ), -> actors_s_pg AS -> (SELECT s.actor_id, s.first_name, s.last_name, -> f.film_id, f.title -> FROM actors_s s -> INNER JOIN film_actor fa -> ON s.actor_id = fa.actor_id -> INNER JOIN film f -> ON f.film_id = fa.film_id -> WHERE f.rating = 'PG' -> ), -> actors_s_pg_revenue AS -> (SELECT spg.first_name, spg.last_name, p.amount -> FROM actors_s_pg spg -> INNER JOIN inventory i -> ON i.film_id = spg.film_id -> INNER JOIN rental r -> ON i.inventory_id = r.inventory_id -> INNER JOIN payment p -> ON r.rental_id = p.rental_id -> ) -- end of With clause -> SELECT spg_rev.first_name, spg_rev.last_name, -> sum(spg_rev.amount) tot_revenue -> FROM actors_s_pg_revenue spg_rev -> GROUP BY spg_rev.first_name, spg_rev.last_name -> ORDER BY 3 desc; +------------+-------------+-------------+ | first_name | last_name | tot_revenue | +------------+-------------+-------------+ | NICK | STALLONE | 692.21 | | JEFF | SILVERSTONE | 652.35 | | DAN | STREEP | 509.02 | | GROUCHO | SINATRA | 457.97 | | SISSY | SOBIESKI | 379.03 | | JAYNE | SILVERSTONE | 372.18 | | CAMERON | STREEP | 361.00 | | JOHN | SUVARI | 296.36 | | JOE | SWANK | 177.52 | +------------+-------------+-------------+ 9 rows in set (0.18 sec)
This query calculates the total revenues generated from PG-rated film rentals where the cast includes an actor whose last name starts with S. The first subquery (actors_s) finds all actors whose last name starts with S, the second subquery (actors_s_pg) joins that data set to the film
table and filters on films having a PG rating, and the third subquery (actors_s_pg_revenue) joins that data set to the payment
table to retrieve the amounts paid to rent any of these films. The final query simply groups the data from actors_s_pg_revenue
by first/last names and sums the revenues.
For those of you who tend to utilize temporary tables to store query results for use in subsequent queries, you may find CTE’s to be an attractive alternative.
For this last section of the chapter, I finish where I began: with single-column, single-row scalar subqueries. Along with being used in filter conditions, scalar subqueries may be used wherever an expression can appear, including the select
and order by
clauses of a query and the values
clause of an insert
statement.
In “Task-oriented subqueries”, I showed you how to use a subquery to separate out the grouping mechanism from the rest of the query. Here’s another version of the same query that uses subqueries for the same purpose, but in a different way:
mysql> SELECT -> (SELECT c.first_name FROM customer c -> WHERE c.customer_id = p.customer_id -> ) first_name, -> (SELECT c.last_name FROM customer c -> WHERE c.customer_id = p.customer_id -> ) last_name, -> (SELECT ct.city -> FROM customer c -> INNER JOIN address a -> ON c.address_id = a.address_id -> INNER JOIN city ct -> ON a.city_id = ct.city_id -> WHERE c.customer_id = p.customer_id -> ) city, -> sum(p.amount) tot_payments, -> count(*) tot_rentals -> FROM payment p -> GROUP BY p.customer_id; +-------------+------------+-----------------+--------------+-------------+ | first_name | last_name | city | tot_payments | tot_rentals | +-------------+------------+-----------------+--------------+-------------+ | MARY | SMITH | Sasebo | 118.68 | 32 | | PATRICIA | JOHNSON | San Bernardino | 128.73 | 27 | | LINDA | WILLIAMS | Athenai | 135.74 | 26 | | BARBARA | JONES | Myingyan | 81.78 | 22 | ... | TERRENCE | GUNDERSON | Jinzhou | 117.70 | 30 | | ENRIQUE | FORSYTHE | Patras | 96.72 | 28 | | FREDDIE | DUGGAN | Sullana | 99.75 | 25 | | WADE | DELVALLE | Lausanne | 83.78 | 22 | | AUSTIN | CINTRON | Tieli | 83.81 | 19 | +-------------+------------+-----------------+--------------+-------------+ 599 rows in set (0.06 sec)
There are two main differences between this query and the earlier version using a subquery in the from
clause:
Instead of joining the customer
, address
, and city
tables to the payment data, correlated scalar subqueries are used in the select
clause to look up the customer’s first/last names and city.
The customer
table is accessed three times (once in each of the three subqueries) rather than just once.
The customer
table is accessed three times because scalar subqueries can only return a single column and row, so if we need three columns related to the customer, it is necessary to use three different subqueries.
As previously noted, scalar subqueries can also appear in the order by
clause. The following query retrieves actor’s first and last names and sorts by the number of films in which the actor appeared:
mysql> SELECT a.actor_id, a.first_name, a.last_name -> FROM actor a -> ORDER BY -> (SELECT count(*) FROM film_actor fa -> WHERE fa.actor_id = a.actor_id) DESC; +----------+-------------+--------------+ | actor_id | first_name | last_name | +----------+-------------+--------------+ | 107 | GINA | DEGENERES | | 102 | WALTER | TORN | | 198 | MARY | KEITEL | | 181 | MATTHEW | CARREY | ... | 71 | ADAM | GRANT | | 186 | JULIA | ZELLWEGER | | 35 | JUDY | DEAN | | 199 | JULIA | FAWCETT | | 148 | EMILY | DEE | +----------+-------------+--------------+ 200 rows in set (0.01 sec)
The query uses a correlated scalar subquery in the order by
clause to return just the number of film appearances, and this value is used solely for sorting purposes.
Along with using correlated scalar subqueries in select
statements, you can use noncorrelated scalar subqueries to generate values for an insert
statement. For example, let’s say you are going to generate a new row in the film_actor
table, and you’ve been given the following data:
The first and last name of the actor
The name of the film
You have two choices for how to go about it: execute two queries to retrieve the primary key values from film
and actor
and place those values into an insert
statement, or use subqueries to retrieve the two key values from within an insert
statement. Here’s an example of the latter approach:
INSERT INTO film_actor (actor_id, film_id, last_update) VALUES ( (SELECT actor_id FROM actor WHERE first_name = 'JENNIFER' AND last_name = 'DAVIS'), (SELECT film_id FROM film WHERE title = 'ACE GOLDFINGER'), now() );
Using a single SQL statement, you can create a row in the film_actor
table and look up two foreign key column values at the same time.
I covered a lot of ground in this chapter, so it might be a good idea to review it. The examples in this chapter demonstrate subqueries that:
Return a single column and row, a single column with multiple rows, and multiple columns and rows
Are independent of the containing statement (noncorrelated subqueries)
Reference one or more columns from the containing statement (correlated subqueries)
Are used in conditions that utilize comparison operators as well as the special-purpose operators in
, not in
, exists
, and not exists
Can be found in select
, update
, delete
, and insert
statements
Generate result sets that can be joined to other tables (or subqueries) in a query
Can be used to generate values to populate a table or to populate columns in a query’s result set
Are used in the select
, from
, where
, having
, and order by
clauses of queries
Obviously, subqueries are a very versatile tool, so don’t feel bad if all these concepts haven’t sunk in after reading this chapter for the first time. Keep experimenting with the various uses for subqueries, and you will soon find yourself thinking about how you might utilize a subquery every time you write a nontrivial SQL statement.
These exercises are designed to test your understanding of subqueries. Please see Appendix B for the solutions.
Construct a query against the film
table that uses a filter condition with a noncorrelated subquery against the category
table to find all action films (category.name = 'Action'
).
Rework the query from Exercise 9-1 using a correlated subquery against the category
and film_category
tables to achieve the same results.
Join the following query to a subquery against the film_actor
table to show the level of each actor:
SELECT 'Hollywood Star' level, 30 min_roles, 99999 max_roles UNION ALL SELECT 'Prolific Actor' level, 20 min_roles, 29 max_roles UNION ALL SELECT 'Newcomer' level, 1 min_roles, 19 max_roles
The subquery against the film_actor
table should count the number of rows for each actor using group by actor_id
, and the count should be compared to the min_roles/max_roles columns to determine which level each actor belongs to.
1 Actually, depending on which database server you are using, you might be able to include correlated subqueries in your from clause by using Cross Apply or Outer Apply, but these features are beyond the scope of this book.
3.129.70.185