Most of the examples used in this book thus far have intentionally involved one table per SQL statement in order to allow you to focus on the basic syntax of each SQL statement. When developing a MySQL or MariaDB database, though, you will often query multiple tables. There are a few methods by which you may do that—you’ve seen some simple examples of them in previous chapters. This chapter covers how to merge results from multiple SQL statements, how to join tables, and how to use subqueries to achieve similar results.
Let’s start this chapter by looking at a simple method of unifying results from multiple
SQL statements. There may be times when you just want the unified results
of two SELECT
statements that don’t interact with each other.
In this situation, you can use the UNION
operator, which
merges two SELECT
statements to form a unified results set.
You can merge many SELECT
statements together simply by
placing the UNION
between them in a chain. Let’s look at an
example.
In Counting and Grouping Results, we queried the
birds
table to get a count of the number of birds in the
Pelecanidae family (i.e., Pelicans). Suppose we want
to also know how many birds are in the Ardeidae
family (i.e., Herons). That’s easy to do: we’d use a copy of the same
SELECT
, but change the value in the WHERE
clause. Suppose further that we want to merge the results of the
SELECT
statement counting Pelicans with the results of a
SELECT
counting Herons. We’ll do this with a
UNION
operator, so we can enter two complete
SELECT
statements and unite them into one results set. Enter
the following in the mysql
client:
SELECT 'Pelecanidae' AS 'Family',
COUNT(*) AS 'Species'
FROM birds, bird_families AS families
WHERE birds.family_id = families.family_id
AND families.scientific_name = 'Pelecanidae'
UNION
SELECT 'Ardeidae',
COUNT(*)
FROM birds, bird_families AS families
WHERE birds.family_id = families.family_id
AND families.scientific_name = 'Ardeidae';
+-------------+---------+
| Family | Species |
+-------------+---------+
| Pelecanidae | 10 |
| Ardeidae | 157 |
+-------------+---------+
First notice that the column headings in the results is taken only
from the first SELECT
statement. Next notice that for the
first fields in both SELECT
statements, we didn’t reference a
column. Instead, we gave plain text within quotes:
'Pelecanidae'
and 'Ardeidae'
. That’s an
acceptable choice in MySQL and MariaDB. It works well when you want to
fill a field with text like this. Notice that we gave field aliases for
the columns in the first SELECT
statement, but not in the
second one. MySQL uses the first ones it’s given for the column headings
of the results set when using the UNION
operator. It ignores
any field aliases in subsequent SELECT
statements, so they’re
not needed. If you don’t give aliases, it uses the column names of the
first SQL statement of the UNION
.
The reason a UNION
was somewhat necessary in the
preceding example is because we’re using an aggregate function,
COUNT()
with GROUP BY
. We can group by multiple
columns, but to get results like this which show separate counts for two
specific values of the same column, a UNION
or some other
method is necessary.
There are a few minor things to know about using a
UNION
. It’s used only with SELECT
statements.
The SELECT
statements can select columns from different
tables. Duplicate rows are combined into a single column in the results
set.
You can use the ORDER BY
clause to order the unified
results. If you want to order the results of a SELECT
statements, independently of the unified results, you have to put that
SELECT
statement within parentheses and add an ORDER
BY
clause to it. When specifying the columns in the ORDER
BY
clauses, you cannot preface column names with the table names
(e.g., families.scientific_name
). If using the column names
would be ambiguous, you should instead use column aliases. Let’s expand
our previous example to better illustrate how to use the ORDER
BY
clause with UNION
. Let’s get a count for each bird
family within two orders: Pelecaniformes and
Suliformes. Enter the following:
SELECT families.scientific_name AS 'Family',
COUNT(*) AS 'Species'
FROM birds, bird_families AS families, bird_orders AS orders
WHERE birds.family_id = families.family_id
AND families.order_id = orders.order_id
AND orders.scientific_name = 'Pelecaniformes'
GROUP BY families.family_id
UNION
SELECT families.scientific_name, COUNT(*)
FROM birds, bird_families AS families, bird_orders AS orders
WHERE birds.family_id = families.family_id
AND families.order_id = orders.order_id
AND orders.scientific_name = 'Suliformes'
GROUP BY families.family_id;
+-------------------+---------+
| Family | Species |
+-------------------+---------+
| Pelecanidae | 10 |
| Balaenicipitidae | 1 |
| Scopidae | 3 |
| Ardeidae | 157 |
| Threskiornithidae | 53 |
| Fregatidae | 13 |
| Sulidae | 16 |
| Phalacrocoracidae | 61 |
| Anhingidae | 8 |
+-------------------+---------+
The first five rows are are Pelecaniformes and
the remaining rows are Suliformes. The results are
not in alphabetical order, but in the order of each SELECT
statement and the order that server found the rows for each
SELECT
statement based on the family_id
. If we
want to order the results alphabetically by the family name, we have to
use an ORDER BY
clause, but after the unified results are
generated. To do this, we’ll wrap the results set in parentheses to tell
MySQL to treat it as a table. Then we’ll select all of the columns and
rows of that results set and use the ORDER BY
clause to order
them based on the family name. To avoid confusion, we’ll add the name of
the order to the results. Enter the following:
SELECT * FROM
(
SELECT families.scientific_name AS 'Family',
COUNT(*) AS 'Species',
orders.scientific_name AS 'Order'
FROM birds, bird_families AS families, bird_orders AS orders
WHERE birds.family_id = families.family_id
AND families.order_id = orders.order_id
AND orders.scientific_name = 'Pelecaniformes'
GROUP BY families.family_id
UNION
SELECT families.scientific_name, COUNT(*), orders.scientific_name
FROM birds, bird_families AS families, bird_orders AS orders
WHERE birds.family_id = families.family_id
AND families.order_id = orders.order_id
AND orders.scientific_name = 'Suliformes'
GROUP BY families.family_id ) AS derived_1
ORDER BY Family;
+-------------------+---------+----------------+
| Family | Species | Order |
+-------------------+---------+----------------+
| Anhingidae | 8 | Suliformes |
| Ardeidae | 157 | Pelecaniformes |
| Balaenicipitidae | 1 | Pelecaniformes |
| Fregatidae | 13 | Suliformes |
| Pelecanidae | 10 | Pelecaniformes |
| Phalacrocoracidae | 61 | Suliformes |
| Scopidae | 3 | Pelecaniformes |
| Sulidae | 16 | Suliformes |
| Threskiornithidae | 53 | Pelecaniformes |
+-------------------+---------+----------------+
In these examples, it may seem to be a lot of typing to achieve very
little. But there are times—albeit rare times—when UNION
is
the best or simplest choice. It’s more useful when you retrieve data from
very distinct, separate sources or other situations that would require
contortions to fit into a single SELECT
statement and are
executed more easily as separate ones, still giving you a unified results
set.
You can get the same results as the previous examples, though, with
less effort by using a subquery. Actually, when we put the
UNION
within parentheses, that became a subquery, just not
much of one. We’ll cover subqueries later in this chapter. For now, let’s
consider how to join multiple tables in one SQL statement.
The JOIN
clause links two tables together in a SELECT
,
UPDATE
, or DELETE
statement. JOIN
links tables based on columns with common data for purposes of selecting,
updating, or deleting data. In A Little Complexity, for instance, we joined two
tables named books
and status_names
, taking
advantage of the design that put identical values in the
status
column of books
and the
status_id
column of status_names
. That way, we
could show data from each table about the same book:
SELECT
book_id
,
title
,
status_name
FROM
books
JOIN
status_names
WHERE
status
=
status_id
;
Let’s review the way a join works, using this example. The
status
and status_id
fields both contain numbers
that refer to a status. In the books
table, the numbers have
no intrinsic meaning. But the status_names
table associates
the numbers with meaningful text. Thus, by joining the tables, you can
associate a book with its status.
Sometimes there are alternatives to the JOIN
clause.
For instance, when constructing an SQL statement that includes multiple
tables, a simple method is to list the tables in a comma-separated list in
the appropriate position of the SQL statement—for a SELECT
statement, you would list them in the FROM
clause—and to
provide pairing of columns in the WHERE
clause
on which the tables will be joined. This is the method we have used
several times in the previous chapters. Although this method works fine
and would seem fairly straightforward, a more agreeable method is to use a
JOIN
clause to join both tables and to specify the join point
columns. When you have an error with an SQL statement, keeping these items
together and not having part of them in the WHERE
clause
makes troubleshooting SQL statements easier.
With JOIN
, tables are linked together based on columns
with common data for purposes of selecting, updating, or deleting data.
The JOIN
clause is entered in the relevant statement where
tables referenced are specified usually. This precludes the need to join
the tables based on key columns in the WHERE
clause. The
ON
operator is used to indicate the pair of columns by which the tables
are to be joined (indicated with the equals-sign operator). If needed, you
may specify multiple pairs of columns, separated by AND
. If
the column names by which the two tables are joined are the same in both
tables, as an alternative method, the USING
operator may be
given along with a comma-separated list of columns that both tables have
in common, contained within parentheses. The columns must be contained in
each table that is joined. To improve performance, join to a column that
is indexed.
Here is how the first of these two syntax looks using a
JOIN
:
SELECT
book_id
,
title
,
status_name
FROM
books
JOIN
status_names
ON
(
status
=
status_id
);
This is the same example as before, but without the
WHERE
clause. It doesn’t need it, because it uses
ON
instead to indicate the join point. If we were to alter
the books
table to modify the name of the status
column to be status_id
, so that the names of both columns on
which we join these two tables are the same, we could do the join like
this:
SELECT
book_id
,
title
,
status_name
FROM
books
JOIN
status_names
USING
(
status_id
);
Here we use the keyword USING
in the JOIN
clause
to indicate the identical column by which to join.
These syntaxes are only two of a few possible with the
JOIN
. They show how you might construct a SELECT
statement using a JOIN
. It’s basically the same for the
UPDATE
and DELETE
statements. In the next
subsections, we’ll consider the methods for using JOIN
with
each of these three SQL statements, and look at some examples for
each.
Suppose we want to get a list of species of Geese whose existence is
Threatened
—that’s a category of conservation states. We
will need to construct a SELECT
statement that takes data
from the birds
table and the
conservation_status
table. The shared data in the
birds
and the conservation_status
tables is
the conservation_status_id
column of each table. We didn’t
have to give the column the same name in each table, but doing so makes
it easier to know where to join them.
Enter the following in the mysql client:
SELECT common_name, conservation_state
FROM birds
JOIN conservation_status
ON(birds.conservation_status_id = conservation_status.conservation_status_id)
WHERE conservation_category = 'Threatened'
AND common_name LIKE '%Goose%';
+----------------------------+--------------------+
| common_name | conservation_state |
+----------------------------+--------------------+
| Swan Goose | Vulnerable |
| Lesser White-fronted Goose | Vulnerable |
| Hawaiian Goose | Vulnerable |
| Red-breasted Goose | Endangered |
| Blue-winged Goose | Vulnerable |
+----------------------------+--------------------+
The ON
operator specifies the
conservation_status_id
columns from each table as the
common item on which to join the tables. MySQL knows the proper table in
which to find the conservation_category
and
common_name
columns, and pulls the rows that match.
That works fine, but it’s a lot to type. Let’s modify this
statement to use the USING
operator, specifing
conservation_status_id
just once to make the join. MySQL
will understand what to do. Here’s that same SQL statement, but with the
USING
operator:
SELECT
common_name
,
conservation_state
FROM
birds
JOIN
conservation_status
USING
(
conservation_status_id
)
WHERE
conservation_category
=
'Threatened'
AND
common_name
LIKE
'%Goose%'
;
Now let’s modify the SQL statement to include the bird family. To
do that, we’ll have to add another table, the
bird_families
. Let’s also include Ducks in the list. Try
executing the following:
SELECT common_name AS 'Bird',
bird_families.scientific_name AS 'Family', conservation_state AS 'Status'
FROM birds
JOIN conservation_status USING(conservation_status_id)
JOIN bird_families USING(family_id)
WHERE conservation_category = 'Threatened'
AND common_name REGEXP 'Goose|Duck'
ORDER BY Status, Bird;
+----------------------------+----------+-----------------------+
| Bird | Family | Status |
+----------------------------+----------+-----------------------+
| Laysan Duck | Anatidae | Critically Endangered |
| Pink-headed Duck | Anatidae | Critically Endangered |
| Blue Duck | Anatidae | Endangered |
| Hawaiian Duck | Anatidae | Endangered |
| Meller's Duck | Anatidae | Endangered |
| Red-breasted Goose | Anatidae | Endangered |
| White-headed Duck | Anatidae | Endangered |
| White-winged Duck | Anatidae | Endangered |
| Blue-winged Goose | Anatidae | Vulnerable |
| Hawaiian Goose | Anatidae | Vulnerable |
| Lesser White-fronted Goose | Anatidae | Vulnerable |
| Long-tailed Duck | Anatidae | Vulnerable |
| Philippine Duck | Anatidae | Vulnerable |
| Swan Goose | Anatidae | Vulnerable |
| West Indian Whistling-Duck | Anatidae | Vulnerable |
| White-headed Steamer-Duck | Anatidae | Vulnerable |
+----------------------------+----------+-----------------------+
We gave two JOIN
clauses in this SQL statement. It
doesn’t usually matter which table is listed where. For instance,
although bird_families
is listed just after the join for
the conservation_status
table, MySQL determined that
bird_families
is to be joined to the birds
table. Without using JOIN
, we would have to be more
emphatic in specifying the join points, and we would have to list them
in the WHERE
clause. It would have to be entered
like this:
SELECT
common_name
AS
'Bird'
,
bird_families
.
scientific_name
AS
'Family'
,
conservation_state
AS
'Status'
FROM
birds
,
conservation_status
,
bird_families
WHERE
birds
.
conservation_status_id
=
conservation_status
.
conservation_status_id
AND
birds
.
family_id
=
bird_families
.
family_id
AND
conservation_category
=
'Threatened'
AND
common_name
REGEXP
'Goose|Duck'
ORDER
BY
Status
,
Bird
;
That’s a very cluttered WHERE
clause, making it
difficult to see clearly the conditions by which we’re selecting data
from the tables. Using JOIN
clauses is much tidier.
Incidentally, the SQL statement with two JOIN
clauses
used a regular expression—the REGEXP
operator in the WHERE
clause—to specify that the clause find either Goose
or
Duck
. We also added an ORDER BY
clause to order first by
Status
, then by Bird
name.
In this example, though, there’s little point in listing the bird
family name, because the birds are all of the same family. Plus, there
may be similar birds that we might like to have in the list, but that
don’t have the words Goose
or Duck
in their
name. So let’s change that in the SQL statement. Let’s also order the
results differently and list birds from the least endangered to the most
endangered. Enter the following:
SELECT common_name AS 'Bird from Anatidae',
conservation_state AS 'Conservation Status'
FROM birds
JOIN conservation_status AS states USING(conservation_status_id)
JOIN bird_families USING(family_id)
WHERE conservation_category = 'Threatened'
AND bird_families.scientific_name = 'Anatidae'
ORDER BY states.conservation_status_id DESC, common_name ASC;
+----------------------------+-----------------------+
| Bird from Anatidae | Conservation Status |
+----------------------------+-----------------------+
| Auckland Islands Teal | Vulnerable |
| Blue-winged Goose | Vulnerable |
| Eaton's Pintail | Vulnerable |
| Hawaiian Goose | Vulnerable |
| Lesser White-fronted Goose | Vulnerable |
| Long-tailed Duck | Vulnerable |
| Marbled Teal | Vulnerable |
| Philippine Duck | Vulnerable |
| Salvadori's Teal | Vulnerable |
| Steller's Eider | Vulnerable |
| Swan Goose | Vulnerable |
| West Indian Whistling-Duck | Vulnerable |
| White-headed Steamer-Duck | Vulnerable |
| Bernier's Teal | Endangered |
| Blue Duck | Endangered |
| Brown Teal | Endangered |
| Campbell Islands Teal | Endangered |
| Hawaiian Duck | Endangered |
| Meller's Duck | Endangered |
| Red-breasted Goose | Endangered |
| Scaly-sided Merganser | Endangered |
| White-headed Duck | Endangered |
| White-winged Duck | Endangered |
| White-winged Scoter | Endangered |
| Baer's Pochard | Critically Endangered |
| Brazilian Merganser | Critically Endangered |
| Crested Shelduck | Critically Endangered |
| Laysan Duck | Critically Endangered |
| Madagascar Pochard | Critically Endangered |
| Pink-headed Duck | Critically Endangered |
+----------------------------+-----------------------+
An obvious change to this example is the elimination of
bird_families.scientific_name
from the list of selected
columns, so only two columns appear in the output. Another change, which
is cosmetic, is to provide the alias states
to the
conservation_status
table so we could refer to the short
alias later instead of the long name.
Finally, the ORDER BY
clause orders the output by
conservation_status_id
, because that value happens to be in
the order of severity in the conservation_status
table. We
want to override the default order, which puts the most threatened
species first, so we add the DESC
option to put the least
threatened first. We’re still ordering results secondarily by the common
name of the birds, but using the actual column name this time instead of
an alias. This is because we changed the alias for the
common_name
column from Birds
to Birds
from Anatidae
, because all the results are in that family. We
could have used 'Birds from Anatidae'
in the ORDER
BY
clause, but that’s bothersome to type.
Let’s look at one more basic example of a JOIN
.
Suppose we wanted to get a list of members located in Russia (i.e.,
where country_id
has a value of ru
) who have
reported sighting a bird from the Scolopacidae
family
(shore and wader birds like Sandpipers and Curlews). Information on bird
sightings is stored in the bird_sightings
table. It
includes GPS coordinates recorded from a bird list application on the
member’s mobile phone when they note the sighting. Enter this SQL
statement:
SELECT CONCAT(name_first, ' ', name_last) AS Birder,
common_name AS Bird, location_gps AS 'Location of Sighting'
FROM birdwatchers.humans
JOIN birdwatchers.bird_sightings USING(human_id)
JOIN rookery.birds USING(bird_id)
JOIN rookery.bird_families USING(family_id)
WHERE country_id = 'ru'
AND bird_families.scientific_name = 'Scolopacidae'
ORDER BY Birder;
+-------------------+-------------------+---------------------------+
| Birder | Bird | Location of Sighting |
+-------------------+-------------------+---------------------------+
| Anahit Vanetsyan | Bar-tailed Godwit | 42.81958072; 133.02246094 |
| Elena Bokova | Eurasian Curlew | 51.70469364; 58.63746643 |
| Elena Bokova | Eskimo Curlew | 66.16051056; -162.7734375 |
| Katerina Smirnova | Eurasian Curlew | 42.69096856; 130.78185081 |
+-------------------+-------------------+---------------------------+
This SQL statement joins together four tables, two from the
birdwatchers
database and two from the birds
database. Look closely at this SQL statement and consider the purpose of
including each of those four tables. All of them were needed to assemble
the results shown. Incidentally, we used the CONCAT()
function to concatenate together the member’s first and last name for
the Birder
field in the results.
There are other types of joins besides a plain JOIN
.
Let’s do another SELECT
using another type of
JOIN
. For an example of this, we’ll get a list of Egrets
and their conservation status. Enter the following SQL statement:
SELECT common_name AS 'Bird',
conservation_state AS 'Status'
FROM birds
LEFT JOIN conservation_status USING(conservation_status_id)
WHERE common_name LIKE '%Egret%'
ORDER BY Status, Bird;
+--------------------+-----------------+
| Bird | Status |
+--------------------+-----------------+
| Great Egret | NULL |
| Cattle Egret | Least Concern |
| Intermediate Egret | Least Concern |
| Little Egret | Least Concern |
| Snowy Egret | Least Concern |
| Reddish Egret | Near Threatened |
| Chinese Egret | Vulnerable |
| Slaty Egret | Vulnerable |
+--------------------+-----------------+
This SELECT
statement is like the previous examples,
except that instead of using a JOIN
, we’re using a
LEFT JOIN
. This type of join selects rows in the table on the left (i.e.,
birds
) regardless of whether there is a matching row in the
table on the right (i.e., conservation_status
). Because
there is no match on the right, MySQL returns a NULL value for columns
it cannot reconcile from the table on the right. You can see this in the
results. The Great Egret
has a value of NULL for its
Status
. This is because no value was entered in the
conservation_status_id
column of the row related to that
bird species. It would return NULL if the value of that column is NULL,
blank if the column was set to empty (e.g., ''
), or any
value that does not match in the right table.
Because of the LEFT JOIN
, the results show all birds
with the word Egret
in the common name even if we don’t
know their conservation status. It also indicates which Egrets need to
set the value of conservation_status_id
. We’ll need to
update that row and others like it. An UPDATE
statement with this same LEFT
JOIN
can easily do that. We’ll show a couple in the next
section.
If you want to use the UPDATE
statement to change the data in multiple tables, or change data in a
table based on criteria from multiple tables, you can use the
JOIN
clause. The syntax of the JOIN
clause for
UPDATE
is the same as it is for SELECT
. So
let’s go straight to some practical examples. We’ll start with the
example at the end of the previous subsection.
Let’s use UPDATE
with LEFT JOIN
to
locate rows in the birds
table that don’t have a value in
conservation_status_id
. We could update all of the rows,
but let’s do only rows for one bird family,
Ardeidae (i.e., Herons, Egrets, and Bitterns).
First, execute this SELECT
statement to
test our joins and WHERE
clause:
SELECT
common_name
,
conservation_state
FROM
birds
LEFT
JOIN
conservation_status
USING
(
conservation_status_id
)
JOIN
bird_families
USING
(
family_id
)
WHERE
bird_families
.
scientific_name
=
'Ardeidae'
;
If you’re working from the data from the MySQL Resources site, you
should have over 150 rows in the results. You’ll notice that many of the
rows have nothing in the common_name
field. That’s because
there are many bird species for which there are scientific names, but no
common names. Those rows also have no value for the
conservation_status_id
. There are also a few rows for bird
species that do have common names.
Let’s add another row to the conservation_status
, one
for an unknown state. We’ll set these unknown rows to that state. Enter
these two SQL statements:
INSERT INTO conservation_status (conservation_state)
VALUES('Unknown'),
SELECT LAST_INSERT_ID();
+------------------+
| LAST_INSERT_ID() |
+------------------+
| 9 |
+------------------+
In the first SQL statement here we entered only a value for
conservation_state
. The defaults for the other columns are
fine. We’ll use the UPDATE
statement to set the rows for
the birds in Ardeidae to this new state, so we want
to know the conservation_status_id
for it. To get that
value, we issue a SELECT
statement with the LAST_INSERT_ID()
function. It returns
the identifier generated from the previous SQL statement entered, which
added a row for the current client connection (i.e., just us). Let’s use
that number to set the conservation_status_id
in the
birds
table for bird species in
Ardeidae. If your identification number is
different, use what you received in the following SQL statement:
UPDATE
birds
LEFT
JOIN
conservation_status
USING
(
conservation_status_id
)
JOIN
bird_families
USING
(
family_id
)
SET
birds
.
conservation_status_id
=
9
WHERE
bird_families
.
scientific_name
=
'Ardeidae'
AND
conservation_status
.
conservation_status_id
IS
NULL
;
This UPDATE
statement should have changed almost 100
rows on your server. The joins here are the same as we used in the
previous SELECT
statement, in which we discovered that we
did not have a conservation status set for the Great Egret. Notice in
the WHERE
clause here that one of the conditions is that
conservation_status.conservation_status_id
has a value of
NULL. We could have removed the LEFT JOIN
to the
conservation_status
table and then updated simply all of
the rows for the Ardeidae birds that have a NULL
value in the conservation_status_id
column. But that would
not have included any rows that might have other nonmatching values
(e.g., a blank column). By including this LEFT JOIN
, we
updated all of these possibilities. However, it requires the condition
that the conservation_status.conservation_status_id
is
NULL, the column from the right table—it will be assumed NULL if not
matched.
Because the method of joining tables is the same for both the
SELECT
statement and the UPDATE
statement, you
can easily test the JOIN
clauses and WHERE
clause using a SELECT
first. When
that’s successful, you can then execute an UPDATE
statement
with the same JOIN
and WHERE
clauses. That’s
the best procedure to follow to ensure proper updating of data when
joining multiple tables.
Having used JOIN
with SELECT
and
UPDATE
statements, let’s look at some practical examples using
DELETE
. In Deleting in Multiple Tables,
we saw an example of DELETE
with a JOIN
. In
that example, we wanted to delete the rows where the member Elena Bokova
has a yahoo.com email address from both the
humans
and the prize_winners
tables from the
birdwatchers
database. For that purpose, we constructed a
DELETE
statement that worked fine, but there was
potentially a problem with it. Here is that SQL statement again:
DELETE
FROM
humans
,
prize_winners
USING
humans
JOIN
prize_winners
WHERE
name_first
=
'Elena'
AND
name_last
=
'Bokova'
AND
email_address
LIKE
'%yahoo.com'
AND
humans
.
human_id
=
prize_winners
.
human_id
;
Compared to the JOIN
clauses we’ve been using, the
syntax here may look strange. This is how it works with a
DELETE
statement. Tables from which data is deleted are
listed in the FROM
clause, while tables used in the
WHERE
clause to provide filters to determine which rows to
delete are listed in a USING
clause. The clause
"USING humans JOIN prize_winners
" just tells the server
that those two tables provide the columns in the WHERE
clause.
Don’t confuse a USING
clause, which has JOIN
subclauses, with the USING
operator, which can be used
in a JOIN
clause.
As the preceding DELETE
SQL statement is constructed,
if MySQL finds a row in the humans
table where the name and
email information match, there has to be a matching row in the
prize_winners
table for the human_id
. If
there’s not a row in both, MySQL won’t delete the row in the
humans
table and no error will be returned—you might not
realize it failed. To allow for this possibility, we could use a LEFT JOIN
like so:
DELETE
FROM
humans
,
prize_winners
USING
humans
LEFT
JOIN
prize_winners
ON
humans
.
human_id
=
prize_winners
.
human_id
WHERE
name_first
=
'Elena'
AND
name_last
=
'Bokova'
AND
email_address
LIKE
'%yahoo.com'
;
Notice that for this syntax we moved the valuation of the
human_id
columns to the USING
clause, adding a LEFT
JOIN
and an ON
operator to replace that condition in the WHERE
clause. That’s necessary because if
there’s not a match in the other table, the WHERE
clause
won’t include that row in the results to be deleted. With the LEFT
JOIN
, all of the rows in both the humans
and the
prize_winners
tables that match the criteria given to it
will be deleted, and any rows found in the humans
table for
which there isn’t a match in the prize_winners
table, but
which match the criteria of the WHERE
clause will be
deleted also. This prevents what are known as orphaned rows.
For general maintenance, we should check occasionally to see if
there are rows in the prize_winners
table that don’t have
matching rows in the humans
table, and then delete them.
Someone might have had us delete their account, but we may have
forgotten to remove entries for them in related tables. To handle that
possibility, we could use RIGHT JOIN
instead of LEFT
JOIN
. We could enter something like this:
DELETE
FROM
prize_winners
USING
humans
RIGHT
JOIN
prize_winners
ON
humans
.
human_id
=
prize_winners
.
human_id
WHERE
humans
.
human_id
IS
NULL
;
In this DELETE
statement, we listed only the
prize_winners
table in the FROM
clause because
that’s the only one from which we want to delete rows. It’s a good
policy not to list tables that are not to be affected in the FROM
clause of a DELETE
statement, even if you think there’s no possible way that there is a row
that would be deleted in the other tables.
Because we put the humans
table first in the
USING
clause and the prize_winners
table
second, we’re doing a RIGHT JOIN
so that columns from the
table on the right (prize_winners
) will be deleted even if
there is no value in the table on the left. If we reversed the order of
the tables, we would then need a LEFT JOIN
for this
task.
It’s worth focusing for a moment on the final clause of the
previous DELETE
statement, a WHERE
clause
checking for NULLs in one column. As we saw earlier, a LEFT
JOIN
or RIGHT JOIN
can return rows where there was
nothing in the column you’re doing the join on. The results contain NULL
for the missing value. So in the WHERE
clause here, we’re
using that as the condition for finding the orphaned rows in the
prize_winners
table.
There are many contortions to the JOIN
clause. The
basic JOIN
syntaxes that we covered in Selecting a Basic Join are worth learning well; they
will be the ones you will use primarily. You will sometimes have a need
for using a LEFT JOIN
or a RIGHT JOIN
. Let’s
move on to a related topic that can be valuable in many situations:
subqueries.
A subquery is a query within another query, a SELECT
statement
within another SQL statement. A subquery returns a single value, a row of
data, a single column from several rows, or several columns from several
rows. These are known respectively as scalar, column, row, and table
subqueries. I’ll refer to these distinctions later in this chapter.
Although the same results can be accomplished by using the
JOIN
clause and sometimes the UNION
, depending
on the situation, subqueries are a cleaner approach. They make a complex
query more modular, which makes it easier to create and to troubleshoot
problems. Here are two generic examples of subqueries (we also used a few
subqueries in Chapter 8):
UPDATEtable_1
SETcol_5
= 1 WHEREcol_id
=SELECT
; SELECTcol_id
FROMtable_2
WHEREcol_1 = value
column_a
,column_1
FROMtable_1
JOIN (SELECT
);column_1
,column_2
FROMtable_2
WHEREcolumn_2 = value
) ASderived_table
USING(col_id
In the first example, the SELECT
statement is an inner query. The UPDATE
statement is
referred to as the main or outer query. In the second example, the
SELECT
within parentheses is the inner query and the
SELECT
outside of the parentheses is the outer query. An
outer query containing a subquery can be a SELECT
,
INSERT
, UPDATE
, DELETE
,
DO
, or even a SET
statement. There are some
limitations, though. An outer query cannot generally select data or modify
data from the same table of an inner query. This doesn’t apply
though if the subquery is part of a FROM
clause.
These generic examples may be confusing. Generic examples aren’t usually easy to follow. I’d rather present first the syntax for subqueries, but there is no syntax per se for the use of subqueries—other than the syntax inherent in the SQL statements used for the inner and outer queries. Subqueries are rather a method of constructing combinations of SQL statements. As such, you need only to make sure of two basic factors with subqueries.
The first factor of which you need to be mindful is how a subquery
is contained within an outer query, where you position it. For instance,
if you construct an outer query which is an UPDATE
statement,
you could place a subquery in the WHERE
clause to provide a set of values to
which a column is equal (e.g., as in the first generic example). Or you
might locate a subquery in the FROM
clause of an outer,
SELECT
statement (e.g., as in the second generic example).
These are where subqueries may be positioned. You can have multiple
subqueries within an outer query, but they will be positioned generally
within the FROM
clause or the WHERE
clause.
The second factor is whether the results returned from a subquery
are in keeping with the expectations of the outer query. For instance, in
the first generic example, the UPDATE
clause has a
WHERE
clause that expects a single value from the subquery.
If the subquery returns several values, a row of columns, or a table of
results, it will confuse MySQL and cause an error. So you need to be sure
that the subquery you construct will return the type of values required by
the outer query as you constructed it.
You’ll better understand these factors as we look at examples of them. As mentioned at the start of this section, the different types of subqueries are scalar, column, row, and table subqueries. In the following subsections, we’ll look at each of these types, along with examples of them.
The most basic subquery is one that returns a single value, a scalar value. This
type of subquery is particularly useful in a WHERE
clause in conjunction with an
=
operator, or in other instances where a single value from
an expression is permitted. Let’s look at simple example of this. Let’s
get a list of bird families that are members of the
Galliformes bird order (i.e., Grouse, Partridges,
Quails, and Turkeys). This can be done easily with a JOIN
in which we join the
birds
and bird_families
tables together based
on the order_id
for Galliformes. We’ll
use instead a scalar subquery to get the order_id
we need.
Enter this in mysql:
SELECT scientific_name AS Family
FROM bird_families
WHERE order_id =
(SELECT order_id
FROM bird_orders
WHERE scientific_name = 'Galliformes'),
+----------------+ | Family | +----------------+ | Megapodiidae | | Cracidae | | Numididae | | Odontophoridae | | Phasianidae | +----------------+
The inner query (i.e., the subquery here) returns one value, the
order_id
. That’s used to complete the WHERE
clause of the outer query. That was pretty simple. Let’s look at another
example of a scalar subquery.
We had an example earlier in this chapter, in the section related
to using a JOIN
, in which we selected members from Russia
who had sighted birds of the family Scolopacidae.
To thank members in Russia for using our telephone application for
recording sightings, we’re going to give a one-year premium membership
to one of those members. Enter this hefty SQL statement in mysql:
UPDATE
humans
SET
membership_type
=
'premium'
,
membership_expiration
=
DATE_ADD
(
IFNULL
(
membership_expiration
,
CURDATE
()),
INTERVAL
1
YEAR
)
WHERE
human_id
=
(
SELECT
human_id
FROM
(
SELECT
human_id
,
COUNT
(
*
)
AS
sightings
,
join_date
FROM
birdwatchers
.
bird_sightings
JOIN
birdwatchers
.
humans
USING
(
human_id
)
JOIN
rookery
.
birds
USING
(
bird_id
)
JOIN
rookery
.
bird_families
USING
(
family_id
)
WHERE
country_id
=
'ru'
AND
bird_families
.
scientific_name
=
'Scolopacidae'
GROUP
BY
human_id
)
AS
derived_1
WHERE
sightings
>
5
ORDER
BY
join_date
ASC
LIMIT
1
);
The most inner query here is basically the same as the one in the
example mentioned earlier. The difference is that here we’re not
selecting the names involved. Instead, we’re selecting the
human_id
and the join_date
(i.e., the date
that the member joined). With the GROUP BY
clause, we’re grouping members
based on the human_id
to get a count with the
COUNT()
function. Put another way, we’re counting the
number of entries of each human_id
in the
bird_sightings
table for the bird family and member country
we specified. That subquery will return a table of results; it’s a table
subquery. We’ll talk more about that type of subquery later in this
chapter.
The query wrapped around the most inner query, which is also a subquery, selects only rows where the number of sightings is more than five. It orders the rows with newer members first based on the date the members joined—we want the newest Russian member reporting several Curlews and the like to be awarded a year of premium membership. This subquery is limited to one row with one column. It’s a scalar query.
The main query in the preceding example is using the single value
from the scalar query to determine which member to give one year of
premium membership. If we hadn’t added the LIMIT
to the scalar query, it would have
returned more than one value—it then wouldn’t have been a scalar query.
Based on the operator in the WHERE
clause of its outer
query, MySQL would have returned an error message like this:
ERROR 1242 (ER_SUBSELECT_NO_1_ROW) SQLSTATE = 21000 Message = "Subquery returns more than 1 row"
As with all subqueries, there’s always a way to get the same
results without a subquery, using JOIN
or some other method
to bring results together in complex ways. To some extent, it’s a matter
of style which method you decide to use. I generally prefer subqueries,
especially when using them in applications I develop in PHP or Perl.
They’re easier for me to decipher months or years later when I want to
make changes to a program I’ve written.
In the preceding subsection, we discussed instances in which one scalar
value was obtained in a WHERE
clause. However,
there are times when you may want to match multiple values. For those
situations, you will need to use the subquery in conjunction with an
operator such as IN
, which is used to specify a
comma-separated list of values. Let’s look at an example of this.
In one of the examples in the previous subsection, we used a scalar subquery to get a list of bird families for the bird order Galliformes. Suppose that we also want the common name of one bird species from each family in the order; we want to randomly select a bird name from each. To do this, we will create a subquery that will select a list of bird family names for the order. Enter the following SQL statement:
SELECT * FROM
(SELECT common_name AS 'Bird',
families.scientific_name AS 'Family'
FROM birds
JOIN bird_families AS families USING(family_id)
JOIN bird_orders AS orders USING(order_id)
WHERE common_name != ''
AND families.scientific_name IN
(SELECT DISTINCT families.scientific_name AS 'Family'
FROM bird_families AS families
JOIN bird_orders AS orders USING(order_id)
WHERE orders.scientific_name = 'Galliformes'
ORDER BY Family)
ORDER BY RAND()) AS derived_1
GROUP BY (Family);
+------------------------+----------------+
| Bird | Family |
+------------------------+----------------+
| White-crested Guan | Cracidae |
| Forsten's Scrubfowl | Megapodiidae |
| Helmeted Guineafowl | Numididae |
| Mountain Quail | Odontophoridae |
| Gray-striped Francolin | Phasianidae |
+------------------------+----------------+
In this example, we have two subqueries, a subquery within a
subquery, within an outer query. The most inner subquery is known as a
nested subquery. The subqueries here are executed before the outer
query, so the results will be available before the WHERE
clause of the outer query is executed. In that vein, the nested subquery
will be executed before the subquery in which it is contained. In this
example, the nested query is contained within the parentheses of the
IN
operator—the most indented query. That SQL statement
selects the bird family name where the name of the order is
Galliformes. The DISTINCT
flag
by the alias Family
instructs MySQL to return
only one entry for each distinct family name. If we had manually entered
that information, it would look like this:
('Cracidae','Megapodiidae','Numididae','Odontophoridae','Phasianidae').
This subquery is a multiple-field or column subquery.
The inner subquery in the preceding example is a table subquery.
It selects a list of all birds that are in the list of bird families
provided by its subquery. We could just select one bird for each family
at this level using a GROUP BY
clause to group by the Family
name to get one bird
species per family. But that would select the first rows found and the
results would be the same every time. We want to select randomly each
time this SQL statement is executed. To do that, we’re selecting all of
the birds for each bird family and then using ORDER BY RAND()
to randomly order the
rows of the results table. Then we’re wrapping that in another query,
the outer query to GROUP BY
the bird family. That will give
us one entry for each bird family.
Row subqueries retrieve a single row of data that is then used by the
outer query. It’s used in a WHERE
clause to compare one row of columns
to one row of columns selected in the subquery. Let’s consider an
example of this and then we’ll discuss it more. Suppose another
bird-watcher site closes, this one in Eastern Europe. They send us their
database, which contains a table with the names of their members, and
another table with information members provided related to birds they
spotted. We put both of these tables in the birdwatchers
database to import into our tables. In the process of importing these
members into our humans
table, we discover people who are
already members of our site. That’s OK: we know how to avoid importing
the duplicates. Now we want to import the table of birds spottings.
Because there were duplicate members, maybe those members have logged
information on birds they saw in the wild on this Eastern European site.
So we want to check that each entry is not a duplicate and then import
it. Look at this SQL statement:
INSERT
INTO
bird_sightings
(
bird_id
,
human_id
,
time_seen
,
location_gps
)
VALUES
(
SELECT
birds
.
bird_id
,
humans
.
human_id
,
date_spotted
,
gps_coordinates
FROM
(
SELECT
personal_name
,
family_name
,
science_name
,
date_spotted
,
CONCAT
(
latitude
,
'; '
,
longitude
)
AS
gps_coordinates
FROM
eastern_birders
JOIN
eastern_birders_spottings
USING
(
birder_id
)
WHERE
(
personal_name
,
family_name
,
science_name
,
CONCAT
(
latitude
,
'; '
,
longitude
)
)
NOT
IN
(
SELECT
name_first
,
name_last
,
scientific_name
,
location_gps
FROM
humans
JOIN
bird_sightings
USING
(
human_id
)
JOIN
rookery
.
birds
USING
(
bird_id
)
)
)
AS
derived_1
JOIN
humans
ON
(
personal_name
=
name_first
AND
family_name
=
name_last
)
JOIN
rookery
.
birds
ON
(
scientific_name
=
science_name
)
);
This looks very complicated and can be difficult to understand or
construct correctly. Let’s discern the major elements here. Look first
at the subquery in parentheses, the nested subquery. We’re selecting
data from tables in our database: the names of each person, the bird
species and where the member sighted it. This nested subquery is
contained within the WHERE
clause of another subquery, a
row subquery. Notice that a list of columns from the tables of the row
subquery is given in parentheses. So the condition of the
WHERE
clause is that the values of those columns for each
row of the joined tables are compared to the values of the columns for
each row from joined tables in its subquery. The outer query inserts the
relevant values into the bird_sightings
table.
The preceding example is certainly an odd one and seemingly,
unnecessarily complex. But there are times when a row query like this
can be useful. To put our example more simply, if there’s a row with the
same human name who spotted the same bird species at the exact same map
coordinates, don’t import it. If all of those values are not the same,
then insert it into the bird_sightings
table. There are
other ways, though, you can accomplish this task. For instance, you
might do this in stages with multiple SQL statements and a temporary
table. You could also do it in stages within a program using one of the
languages like Perl and an API like the Perl DBI. But it’s good to know
you have the option of doing it within one SQL statement if that’s what
you want.
A subquery can be used to generate a results set, a table from which an
outer query can select data. That is to say, a subquery can be
used in a FROM
clause as if it were another
table in a database. It is said to be a derived table.
There are a few rules related to table subqueries. Each derived
table must be assigned an alias—any unique name is fine. You can use the
keyword AS
for assigning an alias. Each column in a
subquery that is in part of a FROM
clause must have a
unique name. For instance, if you select the same column twice in a
subquery, you have to assign at least one of them an alias that is
unique. A subquery contained in a FROM
clause cannot
generally be a correlated subquery; it cannot reference the same table
as the outer query.
For an example of a table subquery, let’s use the example near the
beginning of this chapter that used a UNION
. In that example, we had two SELECT
statements which counted the
number of rows for birds in two bird families:
Pelecanidae and Ardeidae. With
a UNION
, the results were merged into one results set. That
was a bulky method. We can do better with a table subquery. The subquery
we’ll use will select just the bird family name for each bird of the two
families that we wanted to count. That may seem silly, to list the bird
family name multiple times, especially when we already know the name of
the bird families we want to count. But that’s how we can count them and
use the name for our results set. MySQL won’t display the names multiple
times—that will go on behind the scenes. It will display only one entry
per family because of the GROUP BY
clause. Enter the
following:
SELECT family AS 'Bird Family',
COUNT(*) AS 'Number of Birds'
FROM
(SELECT families.scientific_name AS family
FROM birds
JOIN bird_families AS families USING(family_id)
WHERE families.scientific_name IN('Pelecanidae','Ardeidae')) AS derived_1
GROUP BY family;
+-------------+-----------------+
| Bird Family | Number of Birds |
+-------------+-----------------+
| Ardeidae | 157 |
| Pelecanidae | 10 |
+-------------+-----------------+
This a much better way to form this unified results set than using
a UNION
. We could add more bird family names to the WHERE
clause in the subquery to get more
rows in the results set, instead of having to copy the
SELECT
statement for each family we add.
You can see in this example that a table subquery is the same as a
table in the FROM
clause. We can even give it an alias
(e.g., derived_1
) as we can with a normal table. The
subquery returns a table of results (i.e., the bird family names). The
GROUP BY
clause tells MySQL to group the results based on
the family
field, the alias in the subquery for the
scientific_name
column of the bird_families
table. We used that same alias to select that field in the column list
of the outer query. When a column in a subquery is set to an alias, you
have to use the alias; the column name becomes inaccessible outside the
subquery when an alias is given.
Performance problems can occur with subqueries if they are not well
constructed. There can be a performance drain when a subquery is placed
within an IN()
operator as part of a
WHERE
clause of the outer query. It’s generally better to
use instead the =
operator, along with AND
for
each column
=value
pair. For situations in which you suspect poor performance with a
subquery, try reconstructing the SQL statement with JOIN
and compare the differences between the two SQL statements using
the BENCHMARK()
function. For ideas on
improving subquery performance, Oracle has tips on their site for Optimizing
Subqueries.
Many developers prefer subqueries—I do. They’re easier to construct
and decipher when you have problems later. If you work on a database that
is very large and has a huge amount of activity, subqueries may not be a
good choice because they can sometimes affect performance. For small
databases, though, they’re fine. You should learn to use subqueries and
learn how to work without them (i.e, use JOIN
) so you can
handle any situation presented to you. You cannot be sure which method
your next employer and team of developers may being using. It’s best to be
versatile.
As for learning to use JOIN
, that’s hardly optional.
Very few developers don’t use JOIN
. Even if you prefer
subqueries, they still call for JOIN
. You can see this in
almost all of the examples of subqueries in this chapter. You may rarely
use UNION
. But there’s not much to learn there. However, you
should be proficient in using JOIN
. So don’t avoid them;
practice manually entering SQL statements that use them. The act of typing
them helps.
The goal of the following exercises is to give you practice
assembling tables using JOIN
and creating subqueries. In the
process of doing these exercises, think about how tables and data come
together. Try to envision each table as a separate piece of paper with a
list of data on it, and how you might place them on a desk to find
information on them in relation to each other. In such a scenario, you
might tend to place your left index finger at one point on a page on the
left and your right index finger on a point on another page on your right.
That’s a join. Where you point on each are the join points. As you type
the SQL statements in these exercises, think of this scene and say aloud
what you’re doing, what you’re telling MySQL to do. It helps to better
understand the joining of tables and creating of subqueries.
In the birdwatchers
database, there is a table
called bird_sightings
in which there are records of birds
that members have seen in the wild. Suppose we have a contest in which
we will award a prize based on the most sightings of birds from the
order Galliformes. A member gets one point for
each sighting of birds in this order.
Construct an SQL statement to count the number of entries from
each member. There should be two fields in the results set: one
containing the human_id
with Birder
as the
alias; and the second field containing the number of entries with
Entries
as its alias. To accomplish this, join the
bird_sightings
table to birds
,
bird_families
, and bird_orders
. Remember
that these tables are in a different database. You will have to use
the COUNT()
function and a GROUP BY
clause.
Do all of this with JOIN
and not with subqueries. Your
results should look like the following:
+--------+---------+ | Birder | Entries | +--------+---------+ | 19 | 1 | | 28 | 5 | +--------+---------+
When you have successfully constructed this SQL statement,
modify it to join in the humans
table. In the column
list, replace the field for human_id
with the first and
last name of the member. Use the CONCAT()
function to
put them together into a single field (with a space in between the
names), with the same alias. Once you make the needed changes and
execute it, the results should look like this, but the number of names
and points may be different:
+--------------+--------+ | Birder | Points | +--------------+--------+ | Elena Bokova | 4 | | Marie Dyer | 8 | +--------------+--------+
In the preceding exercises, you were asked to count the number of bird species the members sighted from the Galliformes. So that the contest is more fun, instead of giving one point for each bird species in that order, give a point for only one bird species per bird family in the bird order. That means that a member doesn’t get more points for sighting the same bird species multiple times. A member also doesn’t get more points for spotting several birds in the same family. Instead, the member has to look through bird guides to find a species for each species and then go looking for one from each in their area. This should make the contest more of an adventure for the members.
To allow for the change to the contest, you will need to modify
the SQL statement you constructed at the end of the previous exercise.
First, you will need to add a DISTINCT
to the start of
the column list in the outer query. You’ll need to remove the
CONCAT()
and GROUP BY
. When you’ve done
that, execute the SQL statement to make sure you have no errors. You
should get a results set that shows multiple entries for some members.
Next, place the whole SQL statement inside another SQL statement to
make it a subquery. The new, outer query should include
CONCAT()
and GROUP BY
so that it can count
the single entries from each family for each member. It should return
results like this:
+--------------+--------+ | Birder | Points | +--------------+--------+ | Elena Bokova | 1 | | Marie Dyer | 5 | +--------------+--------+
There are five families in the Galliformes bird order. For the contest described in the last two exercises, the most points that a member could achieve therefore is 5. Change the SQL statement you entered at the end of the previous exercise to list only members who have 5 points. To do this, you will need to wrap the previous SQL statement inside another, creating a nested query. When you execute the full SQL statement, the results should look like this:
+------------+--------+ | Birder | Points | +------------+--------+ | Marie Dyer | 5 | +------------+--------+
3.16.130.201