A
join query is a querying technique that
matches rows from two or more tables based on a join condition in a
WHERE
clause and outputs only those rows that meet the condition. As part
of the process of converting the winestore entity-relationship model
to SQL statements, we have included the attributes required in any
practical join condition.
To understand which tables can be joined in the winestore database, and how the joins are processed, it is helpful to have a copy of the ER model at hand.
Oddly, the easiest way to introduce join queries is to discuss what not to do. Consider this query, which we might intuitively, but wrongly, use to find all the wineries in a region:
SELECT winery_name,region_name FROM winery, region;
This query produces—in part—the following results:
+-------------------------------+-------------+ | winery_name | region_name | +-------------------------------+-------------+ | Ryan Ridge Winery | Victoria | | Macdonald Creek Premium Wines | Victoria | | Davie's | Victoria | | Porkenberger Brook Vineyard | Victoria | | Rowley Hill Vineyard | Victoria |
The impression here is that, for example, Ryan Ridge Winery is located in the Victoria region. This might not be the case. Why? First, you can use the techniques covered so far in this chapter to check which region Ryan Ridge Winery is located in:
SELECT region_id FROM winery WHERE winery_name='Ryan Ridge Winery';
The result is region_id=2
.
Now query the region table to find the name of
region_id=2
using:
SELECT region_name FROM region WHERE region_id=2;
The region_name
is South Australia. So, Ryan Ridge
Winery isn’t in Victoria at all!
What happened in the first attempt at a join query? The technical
answer is that you just evaluated a cartesian product; that is, you produced as output all the possible
combinations of wineries and regions. These odd results can be seen
if you add an ORDER BY
clause to the original query:
SELECT winery_name, region_name FROM winery, region ORDER BY winery_name, region_name;
Recall that the ORDER BY
clause sorts the results
after the query has been evaluated; it has no effect on which rows
are returned from the query. Here is the first part of the result of
the query with the ORDER BY
clause:
+----------------------+-------------------+ | winery_name | region_name | +----------------------+-------------------+ | Anderson Creek Wines | New South Wales | | Anderson Creek Wines | South Australia | | Anderson Creek Wines | Victoria | | Anderson Creek Wines | Western Australia | | Anderson Group | New South Wales | | Anderson Group | South Australia | | Anderson Group | Victoria | | Anderson Group | Western Australia |
The query produces all possible combinations of the four region names and 300 wineries in the sample database! In fact, the size of the output can be accurately calculated as the total number of rows in the first table multiplied by the total rows in the second table. In this case, the output is 4 x 300 = 1,200 rows.
A
cartesian product isn’t the join we want. Instead,
we want to limit the results to only the sensible rows, where the
winery is actually located in the region. From a database
perspective, we want only rows in which the
region_id
in the winery table
matches the corresponding region_id
in the
region table. This is a natural
join.[6]
Consider a revised example using a natural join:
SELECT winery_name, region_name FROM winery, region WHERE winery.region_id = region.region_id ORDER BY winery_name;
An ORDER
BY
clause has been
added to sort the results by winery_name
but this
doesn’t affect the join. This query
produces—in part—the following sensible results:
+----------------------+-------------------+ | winery_name | region_name | +----------------------+-------------------+ | Anderson Creek Wines | Western Australia | | Anderson Group | New South Wales | | Beard | South Australia | | Beard and Sons | Western Australia | | Beard Brook | New South Wales |
Several features are shown in this first successful natural join:
The FROM
clause contains more than one table name.
In this example, SELECT
retrieves rows from the
tables winery and region.
Attributes in the WHERE
clause are specified using
both the table name and attribute name, separated by a period. This
usually disambiguates uses of the same attribute name in different
tables.
So, for example, region_id
in the
region table and region_id
in
the winery table are disambiguated as
region.region_id
and
winery.region_id
. This procedure can also be used
for clarity in queries, even if it isn’t required.
It can be used in all parts of the query, not just the
WHERE
clause.
The WHERE
clause includes a join clause that
matches rows between the multiple tables. In this example, the output
is reduced to those rows where wineries and regions have matching
region_id
attributes, resulting in a list of all
wineries and which region they are located in. This is the key to
joining two or more tables to produce sensible results.
The natural join can be used in many other examples in the winestore. Consider another example that finds all the wines made by all the wineries:
SELECT winery_name, wine_name, type FROM winery, wine WHERE wine.winery_id = winery.winery_id;
This query finds all wines made by wineries through a natural join of
the winery and wine tables
using the winery_id
attribute. The result is a
large table of the 1,028 wines stocked at the winestore, their types,
and the relevant wineries.
You can extend this query to produce a list of wines made by a specific winery or group of wineries. To find all wines made by wineries with a name beginning with Borg, use:
SELECT winery_name, wine_name, type FROM winery, wine WHERE wine.winery_id = winery.winery_id AND winery.winery_name LIKE 'Borg%';
This example extends the previous example by producing not all
natural join pairs of wines and wineries, but only those for the
winery or wineries beginning with Borg. The LIKE
clause is covered later, in Section 3.9.
Here are two more example join queries:
To find the name of the region Ryan Ridge Winery is situated in:
SELECT region.region_name FROM region,winery WHERE winery.region_id=region.region_id AND winery.winery_name='Ryan Ridge Winery';
To find which winery makes Curry Hill Red:
SELECT winery.winery_name FROM winery, wine WHERE wine.winery_id=winery.winery_id AND wine.wine_name='Curry Hill Red';
To save typing and add additional functionality, table aliases are sometimes used in queries. Consider an example that finds all inventory details of wine #183:
SELECT * FROM inventory i, wine w WHERE i.wine_id = 183 AND i.wine_id = w.wine_id;
In this query, the FROM
clause specifies aliases
for the table names. The alias inventory
i
means than the inventory
table can be referred to as i
elsewhere in the
query. For example, i.wine_id
is the same as
inventory.wine_id
. This saves typing in this
query.
Aliases are powerful for complex queries that need to use the same
table twice but in different ways. For example, to find any two
customers with the same surname
, you can write the
query:
SELECT c1.cust_id, c2.cust_id FROM customer c1, customer c2 WHERE c1.surname = c2.surname AND c1.cust_id != c2.cust_id;
The final clause, c1.cust_id!=c2.cust_id
, is
essential; without it, all customers are reported as answers. This
occurs because all customers are rows in tables c1 and c2 and, for
example, a customer with cust_id=1
in table c1
has—of course—the same surname
as the
customer with cust_id=1
in table c2.
The next join example uses the
DISTINCT
operator to find red wines that cost
less than $10. Wines can have more than one inventory row, and the
inventory rows for the same wine can have the same per-bottle cost.
The DISTINCT
operator shows each
wine_name
and cost
pair once by
removing any duplicates. To find which red wines cost less than $10,
use:
SELECT DISTINCT wine_name, cost FROM wine,inventory WHERE wine.wine_id=inventory.wine_id AND inventory.cost<10 AND UPPER(wine.type)='RED';
Here are two examples that use DISTINCT
to show
only one matching answer:
Queries can join more than two
tables. In the next example, the query finds all details of each item
from each order by a particular customer, customer #2. The example
also illustrates how frequently the Boolean operators
AND
and OR
are used:
SELECT * FROM customer, orders, items WHERE customer.cust_id = orders.cust_id AND orders.order_id = items.order_id AND orders.cust_id = items.cust_id AND customer.cust_id = 2;
In this query, the natural join is between three tables,
customer, orders, and
items, and the rows selected are those in which
the cust_id
is the same for all three tables, the
cust_id
is 2, and the order_id
is the same in the orders and
items tables.
If you remove the cust_id=2
clause, the query
outputs all items in all orders by all customers. This is a large
result set, but still a sensible one that is much smaller than the
cartesian product!
Here are two more examples that join three tables:
To find which wines are made in the Margaret River region:
SELECT wine_name FROM wine,winery,region WHERE wine.winery_id=winery.winery_id AND winery.region_id=region.region_id AND region.region_name='Margaret River';
To find which region contains the winery that makes the Red River Red wine:
SELECT region_name FROM wine,winery,region WHERE wine.winery_id=winery.winery_id AND winery.region_id=region.region_id AND wine.wine_name='Red River Red';
Extending to four or more tables generalizes the approach further. To find the details of customers who have purchased wines from Buonopane Wines, use:
SELECT DISTINCT customer.cust_id, customer.surname, customer.firstname FROM customer, winery, wine, items WHERE customer.cust_id=items.cust_id AND items.wine_id=wine.wine_id AND wine.winery_id=winery.winery_id AND winery.winery_name='Buonopane Wines' ORDER BY customer.surname, customer.firstname;
This last query is the most complex so far and contains a four-step
process. The easiest way to understand a query is usually to start
with the WHERE
clause and work toward the
SELECT
clause:
The WHERE
clause restricts the
winery rows to those that bear the name
Buonopane Wines.
The resultant winery rows—there is probably only one winery called Buonopane Wines—are joined with wine to find all wines made by Buonopane Wines.
The wines made by Buonopane Wines are joined with the items that have been purchased.
The purchases of Buonopane Wines are joined with the
customer rows of the customers who have
purchased the wine. You can leave out the orders
table, because the items table contains a
cust_id
for the join; if you need the order
number, the discount applied, or another orders
attribute, the orders table needs to be included
in the query.
The result is the details of customers who have purchased Buonopane
Wines. DISTINCT
is used to show each customer only
once. ORDER
BY
sorts the
customer rows into telephone directory order.
Designing a query like this is a step-by-step process. We began by
testing a query to find the winery_id
of wineries
with the name Buonopane Wines. Then, after testing the query and
checking the result, we progressively added additional tables to the
FROM
clause and join conditions. Finally, we added
the ORDER
BY
clause.
The next example uses three tables but queries the complex many-to-many relationship in the winestore that exists between the wines and grape_variety tables via the wine_variety table. As outlined in the system requirements in Chapter 1, a wine can have one or more grape varieties and these are listed in a specific order (e.g., Cabernet, then Sauvignon). From the other perspective, a grape variety such as Cabernet can be in hundreds of different wines. The relationship is managed by creating an intermediate table between grape_variety and wine called wine_variety.
Here is the example query that joins all three tables. To find what grape varieties are in wine #1004, use:
SELECT variety FROM grape_variety, wine_variety, wine WHERE wine.wine_id=wine_variety.wine_id AND wine_variety.variety_id=grape_variety.variety_id AND wine.wine_id=1004 ORDER BY wine_variety.id;
The result of the query is:
+-----------+ | variety | +-----------+ | Cabernet | | Sauvignon | +-----------+ 2 rows in set (0.00 sec)
The join condition is the same as any three-table query. The only
significant difference is the ORDER
BY
clause that presents the results in the same
order they were added to the wine_variety table
(assuming the first variety gets ID=1
, the second
ID=2
, and so on).
We’ve now covered as much complex querying in SQL as we need to in this chapter. If you’d like to learn more, see the pointers to resources included in Appendix E. SQL examples in web database applications can be found throughout Chapter 4 to Chapter 13.
[6] It isn’t quite true to say
that the joins described here are natural joins. A true natural join
doesn’t require you to specify the join condition,
because “natural” implies that the
system figures this out itself. So, a real natural join
doesn’t need the WHERE
clause;
one is automatically included “behind the
scenes.” The joins described throughout this chapter
are actually called inner joins, but the results
are identical to a those of a natural join.
18.118.200.154