Among the most powerful database queries are those phrased in the negative, such as “show me all the customers who have no made a purchase in the past year.” This type of query is particularly tricky because it is asking for data that are not in the database. (The bookstore has data about customers who have purchased, but not those who have not.) The only way to perform such a query is to request the DBMS to use the difference operation.
Traditional SQL Negative Queries
The traditional way to perform a query that requires a difference is to use subquery syntax with the NOT IN operator. To do so, the query takes the following general format:
WHERE column NOT IN (SELECT column
The outer query retrieves a list of all things of interest; the subquery retrieves those that meet the necessary criteria. The NOT IN operator then acts to include all those from the list of all things that are not in the set of values returned by the subquery.
As a first example, consider the query that retrieves all books that are not in stock (no rows exist in
volume):
WHERE book.work_numb = work.work_numb
AND isbn NOT IN (SELECT isbn
The outer query selects those rows in
books (the list of all things) whose ISBNs are not in
volume (the list of things that
are). The result in
Figure 6-2 contains the nine books that do not appear at least once in the
volume table.
As a second example, we will retrieve the titles of all books for which we don't have a new copy in stock, the result of which can be found in
Figure 6-3:
WHERE work.work_numb = book.work_numb
AND book.isbn NOT IN (SELECT isbn
WHERE condition_code = 1);
In this case, the subquery contains a restrict predicate in its WHERE clause, limiting the rows retrieved by the subquery to new volumes (those with a condition code value of 1). The outer query then copies a book to the result table if the ISBN is not in the result of the subquery.
Notice that in both of the sample queries there is no explicit syntax to make the two tables union compatible, something required by the relational algebra difference operation. However, the outer query's WHERE clause contains a predicate that compares a column taken from the result of the outer query with the same column taken from the result of the subquery. These two columns represent the union compatible tables.
As a final example, consider a query that retrieves the names of all customers who have not made a purchase after 1-Aug-2013. When you are putting together a query of this type, your first thought might be to write the query as follows:
SELECT first_name, last_name
WHERE sale_date < ‘1-Aug-2013’;
This query, however, won't work as you intend. First of all, the join eliminates all customers who have no purchases in the sale table, even though they should be included in the result. Second, the retrieval predicate identifies those customers who placed orders prior to 1-Aug-2013 but says nothing about who may or may not have made a purchase after that date. Customers may have made a purchase prior to 1-Aug-2013, on 1-Aug-2013, after 1-Aug-2013, or any combination of the preceding.
The typical way to perform this query correctly is to use a difference: the difference between all customers and those who
have made a purchase after 1-Aug-2013. The query—the result of which can be found in
Figure 6-4—appears as follows:
SELECT first_name, last_name
WHERE customer_numb NOT IN
WHERE sale_date >= ‘1-Aug-2013’)
Negative Queries using the EXCEPT Operator
The SQL-92 standard added an operator—EXCEPT—that performs a difference operation directly between two union compatible tables. Queries using EXCEPT look very much like a union:
SELECT first_name, last_name
SELECT first_name, last_name
WHERE customer.customer_numb = sale.customer_numb
AND sale_date >= ‘1-Aug-2013’;
or
EXCEPT CORRESPONDING BY (first_name, last_name) SELECT *
WHERE customer.customer_numb = sale.customer_numb
AND sale_date >= ‘1-Aug-2013’;
Using the first syntax you include two complete SELECT statements that are joined by the keyword EXCEPT. The SELECTs must return union compatible tables. The first SELECT retrieves a list of all things (in this example, all customers); the second retrieves the things that are (in this example, customers with sales after 1-Aug-2013). The EXCEPT operator then removes all rows from the first table that appear in the second.
The second syntax retrieves all columns from both sources but uses the CORRESPONDING BY clause to project the columns to make the two tables union compatible.