Whether you are querying, modifying, or deleting data, the WHERE clause is the mechanism for identifying what data you want to work with. This chapter explores the role of the WHERE clause in SQL statements, as well as the various options available when building a WHERE clause.
Before delving into the
WHERE clause,
let’s imagine life without it. Say that you are
interested in doing some maintenance on the data in the
part
table. To inspect the data in the table, you
issue the following query:
SELECT part_nbr, name, supplier_id, status, inventory_qty FROM part;
If the part
table contains 10,000 items, the
result set returned by the query would consist of 10,000 rows, each
with 5 columns. You would then load the 10,000 rows into memory and make
your modifications.
Once you have made the required modifications to your data in memory,
it is time to apply the changes to the part
table.
Without the ability to specify the rows to modify, you have no choice
but to delete all rows in the table and re-insert all 10,000 rows:
DELETE FROM part; INSERT INTO part (part_nbr, name, supplier_id, status, inventory_qty) VALUES ('XY5-1002', 'Wonder Widget', 1, 'IN-STOCK', 1); /* 9,999 more INSERTs on the wall, 9,999 more INSERTS . . . */
While this approach works in theory, it wreaks havoc on performance, concurrency (the ability for more than one user to modify data simultaneously), and scalability (the ability to perform predictably as load increases).
Now imagine that you want to modify data in the
part
table only for those parts supplied by Acme
Industries. Since the supplier’s name is stored in
the supplier
table, you must include both the
part
and supplier
tables in the
FROM clause:
SELECT p.part_nbr, p.name, p.supplier_id, p.status, p.inventory_qty, s.supplier_id, s.name FROM part p, supplier s;
If 100 companies supply the 10,000 parts in the part table, this
query will return 1,000,000 rows. Known as the
Cartesian
product, this number equates to every possible combination
of all rows from the two tables. As you sift through the million
rows, you would keep only those where the values of
p.supplier_id
and s.supplier_id
are identical and where the s.name
column matches
'Acme Industries
‘. If Acme Industries supplies
only 50 of the 10,000 parts in your database, you will end up
discarding 999,950 of the 1,000,000 rows returned by your query.
Hopefully, the scenarios in the previous section give you some insight into the utility of the WHERE clause, including the ability to:
Filter out unwanted data from a query’s result set.
Isolate one or more rows of a table for modification.
Conditionally join two or more data sets together.
To see how these things are accomplished, let’s add a WHERE clause to the previous SELECT statement, which strives to locate all parts supplied by Acme Industries. Here’s the query with the new WHERE clause:
SELECT p.part_nbr, p.name, p.supplier_id, p.status, p.inventory_qty, s.supplier_id, s.name FROM part p, supplier s WHERE s.supplier_id = p.supplier_id AND s.name = 'Acme Industries';
The WHERE clause here is comprised of two parts, known as conditions, which are evaluated separately. Conditions always evaluate to either TRUE or FALSE; if there are multiple conditions in a WHERE clause, they all must evaluate to TRUE for a given row to be included in the result set. Actually, that’s a bit of an oversimplification. As you will see later, using the OR and NOT operators allows the WHERE clause to evaluate to TRUE even if individual conditions evaluate to FALSE.
For this example, a row created by combining data from the
part
and supplier
tables will
only be included in the final result set if both tables share a
common value for the supplier_id
column, and if
the value of the name column in the supplier
table
matches 'Acme Industries
‘. Any other permutation
of data from the two tables would evaluate to FALSE and be discarded.
For this chapter only, we’ll use the older style of join syntax in which you specify join conditions in the WHERE clause. We do this to explore the full functionality of the WHERE clause.
With the addition of the WHERE clause to the previous example,
therefore, Oracle will take on the work of discarding undesired rows
from the result set, and only 50 rows would be returned by the query,
rather than 1,000,000. Now that you have retrieved the 50 rows of
interest from the database, you can begin the process of modifying
the data. Keep in mind, however, that with the WHERE clause at your
disposal you will no longer need to delete and re-insert your
modified data; instead, you can use the UPDATE statement to
modify specific
rows based on the
part_nbr
column, which is the unique identifier
for the table:
UPDATE part SET status = 'DISCONTINUED' WHERE part_nbr = 'AI5-4557';
While this is certainly an improvement, you can do even better. If your intent is to modify the status for all 50 parts supplied by Acme Industries, there is no need to execute a separate query at all. Simply execute a single UPDATE statement that finds and modifies all 50 records:
UPDATE part SET status = 'DISCONTINUED' WHERE supplier_id = (SELECT supplier_id FROM supplier WHERE name = 'Acme Industries'),
The WHERE clause in this statement consists of a single condition
that equates the supplier_id
column to the value
returned by the subquery against the supplier
table. Subqueries are covered extensively in Chapter 5, so don’t worry if this
looks a bit intimidating. The net result is that the condition will
be rewritten to use the value returned by the subquery, as in:
UPDATE part SET status = 'DISCONTINUED' WHERE supplier_id = 1;
When executed, the condition evaluates to TRUE for exactly 50 of the
10,000 rows in the part
table, and the status of
those 50 rows changes to DISCONTINUED.
Now that you have seen the WHERE clause in action, let’s take a look at how it is evaluated. As previously mentioned, the WHERE clause consists of one or more conditions that evaluate independently to TRUE or FALSE. If your WHERE clause consists of multiple conditions, the conditions are separated by the logical operators AND and OR. Depending on the outcome of the individual conditions and the placement of these logical operators, Oracle will assign a final value of TRUE or FALSE to each candidate row, thereby determining whether a row will be included in the final result set.
Here’s another look at the Acme Industries query:
SELECT p.part_nbr, p.name, p.supplier_id, p.status, p.inventory_qty, s.supplier_id, s.name FROM part p, supplier s WHERE s.supplier_id = p.supplier_id AND s.name = 'Acme Industries';
The WHERE clause consists of two conditions separated by AND. Thus, a row will only be included if both conditions evaluate to TRUE. Table 2-1 shows the possible scenarios when conditions are replaced by their possible outcomes.
Using basic logic rules, you can see that the only combination of outcomes that results in a final value of TRUE being assigned to a candidate row is where both conditions evaluate to TRUE. Table 2-2 demonstrates the possible outcomes if the conditions had been separated by OR rather than AND.
Intermediate result |
Final result |
WHERE TRUE OR TRUE |
TRUE |
WHERE FALSE OR FALSE |
FALSE |
WHERE FALSE OR TRUE |
TRUE |
WHERE TRUE OR FALSE |
TRUE |
Next, let’s spice the query up a bit by including parts supplied by either Acme Industries or Tilton Enterprises:
SELECT p.part_nbr, p.name, p.supplier_id, p.status, p.inventory_qty, s.supplier_id, s.name FROM part p, supplier s WHERE s.supplier_id = p.supplier_id AND (s.name = 'Acme Industries' OR s.name = 'Tilton Enterprises'),
There are now three separate conditions separated by AND and OR with parentheses surrounding two of the conditions. Table 2-3 illustrates the possible outcomes.
Since a particular part cannot be supplied by both Acme Industries and Tilton Enterprises, the intermediate results TRUE AND (TRUE AND TRUE) and FALSE AND (TRUE AND TRUE) were not included in Table 2-3.
To liven things up even more, here’s an example using the NOT operator. The following query returns data for parts supplied by anyone other than Acme Industries or Tilton Enterprises:
SELECT p.part_nbr, p.name, p.supplier_id, p.status, p.inventory_qty,
s.supplier_id, s.name
FROM part p, supplier s
WHERE s.supplier_id = p.supplier_id
AND NOT (s.name = 'Acme Industries'
OR s.name = 'Tilton Enterprises'),
Table 2-4 demonstrates how the addition of the NOT operator changes the outcome.
Intermediate result |
Final result |
WHERE TRUE AND NOT (TRUE OR FALSE) |
FALSE |
WHERE TRUE AND NOT (FALSE OR TRUE) |
FALSE |
WHERE TRUE AND NOT (FALSE OR FALSE) |
TRUE |
WHERE FALSE AND NOT (TRUE OR FALSE) |
FALSE |
WHERE FALSE AND NOT (FALSE OR TRUE) |
FALSE |
WHERE FALSE AND NOT (FALSE OR FALSE) |
FALSE |
The use of the NOT operator in the previous example is a bit forced; later examples will demonstrate more natural ways of expressing the same logic.
Now that you understand how conditions are grouped together and evaluated, it’s time to take a look at the different elements that make up a condition. A condition is comprised of one or more expressions along with one or more operators . Examples of expressions include:
Numbers
Columns, such as s.supplier_id
Literals, such as `Acme Industries’
Functions, such as UPPER(`abcd')
Lists of simple expressions, such as (1, 2, 3)
Subqueries
Examples of operators include:
Arithmetic operators, such as +, -, *, and /
Comparison operators, such as =, <, >=, !=, LIKE, and IN
The following sections explore many of the common condition types that use different combinations of the preceeding expression and operator types.
Most of the conditions found in a WHERE clause will be equality conditions used to join data sets together or to isolate specific values. You have already encountered these types of conditions numerous times in previous examples, including:
s.supplier_id = p.supplier_id s.name = 'Acme Industries' supplier_id = (SELECT supplier_id FROM supplier WHERE name = 'Acme Industries')
All three conditions are comprised of a column expression followed by a comparison operator (=) followed by another expression. The conditions differ in the type of expression on the right side of the comparison operator. The first example compares one column to another, the second example compares a column to a literal, and the third example compares a column to the value returned by a subquery.
You can also build conditions that use the inequality comparison operator (!=). In a previous example, the NOT operator was used to find information about parts supplied by every supplier other than Acme Industries and Tilton Enterprises. Using the != operator rather than using NOT makes the query easier to understand and removes the need for the OR operator:
SELECT p.part_nbr, p.name, p.supplier_id, p.status, p.inventory_qty, s.supplier_id, s.name FROM part p, supplier s WHERE s.supplier_id = p.supplier_id AND s.name != 'Acme Industries' AND s.name != 'Tilton Enterprises';
While this is an improvement over the previous version, the next section shows an even cleaner way to represent the same logic.
Along with determining whether two expressions are identical, it is often useful to determine whether one expression can be found within a set of expressions. Using the IN operator, you can build conditions that will evaluate to TRUE if a given expression exists in a set of expressions:
s.name IN ('Acme Industries', 'Tilton Enterprises')
You may also use the NOT IN operator to determine whether an expression does not exist in a set of expressions:
s.name NOT IN ('Acme Industries', 'Tilton Enterprises')
Most people prefer to use a single condition with IN or NOT IN instead of writing multiple conditions using = or !=, so, with that in mind, here’s one last stab at the Acme/Tilton query:
SELECT p.part_nbr, p.name, p.supplier_id, p.status, p.inventory_qty, s.supplier_id, s.name FROM part p, supplier s WHERE s.supplier_id = p.supplier_id AND s.name NOT IN ('Acme Industries', 'Tilton Enterprises'),
Along with prefabricated sets of expressions, subqueries may be employed to generate sets on the fly. If a subquery returns exactly one row, you may use a comparison operator; if a subquery returns more than one row, or if you’re not sure whether the subquery might return more than one row, use the IN operator. The following example updates all orders that contain parts supplied by Eastern Importers:
UPDATE cust_order SET sale_price = sale_price * 1.1 WHERE cancelled_dt IS NULL AND ship_dt IS NULL AND order_nbr IN (SELECT li.order_nbr FROM line_item li, part p, supplier s WHERE s.name = 'Eastern Importers' AND s.supplier_id = p.supplier_id AND p.part_nbr = li.part_nbr);
The subquery evaluates to a (potentially empty) set of order numbers. All orders whose order number exists in that set are then modified by the UPDATE statement.
If you are dealing with dates or numeric data, you may be interested in whether a value falls within a specified range rather than whether it matches a specific value or exists in a finite set. For such cases, you may use the BETWEEN operator, as in:
DELETE FROM cust_order WHERE order_dt BETWEEN '01-JUL-2001' AND '31-JUL-2001';
To determine whether a value lies outside a specific range, you can use the NOT BETWEEN operator:
SELECT order_nbr, cust_nbr, sale_price FROM cust_order WHERE sale_price NOT BETWEEN 1000 AND 10000;
When using BETWEEN, make sure the first value is the lesser of the two values provided. While “BETWEEN 01-JUL-2001 AND 31-JUL-2001” and “BETWEEN 31-JUL-2001 AND 01-JUL-2001” might seem logically equivalent, specifying the higher value first guarantees that your condition will always evaluate to FALSE. Keep in mind that X BETWEEN Y AND Z is evaluated as X >= Y AND X <= Z.
Ranges may also be specified using the operators <, >, <=, and >=, although doing so requires writing two conditions rather than one. The previous query can also be expressed as:
SELECT order_nbr, cust_nbr, sale_price FROM cust_order WHERE sale_price < 1000 OR sale_price > 10000;
When dealing with character data, there are some situations where you are looking for an exact string match, and others where a partial match is sufficient. For the latter case, you can use the LIKE operator along with one or more pattern-matching characters, as in:
DELETE FROM part WHERE part_nbr LIKE 'ABC%';
The
pattern-matching
character
%
matches strings of any length, so all of the following part numbers
would be deleted: 'ABC
',
'ABC-123
', 'ABC9999999
‘. If you
need finer control, you can use the underscore
(_
)
pattern-matching character to match single characters, as in:
DELETE FROM part WHERE part_nbr LIKE '_B_';
For this pattern, any part number composed of exactly three characters with a B in the middle would be deleted. Both pattern-matching characters may be utilized in numerous combinations to find the desired data. Additionally, the NOT LIKE operator may be employed to find strings that don’t match a specified pattern. The following example deletes all parts whose name does not contain a Z in the third position followed later by the string “T1J”:
DELETE FROM part WHERE part_nbr NOT LIKE '_ _Z%T1J%';
Oracle provides a slew of built-in
functions
for handling character data that can be used to build matching
conditions. For example, the condition part_nbr LIKE
'ABC%
' could be rewritten using the SUBSTR function as
SUBSTR(part_nbr, 1, 3) = 'ABC
‘. For definitions
and examples for all of Oracle’s built-in functions,
see Oracle in a Nutshell
(O’Reilly).
You may come across data that include the characters
%
and _
and need to include
them in your patterns. For example, you might have a column called
instructions
in the cust_order
table that may have a value such as:
Cancel order if more than 25% of parts are unavailable
If you want to find strings containing the %
character, you will need to escape the
%
character within your pattern so that it
isn’t treated as a wildcard. To do so, you will need
to use the ESCAPE clause to let Oracle know which character you have
chosen as the escape character:
SELECT instructions FROM cust_order WHERE instructions LIKE '%\%%' ESCAPE '';
This query would return all rows where the instructions column
contains the %
character anywhere in the string.
Beginning with the Oracle Database 10g release, you can use regular expressions within your conditions. Regular expressions allow for much more complex pattern matching without the need for multiple conditions. For example, if you wanted to find all customers whose name begins with W, ends in “ies” and does not include L anywhere in the string, you could use multiple conditions with the LIKE and NOT LIKE operators:
SELECT name FROM customer WHERE name LIKE 'W%ies' AND name NOT LIKE '%L%'; NAME ------------------------------ Worcester Technologies Wallace Industries
You can achieve the same result more succinctly, in a single expression, with the new REGEXP_LIKE function:
SELECT name FROM customer WHERE REGEXP_LIKE(name, '^W([^L]*)ies$'), NAME ------------------------------ Worcester Technologies Wallace Industries
If that second argument to REGEXP_LIKE looks like gibberish, fear not: we cover regular expressions in detail in Chapter 17.
The NULL
expression represents the absence
of a value. If, when entering an order into the database, you are
uncertain when the order will be shipped, it is better to leave the
ship date undefined than to fabricate a value. Until the ship date
has been determined, therefore, it is best to leave the
ship_dt
column NULL. NULL is also useful for cases
where data is not applicable. For example, a cancelled
order’s shipping date is no longer applicable and
should be set to NULL.
When working with NULL, the concept of equality does not apply; a column may be NULL, but it will never equal NULL. Therefore, you will need to use the special operator IS NULL when looking for NULL data, as in:
UPDATE cust_order SET expected_ship_dt = SYSDATE + 1 WHERE ship_dt IS NULL;
In this example, all orders whose shipping date hasn’t been specified will have their expected shipping date set to tomorrow.
You may also use the IS NOT NULL operator to locate non-NULL data:
UPDATE cust_order SET expected_ship_dt = NULL WHERE ship_dt IS NOT NULL;
This example sets the expected shipping date to NULL for all orders
that have already shipped. Notice that the SET clause uses the
equality operator (=) with NULL, whereas the WHERE clause uses the IS
NOT NULL operator. The equality operator is used to set a column to
NULL, whereas the IS NOT NULL operator is used to evaluate whether a
column is NULL. A great many mistakes might have been avoided had the
designers of SQL chosen a special operator to be utilized when
setting a column to NULL (i.e., SET expected_ship_dt TO
NULL
), but this is not the case. To make matters worse,
Oracle doesn’t complain if you mistakenly use the
equality operator when evaluating for NULL. The following query will
parse and execute but will never return rows:
SELECT order_nbr, cust_nbr, sale_price, order_dt
FROM cust_order
WHERE ship_dt = NULL;
Hopefully, you would quickly recognize that the previous query never returns data and replace the equality operator with IS NULL. However, there is a more subtle mistake involving NULL that is harder to spot. Say you are looking for all employees who are not managed by Marion Blake, whose employee ID is 7698. Your first instinct may be to run the following query:
SELECT fname, lname, manager_emp_id
FROM employee
WHERE manager_emp_id != 7698;
FNAME LNAME MANAGER_EMP_ID -------------------- -------------------- -------------- JOHN SMITH 7902 TERRY JONES 7839 MARION BLAKE 7839 CAROL CLARK 7839 DONALD SCOTT 7566 DIANE ADAMS 7788 JENNIFER FORD 7566 BARBARA MILLER 7782
While this query returns rows, it leaves out those employees who are top-level managers and, thus, are not managed by anyone. Since NULL is neither equal nor not equal to 7698, this set of employees is absent from the result set. To ensure that all employees are considered, you will need to explicitly handle NULL, as in:
SELECT fname, lname, manager_emp_id
FROM employee
WHERE manager_emp_id IS NULL OR manager_emp_id != 7698;
FNAME LNAME MANAGER_EMP_ID -------------------- -------------------- -------------- JOHN SMITH 7902 TERRY JONES 7839 MARION BLAKE 7839 CAROL CLARK 7839 DONALD SCOTT 7566 FRANCIS KING DIANE ADAMS 7788 JENNIFER FORD 7566 BARBARA MILLER 7782
Including two conditions for every nullable column in your WHERE clause can get a bit tiresome. Instead, you can use Oracle’s built-in function NVL, which substitutes a specified value for columns that are NULL, as in:
SELECT fname, lname, manager_emp_id
FROM employee
WHERE NVL(manager_emp_id, -999) != 7698;
FNAME LNAME MANAGER_EMP_ID -------------------- -------------------- -------------- JOHN SMITH 7902 TERRY JONES 7839 MARION BLAKE 7839 CAROL CLARK 7839 DONALD SCOTT 7566 FRANCIS KING DIANE ADAMS 7788 JENNIFER FORD 7566 BARBARA MILLER 7782
In this example, the value -999
is substituted for
all NULL values, which, since -999
is never equal
to 7698, guarantees that all rows whose
manager_emp_id
column is NULL will be included in
the result set. Thus, all employees whose
manager_emp_id
column is NULL or is
not NULL and has a value other than 7698 will be
retrieved by the
query.
Throughout this chapter, all examples that join multiple tables have had their join conditions included in the WHERE clause along with various filter conditions. Beginning with the Oracle9i release, you have the option of using the ANSI join syntax, which specifies that all join conditions be included in the FROM clause, as illustrated by the following:
SELECT p.part_nbr, p.name, p.supplier_id, p.status, p.inventory_qty, s.supplier_id, s.name FROM part p INNER JOIN supplier s ON s.supplier_id = p.supplier_id WHERE s.name NOT IN ('Acme Industries', 'Tilton Enterprises'),
As you can see, the join condition s.supplier_id =
p.supplier_id
has been moved to the ON subclause, and the
FROM clause specifies that the part
and
supplier
tables be joined via an inner join. This
syntax may look a bit strange at first, but it greatly improves the
readability and maintainability of your queries. Therefore, for the
remainder of this book, all examples will employ the ANSI join
syntax.
This chapter has introduced the role of the WHERE clause in different types of SQL statements as well as the various components used to build a WHERE clause. Because the WHERE clause plays such an important role in many SQL statements, however, the topic is far from exhausted. Additional coverage of WHERE clause topics may be found in:
Chapter 3, in which various flavors of join conditions are studied in detail
Chapter 5, which probes the different types of subqueries along with the appropriate operators for evaluating their results
Chapter 6, in which various methods of handling date/time data are explored
Chapter 15, which explores certainaspects of the WHERE clause from the standpoint of performance andefficiency
Additionally, here are a few tips to help you make the most of your WHERE clauses:
Check your join conditions carefully. Make sure that each data set in the FROM clause is properly joined. Keep in mind that some joins require multiple conditions. See Chapter 3 for more information.
Avoid unnecessary joins. Just because two data sets in your FROM clause contain the same column does not necessitate a join condition be added to your FROM/WHERE clause. In some designs, redundant data has been propagated to multiple tables through a process called denormalization. Take the time to understand the database design, and ask your DBA or database designer for a current data model.
Use parentheses.
Oracle
maintains
both operator precedence and condition precedence, meaning there are
clearly defined rules for the order in which things will be
evaluated, but the safest route for you and for those who will later
maintain your code is to dictate evaluation order using parentheses.
For operators, specifying (5 * p.inventory_qty) +
2
rather than 5 * p.inventory_qty + 2
makes the order in which the operations should be performed clear.
For conditions, use parentheses any time the OR operator is employed.
Use consistent indentation. For example, if the previous line contains a left parenthesis without a matching right parenthesis, indent the current line to show that it is a continuation of the previous line.
Handle NULLs properly. After writing your WHERE clause, inspect each condition with respect to its ability to properly handle NULL values. Take the time to understand the table definitions in your database so that you know which columns allow NULLs.
Pick up introductory books on logic and set theory at your local library. While understanding these two topics won’t necessarily get you invited to more cocktail parties, it will certainly make you a better SQL programmer.
18.119.163.238