Chapter 12

Delving Deep with Nested Queries

IN THIS CHAPTER

Bullet Pulling data from multiple tables with a single SQL statement

Bullet Comparing a value from one table with a set of values from another table

Bullet Using the SELECT statement to compare a value from one table with a single value from another table

Bullet Comparing a value from one table with all the corresponding values in another table

Bullet Making queries that correlate rows in one table with corresponding rows in another table

Bullet Determining which rows to update, delete, or insert by using a subquery

One of the best ways to protect your data’s integrity is to avoid modification anomalies (see Chapter 5 for the gory details of those) by normalizing your database. Normalization involves breaking up a single table into multiple tables, each of which has a single theme. You don’t want product information in the same table with customer information, for example, even if the customers have bought products.

If you normalize a database properly, the data is scattered across multiple tables. Most queries that you want to make need to pull data from two or more tables. One way to do this is to use a join operator or one of the other relational operators (UNION, INTERSECT, or EXCEPT). The relational operators take information from multiple tables and combine it all into a single result set. Different operators combine the data in different ways.

Remember Another way to pull data from two or more tables is to use a nested query. In SQL, a nested query is one in which an outer enclosing statement contains within it a subquery. That subquery may serve as an enclosing statement for a lower-level subquery that is nested within it. There are no theoretical limits to the number of nesting levels a nested query may have, but you do face some practical limits that depend on your SQL implementation.

Subqueries are invariably SELECT statements, but the outermost enclosing statement may also be an INSERT, UPDATE, or DELETE statement.

Tip A subquery can operate on a table other than the table that its enclosing statement operates on, so nested queries give you another way to extract information from multiple tables.

For example, suppose that you want to query your corporate database to find all department managers who are more than 50 years old. With the joins I discuss in Chapter 11, you can use a query like this:

SELECT D.Deptno, D.Name, E.Name, E.Age

FROM DEPT D, EMPLOYEE E

WHERE D.ManagerID = E.ID AND E.Age >50 ;

D is the alias for the DEPT table, and E is the alias for the EMPLOYEE table. The EMPLOYEE table has an ID column that is the primary key, and the DEPT table has a ManagerID column that is the ID value of the employee who is the department's manager. A simple join (the list of tables in the FROM clause) pairs the related tables, and a WHERE clause filters out all rows except those that meet the criteria. Note that the SELECT statement's parameter list includes the Deptno and Name columns from the DEPT table and the Name and Age columns from the EMPLOYEE table.

Next, suppose that you're interested in the same set of rows but you want only the columns from the DEPT table. In other words, you’re interested in the departments whose managers are 50 or older, but you don’t care who those managers are or exactly how old they are. You could then write the query with a subquery rather than a join:

SELECT D.Deptno, D.Name

FROM DEPT D

WHERE EXISTS (SELECT * FROM EMPLOYEE E

WHERE E.ID = D.ManagerID AND E.Age > 50) ;

This query has two new elements: the EXISTS keyword and the SELECT * in the WHERE clause of the inner SELECT. The inner SELECT is a subquery (or subselect), and the EXISTS keyword is one of several tools for use with a subquery that is described in this chapter.

What Subqueries Do

Subqueries are located within the WHERE clause of their enclosing statement. Their function is to set the search conditions for the WHERE clause. Each kind of subquery produces a different result. Some subqueries produce a list of values that is then used as input by the enclosing statement. Other subqueries produce a single value that the enclosing statement then evaluates with a comparison operator. A third kind of subquery returns a value of True or False.

Nested queries that return sets of rows

To illustrate how a nested query returns a set of rows, imagine that you work for a systems integrator of computer equipment. Your company, Zetec Corporation, assembles systems from components that you buy, and then it sells them to companies and government agencies. You keep track of your business with a relational database. The database consists of many tables, but right now you're concerned with only three of them: the PRODUCT table, the COMP_USED table, and the COMPONENT table. The PRODUCT table (shown in Table 12-1) contains a list of all your standard products. The COMPONENT table (shown in Table 12-2) lists components that go into your products, and the COMP_USED table (shown in Table 12-3) tracks which components go into each product.

TABLE 12-1 PRODUCT Table

Column

Type

Constraints

Model

CHAR (6)

PRIMARY KEY

ProdName

CHAR (35)

ProdDesc

CHAR (31)

ListPrice

NUMERIC (9,2)

TABLE 12-2 COMPONENT Table

Column

Type

Constraints

CompID

CHAR (6)

PRIMARY KEY

CompType

CHAR (10)

CompDesc

CHAR (31)

TABLE 12-3 COMP_USED Table

Column

Type

Constraints

Model

CHAR (6)

FOREIGN KEY (for PRODUCT)

CompID

CHAR (6)

FOREIGN KEY (for COMPONENT)

A component may be used in multiple products, and a product can contain multiple components (a many-to-many relationship). This situation can cause integrity problems. To circumvent the problems, create the linking table COMP_ USED to relate COMPONENT to PRODUCT. A component may appear in many rows in the COMP_USED table, but each of those rows will reference only one component (a one-to-many relationship). Similarly, a product may appear in many rows in COMP_USED, but each row references only one product (another one-to-many relationship). Adding the linking table transforms a troublesome many-to-many relationship into two relatively simple one-to-many relationships. This process of reducing the complexity of relationships is one example of normalization.

Subqueries introduced by the keyword IN

One form of a nested query compares a single value with the set of values returned by a SELECT statement. It uses the IN predicate with the following syntax:

SELECT column_list

FROM table

WHERE expression IN (subquery) ;

The expression in the WHERE clause evaluates to a value. If that value is IN the list returned by the subquery, then the WHERE clause returns a True value. The specified columns from the table row being processed are added to the result table. The subquery may reference the same table referenced by the outer query, or it may reference a different table.

In the following example, I use Zetec's database to demonstrate this type of query. Assume that there’s a shortage of computer monitors in the computer industry, so that when you run out of monitors, you can no longer deliver products that include them. You want to know which products are affected. Glancing gratefully at your own monitor, enter the following query:

SELECT Model

FROM COMP_USED

WHERE CompID IN

(SELECT CompID

FROM COMPONENT

WHERE CompType = 'Monitor') ;

SQL processes the innermost query first, so it processes the COMPONENT table, returning the value of CompID for every row where CompType is 'Monitor'. The result is a list of the ID numbers of all monitors. The outer query then compares the value of CompID in every row in the COMP_USED table against the list. If the comparison is successful, the value of the Model column for that row is added to the outer SELECT's result table. The result is a list of all product models that include a monitor. The following example shows what happens when you run the query:

Model

-----

CX3000

CX3010

CX3020

MB3030

MX3020

MX3030

You now know which products will soon be out of stock. It’s time to go to the sales force and tell them to slow down on promoting these products.

When you use this form of nested query, the subquery must specify a single column, and that column’s data type must match the data type of the argument preceding the IN keyword.

Tip I’m sure you remember the KISS principle. Keeping things simple is important when you’re dealing with software of any kind, but it is especially important when dealing with database software. Statements that include nested SELECTs can be difficult to get right. One way to get them working the way they should is to run the inner SELECT all by itself first and then verify that the result you get is the result you expect. When you're sure the inner SELECT is functioning properly, you can enclose it in the outer part of the statement and have a better chance that the whole thing will work as advertised.

Subqueries introduced by the keyword NOT IN

Just as you can introduce a subquery with the IN keyword, you can do the opposite and introduce it with the NOT IN keywords. In fact, now is a great time for Zetec management to make such a query. By using the query in the preceding section, Zetec management found out what products not to sell. That is valuable information, but it doesn't pay the rent. What Zetec management really wants to know is what products to sell. Management wants to emphasize the sale of products that don’t contain monitors. A nested query featuring a subquery introduced by the NOT IN keywords provides the requested information:

SELECT Model

FROM COMP_USED

WHERE CompID NOT IN

(SELECT CompID

FROM COMPONENT

WHERE CompType = 'Monitor')) ;

This query produces the following result:

Model

-----

PX3040

PB3050

PX3040

PB3050

Remember Worth noting is the fact that the result set contains duplicates. The duplication occurs because a product containing several components that are not monitors has a row in the COMP_USED table for each component. The query creates an entry in the result table for each of those rows.

In the example, the number of rows does not create a problem because the result table is short. In the real world, however, such a result table may have hundreds or thousands of rows. To avoid confusion, it’s best to eliminate the duplicates. You can do so easily by adding the DISTINCT keyword to the query. Only rows that are distinct (different) from all previously retrieved rows are added to the result table:

SELECT DISTINCT Model

FROM COMP_USED

WHERE CompID NOT IN

(SELECT CompID

FROM COMPONENT

WHERE CompType = 'Monitor')) ;

As expected, the result is as follows:

Model

-----

PX3040

PB3050

Nested queries that return a single value

Introducing a subquery with one of the six comparison operators (=, <>, <,<=, >, >=) is often useful. In such a case, the expression preceding the operator evaluates to a single value, and the subquery following the operator must also evaluate to a single value. An exception is the case of the quantified comparison operator, which is a comparison operator followed by a quantifier (ANY, SOME, or ALL).

To illustrate a case in which a subquery returns a single value, look at another piece of Zetec Corporation's database. It contains a CUSTOMER table that holds information about the companies that buy Zetec products. It also contains a CONTACT table that holds personal data about individuals at each of Zetec’s customer organizations. The tables are structured as shown in Tables 12-4 and 12-5.

TABLE 12-4 CUSTOMER Table

Column

Type

Constraints

CustID

INTEGER

PRIMARY KEY

Company

CHAR (40)

UNIQUE

CustAddress

CHAR (30)

CustCity

CHAR (20)

CustState

CHAR (2)

CustZip

CHAR (10)

CustPhone

CHAR (12)

ModLevel

INTEGER

TABLE 12-5 CONTACT Table

Column

Type

Constraints

CustID

INTEGER

PRIMARY KEY

ContFName

CHAR (10)

ContLName

CHAR (16)

ContPhone

CHAR (12)

ContInfo

CHAR (50)

Say that you want to look at the contact information for Olympic Sales, but you don't remember that company’s CustID. Use a nested query like this one to recover the information you want:

SELECT *

FROM CONTACT

WHERE CustID =

(SELECT CustID

FROM CUSTOMER

WHERE Company = 'Olympic Sales') ;

The result looks something like this:

CustID ContFName ContLName ContPhone ContInfo

------ --------- --------- --------- --------

118 Jerry Attwater 505-876-3456 Will play

major role in

additive

manufacturing.

You can now call Jerry at Olympic and tell him about this month’s special sale on 3D printers.

When you use a subquery in an “=” comparison, the subquery's SELECT list must specify a single column (CustID in the example). When the subquery is executed, it must return a single row in order to have a single value for the comparison.

In this example, I assume that the CUSTOMER table has only one row with a Company value of 'Olympic Sales'. The CREATE TABLE statement for CUSTOMER specifies a UNIQUE constraint for Company, and this statement guarantees that the subquery in the preceding example returns a single value (or no value). Subqueries like the one in this example, however, are commonly used on columns that are not specified to be UNIQUE. In such cases, you must rely on prior knowledge of the database contents for believing that the column has no duplicates.

If more than one customer has a value of 'Olympic Sales' in the Company column (perhaps in different states), the subquery raises an error.

If no customer with such a company name exists, the subquery is treated as if it was null, and the comparison becomes unknown. In this case, the WHERE clause returns no row (because it returns only rows with the condition True and filters rows with the condition False or unknown). This would probably happen, for example, if someone misspelled the Company as 'Olumpic Sales'.

Although the equal operator (=) is the most common, you can use any of the other five comparison operators in a similar structure. For every row in the table specified in the enclosing statement's FROM clause, the single value returned by the subquery is compared with the expression in the enclosing statement’s WHERE clause. If the comparison gives a True value, a row is added to the result table.

You can guarantee that a subquery will return a single value if you include an aggregate function in it. Aggregate functions always return a single value. (Aggregate functions are described in Chapter 3.) Of course, this way of returning a single value is helpful only if you want the result of an aggregate function.

Suppose you are a Zetec salesperson and you need to earn a big commission check to pay for some unexpected bills. You decide to concentrate on selling Zetec's most expensive product. You can find out what that product is with a nested query:

SELECT Model, ProdName, ListPrice

FROM PRODUCT

WHERE ListPrice =

(SELECT MAX(ListPrice)

FROM PRODUCT) ;

In the preceding nested query, both the subquery and the enclosing statement operate on the same table. The subquery returns a single value: the maximum list price in the PRODUCT table. The outer query retrieves all rows from the PRODUCT table that have that list price.

The next example shows a comparison subquery that uses a comparison operator other than =:

SELECT Model, ProdName, ListPrice

FROM PRODUCT

WHERE ListPrice <

(SELECT AVG(ListPrice)

FROM PRODUCT) ;

The subquery returns a single value: the average list price in the PRODUCT table. The outer query retrieves all rows from the PRODUCT table that have a lower list price than the average list price.

Technicalstuff In the original SQL standard, a comparison could have only one subquery, and it had to be on the right side of the comparison. SQL:1999 allowed either or both operands of the comparison to be subqueries, and later versions of SQL retain that expansion of capability.

The ALL, SOME, and ANY quantifiers

Another way to make sure that a subquery returns a single value is to introduce it with a quantified comparison operator. The universal quantifier ALL, and the existential quantifiers SOME and ANY, when combined with a comparison operator, process the list returned by a subquery, reducing it to a single value.

You'll see how these quantifiers affect a comparison by looking at the baseball pitchers’ complete game database from Chapter 11, which is listed next.

The contents of the two tables are given by the following two queries:

SELECT * FROM NATIONAL

FirstName LastName CompleteGames

--------- -------- -------------

Sal Maglie 11

Don Newcombe 9

Sandy Koufax 13

Don Drysdale 12

Bob Turley 8

SELECT * FROM AMERICAN

FirstName LastName CompleteGames

--------- -------- -------------

Whitey Ford 12

Don Larson 10

Bob Turley 8

Allie Reynolds 14

The presumption is that the pitchers with the most complete games should be in the American League because of the presence of designated hitters in that league. One way to verify this presumption is to build a query that returns all American League pitchers who have thrown more complete games than all the National League pitchers. The query can be formulated as follows:

SELECT *

FROM AMERICAN

WHERE CompleteGames > ALL

(SELECT CompleteGames FROM NATIONAL) ;

This is the result:

FirstName LastName CompleteGames

---------- --------- --------------

Allie Reynolds 14

The subquery (SELECT CompleteGames FROM NATIONAL) returns the values in the CompleteGames column for all National League pitchers. The > ALL quantifier says to return only those values of CompleteGames in the AMERICAN table that are greater than each of the values returned by the subquery. This condition translates into “greater than the highest value returned by the subquery.” In this case, the highest value returned by the subquery is 13 (Sandy Koufax). The only row in the AMERICAN table higher than that is Allie Reynolds's record, with 14 complete games.

What if your initial presumption was wrong? What if the major-league leader in complete games was a National League pitcher, in spite of the fact that the National League has no designated hitter? If that was the case, the query

SELECT *

FROM AMERICAN

WHERE CompleteGames > ALL

(SELECT CompleteGames FROM NATIONAL) ;

would return a warning that no rows satisfy the query’s conditions — meaning that no American League pitcher has thrown more complete games than the pitcher who has thrown the most complete games in the National League.

Nested queries that are an existence test

A query returns data from all table rows that satisfy the query’s conditions. Sometimes many rows are returned; sometimes only one comes back. Sometimes none of the rows in the table satisfy the conditions, and no rows are returned. You can use the EXISTS and NOT EXISTS predicates to introduce a subquery. That structure tells you whether any rows in the table located in the subquery's FROM clause meet the conditions in its WHERE clause.

Remember Subqueries introduced with EXISTS and NOT EXISTS are fundamentally different from the other subqueries in this chapter so far. In all the previous cases, SQL first executes the subquery and then applies that operation's result to the enclosing statement. EXISTS and NOT EXISTS subqueries, on the other hand, are examples of correlated subqueries.

A correlated subquery first finds the table and row specified by the enclosing statement and then executes the subquery on the row in the subquery's table that correlates with the current row of the enclosing statement’s table.

The subquery either returns one or more rows or it returns none. If it returns at least one row, the EXISTS predicate succeeds (see the following section), and the enclosing statement performs its action. In the same circumstances, the NOT EXISTS predicate fails (see the section after that), and the enclosing statement does not perform its action. After one row of the enclosing statement's table is processed, the same operation is performed on the next row. This action is repeated until every row in the enclosing statement’s table has been processed.

EXISTS

Suppose you are a salesperson for Zetec Corporation and you want to call your primary contact people at all of Zetec’s customer organizations in California. Try the following query:

SELECT *

FROM CONTACT

WHERE EXISTS

(SELECT *

FROM CUSTOMER

WHERE CustState = 'CA'

AND CONTACT.CustID = CUSTOMER.CustID) ;

Notice the reference to CONTACT.CustID, which is referencing a column from the outer query and comparing it with another column, CUSTOMER.CustID, from the inner query. For each candidate row of the outer query, you evaluate the inner query, using the CustID value from the current CONTACT row of the outer query in the WHERE clause of the inner query.

Here's what happens:

  1. The CustID column links the CONTACT table to the CUSTOMER table.
  2. SQL looks at the first record in the CONTACT table, finds the row in the CUSTOMER table that has the same CustID, and checks that row's CustState field.
  3. If CUSTOMER.CustState = 'CA', the current CONTACT row is added to the result table.
  4. The next CONTACT record is then processed in the same way, and so on, until the entire CONTACT table has been processed.
  5. Because the query specifies SELECT * FROM CONTACT, all the contact table's fields are returned, including the contact’s name and phone number.

NOT EXISTS

In the previous example, the Zetec salesperson wants to know the names and numbers of the contact people of all the customers in California. Imagine that a second salesperson is responsible for all of the United States except California. She can retrieve her contact people by using NOT EXISTS in a query similar to the preceding one:

SELECT *

FROM CONTACT

WHERE NOT EXISTS

(SELECT *

FROM CUSTOMER

WHERE CustState = 'CA'

AND CONTACT.CustID = CUSTOMER.CustID) ;

Every row in CONTACT for which the subquery does not return a row is added to the result table.

Other correlated subqueries

As noted in a previous section of this chapter, subqueries introduced by IN or by a comparison operator need not be correlated queries, but they can be.

Correlated subqueries introduced with IN

In the earlier section “Subqueries introduced by the keyword IN,” I discuss how a noncorrelated subquery can be used with the IN predicate. To show how a correlated subquery may use the IN predicate, ask the same question that came up with the EXISTS predicate: What are the names and phone numbers of the contacts at all of Zetec's customers in California? You can answer this question with a correlated IN subquery:

SELECT *

FROM CONTACT

WHERE 'CA' IN

(SELECT CustState

FROM CUSTOMER

WHERE CONTACT.CustID = CUSTOMER.CustID) ;

The statement is evaluated for each record in the CONTACT table. If, for that record, the CustID numbers in CONTACT and CUSTOMER match, then the value of CUSTOMER.CustState is compared to 'CA'. The result of the subquery is a list that contains, at most, one element. If that one element is 'CA', the WHERE clause of the enclosing statement is satisfied, and a row is added to the query's result table.

Subqueries introduced with comparison operators

A correlated subquery can also be introduced by one of the six comparison operators, as shown in the next example.

Zetec pays bonuses to its salespeople based on their total monthly sales volume. The higher the volume is, the higher the bonus percentage is. The bonus percentage list is kept in the BONUSRATE table:

MinAmount MaxAmount BonusPct

--------- --------- --------

0.00 24999.99 0.

25000.00 49999.99 0.1

50000.00 99999.99 0.2

100000.00 249999.99 0.3

250000.00 499999.99 0.4

500000.00 749999.99 0.5

750000.00 999999.99 0.6

If a person’s monthly sales are between $100,000.00 and $249,999.99, the bonus is 0.3 percent of sales.

Sales are recorded in a transaction master table named TRANSMASTER:

TRANSMASTER

-----------

Column Type Constraints

------ ---- -----------

TransID INTEGER PRIMARY KEY

CustID INTEGER FOREIGN KEY

EmpID INTEGER FOREIGN KEY

TransDate DATE

NetAmount NUMERIC

Freight NUMERIC

Tax NUMERIC

InvoiceTotal NUMERIC

Sales bonuses are based on the sum of the NetAmount field for all of a person’s transactions in the month. You can find any person’s bonus rate with a correlated subquery that uses comparison operators:

SELECT BonusPct

FROM BONUSRATE

WHERE MinAmount <=

(SELECT SUM (NetAmount)

FROM TRANSMASTER

WHERE EmpID = 133)

AND MaxAmount >=

(SELECT SUM (NetAmount)

FROM TRANSMASTER

WHERE EmpID = 133) ;

This query is interesting in that it contains two subqueries, making use of the logical connective AND. The subqueries use the SUM aggregate operator, which returns a single value: the total monthly sales of employee number 133. That value is then compared against the MinAmount and the MaxAmount columns in the BONUSRATE table, producing the bonus rate for that employee.

If you had not known the EmpID but had known the EmplName, you could arrive at the same answer with a more complex query:

SELECT BonusPct

FROM BONUSRATE

WHERE MinAmount <=

(SELECT SUM (NetAmount)

FROM TRANSMASTER

WHERE EmpID =

(SELECT EmpID

FROM EMPLOYEE

WHERE EmplName = 'Coffin'))

AND MaxAmount >=

(SELECT SUM (NetAmount)

FROM TRANSMASTER

WHERE EmpID =

(SELECT EmpID

FROM EMPLOYEE

WHERE EmplName = 'Coffin'));

This example uses subqueries nested within subqueries, which, in turn, are nested within an enclosing query to arrive at the bonus rate for the employee named Coffin. This structure works only if you know for sure that the company has one, and only one, employee whose last name is Coffin. If you know that more than one employee has the same last name, you can add terms to the WHERE clause of the innermost subquery until you're sure that only one row of the EMPLOYEE table is selected.

Subqueries in a HAVING clause

You can have a correlated subquery in a HAVING clause just as you can in a WHERE clause. As I mention in Chapter 10, a HAVING clause is usually preceded by a GROUP BY clause. The HAVING clause acts as a filter to restrict the groups created by the GROUP BY clause. Groups that don't satisfy the condition of the HAVING clause are not included in the result. When used this way, the HAVING clause is evaluated for each group created by the GROUP BY clause.

Tip In the absence of a GROUP BY clause, the HAVING clause is evaluated for the set of rows passed by the WHERE clause — which is considered to be a single group. If neither a WHERE clause nor a GROUP BY clause is present, the HAVING clause is evaluated for the entire table:

SELECT TM1.EmpID

FROM TRANSMASTER TM1

GROUP BY TM1.Department

HAVING MAX (TM1.NetAmount) >= ALL

(SELECT 2 * AVG (TM2.NetAmount)

FROM TRANSMASTER TM2

WHERE TM1.EmpID <> TM2.EmpID) ;

This query uses two aliases for the same table, enabling you to retrieve the EmpID number of all salespeople who had a sale of at least twice the average sale of all the other salespeople. The query works as follows:

  1. The outer query groups TRANSMASTER rows by the employees' department. This is done with the SELECT, FROM, and GROUP BY clauses.
  2. The HAVING clause filters these groups. For each group, it calculates the MAX of the NetAmount column for the rows in that group.
  3. The inner query evaluates twice the average NetAmount from all rows of TRANSMASTER whose EmpID is different from the EmpID of the current group of the outer query.

    Remember In the last line, you have to reference two different EmpID values — so you use different aliases for TRANSMASTER in the FROM clauses of the outer and inner queries.

  4. You use those aliases in the comparison of the query's last line to indicate that you’re referencing both the EmpID from the current row of the inner subquery (TM2.EmpID) and the EmpID from the current group of the outer subquery (TM1.EmpID).

UPDATE, DELETE, and INSERT

In addition to SELECT statements, UPDATE, DELETE, and INSERT statements can also include WHERE clauses. Those WHERE clauses can contain subqueries in the same way that SELECT statements' WHERE clauses do.

For example, Zetec has just made a volume purchase deal with Olympic Sales and wants to provide Olympic with a retroactive 10 percent credit for all its purchases in the last month. You can give this credit with an UPDATE statement:

UPDATE TRANSMASTER

SET NetAmount = NetAmount * 0.9

WHERE SaleDate > (CurrentDate – 30) DAY AND CustID =

(SELECT CustID

FROM CUSTOMER

WHERE Company = 'Olympic Sales') ;

You can also have a correlated subquery in an UPDATE statement. Suppose the CUSTOMER table has a column LastMonthsMax, and Zetec wants to give such a credit for purchases that exceed LastMonthsMax for the customer:

UPDATE TRANSMASTER TM

SET NetAmount = NetAmount * 0.9

WHERE NetAmount>

(SELECT LastMonthsMax

FROM CUSTOMER C

WHERE C.CustID = TM.CustID) ;

Note that this subquery is correlated: The WHERE clause in the last line references both the CustID of the CUSTOMER row from the subquery and the CustID of the current TRANSMASTER row that is a candidate for updating.

A subquery in an UPDATE statement can also reference the table that is being updated. Suppose that Zetec wants to give a 10 percent credit to customers whose purchases have exceeded $10,000:

UPDATE TRANSMASTER TM1

SET NetAmount = NetAmount * 0.9

WHERE 10000 < (SELECT SUM(NetAmount)

FROM TRANSMASTER TM2

WHERE TM1.CustID = TM2.CustID);

The inner subquery calculates the SUM of the NetAmount column for all TRANSMASTER rows for the same customer. What does this mean? Suppose the customer with CustID = 37 has four rows in TRANSMASTER with values for NetAmount: 3000, 5000, 2000, and 1000. The SUM of NetAmount for this CustID is 11000.

The order in which the UPDATE statement processes the rows is defined by your implementation and is generally not predictable. The order may differ depending on how the rows are arranged on the disk. Assume that the implementation processes the rows for this CustID in this order: first the TRANSMASTER with a NetAmount of 3000, then the one with NetAmount= 5000, and so on. After the first three rows for CustID 37 have been updated, their NetAmount values are 2700 (90 percent of $3,000), 4500 (90 percent of $5,000), and 1800 (90 percent of $2,000). Then, when you process the last TRANSMASTER row for CustID 37 (whose NetAmount is 1000), the SUM returned by the subquery would seem to be 10000 — that is, the SUM of the new NetAmount values of the first three rows for CustID 37, and the old NetAmount value of the last row for CustID 37. Thus it would seem that the last row for CustID 37 isn't updated, because the comparison with that SUM is not True — after all, 10000 is not less than 10000. But that is not how the UPDATE statement is defined when a subquery references the table that is being updated.

Remember All evaluations of subqueries in an UPDATE statement reference the old values of the table — the ones that are being updated. In the preceding UPDATE for CustID 37, the subquery returns 11000 — the original SUM.

The subquery in a WHERE clause operates the same as a SELECT statement or an UPDATE statement. The same is true for DELETE and INSERT. To delete all of Olympic's transactions, use this statement:

DELETE FROM TRANSMASTER

WHERE CustID =

(SELECT CustID

FROM CUSTOMER

WHERE Company = 'Olympic Sales') ;

As with UPDATE, DELETE subqueries can also be correlated and can also reference the table being deleted. The rules are similar to the rules for UPDATE subqueries. Suppose you want to delete all rows from TRANSMASTER for customers whose total NetAmount is larger than $10,000:

DELETE FROM TRANSMASTER TM1

WHERE 10000 < (SELECT SUM(NetAmount)

FROM TRANSMASTER TM2

WHERE TM1.CustID = TM2.CustID) ;

This query deletes all rows from TRANSMASTER that have CustID 37, as well as any other customers with purchases exceeding $10,000. All references to TRANSMASTER in the subquery denote the contents of TRANSMASTER before any deletions by the current statement. So even when you're deleting the last TRANSMASTER row for CustID 37, the subquery is evaluated on the original TRANSMASTER table and returns 11000.

Remember When you update, delete, or insert database records, you risk making a table's data inconsistent with other tables in the database. Such an inconsistency is called a modification anomaly, discussed in Chapter 5. If you delete TRANSMASTER records and a TRANSDETAIL table depends on TRANSMASTER, you must delete the corresponding records from TRANSDETAIL, too. This operation is called a cascading delete, because the deletion of a parent record must cascade to its associated child records. Otherwise the undeleted child records become orphans. In this case, they would be invoice detail lines that are in limbo because they are no longer connected to an invoice record.

If your implementation of SQL doesn’t support cascading deletes, you must do the deletions yourself. In this case, delete the appropriate records from the child table before deleting the corresponding record from the parent. That way, you don’t have orphan records in the child table, even for a second.

Retrieving changes with pipelined DML

In the preceding section, I show how an UPDATE, DELETE, or INSERT statement can include a nested SELECT statement within a WHERE clause. SQL:2011 introduced a related capability, in which a data manipulation command (such as UPDATE, INSERT, DELETE, or MERGE statements) can be nested within a SELECT statement. This capability is called pipelined DML.

One way to look at a data change operation is to envision a table before it is changed with a DELETE, INSERT, or UPDATE operation. You could call the table before the change the old table and the table after the change, the new table. During the data change operation, auxiliary tables, called delta tables, are created. A DELETE operation creates an old delta table, which contains the rows to be deleted. An INSERT operation creates a new delta table, which contains the rows to be inserted. An UPDATE operation would create both an old and a new delta table, the old for the rows being replaced and the new for the rows replacing them.

With pipelined DML, you can retrieve the information in the delta tables. Suppose you want to delete from your product line all products with ProductIDs between 1000 and 1399, and you want a record of exactly which products in that range are deleted. You could use the following code:

SELECT Oldtable.ProductID

FROM OLD TABLE (DELETE FROM Product

WHERE ProductID BETWEEN 1000 AND 1399)

AS Oldtable ;

In this example, the keywords OLD TABLE specify that the result of the SELECT is coming from the old delta table. The result is the list of ProductID numbers for the products that are being deleted.

Similarly, you could retrieve a list from the new delta table by using the NEW TABLE keywords, which displays the Product ID numbers of rows inserted by an INSERT operation or updated by an UPDATE operation. Because an UPDATE operation created both an old delta table and a new delta table, you can retrieve the contents of either or both by using pipelined DML.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
52.14.39.59