Chapter 4

Querying Multiple Tables with Relational Operators

IN THIS CHAPTER

Bullet The UNION statement

Bullet The INTERSECT statement

Bullet The EXCEPT statement

Bullet The JOIN statements

In Chapter 3 of this minibook, I show you how, by using nested queries, data can be drawn from multiple tables to answer a question that involves different ideas. Another way to collect information from multiple tables is to use the relational operators UNION, INTERSECT, EXCEPT, and JOIN. SQL’s UNION, INTERSECT, and EXCEPT operators are modeled after the union, intersect, and except operators of relational algebra. Each one performs a very specific combining operation on the data in two or more tables. The JOIN operator, on the other hand, is considerably more flexible. A number of different joins exist, and each performs a somewhat different operation. Depending on what you want in terms of information retrieved from multiple tables, one or another of the joins or the other relational operators is likely to give it to you. In this chapter, I show you each of SQL’s relational operators, cover how it works, and discuss what you can use it for.

UNION

The UNION operator is the SQL implementation of the union operator used in relational algebra. SQL’s UNION operator enables you to draw information from two or more tables that have the same structure. Same structure means

  • The tables must all have the same number of columns.
  • Corresponding columns must all have identical data types and lengths.

When these criteria are met, the tables are union-compatible. The union of two tables returns all the rows that appear in either table and eliminates duplicates.

Suppose you have created a database for a business named Acme Systems that sells and installs computer products. Acme has two warehouses that stock the products, one in Fort Deposit, Alabama, and the other in East Kingston, New Hampshire. It contains two union-compatible tables, named DEPOSIT and KINGSTON. Both tables have two columns, and corresponding columns are of the same type. In fact, corresponding columns have identical column names (although this condition isn’t required for union compatibility).

DEPOSIT lists the names and quantity in stock of products in the Fort Deposit warehouse. KINGSTON lists the same information about the East Kingston warehouse. The UNION of the two tables gives you a virtual result table containing all the rows in the first table plus all the rows in the second table. For this example, I put just a few rows in each table to illustrate the operation:

SELECT * FROM DEPOSIT ;

 

ProductName QuantityInStock

----------- ---------------

185_Express 12

505_Express 5

510_Express 6

520_Express 2

550_Express 3

 

SELECT * FROM KINGSTON ;

 

ProductName QuantityInStock

----------- ---------------

185_Express 15

505_Express 7

510_Express 6

520_Express 2

550_Express 1

 

SELECT * FROM DEPOSIT

UNION

SELECT * FROM KINGSTON ;

 

ProductName QuantityInStock

----------- ---------------

185_Express 12

185_Express 15

505_Express 5

505_Express 7

510_Express 6

520_Express 2

550_Express 3

550_Express 1

The UNION DISTINCT operator functions identically to the UNION operator without the DISTINCT keyword. In both cases, duplicate rows are eliminated from the result set. In this example, because both warehouses had the same number of 510_Express and 520_Express products, those rows in both tables were exact duplicates, only one of which was returned.

This example shows how UNION works, but it isn’t very practical. In most cases, I imagine Acme’s manager would not care which products were stocked in exactly the same numbers at both warehouses, and thus partially removed from the result set. All the information is present, but the user must be savvy enough to realize that the total number of units of 510_Express is actually 12 rather than 6, and the total number of units of 520_Express is 4 rather than 2.

Warning I use the asterisk (*) as shorthand for all the columns in a table. This shortcut is fine most of the time, but it can get you into trouble when you use relational operators in embedded or module-language SQL. What if you add one or more new columns to one table and not to another, or you add different columns to the two tables? The two tables are then no longer union-compatible, and your program is invalid the next time it’s recompiled. Even if the same new columns are added to both tables so that they are still union-compatible, your program is probably not prepared to deal with this additional data. So, explicitly listing the columns that you want rather than relying on the * shorthand is generally a good idea. When you’re entering ad hoc SQL from the console, the asterisk will probably work fine because you can quickly display table structure to verify union compatibility if your query isn’t successful.

UNION ALL

As mentioned previously, the UNION operation normally eliminates any duplicate rows that result from its operation, which is the desired result most of the time. Sometimes, however, you may want to preserve duplicate rows. On those occasions, use UNION ALL.

The following code shows you what UNION ALL produces when it’s used with the DEPOSIT and KINGSTON tables:

SELECT * FROM DEPOSIT

UNION ALL

SELECT * FROM KINGSTON ;

 

ProductName QuantityInStock

----------- ---------------

185_Express 12

505_Express 5

510_Express 6

520_Express 2

550_Express 3

185_Express 15

505_Express 7

510_Express 6

520_Express 2

550_Express 1

UNION CORRESPONDING

You can sometimes form the union of two tables even if they are not union-compatible. If the columns you want in your result set are present and compatible in both source tables, you can perform a UNION CORRESPONDING operation. Only the specified columns are considered, and they are the only columns displayed in the result set.

Suppose ACME Systems opens a third warehouse, in Jefferson, Maine. A new table named JEFFERSON is added to the database, which includes Product and QuantityInStock columns (as the DEPOSIT and KINGSTON tables do), but also has an additional column named QuantityOnHold. A UNION or UNION ALL of JEFFERSON with either DEPOSIT or KINGSTON would not return any rows because there is not a complete match between all the columns of JEFFERSON and all the columns of the other two tables. However, you can still add the JEFFERSON data to that of either DEPOSIT or KINGSTON by specifying only the columns in JEFFERSON that correspond with the columns in the other table. Here’s a sample query:

SELECT *

FROM JEFFERSON

UNION CORRESPONDING BY

(ProductName, QuantityInStock)

SELECT *

FROM KINGSTON ;

The result table holds the products and the quantities in stock at both warehouses. As with the simple UNION, duplicates are eliminated. Thus, if the Jefferson warehouse happens to have the same quantity of a particular product that the Kingston warehouse has, the UNION CORRESPONDING operation loses one of those rows. To avoid this problem, use UNION ALL CORRESPONDING.

Tip Each column name in the list following the CORRESPONDING keyword must be a name that exists in both unioned tables. If you omit this list of names, an implicit list of all names that appear in both tables is used. But this implicit list of names may change when new columns are added to one or both tables. Therefore, explicitly listing the column names is better than omitting them.

INTERSECT

The UNION operation produces a result table containing all rows that appear in at least one of the source tables. If you want only rows that appear in all the source tables, you can use the INTERSECT operation, which is the SQL implementation of relational algebra’s intersect operation. I illustrate INTERSECT by returning to the Acme Systems warehouse table:

SELECT * FROM DEPOSIT ;

 

ProductName QuantityInStock

----------- ---------------

185_Express 12

505_Express 5

510_Express 6

520_Express 2

550_Express 3

 

SELECT * FROM KINGSTON ;

 

ProductName QuantityInStock

----------- ---------------

185_Express 15

505_Express 7

510_Express 6

520_Express 2

550_Express 1

Only rows that appear in all source tables show up in the INTERSECT operation’s result table:

SELECT *

FROM DEPOSIT

INTERSECT

SELECT *

FROM KINGSTON;

 

ProductName QuantityInStock

----------- ---------------

510_Express 6

520_Express 2

The result table shows that the Fort Deposit and East Kingston warehouses both have exactly the same number of 510_Express and 520_Express products in stock, a fact of dubious value. Note that, as was the case with UNION, INTERSECT DISTINCT produces the same result as the INTERSECT operator used alone. In this example, only one of the identical rows displaying each of two products is returned.

The ALL and CORRESPONDING keywords function in an INTERSECT operation the same way they do in a UNION operation. If you use ALL, duplicates are retained in the result table. If you use CORRESPONDING, the intersected tables need not be union-compatible, although the corresponding columns need to have matching types and lengths.

Consider another example: A municipality keeps track of the phones carried by police officers, firefighters, parking enforcement officers, and other city employees. A database table called PHONES contains data on all phones in active use. Another table named OUT, with an identical structure, contains data on all phones that have been taken out of service. No cellphone should ever exist in both tables. With an INTERSECT operation, you can test to see whether such an unwanted duplication has occurred:

SELECT *

FROM PHONES

INTERSECT CORRESPONDING BY (PhoneID)

SELECT *

FROM OUT ;

If the result table contains any rows, you know you have a problem. You should investigate any PhoneID entries that appear in the result table. The corresponding phone is either active or out of service; it can’t be both. After you detect the problem, you can perform a DELETE operation on one of the two tables to restore database integrity.

EXCEPT

The UNION operation acts on two source tables and returns all rows that appear in either table. The INTERSECT operation returns all rows that appear in both the first and the second table. In contrast, the EXCEPT (or EXCEPT DISTINCT) operation returns all rows that appear in the first table but that do not also appear in the second table.

Returning to the municipal phone database example, say that a group of phones that had been declared out of service and returned to the vendor for repairs have now been fixed and placed back into service. The PHONES table was updated to reflect the returned phones, but the returned phones were not removed from the OUT table as they should have been. You can display the PhoneID numbers of the phones in the OUT table, with the reactivated ones eliminated, using an EXCEPT operation:

SELECT *

FROM OUT

EXCEPT CORRESPONDING BY (PhoneID)

SELECT *

FROM PHONES;

This query returns all the rows in the OUT table whose PhoneID is not also present in the PHONES table. These are the phones still out of service.

JOINS

The UNION, INTERSECT, and EXCEPT operators are valuable in multitable databases in which the tables are union-compatible. In many cases, however, you want to draw data from multiple tables that have very little in common. JOINs are powerful relational operators that combine data from multiple tables into a single result table. The source tables may have little (or even nothing) in common with each other.

SQL supports a number of types of JOINs. The best one to choose in a given situation depends on the result you’re trying to achieve.

Cartesian product or cross join

Any multitable query is a type of JOIN. The source tables are joined in the sense that the result table includes information taken from all the source tables. The simplest JOIN is a two-table SELECT that has no WHERE clause qualifiers. Every row of the first table is joined to every row of the second table. The result table is referred to as the Cartesian product of the two source tables — the direct product of the two sets. (The less fancy name for the same thing is cross join.) The number of rows in the result table is equal to the number of rows in the first source table multiplied by the number of rows in the second source table.

For example, imagine that you’re the personnel manager for a company, and that part of your job is to maintain employee records. Most employee data, such as home address and telephone number, is not particularly sensitive. But some data, such as current salary, should be available only to authorized personnel. To maintain security of the sensitive information, you’d probably keep it in a separate table that is password protected. Consider the following pair of tables:

EMPLOYEE COMPENSATION

-------- ------------

EmpID Employ

FName Salary

LName Bonus

City

Phone

Fill the tables with some sample data:

EmpID FName LName City Phone

----- ----- ----- ---- -----

1 Jenny Smith Orange 555-1001

2 Bill Jones Newark 555-3221

3 Val Brown Nutley 555-6905

4 Justin Time Passaic 555-8908

 

Employ Salary Bonus

------ ------ -----

1 63000 10000

2 48000 2000

3 54000 5000

4 52000 7000

Create a virtual result table with the following query:

SELECT *

FROM EMPLOYEE, COMPENSATION ;

which can also be written

SELECT *

FROM EMPLOYEE CROSS JOIN COMPENSATION ;

Both of the above formulations do exactly the same thing. This query produces

EmpID FName LName City Phone Employ Salary Bonus

----- ----- ----- ---- ----- ------ ------ -----

1 Jenny Smith Orange 555-1001 1 63000 10000

1 Jenny Smith Orange 555-1001 2 48000 2000

1 Jenny Smith Orange 555-1001 3 54000 5000

1 Jenny Smith Orange 555-1001 4 52000 7000

2 Bill Jones Newark 555-3221 1 63000 10000

2 Bill Jones Newark 555-3221 2 48000 2000

2 Bill Jones Newark 555-3221 3 54000 5000

2 Bill Jones Newark 555-3221 4 52000 7000

3 Val Brown Nutley 555-6905 1 63000 10000

3 Val Brown Nutley 555-6905 2 48000 2000

3 Val Brown Nutley 555-6905 3 54000 5000

3 Val Brown Nutley 555-6905 4 52000 7000

4 Justin Time Passaic 555-8908 1 63000 10000

4 Justin Time Passaic 555-8908 2 48000 2000

4 Justin Time Passaic 555-8908 3 54000 5000

4 Justin Time Passaic 555-8908 4 52000 7000

The result table, which is the Cartesian product of the EMPLOYEE and COMPENSATION tables, contains considerable redundancy. Furthermore, it doesn’t make much sense. It combines every row of EMPLOYEE with every row of COMPENSATION. The only rows that convey meaningful information are those in which the EmpID number that came from EMPLOYEE matches the Employ number that came from COMPENSATION. In those rows, an employee’s name and address are associated with that same employee’s compensation.

When you’re trying to get useful information out of a multitable database, the Cartesian product produced by a cross join is almost never what you want, but it’s almost always the first step toward what you want. By applying constraints to the JOIN with a WHERE clause, you can filter out the unwanted rows. The most common JOIN that uses the WHERE clause filter is the equi-join.

Equi-join

An equi-join is a cross join with the addition of a WHERE clause containing a condition specifying that the value in one column in the first table must be equal to the value of a corresponding column in the second table. Applying an equi-join to the example tables from the previous section brings a more meaningful result:

SELECT *

FROM EMPLOYEE, COMPENSATION

WHERE EMPLOYEE.EmpID = COMPENSATION.Employ ;

This produces the following:

EmpID FName LName City Phone Employ Salary Bonus

----- ------ ----- ---- ----- ------ ------ -----

1 Jenny Smith Orange 555-1001 1 63000 10000

2 Bill Jones Newark 555-3221 2 48000 2000

3 Val Brown Nutley 555-6905 3 54000 5000

4 Justin Time Passaic 555-8908 4 52000 7000

In this result table, the salaries and bonuses on the right apply to the employees named on the left. The table still has some redundancy because the EmpID column duplicates the Employ column. You can fix this problem by specifying in your query which columns you want selected from the COMPENSATION table:

SELECT EMPLOYEE.*,COMPENSATION.Salary,COMPENSATION.Bonus

FROM EMPLOYEE, COMPENSATION

WHERE EMPLOYEE.EmpID = COMPENSATION.Employ ;

This produces the following result:

EmpID FName LName City Phone Salary Bonus

----- ----- ----- ---- ----- ------ -----

1 Jenny Smith Orange 555-1001 63000 10000

2 Bill Jones Newark 555-3221 48000 2000

3 Val Brown Nutley 555-6905 54000 5000

4 Justin Time Passaic 555-8908 52000 7000

This table tells you what you want to know, but doesn’t burden you with any extraneous data. The query is somewhat tedious to write, however. To avoid ambiguity, it makes good sense to qualify the column names with the names of the tables they came from. However, writing those table names repeatedly can be tiresome.

You can cut down on the amount of typing by using aliases (or correlation names). An alias is a short name that stands for a table name. If you use aliases in recasting the preceding query, it comes out like this:

SELECT E.*, C.Salary, C.Bonus

FROM EMPLOYEE E, COMPENSATION C

WHERE E.EmpID = C.Employ ;

In this example, E is the alias for EMPLOYEE, and C is the alias for COMPENSATION. The alias is local to the statement it’s in. After you declare an alias (in the FROM clause), you must use it throughout the statement. You can’t use both the alias and the long form of the table name.

Mixing the long form of table names with aliases creates confusion. Consider the following example, which is confusing:

SELECT T1.C, T2.C

FROM T1 T2, T2 T1

WHERE T1.C > T2.C ;

In this example, the alias for T1 is T2, and the alias for T2 is T1. Admittedly, this isn’t a smart selection of aliases, but it isn’t forbidden by the rules. If you mix aliases with long-form table names, you can’t tell which table is which.

The preceding example with aliases is equivalent to the following SELECT with no aliases:

SELECT T2.C, T1.C

FROM T1, T2

WHERE T2.C > T1.C ;

SQL enables you to join more than two tables. The maximum number varies from one implementation to another. The syntax is analogous to the two-table case:

SELECT E.*, C.Salary, C.Bonus, Y.TotalSales

FROM EMPLOYEE E, COMPENSATION C, YTD_SALES Y

WHERE E.EmpID = C.Employ

AND C.Employ = Y.EmpNo ;

This statement performs an equi-join on three tables, pulling data from corresponding rows of each one to produce a result table that shows the salespeople’s names, the amount of sales they are responsible for, and their compensation. The sales manager can quickly see whether compensation is in line with production.

Tip Storing a salesperson’s year-to-date sales in a separate YTD_SALES table ensures better performance and reliability than keeping that data in the EMPLOYEE table. The data in the EMPLOYEE table is relatively static. A person’s name, address, and telephone number don’t change very often. In contrast, the year-to-date sales change frequently. (You hope.) Because the YTD_SALES table has fewer columns than EMPLOYEE, you may be able to update it more quickly. If, in the course of updating sales totals, you don’t touch the EMPLOYEE table, you decrease the risk of accidentally modifying EMPLOYEE information that should stay the same.

Natural join

The natural join is a special case of an equi-join. In the WHERE clause of an equi-join, a column from one source table is compared with a column of a second source table for equality. The two columns must be the same type and length and must have the same name. In fact, in a natural join, all columns in one table that have the same names, types, and lengths as corresponding columns in the second table are compared for equality.

Imagine that the COMPENSATION table from the preceding example has columns EmpID, Salary, and Bonus rather than Employ, Salary, and Bonus. In that case, you can perform a natural join of the COMPENSATION table with the EMPLOYEE table. The traditional JOIN syntax looks like this:

SELECT E.*, C.Salary, C.Bonus

FROM EMPLOYEE E, COMPENSATION C

WHERE E.EmpID = C.EmpID ;

This query is a natural join. An alternate syntax for the same operation is the following:

SELECT E.*, C.Salary, C.Bonus

FROM EMPLOYEE E NATURAL JOIN COMPENSATION C ;

Condition join

A condition join is like an equi-join, except the condition being tested doesn’t have to be equality (although it can be). It can be any well-formed predicate. If the condition is satisfied, the corresponding row becomes part of the result table. The syntax is a little different from what you have seen so far, in that the condition is contained in an ON clause rather than a WHERE clause.

Suppose Acme Systems wants to know which products the Fort Deposit warehouse has in larger numbers than does the East Kingston warehouse. This question is a job for a condition join:

SELECT *

FROM DEPOSIT JOIN KINGSTON

ON DEPOSIT.QuantityInStock > KINGSTON.QuantityInStock ;

Within the predicate of a condition join, ON syntax is used in place of WHERE syntax.

Column-name join

The column-name join is like a natural join, but it’s more flexible. In a natural join, all the source table columns that have the same name are compared with each other for equality. With the column-name join, you select which same-name columns to compare. You can choose them all if you want, making the column-name join effectively a natural join. Or you may choose fewer than all same-name columns. In this way, you have a great degree of control over which cross product rows qualify to be placed into your result table.

Suppose you are Acme Systems, and you have shipped the exact same number of products to the East Kingston warehouse that you have shipped to the Fort Deposit warehouse. So far, nothing has been sold, so the number of products in inventory in East Kingston should match the number in Fort Deposit. If there are mismatches, it means that something is wrong. Either some products were never delivered to the warehouse, or they were misplaced or stolen after they arrived. With a simple query, you can retrieve the inventory levels at the two warehouses.

SELECT * FROM DEPOSIT ;

 

ProductName QuantityInStock

----------- ---------------

185_Express 12

505_Express 5

510_Express 6

520_Express 2

550_Express 3

 

SELECT * FROM KINGSTON ;

 

ProductName QuantityInStock

----------- ---------------

185_Express 15

505_Express 7

510_Express 6

520_Express 2

550_Express 1

For such small tables, it is fairly easy to see which rows don’t match. However, for a table with thousands of rows, it’s not so easy. You can use a column-name join to see whether any discrepancies exist. I show only two columns of the DEPOSIT and KINGSTON tables, to make it easy to see how the various relational operators work on them. In any real application, such tables would have additional columns, and the contents of those additional columns would not necessarily match. With a column-name join, the join operation considers only the columns specified.

SELECT *

FROM DEPOSIT JOIN KINGSTON

USING (ProductName, QuantityInStock) ;

Note the USING keyword, which tells the DBMS which columns to use.

The result table shows only the rows for which the number of products in stock at Fort Deposit equals the number of products in stock at East Kingston:

ProductName QuantityInStock ProductName QuantityInStock

----------- --------------- ----------- ---------------

510_Express 6 510_Express 6

520_Express 2 520_Express 2

Wow! Only two products match. There is a definite “shrinkage” problem at one or both warehouses. Acme needs to get a handle on security.

Inner join

By now, you’re probably getting the idea that joins are pretty esoteric and that it takes an uncommon level of spiritual discernment to deal with them adequately. You may have even heard of the mysterious inner join and speculated that it probably represents the core or essence of relational operations. Well, ha! The joke is on you: There’s nothing mysterious about inner joins. In fact, all the joins covered so far in this chapter are inner joins. I could have formulated the column-name join in the last example as an inner join by using the following syntax:

SELECT *

FROM DEPOSIT INNER JOIN KINGSTON

USING (ProductName, QuantityInStock) ;

The result is the same.

The inner join is so named to distinguish it from the outer join. An inner join discards all rows from the result table that don’t have corresponding rows in both source tables. An outer join preserves unmatched rows. That’s the difference. Nothing metaphysical about it.

Outer join

When you’re joining two tables, the first one (call it the one on the left) may have rows that don’t have matching counterparts in the second table (the one on the right). Conversely, the table on the right may have rows that don’t have matching counterparts in the table on the left. If you perform an inner join on those tables, all the unmatched rows are excluded from the output. Outer joins, however, don’t exclude the unmatched rows. Outer joins come in three types: the left outer join, the right outer join, and the full outer join.

Left outer join

In a query that includes a join, the left table is the one that precedes the keyword JOIN, and the right table is the one that follows it. The left outer join preserves unmatched rows from the left table but discards unmatched rows from the right table.

To understand outer joins, consider a corporate database that maintains records of the company’s employees, departments, and locations. Tables 4-1, 4-2, and 4-3 contain the database’s sample data.

TABLE 4-1 LOCATION

LocationID

CITY

1

Boston

3

Tampa

5

Chicago

TABLE 4-2 DEPT

DeptID

LocationID

NAME

21

1

Sales

24

1

Admin

27

5

Repair

29

5

Stock

TABLE 4-3 EMPLOYEE

EmpID

DeptID

NAME

61

24

Kirk

63

27

McCoy

Now suppose that you want to see all the data for all employees, including department and location. You get this with an equi-join:

SELECT *

FROM LOCATION L, DEPT D, EMPLOYEE E

WHERE L.LocationID = D.LocationID

AND D.DeptID = E.DeptID ;

This statement produces the following result:

1 Boston 24 1 Admin 61 24 Kirk

5 Chicago 27 5 Repair 63 27 McCoy

This results table gives all the data for all the employees, including their location and department. The equi-join works because every employee has a location and a department.

Suppose now that you want the data on the locations, with the related department and employee data. This is a different problem because a location without any associated departments may exist. To get what you want, you have to use an outer join, as in the following example:

SELECT *

FROM LOCATION L LEFT OUTER JOIN DEPT D

ON (L.LocationID = D.LocationID)

LEFT OUTER JOIN EMPLOYEE E

ON (D.DeptID = E.DeptID);

This join pulls data from three tables. First, the LOCATION table is joined to the DEPT table. The resulting table is then joined to the EMPLOYEE table. Rows from the table on the left of the LEFT OUTER JOIN operator that have no corresponding row in the table on the right are included in the result. Thus, in the first join, all locations are included, even if no department associated with them exists. In the second join, all departments are included, even if no employee associated with them exists. The result is as follows:

1 Boston 24 1 Admin 61 24 Kirk

5 Chicago 27 5 Repair 63 27 McCoy

3 Tampa NULL NULL NULL NULL NULL NULL

5 Chicago 29 5 Stock NULL NULL NULL

1 Boston 21 1 Sales NULL NULL NULL

The first two rows are the same as the two result rows in the previous example. The third row (3 Tampa) has nulls in the department and employee columns because no departments are defined for Tampa and no employees are stationed there. (Perhaps Tampa is a brand new location and has not yet been staffed.) The fourth and fifth rows (5 Chicago and 1 Boston) contain data about the Stock and the Sales departments, but the employee columns for these rows contain nulls because these two departments have no employees. This outer join tells you everything that the equi-join told you plus the following:

  • All the company’s locations, whether or not they have any departments
  • All the company’s departments, whether or not they have any employees

The rows returned in the preceding example aren’t guaranteed to be in the order you want. The order may vary from one implementation to the next. To make sure that the rows returned are in the order you want, add an ORDER BY clause to your SELECT statement, like this:

SELECT *

FROM LOCATION L LEFT OUTER JOIN DEPT D

ON (L.LocationID = D.LocationID)

LEFT OUTER JOIN EMPLOYEE E

ON (D.DeptID = E.DeptID)

ORDER BY L.LocationID, D.DeptID, E.EmpID;

Tip You can abbreviate the left outer join language as LEFT JOIN because there’s no such thing as a left inner join.

Right outer join

I’m sure you have figured out by now how the right outer join behaves. It preserves unmatched rows from the right table but discards unmatched rows from the left table. You can use it on the same tables and get the same result by reversing the order in which you present tables to the join:

SELECT *

FROM EMPLOYEE E RIGHT OUTER JOIN DEPT D

ON (D.DeptID = E.DeptID)

RIGHT OUTER JOIN LOCATION L

ON (L.LocationID = D.LocationID) ;

In this formulation, the first join produces a table that contains all departments, whether they have an associated employee or not. The second join produces a table that contains all locations, whether they have an associated department or not.

Tip You can abbreviate the right outer join language as RIGHT JOIN because there’s no such thing as a right inner join.

Full outer join

The full outer join combines the functions of the left outer join and the right outer join. It retains the unmatched rows from both the left and the right tables. Consider the most general case of the company database used in the preceding examples. It could have

  • Locations with no departments
  • Locations with no employees
  • Departments with no locations
  • Departments with no employees
  • Employees with no locations
  • Employees with no departments

Remember Whereas the preceding named conditions are unusual, they can happen, particularly in a startup situation, and when they do, you’ll be glad you have outer joins to deal with them. As soon as you say that a certain situation is not possible, reality will conk you on the head with an example of that very situation.

To show all locations, departments, and employees, regardless of whether they have corresponding rows in the other tables, use a full outer join in the following form:

SELECT *

FROM LOCATION L FULL OUTER JOIN DEPT D

ON (L.LocationID = D.LocationID)

FULL OUTER JOIN EMPLOYEE E

ON (D.DeptID = E.DeptID) ;

Tip You can abbreviate the full outer join language as FULL JOIN because there’s no such thing as a full inner join.

ON versus WHERE

The function of the ON and WHERE clauses in the various types of joins is potentially confusing. These facts may help you keep things straight:

  • The ON clause is part of the inner, left, right, and full joins. The cross join and UNION join don’t have an ON clause because neither of them does any filtering of the data.
  • The ON clause in an inner join is logically equivalent to a WHERE clause; the same condition could be specified either in the ON clause or a WHERE clause.
  • The ON clauses in outer joins (left, right, and full joins) are different from WHERE clauses. The WHERE clause simply filters the rows returned by the FROM clause. Rows rejected by the filter are not included in the result. The ON clause in an outer join first filters the rows of a cross product and then includes the rejected rows, extended with nulls.

Join Conditions and Clustering Indexes

The performance of queries that include joins depends, to a large extent, on which columns are indexed, and whether the index is clustering or not. A table can have only one clustering index, where data items that are near each other logically, such as 'Smith' and 'Smithson', are also near each other physically on disk. Using a clustering index to sequentially step through a table speeds up hard disk retrievals and thus maximizes performance.

Remember An index is a separate table that corresponds to a data table, but is sorted in some order. A clustering index is an index sorted in the same order that items are stored in memory and thus provides the fastest retrievals.

A clustering index works well with multipoint queries, which look for equality in nonunique columns. This is similar to looking up names in a telephone book. All the Smiths are listed together on consecutive pages. Most or all of them are located on the same hard disk cylinder. You can access multiple Smiths with a single disk seek operation. A nonclustering index, on the other hand, would not have this advantage. Each record typically requires a new disk seek, greatly slowing down operation. Furthermore, you probably have to touch every index to be sure you have not missed one. This is analogous to searching the greater Los Angeles telephone book for every instance of Area Code 626. Most of the numbers are in Area 213, but there will be instances of 626 sprinkled throughout the book.

Consider the following sample query:

SELECT Employee.FirstName, Employee.LastName, Student.Major

FROM Employee, Students

WHERE Employee.IDNum = Student.IDNum ;

This query returns the first and last names and the majors of university employees who are also students. How long it takes to run the query depends on how the tables are indexed. If Employee has a clustering index on IDNum, records searched are on consecutive pages. If Employee and Student both have clustering indexes on IDNum, the DBMS will likely use a merge join, which reads both tables in sorted order, minimizing the number of disk accesses needed. Such clustering often eliminates the need for a costly ORDER BY clause because the records are already sorted in the desired order.

The one disadvantage of clustered indexes is that they can become “tired” after a number of updates have been performed, causing the generation of overflow pages, which require additional disk seeks. Rebuilding the index corrects this problem. By tired, I mean less helpful. Every time you add or delete a record, the index loses some of its advantage. A deleted record must be skipped over, and added records must be put on an overflow page, which will usually require a couple of extra disk seeks.

Some modern DBMS products perform automatic clustered index maintenance, meaning they rebuild clustered indexes without having to be told to do so. If you have such a product, then the disadvantage that I just noted goes away.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.221.53.5