Chapter 11

Using Relational Operators

IN THIS CHAPTER

Bullet Combining tables with similar structures

Bullet Combining tables with different structures

Bullet Deriving meaningful data from multiple tables

You probably know by now that SQL is a query language for relational databases. In previous chapters, I present simple databases, and in most cases, my examples deal with only one table. In this chapter, I put the relational in “relational database.” After all, the name means “a database that consists of multiple related tables.” Here’s where you finally see those relationships.

Because the data in a relational database is distributed across multiple tables, a query usually draws data from more than one table. SQL has operators that combine data from multiple sources into a single result set. These are the UNION, INTERSECTION, and EXCEPT operators, as well as a family of JOIN operators. Each operator combines data from multiple tables in a different way.

UNION

The UNION operator is the SQL implementation of relational algebra's union operator. The UNION operator enables you to draw information from two or more tables that have the same structure. Same structure means

  • The tables must all have the same number of columns.
  • Corresponding columns must all have identical data types and lengths.

When these criteria are met, the tables are union-compatible: The union of the two tables returns all the rows that appear in either table and eliminates duplicates.

Suppose you create a baseball-statistics database (like the one in Chapter 12). It contains two union-compatible tables named AMERICAN and NATIONAL. Both tables have three columns, and corresponding columns are all the same type. In fact, corresponding columns have identical column names (although this condition isn’t required for union compatibility).

NATIONAL lists the players’ names and the number of complete games pitched by National League pitchers. AMERICAN lists the same information about pitchers in the American League. The UNION of the two tables gives you a virtual result table containing all the rows in the first table plus all the rows in the second table. For this example, I put just a few rows in each table to illustrate the operation:

SELECT * FROM NATIONAL ;

FirstName LastName CompleteGames

--------- -------- -------------

Sal Maglie 11

Don Newcombe 9

Sandy Koufax 13

Don Drysdale 12

SELECT * FROM AMERICAN ;

FirstName LastName CompleteGames

--------- -------- -------------

Whitey Ford 12

Don Larson 10

Bob Turley 8

Allie Reynolds 14

SELECT * FROM NATIONAL

UNION

SELECT * FROM AMERICAN ;

FirstName LastName CompleteGames

--------- -------- -------------

Allie Reynolds 14

Bob Turley 8

Don Drysdale 12

Don Larson 10

Don Newcombe 9

Sal Maglie 11

Sandy Koufax 13

Whitey Ford 12

The UNION DISTINCT operator functions identically to the UNION operator without the DISTINCT keyword. In both cases, duplicate rows are eliminated from the result set.

Warning I've been using the asterisk (*) as shorthand for all the columns in a table. This shortcut is fine most of the time, but it can get you into trouble when you use relational operators in embedded or module-language SQL. If you add one or more new columns to one table and not to another, or you add different columns to the two tables, the two tables are no longer union-compatible — and your program will be invalid the next time it’s recompiled. Even if the same new columns are added to both tables so they’re still union-compatible, your program is probably not prepared to deal with the additional data. You should explicitly list the columns you want, instead of relying on the * shorthand. When you're entering ad hoc SQL queries from the console, the asterisk probably works fine, because you can quickly display a table structure to verify union compatibility if your query isn’t successful.

The UNION ALL operation

As I mention previously, the UNION operation usually eliminates any duplicate rows that result from its operation, which is the desired result most of the time. Sometimes, however, you may want to preserve duplicate rows. On those occasions, use UNION ALL.

Referring to the example, suppose that “Bullet” Bob Turley had been traded in midseason from the New York Yankees in the American League to the Brooklyn Dodgers in the National League. Now suppose that during the season, he pitched eight complete games for each team. The ordinary UNION displayed in the example throws away one of the two lines containing Turley's data. Although he seemed to pitch only 8 complete games in the season, he actually hurled a remarkable 16 complete games. The following query gives you the true facts:

SELECT * FROM NATIONAL

UNION ALL

SELECT * FROM AMERICAN ;

Tip You can sometimes form a UNION of two tables even if they’re not union-compatible. If the columns you want in your result table are present and compatible in both tables, you can perform a UNION CORRESPONDING operation. Only the specified columns are considered — and they are the only columns displayed in the result table.

The CORRESPONDING operation

Baseball statisticians keep different statistics on pitchers than they keep on outfielders. In both cases, first names, last names, putouts, errors, and fielding percentages are recorded. Outfielders, of course, don't have a won/lost record, a saves record, or several other stats that pertain only to pitching. You can still perform a UNION that takes data from the OUTFIELDER table and from the PITCHER table to give you some overall information about defensive skill:

SELECT *

FROM OUTFIELDER

UNION CORRESPONDING

(FirstName, LastName, Putouts, Errors, FieldPct)

SELECT *

FROM PITCHER ;

The result table holds the first and last names of all the outfielders and pitchers, along with the putouts, errors, and fielding percentage of each player. As with the simple UNION, duplicates are eliminated. Thus, if a player spent some time in the outfield and he also pitched in one or more games, the UNION CORRESPONDING operation loses some of his statistics. To avoid this problem, use UNION ALL CORRESPONDING.

Tip Each column name in the list following the CORRESPONDING keyword must be a name that exists in both union-joined tables. If you omit this list of names, an implicit list of all names that appear in both tables is used. But this implicit list of names may change when new columns are added to one or both tables. Therefore you're better off explicitly listing the column names than you are if you omit them.

INTERSECT

The UNION operation produces a result table containing all rows that appear in any of the source tables. If you want only rows that appear in all the source tables, you can use the INTERSECT operation, which is the SQL implementation of relational algebra's intersect operation. I illustrate INTERSECT by returning to the fantasy world in which Bob Turley was traded to the Dodgers in midseason:

SELECT * FROM NATIONAL;

FirstName LastName CompleteGames

--------- -------- -------------

Sal Maglie 11

Don Newcombe 9

Sandy Koufax 13

Don Drysdale 12

Bob Turley 8

SELECT * FROM AMERICAN;

FIRST_NAME LAST_NAME COMPLETE_GAMES

---------- --------- --------------

Whitey Ford 12

Don Larson 10

Bob Turley 8

Allie Reynolds 14

Only rows that appear in all source tables show up in the INTERSECT operation's result table:

SELECT *

FROM NATIONAL

INTERSECT

SELECT *

FROM AMERICAN;

FirstName LastName CompleteGames

--------- -------- -------------

Bob Turley 8

The result table tells you that Bob Turley was the only pitcher to throw the same number of complete games in both leagues (a rather obscure distinction for old Bullet Bob). Note: As was the case with UNION, INTERSECT DISTINCT produces the same result as the INTERSECT operator used alone. In this example, only one of the identical rows featuring Bob Turley is returned.

Tip The ALL and CORRESPONDING keywords function in an INTERSECT operation the same way they do in a UNION operation. If you use ALL, duplicates are retained in the result table. If you use CORRESPONDING, the intersected tables don't need to be union-compatible, although the corresponding columns must have matching types and lengths.

Here’s what you get with INTERSECT ALL:

SELECT *

FROM NATIONAL

INTERSECT ALL

SELECT *

FROM AMERICAN;

FirstName LastName CompleteGames

--------- -------- -------------

Bob Turley 8

Bob Turley 8

Consider another example: A municipality keeps track of the cell phones carried by police officers, firefighters, street sweepers, and other city employees. A database table called PHONES contains data on all phones in active use. Another table named OUT, with an identical structure, contains data on all phones that have been taken out of service. No phones should ever exist in both tables. With an INTERSECT operation, you can test to see whether such an unwanted duplication has occurred:

SELECT *

FROM PHONES

INTERSECT CORRESPONDING (PhoneID)

SELECT *

FROM OUT ;

Remember If this operation gives you a result table containing any rows at all, you know you have a problem. You should investigate any PhoneID entries that appear in the result table. The corresponding phone is either active or out of service; it can't be both. After you detect the problem, you can perform a DELETE operation on one of the two tables to restore database integrity.

EXCEPT

The UNION operation acts on two source tables and returns all rows that appear in either table. The INTERSECT operation returns all rows that appear in both the first and the second tables. In contrast, the EXCEPT (or EXCEPT DISTINCT) operation returns all rows that appear in the first table but that do not also appear in the second table.

Returning to the municipal phone database example (see the “INTERSECT” section, earlier in this chapter), say that a group of phones that had been declared out of service and returned to the vendor for repairs have now been fixed and placed back into service. The PHONES table was updated to reflect the returned phones, but the returned phones were not removed from the OUT table as they should have been. You can display the PhoneID numbers of the phones in the OUT table, with the reactivated ones eliminated, using an EXCEPT operation:

SELECT *

FROM OUT

EXCEPT CORRESPONDING (PhoneID)

SELECT *

FROM PHONES;

This query returns all the rows in the OUT table whose PhoneID is not also present in the PHONES table.

Join Operators

The UNION, INTERSECT, and EXCEPT operators are valuable in multi-table databases that contain union-compatible tables. In many cases, however, you want to draw data from multiple tables that have very little in common. Joins are powerful relational operators that combine data from multiple tables into a single result table. The source tables may have little (or even nothing) in common with each other.

SQL supports several types of joins. The best one to choose in a given situation depends on the result you're trying to achieve. The following sections give you the details.

Basic join

Any multi-table query is a type of join. The source tables are joined in the sense that the result table includes information taken from all the source tables. The simplest join is a two-table SELECT that has no WHERE clause qualifiers: Every row of the first table is joined to every row of the second table. The result table is the Cartesian product of the two source tables. The number of rows in the result table is equal to the number of rows in the first source table multiplied by the number of rows in the second source table.

For example, imagine that you're the personnel manager for a company and that part of your job is to maintain employee records. Most employee data, such as home address and telephone number, is not particularly sensitive. But some data, such as current salary, should be available only to authorized personnel. To maintain security of the sensitive information, keep it in a separate table that is password protected. Consider the following pair of tables:

EMPLOYEE COMPENSATION

-------- ------------

EmpID Employ

FName Salary

LName Bonus

City

Phone

Fill the tables with some sample data:

EmpID FName LName City Phone

----- ----- ----- ---- -----

1 Whitey Ford Orange 555-1001

2 Don Larson Newark 555-3221

3 Sal Maglie Nutley 555-6905

4 Bob Turley Passaic 555-8908

Employ Salary Bonus

------ ------ -----

1 33000 10000

2 18000 2000

3 24000 5000

4 22000 7000

Create a virtual result table with the following query:

SELECT *

FROM EMPLOYEE, COMPENSATION ;

Here’s what the query produces:

EmpID FName LName City Phone Employ Salary Bonus

----- ----- ----- ---- ----- ------ ------ -----

1 Whitey Ford Orange 555-1001 1 33000 10000

1 Whitey Ford Orange 555-1001 2 18000 2000

1 Whitey Ford Orange 555-1001 3 24000 5000

1 Whitey Ford Orange 555-1001 4 22000 7000

2 Don Larson Newark 555-3221 1 33000 10000

2 Don Larson Newark 555-3221 2 18000 2000

2 Don Larson Newark 555-3221 3 24000 5000

2 Don Larson Newark 555-3221 4 22000 7000

3 Sal Maglie Nutley 555-6905 1 33000 10000

3 Sal Maglie Nutley 555-6905 2 18000 2000

3 Sal Maglie Nutley 555-6905 3 24000 5000

3 Sal Maglie Nutley 555-6905 4 22000 7000

4 Bob Turley Passaic 555-8908 1 33000 10000

4 Bob Turley Passaic 555-8908 2 18000 2000

4 Bob Turley Passaic 555-8908 3 24000 5000

4 Bob Turley Passaic 555-8908 4 22000 7000

The result table, which is the Cartesian product of the EMPLOYEE and COMPENSATION tables, contains considerable redundancy. Furthermore, it doesn’t make much sense. It combines every row of EMPLOYEE with every row of COMPENSATION. The only rows that convey meaningful information are those in which the EmpID number that came from EMPLOYEE matches the Employ number that came from COMPENSATION. In those rows, an employee's name and address are associated with his or her own compensation.

When you’re trying to get useful information out of a multi-table database, the Cartesian product produced by a basic join is almost never what you want, but it’s almost always the first step toward what you want. By applying constraints to the JOIN with a WHERE clause, you can filter out the unwanted rows. The following section explains how to filter the stuff you don't need to see.

Equi-join

The most common join that uses the WHERE clause filter is the equi-join. An equi-join is a basic join with a WHERE clause that contains a condition specifying that the value in one column in the first table must be equal to the value of a corresponding column in the second table. Applying an equi-join to the example tables from the previous section brings a more meaningful result:

SELECT *

FROM EMPLOYEE, COMPENSATION

WHERE EMPLOYEE.EmpID = COMPENSATION.Employ ;

This query produces the following results:

EmpID FName LName City Phone Employ Salary Bonus

----- ------ ----- ---- ----- ------ ------ -----

1 Whitey Ford Orange 555-1001 1 33000 10000

2 Don Larson Newark 555-3221 2 18000 2000

3 Sal Maglie Nutley 555-6905 3 24000 5000

4 Bob Turley Passaic 555-8908 4 22000 7000

In this result table, the salaries and bonuses on the right apply to the employees named on the left. The table still has some redundancy because the EmpID column duplicates the Employ column. You can fix this problem by slightly reformulating the query, like this:

SELECT EMPLOYEE.*,COMPENSATION.Salary,COMPENSATION.Bonus

FROM EMPLOYEE, COMPENSATION

WHERE EMPLOYEE.EmpID = COMPENSATION.Employ ;

This query produces the following result table:

EmpID FName LName City Phone Salary Bonus

----- ----- ----- ---- ----- ------ -----

1 Whitey Ford Orange 555-1001 33000 10000

2 Don Larson Newark 555-3221 18000 2000

3 Sal Maglie Nutley 555-6905 24000 5000

4 Bob Turley Passaic 555-8908 22000 7000

This table tells you what you want to know but doesn't burden you with any extraneous data. The query is somewhat tedious to write, however. To avoid ambiguity, you can qualify the column names with the names of the tables they came from. Typing those table names repeatedly provides good exercise for the fingers but has no other merit.

You can cut down on the amount of typing by using aliases (or correlation names). An alias is a short name that stands for a table name. If you use aliases in recasting the preceding query, it comes out like this:

SELECT E.*, C.Salary, C.Bonus

FROM EMPLOYEE E, COMPENSATION C

WHERE E.EmpID = C.Employ ;

In this example, E is the alias for EMPLOYEE, and C is the alias for COMPENSATION. The alias is local to the statement it’s in. After you declare an alias (in the FROM clause), you must use it throughout the statement. You can’t use both the alias and the long form of the table name in the same statement.

Tip Even if you could mix the long form of table names with aliases, you wouldn’t want to, because doing so creates major confusion. Consider the following example:

SELECT T1.C, T2.C

FROM T1 T2, T2 T1

WHERE T1.C > T2.C ;

In this example, the alias for T1 is T2, and the alias for T2 is T1. Admittedly, this isn’t a smart selection of aliases, but it isn’t forbidden by the rules. If you mix aliases with long-form table names, you can’t tell which table is which.

The preceding example with aliases is equivalent to the following SELECT statement with no aliases:

SELECT T2.C, T1.C

FROM T1 , T2

WHERE T2.C > T1.C ;

SQL enables you to join more than two tables. The maximum number varies from one implementation to another. The syntax is analogous to the two-table case; here's what it looks like:

SELECT E.*, C.Salary, C.Bonus, Y.TotalSales

FROM EMPLOYEE E, COMPENSATION C, YTD_SALES Y

WHERE E.EmpID = C.Employ

AND C.Employ = Y.EmpNo ;

This statement performs an equi-join on three tables, pulling data from corresponding rows of each one to produce a result table that shows the salespeople’s names, the amount of sales they are responsible for, and their compensation. The sales manager can quickly see whether compensation is in line with production.

Tip Storing a salesperson’s year-to-date sales in a separate YTD_SALES table ensures better computer performance and data reliability than keeping that data in the EMPLOYEE table. The data in the EMPLOYEE table is relatively static. A person’s name, address, and telephone number don’t change very often. In contrast, the year-to-date sales change frequently (you hope). Because the YTD_SALES table has fewer columns than the EMPLOYEE table, you may be able to update it more quickly. If, in the course of updating sales totals, you don’t touch the EMPLOYEE table, you decrease the risk of accidentally modifying employee information that should stay the same.

Cross join

CROSS JOIN is the keyword for the basic join without a WHERE clause. Therefore

SELECT *

FROM EMPLOYEE, COMPENSATION ;

can also be written as

SELECT *

FROM EMPLOYEE CROSS JOIN COMPENSATION ;

The result is the Cartesian product (also called the cross product) of the two source tables. CROSS JOIN rarely gives you the final result you want, but it can be useful as the first step in a chain of data-manipulation operations that ultimately produce the desired result.

Natural join

The natural join is a special case of an equi-join. In the WHERE clause of an equi-join, a column from one source table is compared with a column of a second source table for equality. The two columns must be the same type and length and must have the same name. In fact, in a natural join, all columns in one table that have the same names, types, and lengths as corresponding columns in the second table are compared for equality.

Imagine that the COMPENSATION table from the preceding example has columns EmpID, Salary, and Bonus rather than Employ, Salary, and Bonus. In that case, you can perform a natural join of the COMPENSATION table with the EMPLOYEE table. The traditional JOIN syntax would look like this:

SELECT E.*, C.Salary, C.Bonus

FROM EMPLOYEE E, COMPENSATION C

WHERE E.EmpID = C.EmpID ;

This query is a special case of a natural join. The SELECT statement will return joined rows where E.EmpID = C.EmpID. Consider the following:

SELECT E.*, C.Salary, C.Bonus

FROM EMPLOYEE E NATURAL JOIN COMPENSATION C ;

This query will join rows where E.EmpID = C.EmpID, where E.Salary = C.Salary, and where E.Bonus = C.Bonus. The result table will contain only rows where all corresponding columns match. In this example, the results of both queries will be the same because the EMPLOYEE table does not contain either a Salary or a Bonus column.

Condition join

A condition join is like an equi-join, except the condition being tested doesn't have to be an equality (although it can be). It can be any well-formed predicate. If the condition is satisfied, then the corresponding row becomes part of the result table. The syntax is a little different from what you have seen so far: The condition is contained in an ON clause rather than in a WHERE clause.

Say that a baseball statistician wants to know which National League pitchers have pitched the same number of complete games as one or more American League pitchers. This question is a job for an equi-join, which can also be expressed with condition-join syntax:

SELECT *

FROM NATIONAL JOIN AMERICAN

ON NATIONAL.CompleteGames = AMERICAN.CompleteGames ;

Column-name join

The column-name join is like a natural join, but it's more flexible. In a natural join, all the source table columns that have the same name are compared with each other for equality. With the column-name join, you select which same-name columns to compare. You can choose them all if you want, making the column-name join (effectively) a natural join. Or you may choose fewer than all same-name columns. In this way, you have a great degree of control over which cross-product rows qualify to be placed into your result table.

Suppose you’re a chess-set manufacturer and have one inventory table that keeps track of your stock of white pieces and another that keeps track of black pieces. The tables contain data as follows:

WHITE BLACK

----- -----

Piece Quant Wood Piece Quant Wood

----- ----- ---- ----- ----- ----

King 502 Oak King 502 Ebony

Queen 398 Oak Queen 397 Ebony

Rook 1020 Oak Rook 1020 Ebony

Bishop 985 Oak Bishop 985 Ebony

Knight 950 Oak Knight 950 Ebony

Pawn 431 Oak Pawn 453 Ebony

For each piece type, the number of white pieces should match the number of black pieces. If they don’t match, some chessmen are being lost or stolen, and you need to tighten security measures.

A natural join compares all columns with the same name for equality. In this case, a result table with no rows is produced because no rows in the WOOD column in the WHITE table match any rows in the WOOD column in the BLACK table. This result table doesn't help you determine whether any merchandise is missing. Instead, do a column-name join that excludes the WOOD column from consideration. It can take the following form:

SELECT *

FROM WHITE JOIN BLACK

USING (Piece, Quant) ;

The result table shows only the rows for which the number of white pieces in stock equals the number of black pieces:

Piece Quant Wood Piece Quant Wood

----- ----- ---- ----- ----- ----

King 502 Oak King 502 Ebony

Rook 1020 Oak Rook 1020 Ebony

Bishop 985 Oak Bishop 985 Ebony

Knight 950 Oak Knight 950 Ebony

The shrewd person can deduce that Queen and Pawn are missing from the list, indicating a shortage somewhere for those piece types.

Inner join

By now, you’re probably getting the idea that joins are pretty esoteric and that it takes an uncommon level of spiritual discernment to deal with them adequately. You may have even heard of the mysterious inner join and speculated that it probably represents the core or essence of relational operations. Well, ha! The joke’s on you: There’s nothing mysterious about inner joins. In fact, all the joins covered so far in this chapter are inner joins. I could have formulated the column-name join in the last example as an inner join by using the following syntax:

SELECT *

FROM WHITE INNER JOIN BLACK

USING (Piece, Quant) ;

The result is the same.

The inner join is so named to distinguish it from the outer join. An inner join discards all rows from the result table that don’t have corresponding rows in both source tables. An outer join preserves unmatched rows. That’s the difference. Nothing metaphysical about it.

Outer join

When you’re joining two tables, the first one (call it the one on the left) may have rows that don’t have matching counterparts in the second table (the one on the right). Conversely, the table on the right may have rows that don’t have matching counterparts in the table on the left. If you perform an inner join on those tables, all the unmatched rows are excluded from the output. Outer joins, however, don’t exclude the unmatched rows. Outer joins come in three types: the left outer join, the right outer join, and the full outer join.

Left outer join

In a query that includes a join, the left table is the one that precedes the keyword JOIN, and the right table is the one that follows it. The left outer join preserves unmatched rows from the left table but discards unmatched rows from the right table.

To understand outer joins, consider a corporate database that maintains records of the company's employees, departments, and locations. Tables 11-1, 11-2, and 11-3 contain the database’s example data.

TABLE 11-1 LOCATION

LOCATION_ID

CITY

1

Boston

3

Tampa

5

Chicago

TABLE 11-2 DEPT

DEPT_ID

LOCATION_ID

NAME

21

1

Sales

24

1

Admin

27

5

Repair

29

5

Stock

TABLE 11-3 EMPLOYEE

EMP_ID

DEPT_ID

NAME

61

24

Kirk

63

27

McCoy

Now suppose you want to see all the data for all employees, including department and location. You get this with an equi-join:

SELECT *

FROM LOCATION L, DEPT D, EMPLOYEE E

WHERE L.LocationID = D.LocationID

AND D.DeptID = E.DeptID ;

This statement produces the following result:

1 Boston 24 1 Admin 61 24 Kirk

5 Chicago 27 5 Repair 63 27 McCoy

This result table gives all the data for all the employees, including location and department. The equi-join works because every employee has a location and a department.

Next, suppose you want the data on the locations, with the related department and employee data. This is a different problem because a location without any associated departments may exist. To get what you want, you must use an outer join, as in the following example:

SELECT *

FROM LOCATION L LEFT OUTER JOIN DEPT D

ON (L.LocationID = D.LocationID)

LEFT OUTER JOIN EMPLOYEE E

ON (D.DeptID = E.DeptID);

This join pulls data from three tables. First, the LOCATION table is joined to the DEPT table. The result set is then joined to the EMPLOYEE table. Rows from the table on the left of the LEFT OUTER JOIN operator that have no corresponding row in the table on the right are included in the result. Thus, in the first join, all locations are included, even if no department associated with them exists. In the second join, all departments are included, even if no employee associated with them exists. The result is as follows:

1 Boston 24 1 Admin 61 24 Kirk

5 Chicago 27 5 Repair 63 27 McCoy

3 Tampa NULL NULL NULL NULL NULL NULL

5 Chicago 29 5 Stock NULL NULL NULL

1 Boston 21 1 Sales NULL NULL NULL

The first two rows are the same as the two result rows in the previous example. The third row (3 Tampa) has nulls in the department and employee columns because no departments are defined for Tampa and no employees are stationed there. The fourth and fifth rows (5 Chicago and 1 Boston) contain data about the Stock and the Sales departments, but the Employee columns for these rows contain nulls because these two departments have no employees. This outer join tells you everything that the equi-join told you — plus the following:

  • All the company's locations, whether they have any departments or not
  • All the company’s departments, whether they have any employees or not

The rows returned in the preceding example aren’t guaranteed to be in the order you want. The order may vary from one implementation to the next. To make sure that the rows returned are in the order you want, add an ORDER BY clause to your SELECT statement, like this:

SELECT *

FROM LOCATION L LEFT OUTER JOIN DEPT D

ON (L.LocationID = D.LocationID)

LEFT OUTER JOIN EMPLOYEE E

ON (D.DeptID = E.DeptID)

ORDER BY L.LocationID, D.DeptID, E.EmpID;

Tip You can abbreviate the left outer join language as LEFT JOIN because there's no such thing as a left inner join.

Right outer join

I bet you figured out how the right outer join behaves. Right! The right outer join preserves unmatched rows from the right table but discards unmatched rows from the left table. You can use it on the same tables and get the same result by reversing the order in which you present tables to the join:

SELECT *

FROM EMPLOYEE E RIGHT OUTER JOIN DEPT D

ON (D.DeptID = E.DeptID)

RIGHT OUTER JOIN LOCATION L

ON (L.LocationID = D.LocationID) ;

In this formulation, the first join produces a table that contains all departments, whether they have an associated employee or not. The second join produces a table that contains all locations, whether they have an associated department or not.

Tip You can abbreviate the right outer join language as RIGHT JOIN because there’s no such thing as a right inner join.

Full outer join

The full outer join combines the functions of the left outer join and the right outer join. It retains the unmatched rows from both the left and the right tables. Consider the most general case of the company database used in the preceding examples. It could have

  • Locations with no departments
  • Departments with no locations
  • Departments with no employees
  • Employees with no departments

To show all locations, departments, and employees, regardless of whether they have corresponding rows in the other tables, use a full outer join in the following form:

SELECT *

FROM LOCATION L FULL OUTER JOIN DEPT D

ON (L.LocationID = D.LocationID)

FULL OUTER JOIN EMPLOYEE E

ON (D.DeptID = E.DeptID) ;

Tip You can abbreviate the full-outer-join language as FULL JOIN because (this may sound hauntingly familiar) there's no such thing as a full inner join.

Union join

Unlike the other kinds of join, the union join makes no attempt to match a row from the left source table with any rows in the right source table. It creates a new virtual table that contains the union of all the columns in both source tables. In the virtual result table, the columns that came from the left source table contain all the rows that were in the left source table. For those rows, the columns that came from the right source table all have the null value. Similarly, the columns that came from the right source table contain all the rows that were in the right source table. For those rows, the columns that came from the left source table all have the null value. Thus, the table resulting from a union join contains all the columns of both source tables — and the number of rows it contains is the sum of the number of rows in the two source tables.

The result of a union join by itself is not immediately useful in most cases; it produces a result table with many nulls in it. But you can get useful information from a union join when you use it in conjunction with the COALESCE expression discussed in Chapter 9. Look at an example.

Suppose that you work for a company that designs and builds experimental rockets. You have several projects in the works. You also have several design engineers who have skills in multiple areas. As a manager, you want to know which employees, having which skills, have worked on which projects. Currently, this data is scattered among the EMPLOYEE table, the PROJECTS table, and the SKILLS table.

The EMPLOYEE table carries data about employees, and EMPLOYEE.EmpID is its primary key. The PROJECTS table has a row for each project that an employee has worked on. PROJECTS.EmpID is a foreign key that references the EMPLOYEE table. The SKILLS table shows the expertise of each employee. SKILLS.EmpID is a foreign key that references the EMPLOYEE table.

The EMPLOYEE table has one row for each employee; the PROJECTS table and the SKILLS table have zero or more rows.

Tables 11-4, 11-5, and 11-6 show example data in the three tables.

TABLE 11-4 EMPLOYEE Table

EmpID

Name

1

Ferguson

2

Frost

3

Toyon

TABLE 11-5 PROJECTS Table

ProjectName

EmpID

X-63 Structure

1

X-64 Structure

1

X-63 Guidance

2

X-64 Guidance

2

X-63 Telemetry

3

X-64 Telemetry

3

TABLE 11-6 SKILLS Table

Skill

EmpID

Mechanical Design

1

Aerodynamic Loading

1

Analog Design

2

Gyroscope Design

2

Digital Design

3

R/F Design

3

From the tables, you can see that Ferguson has worked on X-63 and X-64 structure design and has expertise in mechanical design and aerodynamic loading.

Now suppose that, as a manager, you want to see all the information about all the employees. You decide to apply an equi-join to the EMPLOYEE, PROJECTS, and SKILLS tables:

SELECT *

FROM EMPLOYEE E, PROJECTS P, SKILLS S

WHERE E.EmpID = P.EmpID

AND E.EmpID = S.EmpID ;

You can express this same operation as an inner join by using the following syntax:

SELECT *

FROM EMPLOYEE E INNER JOIN PROJECTS P

ON (E.EmpID = P.EmpID)

INNER JOIN SKILLS S

ON (E.EmpID = S.EmpID) ;

Both formulations give the same result, as shown in Table 11-7.

TABLE 11-7 Result of Inner Join

E.EmpID

Name

P.EmpID

ProjectName

S.EmpID

Skill

1

Ferguson

1

X-63 Structure

1

Mechanical Design

1

Ferguson

1

X-63 Structure

1

Aerodynamic Loading

1

Ferguson

1

X-64 Structure

1

Mechanical Design

1

Ferguson

1

X-64 Structure

1

Aerodynamic Loading

2

Frost

2

X-63 Guidance

2

Analog Design

2

Frost

2

X-63 Guidance

2

Gyroscope Design

2

Frost

2

X-64 Guidance

2

Analog Design

2

Frost

2

X-64 Guidance

2

Gyroscope Design

3

Toyon

3

X-63 Telemetry

3

Digital Design

3

Toyon

3

X-63 Telemetry

3

R/F Design

3

Toyon

3

X-64 Telemetry

3

Digital Design

3

Toyon

3

X-64 Telemetry

3

R/F Design

This data arrangement is not particularly enlightening. The employee ID numbers appear three times, and the projects and skills are duplicated for each employee. Bottom line: The inner joins are not well suited to answering this type of question. You can put the union join to work here, along with some strategically chosen SELECT statements, to produce a more suitable result. You begin with the basic union join:

SELECT *

FROM EMPLOYEE E UNION JOIN PROJECTS P

UNION JOIN SKILLS S ;

Remember Notice that the union join has no ON clause. It doesn't filter the data, so an ON clause isn’t needed. This statement produces the result shown in Table 11-8.

TABLE 11-8 Result of Union Join

E.EmpID

Name

P.EmpID

ProjectName

S.EmpID

Skill

1

Ferguson

NULL

NULL

NULL

NULL

NULL

NULL

1

X-63 Structure

NULL

NULL

NULL

NULL

1

X-64 Structure

NULL

NULL

NULL

NULL

NULL

NULL

1

Mechanical Design

NULL

NULL

NULL

NULL

1

Aerodynamic Loading

2

Frost

NULL

NULL

NULL

NULL

NULL

NULL

2

X-63 Guidance

NULL

NULL

NULL

NULL

2

X-64 Guidance

NULL

NULL

NULL

NULL

NULL

NULL

2

Analog Design

NULL

NULL

NULL

NULL

2

Gyroscope Design

3

Toyon

NULL

NULL

NULL

NULL

NULL

NULL

3

X-63 Telemetry

NULL

NULL

NULL

NULL

3

X-64 Telemetry

NULL

NULL

NULL

NULL

NULL

NULL

3

Digital Design

NULL

NULL

NULL

NULL

3

R/F Design

Each table has been extended to the right or left with nulls, and those null-extended rows have been union joined. The order of the rows is arbitrary and depends on the implementation. Now you can massage the data to put it in a more useful form.

Notice that the table has three ID columns, two of which are null in any row. You can improve the display by coalescing the ID columns. As I note in Chapter 9, the COALESCE expression takes on the value of the first non-null value in a list of values. In the present case, it takes on the value of the only non-null value in a column list:

SELECT COALESCE (E.EmpID, P.EmpID, S.EmpID) AS ID,

E.Name, P.ProjectName, S.Skill

FROM EMPLOYEE E UNION JOIN PROJECTS P

UNION JOIN SKILLS S

ORDER BY ID ;

The FROM clause is the same as in the previous example, but now the three EMP_ID columns are coalesced into a single column named ID. You're also ordering the result by ID. Table 11-9 shows the result.

TABLE 11-9 Result of Union Join with COALESCE Expression

ID

Name

ProjectName

Skill

1

Ferguson

X-63 Structure

NULL

1

Ferguson

X-64 Structure

NULL

1

Ferguson

NULL

Mechanical Design

1

Ferguson

NULL

Aerodynamic Loading

2

Frost

X-63 Guidance

NULL

2

Frost

X-64 Guidance

NULL

2

Frost

NULL

Analog Design

2

Frost

NULL

Gyroscope Design

3

Toyon

X-63 Telemetry

NULL

3

Toyon

X-64 Telemetry

NULL

3

Toyon

NULL

Digital Design

3

Toyon

NULL

R/F Design

Each row in this result has data about a project or a skill, but not both. When you read the result, you first must determine what type of information is in each row (project or skill). If the ProjectName column has a non-null value, the row names a project on which the employee has worked. If the Skill column is not null, the row names one of the employee's skills.

Tip You can make the result a little clearer by adding another COALESCE to the SELECT statement, as follows:

SELECT COALESCE (E.EmpID, P.EmpID, S.EmpID) AS ID,

E.Name, COALESCE (P.Type, S.Type) AS Type,

P.ProjectName, S.Skill

FROM EMPLOYEE E

UNION JOIN (SELECT "Project" AS Type, P.*

FROM PROJECTS) P

UNION JOIN (SELECT "Skill" AS Type, S.*

FROM SKILLS) S

ORDER BY ID, Type ;

In this union join, the PROJECTS table in the previous example is replaced with a nested SELECT that appends a column named P.Type with a constant value "Project" to the columns coming from the PROJECTS table. Similarly, the SKILLS table is replaced with a nested SELECT that appends a column named S.Type with a constant value "Skill" to the columns coming from the SKILLS table. In each row, P.Type is either null or "Project", and S.Type is either null or "Skill".

The outer SELECT list specifies a COALESCE of those two Type columns into a single column named Type. You then specify Type in the ORDER BY clause, which sorts the rows that all have the same ID in an order that puts all projects first, followed by all skills. The result is shown in Table 11-10.

TABLE 11-10 Refined Result of Union Join with COALESCE Expressions

ID

Name

Type

ProjectName

Skill

1

Ferguson

Project

X-63 Structure

NULL

1

Ferguson

Project

X-64 Structure

NULL

1

Ferguson

Skill

NULL

Mechanical Design

1

Ferguson

Skill

NULL

Aerodynamic Loading

2

Frost

Project

X-63 Guidance

NULL

2

Frost

Project

X-64 Guidance

NULL

2

Frost

Skill

NULL

Analog Design

2

Frost

Skill

NULL

Gyroscope Design

3

Toyon

Project

X-63 Telemetry

NULL

3

Toyon

Project

X-64 Telemetry

NULL

3

Toyon

Skill

NULL

Digital Design

3

Toyon

Skill

NULL

R/F Design

The result table now presents a very readable account of the project experience and skill sets of all employees in the EMPLOYEE table.

Considering the number of JOIN operations available, relating data from different tables shouldn't be a problem, regardless of the tables’ structure. You can trust that if the raw data exists in your database, SQL has the means to get it out and display it in a meaningful form.

ON versus WHERE

The function of the ON and WHERE clauses in the various types of joins is potentially confusing. These facts may help you keep things straight:

  • The ON clause is part of the inner, left, right, and full joins. The cross join and union join don't have an ON clause because neither of them does any filtering of the data.
  • The ON clause in an inner join is logically equivalent to a WHERE clause; the same condition could be specified either in an ON clause or a WHERE clause.
  • The ON clauses in outer joins (left, right, and full joins) are different from WHERE clauses. The WHERE clause simply filters the rows returned by the FROM clause. Rows rejected by the filter are not included in the result. The ON clause in an outer join first filters the rows of a cross product and then includes the rejected rows, extended with nulls.
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.144.86.134