Data Retrieval

SQL is based on mathematical principles, specifically set theory and relational algebra. The data is stored in the database as unordered sets of data records. SQL is a set-oriented language and many of its language elements are directly related to relational algebraic terms, such as permutation, projection, restriction, and join.

A set of data is represented in a DB2 database as a table or a view and is stored in a DB2 table without regard to order. To retrieve data in a particular order, an ORDER BY phrase must be added to a SELECT statement. Similarly, if the data is to be grouped, then a GROUP BY phrase must be added to the statement.

Many of the examples that are used in this chapter are based on the employee table from the sample database shipped with DB2. The DDL for this table is given here for your reference.

CREATE TABLE "EMPLOYEE"  (
    "EMPNO" CHAR(6) NOT NULL ,
    "FIRSTNME" VARCHAR(12) NOT NULL ,
    "MIDINIT" CHAR(1) NOT NULL ,
    "LASTNAME" VARCHAR(15) NOT NULL ,
    "WORKDEPT" CHAR(3) ,
    "PHONENO" CHAR(4) ,
    "HIREDATE" DATE ,
    "JOB" CHAR(8) ,
    "EDLEVEL" SMALLINT NOT NULL ,
    "SEX" CHAR(1) ,
    "BIRTHDATE" DATE ,
    "SALARY" DECIMAL(9,2) ,
    "BONUS" DECIMAL(9,2) ,
    "COMM" DECIMAL(9,2) )

Basic Data Retrieval

The basic structure of an SQL query has the following form:

SELECT [DISTINCT] expression1, expression2, ... expressionN
FROM relation1, relation2, ..., relationM,
WHERE condition1, condition2, ..., conditionL
						

The SELECT clause corresponds to the projection operation of relational algebra. It allows you to retrieve a subset or full-set of the defined columns from a table. The FROM clause corresponds to the cartesian product operation of relational algebra. It allows you to specify the data sets that you want to retrieve information from, such as tables, views, or even the result set of other SELECT statements. The WHERE clause corresponds to the restriction operation of relational algebra. This restricts which rows of the relation to retrieve. If the WHERE clause is omitted, then the condition will evaluate to TRUE. Consider this simple query.

SELECT firstnme, lastname
FROM employee

The answer is a set of rows known as a result set, each of which is a pair <firstnme, lastname>, as shown here.

FIRSTNME     LASTNAME
------------ ---------------
BRUCE        ADAMSON
DAVID        BROWN
JOHN         GEYER
JASON        GOUNOT
CHRISTINE    HAAS
EILEEN       HENDERSON
JAMES        JEFFERSON
SYBIL        JOHNSON
WILLIAM      JONES
SALLY        KWAN
WING         LEE
VINCENZO     LUCCHESSI
JENNIFER     LUTZ
SALVATORE    MARINO
RAMLAL       MEHTA
HEATHER      NICHOLLS
SEAN         O'CONNELL
JOHN         PARKER
MARIA        PEREZ
ELIZABETH    PIANKA
EVA          PULASKI
DOLORES      QUINTANA
ETHEL        SCHNEIDER
MARILYN      SCOUTTEN
MAUDE        SETRIGHT
DANIEL       SMITH
PHILIP       SMITH
THEODORE     SPENSER
IRVING       STERN
MICHAEL      THOMPSON
JAMES        WALKER
MASATOSHI    YOSHIMURA

  32 record(s) selected.

The WHERE clause can be used to restrict the result set to those rows that meet a required condition. For example, the following query can be used to retrieve the names of all managers:

SELECT lastname, firstnme
FROM employee
WHERE job = 'MANAGER'

Note that not only have we restricted the result set to only those rows that qualify the condition, we have also changed the order of the projection so that the lastname now appears first for each row.

LASTNAME        FIRSTNME
--------------- ------------
THOMPSON        MICHAEL
KWAN            SALLY
GEYER           JOHN
STERN           IRVING
PULASKI         EVA
HENDERSON       EILEEN
SPENSER         THEODORE

  7 record(s) selected.

The SELECT clause also allows for the use of the optional keyword DISTINCT. Relational algebra never has duplicate tuples in a relation. More simply put, a result should not contain duplicate rows. The practice of eliminating all duplicate rows is relatively time consuming and therefore SQL allows duplicates in a result set. To eliminate duplicates, use the DISTINCT keyword in the SELECT clause. For example, the employee table contains all employees and their job titles. If you select the job column from the table, you would get 32 rows of data and many duplicate rows because more than one employee is a manager. The following SQL query produces distinct rows for its result set:

SELECT DISTINCT job
FROM employee
JOB
--------
ANALYST
CLERK
DESIGNER
FIELDREP
MANAGER
OPERATOR
PRES
SALESREP

  8 record(s) selected.

NOTE

SQL allows the use of the keyword ALL to specify explicitly that duplicates are not removed.

SELECT ALL job FROM employee


NOTE

To select all of the columns from a relation set, you can use the * character in place of a list of expressions. The following will list all columns from the table employee:

SELECT * FROM employee


Expressions in the SELECT Statement

The previous section showed that the SELECT clause consists of a set of expressions. For the examples so far, the expressions have simply been a column from a table. However, the expression can be any arithmetic, string, or date expression involving or not involving columns from the relation set. It also can contain aggregate functions such as MAX or COUNT (discussed later). DB2 also supports other powerful expressions, including the CASE statement and the evaluation of nested SELECT statements. Each of these will be discussed later in this chapter as well.

Consider the following SQL:

SELECT (RTRIM(lastname) || ', ' || RTRIM(firstnme) AS name
FROM employee
WHERE job = 'MANAGER'

Here the SELECT clause is using a system defined function (SDF) and the CONCAT operator (||). It also uses the AS clause to provide a column label for the expression. The default column label for an expression is a numerical value indicating the ordinal position in which the expression appeared in the SELECT clause. In this case, the column label is '1'. The result set for the previous query is shown here:

NAME
-----------------------------
THOMPSON, MICHAEL
KWAN, SALLY
GEYER, JOHN
STERN, IRVING
PULASKI, EVA
HENDERSON, EILEEN
SPENSER, THEODORE

  7 record(s) selected.

Order-of-operations apply to all expressions.

SELECT lastname, salary + bonus * 1.1 AS compensation
FROM employee
WHERE job ='CLERK'
LASTNAME        COMPENSATION
--------------- --------------
O'CONNELL            29910.000
JEFFERSON            22620.000
MARINO               29420.000
SMITH                19620.000
JOHNSON              17580.000
PEREZ                27930.000

  6 record(s) selected.

In addition to supporting expressions in the SELECT clause, expressions also can be used in the WHERE clause provided that the predicate they are used in can be evaluated to TRUE, FALSE, or NULL. Most times, a predicate will evaluate to TRUE or FALSE; however, if a value in the predicate is NULL, then the evaluation will be NULL, except when the IS NULL operator is being used. We will show examples of this difference.

You have already seen the use of the basic equality operator (=). The other basic operators are greater than (>), less than (<), greater than or equal to (>=), less than or equal to (<=), and not equal to (<>). Basic predicates compare two values.

You can combine multiple conditions (predicates) in a single SQL statement using the Boolean operators, OR and AND. These operators allow you to combine multiple conditions in a single SQL statement. An order of evaluation is defined when both AND and OR appear in the WHERE clause. This definition states that AND is evaluated before OR. Parentheses can be used to reorder these operators. The DB2 optimizer will decide the order in which the overall conditions are applied to maximize query performance based on this rule.

For example, to query the employee table to find all of the operators and all of the clerks who make over $20,000, you could use the following query:

SELECT firstnme, lastname, job, salary
FROM employee
WHERE salary > 20000 AND (job = 'OPERATOR' OR job = 'CLERK')
FIRSTNME     LASTNAME        JOB      SALARY
------------ --------------- -------- -----------
SEAN         O'CONNELL       CLERK       29250.00
JAMES        JEFFERSON       CLERK       22180.00
SALVATORE    MARINO          CLERK       28760.00
MARIA        PEREZ           CLERK       27380.00
ETHEL        SCHNEIDER       OPERATOR    26250.00

  5 record(s) selected.

Note the difference in the result set if the parentheses are omitted.

SELECT firstnme, lastname, job, salary
FROM employee
WHERE salary > 20000 AND job = 'OPERATOR' OR job = 'CLERK'
FIRSTNME     LASTNAME        JOB      SALARY
------------ --------------- -------- -----------
SEAN         O'CONNELL       CLERK       29250.00
JAMES        JEFFERSON       CLERK       22180.00
SALVATORE    MARINO          CLERK       28760.00
MARIA        PEREZ           CLERK       27380.00
ETHEL        SCHNEIDER       OPERATOR    26250.00
JOHN         PARKER          OPERATOR    15340.00
PHILIP       SMITH           OPERATOR    17750.00
MAUDE        SETRIGHT        OPERATOR    15900.00

  8 record(s) selected.

This query actually shows us all of the clerks who make over $20,000 and all of the operators regardless of their salary. The order of evaluation makes this query equivalent to:

SELECT firstnme, lastname, job, salary
FROM employee
WHERE (salary > 20000 AND job = 'OPERATOR') OR job = 'CLERK'

Sorting the Output

For all of the examples thus far, the order of the result set has not been specified. Data is retrieved in an undetermined order if there is no ORDER BY clause in the SQL statement. To produce a list of all employees in alphabetical order according to their last name, the following SQL can be used:

SELECT firstnme, lastname, job, salary
FROM employee
ORDER BY lastname

The ORDER BY clause can contain a list of columns that will be used to sort the rows of the result set. The first column in the ORDER BY list is used as the primary sorting key followed by the next item in the list. Each item can be qualified with the DESC or ASC keyword to indicate whether the column should be sorted in descending order or ascending order. The default ordering sequence is ascending. For example, to sort the data first in descending order by job name and then in ascending order by last name but only for those with a salary > $20,000, the following query would work:

SELECT firstnme, lastname, job, salary
FROM employee
WHERE salary > 20000
ORDER BY job DESC, lastname

As an alternative to specifying the column name in the SORT BY clause, you can specify either a label name or a number representing the ordinal position of the column in the result set. Positioning starts at 1 and increases by 1 for every column. For example, the follwing query will produce the same result set as the previous one with the exception that the second column will have a different label name:

SELECT firstnme, lastname AS surname, job, salary
FROM employee
WHERE salary > 20000
ORDER BY 3 DESC, surname

Searching for String Patterns

SQL's powerful LIKE predicate allows you to search for patterns in character strings columns. Suppose you want to generate a list of the employees whose last name starts with the letter S.

SELECT firstnme, lastname
FROM employee
WHERE lastname LIKE 'S%'

In this query, a wildcard character is used with the LIKE predicate. In SQL, the percent character (%) is a substitute for zero or more characters. The search string (S%) can be substituted with names like Stern or Spenser, and so on (because the percent character can substitute zero or more characters, the search string also can be a single letter S).

The percent character can be used any place in the search string and as many times as you need it. The percent sign is not case sensitive, so it can take the place of uppercase or lowercase letters. However, the constant characters included in your search string are case sensitive.

Another wildcard character used with the LIKE predicate is the underline character (_). This character substitutes one and only one character. The underline character can take the place of any character. However, the underline character cannot be substituted for an empty character.

The previous SQL can be modified to include all employees whose name has the letter “C” as its second letter.

SELECT firstnme, lastname
FROM employee
WHERE lastname LIKE '_C%'

This example uses two wildcard characters that work with the LIKE predicate. The search string, in this example, can include names, such as SCOUTTEN or SCHNEIDER. (The first character can be any character, the uppercase letter “C” is the second character in the string, and the string ends with any number of characters.)

NOTE

DB2 stores the string 'CLERK' in the database a little differently depending on whether the data type is a CHAR or a VARCHAR. All CHAR data types are right padded with spaces to the width of the column, whereas VARCHAR data types are not. The equality (=) predicate ignores the right padded spaces in the database when comparing values. The LIKE predicate does not. This means that the following two queries will provide different result sets assuming that the string 'CLERK' is stored in a CHAR(8) data type:

SELECT firstnme, lastname
FROM employee
WHERE job = 'CLERK'

SELECT firstnme, lastname
FROM employee
WHERE job LIKE 'CLERK'

The second query will return nothing in its result set because the LIKE predicate considers all characters including the right padded spaces. To make these two queries equivalent, the comparison string must be changed to 'CLERK%'.


Searching for Data in Ranges

SQL also offers a RANGE operator used to restrict rows that are in a particular range of values. Consider the requirement to list those employees whose salaries are between $20,000 and $30,000.

SELECT lastname, salary
FROM employee
WHERE salary between 20000 and 30000

The BETWEEN predicate includes the values that you specify for searching your data. An important fact about the BETWEEN predicate is that it can work with character ranges as well.

In addition to the salary requirement, this example modifies the SQL to include only those employees whose last name begins with a letter between B and H:

SELECT lastname, salary
FROM employee
WHERE salary BETWEEN 20000 AND 30000 AND
      Lastname BETWEEN 'B' AND 'HZ'

In this example, the second BETWEEN predicate contains character values. We need to specify the HZ value to include all the possible names that start with the letter H. This was done assuming that the letter Z is the last possible value in the alphabet.

Searching for Null Values

NULL values represent an unknown value for a particular occurrence of an entity. You can use a NULL value when you don't know a particular value of a column.

For example, to create a list of all those employees whose phone numbers are not yet entered, the condition is represented with a NULL value.

SELECT firstnme, lastname
FROM employee
WHERE phoneno IS NULL

The IS predicate is used to search for the NULL value in this example. Remember that the NULL value means “unknown.” Because it has no particular value, it can't be compared with other values. You can't use conditional operands, such as equal (=), with NULL values.

Searching for Negative Conditions

The BETWEEN, IS, and LIKE predicates always look for the values that meet a particular condition. These predicates also can be used to look for values that don't meet a particular criterion.

The NOT predicate can be used to look for the opposite condition, combined with the LIKE, BETWEEN, and IS predicate, to accomplish negative searches as shown in the following example:

SELECT lastname, salary
FROM employee
WHERE salary NOT BETWEEN 20000 AND 30000 AND
      Lastname BETWEEN 'B' AND 'HZ'

This example has a BETWEEN predicate combined with the NOT predicate to get a list of those employees whose salaries are not in the range of $20,000, and $30,000.

The next example will list all of the employees who have a number entered in the table next to their name:

SELECT lastname, salary
FROM employee
WHERE phoneno IS NOT NULL

Searching for a Set of Values

In SQL, it is possible to establish a restriction condition based on a set of values. Suppose you need a list of the employees who are analysts, designers, or operators. This can be queried with the following statement:

SELECT firstnme, lastname
FROM employee
WHERE job IN ('ANALYST', 'DESIGNER', 'OPERATOR')

The IN clause is used to denote a set of values. This example uses a constant set of values. In this particular case, the SQL statement also could be written using the OR operator to restrict the test numbers.

You also can use the NOT predicate with the IN clause. In this case, the condition will be true when a value is not present in the set of values provided to the IN clause.

You can use as many values as you want in the IN clause. However, there will be cases when the list of values is very long, and it would be better to retrieve them using another SQL statement.

The IN predicate also can be used to define a set based on conditions. In this case, the IN predicate also accepts an SQL statement that defines the set of values. When you define the set of values using an SQL statement, the SQL statement which defines that set is called a subquery.

Subqueries

A subquery is an SQL statement that is used inside another SQL statement. Subqueries can be used with the IN clause to specify the search arguments for an SQL statement.

For example, if you have a second table (emp_resume) that contains a copy of the employees résumé, this table could be defined with the following SQL:

CREATE TABLE emp_resume
    (empno CHAR(6),
     resume_format varchar(10),
     resume CLOB(5120))

Now suppose you want to find a list of all employees in the employee table who have a résumé. You could use the following query:

SELECT firstnme, lastname
FROM employee
WHERE empno IN (SELECT empno FROM emp_resume)

In this example, the subquery appears as part of the IN clause. The subquery is retrieving all the employee numbers that are found in the emp_resume table.

As you can see in the previous example, the subquery is a standard SQL statement. The only difference here is that the subquery is used as a restriction condition. You will never see its output. The subquery is only used to create a list of values that will be used later by the outer SELECT statement.

The subquery used in this example is known as an uncorrelated subquery. In an uncorrelated subquery, the values retrieved by the subquery are not directly related to the rows processed by the outer SELECT statement. A correlated subquery is a query in which the subquery references values of the outer SELECT.

The next example shows a correlated subquery that achieves the same results as the prior example:

SELECT firstnme, lastname
FROM employee
WHERE EXISTS (SELECT empno
              FROM emp_resume
              WHERE employee.empno = empno)

The WHERE clause in the subquery in this example is making reference to a table that is listed in the outer FROM clause. Note the use of the EXISTS predicate. This predicate is true if a non-null result set is produced from the subquery. The NOT operator can be combined with this operator.

Quantified Predicates

A quantified predicate is used to compare a value or values with a collection of values. The quantified predicates, SOME, ANY, ALL, can be combined with the basic predicates, =, <>, <, >, <=, or >=. The right side of the expression must identify a number of columns that is the same as the left side of the expression, but can return any number of rows. The SOME and ANY predicate are synonymous.

When ALL is specified:

  • The result of the predicate is true if the fullselect returns no values or if the specified relationship is true for every value returned by the fullselect.

  • The result is false if the specified relationship is false for at least one value returned by the fullselect.

  • The result is unknown if the specified relationship is not false for any values returned by the fullselect and at least one comparison is unknown because of the null value.

When SOME or ANY is specified:

  • The result of the predicate is true if the specified relationship is true for each value of at least one row returned by the fullselect.

  • The result is false if the fullselect returns no rows or if the specified relationship is false for at least one value of every row returned by the fullselect.

  • The result is unknown if the specified relationship is not true for any of the rows and at least one comparison is unknown because of a null value.

Let's examine the use of a quantified predicate in a SELECT statement. First, look at the original queries and their result sets.

SELECT empno, firstnme, lastname
FROM employee
WHERE job IN ('DESIGNER', 'ANALYST')
Result of first select

EMPNO  FIRSTNME     LASTNAME
------ ------------ ---------------
000130 DOLORES      QUINTANA
000140 HEATHER      NICHOLLS
000150 BRUCE        ADAMSON
000160 ELIZABETH    PIANKA
000170 MASATOSHI    YOSHIMURA
000180 MARILYN      SCOUTTEN
000190 JAMES        WALKER
000200 DAVID        BROWN
000210 WILLIAM      JONES
000220 JENNIFER     LUTZ

  10 record(s) selected.

SELECT empno, resume_format, LENGTH(resume)AS resume_length
FROM emp_resume
WHERE resume_format = 'ascii'
Result of second select

EMPNO  RESUME_FORMAT RESUME_LENGTH
------ ------------- -------------
000130 ascii                  1257
000140 ascii                  1261
000150 ascii                  1308
000190 ascii                  1237

  4 record(s) selected.

Now, take a look at some examples and their result sets.

SELECT empno, firstnme, lastname
FROM employee
WHERE job IN ('DESIGNER', 'ANALYST') AND
      empno = ANY (SELECT empno
                   FROM emp_resume
                   WHERE resume_format = 'ascii')
EMPNO  FIRSTNME     LASTNAME
------ ------------ ---------------
000130 DOLORES      QUINTANA
000140 HEATHER      NICHOLLS
000150 BRUCE        ADAMSON
000190 JAMES        WALKER

  4 record(s) selected.

The subselect returns (000130, 000140, 000150, 000190) and empno from the employee table equals at least one of these.

SELECT empno, firstnme, lastname
FROM employee
WHERE job IN ('DESIGNER', 'ANALYST') AND
      empno > ANY (SELECT empno
                   FROM emp_resume
                   WHERE resume_format = 'ascii')
EMPNO  FIRSTNME     LASTNAME
------ ------------ ---------------
000140 HEATHER      NICHOLLS
000150 BRUCE        ADAMSON
000160 ELIZABETH    PIANKA
000170 MASATOSHI    YOSHIMURA
000180 MARILYN      SCOUTTEN
000190 JAMES        WALKER
000200 DAVID        BROWN
000210 WILLIAM      JONES
000220 JENNIFER     LUTZ

  9 record(s) selected.

The subselect returns (000130, 000140, 000150, 000190). All empno from the employee table expect for (000130) is greater than at least one of these values and so nine rows are returned.

SELECT empno, firstnme, lastname
FROM employee
WHERE job IN ('DESIGNER', 'ANALYST') AND
      empno > ALL (SELECT empno
                   FROM emp_resume
                   WHERE resume_format = 'ascii')
EMPNO  FIRSTNME     LASTNAME
------ ------------ ---------------
000200 DAVID        BROWN
000210 WILLIAM      JONES
000220 JENNIFER     LUTZ

  3 record(s) selected.

The subselect returns (000130, 000140, 000150, 000190). This time each empno from the employee table must be greater than all of the these values, therefore only three rows are returned.

SELECT empno, firstnme, lastname
FROM employee
WHERE job IN ('DESIGNER', 'ANALYST') AND
      (empno, 1308) = SOME (SELECT empno, LENGTH(resume)
                   FROM emp_resume
                   WHERE resume_format = 'ascii')
EMPNO  FIRSTNME     LASTNAME
------ ------------ ---------------
000150 BRUCE        ADAMSON

  1 record(s) selected.

In this query, the tuple (empno, 1308) must be equal to at least one row of the subselect. Hence, one row is returned in the result set.

Case Expressions

You can add some logic to your SQL statements and output using CASE expressions. Consider the generation of a list of employees. In the report, you want to print the employee's bonus, but instead of printing the dollar figure you want to categorize the bonus as above average, below average, or average. Let's say that if the bonus is between $450 and $600, then it is average. The following SQL statement using a CASE expression can deliver the report:

SELECT firstnme, lastname,
       CASE
        WHEN bonus < 450 THEN 'Below Average'
        WHEN bonus <= 600 THEN 'Average'
        ELSE 'Above Average'
       END
FROM employee

This SQL statement provides string messages based on the conditions of the CASE expression. In this example, the bonus column features a numeric value, but we are using it to produce a character string. The order of the conditions for the CASE expression is very important. DB2 will process the first condition first, then the second, and so on. If you do not pay attention to the order in which the conditions are processed, you might be retrieving the same result for every row in your table. For example, if you coded the <= 600 option before the < 450, all the data that is lower than 600, even 300, will display the message Average.

Aggregate Functions

Aggregate functions provide a single result for a group of qualifying rows for a specified view or table. For example, you might want to know the maximum value of a column for all rows that satisfy a certain condition. Common aggregate functions are MAX, AVG, MIN, COUNT, and SUM. DB2 has other aggregate functions available, such as STDDEV and VARIANCE, which can be used to gather statistical information. Refer to the DB2 UDB SQL Reference for a complete listing.

If you want to find the total salary payroll for all employees, you can use the SUM function.

SELECT SUM(salary)
FROM employee

You might even want to break the sum up by job types; that is, if you wanted the total salaries for each job group, you could use the GROUP BY clause.

SELECT job, SUM(salary)
FROM employee
GROUP BY job

This SQL statement obtains the average number of seats per country and the GROUP BY clause tells DB2 to group together the rows that have the same values in the columns indicated in the group by list. This example groups jobs into subsets. As the subsets are created, DB2 calculates the average of each of those groups or subsets, in this case, by each job.

When you combine vector functions and other elements, such as column names, scalar functions, or calculated columns, you must use the GROUP BY clause. In this case, you must include every element that is not a column function in the group by list. The only elements that can be omitted in the GROUP BY list are constant values.

The next SQL statement obtains a list that includes the average salary, the maximum bonus, and the total count of all employees for each job type:

SELECT job, DECIMAL(AVG(salary),8,2), MAX(bonus), COUNT(*)
FROM employee
GROUP BY job
ORDER by 4
JOB      2          3           4
-------- ---------- ----------- -----------
PRES       52750.00     1000.00           1
SALESREP   46500.00      900.00           1
ANALYST    26110.00      600.00           2
FIELDREP   23053.33      500.00           3
OPERATOR   18810.00      500.00           4
CLERK      24000.00      600.00           6
MANAGER    34856.42      800.00           7
DESIGNER   23731.25      600.00           8

  8 record(s) selected.

Note that the result of an aggregate function also can be used as a parameter for another function, in this case, the casting function DECIMAL. The COUNT aggregate can take either a column name for an argument or the asterisk (*). The asterisk can be used because this function does not require any specific data; rather it only requires a rowset.

OLAP applications use different levels of grouping within the same data. DB2 supports OLAP requirements implementing super group or aggregation features. OLAP-oriented grouping functions are discussed in “Advanced SQL” in the DB2 UDB Database Administration Guide.

Up to now, we have discussed how to restrict output based on row conditions. With SQL, it is also possible to restrict that output using vector functions and the GROUP BY clause. Suppose you want a list of all the job types that have more than five employees.

SELECT job, DECIMAL(AVG(salary),8,2), MAX(bonus), COUNT(*)
FROM employee
GROUP BY job
HAVING COUNT(*) > 5
ORDER by 4

This example introduces the HAVING clause. The HAVING clause is equivalent to the WHERE clause for groups and vector functions. The HAVING clause restricts the result set to only the groups that meet the condition specified in it. In this example, only the job types that have more than five employees will be displayed.

Nested Table Expressions

A nested table expression is a special kind of subquery used in the FROM clause of an SQL statement. Nested table expressions create local temporary tables that are only known in the SQL statement that defines them. These subqueries can be considered temporary views. You can use nested table expressions to select from a grouped table or to obtain the same results that you expect from a view.

Consider the problem of obtaining the maximum average salary of each job type. To gather this result, you must first obtain the averages for each job and then select the maximum value from that list. The following nested table expression accomplishes this request:

SELECT MAX(avg_salary) AS max_average
FROM (
    SELECT job, AVG(salary) AS avg_salary
    FROM employee
    GROUP BY job
) AS averages

In this example, the nested subselect creates a temporary table that will be used by the outer SELECT to obtain the maximum average salary. This temporary table is called averages.

The job column is included in the subquery to be able to gather the average for each one of the job types. After the subquery is completed, the outer SELECT is able to obtain the maximum value of the averages calculated in the nested table expression.

An advantage of using nested table expressions over views is that nested table expressions exist only during the execution of the query, so you don't have to worry about their maintenance. They reduce contention over the system catalog tables, and because they are created at execution time, they can be defined using host variables.

Scalar Fullselect

A scalar fullselect is a SELECT statement that returns only one value. This type of SELECT can be used in different parts of an SQL statement, such as the select list or the WHERE clause. Scalar fullselects can be used to combine grouped information, such as averages or sums, with detailed information in a single query.

Occasionally, you might need to include row data in the report that includes information based on the entire table. For instance, you might want a report that shows the employee's name, salary, average salary, and maximum salary. This information cannot be gathered without the help of temporary tables or views. The following shows how the scalar fullselect can be used to assist in the retrieval of this data:

SELECT firstnme, lastname, salary,
    (SELECT AVG(SALARY)
      FROM employee) AS avg_salary,
    (SELECT MAX(SALARY)
      FROM employee)AS max_salary
From employee

In the example shown, two scalar fullselects are used to retrieve the information about the aggregated data. The first scalar fullselect calculates the average salary and the second calculates the maximum salary for the employees. The SQL statements that produce the average and the maximum values are scalar fullselects. They are complete SQL statements that return only one value.

Now, you can extend this SQL to calculate the average and maximum scores for each one of the job types. To accomplish this request, you need to use a correlated subselect because you must ensure that the SELECT statement returns only one value at a time. The correlated subselect lets you generate the average and maximum scores for each job type.

SELECT firstnme, lastname, salary,
    (SELECT AVG(SALARY)
      FROM employee
      WHERE job = outer.job) AS avg_salary,
    (SELECT MAX(SALARY)
      FROM employee
      WHERE job = outer.job)AS max_salary
From employee AS outer

Examine the WHERE clauses in the previous examples. They both make reference to the table of the outer SELECT statement. The WHERE clauses are used to obtain a separate average and maximum value for each one of the job types in the employee table.

Common Table Expressions

A common table expression is a local temporary table that can be referenced many times in an SQL statement. However, this temporary table only exists for the duration of the SQL statement that defines it. A common table expression provides many of the advantages of nested table expressions discussed earlier. Every time you reference a common table expression, the result will be the same. This means that the SQL statement that generates it will not be reprocessed each time the common table expression is referenced.

Consider again the simple example used to find the maximum average salary of all of the departments. This can be accomplished using a common table expression to find the average salaries. After defining the common table expression, you can use it in an SQL statement as any other table and select the maximum salary from it. For example:

WITH avg AS
  (SELECT job, AVG(salary) AS avg_salary
   FROM employee
   GROUP BY job)
SELECT MAX(avg_salary) AS max_average
FROM avg

You can specify more than one common table expression. Each expression is separated by a comma (,). The last common table expression is not followed by a comma. You can use a common table expression as many times as you want. You can even create a common table expression based on a previously created common table expression. However, you can only use them in the SQL statement that defines them.

When defining more than one common table expression, you can either make each expression based on the last or you can select data from all of them and join their results together.

Selecting Data from Multiple Tables

There are basically two operations that combine columns from multiple tables in a single SQL statement.

  • Cartesian product

  • Join

Cartesian Product

A Cartesian product is a relational operation that merges all the values from one table with all the values from another table. This operation is not used frequently because the result table can be very large.

The number of rows in the result table is always equal to the product of the number of rows in the qualifying rows for each of the tables being accessed. Consider the two queries:

SELECT e.empno, e.firstnme, e.lastname
FROM employee e
WHERE e.job IN ('DESIGNER')
EMPNO  FIRSTNME     LASTNAME
------ ------------ ---------------
000150 BRUCE        ADAMSON
000160 ELIZABETH    PIANKA
000170 MASATOSHI    YOSHIMURA
000180 MARILYN      SCOUTTEN
000190 JAMES        WALKER
000200 DAVID        BROWN
000210 WILLIAM      JONES
000220 JENNIFER     LUTZ

  8 record(s) selected.
SELECT r.empno, r.resume_format
FROM emp_resume r
WHERE r.resume_format = 'ascii'
EMPNO  RESUME_FORMAT
------ -------------
000130 ascii
000140 ascii
000150 ascii
000190 ascii

  4 record(s) selected.

The Cartesian product of these two tables becomes:

SELECT e.empno, e.firstnme, e.lastname,
       r.empno, r.resume_format
FROM employee e, emp_resume r
WHERE e.job IN ('DESIGNER') AND
      r.resume_format = 'ascii'
EMPNO  FIRSTNME     LASTNAME        EMPNO  RESUME_FORMAT
------ ------------ --------------- ------ -------------
000150 BRUCE        ADAMSON         000130 ascii
000160 ELIZABETH    PIANKA          000130 ascii
000170 MASATOSHI    YOSHIMURA       000130 ascii
000180 MARILYN      SCOUTTEN        000130 ascii
000190 JAMES        WALKER          000130 ascii
000200 DAVID        BROWN           000130 ascii
000210 WILLIAM      JONES           000130 ascii
000220 JENNIFER     LUTZ            000130 ascii
000150 BRUCE        ADAMSON         000140 ascii
000160 ELIZABETH    PIANKA          000140 ascii
000170 MASATOSHI    YOSHIMURA       000140 ascii
000180 MARILYN      SCOUTTEN        000140 ascii
000190 JAMES        WALKER          000140 ascii
000200 DAVID        BROWN           000140 ascii
000210 WILLIAM      JONES           000140 ascii
000220 JENNIFER     LUTZ            000140 ascii
000150 BRUCE        ADAMSON         000150 ascii
000160 ELIZABETH    PIANKA          000150 ascii
000170 MASATOSHI    YOSHIMURA       000150 ascii
000180 MARILYN      SCOUTTEN        000150 ascii
000190 JAMES        WALKER          000150 ascii
000200 DAVID        BROWN           000150 ascii
000210 WILLIAM      JONES           000150 ascii
000220 JENNIFER     LUTZ            000150 ascii
000150 BRUCE        ADAMSON         000190 ascii
000160 ELIZABETH    PIANKA          000190 ascii
000170 MASATOSHI    YOSHIMURA       000190 ascii
000180 MARILYN      SCOUTTEN        000190 ascii
000190 JAMES        WALKER          000190 ascii
000200 DAVID        BROWN           000190 ascii
000210 WILLIAM      JONES           000190 ascii
000220 JENNIFER     LUTZ            000190 ascii

  32 record(s) selected.

The result table is a representation of all possible combinations of the input tables. The employee table has 8 rows and the emp_resume table has 4 rows. Therefore, the final Cartesian product consists of 32 rows. Examine the result set carefully. You will see that each row from both tables has been combined with all rows from the other table.

In the previous example, the table, employee, has been aliased to the name 'e' and all columns from this table are prepended with 'e'. Likewise, emp_resume has been aliased to 'r'. By qualifying the column names, DB2 will know from which table a column is being referred. If the column name is unique, then it does not need to be qualified. However, the column, empno, exists in both tables and must be qualified.

The result set presented here is not all that useful for any kind of report. A more useful report would be to know the names of all the employees who have a résumé at hand. To do this, you use the JOIN operator.

Join

To join the results of two tables together into a more meaningful result set, you need to develop a predicate that will limit the number of rows to something more meaningful. Both of the previous tables have a common column called empno. This column is used to relate the data from each row to data from another row in another table. For example, Bruce Adamson whose employee number is '000150', has a résumé filed within the database. The following SQL will generate a report of all the employees in your query who have a résumé.

SELECT e.empno, e.firstnme, e.lastname,
       r.empno, r.resume_format
FROM employee e, emp_resume r
WHERE e.job IN ('DESIGNER') AND
      r.resume_format = 'ascii' AND
      e.empno = r.empno
EMPNO  FIRSTNME     LASTNAME        EMPNO  RESUME_FORMAT
------ ------------ --------------- ------ -------------
000150 BRUCE        ADAMSON         000150 ascii
000190 JAMES        WALKER          000190 ascii

  2 record(s) selected.

As you can see from the query, one additional predicate was added which basically says that if an employee number from the left table (the first table in the FROM clause) is equal to an employee number in the right table (the next table in the FROM clause), then report the results.

This type of join is also called an INNER JOIN. An INNER JOIN reports only those rows that are present in both of the joined tables.

Alternatively, the INNER JOIN can be made using the following syntax where the keyword INNER is optional:

SELECT e.empno, e.firstnme, e.lastname,
       r.empno, r.resume_format
FROM employee e INNER JOIN emp_resume r ON e.empno=r.empno
WHERE e.job IN ('DESIGNER') AND
      r.resume_format = 'ascii'

Inner joins can be thought of as the cross product of the tables (combine each row of the left table with every row of the right table), keeping only the rows where the join-condition is true. The result table might be missing rows from either or both of the joined tables. Outer joins on the other hand include the inner join and preserve these missing rows. There are three types of outer joins.

  • Left outer join— Includes rows from the left table that were missing from the inner join.

  • Right outer join— Includes rows from the right table that were missing from the inner join.

  • Full outer join— Includes rows from both the left and right tables that were missing from the inner join.

Let's look at a left outer join.

WITH emp AS
  (SELECT empno, firstnme, lastname
     FROM employee
     WHERE job IN ('DESIGNER')),
res AS
  (SELECT empno, resume_format
   FROM emp_resume
   WHERE resume_format = 'ascii')
SELECT *
FROM emp LEFT OUTER JOIN res ON emp.empno = res.empno
EMPNO  FIRSTNME     LASTNAME        EMPNO  RESUME_FORMAT
------ ------------ --------------- ------ -------------
000150 BRUCE        ADAMSON         000150 ascii
000160 ELIZABETH    PIANKA          -      -
000170 MASATOSHI    YOSHIMURA       -      -
000180 MARILYN      SCOUTTEN        -      -
000190 JAMES        WALKER          000190 ascii
000200 DAVID        BROWN           -      -
000210 WILLIAM      JONES           -      -
000220 JENNIFER     LUTZ            -      -

  8 record(s) selected.

The result set contains all of the rows from the left table (employee) and NULL data if the row does not exist in the right table (emp_resume). A right outer join would reverse this effect.

WITH emp AS
  (SELECT empno, firstnme, lastname
     FROM employee
     WHERE job IN ('DESIGNER')),
res AS
  (SELECT empno, resume_format
   FROM emp_resume
   WHERE resume_format = 'ascii')
SELECT *
FROM emp RIGHT OUTER JOIN res ON emp.empno = res.empno
EMPNO  FIRSTNME     LASTNAME        EMPNO  RESUME_FORMAT
------ ------------ --------------- ------ -------------
000150 BRUCE        ADAMSON         000150 ascii
000190 JAMES        WALKER          000190 ascii
-      -            -               000130 ascii
-      -            -               000140 ascii

  4 record(s) selected.

A full outer join combines these two result sets so that NULL data can exist from both the left and right table if a row exists in one or the other.

WITH emp AS
  (SELECT empno, firstnme, lastname
     FROM employee
     WHERE job IN ('DESIGNER')),
res AS
  (SELECT empno, resume_format
   FROM emp_resume
   WHERE resume_format = 'ascii')
SELECT *
FROM emp FULL OUTER JOIN res ON emp.empno = res.empno
EMPNO  FIRSTNME     LASTNAME        EMPNO  RESUME_FORMAT
------ ------------ --------------- ------ -------------
000150 BRUCE        ADAMSON         000150 ascii
000190 JAMES        WALKER          000190 ascii
-      -            -               000130 ascii
-      -            -               000140 ascii
000160 ELIZABETH    PIANKA          -      -
000170 MASATOSHI    YOSHIMURA       -      -
000180 MARILYN      SCOUTTEN        -      -
000200 DAVID        BROWN           -      -
000210 WILLIAM      JONES           -      -
000220 JENNIFER     LUTZ            -      -

  10 record(s) selected.

Set Operators

SQL offers a group of operators that are used to implement the relational operations of union (UNION operator), intersection (INTERSECT operator), and difference (EXCEPT operator).

Union (UNION Operator)

The union operation lets you combine the results of two or more different SQL statements into one answer set. You can combine many different tables or SQL statements using the UNION operator; the only restriction is that every table or SQL statement must have the same type, number, and order of columns. Suppose you wanted to combine the minimum and maximum salary for each of the job types and add a string constant that indicates which values are the maximum and minimum:

SELECT job, 'Minimum:', MIN(salary)
FROM employee
GROUP BY job
UNION
SELECT job,'Maximum:', MAX(salary)
FROM employee
GROUP BY job
ORDER BY job,2

The UNION operator shows you the results of two or more separate queries as a single one. In the example, the first query calculates the minimum salary of the employee table. The second query calculates the maximum salary. Both queries have the same type, order, and number of columns.

In this example, the two SQL statements are very similar. However, you can combine very different queries using the UNION operator. Just remember the restriction about the resulting rows.

The UNION operator removes duplicate rows from the resulting set. However, at times you'll need to list all the rows processed by your SQL statements. SQL provides the ALL operator clause that allows you to keep all the rows involved in a UNION operation. Let's create a list of all the first names and last names in the employee table. In this example, you want all the first names that start with a letter G and all the last names for the employees who are designers.

This example cannot be processed with an OR operator because you are not interested in the first names of those employees who are designers. Because there can be a first name that is the same as a last name, use a UNION ALL operator to show all the rows.

SELECT firstnme
FROM employee
WHERE firstnme LIKE 'G%'
UNION ALL
SELECT lastname
FROM employee
WHERE job = 'DESIGNER'

NOTE

Always try to code a UNION ALL. Only code a UNION when duplicates are not desired. The UNION ALL offers better performance. However, you can't always substitute a UNION for a UNION ALL.


Intersection (INTERSECT Operator)

The intersection set operation is implemented in DB2 using the INTERSECT operator. Using INTERSECT, you can find the elements that belong to two different answer sets.

For example, you might want a list of all the employee numbers that are present in the emp_resume table and in the employee table. This requirement can be seen as the intersection of the two result sets.

SELECT empno
FROM emp_resume
INTERSECT
SELECT empno
FROM employee

Like the UNION operator, there is an INTERSECT ALL operator. If you use the INTERSECT ALL operator, the result set will return duplicate values, which is often desirable.

The SQL statement shown in this example is equivalent to a join operation between the employee and emp_resume tables using the DISTINCT clause in the select list.

Difference (EXCEPT Operator)

The difference set operation is the complementary set of an intersection between two sets. It is implemented in DB2 using the EXCEPT operator. Using a difference set operation, you can find out which elements of a resulting set are not present in another answer set.

Suppose you want to know which employee numbers are not present in the emp_resume table. This is effectively saying, “Show me all the employee numbers who do not have a résumé on file.” The query using the EXCEPT operator is shown in the following example.

SELECT empno
FROM employee
EXCEPT
SELECT empno
FROM emp_resume

The first part of the example retrieves all the employee numbers. The second part of the query retrieves the employee numbers present in the emp_resume table. Finally, the EXCEPT operator performs the difference operation that filters those employee numbers not present in the emp_resume table. Without the EXCEPT operator, the query would have been more complicated.

As with the other set operators, there is also an EXCEPT ALL operator that does not eliminate duplicates from the result.

NOTE

The columns in both select lists of a set operator, such as UNION, INTERSECT, and EXCEPT, must have the same column structure.


..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.222.21.30