Understanding Set Operations

Overview

A set operation contains
  • two queries (each beginning with a SELECT clause)
  • a set operator
  • one or both of the keywords ALL and CORR (CORRESPONDING).
General form of an SQL query using a set operator:
SELECT column-1<, ... column-n>
FROM table-1 | view-1<, ... table-n | view-n>
<optional query clauses>
set-operator <ALL> <CORR>
SELECT column-1<, ... column-n>
FROM table-1 | view-1<, ... table-n | view-n>
<optional query clauses>;
Here is an explanation of the syntax:
SELECT
specifies the column(s) that will appear in the result.
FROM
specifies the table(s) or view(s) to be queried.
optional query clauses
are used to refine the query further and include the clauses WHERE, GROUP BY, HAVING, and ORDER BY.
  • the set-operator is one of the following: EXCEPT | INTERSECT | UNION | OUTER UNION.
  • the optional keywords ALL and CORR (CORRESPONDING) further modify the set operation.
The query or set operation contains one semicolon, which is placed after the last SELECT statement.

Example

In the following PROC SQL step, the SELECT statement contains one set operation. The set operation uses the set operator UNION to combine the result of a query on the table Sasuser.Stress98 with the result of a query on the table Sasuser.Stress99.
proc sql;
   select *
      from sasuser.stress98
   union
   select *
      from sasuser.stress99;
You learn the details about using each set operator later in this chapter.

Processing a Single Set Operation

PROC SQL evaluates a SELECT statement with one set operation as follows:
  1. Each query is evaluated to produce an intermediate (internal) result table.
  2. Each intermediate result table then becomes an operand linked with a set operator to form an expression (for example, Table1 UNION Table2).
  3. PROC SQL evaluates the entire expression to produce a single output result set.

Using Multiple Set Operators

A single SELECT statement can contain more than one set operation. Each additional set operation includes a set operator and a group of query clauses, as shown in the following example:
proc sql;
   select *
      from table1
   set-operator
   select *
      from table2
   set-operator
   select *
      from table3;
This SELECT statement uses two set operators to link together three queries.
Regardless of the number of set operations in a SELECT statement, the statement contains only one semicolon, which is placed after the last query.

Example

The following PROC SQL step contains two set operators (both are OUTER UNION) that combine three queries:
 proc sql; 
    select *
       from sasuser.mechanicslevel1
    outer union
    select *
       from sasuser.mechanicslevel2
    outer union
    select *
       from sasuser.mechanicslevel3;

Processing Multiple Set Operations

When PROC SQL evaluates a SELECT statement that contains multiple set operations, an additional processing step (step 3 below) is required:
  1. Each query is evaluated to produce an intermediate (internal) result table.
  2. Each intermediate result table then becomes an operand linked with a set operator to form an expression (for example, Table1 UNION Table2).
  3. If the set operation contains more than two queries, then the result from the first two queries (enclosed in parentheses in the following examples) becomes an operand for the next set operator and operand. For example:
    • with two set operators: (Table1 UNION Table2) EXCEPT Table3
    • with three set operators: ((Table1 UNION Table2) EXCEPT Table3) INTERSECT Table4.
  4. PROC SQL evaluates the entire expression to produce a single output result set.
Note: When processing set operators, PROC SQL follows a default order of precedence, unless this order is overridden by parentheses in the expression(s). By default, INTERSECT is evaluated first. OUTER UNION, UNION, and EXCEPT all have the same level of precedence.

Introducing Set Operators

Each of the four set operators EXCEPT, INTERSECT, UNION, and OUTER UNION selects rows and handles columns in a different way, as described below.
Note: In the following chart, Table 1 is the table that is referenced in the first query and Table 2 is the table that is referenced in the second query.
Set Operator
Treatment of Rows
Treatment of Columns
Example
EXCEPT
Selects unique rows from the first table that are not found in the second table.
Venn Diagram, EXCEPT Operator
Overlays columns based on their position in the SELECT clause without regard to the individual column names.
proc sql;
    select *
       from table1
    except
    select *
       from table2;
INTERSECT
Selects unique rows that are common to both tables.
Venn Diagram, INTERSECT Operator
Overlays columns based on their position in the SELECT clause without regard to the individual column names.
proc sql;
    select *
       from table1
    intersect
    select *
       from table2;
UNION
Selects unique rows from both tables.
Venn Diagram, UNION Operator
Overlays columns based on their position in the SELECT clause without regard to the individual column names.
proc sql;
    select *
       from table1
    union
    select *
       from table2;
OUTER UNION
Selects all rows from both tables.
The OUTER UNION operator concatenates the results of the queries.
Venn Diagram, OUTER UNION Operator
Does not overlay columns.
proc sql;
    select *
       from table1
    outer union
    select *
       from table2;
Note: A set operator that selects only unique rows displays one occurrence of a given row in output.

Processing Unique versus Duplicate Rows

When processing a set operation that displays only unique rows (a set operation that contains the set operator EXCEPT, INTERSECT, or UNION), PROC SQL makes two passes through the data, by default:
  1. PROC SQL eliminates duplicate (nonunique) rows in the tables.
  2. PROC SQL selects the rows that meet the criteria and, where requested, overlays columns.
For set operations that display both unique and duplicate rows, only one pass through the data (step 2 above) is required.

Combining and Overlaying Columns

You can use a set operation to combine tables that have different numbers of columns and rows or that have columns in a different order.
Three of the four set operators (EXCEPT, INTERSECT, and UNION) combine columns by overlaying them. (The set operator OUTER UNION does not overlay columns.)
By default, the set operators EXCEPT, INTERSECT, and UNION overlay columns based on the relative position of the columns in the SELECT clause. Column names are ignored. You control how PROC SQL maps columns in one table to columns in another table by specifying the columns in the appropriate order in the SELECT clause. The first column specified in the first query's SELECT clause and the first column specified in the second query's SELECT clause are overlaid, and so on.
When columns are overlaid, PROC SQL uses the column name from the first table (the table referenced in the first query). If there is no column name in the first table, the column name from the second table is used. When the SELECT clause contains an asterisk (*) instead of a list of column names, the set operation combines the tables (and, if applicable, overlays columns) based on the positions of the columns in the tables.
For example, the following set operation uses the set operator EXCEPT, so columns are overlaid. The SELECT clause in each query uses an asterisk (*), so the columns are overlaid based on their positions in the tables. The first column in table One (X) is overlaid on the first column in table Two (X), and so on.
proc sql;
   select *
      from one
   except
   select *
      from two;
Tables One, Two, and Combined, Overlaid Table
In order to be overlaid, columns in the same relative position in the two SELECT clauses must have the same data type. If they do not, PROC SQL generates a warning message in the SAS log and stops executing. For example, in the tables shown above, if the column One.X had a different data type than column Two.X, the SAS log would display the following error message.
Table 4.1 SAS Log
ERROR: Column 1 from the first contributor of EXCEPT 
is not the same type as its counterpart from the second.
Next, we use the keywords ALL and CORR to modify the default action of the set operators.

Modifying Results By Using Keywords

To modify the behavior of set operators, you can use either or both of the keywords ALL and CORR immediately following the set operator:
proc sql;
   select *
      from table1
   set-operator <all> <corr> 
   select *
      from table2;
The use of each keyword is described below.
Keyword
Action
Used When...
ALL
Makes only one pass through the data and does not remove duplicate rows.
You do not care if there are duplicates.
Duplicates are not possible.
ALL cannot be used with OUTER UNION.
CORR (or CORRESPONDING)
Compares and overlays columns by name instead of by position:
  • When used with EXCEPT, INTERSECT, and UNION, removes any columns that do not have the same name in both tables.
  • When used with OUTER UNION, overlays same-named columns and displays columns that have nonmatching names without overlaying.
If an alias is assigned to a column in the SELECT clause, CORR use the alias instead of the permanent column name.
Two tables have some or all columns in common, but the columns are not in the same order.
In the remainder of this chapter, you learn more about the use of each set operator, with and without the keywords ALL and CORR.
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.224.68.28