Understanding Set Operators

Introducing Set Operators

You can combine the results of two or more queries by using set operators. Each of the four set operators—EXCEPT, INTERSECT, UNION, and OUTER UNION—selects rows and handles columns in a different way, as described below.
Note: In the comparison below, Table 1 is the table that is referenced in the first query, and Table 2 is the table that is referenced in the second query.
Set Operator
Treatment of Rows
Treatment of Columns
Example
EXCEPT
Selects unique rows from the first table that are not found in the second table.
Venn Diagram, EXCEPT Operator
Overlays columns based on their position in the SELECT clause without regard to the individual column names.
proc sql;
    select *
       from table1
    except
    select *
       from table2;
INTERSECT
Selects unique rows that are common to both tables.
Venn Diagram, INTERSECT Operator
Overlays columns based on their position in the SELECT clause without regard to the individual column names.
proc sql;
   select *
      from table1
    intersect
   select *
      from table2;
quit;
UNION
Selects unique rows from both tables.
Venn Diagram, UNION Operator
Overlays columns based on their position in the SELECT clause without regard to the individual column names.
proc sql;
   select *
      from table1
    union
   select *
      from table2;
quit;
OUTER UNION
Selects all rows from both tables.
The OUTER UNION operator concatenates the results of the queries.
Venn Diagram, OUTER UNION Operator
Does not overlay columns.
proc sql;
   select *
      from table1
    outer union
   select *
      from table2;
quit;
A set operation contains the following elements:
  • two queries (each beginning with a SELECT clause)
  • a set operator
  • one or both of the keywords ALL and CORR (CORRESPONDING)

Set Operator Syntax

Syntax of an SQL query using a set operator:
SELECT column-1<, ... column-n>
FROM table-1 | view-1<, ... table-n | view-n>
<optional query clauses>
set-operator <ALL> <CORR>
SELECT column-1<, ... column-n>
FROM table-1 | view-1<, ... table-n | view-n>
<optional query clauses>;
SELECT
specifies the column(s) to appear in the result.
FROM
specifies the table(s) or view(s) to be queried.
optional query clauses
are used to refine the query further and include the clauses WHERE, GROUP BY, HAVING, and ORDER BY.
  • the set-operator is one of the following: EXCEPT|INTERSECT|UNION|OUTER UNION.
  • the optional keywords ALL and CORR (CORRESPONDING) further modify the set operation.
Note: Place a semicolon after the last SELECT statement only.
Set operators combine columns from two queries based on their position in the referenced tables without regard to the individual column names. Columns in the same relative position in the two queries must have the same data type. The column names of the tables in the first query become the column names of the output table.
The following optional keywords give you more control over set operations:
ALL
does not suppress duplicate rows. When the keyword ALL is specified, PROC SQL does not make a second pass through the data to eliminate duplicate rows. Thus, using ALL is more efficient than not using it. ALL is not allowed with the OUTER UNION operator.
CORR
overlays columns that have the same name in both tables. When used with EXCEPT, INTERSECT, and UNION, CORR suppresses columns that are not in both tables.

Example: The Basics of Using a Set Operator

When a PROC SQL step evaluates a SELECT statement with one set operator, it does the following:
  • evaluates each query to produce an intermediate, internal, result table
  • makes each intermediate result table become an operand that is linked with a set operator to form an expression
  • evaluates the entire expression to produce a single output result set
In the following PROC SQL step, the SELECT statement contains one set operation. The set operation uses the set operator UNION to combine the result of a query on the table Certadv.Stress17 with the result of a query on the table Certadv.Stress18.
proc sql;
   select *
      from certadv.stress17 union
   select *
      from certadv.stress18;
quit;

Using Multiple Set Operators

A Brief Overview

A single SELECT statement can contain more than one set operation. Each additional set operation includes a set operator and a group of query clauses.
proc sql;
   select *
      from table1
   set-operator
   select *
      from table2
   set-operator
   select *
      from table3;
quit;
This SELECT statement uses two set operators to link together three queries. Regardless of the number of set operations in a SELECT statement, the statement contains only one semicolon, which is placed after the last query.

Example: Using Multiple Set Operators

The following PROC SQL step contains two set operators that combine three queries.
proc sql; 
    select *
       from certadv.mechanicslevel1
    outer union
    select *
       from certadv.mechanicslevel2
    outer union
    select *
       from certadv.mechanicslevel3;
quit;
The PROC SQL query result displays all the columns from Certadv.MechanicsLevel1, Certadv.MechanicsLevel2, and Certadv.MechanicsLevel3. The OUTER UNION operator selects all the rows from all from the tables and displays them in the query result. It does not overlay the columns. Because Certadv.MechanicsLevel1, Certadv.MechanicsLevel2, and Certadv.MechanicsLevel3 do not have columns with the same values, there are missing values displayed in some columns.
Output 4.1 PROC SQL Query Result
PROC SQL Query Result

Processing Multiple Set Operators

When PROC SQL evaluates a SELECT statement that contains multiple set operations, an additional processing step, step 3 below, is required:
  1. Each query is evaluated to produce an intermediate (internal) result table.
  2. Each intermediate result table then becomes an operand that is linked with a set operator to form an expression (for example, Table1 UNION Table2).
  3. If the set operation contains more than two queries, the result from the first two queries (enclosed in parentheses in the following examples) becomes an operand for the next set operator and operand.
    • with two set operators: (Table1 UNION Table2) EXCEPT Table3
    • with three set operators: ((Table1 UNION Table2) EXCEPT Table3) INTERSECT Table4
  4. PROC SQL evaluates the entire expression to produce a single output result set.
Note: When processing set operators, PROC SQL follows a default order of precedence, unless this order is overridden by parentheses in the expressions. By default, INTERSECT is evaluated first. OUTER UNION, UNION, and EXCEPT all have the same level of precedence.

Processing Unique versus Duplicate Rows

When processing a set operation that displays only unique rows (a set operation that contains the set operator EXCEPT, INTERSECT, or UNION), PROC SQL makes two passes through the data, by default:
  1. PROC SQL eliminates duplicate, nonunique, rows in the tables.
  2. PROC SQL selects the rows that meet the criteria and, where requested, overlays columns.
For set operations that display both unique and duplicate rows, only one pass through the data, step 2 above, is required.

Combining and Overlaying Columns

A Brief Overview

You can use a set operation to combine tables that have different numbers of columns and rows or that have columns in a different order.
By default, the set operators EXCEPT, INTERSECT, and UNION overlay columns based on the relative position of the columns in the SELECT clause. Column names are ignored. You control how PROC SQL maps columns in one table to columns in another table by specifying the columns in the appropriate order in the SELECT clause. The first column that is specified in the first query's SELECT clause and the first column that is specified in the second query's SELECT clause are overlaid, and so on.
When columns are overlaid, PROC SQL uses the column name from the first table. If there is no column name in the first table, the column name from the second table is used. When the SELECT clause contains an asterisk (*) instead of a list of column names, the set operation combines the tables based on the positions of the columns in the tables.

Example: Overlaying Columns

The following example takes the first column from Certadv.Col1 and overlays it in Certadv.Col2.
title 'PROC SQL Query Result';
proc sql;
   select *                      /*1*/
      from certadv.col1 except   /*2*/
   select *
      from certadv.col2;
quit;
1 The SELECT clause in each query uses an asterisk (*) to overlay columns based on their positions in the tables.
2 The EXCEPT set operator overlays columns.
Output 4.2 PROC SQL Query Result: Overlaying Columns
PROC SQL Query Result
In order to be overlaid, columns in the same relative position in the two SELECT clauses must have the same data type. If they do not, PROC SQL generates a warning message in the SAS log and stops executing. For example, in the tables shown above, if the data type differed in column Col1.X and Col2.X, the SAS log would display the following error message.
Log 4.1 SAS Log
ERROR: Column 1 from the first contributor of EXCEPT 
is not the same type as its counterpart from the second.

Modifying Results Using Keywords

To modify the behavior of set operators, you can use either or both of the keywords ALL and CORR immediately following the set operator:
proc sql;
   select *
      from table1
   set-operator <all> <corr> 
   select *
      from table2;
quit;
The use of each keyword is described below.
Keyword
Action
Situation
ALL
Makes only one pass through the data and does not remove duplicate rows.
It does not matter whether there are duplicates.
Duplicates are not possible.
ALL cannot be used with OUTER UNION.
CORR (or CORRESPONDING)
Compares and overlays columns by name instead of by position:
  • When used with EXCEPT, INTERSECT, and UNION, removes any columns that do not have the same name in both tables.
  • When used with OUTER UNION, overlays same-named columns and displays columns that have nonmatching names without overlaying.
If an alias is assigned to a column in the SELECT clause, CORR uses the alias instead of the permanent column name.
Two tables have some or all columns in common, but the columns are not in the same order.
Last updated: October 16, 2019
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.21.43.192