Using the EXCEPT Set Operator

Overview

The set operator EXCEPT does both of the following:
  • selects unique rows from the first table (the table specified in the first query) that are not found in the second table (the table specified in the second query)
  • overlays columns.
overlays columns
Consider how EXCEPT works when used alone and with the keywords ALL and CORR.

Using the EXCEPT Operator Alone

Suppose you want to display the unique rows in table One that are not found in table Two. The PROC SQL set operation that includes the EXCEPT operator, the tables One and Two, and the output of the set operation are shown below:
proc sql;
   select *
      from one
   except
   select *
      from two;
Tables One, Two, and Output
The set operator EXCEPT overlays columns by their position. In this output, the following columns are overlaid:
  • the first columns, One.X, and Two.X, both of which are numeric
  • the second columns, One.A, and Two.B, both of which are character.
The column names from table One are used, so the second column of output is named A rather than B.
Consider how PROC SQL selects rows from table One to display in output.
In the first pass, PROC SQL eliminates any duplicate rows from the tables. As shown below, there is one duplicate row: in table One, the second row is a duplicate of the first row. All remaining rows in table One are still candidates in PROC SQL's selection process.
proc sql;
   select *
      from one
   except
   select *
      from two;
Tables One and Two
In the second pass, PROC SQL identifies any rows in table One for which there is a matching row in table Two and eliminates them. There is one matching row in the two tables, as shown below, which is eliminated.
proc sql;
   select *
      from one
   except
   select *
      from two;
Tables One and Two
The five remaining rows in table One, the unique rows, are displayed in the output.
Output

Using the Keyword ALL with the EXCEPT Operator

To select all rows in the first table (both unique and duplicate) that do not have a matching row in the second table, add the keyword ALL after the EXCEPT set operator. The modified PROC SQL set operation, the tables One and Two, and the output are shown below:
proc sql;
   select *
      from one
   except all
   select *
      from two;
Tables One, Two and Output
The output now contains six rows. PROC SQL has again eliminated the one row in table One (the fifth row) that has a matching row in table Two (the fourth row). Remember that when the keyword ALL is used with the EXCEPT operator, PROC SQL does not make an extra pass through the data to remove duplicate rows within table One. Therefore, the second row in table One, which is a duplicate of the first row, is now included in the output.

Using the Keyword CORR with the EXCEPT Operator

To display both of the following, add the keyword CORR after the set operator.
  • only columns that have the same name
  • all unique rows in the first table that do not appear in the second table.
The modified PROC SQL set operation, the tables One and Two, and the output are shown below:
proc sql;
   select *
      from one
   except corr
   select *
      from two;
Tables One, Two and Output
X is the only column that has the same name in both tables, so X is the only column that PROC SQL examines and displays in the output.
In the first pass, PROC SQL eliminates the second and third rows of table One from the output because they are not unique within the table; they contain values of X that duplicate the value of X in the first row of table One. In the second pass, PROC SQL eliminates the first, fourth, and fifth rows of table One because each contains a value of X that matches a value of X in a row of table Two. The output displays the two remaining rows in table One, the rows that are unique in table One and that do not have a row in table Two that has a matching value of X.

Using the Keywords ALL and CORR with the EXCEPT Operator

If the keywords ALL and CORR are used together, the EXCEPT operator displays all unique and duplicate rows in the first table that do not appear in the second table, and overlays and display only columns that have the same name. The modified PROC SQL set operation, the tables One and Two, and the output are shown below:
proc sql;
   select *
      from one
   except all corr
   select *
      from two;
Tables One, Two, and Output
Once again, PROC SQL looks at and displays only the column that has the same name in the two tables: X. Because the ALL keyword is used, PROC SQL does not eliminate any duplicate rows in table One. Therefore, the second and third rows in table One, which are duplicates of the first row in table One, appear in the output. PROC SQL does eliminate the first, fourth, and fifth rows in table One from the output because for each one of these three rows there is a corresponding row in table Two that has a matching value of X.
As this example shows, when the ALL keyword is used with the EXCEPT operator, a row in table One cannot be eliminated from the output unless it has a separate matching row in table Two. Table One contains three rows in which the value of X is 1, but table Two contains only one row in which the value of X is 1. That one row in table Two causes the first of the three rows in table One that have a matching value of X to be eliminated from the output. However, table Two does not have two additional rows in which the value of X is 1, so the other two rows in table One are not eliminated, and do appear in the output.

Example: EXCEPT Operator

The EXCEPT operator can be used to solve a realistic business problem. Suppose you want to display the names of all new employees of a company. There is no table that contains information for only the new employees, so you use data from the following two tables.
Table
Relevant Columns
Sasuser.Staffchanges lists information for all new employees and existing employees who have had a change in salary or job code
FirstName, LastName
Sasuser.Staffmaster lists information for all existing employees
FirstName, LastName
The relationship between these two tables is shown in the diagram below:
EXCEPT Operator
The intersection of these two tables includes information for all existing employees who have had changes in job code or salary. The shaded portion, the portion of Sasuser.Staffchanges that does not overlap with Sasuser.Staffmaster, includes information for the people that you want: new employees.
To separate the new employees from the existing employees in Sasuser.Staffchanges, you create a set operation that displays all rows from the first table (Sasuser.Staffchanges) that do not exist in the second table (Sasuser.Staffmaster). The following PROC SQL step solves the problem:
proc sql;
   select firstname, lastname
      from sasuser.staffchanges
   except all
   select firstname, lastname
      from sasuser.staffmaster;
This PROC SQL set operation includes the operator EXCEPT and the keyword ALL. Although you do not want the output to contain duplicate rows, you already know that there are no duplicates in these two tables. Therefore, ALL is specified to prevent PROC SQL from making an extra pass through the data, which speeds up the processing of this query.
PROC SQL compares only the columns that are specified in the SELECT clauses, and these columns are compared in the order in which they are specified. The output displays the first and last names of the two new employees.
PROC SQL set operation
Note: In a set operation that uses the EXCEPT operator, the order in which the tables are listed in the SELECT statement makes a difference. If the tables in this example were listed in the opposite order, the output would display all existing employees who have had no changes in salary or job code.

Example: EXCEPT Operator in an In-Line View

This example is a variation of the preceding set operation. Suppose you want to display the number of existing employees who have had no changes in salary or job code. Once again, the query uses the following tables and columns.
Table
Relevant Columns
Sasuser.Staffchanges lists information for all new employees and existing employees who have had a change in salary or job code
FirstName, LastName
Sasuser.Staffmaster lists information for all existing employees
FirstName, LastName
The following PROC SQL query solves this problem:
proc sql;
   select count(*) label='No. of Persons'
      from (select EmpID
              from sasuser.staffmaster
            except all
            select EmpID
              from sasuser.staffchanges);
This PROC SQL query uses
  • the COUNT function with an asterisk (*) as an argument to count the number of employee IDs returned from the set operation
  • the set operator EXCEPT within an in-line view.
The in-line view returns a virtual table that contains employees who have had no changes in salary or job code. This virtual table is then passed to the COUNT(*) summary function, which counts the number of rows in the virtual table. The output shows that there are 144 existing employees who have had no changes in salary or job code.
PROC SQL with COUNT function
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.133.134.151