Using the INTERSECT Set Operator

Overview

The set operator INTERSECT does both of the following:
  • selects unique rows that are common to both tables
  • overlays columns.
overlays columns
The following example demonstrates how INTERSECT works when used alone and with the keywords ALL and CORR.

Using the INTERSECT Operator Alone

The INTERSECT operator compares and overlays columns in the same way as the EXCEPT operator, by column position instead of column name. However, INTERSECT selects rows differently and is displayed in output the unique rows that are common to both tables. The following PROC SQL set operation uses the INTERSECT operator to combine the tables One and Two, which were introduced previously:
proc sql;
   select *
      from one
   intersect
   select *
      from two;
Tables One, Two, and Output
Tables One and Two have only one unique row in common and this row is displayed in the output. (This is the same row that was eliminated in the earlier example that contained the EXCEPT operator.)

Using the Keyword ALL with the INTERSECT Operator

Adding the keyword ALL to the preceding PROC SQL query prevents PROC SQL from making an extra pass through the data. If there were any rows common to tables One and Two that were duplicates of other common rows, they would also be included in output. However, as you have seen, there is only one common row in these tables. The modified PROC SQL query, the tables One and Two, and the output are shown below:
proc sql;
   select *
      from one
   intersect all
   select *
      from two;
Tables One, Two, and Output
As before, there is just one row of output.

Using the Keyword CORR with the INTERSECT Operator

To display the unique rows that are common to the two tables based on the column name instead of the column position, add the CORR keyword to the PROC SQL set operation. The modified query, the tables One and Two, and the output are shown below:
proc sql;
   select *
      from one
   intersect corr
   select *
      from two;
Tables One, Two, and Output
X is the only column name that is common to both tables, so X is the only column that PROC SQL examines and displays in the output. In the first pass, PROC SQL eliminates the rows that are duplicated within each table: the second and third rows in table One contain the same value for X as the first row, and the fourth row in table Two contains the same value for X as the third row. In the second pass, PROC SQL eliminates any rows that are not common across tables: the fourth and fifth rows in table One and the fifth row in table Two do not have a matching value of X in the other table. The output displays the three rows with unique values of X that are also common to both tables.

Using the Keywords ALL and CORR with the INTERSECT Operator

If the keywords ALL and CORR are used together, the INTERSECT operator displays all unique and nonunique (duplicate) rows that are common to the two tables, based on columns that have the same name. The modified query, the tables One and Two, and the output are shown below:
proc sql;
   select *
      from one
   intersect all corr
   select *
      from two;
Tables One, Two, and Output
PROC SQL examines and displays only the column with the same name, X. There are three common rows across the two tables, which are highlighted above, and these are the three rows that are displayed in the output.
Note that each of the tables contains at least one other row that duplicates a value of X in one of the common rows. For example, in the second and third rows in table One, the value of X is 1, as in one of the common rows. However, in order to be considered a common row and to be included in the output, every duplicate row in one table must have a separate duplicate row in the other table. In this example, there are no rows that have duplicate values and that are also common across tables. Therefore, in this example, the set operation with the keywords ALL and CORR generates the same output as with the keyword CORR alone.

Example: INTERSECT Operator

Now that you have seen how the INTERSECT set operator works with very small tables, we can use INTERSECT in a realistic business problem. Suppose you want to display the names of the existing employees who have changed their salary or job code. (This query is the opposite of the query that you solved with the EXCEPT operator.)
Once again, you use the following tables.
Table
Relevant Columns
Sasuser.Staffchanges lists information for all new employees and existing employees who have had a change in salary or job code
FirstName, LastName
Sasuser.Staffmaster lists information for all existing employees
FirstName, LastName
The relationship between these two tables is shown in the diagram below:
INTERSECT Operator
As shown in the earlier example with EXCEPT, the intersection of these two tables includes information for all existing employees who have had changes in job code or salary. It is the intersection of these two tables, shaded above, that you want to display.
To display the unique rows that are common to both tables, you use a PROC SQL set operation that contains INTERSECT. It is known that these tables contain no duplicates, so ALL is used to speed up query processing. The PROC SQL set operation is shown below:
proc sql;
   select firstname, lastname
      from sasuser.staffchanges
   intersect all
   select firstname, lastname
      from sasuser.staffmaster;
Note: In this PROC SQL step, which contains just one INTERSECT set operator, the order in which you list the tables in the SELECT statement does not make a difference. However, in a more complex PROC SQL step that contains multiple stacked INTERSECT set operators, it is important to think through the table order carefully, depending on when you want the non-matches to be eliminated. The output shows that there are four existing employees who have changed their salary or job code.
INTERSECT set operator
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
13.58.51.228