7.8. Set Operations

Now that we have seen how tables are combined with join queries and subqueries, we turn our attention to another type of complex query. The SQL procedure provides users with several table operators: INTERSECT, UNION, OUTER UNION, and EXCEPT, commonly referred to as set operators. In contrast to joins and subqueries where query results are combined horizontally, the purpose of each set operator is to combine or concatenate query results vertically.

7.8.1. Accessing Rows from the Intersection of Two Queries

The INTERSECT operator creates query results consisting of all the unique rows from the intersection of the two queries. Put another way, the intersection of two queries (A and B) is represented by C, indicating that the rows that are produced occur in both A and in B. As the following figure shows, the intersection of both queries is represented in the shaded area (C).

Figure 7.1. Intersection of Two Queries


To see all products that cost less than $300.00 and product types classified as “phone”, you could construct a simple query with a WHERE clause or specify the intersection of two separate queries. The next example illustrates a simple query that specifies a WHERE clause to display phones that cost less than $300.

SQL Code

PROC SQL;
  SELECT *
    FROM PRODUCTS
      WHERE prodcost < 300.00 AND
            prodtype = 'Phone';
QUIT;

Results

                           The SAS System

   Product                       Manufacturer     
           Product
    Number  Product Name               Number 
 Product Type     Cost
__________________________________________________
____________________
      2101  Analog Cell Phone            210  
 Phone          $35.00
      2102  Digital Cell Phone           210  
 Phone         $175.00
      2200  Office Phone                 220  
 Phone         $130.00


The INTERSECT approach can be constructed to produce the same results as in the previous example. The INTERSECT process assumes that the tables in each query are structurally identical to each other. It overlays the columns from both queries based on position in the SELECT statement. Should you attempt to intersect two queries with different table structures, the process may fail due to differing column types, or the produced results may contain data integrity issues.

The most significant distinction between the two approaches, and one that may affect large table processing, is that the first query example (using the AND operator) takes less time to process: 0.05 seconds versus 0.17 seconds for the second approach (using the INTERSECT operator). The next example shows how the INTERSECT operator achieves the same result less efficiently.

SQL Code

PROC SQL;
  SELECT *     1
    FROM PRODUCTS
      WHERE prodcost < 300.00

  INTERSECT
						2

  SELECT *     1
    FROM PRODUCTS
      WHERE prodtype = "Phone";
QUIT;

It is assumed that the tables in both queries are structurally identical because the wildcard character “*” is specified in the SELECT statement.

The INTERSECT operator produces rows common to both queries.

Results

                              The SAS System

  Product                     Manufacturer        
           Product
   Number    Product Name           Number  
 Product Type       Cost
  ________________________________________________
_____________________
     2101    Analog Cell Phone         210   Phone
            $35.00
     2102    Digital Cell Phone        210   Phone
           $175.00
     2200    Office Phone              220   Phone
           $130.00


7.8.2. Accessing Rows from the Combination of Two Queries

The UNION operator preserves all the unique rows from the combination of queries. The result is the same as if an OR operator is used to combine the results of each query. Put another way, the union of two queries (A and B) represents rows in A or in B or in both A and B. As illustrated in the figure below, the union represents the entire shaded area (A, B, and C).

Figure 7.2. Union of Two Queries


UNION automatically eliminates duplicate rows from the results, unless the ALL keyword is specified as part of the UNION operator. The column names assigned to the results are derived from the names in the first query.

In order for the union of two or more queries to be successful, each query must specify the same number of columns of the same or compatible types. Type compatibility means that column attributes are defined the same way. Because column names and attributes are derived from the first table, data types must be of the same type. The data types of the result columns are derived from the source table(s).

To see all products that cost less than $300.00 or products classified as a workstation, you have a choice between using OR as in the following query or UNION as in the next. As illustrated in the output from both queries, the results are identical no matter which query is used.

SQL Code

PROC SQL;
  SELECT *
    FROM PRODUCTS
      WHERE prodcost < 300.00 OR
            prodtype = "Workstation";
 QUIT;

Results

                            The SAS System

   Product                     Manufacturer       
            Product
    Number  Product Name             Number 
 Product Type        Cost
__________________________________________________
____________________
      1110  Dream Machine               111 
 Workstation    $3,200.00
      1200  Business Machine            120 
 Workstation    $3,300.00
      2101  Analog Cell Phone           210  Phone
             $35.00
      2102  Digital Cell Phone          210  Phone
            $175.00
      2200  Office Phone                220  Phone
            $130.00
      5001  Spreadsheet Software        500 
 Software         $299.00
      5003  Wordprocessor Software      500 
 Software         $299.00
      5004  Graphics Software           500 
 Software         $299.00


In the next example, the UNION operator is specified to combine the results of both queries.

SQL Code

PROC SQL;
  SELECT *
    FROM PRODUCTS
      WHERE prodcost < 300.00

  UNION
						1
  SELECT *
    FROM PRODUCTS
      WHERE prodtype = 'Workstation';
QUIT;

The UNION operator combines the results of two queries.

Results

                           The SAS System

   Product                   Manufacturer         
           Product
    Number  Product Name           Number  Product
 Type         Cost
__________________________________________________
___________________
      1110  Dream Machine             111 
 Workstation     $3,200.00
      1200  Business Machine          120 
 Workstation     $3,300.00
      2101  Analog Cell Phone         210  Phone  
            $35.00
      2102  Digital Cell Phone        210  Phone  
           $175.00
      2200  Office Phone              220  Phone  
           $130.00
      5001  Spreadsheet Software      500 
 Software          $299.00
      5003  Wordprocessor Software    500 
 Software          $299.00
      5004  Graphics Software         500 
 Software          $299.00


7.8.3. Concatenating Rows from Two Queries

The OUTER UNION operator concatenates the results of two queries. As with a DATA step or PROC APPEND concatenation the results consist of rows combined vertically. Put another way, the outer union of two queries (A and B) represents all rows in both A and B with no overlap. As illustrated below, the outer union represents the entire shaded area (A and B).

Figure 7.3. Outer Union of Two Queries


The next example concatenates the results of two queries. As illustrated in the output, the results show the rows from both queries are concatenated.

SQL Code

PROC SQL;
  SELECT prodnum, prodname, prodtype, prodcost
    FROM PRODUCTS

  OUTER UNION
						1

  SELECT prodnum, prodname, prodtype, prodcost
    FROM PRODUCTS;
QUIT;

The OUTER UNION operator concatenates the results of both queries.

Results

                   The SAS System

                   Product                    Product
Product Type          Cost   Product Type        Cost
_____________________________________________________

Workstation      $3,200.00                        .
Workstation      $3,300.00                        .
Laptop           $3,400.00                        .
Phone               $35.00                        .
Phone              $175.00                        .
Phone              $130.00                        .
Software           $299.00                        .
Software           $399.00                        .
Software           $299.00                        .
Software           $299.00                        .
                       .     Workstation    $3,200.00
                       .     Workstation    $3,300.00
                       .     Laptop         $3,400.00
                       .     Phone             $35.00
                       .     Phone            $175.00
                       .     Phone            $130.00
                       .     Software         $299.00
                       .     Software         $399.00
                       .     Software         $299.00
                       .     Software         $299.00


The OUTER UNION operator automatically concatenates rows from two queries with no overlap, unless the CORRESPONDING (CORR) keyword is specified as part of the operator. The column names assigned to the results are derived from the names in the first query. In the next example, the CORR keyword enables columns with the same name and attributes to be overlaid.

SQL Code

PROC SQL;
  SELECT prodnum, prodname, prodtype, prodcost
    FROM PRODUCTS

  OUTER UNION CORR
						1

  SELECT prodnum, prodname, prodtype, prodcost
    FROM PRODUCTS;
QUIT;

The OUTER UNION operator with the CORR keyword concatenates and overlays the results of both queries.

Results

                         The SAS System

   Product                                        
        Product
    Number   Product Name               Product
 Type         Cost
   _______________________________________________
_______________

      1110   Dream Machine             
 Workstation     $3,200.00
      1200   Business Machine          
 Workstation     $3,300.00
      1700   Travel Laptop              Laptop    
      $3,400.00
      2101   Analog Cell Phone          Phone     
         $35.00
      2102   Digital Cell Phone         Phone     
        $175.00
      2200   Office Phone               Phone     
        $130.00
      5001   Spreadsheet Software       Software  
        $299.00
      5002   Database Software          Software  
        $399.00
      5003   Wordprocessor Software     Software  
        $299.00
      5004   Graphics Software          Software  
        $299.00
      1110   Dream Machine             
 Workstation     $3,200.00
      1200   Business Machine          
 Workstation     $3,300.00
      1700   Travel Laptop              Laptop    
      $3,400.00
      2101   Analog Cell Phone          Phone     
         $35.00
      2102   Digital Cell Phone         Phone     
        $175.00
      2200   Office Phone               Phone     
        $130.00
      5001   Spreadsheet Software       Software  
        $299.00
      5002   Database Software          Software  
        $399.00
      5003   Wordprocessor Software     Software  
        $299.00
      5004   Graphics Software          Software  
        $299.00


7.8.4. Comparing Rows from Two Queries

The EXCEPT operator compares rows from two queries to determine the changes made to the first table that are not present in the second table. The result below shows new and changed rows in the first table that are not in the second table, but not rows that have been deleted from the second table. As illustrated in figure 7-4, the results of specifying the EXCEPT operator represent the shaded area (A) in the diagram.

Figure 7.4. Compare Two Tables to Determine Additions and Changes


When working with two tables consisting of similar information, you can use the EXCEPT operator to determine new and modified rows. The EXCEPT operator compares rows in both tables to identify the rows existing in the first table but not in the second table. It also uniquely identifies rows that have changed from the first to the second tables. Columns are compared in the order they appear in the SELECT statement.

If the wildcard character “*” is specified in the SELECT statement, it is assumed that the tables are structurally identical to one another. Let’s look at an example.

Suppose you have master and backup tables of the CUSTOMERS file, and you want to compare them to determine the new and changed rows. The EXCEPT operator as illustrated in the next example returns all new or changed rows from the CUSTOMERS table that do not appear in the CUSTOMERS_BACKUP table. As illustrated by the output, three new customer rows are added to the CUSTOMERS table that had not previously existed in the CUSTOMERS_BACKUP table.

SQL Code

PROC SQL;
  SELECT *
    FROM CUSTOMERS

  EXCEPT
						1

  SELECT *
    FROM CUSTOMERS_BACKUP;
QUIT;

The EXCEPT operator compares rows in both tables to identify the rows existing in the first table but not the second table.

Results

                      The SAS System

   Customer
     Number   Customer Name              
 Customer's Home City
   _______________________________________________
____________
       1302   Software Intelligence Cor   Spring
 Valley
       1901   Shipp Consulting            San Pedro
       1902   Gupta Programming           Simi Valley


..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.117.188.138