7.7. Subqueries

Now that we have seen how two or more tables can be combined in a join query, we turn our attention to another type of complex query known as a subquery. A subquery is a query expression that is nested within another query expression. Its purpose is to have the inner query produce a single value or multiple values that can then be passed into the outer query for processing. You achieve this by embedding a SELECT statement inside a WHERE clause of an outer query’s SELECT statement, INSERT statement, DELETE statement, or HAVING clause.

Note:You should avoid nesting more than two subqueries deep because of the conceptual and processing complexities this introduces.

The typical subquery consists of a (inner) query combined inside the predicate of another (outer or main) query. When processed, the inner query passes a Boolean value to the outer query consisting of either True if it returns a minimum of one row or False if no rows are returned by the subquery. The results of the inner query are stored in a temporary results table and used as input to the main query. Our exploration of subqueries will involve using them with comparison operators, the IN predicate, and the ANY and ALL keywords, and will conclude with a look at a special type of subquery called a correlated subquery.

7.7.1. Alternate Approaches to Subqueries

A subquery is a very useful construct, especially when information from multiple tables needs to be interrelated. Unfortunately, a subquery is not always easy to construct and may even be more difficult to understand. So before constructing every table relation with a subquery, consider your options carefully.

When all the information is available in a single table, a simple query is probably all that needs to be constructed. Suppose you want to produce a report consisting of the invoice information for Global Comm Corp. Let’s further assume you know the specific manufacturer number for Global Comm Corp as well. Knowing this means that you don’t have to go into the MANUFACTURERS table to find it. In the next example, a simple query is constructed to retrieve all invoice information from the INVOICE table.

Simple Query

PROC SQL;
  SELECT *
    FROM INVOICE
      WHERE manunum = 210;
QUIT;

Results

                          The SAS System

                                     Invoice
                                    Quantity
  Invoice  Manufacturer  Customer    - Units 
 Invoice Unit   Product
   Number        Number    Number       Sold      
   Price    Number
  ________________________________________________
__________________

    1003           210       101           7      
 $245.00      2101


But what if all the information is not in a single table? And what if the manufacturer number for Global Comm Corp is not known? As shown earlier, a join can be constructed just as easily as a subquery. Some users prefer joins to subqueries because they can be easier to understand as well as maintain. In fact, a join frequently performs better than a subquery. In the next example, the manufacturer number for Global Comm Corp is not known. Consequently, a simple inner join is needed to retrieve all related rows from the MANUFACTURERS and INVOICE tables for Global Comm Corp.

Simple Join

PROC SQL;
  SELECT M.manunum, M.manuname, I.invnum,
         I.invqty, I.invprice
    FROM MANUFACTURERS M, INVOICE I
      WHERE M.manunum = I.manunum AND
            M.manuname = 'Global Comm Corp';
QUIT;

Results

                                The SAS System

                                                  
   Invoice
                                                  
  Quantity
     Manufacturer                          Invoice
   - Units   Invoice Unit
           Number   Manufacturer Name       Number
      Sold          Price
     _____________________________________________
_________________________

              210   Global Comm Corp          1003
          7        $245.00


7.7.2. Passing a Single Value with a Subquery

Now let’s see how a subquery could be constructed to provide the same results as with the join. As before, suppose you want to pull all the invoices for the manufacturer Global Comm Corp but only know the manufacturer name (or at least part of the name), but not the manufacturer number (MANUNUM). The following subquery uses an = (equal sign) in its outer query WHERE clause to accomplish this.

Since the manufacturer number is not known, a subquery is constructed to first search for it in the MANUFACTURERS table. Actually, the subquery approach is more versatile than the previous query approach, because it does not require a unique manufacturer number, which is often more difficult to remember than names. It also enables quick searches even if the manufacturer number changes for a given manufacturer.

When the entire query is executed, SQL first evaluates the inner query (or subquery) within the outer query’s WHERE clause. It executes the inner query the same way as if it were a stand-alone query. It searches the MANUFACTURERS table for any row where the manufacturer name equals the character string Global Comm Corp and then pulls the MANUNUM values for this row. SQL then substitutes the derived MANUNUM value of 210 from the inner query inside the predicate of the main query (outer query). As a result of this substitution, the SQL query looks identical to the query mentioned previously.

SQL Code

PROC SQL;
  SELECT invnum, INVOICE.manunum, custnum, invqty, invprice,
      prodnum
    FROM INVOICE
      WHERE manunum =
        (SELECT manunum
						1
						FROM MANUFACTURERS
						WHERE manuname = 'Global Comm Corp'),
      WHERE INVOICE.manunum=MANUFACTURERS.manunum;
QUIT;

Result of Inner Query

PROC SQL;
  SELECT *
    FROM INVOICE
      WHERE manunum = 210;     2
QUIT;

PROC SQL evaluates the inner query within the outer query’s WHERE clause to search for the manufacturer number for manufacturer Global Comm Corp.

The resulting query after substituting the derived manufacturer number value from the inner query evaluates to a single value and is then executed as the main (outer) query.

Results

                           The SAS System

                                    Invoice
                2                  Quantity     
Invoice
  Invoice  Manufacturer  Customer   - Units      
 Total   Product
   Number        Number    Number      Sold      
 Price    Number
__________________________________________________
_______________
     1003           210       101         7    
 $245.00      2101


Let’s look at another subquery. Suppose you want to retrieve the invoice from the INVOICE table for the manufacturer that manufactures the Dream Machine workstation. The following subquery (inner query) extracts the product number (PRODNUM) associated with the Dream Machine and passes the single value to the outer query for processing.

SQL Code

PROC SQL;
  SELECT invnum, manunum, custnum, invqty, invprice,
       INVOICE.prodnum
    FROM INVOICE
        (SELECT prodnum
						1
						FROM PRODUCTS
						WHERE prodname LIKE 'Dream%')
      WHERE INVOICE.prodnum=PRODUCTS.prodnum;
QUIT;

Result of Inner Query

PROC SQL;
   SELECT *
     FROM INVOICE
       WHERE prodnum = 1110;     2
 QUIT;


PROC SQL evaluates the inner query within the outer query’s WHERE clause to search for the product number for the product Dream Machine.

The resulting inner query after substituting the derived product number value evaluates to a single value and is then executed as the main (outer) query.

Results

                          The SAS System

                                    Invoice
                                   Quantity       

												2
  Invoice  Manufacturer  Customer   - Units 
 Invoice Unit  Product
   Number        Number    Number      Sold       
  Price   Number
  ________________________________________________
________________

     1004           111       501         3     $9
,600.00     1110


It is fortunate that our subquery in the previous example passed only one row or value to the main (outer) query. Had it returned more than one value from the PRODUCTS table, it would have made it impossible for the SQL to evaluate the condition as true or false and would have produced an error in the outer query. Let’s look at another example where more than one value is returned by the subquery.

In the next example, more than one row is returned by the inner query making it impossible for the main query to evaluate as true or false. As a result, an error is produced and the subquery does not execute. In general, it is best to avoid using the = (equal sign) and other comparison operators (<, >, <=, >=, and <>) in a subquery expression, unless you know in advance that the result of the subquery is a table with a single row of data (although it may not always be possible to know this beforehand). In a later section (“Passing More Than One Row with a Subquery”), you will see this problem alleviated by using the IN predicate.

SQL Code

PROC SQL;
  SELECT *
    FROM INVOICE
      WHERE manunum =
        (SELECT manunum
						FROM MANUFACTURERS
						WHERE UPCASE(manucity) LIKE 'SAN DIEGO%'),
QUIT;

SAS Log Result

     PROC SQL;
       SELECT *
         FROM INVOICE
           WHERE manunum =
             (SELECT manunum
               FROM MANUFACTURERS
                 WHERE UPCASE(manucity) LIKE 'SAN
 DIEGO%'),
ERROR: Subquery evaluated to more than one row.
     QUIT;
NOTE: The SAS System stopped processing this step
 because of errors.
NOTE: PROCEDURE SQL used:
      real time           0.00 seconds


Let’s look at another subquery example that uses the comparison operator < (less than). A summary function specified in an inner query forces a single row to result. In the next example, the subquery uses the AVG summary (aggregate) function to determine which products (based on their invoice quantities) were purchased in lower quantities than the average product purchase.

SQL Code

PROC SQL;
  SELECT prodnum, invnum, invqty, invprice
    FROM INVOICE
      WHERE invqty <
        (SELECT AVG(invqty)
						1
          FROM INVOICE);
QUIT;

Result of Inner Query

PROC SQL;
  SELECT prodnum, invnum, invqty, invprice
    FROM INVOICE
      WHERE invqty < 4.285714;    2
QUIT;


PROC SQL evaluates the inner query within the outer query’s WHERE clause to produce an average invoice quantity.

The resulting inner query passes the derived average invoice quantity of 4.285714 as a single value to the main (outer) query for execution.

Results

                The SAS System
                              2
                          Invoice
                         Quantity
     Product   Invoice    - Units    Invoice Unit
      Number    Number       Sold           Price
     ____________________________________________

        6001      1002          2       $1,598.00
        1110      1004          3       $9,600.00
        5002      1005          2         $798.00
        6000      1006          4         $396.00


7.7.3. Passing More Than One Row with a Subquery

To prevent the problem associated with passing more than one value to the main (outer) query, you can specify the IN predicate in a subquery. Similar to the IN operator in the DATA step, the IN predicate permits the SQL procedure to pass multiple row values from the (inner) subquery to the main (outer) query without producing an error.

Note:PROC SQL does not permit a subquery to select more than one column. The next example shows how multiple row values are passed from the subquery to the main (outer) query using the IN predicate for San Diego manufacturers.

SQL Code

PROC SQL;
  SELECT *
    FROM INVOICE
      WHERE manunum IN
						1
        (SELECT manunum
          FROM MANUFACTURERS
            WHERE UPCASE(manucity) LIKE 'SAN DIEGO%'),     2
QUIT;

Result of Inner Query

PROC SQL;
  SELECT prodnum, invnum, invqty, invprice
    FROM INVOICE
      WHERE manunum IN (210, 500, 700);    3
QUIT;


PROC SQL’s IN predicate is specified in the outer query to process a list of values that are passed from the inner query.

PROC SQL evaluates the inner query within the outer query’s WHERE clause to produce a list of manufacturer numbers for San Diego manufacturers.

The resulting inner query passes multiple row values to the main (outer) query for execution.

Result

                           The SAS System

                                    Invoice
                3                  Quantity       
Invoice
  Invoice  Manufacturer  Customer   - Units       
  Total   Product
   Number        Number    Number      Sold       
  Price    Number
  ________________________________________________
_________________

     1001           500       201         5     $1
,495.00      5001
     1003           210       101         7      
 $245.00      2101
     1005           500       801         2      
 $798.00      5002
     1006           500       901         4      
 $396.00      6000
     1007           500       401         7    $23
,100.00      1200


7.7.4. Comparing a Set of Values

A subquery can have multiple values returned for a single column to the outer query. But there are special keywords that permit comparison operators to be used in subqueries to process multiple values. The special keywords ANY and ALL can be used to compare a set of values returned by a subquery. Let’s see how these keywords work.

Suppose you want to view the products whose inventory quantity is greater than or equal to the lowest average inventory quantity. The following example illustrates a subquery with the ANY keyword specified in the WHERE clause of the main query expression. When ANY is specified, the entire WHERE clause is true if the subquery returns at least one value.

SQL Code

PROC SQL;
  SELECT manunum, prodnum, invqty, invprice
    FROM INVOICE
      WHERE invprice GE ANY
						1
        (SELECT invprice
          FROM INVOICE
            WHERE prodnum IN (5001,5002));
						2
QUIT;

Result of Inner Query

PROC SQL;
  SELECT manunum, prodnum, invqty, invprice
    FROM INVOICE
      WHERE invprice > ANY ($1,495.,$798.);
												3
QUIT;


PROC SQL retrieves any invoices from the outer query where the invoice price is greater than or equal to the row values passed from the inner query.

The WHERE clause of the inner query retrieves any invoice prices for product numbers 5001 and 5002 and passes them to the outer query.

The resulting inner query passes multiple row values to the main (outer) query for execution.

Results

                    The SAS System

                               Invoice
                              Quantity
     Manufacturer   Product    - Units     Invoice
 Unit
           Number    Number       Sold           
 Price
     _____________________________________________
_____

              500      5001          5        $1
,495.00
              600      6001          2        $1
,598.00
              111      1110          3        $9
,600.00
              500      5002          2         
 $798.00
              500      1200          7       $23
,100.00


The ALL keyword works very differently from the ANY keyword. When you specify ALL before a subquery expression, the subquery is true only if the comparison is true for values returned by the subquery. For example, suppose you want to view the products whose inventory quantity is less than the average inventory quantity.

SQL Code

PROC SQL;
  SELECT manunum, prodnum, invqty, invprice
    FROM INVOICE
      WHERE invprice < ALL
						1
        (SELECT invprice
          FROM INVOICE
            WHERE prodnum IN (5001,5002));
						2
QUIT;

Result of Inner Query

PROC SQL;
  SELECT manunum, prodnum, invqty, invprice
    FROM INVOICE
      WHERE invprice < ALL ($1,495.,$798.);
												3
QUIT;


PROC SQL retrieves all invoices from the outer query where the invoice price is less than the row values passed from the inner query.

The WHERE clause of the inner query retrieves all invoice prices for product numbers 5001 and 5002 and passes them to the outer query.

The resulting inner query passes multiple row values to the main (outer) query for execution.

Results

                          The SAS System

                                   Invoice
                                  Quantity
          Manufacturer   Product   - Units   
 Invoice Unit
                Number    Number      Sold        
   Price
          ________________________________________
________
                   210      2101         7        
 $245.00
                   500      6000         4        
 $396.00


7.7.5. Correlated Subqueries

In the subquery examples shown so far, the subquery (inner query) operates independently from the main (outer) query. Essentially the subquery’s results are evaluated and used as input to the main (outer) query. Although this is a common way subqueries execute, it is not the only way. SQL also permits a subquery to accept one or more values from its outer query. Once the subquery executes, the results are then passed to the outer query. Subqueries of this variety are called correlated subqueries. The ability to construct subqueries in this manner provides a powerful extension to SQL.

The difference between the subqueries discussed earlier and correlated subqueries is in the way the WHERE clause is constructed. Correlated subqueries relate a column in the subquery with a column in the outer query to determine the rows that match or in certain cases don’t match the expression. Suppose, for example, that we want to view products in the PRODUCTS table that do not appear in the INVOICE table. One way to do this is to construct a correlated subquery.

In the next example, the subquery compares the product number column in the PRODUCTS table with the product number column in the INVOICE table. If at least one match is found (the product appears in both the PRODUCTS and INVOICE tables) then the resulting table from the subquery will not be empty, and the NOT EXISTS condition will be false. However, if no matches are found, then the subquery returns an empty table resulting in the NOT EXISTS condition being true, causing the product number, product name, and product type of the current row in the main (outer) query to be selected.

SQL Code

PROC SQL;
  SELECT prodnum, prodname, prodtype
    FROM PRODUCTS
      WHERE NOT EXISTS
						1
        (SELECT *
          FROM INVOICE
            WHERE PRODUCTS.prodnum = INVOICE.prodnum);
						2
QUIT;

The (inner) subquery receives its value(s) from the main (outer) query. With the value(s), the subquery runs and passes the results back to the main query where the WHERE clause and the NOT EXISTS condition are processed.

The inner query selects matching product and invoice information and passes it to the outer query.

Results

                     The SAS System

     Product
     Number   Product Name                Product Type
  ________________________________________________
______
       1700   Travel Laptop               Laptop
       2102   Digital Cell Phone          Phone
       2200   Office Phone                Phone
       5003   Wordprocessor Software      Software
       5004   Graphics Software           Software


Correlated subqueries are useful for placing restrictions on the results of an entire query with a HAVING clause (or, when combined with a GROUP BY clause, of an entire group). Suppose you want to know which manufacturers have more than one invoiced product.

In the next example, the subquery compares the manufacturer number in the PRODUCTS table with the manufacturer number in the INVOICE table. A HAVING clause and a COUNT function are specified to select all manufacturers with two or more invoices. Because an aggregate (summary) function is used in an optional HAVING clause, a GROUP BY clause is not needed to select the manufacturers with two or more invoices. An EXISTS condition is specified in the outer query’s WHERE clause to capture only those manufacturers matching the subquery.

SQL Code

PROC SQL;
  SELECT prodnum, prodname, prodtype
    FROM PRODUCTS
      WHERE EXISTS
						1
        (SELECT *
          FROM INVOICE
            WHERE PRODUCTS.manunum = INVOICE.manunum
              HAVING COUNT(*) > 1);
						2
QUIT;

The (inner) subquery receives its value(s) from the main (outer) query. With the value(s), the subquery runs and passes the results back to the main query where the WHERE clause and the EXISTS condition are processed.

The inner query specifies a HAVING clause in order to subset manufacturers with two or more invoices.

Results

                          The SAS System

         Product
          Number   Product Name               
 Product Type
         _________________________________________
___________
            5001   Spreadsheet Software       
 Software
            5002   Database Software          
 Software
            5003   Wordprocessor Software     
 Software
            5004   Graphics Software          
 Software


..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.222.114.28