Subsetting Data By Using Noncorrelated Subqueries

A noncorrelated subquery is a self-contained subquery that executes independently of the outer query.

Using Single-Value Noncorrelated Subqueries

The simplest type of subquery is a noncorrelated subquery that returns a single value.
The following PROC SQL query is the same query that is used in the previous section. This query displays job codes for which the group's average salary exceeds the company's average salary. The HAVING clause contains a noncorrelated subquery.
proc sql;
   select jobcode, 
          avg(salary) as AvgSalary 
          format=dollar11.2
      from sasuser.payrollmaster
      group by jobcode
      having avg(salary) >
         (select avg(salary)
            from sasuser.payrollmaster);
PROC SQL always evaluates a noncorrelated subquery before the outer query. If a query contains noncorrelated subqueries at more than one level, PROC SQL evaluates the innermost subquery first and works outward, evaluating the outermost query last.
In the query shown above, the inner query and outer query are processed as follows:
  1. To complete the expression in the HAVING clause, the subquery calculates the average salary for the entire company (all rows in the table), using the AVG summary function with Salary as an argument.
  2. The subquery returns the value of the average salary to the outer query.
  3. The outer query calculates the average salary (in the SELECT clause) for each JobCode group (as defined in the GROUP BY clause), and selects only the groups whose average salary is greater than the company's average salary.
The query output is shown here.
Average salary for each JobCode
This noncorrelated subquery returns only a single value, the average salary for the whole company, to the outer query. Both the subquery query and the outer query use the same table as a source.

Using Multiple-Value Noncorrelated Subqueries

Some subqueries are multiple-value subqueries: they return more than one value (row) to the outer query. If your noncorrelated subquery might return a value for more than one row, be sure to use one of the following operators in the WHERE or HAVING clause that can handle multiple values:
  • the conditional operator IN
  • a comparison operator that is modified by ANY or ALL
  • the conditional operator EXISTS.
CAUTION:
If you create a noncorrelated subquery that returns multiple values, but the WHERE or HAVING clause in the outer query contains an operator other than one of the operators that are specified above, the query fails. An error message is displayed in the SAS log, which indicates that the subquery evaluated to more than one row. For example, if you use the equal (=) operator with a noncorrelated subquery that returns multiple values, the query fails. The equal operator can handle only a single value.
Consider a query that contains both the conditional operator IN and a noncorrelated subquery that returns multiple values. (The operators ANY, ALL, and EXISTS are presented later in this chapter.)

Example

Suppose you want to send birthday cards to employees who have birthdays coming up. You decide to create a PROC SQL query that lists the names and addresses of all employees who have birthdays in February. This query, unlike the one shown on the previous page, selects data from two different tables:
  • employee names and addresses in the table Sasuser.Staffmaster
  • employee birthdates in the table Sasuser.Payrollmaster.
In both tables, the employees are identified by their employee identification number (EmpID).
In the following PROC SQL query, the WHERE clause contains the conditional operator IN followed by a noncorrelated subquery:
 proc sql;
    select empid, lastname, firstname, 
           city, state
       from sasuser.staffmaster
       where empid in
          (select empid
             from sasuser.payrollmaster
             where month(dateofbirth)=2);
This query is processed as follows:
  1. To complete the expression in the WHERE clause of the outer query, the subquery selects the employees whose date of birth is February. Note that the MONTH function is used in the subquery.
  2. The subquery then returns the EmpID values of the selected employees to the outer query.
  3. The outer query displays data (from the columns identified in the SELECT clause) for the employees identified by the subquery.
The output, shown below, lists the six employees who have February birthdays.
Employees with birthdays in February
Although an inner join would have generated the same results, it is better to use a subquery in this example since no columns from the sasuser.payrollmaster table were in the output.

Using Comparisons with Subqueries

Sometimes it is helpful to compare a value with a set of values returned by a subquery. When a subquery might return multiple values, you must use one of the conditional operators ANY or ALL to modify a comparison operator in the WHERE or HAVING clause immediately before the subquery. For example, the following WHERE clause contains the less than (<) comparison operator and the conditional operator ANY:
where dateofbirth < any
   <subquery...>
CAUTION:
If you create a noncorrelated subquery that returns multiple values, and if the WHERE or HAVING clause in the outer query contains a comparison operator that is not modified by ANY or ALL, the query fails.
When the outer query contains a comparison operator that is modified by ANY or ALL, the outer query compares each value that it retrieves against the value(s) returned by the subquery. All values for which the comparison is true are then included in the query output. If ANY is specified, then the comparison is true if it is true for any one of the values that are returned by the subquery. If ALL is specified, then the comparison is true only if it is true for all values that are returned by the subquery.
Note: The operators ANY and ALL can be used with correlated subqueries, but they are usually used only with noncorrelated subqueries.
Consider how the operators ANY or ALL are used.

Using the ANY Operator

An outer query that specifies the ANY operator selects values that pass the comparison test with any of the values that are returned by the subquery.
For example, suppose you have an outer query containing the following WHERE clause:
where dateofbirth < any
   <subquery...>
This WHERE clause specifies that DateofBirth (the operand) should be less than any (the comparison operator) of the values returned by the subquery.
The following chart shows the effect of using ANY with these common comparison operators: greater than (>), less than (<) and equal to (=).
Comparison Operator with ANY
Outer Query Selects...
Example
> ANY
values that are greater than any value returned by the subquery
If the subquery returns the values 20, 30, 40, then the outer query selects all values that are > 20 (the lowest value that was returned by the subquery).
< ANY
values that are less than any value returned by the subquery
If the subquery returns the values 20, 30, 40, then the outer query selects all values that are < 40 (the highest value that was returned by the subquery).
= ANY
values that are equal to any value returned by the subquery
If the subquery returns the values 20, 30, 40, the outer query selects all values that are = 20 or = 30 or = 40.
Tip
Instead of using the ANY operator with a subquery, there are some SAS functions that you can use to achieve the same result with greater efficiency. Instead of > ANY, use the MIN function in the subquery. Instead of < ANY, use the MAX function in the subquery.

Example

Suppose you want to identify any flight attendants at level 1 or level 2 who are older than any of the flight attendants at level 3. Job type and level are identified in JobCode; each flight attendant has the job code FA1, FA2, or FA3. The following PROC SQL query accomplishes this task by using a subquery and the ANY operator:
proc sql;
   select empid, jobcode, dateofbirth
      from sasuser.payrollmaster
      where jobcode in ('FA1','FA2')
            and dateofbirth < any
               (select dateofbirth
                  from sasuser.payrollmaster
                  where jobcode='FA3'),
Here is what happens when this query is processed:
  1. The subquery returns the birthdates of all level-3 flight attendants.
  2. The outer query selects only those level-1 and level-2 flight attendants whose birthdate is less than any of the dates returned by the subquery.
Note that both the outer query and subquery use the same table.
Note: Internally, SAS represents a date value as the number of days from January 1, 1960, to the given date. For example, the SAS date for 17 October 1991 is 11612. Representing dates as the number of days from a reference date makes it easy for the computer to store them and perform calendar calculations. These numbers are not meaningful to users, however, so several formats are available for displaying dates and datetime values in most of the commonly used notations.
Below are the query results.
Level 1 and 2 flight attendants who are older than level 3
Tip
Using the ANY operator to solve this problem results in a large number of calculations, which increases processing time. For this example, it would be more efficient to use the MAX function in the subquery. The alternative WHERE clause follows:
where jobcode in ('FA1','FA2') 
      and dateofbirth < 
         (select max(dateofbirth)
            from [...]
For more information about the MAX function, see the SAS documentation.

Using the ALL Operator

An outer query that specifies the ALL operator selects values that pass the comparison test with all of the values that are returned by the subquery.
The following chart shows the effect of using ALL with these common comparison operators: greater than (>) and less than (<).
Comparison Operator with ALL
Sample Values Returned by Subquery
Signifies...
> ALL
(20, 30, 40)
> 40
(greater than the highest number in the list)
< ALL
(20, 30, 40)
< 20
(less than the lowest number in the list)

Example

Substitute ALL for ANY in the previous query example. The following query identifies level-1 and level-2 flight attendants who are older than all of the level-3 flight attendants:
proc sql;
   select empid, jobcode, dateofbirth
      from sasuser.payrollmaster
      where jobcode in ('FA1','FA2')
            and dateofbirth < all
               (select dateofbirth
                  from sasuser.payrollmaster
                  where jobcode='FA3'),
Here is what happens when this query is processed:
  1. The subquery returns the birthdates of all level-3 flight attendants.
  2. The outer query selects only those level-1 and level-2 flight attendants whose birthdate is less than all of the dates returned by the subquery.
The query results, below, show that only two level-1 or level-2 flight attendants are older than all of the level-3 flight attendants.
Level 1 and 2 flight attendants who are older than all level 3 attendants
Tip
For this example, it would be more efficient to solve this problem using the MIN function in the subquery instead of the ALL operator. The alternative WHERE clause follows:
where jobcode in ('FA1','FA2') 
      and dateofbirth < 
        (select min(dateofbirth) 
           from [...]
For more information about the MIN function, see the SAS documentation.
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.116.21.239