Example 8.1 Performing a Simple Subset

Goal

Create a subset of a SAS data set efficiently by selecting for processing only observations that meet a particular condition.

Example Features

Featured StepDATA step
Featured Step Options and StatementsWHERE statement
Related TechniquePROC SQL, WHERE clause
A Closer LookComparing the WHERE Statement in the DATA Step and the Subsetting IF Statement

Input Data Set

Data set CELLPHONES contains the support records for one day. Only the first 25 of 10,000 observations are shown.

                           CELLPHONES

  Obs   callhour   calllength     status       reason
    1      00          10       Resolved       Text Msg Help
    2      00           2       Transferred    Other Feature Help
    3      01           3       Resolved       Start Service
    4      01           2       Resolved       Other
    5      01           4       Resolved       Voice Mail Help
    6      01           4       Resolved       Billing Change
    7      01           3       Resolved       Other Feature Help
    8      02           3       Resolved       Start Service
    9      02           6       Transferred    Other
   10      02           3       Resolved       Account Status
   11      02           8       Resolved       Account Status
   12      02           5       Transferred    Other
   13      02           6       Transferred    Ringtones Help
   14      02           2       Transferred    Other Feature Help
   15      02           7       Resolved       Account Status
   16      02           1       Resolved       Password Question
   17      02           5       Resolved       Other Feature Help
   18      02           5       Resolved       Address Change
   19      02           4       Resolved       Text Msg Help
   20      03           7       Resolved       Password Question
   21      03           8       Resolved       Start Service
   22      03          12       Resolved       Start Service
   23      03           2       Resolved       Start Service
   24      03           5       Transferred    Other
   25      03           3       Resolved       Other Feature Help
. . .

Resulting Data Set

Output 8.1 FEATUREQUESTIONS Data Set

  Example 8.1 FEATUREQUESTIONS Data Set Created with DATA Step

 Obs   callhour   calllength     status            reason

   1      00          10       Resolved      Text Msg Help
   2      00           2       Transferred   Other Feature Help
   3      01           4       Resolved      Voice Mail Help
   4      01           3       Resolved      Other Feature Help
   5      02           6       Transferred   Ringtones Help
   6      02           2       Transferred   Other Feature Help
   7      02           5       Resolved      Other Feature Help
   8      02           4       Resolved      Text Msg Help
   9      03           3       Resolved      Other Feature Help
  10      03           1       Resolved      Other Feature Help
  11      03           2       Resolved      Voice Mail Help
  12      03           4       Resolved      Other Feature Help
  13      03           5       Resolved      Ringtones Help
  14      03           8       Transferred   Text Msg Help
  15      03           2       Resolved      Other Feature Help
  16      03           2       Resolved      Ringtones Help
  17      04           3       Resolved      Text Msg Help
  18      04           1       Transferred   Other Feature Help
  19      04           5       Transferred   Other Feature Help
  20      04           3       Transferred   Text Msg Help
  21      04           3       Resolved      Text Msg Help
  22      04          10       Resolved      Text Msg Help
  23      04           1       Resolved      Other Feature Help
  24      04           4       Resolved      Voice Mail Help
  25      04           5       Resolved      Voice Mail Help
. . .


Example Overview

This example demonstrates how to efficiently create a subset of a data set by using the WHERE statement. This technique is efficient because a WHERE statement evaluates observations before the SET statement executes. Only those observations that satisfy the conditions in the WHERE statement are moved into the Program Data Vector (PDV).

Data set CELLPHONES contains 10,000 cell phone support records. The goal is to select all observations where the reason entered for the technical support call contains the text "Help".

Note that variations on the specification of the text, such as all lowercase, are not checked by this example's WHERE statement. It is common to uppercase a character variable when specifying a condition so that all possible spellings can be found. However, if you add functions to your WHERE statement, you will slow down the processing of the DATA step and reduce the advantage of using WHERE processing.

Program

Create data set FEATUREQUESTIONS. Read observations from CELLPHONES. Specify that the SET statement read only those observations that satisfy the condition in the WHERE statement.

data featurequestions;

  set cellphones;

  where reason contains "Help";


run;

Related Technique

The PROC SQL step in this related technique creates a table equivalent to the data set that was created by the DATA step in the main example. This example selects all columns from CELLPHONES, including the column REASON that is specified on the WHERE clause.

Although not demonstrated in this example, you can specify columns on your WHERE clause that are not in the columns specified on the SELECT clause.

Create table FEATUREQUESTIONS. Select all columns from table CELLPHONES. Read only those rows that satisfy the condition in the WHERE statement.

proc sql;
  create table featurequestions as
    select * from cellphones

      where reason contains "Help";

quit;

A Closer Look

Comparing the WHERE Statement in the DATA Step and the Subsetting IF Statement

You can create subsets of your data sets in the DATA step with either the WHERE statement or the subsetting IF statement. While they both test a condition to determine whether SAS should process an observation, only the WHERE statement can prevent observations from being read into the Program Data Vector (PDV), thereby potentially yielding significant savings in processing time.

WHERE Statement in the DATA Step: A WHERE statement tests the condition before an observation is read into the PDV. If the condition is true, SAS reads the observation into the PDV and processes it. If the condition is false, SAS does not read the observation into the PDV and moves on to evaluating the next observation. Preventing unneeded observations from being read into the PDV and then processed in the DATA step can save processing time. Other advantages of the WHERE statement over the subsetting IF statement include that it can be optimized with indexes on the variables that define the subsets and that more operators, such as LIKE and CONTAINS, can be used with the WHERE statement.

Subsetting IF Statement: A subsetting IF statement tests the condition after an observation is read into the PDV. If the condition is true, SAS continues processing this observation. If the condition is false, SAS discards the observation, ends the current iteration of the DATA step, and continues processing with the next observation. The subsetting IF statement does not use indexes even if they exist on the subsetting variables. A reason to use the subsetting IF statement instead of the WHERE statement is if you have complex conditions to test or complex calculations to perform that would make it difficult to write the WHERE statement.

Generally it is more efficient to use the WHERE statement, but if your data set is small (variables, or observations, or both), or if the subset is a large proportion of your data set, or both, it might not make much difference in processing time to use the subsetting IF statement. Also, if you need to use functions in your WHERE expression, the advantage of using WHERE might also diminish.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.131.100.60