Selecting Observations for a New SAS Data Set

Deleting Observations Based on a Condition

There are two ways to select specific observations in a SAS data set when you create a new SAS data set:
  1. Delete the observations that do not meet a condition, keeping only the ones that you want.
  2. Accept only the observations that meet a condition.
To delete an observation, first identify it with an IF condition, and then use a DELETE statement in the THEN clause:
IF condition THEN DELETE;
Processing the DELETE statement for an observation causes SAS to return immediately to the beginning of the DATA step for a new observation without writing the current observation to the output DATA set. The DELETE statement does not include the observation in the output data set, but it does not delete the observation from the input data set. For example, the following statement deletes observations that contain a missing value for LandCost:
if LandCost=. then delete;
The following DATA step includes this statement:
data remove;
   set mylib.arts;
   if LandCost=. then delete;
run;

proc print data=remove;
   title 'Tours With Complete Land Costs';
run;
The following output displays the results.
Display 11.2 Deleting Observations That Have a Particular Value
Deleting Observations That Have a Particular Value
Warsaw, the observation that is missing a value for LandCost, is not included in the resulting data set, REMOVE.
You can also delete observations as you enter data from an external file. The following DATA step produces the same SAS data set as the REMOVE data set.
data remove2;
   infile 'input-file' truncover;
   input City $ 1-9 Nights 11 LandCost 13-16 Budget $ 18-23
         TourGuide $ 25-32;
   if LandCost=. then delete;
run;

proc print data=remove2;
   title 'Tours With Complete Land Costs';
run;
The following output displays the results.
Display 11.3 Deleting Observations While Reading from an External File
Deleting Observations While Reading from an External File

Accepting Observations Based on a Condition

One data set that is needed by the travel agency contains observations for tours that last only six nights. One way to make the selection is to delete observations in which the value of Nights is not equal to 6:
if Nights ne 6 then delete;
A more straightforward way is to select only observations meeting the criterion. The subsetting IF statement selects the observations that you specify. It contains only a condition:
IF condition;
The implicit action in a subsetting IF statement is always the same: if the condition is true, then continue processing the observation. If it is false, then stop processing the observation and return to the top of the DATA step for a new observation. The statement is called subsetting because the result is a subset of the original observations. For example, if you want to select only observations in which the value of Nights is equal to 6, then you specify the following statement:
if Nights = 6;
The following DATA step includes the subsetting IF statement:
data subset6;
   set mylib.arts;
   if nights=6;
run; 

proc print data=subset6;
   title 'Six-Night Tours';
run;
The following output displays the results.
Display 11.4 Selecting Observations with a Subsetting IF Statement
Selecting Observations with a Subsetting IF Statement
Two observations met the criteria for a six-night tour.

Comparing the DELETE and Subsetting IF Statements

These are the main reasons to consider when choosing between a DELETE statement and a subsetting IF statement:
  • It is usually easier to choose the statement that requires the fewest comparisons to identify the condition.
  • It is usually easier to think in positive terms than negative ones (this favors the subsetting IF).
One additional situation favors the subsetting IF: it is the safer method to use if your data has missing or misspelled values. Consider the following situation.
Tradewinds Travel needs a SAS data set of low-priced to medium-priced tours. Knowing that the values of Budget are Low, Medium, and High, a first thought would be to delete observations with a value of High. The following program creates a SAS data set by deleting observations that have a Budget value of HIGH:
   /* first attempt */
data lowmed;
   set mylib.arts;
   if upcase(Budget)='HIGH' then delete;
run; 

proc print data=lowmed;
   title 'Medium and Low Priced Tours';
run;
The following output displays the results.
Display 11.5 Producing a Subset by Deletion
Producing a Subset by Deletion
The data set LOWMED contains both the tours that you want and the tour to Warsaw. The inclusion of the tour to Warsaw is erroneous because the value of Budget for the Warsaw observation is missing. Using a subsetting IF statement ensures that the data set contains exactly the observations that you want. This DATA step creates the subset with a subsetting IF statement:
   /* a safer method */
data lowmed2;
   set mylib.arts;
   if upcase(Budget)='MEDIUM' or upcase(Budget)='LOW';
run;

proc print data=lowmed2;
   title 'Medium and Low Priced Tours';
run;
The following output displays the results.
Display 11.6 Producing an Exact Subset with the Subsetting IF Statement
Producing an Exact Subset with the Subsetting IF Statement
The result is a SAS data set with no missing values for Budget.
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
13.59.174.202