Subsetting Data

Deleting Unwanted Observations

You can specify any executable SAS statement in an IF-THEN statement. For example, you can use an IF-THEN statement with a DELETE statement to determine which observations to omit as you read data.
Syntax, DELETE statement:
DELETE;
To conditionally execute a DELETE statement, use the following syntax for an IF statement:
IF expression THEN DELETE;
The expression is evaluated as follows:
  • If it is true, the DELETE statement executes: SAS stops processing the current observation, does not write it to the output data set, and returns to the top of the DATA step to begin the next iteration.
  • If it is false, the DELETE statement does not execute, and processing continues with the next statement in the DATA step.

Example: IF-THEN and DELETE Statements

The IF-THEN and DELETE statements below omit any observations whose values for RestHR are lower than 70.
data clinic.stress; 
   infile tests; 
   input ID $ 1-4 Name $ 6-25 RestHR 27-29 MaxHR 31-33
         RecHR 35-37 TimeMin 39-40 TimeSec 42-43
         Tolerance $ 45; 
   if resthr<70 then delete;
   TotalTime=(timemin*60)+timesec; 
   retain SumSec 5400; 
   sumsec+totaltime; 
   length TestLength $ 6; 
   if totaltime>800 then testlength='Long'; 
   else if 750<=totaltime<=800 then testlength='Normal'; 
   else if totaltime<750 then TestLength='Short'; 
run;
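One detail worth keeping in mind: in a SAS comparison, a missing numeric value is lower than any number, so the condition resthr<70 is also true when RestHR is missing. The sketch below shows one way to delete only observations with a known RestHR below 70. The data set name work.hr_check and the inline data values are hypothetical and are used only for illustration.

```sas
/* Sketch under stated assumptions: work.hr_check and the data lines */
/* are made up for illustration and are not part of the clinic data. */
data work.hr_check;
   input ID $ RestHR;
   /* Delete only when RestHR is present and lower than 70. */
   /* A missing RestHR (the period in the data) is kept.    */
   if not missing(resthr) and resthr<70 then delete;
   datalines;
2458 72
2462 .
2501 68
;
run;
```

With the simpler condition if resthr<70 then delete; the observation with the missing RestHR value would be deleted as well.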

Selecting Variables

Sometimes you want to read and process variables that you do not want to keep in your output data set. In this case, use the DROP= or KEEP= data set option to specify the variables to drop or keep.
Use the KEEP= option instead of the DROP= option if more variables are dropped than kept, because the list of variables is then shorter.
Syntax, DROP= and KEEP= data set options:
(DROP=variable(s))
(KEEP=variable(s))
  • The DROP= or KEEP= options, in parentheses, follow the names of the data sets that contain the variables to be dropped or kept.
  • variable(s) identifies the variables to drop or keep.
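As a minimal sketch of the KEEP= form, the step below reads three variables, computes a fourth, and writes only two of them. The data set name work.times and the inline data values are hypothetical. Because KEEP= is applied to the output data set, TimeMin and TimeSec remain available for the calculation.

```sas
/* Sketch under stated assumptions: work.times and the data lines */
/* are made up for illustration.                                  */
data work.times(keep=id totaltime);
   input ID $ TimeMin TimeSec;
   /* TimeMin and TimeSec can still be used here. */
   TotalTime=(timemin*60)+timesec;
   datalines;
2458 12 38
2462 13 15
;
run;
```

The output data set contains only the ID and TotalTime variables.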

Example: DROP= Data Set Option

Suppose you want to use the TimeMin and TimeSec variables to calculate the total time in the TotalTime variable, but you do not want to keep them in the output data set. You want to keep only the TotalTime variable. When you use the DROP= data set option, the TimeMin and TimeSec variables are not written to the output data set. The program also uses a subsetting IF statement, so only observations whose Tolerance value is D are processed further and written:
data clinic.stress(drop=timemin timesec); 
   infile tests; 
   input ID $ 1-4 Name $ 6-25 RestHR 27-29 MaxHR 31-33
         RecHR 35-37 TimeMin 39-40 TimeSec 42-43
         Tolerance $ 45; 
   if tolerance='D'; 
   TotalTime=(timemin*60)+timesec; 
   retain SumSec 5400; 
   sumsec+totaltime; 
   length TestLength $ 6; 
   if totaltime>800 then testlength='Long'; 
   else if 750<=totaltime<=800 then testlength='Normal'; 
   else if totaltime<750 then TestLength='Short'; 
run;
Figure 11.4 Stress Data Set with Dropped Variables (partial output)
Another way to exclude variables from a data set is to use the DROP statement or the KEEP statement. Like the DROP= and KEEP= data set options, these statements drop or keep variables. However, the DROP and KEEP statements differ from the DROP= and KEEP= data set options in the following ways:
  • You cannot use the DROP and KEEP statements in SAS procedure steps.
  • The DROP and KEEP statements apply to all output data sets that are named in the DATA statement. To exclude variables from some data sets but not from others, use the DROP= and KEEP= data set options in the DATA statement.
The KEEP statement is similar to the DROP statement, except that the KEEP statement specifies a list of variables to write to output data sets. Use the KEEP statement instead of the DROP statement if the number of variables to keep is smaller than the number to drop.
Syntax, DROP and KEEP statements:
DROP variable(s);
KEEP variable(s);
variable(s) identifies the variables to drop or keep.
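The difference in scope between the statements and the data set options can be sketched with a DATA step that creates two output data sets. The names work.full and work.summary and the inline data values are hypothetical. The DROP= option here affects only the data set that it follows, whereas a DROP statement in the same step would remove the variables from both.

```sas
/* Sketch under stated assumptions: work.full, work.summary, and the */
/* data lines are made up for illustration.                          */
data work.full work.summary(drop=timemin timesec);
   input ID $ TimeMin TimeSec;
   TotalTime=(timemin*60)+timesec;
   /* No OUTPUT statement, so each observation is written to both */
   /* data sets at the end of each iteration.                     */
   datalines;
2458 12 38
2462 13 15
;
run;
```

In this sketch, work.full keeps all four variables, and work.summary keeps only ID and TotalTime.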

Example: Using the DROP= Data Set Option and DROP Statement

The two programs below produce the same results. The first program uses the DROP= data set option. The second program uses the DROP statement.
data clinic.stress(drop=timemin timesec); 
   infile tests; 
   input ID $ 1-4 Name $ 6-25 RestHR 27-29 MaxHR 31-33
         RecHR 35-37 TimeMin 39-40 TimeSec 42-43
         Tolerance $ 45; 
   if tolerance='D'; 
   TotalTime=(timemin*60)+timesec; 
   retain SumSec 5400; 
   sumsec+totaltime; 
   length TestLength $ 6; 
   if totaltime>800 then testlength='Long'; 
   else if 750<=totaltime<=800 then testlength='Normal'; 
   else if totaltime<750 then TestLength='Short'; 
run; 

data clinic.stress; 
   infile tests; 
   input ID $ 1-4 Name $ 6-25 RestHR 27-29 MaxHR 31-33
         RecHR 35-37 TimeMin 39-40 TimeSec 42-43
         Tolerance $ 45; 
   if tolerance='D'; 
   drop timemin timesec; 
   TotalTime=(timemin*60)+timesec; 
   retain SumSec 5400; 
   sumsec+totaltime; 
   length TestLength $ 6; 
   if totaltime>800 then testlength='Long'; 
   else if 750<=totaltime<=800 then testlength='Normal'; 
   else if totaltime<750 then TestLength='Short'; 
run;
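In the second program, the DROP statement appears before the assignment that uses TimeMin and TimeSec. This works because DROP is a declarative statement rather than an executable one: it only marks variables to be left out when observations are written, so its position in the DATA step does not affect the calculation. A reduced sketch of the same idea follows. The data set name work.times and the inline data values are hypothetical.

```sas
/* Sketch under stated assumptions: work.times and the data lines */
/* are made up for illustration.                                  */
data work.times;
   input ID $ TimeMin TimeSec;
   /* Declared before the variables are used; they are still */
   /* available to the assignment below.                     */
   drop timemin timesec;
   TotalTime=(timemin*60)+timesec;
   datalines;
2458 12 38
2462 13 15
;
run;
```

TotalTime is calculated from TimeMin and TimeSec as usual, and the output data set contains only ID and TotalTime.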
Last updated: January 10, 2018