Subsetting Data

Using a Subsetting IF Statement

The subsetting IF statement causes the DATA step to continue processing only those observations that meet the condition of the expression specified in the IF statement. The resulting SAS data set or data sets contain a subset of the original external file or SAS data set.
Syntax, subsetting IF statement:
IF expression;
expression is any valid SAS expression.
  • If the expression is true, the DATA step continues to process that observation.
  • If the expression is false, no further statements are processed for that observation, and control returns to the top of the DATA step.

Example: Subsetting IF Statement

The subsetting IF statement below selects only observations whose values for Tolerance are D. It is positioned in the DATA step for efficiency: other statements do not need to process unwanted observations.
data work.stresstest;
  set cert.tests;
  if tolerance='D';
    TotalTime=(timemin*60)+timesec;
run;
proc print data=work.stresstest;
run;
Because Tolerance is a character variable, the value D must be enclosed in quotation marks, and it must be the same case as in the data set.
Notice that, in the output below, only the values where Tolerance contains the value of D are displayed and TotalTime was calculated.
Output 9.4 Subsetted Data of Work.StressTest
Subsetted data of Work.Stresstest

Categorizing Values

Suppose you want to create a variable that categorizes the length of time that a subject spends on the treadmill during a stress test. This new variable, TestLength, is based on the value of the existing variable TotalTime. The value of TestLength is assigned conditionally:
Value for TotalTime
Resulting Value for TestLength
greater than 800
Long
750 - 800
Normal
less than 750
Short
To perform an action conditionally, use an IF-THEN statement. The IF-THEN statement executes a SAS statement when the condition in the IF clause is true.
Syntax, IF-THEN statement:
IF expression THEN statement;
  • expression is any valid SAS expression.
  • statement is any executable SAS statement.

Example: IF-THEN Statement

To assign the value Long to the variable TestLength when the value of TotalTime is greater than 800, add the following IF-THEN statement to your DATA step:
data work.stresstest;
  set cert.tests;
  TotalTime=(timemin*60)+timesec; 
  retain SumSec 5400; 
  sumsec+totaltime; 
  if totaltime>800 then TestLength='Long'; 
run;
SAS executes the assignment statement only when the condition (TotalTime>800) is true. If the condition is false, the value of TestLength is missing.

Examples: Logical Operators

The following examples use IF-THEN statements with logical operators:
  • Use the AND operator to execute the THEN statement if both expressions that are linked by AND are true.
    if status='OK' and type=3 
        then Count+1; 
    if (age^=agecheck | time^=3) 
        & error=1 then Test=1;
  • Use the OR operator to execute the THEN statement if either expression that is linked by OR is true.
    if (age^=agecheck | time^=3) 
        & error=1 then Test=1; 
    if status='S' or cond='E' 
        then Control='Stop';
  • Use the NOT operator with other operators to reverse the logic of a comparison.
    if not(loghours<7500) 
        then Schedule='Quarterly'; 
    if region not in ('NE','SE') 
        then Bonus=200;
  • Character values must be specified in the same case in which they appear in the data set and must be enclosed in quotation marks.
    if status='OK' and type=3 
        then Count+1; 
    if status='S' or cond='E' 
        then Control='Stop'; 
    if not(loghours<7500) 
        then Schedule='Quarterly'; 
    if region not in ('NE','SE') 
        then Bonus=200;
Logical comparisons that are enclosed in parentheses are evaluated as true or false before they are compared to other expressions. In the example below, the OR comparison in parenthesis is evaluated before the first expression and the AND operator are evaluated.
Figure 9.3 Example of a Logical Comparison
Example of a Logical Comparison
Therefore, be careful when using the OR operator with a series of comparisons. Remember that only one comparison in a series of OR comparisons must be true to make a condition true, and any nonzero, not missing constant is always evaluated as true. Therefore, the following subsetting IF statement is always true:
if x=1 or 2;
SAS first evaluates x=1, and the result can be either true or false. However, since the 2 is evaluated as nonzero and not missing (true), the entire expression is true. In this statement, however, the condition is not necessarily true because either comparison can be evaluated as true or false:
if x=1 or x=2;
Note: Both sides of the OR must contain complete expressions.

Providing an Alternative Action

Suppose you want to assign a value to TestLength based on the other possible values of TotalTime. One way to do this is to add IF-THEN statements for the other two conditions.
    if totaltime>800 then TestLength='Long';
    if 750<=totaltime<=800 then TestLength='Normal';
    if totaltime<750 then TestLength='Short';
However, when the DATA step executes, each IF statement is evaluated in order, even if the first condition is true. This wastes system resources and slows the processing of your program.
Instead of using a series of IF-THEN statements, you can use the ELSE statement to specify an alternative action to be performed when the condition in an IF-THEN statement is false. As shown below, you can write multiple IF-THEN/ELSE statements to specify a series of mutually exclusive conditions.
    if totaltime>800 then TestLength='Long';
      else if 750<=totaltime<=800 then TestLength='Normal';
      else if totaltime<750 then TestLength='Short';
The ELSE statement must immediately follow the IF-THEN statement in your program. An ELSE statement executes only if the previous IF-THEN/ELSE statement is false.
Syntax, ELSE statement:
ELSE statement;
statement is any executable SAS statement, including another IF-THEN statement.
To assign a value to TestLength when the condition in your IF-THEN statement is false, you can add the ELSE statement to your DATA step:
data work.stresstest;
  set cert.tests;
  TotalTime=(timemin*60)+timesec;
  retain SumSec 5400;
  sumsec+totaltime;
  length TestLength $6;
  if totaltime>800 then TestLength='Long'; 
      else if 750<=totaltime<=800 then TestLength='Normal'; 
      else if totaltime<750 then TestLength='Short'; 
run;
proc print data=work.stresstest;
run;
For greater efficiency, construct your IF-THEN/ELSE statements with conditions of decreasing probability.
Tip
You can use PUT statements to test your conditional logic.
if totaltime>800 then TestLength='Long'; 
   else if 750<=totaltime<=800 then TestLength='Normal'; 
      else put 'NOTE: Check this Length: ' totaltime=; 
run;

Deleting Unwanted Observations

You can specify any executable SAS statement in an IF-THEN statement. For example, you can use an IF-THEN statement with a DELETE statement to determine which observations to omit as you read data.
Syntax, DELETE statement:
DELETE;
To conditionally execute a DELETE statement, use the following syntax for an IF statement:
IF expression THEN DELETE;
The expression is evaluated as follows:
  • If it is true, execution stops for that observation. The DELETE statement deletes the observation from the output data set, and control returns to the top of the DATA step.
  • If it is false, the DELETE statement does not execute, and processing continues with the next statement in the DATA step.

Example: IF-THEN and DELETE Statements

In the following example, the IF-THEN and DELETE statements omit any observations whose values for RestHR are below 70.
data work.stresstest;
  set cert.tests;
  if resthr<70 then delete;
  TotalTime=(timemin*60)+timesec;
  retain SumSec 5400;
  sumsec+totaltime;
  length TestLength $6;
  if totaltime>800 then TestLength='Long'; 
    else if 750<=totaltime<=800 then TestLength='Normal'; 
    else if totaltime<750 then TestLength='Short'; 
run;
proc print data=work.stresstest;
run;
Output 9.5 Values for RestHR Less Than 70 Are Not in the Output (partial output)
Partial Output: Values for RestHR Less Than 70 Are Not In The Output

Selecting Variables

You might want to read and process variables that you do not want to keep in your output data set. In this case, use the DROP= and KEEP= data set options to specify the variables to drop or keep.
Use the KEEP= option instead of the DROP= option if more variables are dropped than kept.
Syntax, DROP=, and KEEP= data set options:
(DROP=variable(s))
(KEEP=variable(s))
  • The DROP= or KEEP= options, in parentheses, follow the names of the data sets that contain the variables to be dropped or kept.
  • variable(s) identifies the variables to drop or keep.

Example: DROP Data Set Option

Suppose you want to use theTimeMin and TimeSec variables to calculate the total time in the TotalTime variable, but you do not want to keep them in the output data set. You want to keep only the TotalTime variable. When you use the DROP data set option, the TimeMin and TimeSec variables are not written to the output data set:
data work.stresstest (drop=timemin timesec);
  set cert.tests;
  if resthr<70 then delete;
  TotalTime=(timemin*60)+timesec;
  retain SumSec 5400;
  sumsec+totaltime;
  length TestLength $6;
  if totaltime>800 then TestLength='Long'; 
    else if 750<=totaltime<=800 then TestLength='Normal'; 
      else if totaltime<750 then TestLength='Short'; 
run;
proc print data=work.stresstest;
run;
Output 9.6 StressTest Data Set with Dropped Variables (partial output)
Partial Output: StressTest Data Set with Dropped Variables
Another way to exclude variables from a data set is to use the DROP statement or the KEEP statement. Like the DROP= and KEEP= data set options, these statements drop or keep variables. However, the DROP and KEEP statements differ from the DROP= and KEEP= data set options in the following ways:
  • You cannot use the DROP and KEEP statements in SAS procedure steps.
  • The DROP and KEEP statements apply to all output data sets that are named in the DATA statement. To exclude variables from some data sets but not from others, use the DROP= and KEEP= data set options in the DATA statement.
The KEEP statement is similar to the DROP statement, except that the KEEP statement specifies a list of variables to write to output data sets. Use the KEEP statement instead of the DROP statement if the number of variables to keep is smaller than the number to drop.
Syntax, DROP, and KEEP statements:
DROP variable(s);
KEEP variable(s);
variable(s) identifies the variables to drop or keep.

Example: Using the DROP Statement

The following example uses the DROP statement to drop unwanted variables.
data work.stresstest;
  set cert.tests;
  if tolerance='D';
  drop timemin timesec;
  TotalTime=(timemin*60)+timesec;
  retain SumSec 5400;
  sumsec+totaltime;
  length TestLength $6;
  if totaltime>800 then TestLength='Long'; 
    else if 750<=totaltime<=800 then TestLength='Normal'; 
    else if totaltime<750 then TestLength='Short'; 
run;
proc print data=work.stresstest;
run;
Last updated: August 23, 2018
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.226.88.110