Creating a Macro for Range Checking

Because range checking is such a common data cleaning task, it makes some sense to automate the procedure somewhat. Looking at Program 2-6, you see that the range checking lines all look similar except for the variable names and the low and high cutoff values. Even if you are not a macro expert, the following program should not be too difficult to understand. (For an excellent review of macro programming, I recommend two books in SAS Institute’s BBU series: SAS® Macro Programming Made Easy, by Michele M. Burlew, and Carpenter’s Complete Guide to the SAS® Macro Language, by Art Carpenter, both published by SAS Institute, Cary, NC.)

Program 2-7. Writing a Macro to List Out-of-Range Data Values
*---------------------------------------------------------------*
| Program Name: RANGE.SAS  in C:CLEANING                       |
| Purpose: Macro that takes lower and upper limits for a        |
|          numeric variable and an ID variable to print out     |
|          an exception report to the Output window.            |
| Arguments: DSN    - Data set name                             |
|            VAR    - Numeric variable to test                  |
|            LOW    - Lowest valid value                        |
|            HIGH   - Highest valid value                       |
|            IDVAR  - ID variable to print in the exception     |
|                     report                                    |
| Example: %RANGE(CLEAN.PATIENTS,HR,40,100,PATNO)               |
*---------------------------------------------------------------*;


%MACRO RANGE(DSN,VAR,LOW,HIGH,IDVAR);


   TITLE "Listing of Invalid Patient Numbers and Data Values";
   DATA _NULL_;
      SET &DSN(KEEP=&IDVAR &VAR);  1
      FILE PRINT;
      IF (&VAR LT &LOW AND &VAR NE .) OR &VAR GT &HIGH THEN
         PUT "&IDVAR:" &IDVAR  @18 "Variable:&VAR"
                               @38 "Value:" &VAR
                               @50 "out-of-range";
   RUN;


%MEND RANGE;

First, a brief explanation. A macro program is a piece of SAS code where parts of the code are substituted with variable information by the macro processor before the code is processed in the usual way by the SAS compiler. The macro in Program 2-7 is named RANGE, and it begins with a %MACRO statement and ends with a %MEND (macro end) statement. The first line of the macro contains the macro name, followed by a list of arguments. When the macro is called, the macro processor replaces each of these arguments with the values you specify. Then, in the macro program, every macro variable (that is, every variable name preceded by an ampersand (&)) is replaced by the assigned value. For example, if you want to use this macro to look for out-of-range values for heart rate in the PATIENTS data set, you would call the macro like this

%RANGE(CLEAN.PATIENTS,HR,40,100,PATNO)

The macro processor will substitute these calling arguments for the &variables in the macro program. For example, line will become:

SET CLEAN.PATIENTS(KEEP=PATNO HR);

&DSN was replaced by CLEAN.PATIENTS, &IDVAR was replaced by PATNO, and &VAR was replaced by HR. To be sure this concept is clear, (and to help you understand how the macro processor works), you can call the macro with the MPRINT option turned on. This option lists the macro generated code in the SAS Log. Here is the section of the SAS Log containing the macro generated statements when the macro is called with the above arguments:

MPRINT(RANGE):   TITLE  "Listing  of  Invalid  Patient  Numbers  and  Data
Values";
MPRINT(RANGE):   DATA _NULL_;
MPRINT(RANGE):   SET CLEAN.PATIENTS(KEEP=PATNO HR);
MPRINT(RANGE):   FILE PRINT;
MPRINT(RANGE):   IF (HR LT 40 AND HR NE .) OR HR GT 100 THEN PUT "PATNO:"
                 PATNO  @18 "Variable:HR"  @38 "Value:"  HR  @50 "out-of-
range";
MPRINT(RANGE):   RUN;

By the way, the missing semicolon at the end of the line where the macro is called is not a mistake — you don’t need it. The reason is that the macro code contains a semicolon after the last RUN statement so that an extra semicolon is unnecessary. If you like, you may put one in any way. As pointed out by Mike Zdeb, one of my reviewers, if you include the unnecessary semicolon, you can change the line to a comment by placing an asterisk at the beginning.

The results from running the macro with the above calling arguments are listed next:

Listing of Invalid Patient Numbers and Data Values

PATNO:004        Variable:HR         Value:101  
 out-of-range
PATNO:008        Variable:HR         Value:210  
 out-of-range
PATNO:014        Variable:HR         Value:22   
 out-of-range
PATNO:017        Variable:HR         Value:208  
 out-of-range
PATNO:321        Variable:HR         Value:900  
 out-of-range
PATNO:020        Variable:HR         Value:10   
 out-of-range
PATNO:023        Variable:HR         Value:22   
 out-of-range


While this saves on programming time, it is not as efficient as a program that checks all the numeric variables in one DATA step. However, sometimes it is reasonable to sacrifice computer time for human time.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.119.17.64