Using Formats to Check for Invalid Values

Just as you did with character values in Chapter 1, you can use user-defined formats to check for out-of-range data values. Program 2-8 uses formats to find invalid data values, based on the same ranges used in Program 2-5 in this chapter.

Program 2-8. Detecting Out-of-Range Values Using User-Defined Formats
PROC FORMAT;
   VALUE HR_CK  40-100, . = 'OK';
   VALUE SBP_CK 80-200, . = 'OK';
   VALUE DBP_CK 60-120, . = 'OK';
RUN;


DATA _NULL_;
   INFILE "C:\CLEANING\PATIENTS.TXT" PAD;
   FILE PRINT; ***Send output to the Output window;
   TITLE "Listing of Invalid Patient Numbers and Data Values";
   ***Note: We will only input those variables of interest;
   INPUT @1  PATNO    $3.
         @15 HR        3.
         @18 SBP       3.
         @21 DBP       3.;
   IF PUT(HR,HR_CK.)   NE 'OK' THEN PUT PATNO= HR=;
   IF PUT(SBP,SBP_CK.) NE 'OK' THEN PUT PATNO= SBP=;
   IF PUT(DBP,DBP_CK.) NE 'OK' THEN PUT PATNO= DBP=;
RUN;

This is a fairly simple and efficient program. The user-defined formats HR_CK., SBP_CK., and DBP_CK. all assign the formatted value 'OK' to any data value in the acceptable range. In the DATA step, the PUT function returns the value of its first argument (the variable to be tested) formatted by the format named in its second argument. For example, any value of heart rate between 40 and 100 (or missing) falls into the format range labeled 'OK'. A heart rate of 22 is neither between 40 and 100 nor missing, so the formatted value 'OK' is not assigned. In that case, the PUT function for heart rate does not return 'OK' and the IF statement condition is true. The corresponding PUT statement then executes and writes the invalid value to the print file.
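To see the PUT function test in isolation, here is a minimal sketch that applies HR_CK. to a handful of values. The test values in the DO loop are hypothetical and are not taken from PATIENTS.TXT; the results are written to the SAS log rather than to the Output window.

```sas
PROC FORMAT;
   VALUE HR_CK 40-100, . = 'OK';
RUN;

DATA _NULL_;
   ***Hypothetical test values, not taken from PATIENTS.TXT;
   DO HR = 40, 72, 100, ., 22, 101;
      ***Values in 40-100 and missing values format as OK;
      IF PUT(HR,HR_CK.) NE 'OK' THEN PUT HR= "is out of range";
      ELSE PUT HR= "is OK";
   END;
RUN;
```

Under these assumptions, only the last two values (22 and 101) should be reported as out of range, because they match none of the ranges in the format.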

Output from this program is shown next:

Listing of Invalid Patient Numbers and Data Values

PATNO=004  HR=101
PATNO=008  HR=210
PATNO=009  SBP=240
PATNO=009  DBP=180
PATNO=010  SBP=40
PATNO=011  SBP=300
PATNO=011  DBP=20
PATNO=014  HR=22
PATNO=017  HR=208
PATNO=321  HR=900
PATNO=321  SBP=400
PATNO=321  DBP=200
PATNO=020  HR=10
PATNO=020  SBP=20
PATNO=020  DBP=8
PATNO=023  HR=22
PATNO=023  SBP=34


Notice that patient number 27, who had a value of 'NA' for heart rate, did not appear in this listing. Why not? The INPUT statement generates a missing value when it attempts to read a character value with a numeric informat. Because missing values are not treated as errors in this example, no error listing is produced for patient number 27.

It is certainly possible to distinguish invalid character values in numeric fields from true missing values. One approach is to use an enhanced numeric informat. Another is to read all of the numeric variables as character data, test the values, and then convert them to numeric for range checking. The section that follows demonstrates how a user-defined enhanced numeric informat can be used.

A simple work-around for Program 2-8 is to test the automatic variable _ERROR_, which is set to 1 any time the INPUT statement detects an error such as a character value in a numeric field. Unfortunately, _ERROR_ alone cannot tell you which variable for patient number 27 contained the invalid value. A modified version of Program 2-8, shown below, prints a notification that one or more variables for a patient had an invalid character value.

Program 2-9. Modifying the Previous Program to Detect Invalid (Character) Data Values
DATA _NULL_;
   INFILE "C:\CLEANING\PATIENTS.TXT" PAD;
   FILE PRINT; ***Send output to the Output window;
   TITLE "Listing of Invalid Patient Numbers and Data Values";
   ***Note: We will only input those variables of interest;
   INPUT @1  PATNO    $3.
         @15 HR        3.
         @18 SBP       3.
         @21 DBP       3.;
   IF PUT(HR,HR_CK.)   NE 'OK' OR _ERROR_ GT 0 THEN PUT PATNO= HR=;
   IF PUT(SBP,SBP_CK.) NE 'OK' OR _ERROR_ GT 0 THEN PUT PATNO= SBP=;
   IF PUT(DBP,DBP_CK.) NE 'OK' OR _ERROR_ GT 0 THEN PUT PATNO= DBP=;
   IF _ERROR_ GT 0 THEN
      PUT PATNO= "had one or more invalid character values";
   ***Set the Error flag back to 0;
   _ERROR_ = 0;
RUN;
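
The other approach mentioned earlier, reading a numeric field as character data, testing it, and then converting it, can be sketched roughly as follows. This is only an illustration under stated assumptions: the variable name C_HR and the in-stream records are hypothetical, only heart rate is checked, and messages go to the SAS log. The ?? modifier in the INPUT function suppresses the invalid-data note and keeps _ERROR_ from being set when the conversion fails.

```sas
PROC FORMAT;
   VALUE HR_CK 40-100, . = 'OK';
RUN;

DATA _NULL_;
   ***Hypothetical records using only the PATNO and HR columns;
   ***C_HR is an assumed name for the raw text of the HR field;
   INPUT @1  PATNO $3.
         @15 C_HR  $3.;
   ***A nonblank field that will not convert is an invalid value;
   IF NOT MISSING(C_HR) AND INPUT(C_HR,?? 3.) = . THEN
      PUT PATNO= C_HR= "is not a valid numeric value";
   ELSE DO;
      ***Blank or numeric text so convert and apply the range check;
      HR = INPUT(C_HR,?? 3.);
      IF PUT(HR,HR_CK.) NE 'OK' THEN PUT PATNO= HR=;
   END;
DATALINES;
001           088
014           022
027            NA
030
;
```

Unlike the _ERROR_ work-around, this version identifies which field held the invalid text, at the cost of an extra character variable and an explicit conversion for each numeric field. Note that in this sketch a field containing only a period would also be flagged as invalid rather than treated as missing.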
