Using Informats to Check for Invalid Values

You can accomplish the same result as the previous program by using user-defined informats. Remember that informats are applied as the raw data is being read in (or when they are used as the second argument of the INPUT function), so a user-defined informat can replace values before they are ever stored. The following program is very similar to Program 2-9; however, this one uses informats instead of formats.

Program 2-10. Using User-Defined Informats to Detect Out-of-Range Data Values
PROC FORMAT;
   INVALUE HR_CK  40-100, . = 9999;
   INVALUE SBP_CK 80-200, . = 9999;
   INVALUE DBP_CK 60-120, . = 9999;
RUN;


DATA _NULL_;
   INFILE "C:\CLEANING\PATIENTS.TXT" PAD;
   FILE PRINT; ***Send output to the Output window;
   TITLE "Listing of Invalid Patient Numbers and Data Values";
   ***Note: We will only input those variables of interest;
   INPUT @1  PATNO    $3.
         @15 HR        HR_CK3.
         @18 SBP       SBP_CK3.
         @21 DBP       DBP_CK3.;
   IF HR NE 9999 THEN PUT PATNO= HR=;
   IF SBP NE 9999 THEN PUT PATNO= SBP=;
   IF DBP NE 9999 THEN PUT PATNO= DBP=;
RUN;

PROC FORMAT is used to create three informats (note the use of INVALUE statements instead of the usual VALUE statements). For the informat HR_CK, any numeric value in the range 40 to 100, or a missing value, is assigned a value of 9999. A value that does not fall into any of the listed ranges is read as an ordinary number, so an out-of-range heart rate keeps its original value and is listed by the PUT statement. Note that you cannot assign a character value here because the result of a numeric informat must be numeric. In this example, 9999 is a good choice because it can never be a valid value for any of the variables (each is stored in three columns in the input file, so the largest value that can be read is 999).
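Because a user-defined informat can also be used as the second argument of the INPUT function, one way to see what HR_CK does is to feed it a few character strings directly. The following is a minimal sketch, not part of the original program; the test strings and the variable name RAW are hypothetical. Per the explanation above, a value of 75 should come back as 9999, while 101 and 22 should be returned unchanged as numbers.

```sas
*Sketch only: the test strings below are hypothetical, not from PATIENTS.TXT;
PROC FORMAT;
   INVALUE HR_CK  40-100, . = 9999;
RUN;

DATA _NULL_;
   FILE PRINT; ***Send output to the Output window;
   LENGTH RAW $3;
   DO RAW = '75', '101', '22';
      ***Apply the user-defined informat with the INPUT function;
      HR = INPUT(RAW, HR_CK3.);
      PUT RAW= HR=;
   END;
RUN;
```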

Running Program 2-10 results in the following output:

Listing of Invalid Patient Numbers and Data Values

PATNO=004  HR=101
PATNO=008  HR=210
PATNO=009  SBP=240
PATNO=009  DBP=180
PATNO=010  SBP=40
PATNO=011  SBP=300
PATNO=011  DBP=20
PATNO=014  HR=22
PATNO=017  HR=208
PATNO=321  HR=900
PATNO=321  SBP=400
PATNO=321  DBP=200
PATNO=020  HR=10
PATNO=020  SBP=20
PATNO=020  DBP=8
PATNO=023  HR=22
PATNO=023  SBP=34
PATNO=027  HR=.


If you look carefully at the output from this program and from the earlier program that used user-defined formats, you will notice that patient number 027 is listed here with a missing heart rate but does not appear in the earlier listing. What's going on? Inspection of the raw data shows a value of 'NA' for heart rate for patient number 027. When you used a format, the original data value of 'NA' was converted to a numeric missing value as the data were read (with a resulting message written to the Log). The result of the PUT function was therefore 'OK', and the value was not flagged as invalid. When you used an informat, the value 'NA' did not fall into the range 40 to 100 and was not a missing value, so it was not assigned the value 9999. Instead, it was read with the standard numeric informat, which could not convert it and produced a missing value. Because a missing value is not equal to 9999, the value was flagged as invalid. If you would like to go the "extra mile," you can use an enhanced numeric informat to assign a specific numeric value to any alphabetic value. With this technique, you can distinguish invalid character values from true missing values. Program 2-11 demonstrates this.
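Before turning to Program 2-11, the difference between a true missing value and 'NA' under the HR_CK informat can be seen with a small sketch. It repeats the HR_CK definition from Program 2-10 so it can run on its own; the variable RAW and the two test strings are hypothetical. Following the explanation above, the raw value '.' should be mapped to 9999 and accepted, whereas 'NA' should fall through to the standard numeric informat, become missing (with a note written to the Log), and be flagged.

```sas
*Sketch only: HR_CK repeated from Program 2-10, test strings hypothetical;
PROC FORMAT;
   INVALUE HR_CK  40-100, . = 9999;
RUN;

DATA _NULL_;
   FILE PRINT;
   LENGTH RAW $3;
   DO RAW = '.', 'NA';
      HR = INPUT(RAW, HR_CK3.);
      ***Same test as Program 2-10: anything other than 9999 is flagged;
      IF HR NE 9999 THEN PUT 'Flagged:  ' RAW= HR=;
      ELSE PUT 'Accepted: ' RAW= HR=;
   END;
RUN;
```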

Program 2-11. Modifying the Previous Program to Detect Invalid (Character) Data Values
PROC FORMAT;
   INVALUE HR_CK (UPCASE)
                  40 - 100, .  = 9999
                 'A' - 'Z'     = 8888;
   INVALUE SBP_CK (UPCASE)
                  80 - 200, .  = 9999
                  'A' - 'Z'    = 8888;
   INVALUE DBP_CK (UPCASE)
                  60 - 120, .  = 9999
                  'A' - 'Z'    = 8888;
RUN;


DATA _NULL_;
   INFILE "C:\CLEANING\PATIENTS.TXT" PAD;
   FILE PRINT; ***Send output to the Output window;
   TITLE "Listing of Invalid Patient Numbers and Data Values";
   ***Note: We will only input those variables of interest;
   INPUT @1  PATNO    $3.
         @15 HR        HR_CK3.
         @18 SBP       SBP_CK3.
         @21 DBP       DBP_CK3.;
   IF HR = 8888 THEN PUT PATNO= "Invalid character value for HR";
   ELSE IF HR NE 9999 THEN PUT PATNO= HR=;


   IF SBP = 8888 THEN PUT PATNO= "Invalid character value for SBP";
   ELSE IF SBP NE 9999 THEN PUT PATNO= SBP=;


   IF DBP = 8888 THEN PUT PATNO= "Invalid character value for DBP";
   ELSE IF DBP NE 9999 THEN PUT PATNO= DBP=;
RUN;

The UPCASE option converts any character value to uppercase before SAS determines whether the value fits into one of the specified ranges. Notice that each of the three informats contains both a numeric range and a character range. This feature, called an enhanced numeric informat, is very powerful and allows a program to read a combination of numeric and character data with a single informat (see SAS Technical Report P-222, Changes and Enhancements to Base SAS Software, Release 6.07). Notice in the following output that patient number 027 is reported to have an invalid character value for heart rate.

  Listing of Invalid Patient Numbers and Data Values

  PATNO=004 HR=101
  PATNO=008 HR=210
  PATNO=009 SBP=240
  PATNO=009 DBP=180
  PATNO=010 SBP=40
  PATNO=011 SBP=300
  PATNO=011 DBP=20
  PATNO=014 HR=22
  PATNO=017 HR=208
  PATNO=321 HR=900
  PATNO=321 SBP=400
  PATNO=321 DBP=200
  PATNO=020 HR=10
  PATNO=020 SBP=20
  PATNO=020 DBP=8
  PATNO=023 HR=22
  PATNO=023 SBP=34
  PATNO=027 Invalid character value for HR
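To see the effect of the UPCASE option on its own, you could run the enhanced HR_CK informat from Program 2-11 against a handful of test strings. This is a minimal sketch; the variable RAW and the strings are hypothetical. Because UPCASE converts 'na' to 'NA' before the ranges are checked, both spellings should fall into the 'A' - 'Z' range and be returned as 8888, while numeric strings are handled exactly as in Program 2-10.

```sas
*Sketch only: the test strings below are hypothetical;
PROC FORMAT;
   INVALUE HR_CK (UPCASE)
                  40 - 100, .  = 9999
                  'A' - 'Z'    = 8888;
RUN;

DATA _NULL_;
   FILE PRINT;
   LENGTH RAW $3;
   DO RAW = 'NA', 'na', '85', '210';
      HR = INPUT(RAW, HR_CK3.);
      ***Same three-way test as Program 2-11;
      IF HR = 8888 THEN PUT RAW= 'Invalid character value';
      ELSE IF HR NE 9999 THEN PUT RAW= HR=;
      ELSE PUT RAW= 'OK';
   END;
RUN;
```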

