Just as you did with character values in Chapter 1, you can use user-defined formats to check for out-of-range data values. Program 2-8 uses formats to find invalid data values, based on the same ranges used in Program 2-5 in this chapter.
PROC FORMAT;
   VALUE HR_CK  40-100, . = 'OK';
   VALUE SBP_CK 80-200, . = 'OK';
   VALUE DBP_CK 60-120, . = 'OK';
RUN;

DATA _NULL_;
   INFILE "C:CLEANINGPATIENTS.TXT" PAD;
   FILE PRINT; ***Send output to the Output window;
   TITLE "Listing of Invalid Patient Numbers and Data Values";
   ***Note: We will only input those variables of interest;
   INPUT @1  PATNO $3.
         @15 HR     3.
         @18 SBP    3.
         @21 DBP    3.;
   IF PUT(HR,HR_CK.)   NE 'OK' THEN PUT PATNO= HR=;
   IF PUT(SBP,SBP_CK.) NE 'OK' THEN PUT PATNO= SBP=;
   IF PUT(DBP,DBP_CK.) NE 'OK' THEN PUT PATNO= DBP=;
RUN;
This is a fairly simple and efficient program. The user-defined formats HR_CK., SBP_CK., and DBP_CK. all assign the formatted value 'OK' to any data value in the acceptable range. In the DATA step, the PUT function returns the value of its first argument (the variable to be tested) formatted by the format specified as its second argument. For example, any value of heart rate between 40 and 100 (or missing) falls into the format range 'OK'. A value of 22 for heart rate is neither in the range of 40 to 100 nor missing, so the formatted value 'OK' is not assigned. In that case, the PUT function for heart rate does not return the value 'OK' and the IF statement condition is true. The appropriate PUT statement is then executed and the invalid value is written to the print file.
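To see what the PUT function returns in each case, a minimal sketch along the following lines may help. The three heart rates in the DATALINES block and the variable name RESULT are illustrative assumptions, not part of Program 2-8, and the PUT statement here writes to the SAS log rather than to the print file.

```sas
PROC FORMAT;
   VALUE HR_CK 40-100, . = 'OK';
RUN;

DATA _NULL_;
   ***Illustrative values only: in range, out of range, and missing;
   INPUT HR;
   ***RESULT holds HR as formatted by HR_CK.;
   RESULT = PUT(HR,HR_CK.);
   PUT HR= RESULT=;
DATALINES;
72
22
.
;
```

For 72 and for the missing value, RESULT should be OK. For 22, no format range applies, so RESULT is something other than OK, which is exactly the condition the IF statements in Program 2-8 test for.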
Output from this program is shown next:
Listing of Invalid Patient Numbers and Data Values

PATNO=004 HR=101
PATNO=008 HR=210
PATNO=009 SBP=240
PATNO=009 DBP=180
PATNO=010 SBP=40
PATNO=011 SBP=300
PATNO=011 DBP=20
PATNO=014 HR=22
PATNO=017 HR=208
PATNO=321 HR=900
PATNO=321 SBP=400
PATNO=321 DBP=200
PATNO=020 HR=10
PATNO=020 SBP=20
PATNO=020 DBP=8
PATNO=023 HR=22
PATNO=023 SBP=34
Notice that patient number 27, who had a value of 'NA' for heart rate, did not appear in this listing. Why not? The INPUT statement generates a missing value when it attempts to read a character value with a numeric informat. Because missing values are not treated as errors in this example, no error listing is produced for patient number 27.

It is certainly possible to distinguish invalid character values in numeric fields from true missing values. One approach is to use an enhanced numeric informat. Another is to read all of the numeric variables as character data, test the values, and then convert them to numeric for range checking. The section that follows demonstrates how a user-defined enhanced numeric informat can be used.

A simple work-around for Program 2-8 is to test the internal variable _ERROR_, which is set to 1 any time the input processor detects an error such as a character value in a numeric field. Unfortunately, this approach cannot tell which variable for patient number 27 contained the invalid value. A modified version of Program 2-8, shown below, prints a notification that one or more variables for a patient had an invalid character value.
DATA _NULL_;
   INFILE "C:CLEANINGPATIENTS.TXT" PAD;
   FILE PRINT; ***Send output to the Output window;
   TITLE "Listing of Invalid Patient Numbers and Data Values";
   ***Note: We will only input those variables of interest;
   INPUT @1  PATNO $3.
         @15 HR     3.
         @18 SBP    3.
         @21 DBP    3.;
   IF PUT(HR,HR_CK.)   NE 'OK' OR _ERROR_ GT 0 THEN PUT PATNO= HR=;
   IF PUT(SBP,SBP_CK.) NE 'OK' OR _ERROR_ GT 0 THEN PUT PATNO= SBP=;
   IF PUT(DBP,DBP_CK.) NE 'OK' OR _ERROR_ GT 0 THEN PUT PATNO= DBP=;
   IF _ERROR_ GT 0 THEN
      PUT PATNO= "had one or more invalid character values";
   ***Set the Error flag back to 0;
   _ERROR_ = 0;
RUN;
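The other alternative mentioned earlier, reading a numeric field as character data, testing it, and then converting it, can identify exactly which variable held the bad value. The following is only a rough sketch of that idea for heart rate, not the approach the text goes on to develop. The variable name C_HR, the simplified column layout, and the four DATALINES records are assumptions made for illustration. The ?? modifier in the INPUT function suppresses the invalid-data message and keeps _ERROR_ from being set.

```sas
PROC FORMAT;
   VALUE HR_CK 40-100, . = 'OK';
RUN;

DATA _NULL_;
   FILE PRINT;
   ***Hypothetical layout: PATNO in columns 1-3, heart rate in 5-7;
   INPUT @1 PATNO $3.
         @5 C_HR  $3.;
   ***Non-blank text that cannot be converted is an invalid character value;
   IF C_HR NE ' ' AND INPUT(C_HR,?? 3.) EQ . THEN
      PUT PATNO= C_HR= "is not a valid number";
   ELSE DO;
      ***Convert to numeric, then apply the usual range check;
      HR = INPUT(C_HR,?? 3.);
      IF PUT(HR,HR_CK.) NE 'OK' THEN PUT PATNO= HR=;
   END;
DATALINES;
001  72
014  22
027  NA
030
;
```

With these sample records, the value NA should be reported as a non-numeric value, 22 as an out-of-range value, and the blank field treated as an ordinary missing value. A field containing only a period would also be reported as non-numeric under this test, which may or may not be what you want.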