Reading and Verifying the Data

Verifying the Code That Reads the Data

Before you read a complete external file, you can verify the code that reads the data by limiting the number of records that SAS reads. Adding OBS=n to the INFILE statement causes SAS to process only records 1 through n, so you can verify that the correct fields are read before reading the entire data file.
The program below reads the first 10 records in the raw data file that is referenced by the fileref Tests. The data is stored in a permanent SAS data set named sasuser.stress. The RUN statement tells SAS to execute the preceding SAS statements.
data sasuser.stress; 
   infile tests obs=10; 
   input ID $ 1-4 Name $ 6-25  
         RestHR 27-29 MaxHR 31-33 
         RecHR 35-37 TimeMin 39-40  
         TimeSec 42-43 Tolerance $ 45; 
run;
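
The DATA step above assumes that the fileref Tests has already been assigned. As a minimal sketch, a FILENAME statement such as the following could associate the fileref with the raw data file. The path shown here is hypothetical; substitute the actual location of the file in your environment.

```sas
/* Hypothetical path: replace with the real location of the raw data file. */
filename tests 'c:\data\tests.dat';
```

The fileref stays in effect for the rest of the SAS session unless it is reassigned or cleared, so the statement needs to be submitted only once before the DATA step.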

Checking DATA Step Processing

Messages in the log verify that the raw data file was read correctly. The notes in the log indicate the following:
  • 10 records were read from the raw data file.
  • The SAS data set sasuser.stress was created with 10 observations and 8 variables.
Log 6.1 SAS Log
NOTE: The infile TESTS is:
      Filename=Z:\sasuser\tests.dat,
      RECFM=V,LRECL=32767,File Size (bytes)=1722,
      Last Modified=02Feb2017:13:50:22,
      Create Time=19Dec2016:12:49:09

NOTE: 10 records were read from the infile TESTS.
      The minimum record length was 80.
      The maximum record length was 80.
NOTE: The data set SASUSER.STRESS has 10 observations
      and 8 variables.
NOTE: DATA statement used 0.07 seconds

Printing the Data Set

The messages in the log seem to indicate that the DATA step program correctly accessed the raw data file. But it is a good idea to look at the 10 observations in the new data set before reading the entire raw data file. You can submit a PROC PRINT step to view the data.
The following PROC PRINT step prints the Sasuser.Stress data set.
proc print data=sasuser.stress;
run;
The PROC PRINT output indicates that the variables in the Sasuser.Stress data set were read correctly for the first 10 records.
Figure 6.10 PROC Print Output
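
In addition to printing the data values, it can be useful to confirm that each variable was created with the intended type and length. One possible check, sketched here as a supplement and not part of the original program, is a PROC CONTENTS step, which lists the descriptor information for the data set.

```sas
/* Lists the variables in the data set with their types and lengths. */
proc contents data=sasuser.stress;
run;
```

Based on the INPUT statement, you would expect ID, Name, and Tolerance to be listed as character variables and the remaining five variables as numeric.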

Reading the Entire Raw Data File

To modify the DATA step to read the entire raw data file, remove the OBS= option from the INFILE statement and resubmit the program.
data sasuser.stress; 
   infile tests; 
   input ID $ 1-4 Name $ 6-25 
         RestHR 27-29 MaxHR 31-33 
         RecHR 35-37 TimeMin 39-40 
         TimeSec 42-43 Tolerance $ 45; 
run;

Invalid Data

The log includes a note indicating that invalid data appears for the variable RecHR in line 14 of the raw data file, columns 35-37.
This note is followed by a column ruler and the actual data line that contains the invalid value for RecHR.
Output 6.1 SAS Log
NOTE: Invalid data for RecHR in line 14 35-37.
RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----
14        2575 Quigley, M           74  152 Q13 11 26 I                                    80
ID=2575 Name=Quigley, M RestHR=74 MaxHR=152 RecHR=. TimeMin=11 TimeSec=26 Tolerance=I _ERROR_=1
_N_=14
NOTE: 21 records were read from the infile TESTS.
      The minimum record length was 80.
      The maximum record length was 80.
NOTE: The data set SASUSER.STRESS has 21 observations and 8 variables.
NOTE: DATA statement used 0.13 seconds
The value Q13 is a data-entry error. It was entered incorrectly for the variable RecHR.
RecHR is a numeric variable, but Q13 is not a valid number. So RecHR is assigned a missing value, as indicated in the log. Because RecHR is numeric, the missing value is represented with a period (RecHR=.).
Notice, though, that the DATA step does not fail as a result of the invalid data but continues to execute. Unlike syntax errors, invalid data errors do not cause SAS to stop processing a program.
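Because the DATA step completes despite the invalid data, the affected observations remain in the data set with missing values. As a rough sketch, a PROC PRINT step with a WHERE statement could be used to list the observations in which RecHR is missing. Note that this check also lists any observations where the field was simply blank in the raw data, so the log remains the authoritative record of invalid data.

```sas
/* Lists only the observations whose RecHR value is missing. */
proc print data=sasuser.stress;
   where RecHR is missing;
   var ID Name RecHR;
run;
```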
When you correct the invalid value and rerun the DATA step, the log will then show that the data set Sasuser.Stress was created with 21 observations and 8 variables. There will be no messages about invalid data.
Output 6.2 SAS Log
NOTE: The infile TESTS is:
      File Name=Z:\sasuser\tests.dat,
      RECFM=V, LRECL=256
NOTE: 21 records were read from the infile TESTS.
      The minimum record length was 80.
      The maximum record length was 80.
NOTE: The data set SASUSER.STRESS has 21 observations
      and 8 variables.
NOTE: DATA statement used 0.14 seconds
After correcting the raw data file, you can print the data again to verify that it is correct.
proc print data=sasuser.stress; 
run;
Figure 6.11 PROC Print Output
When you use the DATA step to read raw data, remember the steps that you followed in this chapter. These steps help you avoid wasting resources when accessing data:
  • write the DATA step using the OBS= option in the INFILE statement
  • submit the DATA step
  • check the log for messages
  • view the resulting data set
  • remove the OBS= option and resubmit the DATA step
  • check the log again
  • view the resulting data set again
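
One way to follow these steps without editing the INFILE statement each time is to hold the record limit in a macro variable. This is only a sketch of one possible arrangement: the macro variable name nrecs is hypothetical and does not appear in the original program. It relies on the fact that the OBS= option in the INFILE statement accepts either a record number or the keyword MAX.

```sas
/* Hypothetical macro variable: use 10 for the test run, MAX for the full read. */
%let nrecs = 10;

data sasuser.stress;
   infile tests obs=&nrecs;
   input ID $ 1-4 Name $ 6-25
         RestHR 27-29 MaxHR 31-33
         RecHR 35-37 TimeMin 39-40
         TimeSec 42-43 Tolerance $ 45;
run;

/* View the resulting data set after checking the log. */
proc print data=sasuser.stress;
run;
```

After the log and the printed output look correct for the first 10 records, changing the first line to %let nrecs = MAX; and resubmitting reads the entire raw data file.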
Last updated: January 10, 2018