Reading the Same Number of Repeating Fields

The Basics of Reading the Same Number of Repeating Fields

Each record in the file Data97 contains a sales representative's ID number, followed by four repeating fields that represent his or her quarterly sales totals for 1997. You want to pair each employee ID number with one quarterly sales total to produce a single observation. Four observations are generated from each record.
Figure 21.17 Multiple Fields for the Same ID
Raw data file and output file showing ID number and four repeating fields.
To accomplish this, you must execute the DATA step once for each record, repetitively reading and writing values in one iteration.
This means that a DATA step must do the following:
  • read the value for ID and hold the current record
  • create a new variable named Quarter to identify the fiscal quarter for each sales figure
  • read a new value for Sales and write the values to the data set as an observation
  • continue reading a new value for Sales and writing values to the data set three more times

Using the Single Trailing At-Sign (@) to Hold the Current Record

To read the value for ID and hold the record so that subsequent values for Sales can be read, submit the following code.
data perm.sales07; 
   infile data07; 
   input ID $
Figure 21.18 Holding a Record
Raw data file with the first record highlighted.
Recall that, in general, the trailing @@ holds the current record across multiple iterations of the DATA step.
However, in this case, you want to hold the record with the trailing @, so that a second INPUT statement can read the multiple sales values from a single record within the same iteration of the DATA step. Like the trailing @@, the single trailing @ can do the following:
  • It enables the next INPUT statement to continue reading from the same record.
  • It releases the current record when a subsequent INPUT statement executes without a line-hold specifier.
To distinguish between the trailing @@ and the trailing @, remember the following:
  • The double trailing at-sign (@@) holds a record across multiple iterations of the DATA step until the end of the record is reached.
  • The single trailing at-sign (@) releases a record when control returns to the top of the DATA step.
In this example, the first INPUT statement reads the value for ID and uses the trailing @ to hold the current record for the next INPUT statement in the DATA step.
data perm.sales07; 
   infile data07; 
   input ID $ @; 
   input Sales : comma. @; 
output;
Figure 21.19 Reading the Value for ID
Raw data file showing the first ID highlighted.
The second INPUT statement reads a value for Sales and holds the record. The COMMAw.d informat in the INPUT statement reads the numeric value for Sales and removes the embedded commas. An OUTPUT statement writes the observation to the SAS data set, and the DATA step continues processing.
Note: The COMMAw.d informat does not specify a w value. Remember that list input reads values until the next blank is detected. The default length of numeric variables is 8 bytes, so you do not need to specify a w value to determine the length of a numeric variable.
When all of the repeating fields have been read and written to output, control returns to the top of the DATA step, and the record is released.
data perm.sales07; 
   infile data07; 
   input ID $ @; 
   input Sales : comma. @; 
   output; 
   input Sales : comma. @; 
   output; 
   input Sales : comma. @; 
   output; 
   input Sales : comma. @; 
   output; 
run;
Figure 21.20 Reading the Value for Sales
Raw data file showing the first value for Sales highlighted.

Tips for More Efficient Programming

Each record contains four different values for the variable Sales. Therefore, the INPUT statement must execute four times. Rather than writing four INPUT statements, you can execute one INPUT statement repeatedly in an iterative DO loop.
Each time the loop executes, you need to write the values for ID, Quarter, and Sales as an observation to the data set. This is easily accomplished by using the OUTPUT statement.
data perm.sales07; 
   infile data07; 
   input ID $ @; 
   do Quarter=1 to 4; 
      input Sales : comma. @; 
      output; 
   end; 
run;
By default, every DATA step contains an implicit OUTPUT statement at the end of the step. Placing an explicit OUTPUT statement in a DATA step overrides the automatic output, and SAS adds an observation to a data set only when the explicit OUTPUT statement is executed.

Processing a DATA Step That Contains an Iterative DO Loop

Now that the program is complete, here is how SAS processes a DATA step that contains an iterative DO loop.
data perm.sales07;  
   infile data07; 
   input ID $ @; 
   do Quarter=1 to 4; 
      input Sales : comma. @; 
      output; 
   end; 
run;
During the first iteration, the value for ID is read and Quarter is initialized to 1 as the loop begins to execute.
Figure 21.21 Reading the Value for ID and Initializing Quarter
Raw data file showing the value for ID highlighted. Program Data Vector showing Quarter initialized to 1.
The INPUT statement reads the first repeating field and assigns the value to Sales in the program data vector. The @ holds the current record.
Figure 21.22 Results of the INPUT Statement
Raw data file showing the values for Quarter and Sales highlighted. Program Data Vector showing the value for Sales highlighted.
The OUTPUT statement writes the values in the program data vector to the data set as the first observation.
Figure 21.23 Results of the OUTPUT Statement
Raw data file showing the first Quarter value highlighted. Program data vector with the values for ID, Quarter, and Sales. SAS data set with the values for ID, Quarter, and Sales.
The END statement indicates the bottom of the loop, but control returns to the DO statement, not to the top of the DATA step. Now the value of Quarter is incremented to 2.
Figure 21.24 Results of the END Statement
Raw data file showing the first Quarter value highlighted. Program data vector with the values for ID, Quarter, and Sales. SAS data set with the values for ID, Quarter, and Sales.
The INPUT statement executes again, reading the second repeating field and storing the value for Sales in the program data vector.
Figure 21.25 Results of the Second Reading of the INPUT Statement
Raw data file showing the second Quarter value highlighted. Program data vector with the values for ID, Quarter, and Sales. SAS data set with the values for ID, Quarter, and Sales.
The OUTPUT statement writes the values in the program data vector as the second observation.
Figure 21.26 Results of the Second Reading of the OUTPUT Statement
Raw data file showing the second Quarter value highlighted. Program data vector with the values for ID, Quarter, and Sales. SAS data set with the values for ID, Quarter, and Sales.
The loop continues executing while the value for Quarter is 3, and then 4. In the process, the third and fourth observations are written.
Figure 21.27 Writing the Third and Fourth Observations
Raw data file showing the fourth Quarter value highlighted. Program data vector with the values for ID, Quarter, and Sales. SAS data set with the values for ID, Quarter, and Sales.
After the fourth observation is written, Quarter is incremented to 5 at the bottom of the DO loop and control returns to the top of the loop. The loop does not execute again because the value of Quarter is now greater than 4.
Raw data file showing no highlighting. Program data vector with the values for ID, Quarter, and Sales. SAS data set with the values for ID, Quarter, and Sales.
Control returns to the top of the DATA step, and the input pointer moves to column 1 of the next record. The variable values in the program data vector are reset to missing.
Figure 21.28 Returning Control to the Top of the DATA Step
Raw data file showing the ID value highlighted in the second record. Program data vector with no values for ID, Quarter, and Sales. SAS data set with the values for ID, Quarter, and Sales of the first record.
You can display the data set with the PRINT procedure.
proc print data=perm.sales07; 
run;
Figure 21.29 Partial Output of PROC PRINT
Partial Output of PROC PRINT.
Last updated: January 10, 2018
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.147.27.131