Reading a Varying Number of Repeating Fields

The Basics of Reading a Varying Number of Repeating Fields

Suppose that some of the employees quit at various times. Their records might not contain sales totals for the second, third, or fourth quarters. These records contain a variable number of repeating fields.
Figure 21.30 Raw Data File Data97 Showing Empty Records
Raw Data File Data07 Showing Empty Records.
The DATA step that you just wrote does not work with a variable number of repeating fields because now the value of Quarter is not constant for every record.
data perm.sales07;  
   infile data07;  
   input ID $ @;   
   do Quarter=1 to 4; 
      input Sales : comma. @; 
      output;  
   end; 
run;

Using the MISSOVER Option

You can adapt the DATA step to accommodate a varying number of Sales values.
Like the previous example with the same number of repeating fields, your DATA step must read the same record repeatedly. However, you need to prevent the input pointer from moving to the next record when there are missing Sales values.
You can use the MISSOVER option in an INFILE statement to prevent SAS from reading the next record when missing values are encountered at the end of a record. Essentially, records that have a varying number of repeating fields are records that contain missing values. Therefore, you need to specify the MISSOVER option here as well.
Because there is at least one value for the repeating field, Sales, in each record, the first INPUT statement reads both the value for ID and the first Sales value for each record. The trailing @ holds the record so that any subsequent repeating fields can be read.
data perm.sales07; 
   infile data07 missover; 
   input ID $ Sales : comma. @;
Figure 21.31 Holding a Record
Raw Data File Data97 showing the ID and first sales record highlighted.
Note: SAS provides several options to control reading past the end of a line. Recall that the MISSOVER option can be used to set the remaining INPUT statement variables to missing values if the pointer reaches the end of a record. You can also use other options such as the TRUNCOVER option, which reads column or formatted input when the last variable that is read by the INPUT statement contains varying-length data. The TRUNCOVER option assigns the contents of the input buffer to a variable when the field is shorter than expected.
Other related options include FLOWOVER (the default), STOPOVER, and SCANOVER.

Executing SAS Statements While a Condition Is True

To execute SAS statements while a condition is true, use a DO WHILE statement instead of the iterative DO statement, enclosing the expression in parentheses. In the example below, the DO WHILE statement executes while the value of Sales is not equal to a missing value (which is represented by a period).
data perm.sales07; 
   infile data07 missover; 
   input ID $ Sales : comma. @; 
   do while (sales ne .);

Creating a Counter Variable

The DO WHILE statement does not create an index variable, but you can create your own “counter” variable. You can use a sum statement to increment the value of the counter variable each time the DO WHILE loop executes.
In the example below, the assignment statement that precedes the loop creates the counter variable Quarter and assigns it an initial value of zero. Each time the DO WHILE loop executes, the sum statement increments the value of Quarter by one.
data perm.sales08; 
   infile data08 missover; 
   input ID $ Sales : comma. @; 
   Quarter=0; 
   do while (sales ne .); 
      quarter+1;

Completing the DO WHILE Loop

You need an OUTPUT statement to write the current observation to the data set. Then, another INPUT statement reads the next value for Sales and holds the record. You complete the DO WHILE loop with an END statement.
data perm.sales08; 
   infile data08 missover; 
   input ID $ Sales : comma. @; 
   Quarter=0;  
   do while (sales ne .); 
      quarter+1; 
      output; 
      input sales : comma. @; 
   end; 
run;

Processing a DATA Step That Has a Varying Number of Repeating Fields

This example uses the following DATA step:
data perm.sales08; 
   infile data08 missover; 
   input ID $ Sales : comma. @; 
   Quarter=0;  
   do while (sales ne .); 
      quarter+1; 
      output; 
      input sales : comma. @; 
   end; 
run;
During the first iteration of the DATA step, values for ID and Sales are read. Quarter is initialized to 0.
Figure 21.32 Initializing the Value of Quarter to 0
Raw Data File Showing the values of ID and Sales highlighted. Program data vector showing the values of ID and Sales, and the value of Quarter initialized to 0.
The DO WHILE statement checks to see whether Sales has a value, which it does, so the other statements in the DO loop execute. The Value of Quarter is incremented by 1 and the current observation is written to the data set.
Figure 21.33 Executing the DO Loop
Raw Data File Showing the values of ID and Sales highlighted. Program data vector showing the values of ID and Sales, and the value of Quarter incremented to 1. SAS data set showing the values of ID and Sales, and the value of Quarter incremented to 1.
The INPUT statement reads the next value for Sales, the end of the loop is reached, and control returns to the DO WHILE statement.
Figure 21.34 Returning Control to the DO WHILE Statement
Raw Data File Showing the value of second Quarter Sales highlighted. Program data vector showing the values of ID and Sales, and the value of Quarter incremented to 1. SAS data set showing the values of ID and Sales, and the value of Quarter incremented to 1.
The condition is checked and Sales still has a value, so the loop executes again.
Figure 21.35 Executing the Loop Again
Raw Data File Showing the value of second Quarter Sales highlighted. Program data vector showing the values of ID and Sales, and the value of Quarter incremented to 1. SAS data set showing the values of ID and Sales, and the value of Quarter incremented to 1.
Quarter is incremented to 2, and the values in the program data vector are written as the second observation.
Raw Data File showing no highlighted values. Program data vector showing the values of ID and Sales, and the value of Quarter incremented to 2. SAS data set showing the values of ID and Sales, and the value of Quarter incremented to 2.
The INPUT statement executes again. The MISSOVER option prevents the input pointer from moving to the next record in search of another value for Sales. Therefore, Sales receives a missing value.
Raw Data File showing no highlighted values. Program data vector showing the value of ID, a blank value for Sales, and the value of Quarter at 2. SAS data set showing first and second observations.
The end of the loop is reached, and control returns to the DO WHILE statement. Because the condition is now false, the statements in the loop are not executed and the values in the PDV are not written to output.
Figure 21.36 Returning Control to the DO WHILE Statement
Raw Data File showing no highlighted values. Program data vector showing the value of ID, a blank value for Sales, and the value of Quarter at 2. SAS data set showing first and second observations.
Instead, control returns to the top of the DATA step, the values in the program data vector are reset to missing, and the INPUT statement reads the next record. The DATA step continues executing until the end of the file.
Figure 21.37 Returning Control to the Top of the DATA Step
Raw Data File showing the ID of the second record highlighted. Program data vector showing the second record with the values of ID, Sales, and Quarter blank. SAS data set showing first and second observations.
PROC PRINT output for the data set shows a varying number of observations for each employee.
proc print data=perm.sales08; 
run;
Figure 21.38 PROC PRINT Output
PROC PRINT Output
Last updated: January 10, 2018
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.191.189.23