5. At the end of the statements, an output, return, and reset occur automatically. SAS
writes an observation to the SAS data set, the system automatically returns to the top
of the DATA step, and the values of variables created by INPUT and assignment
statements are reset to missing in the program data vector. Note that variables that
you read with a SET, MERGE, MODIFY, or UPDATE statement are not reset to
missing here.
6. SAS counts another iteration, reads the next record or observation, and executes the
subsequent programming statements for the current observation.
7. The DATA step terminates when SAS encounters the end-of-file in a SAS data set or
a raw data file.
Note: The figure shows the default processing of the DATA step. You can place data-
reading statements (such as INPUT or SET), or data-writing statements (such as
OUTPUT), in any order in your program.
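The automatic actions in step 5 can be made visible by writing them out. The following is a minimal sketch and not part of the original example; the data set name explicit_demo and the variables x and y are hypothetical.

```sas
/* Hypothetical data set and variable names, for illustration only. */
data explicit_demo;
   input x;      /* read the next raw record into the PDV               */
   y = x * 2;    /* y comes from an assignment, so it is reset to       */
                 /* missing at the top of every iteration               */
   output;       /* write the observation explicitly                    */
   return;       /* return to the top of the DATA step                  */
datalines;
1
2
3
;

proc print data=explicit_demo;
run;
```

Under these assumptions, the result should match what the same step produces without the OUTPUT and RETURN statements, because SAS performs both actions automatically at the end of the statements.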
Processing a DATA Step: A Walk-through
Sample DATA Step
The following statements provide an example of a DATA step that reads raw data,
calculates totals, and creates a data set:
data total_points (drop=TeamName);                            /* 1 */
   input TeamName $ ParticipantName $ Event1 Event2 Event3;   /* 2 */
   TeamTotal + (Event1 + Event2 + Event3);                    /* 3 */
   datalines;
Knights Sue 6 8 8
Kings Jane 9 7 8
Knights John 7 7 7
Knights Lisa 8 9 9
Knights Fran 7 6 6
Knights Walter 9 8 10
;
proc print data=total_points;
run;
1  The DROP= data set option prevents the variable TeamName from being written to
   the output SAS data set called Total_Points.
2  The INPUT statement describes the data by giving a name to each variable,
   identifying its data type (character or numeric), and identifying its relative
   location in the data record.
3  The Sum statement accumulates the scores for three events in the variable
   TeamTotal.
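The Sum statement in the example combines two behaviors: it retains TeamTotal across iterations with an initial value of 0, and it adds the expression to that value. As a sketch of the same logic written out in long form, the step below uses a RETAIN statement and the SUM function. The data set name total_points_alt is hypothetical, and only two data lines are shown.

```sas
/* Long-form equivalent of the Sum statement; total_points_alt is a */
/* hypothetical name used only for this illustration.               */
data total_points_alt (drop=TeamName);
   input TeamName $ ParticipantName $ Event1 Event2 Event3;
   retain TeamTotal 0;                                   /* keep value across iterations */
   TeamTotal = sum(TeamTotal, Event1 + Event2 + Event3); /* add this record's scores     */
datalines;
Knights Sue 6 8 8
Kings Jane 9 7 8
;

proc print data=total_points_alt;
run;
```

The SUM function is used here rather than a plain addition so that a missing score total on one record does not turn the running total into a missing value, which is also how the Sum statement behaves.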
Creating the Input Buffer and the Program Data Vector
When DATA step statements are compiled, SAS determines whether to create an input
buffer. If the input file contains raw data (as in the example above), SAS creates an input
buffer to hold the data before moving the data to the program data vector (PDV). (If the
input file is a SAS data set, however, SAS does not create an input buffer. SAS writes
the input data directly to the PDV.)
The PDV contains all the variables in the input data set, the variables created in DATA
step statements, and the two variables, _N_ and _ERROR_, that are automatically
generated for every DATA step. The _N_ variable represents the number of times the
DATA step has iterated. The _ERROR_ variable acts like a binary switch whose value is
0 if no errors exist in the DATA step, or 1 if one or more errors exist. The following
figure shows the Input Buffer and the program data vector after DATA step compilation.
Figure 18.2 Input Buffer and Program Data Vector

Input Buffer (columns 1-25)

----+----1----+----2----+
(empty)

Program Data Vector

TeamName   ParticipantName   Event1   Event2   Event3   TeamTotal   _N_      _ERROR_
(Drop)                                                              (Drop)   (Drop)
                             .        .        .        0           1        0
Variables that are created by the INPUT statement (TeamName, ParticipantName,
Event1, Event2, and Event3) are set to missing initially. The variable TeamTotal, which
is created by the Sum statement, is initialized to 0. Note that in this representation,
missing numeric values are shown as a period and missing character values as blanks.
The automatic variable _N_ is set to 1; the automatic variable _ERROR_ is set to 0.
The variable TeamName is marked Drop in the PDV because of the DROP= data set
option in the DATA statement. Dropped variables are not written to the SAS data set.
The _N_ and _ERROR_ variables are dropped because automatic variables created by
the DATA step are not written to a SAS data set. See Chapter 4, “SAS Variables,” on
page 37 for details about automatic variables.
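One way to see that _N_ and _ERROR_ exist in the PDV even though they never reach the output data set is to write the PDV to the log with PUT _ALL_. The sketch below is an illustration only: the data set name pdv_demo and the variable Iteration are hypothetical, and only two data lines are shown.

```sas
/* pdv_demo and Iteration are hypothetical names for this illustration. */
data pdv_demo (drop=TeamName);
   input TeamName $ ParticipantName $ Event1 Event2 Event3;
   TeamTotal + (Event1 + Event2 + Event3);
   Iteration = _N_;   /* copy the automatic variable into a regular variable */
   put _all_;         /* write every PDV variable, including _N_ and         */
                      /* _ERROR_, to the SAS log                             */
datalines;
Knights Sue 6 8 8
Kings Jane 9 7 8
;

proc print data=pdv_demo;
run;
```

The log lines should include TeamName, _N_, and _ERROR_, while the printed data set should contain neither the dropped TeamName nor the automatic variables. Copying _N_ into a regular variable such as Iteration is a common way to keep the iteration count in the output.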
Reading a Record
SAS reads the first data line into the input buffer. The input pointer, which SAS uses to
keep its place as it reads data from the input buffer, is positioned at the beginning of the
buffer, ready to read the data record. The following figure shows the position of the input
pointer in the input buffer before SAS reads the data.
Figure 18.3 Position of the Pointer in the Input Buffer Before SAS Reads Data

Input Buffer (columns 1-25)

----+----1----+----2----+
Knights Sue 6 8 8

Input pointer: column 1
The INPUT statement then reads data values from the record in the input buffer and
writes them to the PDV where they become variable values. The following figure shows
both the position of the pointer in the input buffer, and the values in the PDV after SAS
reads the first record.
Figure 18.4 Values from the First Record Are Read into the Program Data Vector

Input Buffer (columns 1-25)

----+----1----+----2----+
Knights Sue 6 8 8

Program Data Vector

TeamName   ParticipantName   Event1   Event2   Event3   TeamTotal   _N_      _ERROR_
(Drop)                                                              (Drop)   (Drop)
Knights    Sue               6        8        8        0           1        0
After the INPUT statement reads a value for each variable, SAS executes the Sum
statement. SAS computes a value for the variable TeamTotal and writes it to the PDV.
The following figure shows the PDV with all of its values before SAS writes the
observation to the data set.
Figure 18.5 Program Data Vector with Computed Value of the Sum Statement

Program Data Vector

TeamName   ParticipantName   Event1   Event2   Event3   TeamTotal   _N_      _ERROR_
(Drop)                                                              (Drop)   (Drop)
Knights    Sue               6        8        8        22          1        0
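The input buffer and the input pointer described above can be inspected from inside a DATA step. The following sketch assumes the COLUMN= option of the INFILE statement, the _INFILE_ keyword in the PUT statement, and a null INPUT statement with a trailing @ to load a record without assigning values. The variable name ptr is hypothetical, and only two data lines are shown.

```sas
data _null_;
   infile datalines column=ptr;  /* ptr: hypothetical variable that tracks   */
                                 /* the column of the input pointer          */
   input @;                      /* load the record into the input buffer    */
                                 /* and hold it, without assigning values    */
   put 'Input buffer: ' _infile_;
   put 'Before reading values: ' ptr=;
   input TeamName $ ParticipantName $ Event1 Event2 Event3;
   put 'After reading values:  ' ptr=;
   put TeamName= ParticipantName= Event1= Event2= Event3=;
datalines;
Knights Sue 6 8 8
Kings Jane 9 7 8
;
```

The first PUT statement shows the raw record as it sits in the input buffer; the last one shows the same data after the INPUT statement has moved it into the PDV as variable values.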
Writing an Observation to the SAS Data Set
When SAS executes the last statement in the DATA step, all values in the PDV, except
those marked to be dropped, are written as a single observation to the data set
Total_Points. The following figure shows the first observation in the Total_Points data
set.
Figure 18.6 The First Observation in Data Set Total_Points

Output SAS Data Set TOTAL_POINTS: 1st observation

ParticipantName   Event1   Event2   Event3   TeamTotal
Sue               6        8        8        22
SAS then returns to the DATA statement to begin the next iteration. SAS resets the
values in the PDV in the following way:
The values of variables created by the INPUT statement are set to missing.
The value created by the Sum statement is automatically retained.
The value of the automatic variable _N_ is incremented by 1, and the value of
_ERROR_ is reset to 0.
The following figure shows the current values in the PDV.
Figure 18.7 Current Values in the Program Data Vector

Program Data Vector

TeamName   ParticipantName   Event1   Event2   Event3   TeamTotal   _N_      _ERROR_
(Drop)                                                              (Drop)   (Drop)
                             .        .        .        22          2        0
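The reset behavior can be checked by writing the PDV values to the log at the top of each iteration, before the INPUT statement executes. This is a sketch under stated assumptions: the data set name reset_demo is hypothetical, and only the numeric variables are listed in the PUT statement because naming a character variable there, ahead of the INPUT statement that defines it, would define it as numeric.

```sas
/* reset_demo is a hypothetical name. Referencing the numeric variables */
/* in PUT before INPUT also changes their order in the output data set. */
data reset_demo (drop=TeamName);
   put 'Top of iteration: ' _N_= Event1= Event2= Event3= TeamTotal=;
   input TeamName $ ParticipantName $ Event1 Event2 Event3;
   TeamTotal + (Event1 + Event2 + Event3);
datalines;
Knights Sue 6 8 8
Kings Jane 9 7 8
Knights John 7 7 7
;
```

At the top of the second iteration the log should show Event1, Event2, and Event3 as missing and TeamTotal as 22, matching the walk-through. One extra log line is expected for the final iteration, in which the INPUT statement finds no more records and the step ends.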
Reading the Next Record
SAS reads the next record into the input buffer. The INPUT statement reads the data
values from the input buffer and writes them to the PDV. The Sum statement adds the
values of Event1, Event2, and Event3 to TeamTotal. The value of 2 for variable _N_
indicates that SAS is beginning the second iteration of the DATA step. The following
figure shows the input buffer, the PDV for the second record, and the SAS data set with
the first two observations.
Figure 18.8 Input Buffer, Program Data Vector, and First Two Observations

Input Buffer (columns 1-25)

----+----1----+----2----+
Cardinals Jane 9 7 8

Program Data Vector

TeamName    ParticipantName   Event1   Event2   Event3   TeamTotal   _N_      _ERROR_
(Drop)                                                               (Drop)   (Drop)
Cardinals   Jane              9        7        8        46          2        0

Output SAS Data Set TOTAL_POINTS: 1st and 2nd observations

ParticipantName   Event1   Event2   Event3   TeamTotal
Sue               6        8        8        22
Jane              9        7        8        46
As SAS continues to read records, the value in TeamTotal grows larger as more
participant scores are added to the variable. _N_ is incremented at the beginning of each
iteration of the DATA step. This process continues until SAS reaches the end of the input
file.
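To confirm the value that TeamTotal reaches when the input file is exhausted, a second DATA step can read the finished data set and report the total from its last observation. This sketch assumes the END= option of the SET statement; the variable name last is hypothetical.

```sas
data total_points (drop=TeamName);
   input TeamName $ ParticipantName $ Event1 Event2 Event3;
   TeamTotal + (Event1 + Event2 + Event3);
datalines;
Knights Sue 6 8 8
Kings Jane 9 7 8
Knights John 7 7 7
Knights Lisa 8 9 9
Knights Fran 7 6 6
Knights Walter 9 8 10
;

data _null_;
   set total_points end=last;   /* last: hypothetical END= flag, set to 1 */
                                /* on the final observation               */
   if last then put 'Final ' TeamTotal=;
run;
```

With the six data lines shown, the running totals are 22, 46, 67, 93, 112, and 139, so the log should report a final TeamTotal of 139.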
When the DATA Step Finishes Executing
The DATA step stops executing after it processes the last input record. You can use
PROC PRINT to print the output in the Total_Points data set:
data total_points (drop=TeamName);