statement. The group of language statements contains other programming statements that
manipulate existing SAS data sets or create SAS data sets from raw data files.
You can use the DATA step for the following tasks:
• creating SAS data sets (SAS data files or SAS views)
• creating SAS data sets from input files that contain raw data (external files)
• creating new SAS data sets from existing ones by subsetting, merging, modifying,
and updating them
• analyzing, manipulating, or presenting your data
• computing the values for new variables
• writing reports, or writing files to disk or tape
• retrieving information
• managing files
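As an illustration only, the following DATA step combines several of these tasks: it creates a SAS data set from raw data, computes the value of a new variable, and subsets the observations. The data set WORK.SCORES, its variables, and the inline DATALINES records are hypothetical.

```sas
/* Hypothetical example: read raw data, compute a new   */
/* variable, and subset observations in one DATA step.  */
data work.scores;
   input name $ score;      /* read each raw record                 */
   passed = (score >= 70);  /* new variable: 1 if true, else 0      */
   if score ne .;           /* subsetting IF: drop missing scores   */
   datalines;
Ann 90
Bob 65
Cal .
;
run;

proc print data=work.scores;
run;
```

WORK.SCORES ends up with two observations (Ann and Bob). The record for Cal is read, but the subsetting IF statement ends that iteration before the observation is written.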
Note: A DATA step creates a SAS data set. This data set can be a SAS data file or a
SAS view. A SAS data file stores data values, while a SAS view stores instructions
for retrieving and processing data. When you can use a SAS view as a SAS data file,
as is true in most cases, this documentation uses the broader term SAS data set.
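The difference between the two member types can be seen by creating both from the same source. In this sketch, which uses hypothetical data set names, the VIEW= option in the DATA statement makes the second step store its instructions instead of data values, so the WHERE selection is performed each time the view is read.

```sas
/* Hypothetical source data. */
data work.scores;
   input name $ score;
   datalines;
Ann 90
Bob 65
;
run;

/* SAS data file: the selected data values are stored. */
data work.high_file;
   set work.scores;
   where score >= 70;
run;

/* SAS view: only the instructions are stored; they run */
/* again every time the view is read.                   */
data work.high_view / view=work.high_view;
   set work.scores;
   where score >= 70;
run;

/* The view can be read wherever a SAS data set is expected. */
proc print data=work.high_view;
run;
```

PROC PRINT reads the view exactly as it would read the data file, which is why the broader term SAS data set covers both.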
Overview of DATA Step Processing
Flow of Action
When you submit a DATA step for execution, it is first compiled and then executed. The
following figure shows the flow of action for a typical SAS DATA step.
402 Chapter 18 DATA Step Processing
Figure 18.1 Flow of Action in the DATA Step
[Flowchart. Compile Phase: compiles SAS statements (includes syntax checking); creates an input buffer, a program data vector, and descriptor information. Execution Phase: begins with a DATA statement (counts iterations); sets variable values to missing in the program data vector; data-reading statement: is there a record to read? YES: reads an input record, executes additional executable statements, writes an observation to the SAS data set, and returns to the beginning of the DATA step. NO: closes the data set and goes on to the next DATA or PROC step.]
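A small experiment makes the loop in Figure 18.1 visible. In this hypothetical example, a PUT statement placed before the data-reading SET statement runs at the start of every iteration, including the final one, in which SET finds no record to read and the DATA step stops (the NO branch of the figure).

```sas
/* Hypothetical three-observation input. */
data work.scores;
   input name $ score;
   datalines;
Ann 90
Bob 65
Cal 78
;
run;

data work.copy;
   put 'Iteration begins: ' _n_=;    /* runs before each read attempt */
   set work.scores;                  /* data-reading statement        */
   put 'Observation read: ' name=;   /* runs only when a record exists */
run;
```

The first PUT statement writes _N_=1 through _N_=4 to the log, while the second writes only three lines, and WORK.COPY contains three observations: the fourth iteration begins, finds no record to read, and ends the step.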
The Compilation Phase
When you submit a DATA step for execution, SAS checks the syntax of the SAS
statements and compiles them; that is, it automatically translates the statements into
machine code. In this phase, SAS identifies the type and length of each new variable
and determines whether a variable type conversion is necessary for each subsequent
reference to a variable. During the compilation phase, SAS creates the following three
items:
input buffer
is a logical area in memory into which SAS reads each record of raw data when SAS
executes an INPUT statement. Note that this buffer is created only when the DATA
step reads raw data. (When the DATA step reads a SAS data set, SAS reads the data
directly into the program data vector.)
program data vector (PDV)
is a logical area in memory where SAS builds a data set, one observation at a time.
When a program executes, SAS reads data values from the input buffer or creates
them by executing SAS language statements. The data values are assigned to the
appropriate variables in the program data vector. From here, SAS writes the values to
a SAS data set as a single observation.
Along with data set variables and computed variables, the PDV contains two
automatic variables, _N_ and _ERROR_. The _N_ variable counts the number of
times the DATA step begins to iterate. The _ERROR_ variable signals the
occurrence of an error caused by the data during execution. The value of _ERROR_
is either 0 (indicating that no errors exist) or 1 (indicating that one or more errors have
occurred). SAS does not write these variables to the output data set.
descriptor information
is information that SAS creates and maintains about each SAS data set, including
data set attributes and variable attributes. For example, it contains the name of the
data set, its member type, the date and time that the data set was created, and the
number, names, and data types (character or numeric) of the variables. The descriptor
information also contains information about extended attributes (if defined on a data
set). Extended attribute descriptor information includes the name of the attribute, the
name of the variable, and the value of the attribute.
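The following sketch, with hypothetical data, touches each of the three items. The LIST statement writes the current record from the input buffer to the log, PUT _ALL_ writes the contents of the program data vector (including _N_ and _ERROR_), and PROC CONTENTS displays the descriptor information. The step also shows a compile-time decision: BAND receives length 3 from its first appearance ('Old'), so the value 'Young' is stored as 'You'.

```sas
/* Hypothetical raw data; the age for Cy is deliberately invalid. */
data work.people;
   input name $ age;
   list;                             /* input buffer: echo raw record to log */
   if age >= 40 then band = 'Old';   /* compile phase: BAND gets length 3    */
   else band = 'Young';              /* so this value is stored as 'You'     */
   put _all_;                        /* PDV: all variables, _N_ and _ERROR_  */
   datalines;
Ann 34
Bob 41
Cy n/a
;
run;

/* Descriptor information: member type, creation date, number of */
/* observations, and each variable's name, type, and length.     */
proc contents data=work.people;
run;
```

In the log, the PUT _ALL_ line for Cy shows AGE as missing and _ERROR_=1, and SAS adds a note about invalid data. PROC CONTENTS lists only NAME, AGE, and BAND, because _N_ and _ERROR_ are not written to the output data set.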
The Execution Phase
By default, a simple DATA step iterates once for each observation that is being created.
The flow of action in the Execution Phase of a simple DATA step is described as
follows:
1. The DATA step begins with a DATA statement. Each time the DATA statement
executes, a new iteration of the DATA step begins, and the _N_ automatic variable is
incremented by 1.
2. SAS sets the newly created program variables to missing in the program data vector
(PDV).
3. SAS reads a data record from a raw data file into the input buffer, or it reads an
observation from a SAS data set directly into the program data vector. You can use
an INPUT, MERGE, SET, MODIFY, or UPDATE statement to read a record.
4. SAS executes any subsequent programming statements for the current record.
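Steps 1 through 4 can be observed with a short, hypothetical example. _N_ increases by 1 with each iteration; variables read by SET are not reset but are overwritten by each read; and NOTE, a variable created by an assignment statement, is set to missing at the start of every iteration, so it is blank for every observation after the first.

```sas
/* Hypothetical input data. */
data work.scores;
   input name $ score;
   datalines;
Ann 90
Bob 65
Cal 78
;
run;

data work.flags;
   set work.scores;                 /* step 3: read next observation into PDV */
   if _n_ = 1 then note = 'first';  /* step 4: NOTE assigned on iteration 1   */
   put _n_= name= note=;            /* NOTE is blank later: step 2 reset it   */
run;
```

By contrast, naming NOTE in a RETAIN statement suppresses the reset in step 2, so the value 'first' would carry forward to every observation.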