Detecting the End of a Data Set

Overview

Instead of reading specific observations, you might want to determine when the last observation in an input data set has been read, so that you can perform specific processing. For example, you might want to write only one observation that contains grand totals for numeric variables.
To create a temporary numeric variable whose value is used to detect the last observation, use the END= option in the SET statement.
Syntax, END= option:
END=variable
variable creates and names a temporary variable that contains an end-of-file marker. The variable, which is initialized to 0, is set to 1 when the SET statement reads the last observation of the data set.
This variable is not added to the data set.
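To see the END= variable in action, the following minimal sketch writes its value to the log on every iteration. The data set Work.Times and its values are hypothetical, made up only for illustration, and IsLast is an assumed name for a variable that keeps a copy of the flag.

```sas
/* Hypothetical sample data; the values are illustrative only */
data work.times;
   input timemin timesec;
   datalines;
12 38
16 5
8 22
;
run;

data work.flagged;
   set work.times end=last;
   putlog _n_= last=;   /* write the iteration counter and the flag to the log */
   IsLast=last;         /* copy the flag into a regular variable to keep it */
run;

proc print data=work.flagged noobs;
run;
```

With three input observations, the log should show last=0 for the first two iterations and last=1 for the third. The printed data set contains IsLast but not Last, because the END= variable is temporary.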

Example: The END= Option

Suppose you want to sum the number of seconds for treadmill stress tests. If you submit the following program, you produce a new data set in which TotalTime contains the cumulative total up to and including each observation.
data work.addtoend(drop=timemin timesec); 
   set sasuser.stress2(keep=timemin timesec); 
   TotalMin+timemin; 
   TotalSec+timesec; 
   TotalTime=totalmin*60+totalsec; 
run; 
proc print data=work.addtoend noobs; 
run;
Figure 12.9 Data Set with Cumulative Totals for Each of the Values of TotalTime
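The following program uses the END= option in the SET statement to create the temporary variable Last. The subsetting IF statement tests Last, so only the last observation of the data set, which holds the grand totals, is written to the new data set.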
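data work.addtoend(drop=timemin timesec);
   set sasuser.stress2(keep=timemin timesec) end=last;  /* last=1 on the final observation */
   TotalMin+timemin;                 /* sum statements: totals are retained across iterations */
   TotalSec+timesec;
   TotalTime=totalmin*60+totalsec;   /* total time in seconds */
   if last;                          /* subsetting IF: write only the final observation */
run;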
proc print data=work.addtoend noobs; 
run;
Now the new data set has one observation:
Figure 12.10 Data Set with One Observation

Understanding How Data Sets Are Read

DATA step processing for reading existing SAS data sets is very similar to the compilation and execution phases of a DATA step that reads raw data. The main difference is that while reading an existing data set with the SET statement, SAS retains the values of existing variables from one observation to the next.
In the following example, the DATA step reads the data set Finance.Loans, creates the variable Interest, and creates the new data set Finance.DueJan.
data finance.duejan; 
   set finance.loans; 
   Interest=amount*(rate/12); 
run;
Figure 12.11 New Data Set Finance.DueJan

Compilation Phase

  1. The program data vector is created and contains the automatic variables _N_ and _ERROR_.
  2. SAS also scans each statement in the DATA step, looking for syntax errors.
  3. When the SET statement is compiled, a slot is added to the program data vector for each variable in the input data set. The input data set supplies the variable names, as well as attributes such as type and length.
  4. Any variables that are created in the DATA step are also added to the program data vector. The attributes of each of these variables are determined by the expression in the statement.
  5. At the bottom of the DATA step, the compilation phase is complete, and the descriptor portion of the new SAS data set is created. There are no observations because the DATA step has not yet executed.
When the compilation phase is complete, the execution phase begins.
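One way to look at the result of compilation on its own is to stop the step before it reads anything and then inspect the descriptor portion. The sketch below assumes a hypothetical stand-in for Finance.Loans named Work.Loans with only the variables Amount and Rate; the values and the output name Work.DueJan_Shell are made up for illustration.

```sas
/* Hypothetical stand-in for Finance.Loans; values are illustrative only */
data work.loans;
   input Amount Rate;
   datalines;
22000 0.100
114000 0.095
3500 0.105
10000 0.105
;
run;

/* STOP ends the step before any observation is read or written,
   so the output data set holds only what compilation set up */
data work.duejan_shell;
   stop;
   set work.loans;
   Interest=amount*(rate/12);
run;

/* VARNUM lists the variables in the order they were added */
proc contents data=work.duejan_shell varnum;
run;
```

The output data set should have zero observations and three variables: Amount and Rate, whose attributes come from the input data set, followed by Interest, whose attributes come from the assignment statement.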

Execution Phase

  1. The DATA step executes once for each observation in the input data set. For example, this DATA step executes four times because there are four observations in the input data set Finance.Loans.
  2. At the beginning of the execution phase, the value of _N_ is 1. Because there are no data errors, the value of _ERROR_ is 0. The remaining variables are initialized to missing. Missing numeric values are represented by a period, and missing character values are represented by a blank.
  3. The SET statement reads the first observation from the input data set into the program data vector.
  4. Then, the assignment statement executes to compute the value for Interest.
  5. At the end of the first iteration of the DATA step, the values in the program data vector are written to the new data set as the first observation.
  6. The value of _N_ increments from 1 to 2, and control returns to the top of the DATA step. Recall that the automatic variable _N_ keeps track of how many times the DATA step has begun to execute.
  7. SAS retains the values of variables that were read from a SAS data set with the SET statement, or that were created by a sum statement. All other variable values, such as the values of the variable Interest, are set to missing.
    Note: When SAS reads raw data, the situation is different. In that case, SAS sets the value of each variable in the DATA step to missing at the beginning of each iteration, with these exceptions:
    • variables named in a RETAIN statement
    • variables created in a sum statement
    • data elements in a _TEMPORARY_ array
    • any variables created by using options in the FILE or INFILE statements
    • automatic variables
  8. At the beginning of the second iteration, the value of _ERROR_ is reset to 0.
  9. As the SET statement executes, the values from the second observation are read into the program data vector.
  10. The assignment statement executes again to compute the value for Interest for the second observation.
  11. At the bottom of the DATA step, the values in the program data vector are written to the data set as the second observation.
  12. The value of _N_ increments from 2 to 3, and control returns to the top of the DATA step. SAS retains the values of variables that were read from a SAS data set with the SET statement, or that were created by a sum statement. All other variable values, such as the values of the variable Interest, are set to missing.
    This process continues until all of the observations are read.
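The execution steps above can be traced in the log with PUTLOG statements at the top and bottom of the DATA step. This is a minimal sketch, assuming a hypothetical stand-in for Finance.Loans named Work.Loans with four made-up observations and only the variables Amount and Rate.

```sas
/* Hypothetical stand-in for Finance.Loans; values are illustrative only */
data work.loans;
   input Amount Rate;
   datalines;
22000 0.100
114000 0.095
3500 0.105
10000 0.105
;
run;

data work.duejan;
   /* State of the program data vector before SET reads anything */
   putlog 'Top:    ' _n_= _error_= amount= rate= interest=;
   set work.loans;
   Interest=amount*(rate/12);
   /* State just before the observation is written */
   putlog 'Bottom: ' _n_= _error_= amount= rate= interest=;
run;
```

On the first Top line, Amount, Rate, and Interest should all be missing. On each later Top line, Amount and Rate should still hold the values from the previous observation, while Interest is missing again. The log should also show one more Top line than Bottom line, because the step begins a final iteration in which the SET statement finds no more observations and ends the step.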
Last updated: January 10, 2018