Creating One Observation per Header Record

The Basics of Creating One Observation per Header Record

This example uses the data set below to create a summary variable, Total.
Raw Data File
SAS Data Set
As you write the DATA step to read this file, think about performing several tasks. First, the value of Address must be retained as detail records are read and summarized.
data perm.residents; 
   infile census; 
   retain Address;
Using a RETAIN Statement
Next, the value of Type must be read in order to determine whether the current record is a header record or a detail record. You can add an @ to hold the record so that another INPUT statement can read the remaining values.
data perm.residents; 
   infile census; 
   retain Address; 
   input type $1. @;
Holding a Record
When the value of Type indicates a header record, several statements need to be executed. When the value of Type indicates a detail record, you need to define an alternative set of actions.

DO Group Actions for Header Records

To execute multiple SAS statements based on the value of a variable, you can use a simple DO group with an IF-THEN statement. When the condition Type='H' is true, several statements need to be executed.
data perm.residents; 
   infile census; 
   retain Address; 
   input type $1. @; 
   if type='H' then do;
  • First, determine whether this is the first header record in the external file. You do not want the first header record to be written as an observation until the related detail records have been read and summarized.
    _N_ is an automatic variable whose value is the number of times the DATA step has begun to execute. The expression _n_ > 1 defines a condition where the DATA step has executed more than once. Use this expression in conjunction with the previous IF-THEN statement to check for these two conditions:
    1. The current record is a header record.
    2. The DATA step has executed more than once.
      data perm.residents; 
         infile census; 
         retain Address; 
         input type $1. @; 
         if type='H' then do; 
            if _n_ > 1
  • When the conditions Type='H' and _n_ > 1 are true, an OUTPUT statement is executed. Thus, each header record except for the first one causes an observation to be written to the data set.
    data perm.residents;  
       infile census; 
       retain Address; 
       input type $1. @; 
       if type='H' then do; 
          if _n_ > 1 then output;
  • An assignment statement creates the summary variable Total and sets its value to 0.
    data perm.residents; 
       infile census;  
       retain Address;  
       input type $1. @; 
       if type='H' then do; 
          if _n_ > 1 then output; 
          Total=0;
  • An INPUT statement reads the values for Address.
    data perm.residents; 
       infile census; 
       retain Address; 
       input type $1. @; 
       if type='H' then do; 
          if _n_ > 1 then output; 
          Total=0; 
          input address $ 3-17;
  • An END statement closes the DO group.
     data perm.residents; 
       infile census; 
       retain Address; 
       input type $1. @; 
       if type='H' then do; 
          if _n_ > 1 then output; 
          Total=0; 
          input address $ 3-17; 
       end;

Reading Detail Records

When the value of type is not H, you need to define an alternative action. You can do this by adding an ELSE statement after the DO group.
Remember that the IF-THEN statement executes a SAS statement when the condition that is specified in the IF clause is true. By adding an ELSE statement after the IF-THEN statement, you define an alternative action to be performed when the IF condition is false.
data perm.residents; 
   infile census;  
   retain Address; 
   input type $1. @; 
   if type='H' then do; 
      if _n_ > 1 then output; 
      Total=0; 
      input address $ 3-17; 
   end; 
   else
The only other type of record is a detail record, represented by a P. You want to count each person who is represented by a detail record and store the accumulated value in the summary variable Total. You do not need to read the values for Name, Age, and Gender.
data perm.residents; 
   infile census;  
   retain Address; 
   input type $1. @; 
   if type='H' then do; 
      if _n_ > 1 then output; 
      Total=0; 
      input address $ 3-17; 
   end; 
   else if type='P' then
Reading Detail Records
At this point, the value of Total has been initialized each time a header record is read and set to 0. Now, as each detail record is read, you can increment the value of Total by using a sum statement. In this example, you are counting the number of detail records for each header record. Therefore, you increment the value of Total by 1 when the value of type is P.
data perm.residents;   
   infile census;  
   retain Address; 
   input type $1. @; 
   if type='H' then do; 
      if _n_ > 1 then output; 
      Total=0; 
      input address $ 3-17; 
   end; 
   else if type='P' then total+1;
Note: A sum statement enables you to add any valid SAS expression to an accumulator variable.
else if type='B' then total+cost;
The value generated by a sum statement is automatically retained throughout the DATA step. That is why it is important to set the value of Total to 0 each time a header record is read.

Determining the End of the External File

The program writes an observation to the data set only when another header record is read and the DATA step has executed more than once. But after the last detail record is read, there are no more header records to cause the last observation to be written to the data set.
data perm.residents; 
   infile census; 
   retain Address; 
   input type $1. @; 
   if type='H' then do; 
      if _n_ > 1 then  
         output; 
      Total=0; 
      input address $ 3-17; 
   end; 
   else if type='P' 
        then total+1;
Determining the End of the External File

The INFILE Statement

It is necessary to determine when the last record in the file is read so that you can then execute another explicit OUTPUT statement. You can determine when the current record is the last record in an external file by specifying the END= option in the INFILE statement.
Syntax, INFILE statement with the END= option:
INFILE file-specification END=variable;
variable is a temporary numeric variable whose value is 0 until the last line is read and 1 after the last line is read.
Note: Like automatic variables, the END= variable is not written to the data set.
In the following example, the END= variable is defined in the INFILE statement as Last. When Last has a value other than 0, the OUTPUT statement writes the final observation to the data set.
data perm.residents; 
   infile census end=last; 
   retain Address; 
   input type $1. @; 
   if type='H' then do; 
    if _n_ > 1 then output; 
       Total=0; 
      input address $ 3-17; 
   end; 
   else if type='P' then total+1; 
   if last then output;
A DROP= option in the DATA statement drops the variable type from the data set, and a RUN statement completes the DATA step.
data perm.residents (drop=type); 
   infile census end=last; 
   retain Address; 
   input type $1. @; 
   if type='H' then do; 
      if _n_ > 1 then output; 
         Total=0; 
      input address $ 3-17; 
   end; 
   else if type='P' then total+1; 
   if last then output; 
run;
Last updated: January 10, 2018
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.133.158.32