FIRST. and LAST. DATA Step Variables

How the DATA Step Identifies BY Groups

In the DATA step, SAS identifies the beginning and end of each BY group by creating the following two temporary variables:
  • FIRST.variable
  • LAST.variable
The temporary variables are available for DATA step programming, but they are not added to the output data set. Their values indicate whether an observation is one of the following positions:
  • the first one in a BY group
  • the last one in a BY group
  • neither the first nor the last one in a BY group
  • both first and last, as is the case when there is only one observation in a BY group

How SAS Determines FIRST.variable and LAST.variable

  • When an observation is the first in a BY group, SAS sets the value of the FIRST.variable to 1. This happens when the value of the variable changed from the previous observation.
  • For all other observations in the BY group, the value of FIRST.variable is 0.
  • When an observation is the last in a BY group, SAS sets the value of LAST.variable to 1. This happens when the value of the variable changes in the next observation.
  • For all other observations in the BY group, the value of LAST.variable is 0.
  • For the last observation in a data set, the value of all LAST.variable variables are set to 1.

Example: Grouping Observations Using One BY Variable

In this example, the Cert.Usa data set contains payroll information for individual employees. Suppose you want to compute the annual payroll by department. Assume 2,000 work hours per year for hourly employees.
Before computing the annual payroll, you need to group observations by the values of the variable Dept.
Output 8.2 Sample Data Set: Cert.Usa
Sample Data Set: Cert.Usa
The following program computes the annual payroll by department. Notice that the variable name Dept has been appended to FIRST. and LAST.
proc sort data=cert.usa out=work.temp;                /*#1*/
  by dept;
run;
data work.budget(keep=dept payroll);                  /*#2*/
  set work.temp;
  by dept;                                            /*#3*/
  if wagecat='S' then Yearly=wagerate*12;             /*#4*/
    else if wagecat='H' then Yearly=wagerate*2000;
  if first.dept then Payroll=0;                       /*#5*/
  payroll+yearly;                                     /*#6*/
  if last.dept;                                       /*#7*/
run;
1 The SORT procedure sorts the data in Cert.Usa by the variable Dept. The results of the SORT procedure are stored in Work.Temp.
2 The KEEP= data set option keeps the variables Dept and Payroll in the output data set, Work.Budget.
3 The BY statement in a DATA step applies only to the SET statement. The data set Work.Temp must be sorted by the Dept variable for the BY statement to set up grouping variables. By specifying Dept as the variable, you can identify the first and last observations for each Dept group. The Dept groups are ADM10, ADM20, ADM30, CAM10, and CAM20.
4 The IF statement executes the statements conditionally. If the value for WageCat is S, then the variable Yearly contains the value of WageRate multiplied by 12. If the value of WageCat is H, then the variable Yearly contains the value of WageRate multiplied by 2000.
5 If the observation is the first observation for the variable Dept, initialize Payroll to 0.
Note: FIRST.Dept variable is not written to the data set and does not appear in the output.
6 Add the value of Yearly to the value of Payroll.
7 If this observation is the last in the variable, Dept, then end. If not, then read the next observation.
Note: LAST.Dept variable is not written to the data set and does not appear in the output.
The following figure illustrates how SAS processes FIRST.Dept and LAST.Dept. Notice that the values of FIRST.Dept and LAST.Dept change as the value for Dept changes.
Figure 8.1 BY Group for Dept
BY Group for Dept
When you print the new data set, you can now list and sum the annual payroll by department.
proc print data=work.budget noobs; 
  sum payroll; 
  format payroll dollar12.2; 
run;
Output 8.3 PROC PRINT Output of Work.Budget: Sum of Payroll
PROC PRINT Output of Work.Budget: Sum of Payroll

Example: Grouping Observations Using Multiple BY Variables

Suppose you now want to compute the annual payroll by job type for each manager. In the following example, you specify two BY variables, Manager and JobType, creating two groups. The Manager group contains three subgroups: Coxe, Delgado, and Overby. The JobType subgroup contains nine subgroups: 1, 3, 5, 10, 20, 50, 240, 420, and 440. Within these subgroups, you can identify the first and last observations for each of these subgroups.
proc sort data=cert.usa out=work.temp2;               /*#1*/
  by manager jobtype;
run;
data work.budget2 (keep=manager jobtype payroll);     /*#2*/
  set work.temp2;
  by manager jobtype;                                 /*#3*/
  if wagecat='S' then Yearly=wagerate*12;             /*#4*/
    else if wagecat='H' then Yearly=wagerate*2000;
  if first.jobtype then Payroll=0;                    /*#5*/
  payroll+yearly;                                     /*#6*/
  if last.jobtype;                                    /*#7*/
run;
1 The SORT procedure sorts the data in Cert.Usa by the variables Manager and JobType. The results of the SORT procedure are stored in Work.Temp2.
2 The KEEP= data set option specifies the variables Manager, JobType, and Payroll and writes the variables to the new data set, Work.Budget.
3 The BY statement in a DATA step applies only to the SET statement. The data set Work.Temp2 must be sorted by the Manager and JobType variables in order for the BY statement to set up grouping variables. The data set is sorted by the variable Manager first and then by JobType.
4 The IF statement executes the statements conditionally. If the value for WageCat is S, then the variable Yearly contains the value of WageRate multiplied by 12. If the value of WageCat is H, then the variable Yearly contains the value of WageRate multiplied by 2000.
5 If the observation is the first for JobType, then initialize Payroll to 0.
6 Add the value of Yearly to the value of Payroll.
7 If this observation is the last in the variable, JobType, then end. If not, then read the next observation.
The following figure illustrates how SAS processes FIRST.Manager, FIRST.JobType, LAST.Manager, and LAST.JobType. Notice how the values of FIRST.Manager and LAST.Manager change only when the Manager value changes. However, the values for FIRST.JobType and LAST.JobType values change multiple times even when the Manager value remains the same.
Figure 8.2 Multiple BY Group Variables: Manager and JobType
Multiple BY Group Variables: Manager and JobType
You can generate a sum for the annual payroll by job type for each manager. The example below shows the payroll sum for only two managers, Coxe and Delgado.
proc print data=work.budget2 noobs;
  by manager;
  var jobtype;
  sum payroll;
  where manager in ('Coxe', 'Delgado');
  format payroll dollar12.2;
run;
Figure 8.3 Payroll Sum by Job Type and Manager
Payroll Sum by Job Type and Manager
Last updated: August 23, 2018
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.117.186.153