Using BY-Group Processing

Finding the First and Last Observations in a Group

Creating List Reports explained how to use a BY statement in PROC SORT to sort observations and in PROC PRINT to group observations for subtotals. You can also use the BY statement in the DATA step to group observations for processing.
data temp; 
   set salary; 
   by dept; 
run;
When you use the BY statement with the SET statement, here are requirements and results:
  • The data sets that are listed in the SET statement must either be sorted by the values of the BY variables, or they must have an appropriate index.
  • The DATA step creates two temporary variables for each BY variable. One is named FIRST.variable, where variable is the name of the BY variable, and the other is named LAST.variable. Their values are either 1 or 0. The FIRST.variable and LAST.variable identify the first and last observations, respectively, in each BY group.
Table 12.2 Finding the First and Last Observations in a Group
Variable
Equivalent
FIRST.variable
1 for the first observation in a BY group
0 for any other observation in a BY group
LAST.variable
1 for the last observation in a BY group
0 for any other observation in a BY group

Example: Finding the First and Last Observations in a Group

The Sasuser.USA data set contains payroll information for individual employees. Suppose you want to compute the annual payroll by department. Assume 2,000 work hours per year for hourly employees.
Before computing the annual payroll, you need to group observations by values of the variable Dept.
Figure 12.3 Sample Data Set (partial output)
Partial Listing of Data Set
The following program computes the annual payroll by department. Notice that the variable name Dept has been appended to FIRST. and LAST.
proc sort data=sasuser.usa out=work.temp; 
   by dept; 
run; 
data finance.budget(keep=dept payroll); 
   set work.temp; 
   by dept; 
   if wagecat='S' then Yearly=wagerate*12; 
   else if wagecat='H' then Yearly=wagerate*2000; 
   if first.dept then Payroll=0; 
   payroll+yearly;  
   if last.dept; 
run;
If you could look behind the scenes at the program data vector (PDV) as the Finance.Budget data set is created, you would see the following. Notice the values for FIRST.Dept and LAST.Dept.
Figure 12.4 Program Data Vector
Program Data Vector
When you print the new data set, you can now list and sum the annual payroll by department.
proc print data=finance.budget noobs; 
   sum payroll; 
   format payroll dollar12.2; 
run;
Figure 12.5 Payroll Sum
Payroll Sum

Finding the First and Last Observations in Subgroups

When you specify multiple BY variables, the following statements are true:
  • The FIRST.variable for each variable is set to 1 at the first occurrence of a new value for the primary variable.
  • A change in the value of a primary BY variable forces LAST.variable to equal 1 for the secondary BY variables.

Example: Finding the First and Last Observations in Subgroups

Suppose you now want to compute the annual payroll by job type for each manager. In your program, you specify two BY variables, Manager and JobType.
proc sort data=sasuser.usa out=work.temp2; 
   by manager job_type; 
data finance.budget2(keep=manager job_type payroll); 
   set work.temp2; 
   by manager job_type; 
   if wagecat='S' then Yearly=wagerate*12; 
   else if wagecat='H' then Yearly=wagerate*2000; 
   if first.job_type then Payroll=0; 
   payroll+yearly; 
   if last.job_type; 
run;
If you could look at the PDV now, you would see the following. Notice that the values for FIRST.JobType and LAST.JobType change according to values of FIRST.Manager and LAST.Manager.
Figure 12.6 Sample Program Data Vector
PDV
Now you can sum the annual payroll by job type for each manager. Here, the payroll for only two managers (Coxe and Delgado) is listed.
proc print data=finance.budget2 noobs; 
   by manager; 
   var job_type; 
   sum payroll; 
   where manager in ('Coxe','Delgado'); 
   format payroll dollar12.2; 
run;
Figure 12.7 Payroll Sum by Job Type and Manager
Payroll Sum by Job Type and Manager
Last updated: January 10, 2018
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.147.77.208