input x $ y $ 9-17 z $ 19-26;
datalines;
apple banana coconut
apple banana coconut
apple blueberry citron
apricot blueberry citron
;
data _null_;
set testfile;
by x y z;
if _N_=1 then put 'Grouped by X Y Z';
put _N_= x= first.x= last.x= first.y= last.y= first.z= last.z= ;
run;
data _null_;
set testfile;
by y x z;
if _N_=1 then put 'Grouped by Y X Z';
put _N_= x= first.x= last.x= first.y= last.y= first.z= last.z= ;
run;
Log 20.1 Partial SAS Log Showing the Results of Processing with BY Variables
Grouped by X Y Z
_N_=1 x=Apple FIRST.x=1 LAST.x=0 FIRST.y=1 LAST.y=0 FIRST.z=1 LAST.z=0
_N_=2 x=Apple FIRST.x=0 LAST.x=0 FIRST.y=0 LAST.y=1 FIRST.z=0 LAST.z=1
_N_=3 x=Apple FIRST.x=0 LAST.x=1 FIRST.y=1 LAST.y=1 FIRST.z=1 LAST.z=1
_N_=4 x=Apricot FIRST.x=1 LAST.x=1 FIRST.y=1 LAST.y=1 FIRST.z=1 LAST.z=1
Grouped by Y X Z
_N_=1 x=Apple FIRST.x=1 LAST.x=0 FIRST.y=1 LAST.y=0 FIRST.z=1 LAST.z=0
_N_=2 x=Apple FIRST.x=0 LAST.x=1 FIRST.y=0 LAST.y=1 FIRST.z=0 LAST.z=1
_N_=3 x=Apple FIRST.x=1 LAST.x=1 FIRST.y=1 LAST.y=0 FIRST.z=1 LAST.z=1
_N_=4 x=Apricot FIRST.x=1 LAST.x=1 FIRST.y=0 LAST.y=1 FIRST.z=1
LAST.z=1
Processing BY-Groups in the DATA Step
Overview
The most common use of BY-group processing in the DATA step is to use SET,
MERGE, MODIFY, or UPDATE with the BY statement to combine two or more SAS
data sets.. (If you use a SET, MERGE, or UPDATE statement with the BY statement,
your observations must be grouped or ordered.) When processing these statements, SAS
reads one observation at a time into the program data vector. With BY-group processing,
SAS selects the observations from the data sets according to the values of the BY
variable or variables. After processing all the observations from one BY group, SAS
expects the next observation to be from the next BY group.
The BY statement modifies the action of the SET, MERGE, MODIFY, or UPDATE
statement by controlling when the values in the program data vector are set to missing.
During BY-group processing, SAS retains the values of variables until it has copied the
last observation that it finds for that BY group in any of the data sets. Without the BY
458 Chapter 20 BY-Group Processing in the DATA Step
statement, the SET statement sets variables to missing when it reads the last observation
from any data set, and the MERGE statement does not set variables to missing after the
DATA step starts reading observations into the program data vector.
Processing BY-Groups Conditionally
You can process observations conditionally by using the subsetting IF or IF-THEN
statements, or the SELECT statement, with the temporary variables FIRST.variable and
LAST.variable (set up during BY-group processing). For example, you can use the IF or
IF THEN statements to perform calculations for each BY group and to write an
observation when the first or the last observation of a BY group has been read into the
program data vector.
The following example computes annual payroll by department. It uses IF-THEN
statements and the values of FIRST.variable and LAST.variable automatic variables to
reset the value of PAYROLL to 0 at the beginning of each BY group and to write an
observation after the last observation in a BY group is processed.
title;
options linesize=80 pagesize=60;
data salaries;
input Department $ Name $ WageCategory $ WageRate;
datalines;
BAD Carol Salaried 20000
BAD Elizabeth Salaried 5000
BAD Linda Salaried 7000
BAD Thomas Salaried 9000
BAD Lynne Hourly 230
DDG Jason Hourly 200
DDG Paul Salaried 4000
PPD Kevin Salaried 5500
PPD Amber Hourly 150
PPD Tina Salaried 13000
STD Helen Hourly 200
STD Jim Salaried 8000
;
proc print data=salaries;
run;
proc sort data=salaries out=temp;
by Department;
run;
data budget (keep=Department Payroll);
set temp;
by Department;
if WageCategory='Salaried' then YearlyWage=WageRate*12;
else if WageCategory='Hourly' then YearlyWage=WageRate*2000;
/* SAS sets FIRST.variable to 1 if this is a new */
/* department in the BY group. */
if first.Department then Payroll=0;
Payroll+YearlyWage;
/* SAS sets LAST.variable to 1 if this is the last */
/* department in the current BY group. */
Processing BY-Groups in the DATA Step 459
if last.Department;
run;
proc print data=budget;
format Payroll dollar10.;
title 'Annual Payroll by Department';
run;
Output 20.1 Output from Conditional BY-Group Processing
Data Not in Alphabetic or Numeric Order
In BY-group processing, you can use data that is arranged in an order other than
alphabetic or numeric, such as by calendar month or by category. To do this, use the
NOTSORTED option in a BY statement when you use a SET statement. The
NOTSORTED option in the BY statement tells SAS that the data is not in alphabetic or
numeric order, but that it is arranged in groups by the values of the BY variable. You
cannot use the NOTSORTED option with the MERGE statement, the UPDATE
statement, or when the SET statement lists more than one data set.
This example assumes that the data is grouped by the character variable MONTH. The
subsetting IF statement conditionally writes an observation, based on the value of
LAST.month. This DATA step writes an observation only after processing the last
observation in each BY group.
data sales;
input month
data total_sale(drop=sales);
set region.sales
by month notsorted;
total+sales;
if last.month;
run;
Data Grouped by Formatted Values
Use the GROUPFORMAT option in the BY statement to ensure that
formatted values are used to group observations when a FORMAT statement and a
BY statement are used together in a DATA step
460 Chapter 20 BY-Group Processing in the DATA Step
the FIRST.variable and LAST.variable are assigned by the formatted values of the
variable
The GROUPFORMAT option is valid only in the DATA step that creates the SAS data
set. It is particularly useful with user-defined formats. The following examples illustrate
the use of the GROUPFORMAT option.
Example 1: Using GROUPFORMAT with Formats
proc format;
value range
low -55 = 'Under 55'
55-60 = '55 to 60'
60-65 = '60 to 65'
65-70 = '65 to 70'
other = 'Over 70';
run;
proc sort data=class out=sorted_class;
by height;
run;
data _null_;
format height range.;
set sorted_class;
by height groupformat;
if first.height then
put 'Shortest in ' height 'measures ' height:best12.;
run;
SAS writes the following output to the log:
Log 20.2 SAS Log Output Using the BY Statement GROUPFORMAT Option
Shortest
in Under 55 measures 51.3
Shortest in 55 to 60 measures 56.3
Shortest in 60 to 65 measures 62.5
Shortest in 65 to 70 measures 65.3
Shortest in Over 70 measures 72
Example 2: Using GROUPFORMAT with Formats
options
linesize=80 pagesize=60;
/* Create SAS data set test */
data test;
infile datalines;
input name $ Score;
datalines;
Jon 1
Anthony 3
Miguel 3
Joseph 4
Processing BY-Groups in the DATA Step 461
Ian 5
Jan 6
;
/* Create a user-defined format */
proc format;
value Range 1-2='Low'
3-4='Medium'
5-6='High';
run;
/* Create the SAS data set newtest */
data newtest;
set test;
by groupformat Score;
format Score Range.;
run;
/* Print using formatted values */
proc print data=newtest;
title 'Score Categories';
var Name Score;
by Score;
run;
462 Chapter 20 BY-Group Processing in the DATA Step
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.148.103.210