Indexing for BY-Group Processing
You can also ensure that observations are processed in ascending numeric or character
order by creating an index based on one or more variables in the SAS data set. If you
specify a BY statement in a DATA step, SAS looks for an appropriate index. If it finds
the index, SAS automatically retrieves the observations from the data set in indexed
order.
Note: Because creating and maintaining indexes require additional resources, you
should determine whether using them significantly improves performance.
Depending on the nature of the data in your SAS data set, using PROC SORT to
order data values can be more advantageous than indexing. For an overview of
indexes, see “Understanding SAS Indexes” on page 638.
How the DATA Step Identifies BY Groups
Processing Observations in a BY Group
In the DATA step, SAS identifies the beginning and end of each BY group by creating
two temporary variables for each BY variable: FIRST.variable and LAST.variable.
These temporary variables are available for DATA step programming but are not added
to the output data set. Their values indicate whether an observation is one of the
following positions:
the first one in a BY group
the last one in a BY group
neither the first nor the last one in a BY group
both first and last, as is the case when there is only one observation in a BY group
You can take actions conditionally, based on whether you are processing the first or the
last observation in a BY group.
Using a Name Literal as the BY-Group Variable
When you designate a name literal as the BY variable in BY-group processing and you
want to refer to the corresponding FIRST. or LAST. temporary variables, you must
include the FIRST. or LAST. portion of the two-level variable name within quotation
marks. Here is an example:
data sedanTypes;
set cars;
by 'Sedan Types'n;
if 'first.Sedan Types'n then type=1;
run;
For more information about BY-Group Processing and how SAS creates the temporary
variables, FIRST and LAST, see “How SAS Determines FIRST.variable and
LAST.variable” on page 455 and “How SAS Identifies the Beginning and End of a BY
Group” in SAS Statements: Reference.
454 Chapter 20 BY-Group Processing in the DATA Step
How SAS Determines FIRST.variable and LAST.variable
When an observation is the first in a BY group, SAS sets the value of FIRST.variable to
1 for the variable whose value changed, as well as for all of the variables that follow in
the BY statement. For all other observations in the BY group, the value of
FIRST.variable is 0. Likewise, if the observation is the last in a BY group, SAS sets the
value of LAST.variable to 1 for the variable whose value changes on the next
observation, as well as for all of the variables that follow in the BY statement. For all
other observations in the BY group, the value of LAST.variable is 0. For the last
observation in a data set, the value of all LAST.variable variables are set to 1.
Note: See “SAS Name Literals” on page 31 for more information about SAS name
literals.
Example 1: Grouping Observations by State, City, ZIP code, and
Street
This example shows how SAS uses the FIRST.variable and LAST.variable to flag the
beginning and end of four BY groups: State, City, ZipCode, and Street. Six temporary
variables are created within the program data vector. These variables can be used during
the DATA step, but they do not become variables in the new data set.
In the figure that follows, observations in the SAS data set are arranged in an order that
can be used with this BY statement:
by State City ZipCode;
SAS creates the following temporary variables: FIRST.State, LAST.State, FIRST.City,
LAST.City, FIRST.ZipCode, and LAST.ZipCode.
options pageno=1 nodate linesize=80 pagesize=60;
data testfile;
input State $ ZipCode $ City $ Street $ 19-33;
datalines;
AZ 85730 Tucson Gleeson Place
FL 33133 Miami Rice Street
FL 33133 Miami Thomas Avenue
FL 33133 Miami Surrey Drive
FL 33146 Miami Nervia Street
FL 33146 Miami Corsica Street
OH 45056 Miami Myrtle Street
;
data test2;
set testfile;
by State City ZipCode;
put _N_= state= first.state= last.state= first.city= last.city=
first.ZipCode= last.ZipCode= ;
run;
NOTE: PROCEDURE PRINTTO used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
79 options pageno=1 nodate linesize=80 pagesize=60;
80 data testfile;
81 input State $ ZipCode $ City $ Street $ 19-33;
How the DATA Step Identifies BY Groups 455
82 datalines;
NOTE: The data set WORK.TESTFILE has 7 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
90 ;
91 data test2;
92 set testfile;
93 by State City ZipCode;
94 put _N_= state= first.state= last.state= first.city= last.city=
95 first.ZipCode= last.ZipCode= ;
96 run;
_N_=1 State=AZ FIRST.State=1 LAST.State=1 FIRST.City=1 LAST.City=1
FIRST.ZipCode=1 LAST.ZipCode=1
_N_=2 State=FL FIRST.State=1 LAST.State=0 FIRST.City=1 LAST.City=0
FIRST.ZipCode=1 LAST.ZipCode=0
_N_=3 State=FL FIRST.State=0 LAST.State=0 FIRST.City=0 LAST.City=0
FIRST.ZipCode=0 LAST.ZipCode=0
_N_=4 State=FL FIRST.State=0 LAST.State=0 FIRST.City=0 LAST.City=0
FIRST.ZipCode=0 LAST.ZipCode=1
_N_=5 State=FL FIRST.State=0 LAST.State=0 FIRST.City=0 LAST.City=0
FIRST.ZipCode=1 LAST.ZipCode=0
_N_=6 State=FL FIRST.State=0 LAST.State=1 FIRST.City=0 LAST.City=1
FIRST.ZipCode=0 LAST.ZipCode=1
_N_=7 State=OH FIRST.State=1 LAST.State=1 FIRST.City=1 LAST.City=1
FIRST.ZipCode=1 LAST.ZipCode=1
NOTE: There were 7 observations read from the data set WORK.TESTFILE.
NOTE: The data set WORK.TEST2 has 7 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
97 proc printto; run;
Table 20.1 BY Groups for State, City, and Zipcode
Observations in Four BY Groups Corresponding FIRST. and LAST. Values
State City ZipCode Street FIRST.
State
LAST.
State
FIRST.
City
LAST.
City
FIRST.
ZipCode
LAST. ZipCode
AZ Tucson 85730 Glen Pl 1 1 1 1 1 1
FL Miami 33133 Rice St 1 0 1 0 1 0
FL Miami 33133 Tom Ave 0 0 0 0 0 0
FL Miami 33133 Surrey Dr 0 0 0 0 0 1
FL Miami 33146 Nervia St 0 0 0 0 1 0
FL Miami 33146 Corsica
St
0 1 0 1 0 1
OH Miami 45056 Myrtle St 1 1 1 1 1 1
456 Chapter 20 BY-Group Processing in the DATA Step
Example 2: Grouping Observations by City, State, ZIP code, and
Street
This example shows how SAS uses the FIRST.variable and LAST.variable to flag the
beginning and end of four BY groups: City, State, ZipCode, and Street. Six temporary
variables are created within the program data vector. These variables can be used during
the DATA step, but they do not become variables in the new data set.
In the figure that follows, observations in the SAS data set are arranged in an order that
can be used with this BY statement:
by City State ZipCode;
SAS creates the following temporary variables: FIRST.City, LAST.City, FIRST.State,
LAST.State, FIRST.ZipCode, and LAST.ZipCode.
Table 20.2 Grouping Observations by City, State, ZIP code, and Street
Observations in Four BY Groups Corresponding FIRST. and LAST. Values
City State ZipCode Street FIRST.
City
LAST.
City
FIRST.
State
LAST.S
tate
FIRST.
ZipCode
LAST.
ZipCode
Miami FL 33133 Rice St 1 0 1 0 1 0
Miami FL 33133 Tom
Ave
0 0 0 0 0 0
Miami FL 33133 Surrey
Dr
0 0 0 0 0 1
Miami FL 33146 Nervia
St
0 0 0 0 1 0
Miami FL 33146 Corsica
St
0 0 0 1 0 1
Miami OH 45056 Myrtle
St
0 1 1 1 1 1
Tucson AZ 85730 Glen Pl 1 1 1 1 1 1
Example 3: A Change Affecting FIRST.variable
The value of FIRST.variable can be affected by a change in a previous value, even if the
current value of the variable remains the same.
In this example, the value of FIRST.variable and LAST.variable are dependent on sort
order, and not just by the value of the BY variable. For observation 3, the value of
FIRST.Y is set to 1 because BLUEBERRY is a new value for Y. This change in Y causes
FIRST.
Z to be set to 1 as well, even though the value of Z did not change.
options pageno=1 nodate linesize=80 pagesize=60;
data testfile;
How the DATA Step Identifies BY Groups 457
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.117.229.44