Understanding the Tools for Processing Information in Groups

Processing BY Groups in the DATA Step

When combining SAS data sets in a DATA step, it is often convenient or necessary to process observations in BY groups (that is, groups of observations that have the same value for one or more selected variables). Many examples in this book use BY-group processing with one or more SAS data sets to create a new data set.

The BY statement identifies one or more BY variables. When using the BY statement with the SET, MERGE, or UPDATE statement, your data must be sorted or indexed on the BY variable or variables.

In a DATA step, SAS identifies the beginning and end of each BY group by creating two temporary variables for each BY variable: FIRST.variable and LAST.variable. Each is set to 1 when the current observation is the first (for FIRST.variable) or last (for LAST.variable) in the current BY group, and to 0 otherwise. Using programming logic, you can test FIRST.variable and LAST.variable to determine whether the current observation is the first or last (or both first and last, or neither first nor last) in the current BY group. Testing the values of these variables in conditional processing enables you to perform certain operations at the beginning or end of a BY group.
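As a minimal sketch of this technique, the following program totals a value within each BY group. The data set names (WORK.SALES, WORK.REGION_TOTALS), the variables (REGION, AMOUNT, TOTAL), and the sample values are hypothetical and chosen only for illustration.

```sas
/* Hypothetical sample data: one observation per sale. */
data work.sales;
   input region $ amount;
   datalines;
East 100
West 250
East 50
West 75
North 30
;
run;

/* SET with BY requires data that is sorted (or indexed) on the BY variable. */
proc sort data=work.sales;
   by region;
run;

/* Accumulate a total per region; write one observation per BY group. */
data work.region_totals;
   set work.sales;
   by region;
   if first.region then total = 0;   /* reset at the start of each BY group */
   total + amount;                   /* sum statement: TOTAL is retained across observations */
   if last.region then output;       /* write only at the end of each BY group */
   keep region total;
run;
```

The North group contains a single observation, so for that observation both FIRST.REGION and LAST.REGION are 1: the total is reset and written in the same iteration.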

Processing Grouped Data in PROC SQL

PROC SQL does not replicate the programming functionality that BY-group processing offers in the DATA step. Instead, the GROUP BY clause processes data in groups, similar to the way a BY statement in a PROC step processes data. Tables do not have to be sorted by the columns that are specified in the GROUP BY clause. To arrange the results, add an ORDER BY clause.
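The following sketch shows one way to summarize unsorted data by group in PROC SQL. The table names (WORK.SALES, WORK.REGION_TOTALS_SQL), the columns (REGION, AMOUNT, TOTAL), and the sample values are hypothetical.

```sas
/* Hypothetical sample data, deliberately left unsorted. */
data work.sales;
   input region $ amount;
   datalines;
East 100
West 250
East 50
West 75
North 30
;
run;

proc sql;
   create table work.region_totals_sql as
   select region,
          sum(amount) as total     /* summary function evaluated once per group */
   from work.sales
   group by region                 /* no prior sort of WORK.SALES is needed */
   order by region;                /* arranges the result rows */
quit;
```

No PROC SORT step appears before the query because GROUP BY does not depend on the physical order of the rows.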

Understanding BY-Group Processing with the MODIFY and BY Statements

Internally, the MODIFY statement handles BY-group processing differently from the SET, MERGE, and UPDATE statements. MODIFY creates a dynamic WHERE clause, making it possible for you to use BY-group processing without either sorting or indexing your data first. However, processing based on FIRST.variable and LAST.variable values can result in multiple BY groups for the same BY values if your data are not sorted. Therefore, you might not get the expected results unless you use sorted data. And even though sorting is not required, it is often useful for improved performance.
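A minimal sketch of MODIFY with a BY statement follows, assuming a master data set and a transaction data set that share a key variable. The names WORK.MASTER, WORK.TRANS, ID, and BALANCE, and the sample values, are hypothetical. Neither data set is sorted or indexed.

```sas
/* Hypothetical master data set, deliberately unsorted. */
data work.master;
   input id balance;
   datalines;
3 300
1 100
2 200
;
run;

/* Hypothetical transactions; every ID here also exists in the master. */
data work.trans;
   input id balance;
   datalines;
2 250
3 350
;
run;

/* Update WORK.MASTER in place. For each transaction observation, MODIFY */
/* locates the master observation that has the same ID value.            */
data work.master;
   modify work.master work.trans;
   by id;
run;
```

This sketch does not test FIRST.ID or LAST.ID, so the caveat about multiple BY groups in unsorted data does not come into play here.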

Processing Groups with Arrays in the DATA Step

When you want to process several variables in the same way, you might be able to use arrays. Processing variables in arrays can save you time and simplify your code. Use an ARRAY statement to define a temporary grouping of variables as an array. Then use a DO loop to perform a task repetitively on all or selected elements in the array.
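As an illustration, the following sketch recodes a placeholder value to missing across several variables. The data set names, the variables TEST1 through TEST3, and the use of -9 as a placeholder code are all hypothetical.

```sas
/* Hypothetical sample data in which -9 stands for "not taken". */
data work.scores;
   input name $ test1 test2 test3;
   datalines;
Ann 85 -9 90
Bob -9 70 88
;
run;

data work.scores_clean;
   set work.scores;
   array tests{3} test1-test3;     /* temporary grouping of three variables */
   do i = 1 to dim(tests);         /* visit each element of the array */
      if tests{i} = -9 then tests{i} = .;
   end;
   drop i;                         /* the loop index is not needed in the output */
run;
```

Using DIM(TESTS) as the upper bound keeps the loop correct if variables are later added to the ARRAY statement.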
