Create in one DATA step two data sets that summarize information found in one data set. Create one data set that contains cumulative totals for observations within a BY group. Create another data set that collapses each BY group into a single observation and produces grand totals for variables in each BY group.
Featured Step | DATA step |
Featured Step Options and Statements | BY-group processing Accumulator variables |
Related Technique | PROC SQL, GROUP BY clause, CASE expression, MAX and SUM functions |
Data set TECHCERTIF contains certification-eligible events for five technologists.
TECHCERTIF event_ Obs techid event_date event_type credits 1 1756 08/22/2008 Seminar 5 2 1756 09/12/2008 Practice 100 3 1756 09/22/2008 Service 20 4 1756 01/15/2009 Class 100 5 1756 03/02/2009 Practice 75 6 1756 05/23/2009 Practice 125 7 1935 05/03/2008 Conference 75 8 1935 08/22/2008 Seminar 5 9 1935 09/12/2008 Practice 100 10 1935 10/15/2008 Class 50 11 1935 12/02/2008 Practice 125 12 1935 03/12/2009 Seminar 5 13 1935 06/12/2009 Conference 105 14 2234 05/23/2009 Practice 125 15 9185 01/15/2009 Class 100 16 9186 04/22/2009 Seminar 10 17 9186 05/23/2009 Practice 125 18 9186 07/29/2009 Conference 150
Output 5.5a CUMCREDITS Data SetExample 5.5 CUMCREDITS Data Set Created with DATA Step event_ cum_ Obs techid event_date event_type credits credits 1 1756 08/22/2008 Seminar 5 5 2 1756 09/12/2008 Practice 100 105 3 1756 09/22/2008 Service 20 125 4 1756 01/15/2009 Class 100 225 5 1756 03/02/2009 Practice 75 300 6 1756 05/23/2009 Practice 125 425 7 1935 05/03/2008 Conference 75 75 8 1935 08/22/2008 Seminar 5 80 9 1935 09/12/2008 Practice 100 180 10 1935 10/15/2008 Class 50 230 11 1935 12/02/2008 Practice 125 355 12 1935 03/12/2009 Seminar 5 360 13 1935 06/12/2009 Conference 105 465 14 2234 05/23/2009 Practice 125 125 15 9185 01/15/2009 Class 100 100 16 9186 04/22/2009 Seminar 10 10 17 9186 05/23/2009 Practice 125 135 18 9186 07/29/2009 Conference 150 285 |
Output 5.5b GRANDCREDITS Data SetExample 5.5 GRANDCREDITS Data Set Created with DATA Step total_ total_ practice_ Obs techid last_event credits credits 1 1756 05/23/2009 425 300 2 1935 06/12/2009 465 225 3 2234 05/23/2009 125 125 4 9185 01/15/2009 100 0 5 9186 07/29/2009 285 125 |
This example shows how to apply BY-group processing in a DATA step to produce cumulative and grand totals. It processes data set TECHCERTIF in BY groups defined by the values of variable TECHID. This data set contains certification-eligible events for five technologists identified by the values of TECHID.
The DATA step creates two data sets:
CUMCREDITS, which contains cumulative credits over time for each technologist. Since these are cumulative totals, the number of observations in CUMCREDITS is the same as in TECHCERTIF.
GRANDCREDITS, which contains the total certification-eligible credits and total practice-type credits for each technologist. Because these are overall totals for each technologist, the number of observations in GRANDCREDITS is equal to the number of technologists with data in TECHCERTIF, which is five.
Because the DATA step processes TECHCERTIF in BY groups defined by TECHID, it is sorted prior to the DATA step by TECHID. The observations are also sorted chronologically by LAST_EVENT within each value of TECHID because a running total of credits across time is needed.
The DATA step uses accumulator variables, CUM_CREDITS and TOTAL_PRACTICE_CREDITS, to tally the certification-eligible credits and practice-type credits.
The DATA statement renames and drops variables based on the type of content that each output data set contains.
Create data set CUMCREDITS to contain the cumulative totals of credits across time. Do not keep the variable that has the cumulative total for practice credits. Create data set GRANDCREDITS to contain the grand totals of certification-eligible and practice credits. Rename variables to reflect that they are for the last event and credits grand total. Drop the variables that are pertinent only to the individual event attendance. Read the observations from TECHCERTIF. Process TECHCERTIF in BY groups that are defined by the values of TECHID. Initialize accumulator variables CUM_CREDITS and TOTAL_PRACTICE_CREDITS to 0 at the beginning of each BY group. Add the current value of EVENT_CREDITS to the running total that is stored in accumulator variable CUM_CREDITS. Add the current value of EVENT_CREDITS to the running total that is stored in accumulator variable TOTAL_PRACTICE_CREDITS. This pertains to "Practice" events only. Output each observation from TECHCERTIF to CUMCREDITS. Output the last observation in each BY group to GRANDCREDITS.
data cumcredits(drop=total_practice_credits) grandcredits(rename=(event_date=last_event cum_credits=total_credits) drop=event_type event_credits); set techcertif; by techid; if first.techid then do; cum_credits=0; total_practice_credits=0; end; cum_credits+event_credits; if event_type="Practice" then total_practice_credits +event_credits; output cumcredits; if last.techid then output grandcredits; run;
The following PROC SQL step creates a grand total table equivalent to the GRANDCREDITS data set that was created in the main example. It does not create a cumulative total table equivalent to CUMCREDITS because the accumulator variable processing in the DATA step is not similarly available in PROC SQL.
Table GRANDCREDITS is not produced by combining summary and detail data, but it is presented here to illustrate another way to produce grand totals for BY groups.
The CREATE TABLE statement uses the MAX and SUM aggregate functions and the GROUP BY clause to summarize the EVENT DATE and EVENT_CREDITS columns in TECHCERTIF in groups defined by the values of TECHID.
Computing the total practice credits requires testing the value of EVENT TYPE. A CASE expression examines the values of EVENT TYPE and assigns a value to EVENT CREDITS during processing of the expression. When EVENT TYPE is not 'Practice', the CASE expression assigns a value of 0. Otherwise, the CASE expression uses the value in EVENT_CREDITS. One technologist, 9185, does not have any practice credits. Therefore, the value for TOTAL_PRACTICE_CREDITS for 9185 is 0 in the GRANDCREDITS table.
Example 5.6 also uses aggregate functions and the GROUP BY clause.
Create table GRANDCREDITS. Specify the columns to select from GRANDCREDITS. Do not apply an aggregate function to TECHID since it is the GROUP BY column. Determine the most recent event date for a technologist. Determine the total number of credits. Examine the value of EVENT_TYPE to determine the total practice credits. When it is 'Practice', use the value in EVENT_CREDITS. Otherwise, use 0. Determine the total number of practice credits by applying the SUM function to the results of the CASE expression. Read the rows in TECHCERTIF. Group the rows by the values of TECHID.
proc sql; create table grandcredits as select techid max(event_date) as last_event format=mmddyy10. sum(event_credits) as total_credits, sum(case when event_type='Practice' then event_credits else 0 end) as total_practice_credits from techcertif group by techid; quit;
18.222.167.161