Example 5.5 Calculating Totals across a BY Group to Produce Cumulative and Grand Totals

Goal

Create in one DATA step two data sets that summarize information found in one data set. Create one data set that contains cumulative totals for observations within a BY group. Create another data set that collapses each BY group into a single observation and produces grand totals for variables in each BY group.

Example Features

Featured StepDATA step
Featured Step Options and StatementsBY-group processing Accumulator variables
Related TechniquePROC SQL, GROUP BY clause, CASE expression, MAX and SUM functions

Input Data Set

Data set TECHCERTIF contains certification-eligible events for five technologists.

                           TECHCERTIF

                                                    event_
      Obs    techid    event_date    event_type     credits
        1     1756     08/22/2008    Seminar            5
        2     1756     09/12/2008    Practice         100
        3     1756     09/22/2008    Service           20
        4     1756     01/15/2009    Class            100
        5     1756     03/02/2009    Practice          75
        6     1756     05/23/2009    Practice         125
        7     1935     05/03/2008    Conference        75
        8     1935     08/22/2008    Seminar            5
        9     1935     09/12/2008    Practice         100
       10     1935     10/15/2008    Class             50
       11     1935     12/02/2008    Practice         125
       12     1935     03/12/2009    Seminar            5
       13     1935     06/12/2009    Conference       105
       14     2234     05/23/2009    Practice         125
       15     9185     01/15/2009    Class            100
       16     9186     04/22/2009    Seminar           10
       17     9186     05/23/2009    Practice         125
       18     9186     07/29/2009    Conference       150

Resulting Data Sets

Output 5.5a CUMCREDITS Data Set

     Example 5.5 CUMCREDITS Data Set Created with DATA Step

                                              event_      cum_
Obs    techid    event_date    event_type    credits    credits

  1     1756     08/22/2008    Seminar           5          5
  2     1756     09/12/2008    Practice        100        105
  3     1756     09/22/2008    Service          20        125
  4     1756     01/15/2009    Class           100        225
  5     1756     03/02/2009    Practice         75        300
  6     1756     05/23/2009    Practice        125        425
  7     1935     05/03/2008    Conference       75         75
  8     1935     08/22/2008    Seminar           5         80
  9     1935     09/12/2008    Practice        100        180
 10     1935     10/15/2008    Class            50        230
 11     1935     12/02/2008    Practice        125        355
 12     1935     03/12/2009    Seminar           5        360
 13     1935     06/12/2009    Conference      105        465
 14     2234     05/23/2009    Practice        125        125
 15     9185     01/15/2009    Class           100        100
 16     9186     04/22/2009    Seminar          10        10
 17     9186     05/23/2009    Practice        125        135
 18     9186     07/29/2009    Conference      150        285


Output 5.5b GRANDCREDITS Data Set

    Example 5.5 GRANDCREDITS Data Set Created with DATA Step

                                                  total_
                                      total_    practice_
      Obs    techid    last_event    credits      credits

       1      1756     05/23/2009      425          300
       2      1935     06/12/2009      465          225
       3      2234     05/23/2009      125          125
       4      9185     01/15/2009      100            0
       5      9186     07/29/2009      285          125


Example Overview

This example shows how to apply BY-group processing in a DATA step to produce cumulative and grand totals. It processes data set TECHCERTIF in BY groups defined by the values of variable TECHID. This data set contains certification-eligible events for five technologists identified by the values of TECHID.

The DATA step creates two data sets:

  • CUMCREDITS, which contains cumulative credits over time for each technologist. Since these are cumulative totals, the number of observations in CUMCREDITS is the same as in TECHCERTIF.

  • GRANDCREDITS, which contains the total certification-eligible credits and total practice-type credits for each technologist. Because these are overall totals for each technologist, the number of observations in GRANDCREDITS is equal to the number of technologists with data in TECHCERTIF, which is five.

Because the DATA step processes TECHCERTIF in BY groups defined by TECHID, it is sorted prior to the DATA step by TECHID. The observations are also sorted chronologically by LAST_EVENT within each value of TECHID because a running total of credits across time is needed.

The DATA step uses accumulator variables, CUM_CREDITS and TOTAL_PRACTICE_CREDITS, to tally the certification-eligible credits and practice-type credits.

The DATA statement renames and drops variables based on the type of content that each output data set contains.

Program

Create data set CUMCREDITS to contain the cumulative totals of credits across time. Do not keep the variable that has the cumulative total for practice credits. Create data set GRANDCREDITS to contain the grand totals of certification-eligible and practice credits. Rename variables to reflect that they are for the last event and credits grand total. Drop the variables that are pertinent only to the individual event attendance. Read the observations from TECHCERTIF. Process TECHCERTIF in BY groups that are defined by the values of TECHID. Initialize accumulator variables CUM_CREDITS and TOTAL_PRACTICE_CREDITS to 0 at the beginning of each BY group. Add the current value of EVENT_CREDITS to the running total that is stored in accumulator variable CUM_CREDITS. Add the current value of EVENT_CREDITS to the running total that is stored in accumulator variable TOTAL_PRACTICE_CREDITS. This pertains to "Practice" events only. Output each observation from TECHCERTIF to CUMCREDITS. Output the last observation in each BY group to GRANDCREDITS.

data cumcredits(drop=total_practice_credits)



     grandcredits(rename=(event_date=last_event
                          cum_credits=total_credits)
                  drop=event_type event_credits);




  set techcertif;

  by techid;

  if first.techid then do;
    cum_credits=0;
    total_practice_credits=0;
  end;
  cum_credits+event_credits;


  if event_type="Practice" then
             total_practice_credits +event_credits;


  output cumcredits;

  if last.techid then output grandcredits;

run;

Related Technique

The following PROC SQL step creates a grand total table equivalent to the GRANDCREDITS data set that was created in the main example. It does not create a cumulative total table equivalent to CUMCREDITS because the accumulator variable processing in the DATA step is not similarly available in PROC SQL.

Table GRANDCREDITS is not produced by combining summary and detail data, but it is presented here to illustrate another way to produce grand totals for BY groups.

The CREATE TABLE statement uses the MAX and SUM aggregate functions and the GROUP BY clause to summarize the EVENT DATE and EVENT_CREDITS columns in TECHCERTIF in groups defined by the values of TECHID.

Computing the total practice credits requires testing the value of EVENT TYPE. A CASE expression examines the values of EVENT TYPE and assigns a value to EVENT CREDITS during processing of the expression. When EVENT TYPE is not 'Practice', the CASE expression assigns a value of 0. Otherwise, the CASE expression uses the value in EVENT_CREDITS. One technologist, 9185, does not have any practice credits. Therefore, the value for TOTAL_PRACTICE_CREDITS for 9185 is 0 in the GRANDCREDITS table.

Example 5.6 also uses aggregate functions and the GROUP BY clause.

Create table GRANDCREDITS. Specify the columns to select from GRANDCREDITS. Do not apply an aggregate function to TECHID since it is the GROUP BY column. Determine the most recent event date for a technologist. Determine the total number of credits. Examine the value of EVENT_TYPE to determine the total practice credits. When it is 'Practice', use the value in EVENT_CREDITS. Otherwise, use 0. Determine the total number of practice credits by applying the SUM function to the results of the CASE expression. Read the rows in TECHCERTIF. Group the rows by the values of TECHID.

proc sql;
  create table grandcredits as
    select techid


           max(event_date) as last_event
                            format=mmddyy10.
           sum(event_credits) as total_credits,
           sum(case when event_type='Practice' then
                            event_credits
                    else 0
               end) as total_practice_credits



         from techcertif
         group by techid;

quit;

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.222.167.161