Combining Summary Data and Detail Data

Overview

You have seen how to combine data from multiple data sets. Suppose you want to calculate percentages based on individual values from a data set as compared to a summary statistic of the data. You need to complete these tasks:
  • create a summary statistic
  • combine the summary data with the detail rows of the original data set
  • calculate the percentages
For example, the data set Sasuser.Monthsum has one row for every value of SalesMonth (month and year) from 1997 to 1999. Each row contains information about the revenue generated by an airline.
Table 15.5 SAS Data Set Sasuser.Monthsum (Partial Listing)
Sales Month
RevCargo
JAN1997
$171,520,869.10
JAN1998
$238,786,807.60
JAN1999
$280,350,393.00
FEB1997
$177,671,530.40
FEB1998
$215,959,695.50
FEB1999
$253,999,924.00
Suppose you want to produce a report that shows what percentage of the total cargo revenue for the three-year period was generated in each month of each year. You could summarize the data to get the total revenue for cargo for the three-year period and assign that value to a new variable called Cargosum in a summary data set.
Table 15.6 Summary Data Set
Cargosum
$8,593,432,002.35
Combine the summary data (Cargosum) with the detail data in Sasuser.Monthsum to calculate percentages of the total cargo revenue for each month.
Table 15.7 Partial Listing of the Combined Data Set
Sales Month
RevCargo
MonthNo
Cargosum
PctRev
JAN1997
$171,520,869.10
1
$8,593,432,002.35
<RevCargo/Cargosum>
JAN1998
$238,786,807.60
1
$8,593,432,002.35
<RevCargo/Cargosum>
JAN1999
$280,350,393.00
1
$8,593,432,002.35
<RevCargo/Cargosum>
FEB1997
$177,671,530.40
2
$8,593,432,002.35
<RevCargo/Cargosum>
FEB1998
$215,959,695.50
2
$8,593,432,002.35
<RevCargo/Cargosum>
FEB1999
$253,999,924.00
2
$8,593,432,002.35
<RevCargo/Cargosum>
We will examine this task more closely.

The MEANS Procedure

You should already know how to use the MEANS procedure for producing summary statistics. By default, PROC MEANS generates a report that contains descriptive statistics. The descriptive statistics can be routed to a SAS data set by using an OUTPUT statement. The default report can be suppressed by using the NOPRINT option.
General form, PROC MEANS with OUTPUT statement:
PROC MEANS DATA=input-SAS-data-set NOPRINT;
<VAR variable(s);>
OUTPUT OUT= output-SAS-data-set
statistic=output-variable(s);
RUN;
Here is an explanation of the syntax:
input-SAS-data-set
identifies the data set on which the summary statistic is generated.
variable(s)
is the name or names of the variable or variables that are being analyzed.
output-SAS-data-set
names the data set where the descriptive statistics will be stored.
statistic
is one of the summary statistics generated.
output-variable(s)
names the variable or variables in which to store the value or values of statistic in the output data set.
The output data set that a PROC MEANS step creates contains the requested statistics as values for output-variable(s), as well as two additional variables that are automatically included, as follows:
  • _TYPE_ contains information about the class variables
  • _FREQ_ contains the number of observations that an output level represents.

Example

The following program creates a summary data set named Sasuser.Summary. Sasuser.Summary contains the sum of the values of Revcargo from Sasuser.Monthsum, stored in the variable Cargosum.
proc means data=sasuser.monthsum noprint;
   var revcargo;
   output out=sasuser.summary sum=Cargosum;
run;

proc print data=sasuser.summary;
run;
Because of the NOPRINT option, the PROC MEANS step does not produce a report. Printing the Sasuser.Summary data set produces the following report.
Comparing DATA Step Match-Merges
Once you have created the summary statistic, you need to combine this summary information with the detail rows of the data set so that you can calculate the percentages. Remember that you can use multiple SET statements to combine data horizontally. Consider how this process works by using multiple set statements to combine the detail rows of Sasuser.Monthsum with the summary data that we created in Sasuser.Summary.

Example

This example creates a new data set named Percent1 that combines
  • summary data (total revenue for cargo from the three-year period) from Sasuser.Summary
  • detail data (month and total cargo for the month) from Sasuser.Monthsum.
Percent1 also contains a new variable named PctRev that records the calculated percentage of the total revenue that each observation represents.
Remember, the automatic variable _N_ tracks of how many times the DATA step iterates. The following DATA step uses _N_ to prevent SAS from reaching end of file on Sasuser.Summary after the first iteration of the step. Since variables read from a SAS data set are not reinitialized, the value of Cargosum is retained in the PDV as each observation is read from Sasuser.Monthsum.
  1. This example shows the compilation and execution of the DATA step below. This DATA step creates a new data set that combines summary data from one input data set (Sasuser.Summary) and detail data from a second input data set (Sasuser.Monthsum).
    data sasuser.percent1(drop=cargosum);
       if _N_=1 then set sasuser.summary(keep=cargosum);
       set sasuser.monthsum(keep=salemon revcargo);
       PctRev=revcargo/cargosum;
    run;
  2. During the compilation phase, SAS reads the descriptor portion of the input data set and creates the PDV. _N_ is a temporary variable that is included in the PDV, although it will not be included in the output data set.
  3. Execution begins. On the first iteration of the DATA step, _N_ has a value of 1. The IF statement evaluates as true, so the first SET statement reads the value of Cargosum from Sasuser.Summary into the PDV.
  4. The second SET statement reads the first observation in Sasuser.Monthsum into the PDV.
  5. SAS calculates the value of PctRev and records it in the PDV.
  6. At the bottom of the DATA step, SAS writes the values in the PDV to the output data set. _N_ is not included in the output data set since it is a temporary variable. CargoSum is dropped from the output data set as well.
  7. On the second iteration of the DATA step, the value of _N_ is 2, so the IF statement evaluates to false and the first SET statement does not execute. However, the value of CargoSum is retained in the PDV.
  8. The second SET statement reads the second observation from Sasuser.Monthsum into the PDV.
  9. The value for PctRev is calculated and recorded in the PDV. SAS writes the values in the PDV to the output data set (except for _N_ and CargoSum).
  10. The DATA step continues to execute until all observations have been read from Sasuser.Monthsum.
Another method of combining summary data and detail data is to create the summary statistic in a DATA step and combine it with the detail data in the same step. To do this you must do the following:
  • read the data once and calculate the summary statistic
  • re-read the data to combine the summary statistic with the detail data to calculate the percentages

The Sum Statement

You can use the sum statement to obtain a summary statistic within a DATA step. The sum statement adds the result of an expression to an accumulator variable.
General form, sum statement:
variable+expression;
Here is an explanation of the syntax:
variable
specifies the name of the accumulator variable. This variable must be numeric. The variable is automatically set to 0 before the first observation is read. The variable's value is retained from one DATA step iteration to the next.
expression
is any numeric SAS expression.
CAUTION:
If the expression produces a missing value, the sum statement ignores it. This action is different from the assignment statements that assign a missing value if the expression produces a missing value.
Note: The sum statement and assignment statement are among the few SAS statements that does not begin with a keyword.
The sum statement adds the result of the expression that is on the right side of the plus sign (+) to the numeric variable that is on the left side of the plus sign. At the top of the DATA step, the value of the numeric variable is not set to missing as it usually is when reading raw data. Instead, the variable retains the new value in the program data vector for use in processing the next observation.

Example

The following example uses a sum statement to generate the summary statistic in a DO UNTIL loop. On the first iteration of the DATA step, the DO UNTIL loop reads each observation of Sasuser.Monthsum and keeps a running tally of the total value of RevCargo from each observation. On each subsequent iteration of the DATA step, this tally (stored in the variable TotalRev) is divided into RevCargo to calculate the new variable PctRev.
Note: Remember that the END= a set statement option creates a temporary variable that contains an end-of-file indicator.
  1. This example shows the execution of the DATA step below. This DATA step reads the same data set, Sasuser.Monthsum, twice: first, to create a summary statistic; second, to merge the summary statistic back into the detail data to calculate PCTREV and create a new data set, Sasuser.Percent2.
    data sasuser.percent2(drop=totalrev);
       if _N_=1 then do until (LastObs);
          set sasuser.monthsum(keep=revcargo) end=lastobs;
          TotalRev+revcargo;
       end;
       set sasuser.monthsum(keep=salemon revcargo);
       PctRev=revcargo/totalrev;
    run;
  2. During the compilation phase, SAS reads the descriptor portion of the input data set and creates the PDV. _N_, LastObs, and TotalRev are temporary variables that are included in the PDV but not written to the output data set.
  3. Execution begins. The temporary variables are initialized with values. The IF statement resolves to true on the first iteration of the DATA step, so the DO UNTIL loop begins to execute. Remember, in a DO UNTIL loop, the condition is evaluated at the bottom of the loop.
  4. The first SET statement reads the first observation from Sasuser.Monthsum and writes the value for RevCargo to the PDV.
  5. The value of TotalRev is increased by the value of RevCargo and overwritten in the PDV.
  6. At the bottom of the DO loop, SAS evaluates the UNTIL expression. It resolves to false since the value of LastObs is 0, so the loop continues to execute.
  7. The first SET statement reads the second observation from Sasuser.Monthsum, overwriting the value for RevCargo in the PDV and adding this value to the accumulator variable TotalRev.
  8. The DO UNTIL loop continues to execute until the first SET statement reads the last observation from Sasuser.Monthsum and the value of LastObs is set to 1. At this point, the value for TotalRev in the PDV is the sum of all values for RevCargo in Sasuser.Monthsum. The loop is satisfied.
  9. The second SET statement reads the same data set as the first SET statement. However, this time SaleMon and RevCargo are read into the PDV. TotalRev remains populated in the PDV.
  10. PctRev is calculated for observation 1 and recorded in the PDV. Then, SAS writes the values in the PDV to the output data set Sasuser.Percent2, except for the temporary variables and the variable TotalRev.
  11. On the second iteration of the DATA step, the value of _N_ increases to 2, so the IF expression is false. The second SET statement reads from the second observation of Sasuser.Monthsum into the PDV.
  12. The value for the accumulator variable TotalRev is retained from the previous iteration and used to calculate a new value for PctRev, which is recorded in the PDV. SAS writes the values in the PDV to the output data set.
  13. The DATA step iterates until end-of-file on Sasuser.Monthsum.
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.189.178.237