Example 8.4 Separating the First Observation in a BY Group from the Other Observations in the BY Group

Goal

Write the first observation in a BY group to one data set. Write the other observations in the BY group to a second data set.

Example Features

Featured StepDATA step
Featured Step Options and StatementsFIRST.variable and LAST.variable temporary variables
Related TechniquePROC SORT, DUPOUT=, NODUPKEY, and OUT= options

Input Data Set

Data set TRADE_ASSN contains membership records for four members of a trade association. Each observation stores the membership information for one company for one year. Note that three of the four companies have more than one observation and that one company, Surname Associates, has only one observation.

                           TRADE_ASSN

 Obs         company         membershipyear nemployees  revenue
   1 Always Ready Fix-it          2008           4       103929
   2 Always Ready Fix-it          2009           5       198482
   3 Always Ready Fix-it          2010           3       154105
   4 Fairlakes Manufacturing      2006          50       763120
   5 Fairlakes Manufacturing      2007          25      5301903
   6 Fairlakes Manufacturing      2008          31      7928185
   7 Fairlakes Manufacturing      2009          28      8391869
   8 Fairlakes Manufacturing      2010          27      8678291
   9 Surname Associates           2010          22     10682910
  10 Wise Wizards Consulting      2005          10      3918593
  11 Wise Wizards Consulting      2006           5       645120
  12 Wise Wizards Consulting      2008           2        38371

Resulting Data Sets

Output 8.4a JOINED_ASSN Data Set

    Example 8.4 JOINED_ASSN Data Set Created with DATA Step

 Obs         company         membershipyear nemployees  revenue

  1  Always Ready Fix-it          2008           4       103929
  2  Fairlakes Manufacturing      2006          50       763120
  3  Surname Associates           2010          22     10682910
  4  Wise Wizards Consulting      2005          10      3918593


Output 8.4b RENEWED_ASSN Data Set

    Example 8.4 RENEWED_ASSN Data Set Created with DATA Step

 Obs         company         membershipyear nemployees revenue

  1  Always Ready Fix-it          2009           5      198482
  2  Always Ready Fix-it          2010           3      154105
  3  Fairlakes Manufacturing      2007          25     5301903
  4  Fairlakes Manufacturing      2008          31     7928185
  5  Fairlakes Manufacturing      2009          28     8391869
  6  Fairlakes Manufacturing      2010          27     8678291
  7  Wise Wizards Consulting      2006           5      645120
  8  Wise Wizards Consulting      2008           2       38371


Example Overview

This example shows how you can determine when an observation is the first observation in a BY group so that you can separate it from the other observations in its BY group.

The DATA step requires a BY statement that lists the BY-group variables. When you process your data in BY groups, SAS automatically creates the FIRST. and LAST. temporary variables for the variables that are named in the BY statement. The DATA step does not write these variables to the output data sets.

Data set TRADE_ASSN maintains membership records for four members of a trade association starting in 2005. The goal is to find the record when the member joined the association and to find all of the member's subsequent renewal records.

Assume that the observations in data set TRADE_ASSN have already been sorted or indexed by the two variables, COMPANY and MEMBERSHIPYEAR. The BY statement names only one variable, COMPANY. The data are sorted by MEMBERSHIPYEAR within each value of COMPANY so that the first observation in a BY group will be the first year the company was in the trade association. (Your program will not generate an error if you keep MEMBERSHIPYEAR in the BY statement.)

The DATA step tests the value of the temporary variable FIRST.COMPANY to determine when an observation is the first one in the BY group. It outputs observations to the appropriate data set based on the value of this temporary variable.

With data set TRADE_ASSN sorted by COMPANY and within each value of COMPANY by MEMBERSHIPYEAR, the FIRST.COMPANY temporary variable is 1 (true) only when processing the first observation in a BY group. This condition exists when the company joined the trade association. If it is not the first observation in the BY group, the value of FIRST.COMPANY is 0 (false). This condition exists for membership renewal observations.

Create data sets JOINED_ASSN and RENEWED_ASSN. Read each observation from TRADE_ASSN. Define BY-group processing for TRADE_ASSN. Output observations that are first in their BY group to JOINED_ASSN. Output observations that are not first in their BY group to RENEWED_ASSN.

 data joined_assn renewed_assn;

  set trade_assn;

  by company;

  if first.company then output joined_assn;

  else output renewed_assn;

run;

Related Technique

The following PROC SORT step creates two data sets identical to the data sets JOINED_ASSN and RENEWED_ASSN that were created by the preceding DATA step. A combination of PROC SORT options can achieve the same results as a DATA step in this example.

Because this technique sorts and separates the observations in one step, it is more efficient than sorting the data set and executing the DATA step in the main example. When you do not need to add additional programming statements and your data set has not already been sorted or indexed prior to executing the DATA step, consider the PROC SORT technique.

The step uses the NODUPKEY option to separate the observations that are first in a BY group from the remaining observations in the BY group. The OUT= option specifies the data set that will hold these first observations. Using the OUT= option will prevent PROC SORT from overwriting TRADE_ASSN. The DUPOUT= option specifies the data set that will contain the observations removed from TRADE_ASSN because of the NODUPKEY option.

Sort data set TRADE_ASSN. Write only the first observation in a BY group to the OUT= data set. Save the observations that are not the first ones in each BY group in the DUPOUT= data set. Specify the BY variables.

proc sort data=trade_assn
          nodupkey
          out=joined_assn
          dupout=renewed_assn;

  by company membershipyear;
run;

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.145.125.51