Example 4.6 Performing a "Chained" Lookup

Goal

Trace the links that associate observations in one data set. Assume there are no breaks in the chain.

Example Features

Featured StepDATA step
Featured Step Options and StatementsHash object in the DATA step
Related TechniqueDATA step, BY-group processing, RETAIN statement

Input Data Set

Data set FLEET_REGISTRATION contains registration transactions for three vehicles. Variable VEHICLE_ID uniquely identifies the vehicle. The values for REGISTRATION change when the vehicle is transferred. When ACTION= "XFER", the vehicle has been transferred and its new registration ID is stored in variable NEW_REG. There are no breaks in the links between observations for a vehicle because all values of NEW_REG have a match in FLEET_REGISTRATION.

                      FLEET_REGISTRATION

      vehicle_
Obs      id       action   registration   new_reg     effective
 1    SEDAN0238    XFER       WI008        IL302     05/01/2001
 2    TRUCK0081    XFER       IN082        IL235     01/15/2002
 3    TRUCK0081    XFER       IL235        WI371     03/29/2003
 4    SEDAN0238    XFER       IL302        IL419     07/21/2004
 5    SEDAN0238    TERM       IL419                  11/03/2008
 6    MINIV0761    NEW        IL658                  09/19/2008
 7    TRUCK0081    XFER       WI371        IN454     08/22/2008

Resulting Data Set

Output 4.6 FLEET_HISTORY Data Set

                             Example 4.6 FLEET_HISTORY Data Set Created with DATA Step

    vehicle_
Obs    id     current nxfers history                                                         firstxfer    lastxfer

 1  SEDAN0238            2   WI008; IL302(05/01/2001); IL419(07/21/2004); REMOVED 11/03/2008 05/01/2001 07/21/2004
 2  TRUCK0081  IN454     3   IN082; IL235(01/15/2002); WI371(03/29/2003); IN454(08/22/2008)  01/15/2002 08/22/2008
 3  MINIV0761  IL658     0   IL658(NEW 09/19/2008)


Example Overview

This example tracks the links between observations in a data set. It uses the value of a variable in one observation to find a subsequent observation with that value.

The following DATA step collects the registration history for the three vehicles in data set FLEET_REGISTRATION. Two vehicles have registration transfers and one has none. A vehicle is assigned a new registration ID when it is transferred. Variable ACTION indicates the type of registration transaction. For observations that record a vehicle transfer (i.e., ACTION= "XFER"), variable REGISTRATION stores the current registration ID, and variable NEW_REG stores the new registration ID.

The DATA step outputs one observation per vehicle with the vehicle registration history consolidated in two variables, NXFERS and HISTORY. Variable NXFERS records the number of times the vehicle was transferred. Character variable HISTORY concatenates all the registration IDs and registration transaction dates for a vehicle.

On its first iteration, the DATA step defines a hash table, loads data set FLEET_REGISTRATION into the hash table by using the ADD method, and defines hash keys on variables VEHICLE_ID and REGISTRATION. The data that are loaded into the hash table include variables from data set FLEET_REGISTRATION and new variable ALREADY_USED.

After loading the hash table, the DATA step processes the observations in FLEET_REGISTRATION sequentially and traverses the hash table for registration changes for each vehicle.

The DATA step has two DO UNTIL loops. The first reads each observation from FLEET_REGISTRATION. The second, which is nested in the first, looks for all the matches for a vehicle in the hash table. Observations are output in the second loop after all links between observations for a vehicle have been found.

Variable ALREADY_USED controls the processing of the first DO loop. It is defined to be a true/false (1/0) variable that tracks whether the DATA step already used the current observation as a link in the chain that connects observations. If an observation has already been used as a link in the chain, it is not processed again.

Variable ACTION can have one of three values that indicate the observation's transaction type. The DATA step examines the value of ACTION to determine how to process the observations. Only observations with type "XFER" have links to other observations in the data set. Here are the three values of ACTION:

  • The value "XFER" indicates a registration transfer.

  • The value "NEW" indicates a new fleet vehicle with no transfers.

  • The value "TERM" indicates a fleet vehicle removed from service.

The DATA step code assumes that certain conditions of the input data set are met. Some of these assumptions are in the following list. When adapting this program, you might need to add code to handle different conditions in your data.

  • The keys are unique.

  • There are no breaks in the links between observations for a vehicle.

  • The values for EFFECTIVE are in ascending order within the chain of links between observations for a vehicle.

  • A value for NEW_REG is always present when ACTION= "XFER".

  • NEW_REG is always missing when ACTION= "NEW" or ACTION= "TERM".

Program

Create data set FLEET_HISTORY. Rename REGISTRATION to indicate that its value is the current registration ID.

Assign attributes to variables. Define new variable ALREADY_USED.

On the first iteration of the DATA step, define and load a hash table. Initialize ALREADY_USED. Define hash table F. Define key items on which to retrieve data from F. Specify the data items to store in F. Include the key items and new variable ALREADY_USED. End the definition of F. Load F with data from FLEET_REGISTRATION and include new variable ALREADY_USED in F since it was defined as data.

Process a DO UNTIL loop until the end of FLEET_REGISTRATION has been reached. Read the observations from FLEET_REGISTRATION. Specify the END= option so that code can test when the last observation in FLEET_REGISTRATION has been read. Look for a match in F for the current When a match is found and ALREADY_USED is 1, do not process the observation and return to the top of the first DO UNTIL loop. A value of 1 for ALREADY_USED means that the row in the hash table has already been used as a link in the chain that connects observations for the current vehicle. Initialize NXFERS to 0 for each observation that has not already been used as a link between observations. Process the block that performs the lookups for links in the chain that connects observations. In this example, NEW_REG will be nonmissing only when ACTION= "XFER". Because the DATA step is currently processing the first observation in the chain of linked observations, initialize the values of FIRSTXFER and LASTXFER. Initialize the value of HISTORY with the first transfer data. Increment NXFERS because a transfer has occurred. Copy the value of NEW_REG to REGISTRATION because REGISTRATION is one of the hash table keys and the goal is to look for a row in the hash table for this new registration ID. Process a DO UNTIL loop until no more linked observations are found in F. Look for a match in F for the current key values. On the first iteration of the loop, the value of REGISTRATION is the value that is copied from NEW_REG in the statement preceding the loop. Process this DO group when the lookup in F is successful. Process this DO group when the lookup returns a row from the hash table with a value for NEW_REG (i.e., a transfer). Concatenate the registration change data to HISTORY. Change the value of LASTXFER, which records the date of the last transfer. Increment NXFERS because a transfer has occurred.

Flag the row returned from the hash table as one that has already been used as a link in the chain that connects observations. Replace the data in the row in F that is associated with the current key values. This example changes only the value of hash item ALREADY_USED. Copy the value of NEW_REG to REGISTRATION because REGISTRATION is one of the hash table keys and the goal is to look for a row in F for this new registration ID when processing returns to the top of the second DO UNTIL loop.

Process this DO group when the lookup in the hash table does not find a matching row for the current key values. This condition indicates that the chain of links has ended. Assign specific text to HISTORY to indicate that the vehicle was removed from the fleet. Output an observation when the end of the chain of linked observations has been reached.

Process this DO group when the current observation from FLEET_REGISTRATION does not have a value of NEW_REG and its value for ACTION is "NEW". Because the block is processing a vehicle that has not yet been transferred, initialize the values of FIRSTXFER and LASTXFER to missing. When ACTION= "NEW", assign specific text to HISTORY to indicate that the vehicle is new to the fleet. Output the data for this new vehicle. No further lookup in the hash table is performed for this observation. For unexpected errors, stop the program.

data fleet_history(rename=(registration=current));


  keep vehicle_id registration history nxfers firstxfer
       lastxfer;
  length vehicle_id $ 9 registration $ 5 nxfers 3
         history $ 100 already_used 4;
  format firstxfer lastxfer mmddyy10.;
  if _n_=1 then do;

    already_used=0;
    declare hash f();
    f.definekey('vehicle_id','registration'),
    f.definedata('vehicle_id', 'registration',
                 'new_reg', 'action', 'effective',
                 'already_used'),
    f.definedone();
    do until(done);
      set fleet_registration end=done;
      rc=f.add();
    end;
  end;
  do until(done);

    set fleet_registration end=done;




    notfound=f.find();

    if notfound=0 and already_used=1 then continue;






    nxfers=0;

    if new_reg ne ' ' then do;



      firstxfer=effective;
      lastxfer=effective;


      history=catx('; ', registration,
        cats(new_reg,'(',put(effective,mmddyy10.),')'));
      nxfers+1;

      registration=new_reg;




      do until(rc ne 0);

        rc=f.find();
        if rc=0 then do;

          if new_reg ne ' ' then do;


            history=catx('; ', history,
      cats(new_reg,'(', put(effective,mmddyy10.), ')'));
            lastxfer=effective;

            nxfers+1;

          end;
          already_used=1;


          f.replace();


          registration=new_reg;





        end;
        else do;



          if action='TERM' then history=
       catx('; ', history, catx(' ', "REMOVED",
                          put(effective,mmddyy10.)));
          output;

        end;
      end;
    end;
    else if action="NEW" then do;



      firstxfer=.;
      lastxfer=.;


      history=cats(registration,
       catx(' ', "(NEW", put(effective,mmddyy10.)),')'),

      output;

    end;
    else do;
      putlog "ERROR: Unexpected error. Program stopped
                            at " _all_;
      stop;
    end;
  end;
run;

Related Technique

The following DATA step also tracks the links between observations in data set FLEET_REGISTRATION. It does this by processing the data set in BY groups that are defined by the values of VEHICLE_ID. Prior to the DATA step, the data set is sorted by variable VEHICLE_ID, and within each value of VEHICLE_ID, by the values of EFFECTIVE. This chronologically arranges each vehicle's group of observations.

The presence of a variable such as EFFECTIVE, which can arrange the observations in the same order as the way the links are made, enables the chaining process to be accomplished in a DATA step that processes the data in BY groups. Without such a variable, adapt the hash table approach that was presented in the main example.

Statements test the value of ACTION to determine what to add to variable HISTORY. Variable values are retained across iterations of the DATA step within each BY group, and observations are output only when processing the last observation in a BY group.

Sort the observations in the way they need to be processed by the DATA step.

Create data set FLEET_HISTORY. Read the observations from FLEET_REGISTRATION. Process FLEET_REGISTRATION in BY groups that are defined by the values of VEHICLE_ID.

List the variables whose values should be retained across iterations of the DATA step. Initialize several variables when processing the first observation in a BY group. Because this is the first observation in a BY group and its only history at this point is its first registration, assign the value of REGISTRATION to HISTORY.

Process this block when NEW_REG has a nonmissing value. Process this block for observations where the transaction is a registration transfer. Concatenate the registration change data to HISTORY. When the DATA step is processing the first observation in the BY group, which is when FIRSTXFER will be set to missing by the code several statements earlier, initialize the value of FIRSTXFER. Set the value of LASTXFER to EFFECTIVE each time a transfer transaction observation is processed. Copy the value of NEW_REG to CURRENT since this variable is defined to contain the current registration ID for the vehicle. Increment NXFERS because a transfer has occurred.

For unexpected errors, link to ERRMSG, which stops the program.

Process this block for observations that do not contain transfer information. When ACTION= "NEW", assign specific text to HISTORY to indicate that the vehicle is new to the fleet. Process this block for vehicles removed from the fleet. Set CURRENT to missing because the vehicle has been removed from the fleet and is no longer registered. Assign specific text to indicate that the vehicle was removed from the fleet.

For unexpected errors, link to ERRMSG, which stops the program.

Output an observation at the end of the BY group, which is when the end of the chain of linked observations has been reached. Return to the top of the DATA step so that the following block is not executed. Write an error message to the log and stop the program.

proc sort data=fleet_registration;
  by vehicle_id effective;
run;
data fleet_history;
  set fleet_registration;

  by vehicle_id;

  keep vehicle_id current history nxfers firstxfer
       lastxfer;
  length vehicle_id $ 9 current $ 5 nxfers 3
         history $ 100;
  format firstxfer lastxfer mmddyy10.;
  retain history nxfers firstxfer lastxfer;

  if first.vehicle_id then do;


    history=registration;



    nxfers=0;
    firstxfer=.;
    lastxfer=.;
  end;
  if new_reg ne ' ' then do;

    if action='XFER' then do;


     history=catx('; ', history,
  cats(new_reg, '(', put(effective,mmddyy10.), ')'));
     if firstxfer=. then firstxfer=effective;

      lastxfer=effective;


      current=new_reg;


      nxfers+1;

    end;
    else link errmsg;

  end;
  else do;

     if action='NEW' then
        history=cats(registration,
         catx(' ', "(NEW",put(effective,mmddyy10.)),')'),
     else if action='TERM' then do;

      current=' ';


      history=catx('; ', history,
         catx(' ', "REMOVED", put(effective,mmddyy10.)));
    end;
    else link errmsg;

  end;
  if last.vehicle_id then output;


  return;

  errmsg:
    putlog "ERROR: Unexpected error. Program stopped at
                              " _all_;
    stop;
run;

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.191.43.149