Chapter 13: A Final Illusion: Backtesting

13.1 The Preparation

13.2 The Steps

13.3 The Implementation

13.3.1 Programming Challenge

This chapter is an illusion, a vision of what could exist. Right now it exists only in the realm of imagination. Here is the situation that inspires this vision.

Many macros get updated from time to time. How do we know the new and old versions do substantially the same thing? Either formally or informally, we design a few tests, run them, and compare the results. What would happen a year later if someone suspected that the old version had not worked properly in all cases? It may be impossible to test. There may have been three or four updates in that period of time, and it may be impossible to recover the original version. Often, life goes on. We don’t worry about what the old macro would have done, because the updated version is performing well. What if it becomes crucial to validate the original version of the macro … or perhaps to test the variation that existed at a particular point in time?

Now expand that thought to cover an entire library of macros. How can you run a program that calls eight macros, each of which calls two additional macros, and use the version of the macros that existed on a particular day six months ago? How do you prepare for the moment this is necessary, so that it becomes easy to do? This chapter covers the methods needed to turn this illusion into a reality, to easily store and access any version of any macro.

13.1 The Preparation

Macros within an autocall library typically contain a header block with information about the macro’s purpose, author, inputs and outputs, history, and perhaps key usage notes. Consider this section from a header block:

%**                                                        **;

%**  Program Name:  /main/macro/library/my_macro.sas       **;

%**                                                        **;

%**  History/Modifications:                                **;

%**     11/24/2012  Moved to main macro library            **;

%**     03/08/2013  Enhanced report 4 by switching from    **;

%**                 proc print to proc report              **;

%**     07/03/2013  Added significance level parameter     **;

%**     10/15/2013  Added debugging tools                  **;

%**                                                        **;

At any point from November 24, 2012 onward, this macro resided in the main macro library under the name my_macro.sas. However, the contents of that file changed three times. Each time, the new version of my_macro.sas replaced the old. That process does not need to change. What should change is that a second macro library should be created containing every version of the macro. Within the folder /historical/macro/library, all these files would reside, each holding a different definition of %MY_MACRO:

•   my_macro.20121124, holding the original version of the macro.

•   my_macro.20130308, holding the first revised version.

•   my_macro.20130703, holding the second revised version.

•   my_macro.20131015, holding the third revised version.

The last portion of each file name gives the first day that version of the macro was in effect, replacing the previous version. With these pieces in place, two pieces of information would be sufficient to determine which version of the macro should run:

•    The name of the macro (in this case, my_macro).

•    The date. Based on the files named above, a date of 20130805 would imply that the most recent version was from 20130703.

A macro call might define which version of which macro(s) should be in effect:

%backdate (macro_list=my_macro, 

           as_of_date=20130805,

           folder=/historical/macro/library)

The intent would be that all calls to %MY_MACRO use the variation that was in effect as of August 5, 2013. The %BACKDATE macro could take these steps:

1.    Inspect the historical macro library. Retrieve a list of all file names that define %MY_MACRO.

2.    Compare the dates in the file extensions to the user-specified &AS_OF_DATE.

3.    Select the extension that is closest to, but less than or equal to, &AS_OF_DATE. Provide an appropriate error message if &AS_OF_DATE precedes all the available file extensions.

4.    %INCLUDE the selected file, defining %MY_MACRO.

SAS can accomplish all of these steps, with a little help from the operating system.
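
Before involving the operating system at all, the selection rule in steps 2 and 3 can be tried on its own. The following is a minimal sketch, assuming the four version dates from the example are simply typed in; the data set name VERSIONS is hypothetical and is not part of %BACKDATE.

```sas
/* The four version dates from the example, typed in for illustration */
data versions;
   input filedate $8.;
   datalines;
20121124
20130308
20130703
20131015
;
run;

%let as_of_date = 20130805;

/* Latest date first, so the first qualifying date is the right one */
proc sort data=versions;
   by descending filedate;
run;

data _null_;
   set versions;
   /* YYYYMMDD strings compare correctly as character values */
   if filedate <= "&as_of_date" then do;
      put "Version in effect: " filedate;
      stop;
   end;
run;
```

For an &AS_OF_DATE of 20130805, the log should show 20130703 as the version in effect.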

13.2 The Steps

Step 1: Retrieve a List of File Names in the Folder

Each operating system uses different tools to accomplish this task. Under UNIX, for example, this command would work:

ls /historical/macro/library/*.* > some_file.txt

The pieces of this command work as described below:

ls

This command requests a list of file names.

/historical/macro/library/*.*

This is the folder and name pattern to search for. The pattern *.* matches any file name that contains a dot, such as a two-part name with a dot separating the parts.

>

This symbol indicates that the list of names should be directed to a given location rather than displayed.

some_file.txt

This is the destination file that will hold the list of names.

Once this command has executed, some_file.txt will contain the full list of file names, such as:

/historical/macro/library/my_macro.20121124

/historical/macro/library/my_macro.20130308

/historical/macro/library/my_macro.20130703

/historical/macro/library/my_macro.20131015

/historical/macro/library/your_macro.20130102

/historical/macro/library/your_macro.20130414

Each operating system will have its own methods for extracting a list of file names. The remainder of this section continues with the UNIX example begun above.
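
For readers who would rather not depend on an operating system command, SAS itself can list the members of a folder with the DOPEN, DNUM, DREAD, and DCLOSE functions. This is a sketch of an alternative, not the approach this chapter develops. The fileref HISTDIR, the data set ALL_FILES, and the variable FULL_PATH are hypothetical names, and the code assumes the folder exists and is readable.

```sas
/* Assumption: the folder exists and is readable */
%let folder = /historical/macro/library;

filename histdir "&folder";

data all_files;
   length full_path $ 200;
   did = dopen("histdir");          /* returns 0 if the folder cannot be opened */
   if did = 0 then put "NOTE: Could not open &folder";
   else do;
      do i = 1 to dnum(did);        /* DNUM returns the number of members */
         /* DREAD returns the member name without the folder */
         full_path = catx("/", "&folder", dread(did, i));
         output;
      end;
      rc = dclose(did);
   end;
   keep full_path;
run;

filename histdir clear;
```

Unlike the ls command with the *.* pattern, this sketch returns every member of the folder, so names without a dot would have to be filtered out in a later step.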

The next task is to convert the UNIX command to a form that a SAS program can run. The X command allows SAS to run an operating system command. This macro serves as a starting point:

%macro backdate (macro_list=,

                 as_of_date=,

                 folder=/historical/macro/library);

   x "ls /historical/macro/library/*.* > some_file.txt";

%mend backdate;

Within the X command, both the input folder and the output file should have some flexibility. Choosing the input folder is easy enough:

   x "ls &folder/*.* > some_file.txt";

But naming the output file requires a little more work. Wouldn’t it be nice if macro language handled this for you? For example, suppose this program called %BACKDATE:

/my/program/folder/prog1.sas

Automatically, macro language could create this file, holding the list of file names:

/my/program/folder/prog1.macro_list

While not illustrated here, it would be wise to add a fourth parameter to the macro that permits the user to designate the name of the output file. If that parameter is left blank, the macro would use the logic below to supply the name. Before diving into that logic, however, let’s briefly examine some of the issues. What benefits accrue by having the program determine the name of the output file?

This approach provides many advantages:

•    The list of macro names would always be stored in a known location.

•    That list could serve as documentation at a later date, showing which macros were actually available within the historical macro library.

•    The programmer would not be burdened with having to name the output file.

•    Multiple programs running at the same time would never contend for the same output file.

There would be disadvantages as well:

•    Because the output file name depends on the program name, %BACKDATE could not run in interactive mode (unless a fourth parameter were added).

•    The programming becomes more complex if %BACKDATE runs multiple times in the same program (particularly if there might be more than one historical macro library involved). These complications lie beyond the scope of this chapter.

Section 4.2 showed how to create an output file that matches the program name. This statement would retrieve the full program name:

%let prog_name = %sysfunc(getoption(sysin));

Assuming the program name always ends with .sas, the %LOCAL statement and the three %LET statements could be added to the macro, with the X command revised to use &OUTPUT_FILE:

%macro backdate (macro_list=,

                 as_of_date=,

                 folder=/historical/macro/library);

   %local output_file;

   %let output_file = %sysfunc(getoption(sysin));

   %let output_file = %substr(&output_file,1,%length(&output_file)-3);

   %let output_file = &output_file.macro_list;

   x "ls &folder/*.* > &output_file";

%mend backdate;

Notice that the middle %LET statement removes “sas” from the end of the name, but it does not remove the dot before that. Therefore, the third %LET statement requires just one dot, not two.
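
To watch the name take shape, the same %LET statements can be run in open code against a fixed value. This is a small sketch in which the sample path from earlier in this section stands in for the value that GETOPTION would return in a batch program.

```sas
/* Stand-in for %sysfunc(getoption(sysin)) in a batch program */
%let output_file = /my/program/folder/prog1.sas;

/* Remove the trailing "sas" but keep the dot */
%let output_file = %substr(&output_file, 1, %length(&output_file) - 3);
%put After removing sas: &output_file;

/* The dot after &output_file only ends the macro variable name */
%let output_file = &output_file.macro_list;
%put Final name: &output_file;
```

The first %PUT should write a name ending in prog1. (with the dot retained), and the second should write /my/program/folder/prog1.macro_list.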

Step 2: Convert the List of Macros to a SAS Data Set

Having retrieved a list of all file names, the macro must convert that list to a SAS data set. The plan is to use that SAS data set to determine which file holds the properly dated version of %MY_MACRO. Here is one approach to preparing the SAS data set:

   data all_macros;

      infile "&output_file" lrecl=200 pad;

      length filename   $ 200

             macro_name $  32

             filedate   $   8;

      input filename $200.;

      filedate = scan(filename, -1, ".");

      macro_name = upcase(scan(filename, -2, "./"));

   run;

   proc sort data=all_macros;

      by macro_name descending filedate;

   run;

The INPUT statement must apply an informat. Some folder names may contain a blank, rendering list input insufficient.

Values for FILENAME will typically take this form:

/historical/macro/library/macroname.YYYYMMDD

The SCAN function extracts two pieces from FILENAME:

•    FILEDATE holds YYYYMMDD. With . as the only delimiter, the -1 word is the first word when reading from right to left.

•    MACRO_NAME holds the name of the macro. With both . and / as delimiters, the -2 word is the second word when reading from right to left.
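
The two SCAN calls are easy to check against a single file name. This sketch reads one sample name from in-stream data rather than from the output file; the TRUNCOVER option is added only because in-stream lines are shorter than the $200. informat expects.

```sas
data _null_;
   infile datalines truncover;
   length filename $ 200 macro_name $ 32 filedate $ 8;
   input filename $200.;
   filedate   = scan(filename, -1, ".");            /* last word, dot-delimited */
   macro_name = upcase(scan(filename, -2, "./"));   /* second-to-last word */
   put filedate= macro_name=;
   datalines;
/historical/macro/library/my_macro.20130308
;
run;
```

The log should show filedate=20130308 and macro_name=MY_MACRO.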

Step 3: Find the Proper Macro Definition

For each macro that the user includes in &MACRO_LIST, the program must:

•     Identify the files that could potentially define the right macro.

•     Compare FILEDATE to &AS_OF_DATE.

•     %INCLUDE the proper file.

Even with a large macro library, the data set of macro definitions might contain on the order of 10,000 observations -- small enough to process multiple times. The idea is to process the list within ALL_MACROS for each macro that must be defined. This code would locate the proper definition of %MY_MACRO:

   data _null_;

      set all_macros end=nomore;

      where macro_name = "MY_MACRO";

      if filedate <= "&as_of_date" then do;

         call execute("%include '" || trim(filename) || "';");

         stop;

      end;

      if nomore;

      put "ER"
          "ROR:  The macro MY_MACRO could not be located."

        / "as_of_date=&as_of_date, folder=&folder";

      call execute("endsas;");

   run;

Because the file names are sorted by both MACRO_NAME and descending FILEDATE, the first matching file will be the correct one. “Matching” means that the name of the macro is correct, and the date is on or before the user-specified &AS_OF_DATE.

When the matching file is located, the program should %INCLUDE it and then halt the DATA step. CALL EXECUTE (followed by the STOP statement) handles these tasks. If no matching file exists, the program should write a message to the log and then halt; the PUT statement and the final CALL EXECUTE (which generates ENDSAS) handle that.
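
One case deserves a second look. If no file name at all matches the macro name, the WHERE statement selects zero observations, the DATA step ends at the SET statement, and no message is written. As a hedged alternative (not the method this chapter builds on), PROC SQL can pick the file and report through the automatic variable &SQLOBS whether anything was found. The macro variable SELECTED_FILE is a hypothetical name, the in-stream data stands in for the ALL_MACROS data set from Step 2, and the TRIMMED keyword assumes SAS 9.3 or later.

```sas
/* Hypothetical stand-in for the ALL_MACROS data set built in Step 2 */
data all_macros;
   infile datalines truncover;
   length filename $ 200 macro_name $ 32 filedate $ 8;
   input filename $200.;
   filedate   = scan(filename, -1, ".");
   macro_name = upcase(scan(filename, -2, "./"));
   datalines;
/historical/macro/library/my_macro.20121124
/historical/macro/library/my_macro.20130308
/historical/macro/library/my_macro.20130703
/historical/macro/library/my_macro.20131015
;
run;

%let as_of_date    = 20130805;
%let selected_file = ;

proc sql noprint;
   /* Keep the qualifying row with the latest date; SAS remerges MAX() */
   select filename into :selected_file trimmed
      from all_macros
      where macro_name = "MY_MACRO" and filedate <= "&as_of_date"
      having filedate = max(filedate);
quit;

/* &SQLOBS is 0 when the name is missing or every version is too recent */
%put Rows selected: &sqlobs;
%put Selected file: &selected_file;
```

With this approach, a value of 0 in &SQLOBS covers both failure cases, and &SELECTED_FILE could then be passed to a %INCLUDE statement.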

In order to define multiple macros with a single call to %BACKDATE, loop through the same process for each macro in the list:

   %local i next_macro_name;

   %if %length(&macro_list) > 0 %then 

   %do i=1 %to %sysfunc(countw(&macro_list, %str( )));

       %let next_macro_name = %upcase(%scan(&macro_list, &i, %str( ))); 

       data _null_;

          set all_macros end=nomore;

          where macro_name = "&next_macro_name";

          if filedate <= "&as_of_date" then do;

             call execute("%include '" || trim(filename) || "';");

             stop;

          end;

          if nomore;

          put "ER" 

              "ROR:  The macro &next_macro_name could not be located.";

          put "as_of_date=&as_of_date, folder=&folder";

       run;

   %end;

Each iteration through the %DO loop defines one macro from &MACRO_LIST.
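
The looping logic can be tested separately from the file handling. This is a minimal sketch using a hypothetical macro, %SHOW_LIST, that only writes each name to the log.

```sas
%macro show_list (macro_list=);
   %local i next_macro_name;
   /* Skip the loop entirely when the list is empty */
   %if %length(&macro_list) > 0 %then
   %do i = 1 %to %sysfunc(countw(&macro_list, %str( )));
      %let next_macro_name = %upcase(%scan(&macro_list, &i, %str( )));
      %put Iteration &i: &next_macro_name;
   %end;
%mend show_list;

%show_list (macro_list=my_macro your_macro)
```

The log should show MY_MACRO on iteration 1 and YOUR_MACRO on iteration 2.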

13.3 The Implementation

All of the code from Section 13.2 gets incorporated into the final definition of %BACKDATE. After you have added the optional fourth parameter, the complete macro definition becomes:

%macro backdate (macro_list=,

                 as_of_date=,

                 folder=/historical/macro/library,

                 output_file=);

   %local i next_macro_name;

   %if %length(&output_file)=0 %then %do;

      %let output_file = %sysfunc(getoption(sysin));

      %let output_file = %substr(&output_file,1,%length(&output_file)-3);

      %let output_file = &output_file.macro_list;

   %end;

   x "ls &folder/*.* > &output_file";

   data all_macros;

      infile "&output_file" lrecl=200 pad;

      length filename   $ 200

             macro_name $  32

             filedate   $   8;

      input filename $200.;

      filedate = scan(filename, -1, ".");

      macro_name = upcase(scan(filename, -2, "./"));

   run;

   proc sort data=all_macros;

      by macro_name descending filedate;

   run;

   %if %length(&macro_list) > 0 %then 

   %do i=1 %to %sysfunc(countw(&macro_list, %str( )));

       %let next_macro_name = %upcase(%scan(&macro_list, &i, %str( )));

       data _null_;

          set all_macros end=nomore;

          where macro_name = "&next_macro_name";

          if filedate <= "&as_of_date" then do;

             call execute("%include '" || trim(filename) || "';");

             stop;

          end;

          if nomore; 

          put "ER" 

              "ROR:  The macro &next_macro_name did not exist as of";

          put "&as_of_date, folder=&folder";

       run;

   %end;

%mend backdate;

This application requires an output file that holds a list of all files in the historical macro library. Therefore, the first %DO group checks whether the user specified the name for an output file. If not, the program computes the name as program_name.macro_list. In practice, the user will often leave &OUTPUT_FILE blank and allow the program to compute the name.

Next, the program reads that output file, producing a SAS data set holding one observation for each file in the historical macro library. A sample observation might look like this:

FILENAME:    /historical/macro/library/my_macro.20130308

MACRO_NAME:  MY_MACRO

FILEDATE:    20130308

With the data sorted by MACRO_NAME DESCENDING FILEDATE, the program is ready to locate the proper definition of each name in &MACRO_LIST. Each iteration through the final %DO loop defines one macro, performing these steps:

1.   Find the name of the next macro in &MACRO_LIST.

2.   Read in the SAS data set holding the name of every file in the historical macro library.

3.   Process those file names that match the name of the next macro in &MACRO_LIST.

4.   When the names match, compare the dates. Because the incoming file names are in sorted order from latest to earliest date, the first observation falling on or before the &AS_OF_DATE is the right one.

5.   Use CALL EXECUTE to generate a %INCLUDE statement for the proper file. The %INCLUDE statement executes automatically when the DATA step finishes.
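
The timing described in step 5 can be seen in a small experiment. In this sketch, the generated code is an ordinary DATA step rather than a %INCLUDE statement, so that nothing outside the program is needed. The outer string is in single quotes so the macro processor leaves it alone.

```sas
data _null_;
   /* The generated step is queued; it does not run yet */
   call execute('data _null_; put "Generated step: runs second."; run;');
   put "Original step: runs first.";
run;
```

In the log, the message from the original step should appear before the message from the generated step, because statements generated by CALL EXECUTE wait until the current DATA step finishes.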

Is it magic? Perhaps it seems that way. But once you practice the tricks and techniques, your macros will perform magic as well.

13.3.1 Programming Challenge

The final programming challenge doesn’t appear in a book. It will arise in many forms, as you program and as you field requests from end users. Can you assimilate what you have learned and utilize appropriate techniques? Can your attitude be, “I’ll find a way to make that happen”? Can you apply tools in new, innovative ways to expand upon what you have seen here?

I wish you the very best as you rise to the occasion.
