Chapter 12: Complexity vs. Simplicity

12.1 Building Job Security

12.1.1 Programming Challenge #8

12.1.2 Solution

12.2 Ease of Use

12.3 Checking Parameters

12.4 Portability

12.5 Complexity vs. Speed

12.6 Miscellaneous Applications

Simplicity has obvious advantages. It makes macros easier to write, understand, and maintain. When is complexity the right path? Many legitimate answers exist, such as:

•    When the end users are not so sophisticated and require the macros to do more of the work. Chapter 11 illustrated how to end a program when an error condition occurs, while providing a useful message. This chapter will explore additional topics, such as checking the macro parameters before starting to execute.

•    When the macro must adapt to many environments, such as across multiple operating systems or both batch and interactive applications.

•    When added complexity generates a faster running program or a more capable macro.

Before tackling the serious answers, let's briefly explore the wrong answer.

12.1 Building Job Security

Complex macros fortify job security. But building complexity into a macro just for the sake of job security is another matter. Unfortunately, for a handful of programmers, this is not just a goal but an art form. After all, why code a simple statement like this?

%let lullaby = 1;

Wouldn't this logic be much more elegant?

%let lullaby = %eval(no ne na); 

Because "no" is not equal to "na", the comparison is true and %EVAL returns a 1. (For similar %EVAL examples, refer to Section 9.2.) If that is too easy, try "enhancing" this overly simple statement:

%let pi = 3.14;

Remember, %SYSEVALF performs non-integer arithmetic:

%let pi = %sysevalf(0.11+3.03**(y=%substr(&sysday, %length(&sysday))));

Of course, &SYSDAY contains the current day of the week, and every day of the week ends in "y". So the y= comparison inside %SYSEVALF is always true and the partially evaluated statement resolves into:

%let pi = %sysevalf(0.11 + 3.03**1);

Naturally, expressly coding the %EVAL or %SYSEVALF function might be too big a clue. Another programmer might actually be able to decipher your code. Why not use comparisons in places where the software automatically invokes %EVAL for you? Consider this all-too-simple %DO loop:

%do i=1 %to 5; 

Anyone can follow that. Try replacing it with this beauty:

%do i = no ne na %to (Ep Eq Ep) + 4; 

Remember, %DO loops automatically invoke %EVAL for the FROM, TO, and BY values. Because Ep is equal to Ep, the partially resolved statement becomes:

%do i = 1 %to (1) + 4;

Use %EVAL to disguise the fact that sections of your macro do nothing. Here is a stripped-down example:

%macro disguise (indata=, top=8, middle=5, bottom=2);

   %if &top > &middle > &bottom %then %do;

       proc princess data=&indata;

          var _numeric_;

          output out=kingdoms;

       run;

   %end;

%mend disguise;

Of course, there is no such thing as PROC PRINCESS. But if you saw this code buried inside a working macro, would you try to track it down? The key is understanding the %IF comparison:

8 > 5 > 2

A DATA step evaluates this as two conditions:

(8 > 5) and (5 > 2)

Both comparisons are true, and the DATA step interprets the entire expression as being true. However, macro language evaluates the %IF conditions from left to right:

(8 > 5) > 2

%EVAL finds that 8 is greater than 5 and replaces the true comparison with a 1:

(1) > 2

%EVAL finds this comparison is false, so the macro never generates PROC PRINCESS. Are you feeling more secure yet?

Finally, why should another programmer be allowed to use your macros without asking you for instructions? Construct default parameters that must be blanked out for the macro to work. For example:

%macro mine_not_yours (p1=OB, p2=L_B, p3=CAL);

   data new;

      set old (&p3&p2&p1);

   run;

%mend mine_not_yours;

Anyone who dares to use this macro while leaving the default parameters in place will receive a message:

Invalid option name CALL_BOB.

Of course, you can adjust the message to be nicer or nastier. Are you inspired yet? Try this small test before moving on.

12.1.1 Programming Challenge #8

The sample statements below generate 0 or 1 as their result.

%let eq1 = %eval(eq);                          1

%let eq2 = %eval(eq eq);                     0

%let eq3 = %eval(eq eq eq);                0

%let eq4 = %eval(eq eq eq eq);            0

%let eq5 = %eval(eq eq eq eq eq);       0

Why are these the results?

12.1.2 Solution

If the series were extended, all the additional statements would generate 0. To understand why, apply three resolution rules:

•    In every case, EQ is a comparison operator.

•    %EVAL replaces true comparisons with a 1 and false comparisons with a 0.

•    Multiple comparisons evaluate from left to right.

Consider the first and third statements. The first statement compares a null value to a null value. Because they are equal, %EVAL returns a 1. The third statement begins the same way, starting with the leftmost comparison. The partially resolved statement becomes:

%let eq3 = %eval(1 eq eq);

The next EQ comparison compares 1 to a null value. Because they are not equal, %EVAL returns a 0. So the partially resolved statement now becomes:

%let eq3 = %eval(0 eq);

Finally, the last EQ comparison compares 0 to a null value. Because they are not equal, %EVAL returns its final value, 0. Based on these results, impress your colleagues by explaining what will work and what will fail in this statement:

%do i=ga ge go %to na ne no %by eq eq eq;

Assuming that job security does not justify added complexity, what are the valid reasons? The remainder of this chapter explores those situations.

12.2 Ease of Use

Remember that despite their flaws, end users play a vital role in the life of programmers. Make their life as easy as possible! Take on more of the burden when writing a macro if it will make life easier for those who use the macro.

Let’s begin with a simple example. Consider a macro that compares a SAS log with a SAS program. It must determine whether there is a match: did the current version of the program generate the log? The details of such a macro, including its feasibility and limitations, lie beyond the scope of this book. Our only concern here is ease of use. The macro call would look like this:

%compare (program_name=/path/to/folder/some_program.sas,

          log_name=/path/to/folder/some_program.log)

For the vast majority of calls to %COMPARE, the names of the program and log will match except for the last three letters. The simpler implementation would force the user to enter both parameters, increasing both the burden and the chance for a typographical error. The more complex (for the programmer) implementation would allow one parameter to remain blank. In that case, the macro would derive the value for the other parameter.

How much extra complexity would be involved? The macro might begin by checking whether &LOG_NAME is blank. This would be the wrong way:

%if &log_name= %then %do;

This statement runs into trouble when the user-entered &LOG_NAME contains either a forward slash or a dash. Depending on the operating system, those characters could be legitimate parts of a filename. However, they also constitute arithmetic operators. As explained in Chapter 9, the %IF condition triggers the %EVAL function, where the combination of arithmetic operators and letters would generate an error. This variation easily overcomes the problem:

%if %length(&log_name)=0 %then %do;

That's the safer way to check for a null value.

Next, the macro would have to construct the matching name to assign &LOG_NAME. These assumptions (while not appropriate for every operating system) would make the task easy:

•    The name of the program ends with .sas

•    The name of the log ends with .log

In that case, one statement would do the trick:

%let log_name = %substr(&program_name, 1, %length(&program_name)-3)log;

What should happen if the name of the incoming program does not end with .sas? Those complications are beyond the scope of this book. However, note that macro language can easily verify whether that assumption is true:

%local suffix;

%let suffix = %scan(&program_name, -1, .);

%if %qupcase(&suffix)=SAS %then %do;

As noted in Section 2.1, the %SCAN function reads from right to left when the second parameter is negative. While the %UPCASE function would uppercase its argument, the %QUPCASE function also quotes the result. That would be necessary if it turns out that the last portion of the program name was not “sas” but was “and” or “or”. Without some sort of quoting function, these statements would be trouble:

%if AND=SAS %then %do;

%if OR=SAS %then %do;

But the real lesson here is not the syntax. The real lesson is to shift the work from the user to the macro. The additional programming complexity lets the user leave the second parameter blank most of the time.

Consider another example, where our unsophisticated users apply a macro to subset and analyze data. The generated SAS code might look like this:

proc means data=huge.database;

     var salary bonus;

     where zipcode="01801";

     title "Salaries for Zipcode 01801";

run;

The user merely has to specify:

%analyze (zipcode=01801)

But one day a user gets creative:

%analyze (zipcode=parents_zipcode)

Reasonably or not, the user wants to analyze salaries where children live in the same ZIP code as their parents. Of course, that macro won't work. The generated code looks like this:

proc means data=huge.database;

     var salary bonus;

     where zipcode="parents_zipcode";

     title "Salaries for Zipcode parents_zipcode";

run;

The WHERE clause selects zero observations. But why shouldn't the macro work? If the macro could determine that the user has passed a variable name instead of a variable value, the generated code would have to change just a bit:

proc means data=huge.database;

     var salary bonus;

     where zipcode=parents_zipcode;

     title "Salaries for Zipcode equal to parents_zipcode";

run;

To make this happen, the program has to determine that, in this case, the user entered a variable name. The software provides many tools to check for that. The COMPRESS function can remove all digits by specifying “d” as the third parameter:

%if %length(%sysfunc(compress(&zipcode,,d))=0 %then %do;

Or the ANYALPHA function can determine whether a value contains any letters:

%if %sysfunc(anyalpha(&zipcode)) %then %do;

Without delving into the details of the final macro structure, the lesson resonates clearly. When possible, shift the work from the user to the macro. And remember, most unreasonable requests on the part of users reflect an analytical need.

12.3 Checking Parameters

How much checking should a macro perform on values the user has entered? Some macros double their length by checking before executing. Even in simple cases, apply some judgment. If the user is supposed to enter Y or N for a parameter, which of these alternatives are acceptable?

get_reports=Y

get_reports=YES

get_reports=y

get_reports=yes

get_reports=YeS

get_reports=Yoyo

There are no clear cut right and wrong answers to this question. It is reasonable to accept any form of Y or YES and any form of N or NO. It is equally reasonable to accept Y or N, but nothing else. It is up to the programmer to draw the line, to determine how easy the user’s life should be. Whatever your decision, it will affect the complexity of your macro.

How much checking makes sense? Has the user entered a value for all required parameters? Are user-entered values legal? Do combinations of user-entered parameters make sense? Consider specific user-entered values:

•    Does a folder exist?

•    Does a file exist?

•    Does a SAS data set exist?

•    Does a list of variables exist within a SAS data set?

Functions can answer all these questions. For example, the FEXIST and FILEEXIST functions check for the existence of a file or a folder. FILEEXIST expects the full path, while FEXIST expects the filename:

filename fname "path to some file or folder";

%let results = %sysfunc(fexist(fname));

These statements assign &RESULTS a value of 1 when the file exists and 0 when it does not. Notice that the program can detect an error condition without necessarily assigning the result to a macro variable. The following %DO group executes when the file or folder does not exist:

%if not %sysfunc(fexist(fname)) %then %do;

Checking for the existence of a SAS data set is just as easy. Again, the %DO group executes when the SAS data set does not exist:

%if not %sysfunc(exist(mylib.my_dataset)) %then %do;

Additional parameters extend the capability of the EXIST function. It can detect more complex conditions, such as whether or not a catalog exists. However, those features are not needed to check parameters that a user would typically enter.

Checking for the existence of a variable is a little bit trickier. The three-step process involves:

1.      Open the data set that should contain the variable.

2.      Attempt to locate the variable within the data set.

3.      Close the data set.

Each statement below accomplishes one of these three steps:

%let dataset_id  = %sysfunc(open(mylib.my_dataset),i);

%let var_found  = %sysfunc(varnum(&dataset_id,variable_name));

%let dataset_id  = %sysfunc(close(&dataset_id));

Here are some key points:

•    The first statement opens the incoming data set and assigns an integer value to &DATASET_ID. The value of the integer is actually a sequential count of the data sets that have been opened so far. However, the value is not important. What is important is that the assigned number can now be used by later functions to identify the data set.

•    The second statement searches the data set for the target variable VARIABLE_NAME. It extracts the variable number within the data set and assigns that number to &VAR_FOUND. If VARIABLE_NAME cannot be found, &VAR_FOUND receives a value of 0.

•    The third statement closes the data set.

Following these three statements, macro language can examine the value of &VAR_FOUND to ensure that it is greater than 0.

Why jump through these hoops when SAS has other ways to determine whether or not a variable exists? For example, PROC SQL can retrieve information about any existing variable (using DICTIONARY.COLUMNS) and any existing data set (using DICTIONARY.TABLES). There is a good reason to add the complexity, however. Macro language statements have an advantage over other approaches, such as PROC SQL. Macro language statements can execute at the beginning of a macro, even when that macro builds just the middle section of a DATA step. Refer to similar examples in Section 10.4, "Prefer the Macro Solution."

Some applications require unusual critical conditions. For example:

•    Does a CLASS variable ever take on a missing value?

Think through the possibilities for both conditions in the data and the objectives of the users. How sophisticated are the end users? What is the cost if a macro fails? But draw the line at some point, and rely on the user to enter intelligent values. The end product is a macro, not an off-the-shelf software package. Expect that an application will occasionally require debugging.

12.4 Portability

Programmers rarely think about portability. Must a macro work on multiple operating systems? Must it work in both batch and interactive applications?

Section 4.2 illustrates how a macro can capture the name of a program and create an output file with a matching name. But what happens when the macro executes interactively, and there is no such thing as the name of the program?

This section will not solve all of those issues. The details are too intricate and too specific to a given application. Still, it is worth mentioning a few tools and how they might help:

•    An automatic macro variable, &SYSPROCESSNAME, begins with the word “Program” for non-interactive programs.

•    Another automatic variable, &SYSSCPL, returns the operating system. That information might be needed in order to execute a system command.

•    Consider adding a parameter to name the output file for interactive applications but which can be left blank and calculated by non-interactive applications.

12.5 Complexity vs. Speed

Macros tend to get used repeatedly. When working with large data sets, this repetitive use brings efficiency into play. As an example, consider this program before encapsulating it into a macro:

proc summary data=my_data; 

     var amount;

     output out=_statistics_ (keep=mean_value) mean=mean_value;

run;

data my_data;

     if _n_=1 then set _statistics_;

     set my_data;

     if amount > mean_value then amount_c='Greater';

     else if amount = mean_value then amount_c='Equal';

     else amount_c='Less';

     drop mean_value;

run;

The program compares AMOUNT to its mean value to assign AMOUNT_C. In macro form, the program might look like this:

%macro COMPARE (varname);

   proc summary data=my_data; 

      var &varname;

      output out=_statistics_ (keep=mean_value) mean=mean_value;

   run;

   data my_data;

      if _n_=1 then set _statistics_;

      set my_data;

      if &varname > mean_value then &varname._c='Greater';

      else if &varname = mean_value then &varname._c='Equal';

      else &varname._c='Less';

      drop mean_value;

   run;

%mend COMPARE;

%COMPARE (amount)

In real life, the macro would set up parameters for the names of the input and output data sets. For this lesson, we can omit that. So where is the problem?

The day after the macro becomes available, an analyst calls the macro five times for the same data set:

%COMPARE (amount)

%COMPARE (salary)

%COMPARE (bonus)

%COMPARE (sales)

%COMPARE (profits)

Each macro call generates a PROC SUMMARY and a DATA step. To make matters worse, a junior programmer notices five macro calls and says, “Let me make your life easy. I’ll write a macro you can call once instead of five times.” The result:

%macro COMPARE2 (varlist);

   %local i next_varname;

   %do i=1 %to %sysfunc(countw(&varlist, %str( )));

       %let next_varname = %scan(&varlist, &i, %str( ));

       %COMPARE (&next_varname)

   %end;

%mend COMPARE2;

%COMPARE2 (amount salary bonus sales profits)

%COMPARE2 still generates five DATA and PROC steps, but now the problem is hidden. It’s the job of the senior programmer to anticipate this situation, to talk with the users about how they will use the macro. The original %COMPARE macro should have been written differently, using a more complex logic to generate two steps instead of ten. The generated program could take this form:

proc summary data=my_data; 

   var amount salary bonus sales profits;

   output out=_statistics_ (keep=mean_value_:)

          mean=mean_value_1 - mean_value_5;

run;

data my_data;

    if _n_=1 then set _statistics_;

    set my_data; 

    if amount > mean_value_1 then amount_c='Greater';

    else if amount = mean_value_1 then amount_c='Equal';

    else amount_c='Less'; 

    if salary > mean_value_2 then salary_c='Greater';

    else if salary = mean_value_2 then salary_c='Equal';

    else salary_c='Less'; 

    if bonus > mean_value_3 then bonus_c='Greater';

    else if bonus = mean_value_3 then bonus_c='Equal';

    else bonus_c='Less'; 

    if sales > mean_value_4 then sales_c='Greater';

    else if sales = mean_value_4 then sales_c='Equal';

    else sales_c='Less'; 

    if profits > mean_value_5 then profits_c='Greater';

    else if profits = mean_value_5 then profits_c='Equal';

    else profits_c='Less';

    drop mean_value_:;

run;

The macro that generates this code must be more complex, and it takes extra programming time to write. But it will run nearly five times faster. The new version might look like this:

%macro COMPARE (varlist);

   %local i next_varname n_varnames;

   %let n_varnames = %sysfunc(countw(&varlist, %str( )));

   proc summary data=my_data; 

      var &varlist;

      output out=_statistics_ (keep=mean_value_:)

             mean=mean_value_1 - mean_value_&n_varnames;

   run;

   data my_data;

      if _n_=1 then set _statistics_;

      set my_data;

      %do i=1 %to &n_varnames;

         %let next_varname = %scan(&varlist, &i, %str( ));

         if &next_varname > mean_value_&i then

            &next_varname._c='Greater';

         else if &next_varname = mean_value_&i then

            &next_varname._c='Equal';

         else &next_varname._c='Less';

      %end;

      drop mean_value_:;

   run;

%mend COMPARE;

The extra programming time and complexity lets the new version of %COMPARE run faster and on a list of variables instead of a single variable.

12.6 Miscellaneous Applications

This section examines two more applications where added complexity eases the burden on the end user. The first program generates a monthly report, but it requires maintenance:

proc means data=sales;

   class region; 

   var amount;

   title 'Sales for 2014-05';

   where ('01May2014'd <= sasdate <= '31May2014'd);

run; 

This program runs monthly and requires that the three sections in bold change each month. But with a little ingenuity (and a little macro language), the changes can be automated. Consider this approach instead:

%let datevar = &sysdate9;

data _null_;

   call symputx('month_begin', intnx('month',"&datevar"d,-1));

   call symputx('month_end', intnx('month',"&datevar"d,0)-1);  

   call symput('year_month',

               put(intnx('month',"&datevar"d,-1),yymmd7.));  

run;

%put &month_begin;    integer equivalent to 01May2014

%put &month_end;       integer equivalent to 31May2014

%put &year_month;      2014-05

proc means data=sales;

   class region; 

   var amount;

   title "Sales for &year_month";

   where (&month_begin <= sasdate <= &month_end);

run; 

The key is that all three changes can be derived from the current date. The INTNX function returns the first day of a specified time period. Therefore:

•    &MONTH_BEGIN is the first day of the month before the current date.

•    &MONTH_END is the last day of the month before the current date.

•    &YEAR_MONTH is the month before the current date, in YYYY-MM format.

No maintenance is required. Zero. All that is necessary is to run the program some time during the month in order to get the previous month’s report. As an added bonus, the macro version will never make a mistake when computing the last day of the month.

Technically, there is no need to copy &SYSDATE9 into &DATEVAR. The program could have used &SYSDATE9 any place that &DATEVAR appears. However, creating &DATEVAR can simplify the process in another way. Consider overriding the %LET statement with:

%let datevar = 14Feb2014;

A change to &DATEVAR automatically passes through to all the other macro variables. If the report ever needs to be backdated, the changes are now limited to the single %LET statement at the top of the program instead of taking place at multiple points in the body of the program.

In the final example, end users are actually going to request the added functionality that they desire. A macro is supposed to be called with a list of variables to process:

%mymac (var_list = lastname firstname q1 q2 q3 q4)

Along comes an end user wanting to be able to pass variable lists, something like this parameter value:

%mymac (var_list = lastname n_: q1-q4)

How easy or difficult is it to allow that? Well, within limits, it’s pretty simple. In many applications, the order of the variables doesn’t matter and capitalization of variable names doesn’t matter. In that case, PROC CONTENTS can parse the list and PROC SQL can capture all the individual variable names:

proc contents data=&in_data (keep=&var_list) noprint 

   out=_contents_ (keep=name);

run;

proc sql noprint;

  select trim(name) into : var_list separated by ' '

  from _contents_;

quit;

Because the software already has the ability to parse variable lists, let the software transform the variable lists into a set of variable names! With very little added complexity, your macros can now parse variable lists. But is this good enough? What if the order of the variables is important? Does SAS possess a way to parse a list of variables, output the variable names, and maintain their order? PROC TRANSPOSE can do it:

proc transpose data=&in_data (obs=0) out=transposed (keep=_NAME_);

   var &var_list;

run;

proc sql noprint;

  select trim(_name_) into : var_list separated by ' '

  from transposed;

quit;

There is no need to actually transpose the data values. Transposing the variable names will be sufficient, hence OBS=0 on the PROC statement. But the transposed variable names will appear in order in the output data set, in _NAME_. All that is left to do is to retrieve the names.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.21.46.78