Chapter 10: Generating Text

10.1 Utilizing Generated Text

10.2 Counting Words in a String

10.3 Working with Lists

10.4 Prefer the Macro Solution

When macro language generates a word, the software figures out what to do with that word. The software might incorporate the word into a macro language statement, or it might embed the word as part of a SAS language statement. While that sounds simple, it leads to some interesting coding techniques.

10.1 Utilizing Generated Text

Here is some legitimate code that could appear within a macro definition:

data   %do quarter=1 %to 4;

                   q&quarter

          %end;

          ;

Each word, whether hard-coded or generated by macro language, gets incorporated into the SAS program, producing:

data q1 q2 q3 q4;

Notice how:

•    A macro variable can resolve into as little as part of a word within a SAS program.

•    Multiple semicolons might seem confusing, but there is a trick to interpreting them. Match each semicolon to the macro language statement that it completes. Any remaining semicolons become generated text.
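
To see the semicolon rule in action, the loop can be wrapped in a small macro and run with the MPRINT option, which writes the generated SAS statements to the log. This is only a sketch: the macro name %QUARTERS is hypothetical, and SASHELP.CLASS is used simply because it ships with SAS.

```sas
options mprint;   /* write the generated SAS statements to the log */

%macro quarters;   /* hypothetical macro name */
   data %do quarter=1 %to 4;   /* this semicolon completes the macro DO statement */
           q&quarter
        %end;                  /* this semicolon completes the macro END statement */
        ;                      /* unmatched, so it is generated and ends the DATA statement */
      set sashelp.class;
   run;
%mend quarters;

%quarters
```

The MPRINT lines in the log should show a single DATA statement that names q1 through q4, followed by the SET and RUN statements.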

Next, reconsider the %BUBBLY macro from Chapter 9. The essence of the code was this:

%macro bubbly (macvar);

   %** Code to break up &macvar into 4 words, ;

   %** then reorder the words.                ;

   %let &macvar=&word1 &word2 &word3 &word4;

%mend bubbly;

As originally written, this macro replaces the incoming macro variable with the same words but in a different order. But what if the objective were slightly different, such as:

•    Write the reordered words with a %PUT statement.

•    Assign the reordered words to a different macro variable instead of replacing the original.

To add this sort of flexibility, a cumbersome approach would add parameters to the macro definition:

•    A yes/no parameter to control whether a %PUT statement should write the reordered words.

•    The name of another macro variable that should hold the reordered words. (If this parameter is left blank, replace the original incoming macro variable.)

Even with these added complexities, the macro might require modification down the road if additional objectives were added. A better approach would simplify the macro by changing the last line:

%macro bubbly (macvar);

   %** Code to break up &macvar into 4 words, ;

   %** then reorder the words.                ;

   &word1 &word2 &word3 &word4

%mend bubbly;

This version looks strange. It dumps four words into the middle of a program. Oddly enough, this approach adds the flexibility needed to accomplish any of the objectives. For example, this macro call replaces &LIST with four words in a new order:

%let list = %bubbly (list);

When the macro executes, it generates four words in a new order, producing:

%let list = &word1 &word2 &word3 &word4;

Similarly, this statement assigns those four words as the value of a new macro variable:

%let new_macro_variable = %bubbly (list);

And this statement writes the four words in their new order:

%put %bubbly (list);

This statement puts the four reordered words into a DATA step array:

array _4_words {4} %bubbly (list);

In every case, executing the macro generates four words as text. The four words get added to the middle of a program, completing either a macro language statement or a SAS language statement. It is not necessary for the macro to define how those four words are going to be used.
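
Because the body of %BUBBLY is not reproduced here, the same idea can be tried with a stand-in. The macro below, %REVERSE4, is a hypothetical example (not the Chapter 9 logic): it assumes the named macro variable holds four blank-delimited words and generates them in reverse order as text.

```sas
/* Hypothetical stand-in for BUBBLY: generates four words in reverse order. */
%macro reverse4 (macvar);
   %local i result;
   %do i=4 %to 1 %by -1;
      /* &&&macvar resolves to the value of the variable named by the parameter */
      %let result = &result %scan(&&&macvar, &i, %str( ));
   %end;
   &result
%mend reverse4;

%let list = delta charlie bravo alpha;

%put %reverse4(list);                 /* write the reordered words */
%let new_list = %reverse4(list);      /* store them in a different variable */
%put new_list=&new_list;
%let list = %reverse4(list);          /* replace the original */
%put list=&list;
```

All three uses rely on the same one-parameter macro; only the statement surrounding the macro call changes.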

Generating text broadens a macro's applicability, without adding parameters to its definition. Let's examine a few useful examples.

10.2 Counting Words in a String

When a macro parameter contains a series of variable names, it helps to know how many names are in the list. Clearly, macro language can count them:

%macro countem (list_of_words);

     %local i;

     %let i=0;

     %if %length(&list_of_words) %then 

     %do %until (%scan(&list_of_words, &i+1, %str( )) = );

         %let i=%eval(&i + 1); 

     %end;

%mend countem;

The %STR function defines blanks as the only possible delimiter for the %SCAN function. Because most lists in SAS programs are lists of variables or data sets, other delimiters would hardly ever be needed. When this macro runs, the final value of &I is the number of words encountered. But a %LOCAL variable won’t be useful once the macro finishes executing. How can the program utilize the final value of &I? The best way is to generate it as text. Add one more line to the macro definition, just before the %MEND statement:

%macro countem (list_of_words);

     %local i;

     %let i=0;

     %if %length(&list_of_words) %then

     %do %until (%scan(&list_of_words, &i+1, %str( )) = );

         %let i=%eval(&i + 1);

     %end;

     &i

%mend countem;

Once the final &I resolves into text, that text gets added to the program as part of a macro statement or as part of a SAS language statement. All of these statements could utilize that generated text:

%let n_words = %countem (&var_list);

array vars {%countem (&var_list)} &var_list;

%do i=1 %to %countem (&var_list);

Other methods exist to count words. For example, this DATA step formula works in most cases:

if string=' ' then n_words=0;

else n_words = 1 + length(compbl(string)) - length(compress(string));

The COMPBL function replaces multiple, consecutive blanks with a single blank. The COMPRESS function removes all blanks. Macro language can utilize these functions by applying %SYSFUNC:

%if %length(&string)=0 %then %let n_words=0;

%else %let n_words = %eval( 1 + %length(%sysfunc(compbl(&string))) -

                                      %length(%sysfunc(compress(&string))));

In practice, this formula usually works. However, it overcounts by 1 if the incoming string contains any leading blanks, because COMPBL reduces them to a single blank that still gets counted. Left-justifying the incoming string is safer (although it adds to the complexity of the code).
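
A minimal sketch of the safer DATA step form follows. It assumes that applying the LEFT function before COMPBL is an acceptable way to left-justify; the test value and the length of STRING are made up for illustration.

```sas
data _null_;
   length string $40;
   string = '   alpha  beta gamma';   /* leading and repeated blanks */
   if string=' ' then n_words=0;
   /* LEFT moves the leading blanks to the end, where LENGTH ignores them */
   else n_words = 1 + length(compbl(left(string))) - length(compress(string));
   put n_words=;
run;
```

For this test value the log shows n_words=3. Without LEFT, the same formula would report 4.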

Advances in the software have actually made this macro obsolete. This expression counts the number of words in a string, without defining a macro:

%sysfunc(countw(&string, %str( )))

SAS language uses COUNTW to count the number of words in a string, and %SYSFUNC allows macro language to invoke the SAS language function. Earlier releases of the software can encounter trouble with this syntax when &STRING is null. To counteract that, it can be helpful to add %SUPERQ:

%sysfunc(countw(%superq(string), %str( )))
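
Another way to sidestep a null string is to test the length first and call COUNTW only when there is something to count. The wrapper below is a sketch, not a replacement for the expressions above: the name %N_WORDS is hypothetical, and it assumes the list contains only blank-delimited names (no commas or quotes).

```sas
%macro n_words (string);   /* hypothetical wrapper name */
   %if %length(&string)=0 %then 0;
   %else %sysfunc(countw(&string, %str( )));
%mend n_words;

%let var_list = age height weight;

%put %n_words(&var_list);   /* writes 3 */
%put %n_words();            /* writes 0 */
```

Each semicolon inside the macro completes a %IF-%THEN or %ELSE statement, so the macro generates only the count.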

Let's move on to another example of generating text.

10.3 Working with Lists

Once again, a macro parameter contains a list of variable names. When the list is long (or perhaps when concatenating two or more lists), it might be burdensome for the user to check that there are no duplicates on the list. So a macro is designed to remove duplicate words from the list:

%macro dedup (word_list);

      %local i next_word deduped_list;

      %if %length(&word_list) %then

      %do i=1 %to %sysfunc(countw(&word_list, %str( )));

          %let next_word = %scan(&word_list, &i, %str( ));

          %if %index(%str( &deduped_list ), %str( &next_word ))=0

          %then %let deduped_list = &deduped_list &next_word;

      %end;

%mend dedup;

The macro examines every word in the incoming list. Any word that has not been found before gets added to &DEDUPED_LIST. The %STR function adds leading and trailing blanks within the %INDEX function. That's necessary because the %INDEX function searches for strings, not words. Without those extra blanks, the macro would, for example, never add var1 to a list that already contains var10, because %INDEX would find the string var1 inside var10.
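
The effect of the padding blanks is easy to check in open code. The values below are made up for illustration:

```sas
%let deduped_list = var10 var2;
%let next_word    = var1;

/* String search: var1 is found at the start of var10, so the result is 1 */
%put Without padding: %index(&deduped_list, &next_word);

/* Word search: " var1 " does not occur in " var10 var2 ", so the result is 0 */
%put With padding: %index(%str( &deduped_list ), %str( &next_word ));
```

Only the padded version reports 0, which is the condition %DEDUP needs in order to add var1 to the list.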

While the final value of &DEDUPED_LIST contains the proper list of words, the macro still has to make that final list available to the program. Once again, the best solution is to generate that list as text:

%macro dedup (word_list);

      %local i next_word deduped_list;

      %if %length(&word_list) %then 

      %do i=1 %to %sysfunc(countw(&word_list, %str( )));

          %let next_word = %scan(&word_list, &i, %str( ));

          %if %index(%str( &deduped_list ), %str( &next_word ))=0

          %then %let deduped_list = &deduped_list &next_word;

      %end;

      &deduped_list

%mend dedup;

Generating text as the output increases flexibility without adding complexity. Two sample applications:

%let var_list = %dedup(&var_list);

%do i=1 %to %sysfunc(countw(%dedup(&list1 &list2 &list3), %str( )));

In its current form, %DEDUP treats var1 and VAR1 as two different words. Yet most applications will involve lists of variable names or data set names, where capitalization should not matter. The macro would be more valuable with a second parameter:

%macro dedup (word_list, case_matters=N);

If programming standards permit changing user-entered parameters, a one-line addition would satisfy the requirements:

%if &case_matters=N %then %let word_list = %upcase(&word_list);

The rest of the macro would remain unchanged. However, if programming standards discourage changing a user-entered parameter, a more complex enhancement would be needed. The macro could apply the existing logic when &case_matters=Y but would otherwise use:

%if &case_matters=N %then %do;

    %if %index(%str( %upcase(&deduped_list) ),

               %str( %upcase(&next_word) ))=0

    %then %let deduped_list = &deduped_list &next_word;

%end;

Perhaps a simpler alternative would process a new macro variable:

%let new_macro_variable = %upcase(&word_list);
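
One possible complete version along these lines is sketched below. Rather than uppercasing the whole list at once, it keeps a parallel comparison list; the variables COMPARE_LIST and COMPARE_WORD are hypothetical names, and the sketch assumes that the spelling of the first occurrence of each word should be preserved in the output.

```sas
%macro dedup (word_list, case_matters=N);
   %local i next_word deduped_list compare_list compare_word;
   %if %length(&word_list) %then
   %do i=1 %to %sysfunc(countw(&word_list, %str( )));
      %let next_word = %scan(&word_list, &i, %str( ));
      /* Compare in uppercase unless capitalization matters */
      %if &case_matters=N %then %let compare_word = %upcase(&next_word);
      %else %let compare_word = &next_word;
      %if %index(%str( &compare_list ), %str( &compare_word ))=0 %then %do;
         %let deduped_list = &deduped_list &next_word;
         %let compare_list = &compare_list &compare_word;
      %end;
   %end;
   &deduped_list
%mend dedup;

%put %dedup(var1 VAR1 var2);                   /* writes var1 var2 */
%put %dedup(var1 VAR1 var2, case_matters=Y);   /* writes var1 VAR1 var2 */
```

The user-entered parameter &WORD_LIST is never modified, which satisfies the stricter programming standard.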

In any event, similar macros could combine the same tools in slightly different ways to generate a variety of results:

•    Find (and return) the overlap between two lists.

•    Verify that every word in one list also appears in another list.

For example, this macro returns the overlapping words that appear in both of two lists:

%macro overlap (list1=, list2=);

    %local i next_word overlapping_words;

    %if %length(&list1) %then 

    %do i=1 %to %sysfunc(countw(&list1, %str( )));

        %let next_word = %scan(&list1, &i, %str( ));

        %if %index(%str( &list2 ), %str( &next_word )) %then

        %let overlapping_words = &overlapping_words &next_word;

    %end;

    &overlapping_words

%mend overlap;

In its current form, the macro accumulates overlapping words into a macro variable (&OVERLAPPING_WORDS) and then generates the final list as text. However, for this purpose, it would be simpler to generate each word separately, without accumulating the overlapping words into a list:

%macro overlap (list1=, list2=);

    %local i next_word;

    %if %length(&list1) %then %do i=1 %to %countem(&list1);

        %let next_word = %scan(&list1, &i, %str( ));

        %if %index(%str( &list2 ), %str( &next_word )) %then

        &next_word;

    %end;

%mend overlap;

For this particular application, macro language can generate text one word at a time.
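
The second objective listed earlier, verifying that every word in one list also appears in another, can be handled with the same tools. The macro below is a sketch with a hypothetical name, %ALL_FOUND. It assumes blank-delimited lists and generates 1 when every word in LIST1 is found in LIST2 and 0 otherwise (an empty LIST1 yields 1).

```sas
%macro all_found (list1=, list2=);   /* hypothetical macro name */
   %local i next_word result;
   %let result = 1;
   %if %length(&list1) %then
   %do i=1 %to %sysfunc(countw(&list1, %str( )));
      %let next_word = %scan(&list1, &i, %str( ));
      /* A single missing word is enough to fail the check */
      %if %index(%str( &list2 ), %str( &next_word ))=0
      %then %let result = 0;
   %end;
   &result
%mend all_found;

%put %all_found(list1=age height, list2=name age height weight);   /* writes 1 */
%put %all_found(list1=age income, list2=name age height weight);   /* writes 0 */
```

Because the result is generated as text, the macro call can also serve directly as the condition in a %IF statement.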

10.4 Prefer the Macro Solution

SAS often provides many feasible approaches to a programming problem. If the choices include a 100% macro-based approach, there may be text-generating advantages. Consider this example, where the objective is to capture the number of observations in a SAS data set. Here is a simple approach:

data _null_; 

   call symputx('n_obs', how_many);

   stop;

   set incoming.dataset nobs=how_many;

run;

Note these features:

•    Macro language performs very little of the work.

•    The SET statement option NOBS= creates HOW_MANY during the compilation phase of the DATA step. CALL SYMPUTX can copy that value to a macro variable as soon as the execution phase begins, which is why the STOP statement can precede the SET statement.

•    Theoretically, the result could be incorrect. The observation count includes observations that have been marked for deletion, but have not yet been physically removed from the data set.

A different approach can account for the deleted observations:

proc sql noprint;

   select nobs - delobs into : n_obs

          from dictionary.tables

          where libname='INCOMING' and memname='DATASET';

quit;

PROC SQL can access both necessary pieces of information: the total number of observations (NOBS) and the number of deleted observations (DELOBS).

A purely macro-based approach, while more complex, can also account for deleted observations:

%let dsn_id = %sysfunc(open(incoming.dataset));

%let n_obs  = %sysfunc(attrn(&dsn_id, nlobs));

%let dsn_id = %sysfunc(close(&dsn_id));

The data set attribute NLOBS is the number of “logical” observations: the total number that exist minus those that have been marked for deletion.

Any of these programming approaches could be encapsulated in a macro. Using the first approach, the macro would become:

%macro how_many_obs (dsn);

   %global n_obs;

   data _null_; 

      call symputx('n_obs', how_many);

      stop;

      set &dsn nobs=how_many;

   run;

%mend how_many_obs;

However, only the purely macro-based solution can successfully generate text as the outcome:

%macro pure_macro_based_solution (dsn);

   %local dsn_id n_obs;

   %let dsn_id = %sysfunc(open(&dsn));

   %let n_obs  = %sysfunc(attrn(&dsn_id, nlobs));

   %let dsn_id = %sysfunc(close(&dsn_id));

   &n_obs

%mend pure_macro_based_solution;

All of these statements can make use of the generated text:

%let any_var_i_choose = %pure_macro_based_solution (incoming.dataset);

%if %pure_macro_based_solution (incoming.dataset) > 2000 %then %do;

do i=1 to %pure_macro_based_solution (incoming.dataset);

However, other approaches fail when encapsulated as a text-generating macro. This variation tries to generate text with the DATA step approach:

%macro how_many_obs (dsn);

   data _null_; 

      call symputx('n_obs', how_many);

      stop;

      set &dsn nobs=how_many;

   run;

   &n_obs

%mend how_many_obs;

%let any_var_i_choose = %how_many_obs (incoming.dataset);

This generates a strange, non-working result:

%let any_var_i_choose = data _null_; 

call symputx('n_obs', how_many);

stop;

set incoming.dataset nobs=how_many;

run;

&n_obs

After assigning an unusual value for &ANY_VAR_I_CHOOSE, the program then generates three DATA step statements that appear outside of a DATA step. Clearly, these statements won’t do the job. Only the pure macro solution generates text in a useful, flexible fashion.

Even when text generation is not a requirement, a pure macro-based approach has advantages. Consider this scenario:

•    %A constructs the beginning of a DATA step.

•    %B constructs the middle of that DATA step.

•    %C constructs the ending of that DATA step.

Under these circumstances, any of these macros could invoke the pure macro version of %HOW_MANY_OBS (%PURE_MACRO_BASED_SOLUTION). But any definition of %HOW_MANY_OBS that includes SAS language statements would fail. For example, think about the result if %B were to invoke a version of %HOW_MANY_OBS that generates SAS language statements. Those statements would be inserted into the DATA step that begins with %A and ends with %C, and the generated DATA statement would end that step prematurely.
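
As a small illustration, the pure macro version can be called in the middle of an ordinary DATA step. This sketch assumes that %PURE_MACRO_BASED_SOLUTION has already been compiled as shown earlier; the output data set name and the new variable names are hypothetical, and SASHELP.CLASS stands in for a real input data set.

```sas
/* Assumes the pure macro solution defined earlier is already compiled. */
data class_position;                 /* hypothetical output data set */
   set sashelp.class;
   /* The macro call generates only a number, completing the assignment */
   total_obs   = %pure_macro_based_solution(sashelp.class);
   pct_through = _n_ / total_obs;
run;
```

The macro executes while the DATA step is still being compiled, so the step sees nothing more than a numeric constant at that position.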
