Chapter 7: Macro Quoting

7.1 Why Quoting is Necessary

7.2 Why Quoting is a Nightmare

7.3 What Quoting Really Does

7.4 Peeking Inside the Black Box of Quoting

7.5 The Final Word on Quoting

Some of life’s pleasures are bittersweet. Sometimes you have to pretend to enjoy the meal that your child prepared. Sometimes your appearance in the mirror demands that you visit the gym. And sometimes you have to learn macro language quoting. It’s always nice to learn something new, but it’s not always pleasant.

7.1 Why Quoting is Necessary

All four statements below run into trouble.

%let feature = '&trait';

The intent: The macro variable should resolve despite being in single quotes. Actually, single quotes prevent all macro activity, including macro variable resolution.

%let heading = title "My Dog"; ;

The intent: The first semicolon should be part of the value of &HEADING, and the second should end the %LET statement. Actually, the first semicolon ends the %LET statement.

%filmclip (movie=The Good, the Bad and the Ugly)

The intent: The comma should be part of the value of &MOVIE. Actually, the comma ends the value assigned to &MOVIE, and the text that follows produces an error condition.

%let drink = A&W Root Beer;

The intent: The ampersand should be text, part of the value of &DRINK. Actually, SAS will search for a macro variable &W.

In all of these cases, macro language encounters characters that have special meaning, such as the comma, the semicolon, the ampersand, or the single quotes. To succeed, macro language needs to remove the special meaning of those characters, to treat those characters as text rather than as significant symbols.

The DATA step uses quotes for this purpose:

name =  Bob ;          Bob is symbolic, a variable name

name = 'Bob';           Bob is text

reaction = '&;*,-';     All five quoted characters are text

But macro language does not use quotes for this purpose:

%let name =  Bob ;    Assigns a three-character value

%let name = 'Bob';     Assigns a five-character value

Instead, macro language uses functions to treat normally symbolic characters as text:

%let feature = %str(%'&trait%');

The result: The %STR function (in combination with using %' to refer to single quotes) allows &TRAIT to resolve, while still surrounding it with single quotes.

%let heading = %str(title "My Dog";);

The result: The %STR function lets macro language treat the first semicolon as text, assigning it as part of the value of &HEADING.

%filmclip (movie=%str(The Good, the Bad and the Ugly))

The result: The %STR function lets macro language treat the comma as text, making it part of the value of &MOVIE.

%let drink = %nrstr(A&W Root Beer);

The result: The %NRSTR function lets macro language treat the ampersand as text, allowing &W to become the second and third characters of &DRINK.

The DATA step uses quotes to turn symbolic characters into text. Macro language uses functions instead. As a result, these macro functions are called quoting functions.
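One way to confirm these values is to write them to the log. The following is a minimal sketch, not part of the original examples; it relies only on %PUT and the standard %LENGTH function, and the comments restate what the preceding discussion leads us to expect.

```sas
/* Quotes are ordinary text in macro language, so they count toward the length. */
%let name = 'Bob';
%put NAME has %length(&name) characters;

/* %STR masks the first semicolon, so it becomes part of the value. */
%let heading = %str(title "My Dog";);
%put &heading;

/* %NRSTR masks the ampersand, so SAS does not search for &W. */
%let drink = %nrstr(A&W Root Beer);
%put &drink;
```

The first %PUT should report five characters. The other two should write the complete TITLE statement, semicolon included, and the text A&W Root Beer.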

7.2 Why Quoting is a Nightmare

Quoting functions were developed over a period of years. Several times, programming situations revealed a need for additional functionality beyond what %STR and %NRSTR provide. Over time, all these features were addressed:

•    Quoting words, not just individual characters.

•    Execution time quoting vs. compilation time quoting.

•    Unquoting.

•    Special situations that just couldn't be handled.

Here are a couple of related examples. This first example illustrates the second bullet point, the difference between compilation time quoting and execution time quoting:

%macro idaho (state=);

    %if &state=ID %then %do;

        %put State is IDAHO.;

    %end;

%mend idaho;

%idaho (state=NY)

The macro works just fine until the day a user wants to process Oregon (state=OR) or Nebraska (state=NE). Either of these statements would generate an error:

%if OR=ID %then %do;

%if NE=ID %then %do;

Macro language considers both OR and NE (not equal) to be meaningful words, not text. Quoting &STATE can’t help because the logic of the macro requires &STATE to resolve. %STR and %NRSTR operate when a macro is compiled, not when it executes. What is needed is the ability to quote later, when a macro executes. So functions such as %BQUOTE and %NRBQUOTE were added to macro language to handle that.
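As a minimal sketch of the execution-time fix, assume the same %IDAHO macro with one change: wrapping &STATE in %BQUOTE. The macro variable still resolves, but the resolved text is masked while the macro executes, so OR and NE are compared as text rather than evaluated as operators.

```sas
%macro idaho (state=);
    /* %BQUOTE lets &STATE resolve, then masks the result at execution time */
    %if %bquote(&state)=ID %then %do;
        %put State is IDAHO.;
    %end;
%mend idaho;

%idaho (state=OR)   /* no error; the comparison is simply false */
%idaho (state=ID)   /* writes: State is IDAHO. */
```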

The next situation illustrates the fourth bullet point. Consider this situation where the intent is to generate the equivalent of:

title 'My Favorite Candy:  M&Ms';

The wrinkle is that the name of the candy is stored in a macro variable:

%let ms = Mighty Stupid;

data _null_;

   call symput('candy', 'M&Ms');

run;

title "My Favorite Candy:  &candy";

Using double quotes allows &CANDY to resolve. But double quotes cannot suppress the resolution of &MS. So the double-quoted title becomes:

title "My Favorite Candy:  MMighty Stupid";

Yet another quoting function handles the situation. When a macro executes, %SUPERQ masks every special character (as well as mnemonic operators such as OR and NE):

title "My Favorite Candy:  %superq(candy)";

More situations to handle = more quoting functions = more complications. The programming analogy might be:

•    Write a complex program.

•    Modify the program to handle unforeseen problems.

•    Repeat the second step multiple times.

With each iteration, the program gets messier and messier. By the time you are done, your program works. But you might be saying to yourself, "If I had known about all these issues when I started, I might have written this differently." That is the evolution of quoting, over time.

7.3 What Quoting Really Does

Technically, quoting functions change the symbolic characters, storing them using a different bit pattern. When it is time to turn generated text into SAS language statements, macro language magically figures out that it is time to change the characters back to their original bit pattern. As an example, consider this program:

data _null_;

   three_semicolons = ';;;';

   quoted_version = ";%str(;);";

   put three_semicolons $hex6.    3B3B3B

       quoted_version   $hex6.;       3B3B3B

run;

Even though one variable contains a quoted semicolon, the software unquotes it before the PUT statement writes it. By the time the DATA step executes, both variables contain the same characters. So how do we know that the quoting function does anything at all? Writing out a quoted character with a %PUT statement sheds some light:

%let quoted_semi = %str(;);

%put &quoted_semi;

%put _user_;

The first %PUT statement simply writes a semicolon, as if no quoting had occurred. But the second shows the impact of quoting. When %PUT writes user-created macro variables by way of a keyword such as _USER_, quoting remains in place. Instead of writing out a semicolon, the second %PUT statement writes out three unprintable characters. (Unprintable characters typically display as an empty box.) While all three are unprintable, internally they are actually different. Why three characters instead of one? %STR uses one unprintable character to show when quoting begins, a second one to hold the quoted semicolon, and a third to show when quoting ends.

There is other evidence that quoting really changes characters internally. For example, the software doesn't always figure out when it should unquote a character. Be alert for this combination of conditions:

•    It looks like nothing is wrong with the program, but

•    There is an error message, and

•    The program contains characters quoted by macro language.

This combination indicates that the software failed to unquote a quoted character. Sending a password to SQL is one case that typically encounters this issue. Here is the section of a SAS statement that specifies the password:

' " my_secret_password " '

When a macro variable holds the password, this combination would not work:

%let password=my_secret_password;

' " &password " '

The single quotes suppress macro activity, preventing the resolution of &PASSWORD. The %STR function helps by quoting the single quotes but not the ampersand. It allows &PASSWORD to resolve:

%str(%' " &password " %')

Yet the software generates an error message because it cannot convert the quoted characters back to their original form in time for SQL to properly parse the expression. Luckily, unquoting the entire expression converts the characters back to their proper form in time:

%unquote(%str(%' " &password " %'))

7.4 Peeking Inside the Black Box of Quoting

It's time for a little magic. Suppose we wanted to break the secret code of macro language quoting, and figure out which bit patterns SAS uses to store quoted characters. Could we do that? Could we even approach the problem? The rest of this section will trick the software into revealing its secrets.

Will this be interesting? It depends on your point of view. It's decidedly boring to determine which bit patterns quoting uses. But it's much more interesting if you think of it this way. Let's open up the black box of macro language quoting to explore parts of the process that we were never meant to see.

Here is one plan to pry open the quoting process:

•    Transfer each character to a macro variable.

•    Quote the macro variable.

•    Print the quoted version.

As the example with three semicolons demonstrates, this plan faces serious obstacles. The software is built to unquote characters before we can examine them. Here is one more attempt that still fails:

data _null_;

   semicolon = ';';

   put semicolon $hex2.;               3B

   call symput ('macrovar', semicolon);

   call execute('%let macrovar = %superq(macrovar);');

   quoted_semicolon=symget('macrovar');

   put quoted_semicolon $hex2.;   3B

run;

As noted in Section 3.4, when CALL EXECUTE generates macro language statements, those statements execute immediately. So in executing the DATA step, the software runs CALL SYMPUT, then executes the %LET statement, and then continues the DATA step, using SYMGET to assign a value to QUOTED_SEMICOLON. %SUPERQ quotes any and all special characters, including a semicolon. But as SYMGET retrieves &MACROVAR, it unquotes the semicolon before we can examine it. Can we conjure up some more powerful magic to overcome this feature of SYMGET?

The strength of the software also proves to be its undoing, forcing it to reveal its quoting secrets. We will approach the problem from the opposite direction. For each character:

•    Display it in hex format.

•    Transfer it to a macro variable.

•    Unquote the macro variable.

•    Display the unquoted value, both as a character and in hex format.

In a nutshell, the software does an excellent job of unquoting characters. So let it unquote every character, and we can examine which ones change. Using hindsight, here are the results for one selected character:

data _null_;

   character_in = '0E'x; 

   call symput ('macrovar', character_in);

   length character_out $ 1;

   character_out = symget('macrovar');

   hexcode_out = put(character_out, $hex2.); 

   put hexcode_out             3B

       character_out;             ;

run;

This program starts with hex code 0E, and it demonstrates that this is the bit pattern that macro language uses to hold a quoted semicolon. The key steps include:

•    CALL SYMPUT copies into a macro variable the character represented by hex code 0E.

•    SYMGET retrieves that macro variable, automatically unquotes it, and stores the result as the DATA step variable CHARACTER_OUT.

•    The PUT statement reveals that the unquoted character is a semicolon, with hex code 3B instead of 0E.

In short, this program demonstrates that unquoting hex code 0E turns it into a semicolon. But why start with 0E? How did we know it would turn into a semicolon? The answer is to write a program to find all characters affected by unquoting. For example:

data _null_;

   file print notitles;

   put '***************************************************';

   do _i_=0 to 255;

      hexcode_in = put(_i_, hex2.);

      call symput('macrovar', input(hexcode_in, $hex2.));

      length hexcode_out $ 2;

      hexcode_out = put(symget('macrovar'), $hex2.);

      if hexcode_in ne hexcode_out then do;

         print_char = input(hexcode_out, $hex2.);

         put hexcode_in= hexcode_out= print_char=;

      end;

   end;

   put '***************************************************';

run;

This program transfers every possible character to a macro variable, unquotes it, and checks whether unquoting changes the value. Here is the list of those that change, for one operating system:

***************************************************

hexcode_in=01 hexcode_out=20 print_char=

hexcode_in=02 hexcode_out=20 print_char=

hexcode_in=03 hexcode_out=20 print_char=

hexcode_in=04 hexcode_out=20 print_char=

hexcode_in=05 hexcode_out=20 print_char=

hexcode_in=06 hexcode_out=20 print_char=

hexcode_in=07 hexcode_out=20 print_char=

hexcode_in=08 hexcode_out=20 print_char=

hexcode_in=0B hexcode_out=5E print_char=^

hexcode_in=0E hexcode_out=3B print_char=;

hexcode_in=0F hexcode_out=26 print_char=&

hexcode_in=10 hexcode_out=25 print_char=%

hexcode_in=11 hexcode_out=27 print_char='

hexcode_in=12 hexcode_out=22 print_char="

hexcode_in=13 hexcode_out=28 print_char=(

hexcode_in=14 hexcode_out=29 print_char=)

hexcode_in=15 hexcode_out=2B print_char=+

hexcode_in=16 hexcode_out=2D print_char=-

hexcode_in=17 hexcode_out=2A print_char=*

hexcode_in=18 hexcode_out=2F print_char=/

hexcode_in=19 hexcode_out=3C print_char=<

hexcode_in=1A hexcode_out=3E print_char=>

hexcode_in=1C hexcode_out=3D print_char==

hexcode_in=1D hexcode_out=7C print_char=|

hexcode_in=1E hexcode_out=2C print_char=,

hexcode_in=1F hexcode_out=7E print_char=~

hexcode_in=7F hexcode_out=23 print_char=#

***************************************************

The values of PRINT_CHAR might look vaguely familiar. They form the list of all characters affected by quoting. Also note how most of the hex codes are sequential. But the last line skips from hex code 1F to hex code 7F. That is no accident. Macro language added the pound sign as a special character many years after the others on the list, when SAS 9.2 introduced it as the equivalent of a macro language IN operator.

There is no guarantee that these mappings apply across operating systems or even across releases of the software for the same operating system. But a guarantee is unnecessary … just rerun the program.

7.5 The Final Word on Quoting

Quoting is as much mystical as it is intuitive. You may (or may not) be able to predict the results of these tests:

%let food=fruit;

%let fruit=apple;

%let t1 = %nrbquote(&&&food);        apple

%let t2 = %str(&&)&food;                  &fruit

%let t3 = &t2;                                      &fruit

%let t4 = %unquote(&t2);                   apple

%let ampersand = &&;                        &

%let t5 = &ampersand.fruit;               apple

%let t6 = &&%nrbquote(&food);       &fruit

%let t7 = &t6;                                      &fruit

%let t8 = &&%str(fruit);                     &fruit

%let t9 = &&fruit;                                apple

It is always important to test your code. Test it more when it involves macro quoting. In addition to the %PUT _USER_ statement mentioned earlier, the SYMBOLGEN option can also help. It displays unquoted values, but it also notes in the log when macro variables have been unquoted for printing purposes.
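A minimal sketch of SYMBOLGEN at work, reusing the quoted semicolon from Section 7.3. The exact wording of the log messages may vary by release, so treat the description that follows as a guide rather than a transcript.

```sas
options symbolgen;

/* The value is a single semicolon, masked by %STR */
%let quoted_semi = %str(;);
%put &quoted_semi;

options nosymbolgen;
```

When &QUOTED_SEMI resolves, the log should show a SYMBOLGEN line reporting the value as a plain semicolon, followed by a note that some characters subject to macro quoting were unquoted for printing. That note is the signal that the stored value differs from what appears on the log.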
