Chapter 3: CALL EXECUTE

3.1 Basic Rules

3.2 Achieving the Impossible

3.3 Multiple CALL EXECUTEs

3.4 Finally, the Intricacies

3.4.1 Programming Challenge #2

3.4.2 Solution

3.5 Execute an Experiment

3.6 The Final Intricacy: Macro Variable Resolution

The CALL EXECUTE statement toils in relative obscurity. Despite its power, and despite the fact that it has been available for many years, relatively few programmers are familiar with it. As a result, this section presents some of the basics as well as the intricacies.

3.1 Basic Rules

CALL EXECUTE is a DATA step statement that means: “Run this code.” Here is an overly simple example:

data sales;

     call execute ('proc print data=sales; run;');

     amount=5;

run;

Even though CALL EXECUTE asks for a PROC PRINT to run, it is impossible to run a PROC PRINT in the middle of a DATA step. So SAS holds that code aside. PROC PRINT runs once the DATA step completes, just as if the program were:

data sales;

     amount=5;

run;

proc print data=sales;

run;

These basic rules govern code generated with CALL EXECUTE:

•    The statements run as soon as possible.

•    The statements can be data-driven. The expression inside parentheses can include references to DATA step variables, not just quoted strings.

•    A single DATA step can include multiple CALL EXECUTE statements. Any generated SAS code just stacks up, word by word, to run once the DATA step is over.

•    Always include the RUN statement at the end of the DATA step.

While we’ll add some more complex rules shortly, just these simple rules can produce magical results.
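As a minimal sketch of the data-driven rule, the following step builds one PROC PRINT per distinct value of a variable. It assumes the SASHELP.CLASS sample data set that ships with SAS; the WORK.SEX_VALUES data set name is hypothetical and chosen only for this illustration.

```sas
/* Hypothetical illustration: one report per distinct value of SEX */
proc sort data=sashelp.class(keep=sex) out=work.sex_values nodupkey;
   by sex;
run;

data _null_;
   set work.sex_values;
   /* The argument is an expression: quoted text joined with */
   /* the current value of the DATA step variable SEX.       */
   call execute('proc print data=sashelp.class; where sex="'
                || sex || '"; run;');
run;
```

Each observation contributes one complete PROC PRINT step. The generated steps stack up and run, in order, once the DATA step is over.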

3.2 Achieving the Impossible

The first bit of magic revisits that “impossible” task of combining DATA and PROC steps:

data _null_;

   set sales end=nomore;

   total + amount;

   if nomore;

   if (total < 1000000) then do;

      proc means data=sales;

         class state;

         var amount;

         title "Sales by State";

      run;

   end;

   else do;

      proc means data=sales;

         class state year;

         var amount;

         title "Sales by State and Year";

      run;

   end;

run;

The intent is that the DATA step should calculate the total of all the AMOUNTs and use that to determine which version of PROC MEANS should run. Of course, there are numerous reasons why this code will fail. Mixing DATA and PROC steps generates errors, and the presence of three RUN statements will surely cause problems. But CALL EXECUTE easily overcomes these obstacles:

data _null_;

   set sales end=nomore;

     total + amount;

     if nomore;

     if (total < 1000000) then 

     call execute('

        proc means data=sales;

               class state;

               var amount;

               title "Sales by State";

        run;

   ');

   else call execute('

        proc means data=sales;

               class state year;

               var amount;

               title "Sales by State and Year";

        run;

   ');

run;

The revised DATA step replaces DO groups with CALL EXECUTE statements. As planned, the DATA step adds up all the AMOUNTs and then uses CALL EXECUTE to determine which version of PROC MEANS executes once the DATA step is over. And the program is nearly identical to the failing version. This example illustrates one of the basic situations where CALL EXECUTE comes in handy: when the program must use DATA step variables to control the subsequent PROC step.

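Another impossible task: historically, %IF %THEN statements could not appear in open code. (Beginning with SAS 9.4M5, a restricted form is permitted in open code.) In earlier releases, these statements are legal only within a macro definition: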

%if &city = Boston %then %do;

    proc means data=beantown;

       var pahk youh caah;

    run;

%end;

%else %if &city = New York %then %do;

    proc means data=big_apple;

       var a nominal egg;

    run;

%end;

The only purpose of adding %MACRO and %MEND statements in this program would be to permit the use of %IF %THEN statements. Could CALL EXECUTE help here? It eliminates the need to define a macro by substituting DATA step IF THEN statements for macro %IF %THEN statements:

data _null_;

   if "&city" = "Boston" then call execute('

      proc means data=beantown;

         var pahk youh caah;

      run;

   ');

   else if "&city" = "New York" then call execute('

      proc means data=big_apple;

         var a nominal egg;

      run;

   ');

run;

True, the results will be different if &CITY contains leading blanks. Correcting that is easy enough, however. Better programs will, as a rule, handle rare but foreseeable circumstances. Section 4.4 presents another workaround that gets results equivalent to using %IF %THEN without defining a macro.
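One way to handle the leading blanks might look like the sketch below. The choice of the STRIP function is an assumption made for this illustration (LEFT would serve equally well); the data sets and variables are the ones from the example above, and &CITY is assumed to have been assigned already.

```sas
data _null_;
   /* STRIP removes leading (and trailing) blanks before the comparison */
   if strip("&city") = "Boston" then call execute('
      proc means data=beantown;
         var pahk youh caah;
      run;
   ');
   else if strip("&city") = "New York" then call execute('
      proc means data=big_apple;
         var a nominal egg;
      run;
   ');
run;
```

Trailing blanks need no special handling, because the DATA step pads the shorter string with blanks when it compares two character values.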

3.3 Multiple CALL EXECUTEs

Multiple CALL EXECUTE statements can build a more sophisticated program. The incoming SAS data set named CUTOFFS illustrates this point. Assume that the observations are sorted as shown, from highest to lowest SAT_CUTOFF:

Obs   SAT_cutoff   Group_name

1     1200         Honor students

2     900          Regular track

3     300          Challenged

The DATA step must use CUTOFFS to construct the following program, extracting the data values (each cutoff score and each group name) from the observations:

data student_groups;

   set all_students;

   length group $ 14;

   if score >= 1200 then group='Honor students';

   else if score >= 900 then group='Regular track';

   else if score >= 300 then group='Challenged';

run;

CALL EXECUTE does the job, using neither a macro nor any macro variables. (As usual, the spacing and indentation are for readability only and have no impact on the program's results.)

data _null_;

   set cutoffs end=lastone;

   if _n_=1 then call execute("data student_groups;

                                  set all_students;    

                                  length group $ 14;");

   call execute("if score >= ");

   call execute(put(SAT_cutoff, best16.));

   call execute(" then group = '");

   call execute(group_name || "';");

   if lastone=0 then call execute("else ");

   else call execute("run;");

run;

The first CALL EXECUTE runs just once, generating the DATA, SET, and LENGTH statements. The next set of four CALL EXECUTE statements runs for each observation in CUTOFFS. They generate an IF THEN statement (without the word ELSE). The next-to-last CALL EXECUTE runs for each observation except the last one, adding ELSE before a subsequent IF THEN statement. The final CALL EXECUTE runs just once, adding a RUN statement.

Although beauty is in the eye of the beholder, it would have been possible to combine the set of four CALL EXECUTE statements into a single statement:

call execute("if score >= " || put(SAT_cutoff, best16.)

              || " then group = '" || group_name || "';");

This program illustrates a few features of a realistic application:

•    Multiple CALL EXECUTE statements each generate separate sections of the program.

•    The arguments to CALL EXECUTE contain complex expressions.

•    Data values contribute to the generated SAS code.
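As a variation on the same program, the sketch below builds each IF THEN statement with the CATX and QUOTE functions instead of the concatenation operator. The use of these functions is an assumption made for illustration, not the original program's approach; the data sets and variables are those described above.

```sas
data _null_;
   set cutoffs end=lastone;
   if _n_=1 then call execute("data student_groups;
                                  set all_students;
                                  length group $ 14;");
   /* CATX joins the pieces with a single blank and trims each one;  */
   /* QUOTE wraps the trimmed group name in double quotation marks.  */
   call execute(catx(' ', 'if score >=', SAT_cutoff, 'then group =',
                     quote(strip(group_name)), ';'));
   if lastone=0 then call execute("else ");
   else call execute("run;");
run;
```

Because QUOTE produces double quotation marks, a group name containing & or % could trigger macro activity in the generated code. The original single-quoted version avoids that possibility.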

3.4 Finally, the Intricacies

If those are the basics, what are the intricacies? One intricacy is that the software must have already interpreted the entire DATA step in order to run. So CALL EXECUTE cannot impact the current DATA step in any way. Neither of these statements would impact the running DATA step:

call execute ('pi=3.14;');

call execute ('options obs=10;');

Another intricacy involves the need for a RUN statement at the end of the DATA step. When a RUN statement ends the DATA step, it signals that the DATA step is complete and should run immediately. The software stops parsing and starts executing. But what would happen without a RUN statement?

data _null_;

     call execute ('proc print data=sales;');

     var state amount;

run;

Should the software ignore the missing RUN statement and generate PROC PRINT before the VAR statement?

data _null_;

     call execute ('var state amount;');

proc print data=sales;

run;

Should the software parse the complete PROC PRINT statement and then allow CALL EXECUTE to add the VAR statement?

data _null_;

     call execute ('data=sales;');

     proc print

     var state amount;

run;

Should the software parse just the two words PROC and PRINT and allow CALL EXECUTE to complete the PROC statement? The real answer is: “Don’t do it.” CALL EXECUTE without a RUN statement is improper, unsupported syntax. It is even possible that the results would be inconsistent from one release of the software to the next.

The final complexity is that macro statements execute immediately, when generated by CALL EXECUTE. They do not stack up, waiting for the DATA step to finish. So let’s revisit that idea that CALL EXECUTE cannot change the currently executing DATA step. By using SYMGET, it is actually possible to work around that limitation:

%let pet=CAT;

data test;

     call execute ('%let pet=DOG;');

     animal = symget('pet');

     put animal;  /* writes DOG */

run;

Before the DATA step executes, &PET is CAT. When CALL EXECUTE generates a %LET statement, that statement runs immediately, changing &PET to DOG. So ANIMAL receives the value of DOG. Just for the record, SYMGET defines ANIMAL as having a length of 200. When you are creating a DATA step variable using SYMGET, define that variable with a LENGTH statement first.
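The LENGTH advice can be sketched as follows. The length of 8 is an arbitrary choice for this illustration; pick whatever fits the longest value the macro variable might hold.

```sas
%let pet=CAT;

data test;
   /* Declare ANIMAL before SYMGET, so it is not created with length 200 */
   length animal $ 8;
   animal = symget('pet');
   put animal=;
run;
```

Running PROC CONTENTS on TEST afterward is one way to confirm that ANIMAL has the declared length.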

Notice how this feature of CALL EXECUTE, executing macro language statements immediately, can be useful. Normally, it is impossible for a DATA step to conditionally execute a %LET statement. Consider this DATA step as an example:

%let pet=CAT;

data _null_;

     if 5>4 then do;

        %let pet=DOG;

     end;

     if 5=4 then do;

        %let pet=RAT;

     end; 

     animal = symget('pet');

     put animal;  /* writes RAT */

run;

Both %LET statements run during the compilation phase of the DATA step. Neither is part of the DATA step, and neither is affected by whether a DATA step IF THEN condition is true or false. Simply put, the DATA step cannot conditionally execute %LET statements, unless you utilize CALL EXECUTE:

%let pet=CAT;

data _null_;

     if 5>4 then do;

        call execute ('%let pet=DOG;');

     end;

     if 5=4 then do;

        call execute ('%let pet=RAT;');

     end; 

     animal = symget('pet');

     put animal;  /* writes DOG */

run;

The DATA step can control whether CALL EXECUTE runs, and CALL EXECUTE can control the generation of a %LET statement. So with a little extra work, DATA steps can conditionally execute macro language statements such as %LET.

3.4.1 Programming Challenge #2

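If all this seems easy, it’s time to test yourself. Write a macro such that these two programs generate different results. The first program simply calls the macro:

%mymac

The second program calls the macro through CALL EXECUTE:

data _null_;

   call execute('%mymac');

run;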

Hint: The key feature that creates the difference between the programs is timing. Any macro language statements generated by CALL EXECUTE will run immediately. But any SAS language statements must stack up and wait until the current DATA step finishes.

3.4.2 Solution

A simple solution illustrates how timing makes the difference. Here is one possible definition of %MYMAC:

%macro mymac;

     %let color=blue;

     %put Color began as &color..;

     data _null_;

        call symput('color', 'red');

     run;

     %put Color ends up as &color..;

%mend mymac; 

When the %MYMAC statement executes the macro, all its statements execute in order. The messages are:

Color began as blue.

Color ends up as red.

When CALL EXECUTE invokes the macro, however, the statements execute in a different order. All macro statements execute immediately:

   %let color=blue;

   %put Color began as &color..;

   %put Color ends up as &color..;

But the DATA step statements (notably CALL SYMPUT) have to stack up and wait to execute. Therefore, the messages are:

Color began as blue.

Color ends up as blue.

3.5 Execute an Experiment

In both theory and practice, macro language statements generated by CALL EXECUTE run immediately. But there must be times when we would like to change that and have macro language statements wait until the current DATA step is over. Let's take an artificially simple example:

%let value = BEFORE;

data _null_;

   call execute('%let value = AFTER;');

   data_step_var = symget('value');

   put data_step_var;          /* writes AFTER */

run;

%put Value is &value..;     /* writes: Value is AFTER. */

This program shows how the %LET statement generated by CALL EXECUTE runs immediately, without stacking up until the DATA step is over. SYMGET retrieves AFTER, the &VALUE assigned by the second %LET statement. Could we possibly change that? Could we find a way to force all statements generated by CALL EXECUTE, including macro language statements, to wait until the current DATA step finishes? In this program, the objective would be that SYMGET retrieves BEFORE. The %LET statement executes once the DATA step completes, so that the PUT statement writes BEFORE yet the %PUT statement writes AFTER.

Here are some failing experiments. Each one takes the same DATA step but replaces the CALL EXECUTE statement.

First, try to fool the macro processor by separating the % from the rest of the generated code:

call execute ('%' || 'let value = AFTER;');

Next, try to fool the macro processor into thinking a nonexistent macro is being called (%L):

call execute ('%L' || 'et value = AFTER;');

Next, try to bury the %LET statement, hiding it in a subsequent DATA step:

call execute 

         ("data _null_; call execute('%let value = AFTER;'); run;");

None of these work. In every case, the %LET statement executes immediately, and DATA_STEP_VAR is assigned the value AFTER. However, combining the first and third attempts does work! The key steps:

•    Split the %LET statement into pieces, so it is not easily recognizable.

•    At the same time, bury the %LET statement inside a subsequently executing DATA step.

By itself, the CALL EXECUTE statement becomes:

call execute ("data _null_; call execute('%" ||

              "let value = AFTER;'); run;");

In context, the full program looks like this:

%let value = BEFORE;

data _null_;

   call execute ("data _null_; call execute('%" ||

              "let value = AFTER;'); run;");

   data_step_var = symget('value');

   put data_step_var;          /* writes BEFORE */

run;

%put Value is &value..;    /* writes: Value is AFTER. */

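Splitting the %LET statement apparently hides it from the macro processor while the initial DATA step compiles and executes: the characters % and LET never appear together inside the double-quoted strings, and in the generated code the %LET statement sits inside single quotes. The %LET statement doesn’t execute until it is generated by the second DATA step.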

These complexities are unnecessary, however. The software does contain the right tool for the job, a tool that temporarily hides from the macro processor the fact that a macro statement appears. By design, this variation successfully delays the execution of %LET:

%let value=BEFORE;

data _null_;

   call execute('%nrstr(%let value=AFTER;)');

   data_step_var = symget('value');

   put data_step_var;            /* writes BEFORE */

run;

%put Value is &value..;       /* writes: Value is AFTER. */

The %NRSTR function is designed to mask macro references until the statement executes. That is enough to hide the %LET statement, delaying its execution until the DATA step is over. Chapter 7 will explore macro quoting functions in more detail.
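The same technique applies to macro calls. As a sketch, assuming the %MYMAC macro from Section 3.4.2 has already been defined, wrapping the call in %NRSTR should delay the entire macro, macro language statements included, until the DATA step ends:

```sas
data _null_;
   /* %NRSTR masks the percent sign, so %MYMAC does not execute here. */
   /* The text %mymac stacks up and runs after this DATA step ends.   */
   call execute('%nrstr(%mymac)');
run;
```

With the call delayed, the %PUT statements and the DATA step inside the macro run in their written order, so the second message should report red rather than blue, matching the result of calling %MYMAC directly.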

3.6 The Final Intricacy: Macro Variable Resolution

Single quotes prevent macro variable resolution (as well as all other macro language activity). Therefore, the choice of single vs. double quotes makes a difference with combinations of CALL EXECUTE and CALL SYMPUT. Consider the following test:

%let value = BEFORE;

data _null_; 

   call symput('value', 'AFTER');

   call execute("%put Double Quotes:  value is &VALUE..;");

   call execute('%put Single Quotes:  value is &VALUE..;');

run;

Both %PUT statements execute immediately. Double quotes permit resolution of &VALUE during the initial phase of the DATA step when statements are interpreted and checked for syntax errors. The first CALL EXECUTE generates:

Double Quotes:  value is BEFORE.

However, single quotes prevent resolution of &VALUE until later on, once the DATA step begins to execute. CALL EXECUTE still generates a %PUT statement, but that %PUT statement contains an unresolved &VALUE. As the DATA step executes, it replaces &VALUE with AFTER. When CALL EXECUTE generates its %PUT statement, &VALUE contains AFTER:

Single Quotes:  value is AFTER.

The same principle would apply if CALL EXECUTE were to generate SAS language statements, rather than macro language statements:

%let value = BEFORE;

data _null_; 

   call symput('value', 'AFTER');

   call execute("proc print data=&VALUE.; run;");

   call execute('proc print data=&VALUE.; run;');

run;

The first CALL EXECUTE uses double quotes, so &VALUE resolves early:

proc print data=BEFORE; run;

But the second uses single quotes, so &VALUE resolves later:

proc print data=AFTER; run;

When CALL EXECUTE is the right tool for the job, it often simplifies a program dramatically. Based on your own experience, see if you can find an application that would benefit from using it.
