Chapter 4: %SYSFUNC

4.1 Basic Examples

4.2 Capturing the Program Name

4.3 Commas and Nested %SYSFUNCs

4.4 Achieving the Impossible, Revisited

4.5 Capturing Option Settings

4.6 Efficiency Considerations

4.7 A Final Example: ZIP Codes

4.7.1 Programming Challenge #3

4.7.2 Solution

The DATA step supports hundreds of functions, but the macro language itself contains only a handful, such as %SCAN and %SUBSTR. %SYSFUNC bridges the gap, permitting the macro language to utilize virtually all DATA step functions. (Technically, it can also utilize functions written using SAS/TOOLKIT software and PROC FCMP.)

Programmers apply functions in many ways, for many purposes. As a result, %SYSFUNC examples cover a broad range of applications. To the extent possible, this chapter focuses on typical, widely applicable examples.

4.1 Basic Examples

This first example lets macro language invoke the TODAY() function. %SYSFUNC’s second parameter (mmddyy10) specifies the format for expressing the result:

%let currdate = %sysfunc(today(), mmddyy10);

If this code were to run on February 15, 2014, the result would be 02/15/2014. Even such a simple example has realistic applications. For example, consider this scenario:

•    A program runs periodically, creating a permanent output data set each time it runs.

•    All output data sets must be preserved, rather than reusing the same data set name each time.

•    A convenient way to save the data sets would be to include the creation date as part of the data set name. Better yet, do it in a way where the alphabetized list of names happens to be in date order.

Half of one SAS statement achieves this result in its entirety:

data perm.dataset_%sysfunc(today(), yymmddn8);

The “n” in yymmddn8 requests that there are “no” separators between the year, month, and day. So if this code were to run on February 15, 2014, it would generate:

data perm.dataset_20140215;

As long as the program never runs twice in the same day, each data set name will be unique. And by adding year/month/day as part of the name, the alphabetized list is also in date order.
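If the program might run more than once a day, one possible extension appends the time of day to the name as well. This is only a sketch: the stem dataset_ comes from the example above, while the macro variable STAMP and the choice of the TOD8. format are assumptions made for illustration.

```sas
/* STAMP is a hypothetical macro variable name used for illustration. */
/* TOD8. writes the time as hh:mm:ss, and COMPRESS removes the colons. */
%let stamp = %sysfunc(today(), yymmddn8.)_%sysfunc(compress(%sysfunc(time(), tod8.), :));

%put NOTE: Next output data set would be perm.dataset_&stamp;
```

A run at 10:15:30 on February 15, 2014, would produce a name of the form perm.dataset_20140215_101530, so the alphabetized list remains in chronological order.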

%SYSFUNC can utilize nearly all DATA step functions. The most notable exception is that the PUT function is not supported. Instead, switch to either PUTN or PUTC. For example, this loop attempts to create three local macro variables:

%do i=001 %to 003;

      %local v&i;

%end;

While the intent is to create macro variables named v001, v002, and v003, this program fails. Instead, it creates v1, v2, and v3. The leading zeros are not part of the value of &I. To generate the desired names, %SYSFUNC comes to the rescue:

%do i=1 %to 3;

     %local v%sysfunc(putn(&i,z3));

%end;

Another common %SYSFUNC application is checking user-entered parameters. Chapter 11 contains many such examples. For now, here is a basic example:

%let rc = %sysfunc(exist(&dsn));

The EXIST function checks to see whether a SAS data set exists, returning either a 0 or a 1. So when a user supplies a data set name to a macro, the macro can detect whether that data set exists and take appropriate action.
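As a minimal sketch of that idea, the macro below prints a few observations only when the data set exists. The macro name PRINT_IF_EXISTS and the OBS=10 limit are assumptions chosen for illustration, not part of the original example.

```sas
%macro print_if_exists(dsn);
    /* EXIST returns 1 when the data set is found, 0 otherwise. */
    %if %sysfunc(exist(&dsn)) %then %do;
        proc print data=&dsn (obs=10);
        run;
    %end;
    %else %put WARNING: Data set &dsn does not exist.;
%mend print_if_exists;

%print_if_exists(sashelp.class)
```

Because %SYSFUNC resolves to 0 or 1, it can sit directly inside the %IF condition without an intermediate macro variable.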

Moving beyond these basic examples, the next section delves into one of the most widespread uses of %SYSFUNC.

4.2 Capturing the Program Name

This example applies to batch programs only:

%let program_path = %sysfunc(getoption(sysin));

GETOPTION retrieves information about the current program or session. By specifying SYSIN as the argument, the function retrieves the complete path to the current program. (This function cannot return a path when you are using SAS interactively.) Consider this simple application:

title "Program:  %sysfunc(getoption(sysin))";

Now the first title line will contain the complete path to the program that produced the output. And if the program gets moved or copied, the TITLE statement doesn’t have to change. The next time the program runs, it will generate the path to the new program location. So with minimal effort, all your reports can automatically display the name of the source program.
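When the full path is too long for a title, a variation might keep only the file name. This sketch assumes the path uses slashes or backslashes as separators and contains no commas or parentheses; PROGRAM_NAME is a hypothetical macro variable name.

```sas
%let program_path = %sysfunc(getoption(sysin));

/* A negative count makes %SCAN read from the right, so -1 picks the */
/* last piece of the path: the file name.                            */
%let program_name = %scan(&program_path, -1, /\);

title "Program: &program_name";
```

In an interactive session SYSIN is empty, so &PROGRAM_NAME would be empty as well.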

Similar tools exist, including some that apply to the interactive use of SAS, but they are beyond the scope of this book. If this is an area of interest, consider:

•    The view SASHELP.VEXTFL contains information about known external files, with the variable XPATH holding the path to each file. Any program brought into the enhanced program editor is automatically tracked in this view.

•    In a Windows environment, the environment variables SAS_EXECFILEPATH and SAS_EXECFILENAME can be retrieved using %SYSGET:

%put %sysget(SAS_EXECFILEPATH);

With a little imagination, applications can expand beyond the realm of TITLE statements. Suppose a long program produces a ton of output. Usually, the analyst is interested in just the final PROC PRINT, but the earlier output must be available in case there are any questions about the final report. Splitting the output in two would make life easier for the analyst: leave most of the output in the .lst file, but move the final PROC PRINT report to a separate file. In theory, the software supports this:

proc printto print="some_other_file" new;

run;

proc print data=final_results;

run;

But the harder part is linking the new output file with the original program. %SYSFUNC makes that task easy. These statements capture the full path to the program and then remove the last three letters (presumably removing the letters "sas" while leaving in place the “.” at the end):

%let program_path = %sysfunc(getoption(sysin));

%let program_path = %substr(&program_path,1,%length(&program_path)-3);

Once the program has captured the program name and removed the letters "sas" from the end, redirect the final report to a matching file name:

proc printto print="&program_path.final_report" new;

run;

proc print data=final_results;

run;

The output includes the .lst file with all the earlier results, plus a new file holding the final PROC PRINT results. The name of that new file automatically matches the name of the program but with the extension .final_report. While Section 5.4 will embellish upon this technique, many objectives could benefit from creating output files that match the program name. Here are just a couple of examples:

•    A program could use its results to generate the next program that should run, saving the next program in a file with a matching name. Creating a separate program gives the analyst an opportunity to inspect and approve the results of the first program before running the subsequent program.

•    A program may process thousands of variables and save a list of variables that satisfy testing criteria. Saving that list in a separate file serves as documentation of the program results while making it easy to incorporate the list into subsequent programs.

4.3 Commas and Nested %SYSFUNCs

Even relatively simple applications can nest %SYSFUNCs. For example, consider an application that expresses an amount in the comma11 format and removes leading and trailing blanks. In a DATA step, the code might look like this:

without_blanks = compress( put(amount, comma11.) );

Of course, the DATA step could have issues with the preassigned length of WITHOUT_BLANKS, and it might add some trailing blanks. But the focus here is how to achieve a similar result using macro language. To illustrate the problem, temporarily split apart the nested functions. Here is one attempt that illustrates how %SYSFUNC must switch from PUT to PUTN:

%let without_blanks = %sysfunc(putn(&amount,comma11));

%let without_blanks = %sysfunc(compress(&without_blanks));

Presumably, the second statement resolves to something like:

%let without_blanks = %sysfunc(compress(    135,791));

Remember, the comma is a key symbol within the COMPRESS function, separating the string to compress and the characters to remove. So the macro language interprets this statement as saying: “Remove all instances of 1, 7, and 9 from the string 135.” (The combination of %SYSFUNC and COMPRESS automatically ignores the leading blanks in the first argument.) So the final value of &WITHOUT_BLANKS is 35. A single letter overcomes this problem:

%let without_blanks=%sysfunc(compress(%qsysfunc(putn(&amount,comma11))));

Switching to %QSYSFUNC quotes the results, turning the generated comma into text instead of a symbolic character. For more quoting examples, refer to Chapter 7.
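To see the difference side by side, the two versions can be run against the same amount. This is a small sketch; the value 135791 matches the example above, while the macro variable names UNQUOTED and QUOTED are assumptions for illustration.

```sas
%let amount = 135791;

/* Unquoted: the generated comma splits the value into two COMPRESS arguments. */
%let unquoted = %sysfunc(compress(%sysfunc(putn(&amount, comma11.))));

/* Quoted: %QSYSFUNC masks the comma, so COMPRESS sees a single argument. */
%let quoted = %sysfunc(compress(%qsysfunc(putn(&amount, comma11.))));

%put unquoted=&unquoted quoted=&quoted;
```

Following the reasoning above, the log should show 35 for the first value and 135,791 for the second.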

Applications that insert the current date into the title could face a similar issue. Date formats are centered, and the WORDDATE18 format would contain a comma. That comma would cause a similar problem, forcing a switch from %SYSFUNC to %QSYSFUNC.
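A sketch of such a title follows. It uses LEFT to shift the formatted date to the left, and the title wording is an assumption made for illustration.

```sas
/* %QSYSFUNC masks the comma inside a value such as February 15, 2014, */
/* so LEFT receives one argument instead of two.                        */
title "%sysfunc(left(%qsysfunc(today(), worddate18.))) Status Report";
```

Any trailing blanks that LEFT leaves behind are dropped when %SYSFUNC returns the result.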

Finally, remember that there are other ways to remove leading and trailing blanks. An extra statement could replace COMPRESS:

%let without_blanks = %sysfunc(putn(&amount,comma11));

%let without_blanks = &without_blanks;

Or switch to STRIP instead of COMPRESS:

%let without_blanks = %sysfunc(strip(%sysfunc(putn(&amount,comma11))));

In the first example, the %LET statement ignores leading and trailing blanks to the right of the equal sign, so the second %LET statement automatically removes any leading or trailing blanks. In the second example, STRIP does not support a second parameter (at least not in current releases of the software). So the comma does not cause a problem.

Sometimes the presence of extra blanks is cosmetic, such as within a TITLE statement. But sometimes the extra blanks cause errors. Chapter 5 explores cases where removing blanks is essential.

4.4 Achieving the Impossible, Revisited

Section 3.2 showed how the CALL EXECUTE statement can circumvent the need to define a macro just to permit %IF %THEN statements. %SYSFUNC provides an alternative method, by invoking the IFC function. (IFC returns a character value; its counterpart, IFN, returns a numeric value.) IFC requires three arguments (separated by commas):

•    A true/false comparison

•    The value to return when the condition is true

•    The value to return when the condition is false

In a DATA step, this code could be replaced:

if amount > 10000 then type = 'Large';

else type = 'Small';

The IFC function requires one long statement:

type = ifc (amount > 10000, 'Large', 'Small');

To shift over to a macro language application, consider a simplified version of some code from Section 3.2:

%if &city = Boston %then %do;

    proc means data=beantown;

       var pahk youh caah;

    run;

%end;

%else %do;

    proc means data=big_apple;

       var a nominal egg;

    run;

%end;

The %IF %THEN statements cannot appear in open code. Although it would be clumsy, %SYSFUNC can circumvent the requirement of defining a macro:

%let which_one = %sysfunc( ifc( &city = Boston, 

                     %str(proc means data=beantown;

                             var pahk youh caah;

                           run;),

                      %str(proc means data=big_apple;

                              var a nominal egg;

                           run;) ));

&which_one

%SYSFUNC lets macro language invoke the IFC function (the character-returning counterpart of IFN), determining which set of statements gets assigned to &WHICH_ONE. With no %IF %THEN statements, this approach works outside of a macro to generate the PROC MEANS that matches the value of &CITY.
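The same idea is easier to read when the returned text is short. The sketch below uses the character-returning IFC function to choose only the data set name; DSN is a hypothetical macro variable, and the data set names are borrowed from the example above.

```sas
%let city = Boston;

/* IFC returns its second argument when the comparison is true */
/* and its third argument when the comparison is false.        */
%let dsn = %sysfunc(ifc(&city = Boston, beantown, big_apple));

%put NOTE: Selected data set is &dsn;
```

With &CITY set to Boston, &DSN holds beantown; with any other value it holds big_apple. The result could then feed a single PROC MEANS step.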

4.5 Capturing Option Settings

Turning on all of these options generates an overwhelming amount of feedback:

options mprint mlogic symbolgen;

And yet, these options are useful temporarily when a section of a macro requires debugging. It is certainly easy to turn them off again:

options nomprint nomlogic nosymbolgen;

However, turning them off again may be the wrong action. Rather, a more complex set of steps might be better:

•    Capture the current settings for these options.

•    Turn all the options on, just before the troublesome section of macro code.

•    Just after the troublesome section, set all the options back to their original settings, rather than turning them all off.

%SYSFUNC makes it easy to capture the current settings:

%let original_settings = %sysfunc(getoption(mprint))

                                        %sysfunc(getoption(mlogic))

                                        %sysfunc(getoption(symbolgen));

Turn on all options before the troublesome section and then reset them after:

options mprint mlogic symbolgen;

%* Troublesome section of code;

options &original_settings;

Depending on the macros in effect, it may be necessary to define &ORIGINAL_SETTINGS with a %GLOBAL statement. Chapter 8 explores %GLOBAL and %LOCAL issues. Also note that PROC OPTSAVE will save all current options settings, enabling PROC OPTLOAD to restore them at a later point.
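For comparison, a minimal sketch of the PROC OPTSAVE approach follows. The data set name WORK.SAVED_OPTIONS is an assumption for illustration. Because these are procedure steps, this version only fits when the troublesome section falls between steps rather than in the middle of one.

```sas
/* Save every current option setting to a data set. */
proc optsave out=work.saved_options;
run;

options mprint mlogic symbolgen;

%* Troublesome section of code;

/* Restore all settings as they were when saved. */
proc optload data=work.saved_options;
run;
```

The %SYSFUNC version restores just the three options of interest and can run anywhere a global statement is allowed, which is often the simpler choice inside a macro.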

4.6 Efficiency Considerations

Some function calls generate the same result on every observation:

current_date = today();

next_month = intnx("month", today(), +1);

Instead of calling the functions on every observation, better technique would call them once:

if _n_=1 then do;

   current_date = today();

   next_month = intnx("month", today(), +1);

end;

retain current_date next_month;

Still, this approach must check _n_=1 for each observation. Although the cost is small, %SYSFUNC can eliminate it:

retain current_date %sysfunc(today());

retain next_month %sysfunc(intnx(month, %sysfunc(today()), +1));

In fact, this approach still calls the TODAY() function twice. Macro language could reduce that to once, with a little more code:

%let today = %sysfunc(today());

retain current_date &today;

retain next_month %sysfunc(intnx(month, &today, +1));

Is it worth the extra effort? You decide.
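To help with that decision, here is how the pieces might fit into a complete DATA step. This is a sketch only: the input SASHELP.CLASS, the output name WORK.CLASS_DATED, and the DATE9. format are assumptions added for illustration.

```sas
/* Compute the current date once, before the DATA step compiles. */
%let today = %sysfunc(today());

data work.class_dated;   /* hypothetical output data set name */
    set sashelp.class;
    /* RETAIN with initial values: no function call runs per observation. */
    retain current_date &today
           next_month   %sysfunc(intnx(month, &today, 1));
    format current_date next_month date9.;
run;
```

By the time the DATA step compiles, both %SYSFUNC calls have already resolved to plain numbers, so the step contains only constants.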

4.7 A Final Example: ZIP Codes

In this final %SYSFUNC application, a macro runs PROC FREQ on each of a series of data sets:

proc freq data=&dsn;

     tables zipcode;

     title "Zip Codes for &dsn";

run;

However, ZIPCODE is character in some data sets and numeric in others. Whenever ZIPCODE is numeric, the macro must detect that fact and add a FORMAT statement, as in this PROC FREQ step:

proc freq data=next;

     tables zipcode;

     title "Zip Codes for Next Data Set";

     format zipcode z5.;

run;

%SYSFUNC simplifies the task. These statements could appear between the TITLE and RUN statements:

%let dataset_id = %sysfunc(open(&dsn));

%let var_id = %sysfunc(varnum(&dataset_id, zipcode));

%let zip_type = %sysfunc(vartype(&dataset_id, &var_id));

%let rc = %sysfunc(close(&dataset_id));

%if &zip_type=N %then %do;

    format zipcode z5.;

%end;

In combination, these statements assign &ZIP_TYPE a value of N or C, and they add the FORMAT statement if needed. More specifically, the first statement assigns &DATASET_ID a number that can be used to identify &DSN. The second statement assigns &VAR_ID a number that can be used to identify ZIPCODE within &DSN. The third statement identifies whether ZIPCODE is character or numeric. And the fourth statement closes &DSN.
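Assembled into one macro, the pieces might look like the sketch below. The macro name ZIP_FREQ, the %LOCAL statement, and the checks for a failed OPEN or a missing ZIPCODE variable are assumptions added for illustration; they are not part of the original statements.

```sas
%macro zip_freq(dsn);
    %local dataset_id var_id zip_type rc;
    %let zip_type = ;

    /* OPEN returns 0 when the data set cannot be opened. */
    %let dataset_id = %sysfunc(open(&dsn));
    %if &dataset_id > 0 %then %do;
        /* VARNUM returns 0 when ZIPCODE is not in the data set. */
        %let var_id = %sysfunc(varnum(&dataset_id, zipcode));
        %if &var_id > 0 %then
            %let zip_type = %sysfunc(vartype(&dataset_id, &var_id));
        %let rc = %sysfunc(close(&dataset_id));
    %end;

    proc freq data=&dsn;
        tables zipcode;
        title "Zip Codes for &dsn";
        %if &zip_type = N %then %do;
            format zipcode z5.;
        %end;
    run;
%mend zip_freq;
```

Here the lookup runs before PROC FREQ begins rather than between the TITLE and RUN statements. Either placement works, because the macro statements execute while the step is still being assembled.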

Alternatives exist. For example, DICTIONARY.COLUMNS contains information about every variable within every SAS data set. But there are advantages to a purely macro-based solution. Section 10.4 will explore those advantages in more detail. Finally, note that more powerful magic can make the macro language disappear entirely from this application. Switch to this FORMAT statement, and all the macro language is unnecessary:

format _numeric_ z5.;

When ZIPCODE is numeric, the format applies. But when it is character, the FORMAT statement does nothing … no impact but no harm.

4.7.1 Programming Challenge #3

What about more complex situations:

•    A TABLES statement lists additional numeric variables that should not use a Z5 format.

•    ZIPCODE is still numeric in some data sets and character in others.

How could a FORMAT statement apply to a numeric ZIPCODE but ignore a character ZIPCODE as well as every other numeric variable? Your clue for this problem is that it is possible … a simple FORMAT statement can do the trick.

4.7.2 Solution

Once again, ingenuity makes macro language disappear. Consider these variable lists:

dog -- cat              all variables from DOG through CAT

dog-numeric-cat         all numeric variables from DOG through CAT

The second list works even when DOG and CAT are both character. In fact, all variables in the range might be character so that the list refers to zero variables. This FORMAT statement could apply to no variables in that case:

format dog-numeric-cat z5.;

That brings us back to the situation where ZIPCODE might be character or numeric:

format zipcode-numeric-zipcode z5.;

The FORMAT applies to all numeric variables from ZIPCODE through ZIPCODE. If ZIPCODE is character, then the list is empty, and ZIPCODE remains unaffected by the FORMAT statement.
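One way to try this out is to build two small test data sets, one with a numeric ZIPCODE and one with a character ZIPCODE, and run the same step against both. The data set names ZIPS_NUM and ZIPS_CHAR and the variable OTHER are hypothetical, created only for this sketch.

```sas
/* Numeric ZIPCODE plus another numeric variable that should stay unformatted. */
data zips_num;
    zipcode = 2134;
    other = 7;
run;

/* Character ZIPCODE with the same extra numeric variable. */
data zips_char;
    zipcode = '02134';
    other = 7;
run;

proc freq data=zips_num;
    tables zipcode other;
    format zipcode-numeric-zipcode z5.;
run;

proc freq data=zips_char;
    tables zipcode other;
    format zipcode-numeric-zipcode z5.;
run;
```

If the list behaves as described above, the first report should show ZIPCODE with its leading zero, and OTHER should appear unformatted in both reports.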

By its nature, %SYSFUNC adds a layer of complexity. Always consider alternatives. Could a DATA step do the job? Is macro language even necessary? Most SAS applications give you a choice.
