Creating a General Purpose Macro to Count Missing and Nonmissing Values for Both Numeric and Character Variables

When you omit the VAR statement (or use _NUMERIC_ in place of the variable list) with PROC MEANS, all numeric variables are listed. With PROC FREQ, you can use _CHARACTER_ in place of a variable list in the TABLES statement. However, when you use PROC TABULATE, you need to provide a list of variable names in the CLASS or VAR statement. Yes, it would be nice to use PROC TABULATE to provide a summary of missing values for all the variables in a SAS data set (and minimum and maximum values for the numeric variables) without having to “hard code” all of the variable names. A novel way to accomplish this task is to use PROC SQL to create two macro variables: one, a list of numeric variables in the data set; the other, a list of character variables. Program 3-8 uses this method, along with PROC TABULATE, to list the number of missing and nonmissing values for all the character and numeric variables in a data set.

Program 3-8. Writing a Macro to Count the Number of Missing and Nonmissing Observations for All Numeric and Character Variables in a Data Set
*-----------------------------------------------------------------*
| Program Name: AUTOMISS.SAS  in C:CLEANING                      |
| Purpose: Macro to list the number of missing and nonmissing     |
|          variables in a SAS data set                            |
| Arguments: DSNAME =  SAS data set name (one- or two-level)      |
| Example: %AUTOMISS(CLEAN.PATIENTS)                              |
*-----------------------------------------------------------------*;


%MACRO AUTOMISS(DSNAME);


   %***One-level data set name;
   %IF %INDEX(&DSNAME,.) = 0 %THEN %DO;  1
      %LET LIB = WORK;
      %LET DSN = %UPCASE(&DSNAME);
   %END;


   %***Two-level data set name;
   %ELSE %DO;  2
      %LET LIB = %UPCASE(%SCAN(&DSNAME,1,"."));
      %LET DSN = %UPCASE(%SCAN(&DSNAME,2,"."));
   %END;


   %*Note: it is important for the libname and data set name to
     be in uppercase;


   %* Initialize macro variables to null;
   %LET NVARLIST=;
   %LET CVARLIST=;


   TITLE1 "Number of Missing and Nonmissing Values from &DSNAME";
   %* Get list of numeric variables;
   PROC SQL NOPRINT;
      SELECT NAME INTO :NVARLIST SEPARATED BY " "  3
      FROM DICTIONARY.COLUMNS  4
      WHERE LIBNAME = "&LIB" AND MEMNAME = "&DSN" AND TYPE = "num"; 5

   %* Get list of character variables;
   SELECT NAME INTO :CVARLIST SEPARATED BY " "
      FROM DICTIONARY.COLUMNS
      WHERE LIBNAME = "&LIB" AND MEMNAME = "&DSN" AND TYPE = "char"; 6
   QUIT;

   PROC FORMAT; 7
      VALUE $MISSCH " " = "Missing"
                    OTHER = "Nonmissing";
   RUN;

   PROC TABULATE DATA=&LIB..&DSN MISSING FORMAT=8.;  8

      %* If there are any numeric variables, do the following;
      %IF &NVARLIST NE %THEN %DO;
         VAR &NVARLIST;
         TITLE2 "for Numeric Variables";
         TABLE &NVARLIST,
            N NMISS MIN MAX / RTSPACE=26;
      %END;


      %* If there are any character variables, do the following;
      %IF &CVARLIST NE %THEN %DO;
         CLASS &CVARLIST;
         TITLE2 "for Character Variables";
         TABLE &CVARLIST,
            N / RTSPACE=26;
         FORMAT &CVARLIST $MISSCH.;
       %END;


      KEYLABEL N                           = "Number"
               NMISS                       = "Number Missing"
               MIN                         = "Lowest Value"
               MAX                         = "Highest Value";
   RUN;


%MEND AUTOMISS;

The macro starts with a test to see if the data set name (the calling argument) is a one- or two-level name. The macro function %INDEX returns a 0 if there is no period in the data set name. In this case, the macro variable LIB is set equal to WORK, and the macro variable DSN is set equal to the data set name. If the data set name contains a period, the %INDEX function returns a number greater than 0 and the two macro variables LIB and DSN are set equal to the libname and data set name, respectively . It is important that the libname and data set names be in uppercase, so the %UPCASE function is used. The SELECT statement is used to place the variable name (NAME) in a macro variable called NVARLIST. This is accomplished by the terms INTO :NVARLIST. The list of names produced is separated by spaces, as indicated in the SELECT statement. The keyword DICTIONARY. COLUMNS returns the list of variables. Finally, to obtain the numeric and character lists separately, TYPE=“num” or TYPE=“char” is added to the WHERE statements , . When the two SELECT statements are processed, the macro variable NVARLIST will contain a list of all the numeric variables, separated by spaces and the macro variable CVARLIST will contain a list of all the character variables in the data set specified by the calling arguments.

The $MISSCH format is the same as we used in Program 3-7. Finally, the PROC TABULATE statements are identical to the statements in the previous programs where the variable lists were “hard coded,” except the macro variables are substituted for the explicit list of variables.

Calling this macro with the data set name CLEAN.PATIENTS as the argument, %AUTOMISS(CLEAN.PATIENTS), produced the following output:

The PROC MEANS and PROC FREQ methods for listing missing values are certainly easier than the complicated program you were shown before. You may prefer the appearance of the PROC TABULATE output better, or you may just be interested in the programming techniques that were used.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.12.123.183