Using PROC MEANS and PROC FREQ to Count Missing Values

Several procedures will count missing values for you. Missing values may be normal for certain variables in your data set, while other variables (such as a patient ID) must never be missing. An easy way to count missing values for numeric variables is to use PROC MEANS; for character variables, PROC FREQ will provide this information. Program 3-1 is a simple program that checks the number of numeric and character missing values in the PATIENTS data set.

Program 3-1. Counting Missing and Nonmissing Values for Numeric and Character Variables
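LIBNAME CLEAN "C:\CLEANING";

TITLE "Missing Value Check for the PATIENTS Data Set";

/* Numeric variables: count nonmissing (N) and missing (NMISS) values */
PROC MEANS DATA=CLEAN.PATIENTS N NMISS;
RUN;

/* Format that collapses every character value into two groups */
PROC FORMAT;
   VALUE $MISSCNT ' '   = 'MISSING'
                  OTHER = 'NONMISSING';
RUN;

/* Character variables: one MISSING/NONMISSING table per variable */
PROC FREQ DATA=CLEAN.PATIENTS;
   TABLES _CHARACTER_ / NOCUM MISSING;
   FORMAT _CHARACTER_ $MISSCNT.;
RUN;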

The check for numeric missing values is straightforward. By using the N and NMISS options with PROC MEANS, you get a count of the nonmissing and missing values for all your numeric variables (the default if no VAR statement is included). You could also choose to use a VAR statement to list only the variables of interest.
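
As a minimal sketch of the VAR statement variant, the step below limits the check to three numeric variables. The names HR, SBP, and DBP are taken from the PATIENTS output; the step assumes the CLEAN libref from Program 3-1 is already assigned.

```sas
/* Sketch: count missing values for selected numeric variables only. */
/* Assumes the CLEAN libref from Program 3-1 is already assigned.    */
PROC MEANS DATA=CLEAN.PATIENTS N NMISS;
   VAR HR SBP DBP;   /* only these variables appear in the report */
RUN;
```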

Counting missing values for character variables takes an extra step. You do not simply want to create one-way frequencies for all the character variables, because some of them, such as patient ID (PATNO), can conceivably have thousands of distinct values. By creating a character format that has only two value ranges, one for missing and the other for everything else, you can have PROC FREQ count missing and nonmissing values for you.
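
To see how the two-range format groups values, the following sketch applies it to a few made-up codes with the PUT function and writes the results to the SAS log. The variable names CODE and GROUP and the sample values are hypothetical, chosen only for illustration; they are not part of the PATIENTS data set.

```sas
/* Same format definition as in Program 3-1 */
PROC FORMAT;
   VALUE $MISSCNT ' '   = 'MISSING'
                  OTHER = 'NONMISSING';
RUN;

/* Hypothetical values, used only to show how the format groups them */
DATA _NULL_;
   LENGTH CODE $ 3;
   DO CODE = ' ', 'X13', '7';
      GROUP = PUT(CODE, $MISSCNT.);   /* formatted value of CODE */
      PUT CODE= GROUP=;               /* write both values to the log */
   END;
RUN;
```

PROC FREQ groups observations by formatted value in the same way, which is why each table in the output has at most two rows.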

Notice also that it is necessary to use the SAS keyword _CHARACTER_ in the TABLES statement (or to provide a list of character variables). PROC FREQ can produce frequency tables for numeric as well as character variables; if you omitted the TABLES statement, it would produce a table for every variable in the data set, numeric ones included. The TABLES option NOCUM suppresses the cumulative frequency and cumulative percent columns. Finally, the TABLES option MISSING includes the missing values in the body of the frequency listing. Examination of the listing from these two procedures is a good first step in your investigation of missing values. The output from Program 3-1 is shown next.

Missing Value Check for the PATIENTS Data Set

The MEANS Procedure

                                                  N
Variable    Label                        N     Miss
---------------------------------------------------
VISIT       Visit Date                  24        7
HR          Heart Rate                  28        3
SBP         Systolic Blood Pressure     27        4
DBP         Diastolic Blood Pressure    28        3
---------------------------------------------------

Missing Value Check for the PATIENTS Data Set

The FREQ Procedure

           Patient Number

PATNO         Frequency      Percent
------------------------------------
MISSING              1         3.23
NONMISSING          30        96.77


              Gender

GENDER        Frequency      Percent
------------------------------------
MISSING              1         3.23
NONMISSING          30        96.77


           Diagnosis Code

DX            Frequency      Percent
------------------------------------
MISSING              8        25.81
NONMISSING          23        74.19


           Adverse Event?

AE            Frequency      Percent
------------------------------------
MISSING              1         3.23
NONMISSING          30        96.77
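
As an alternative to the _CHARACTER_ keyword, the character variables can be named explicitly. The sketch below uses the four variables that appear in the listing above (PATNO, GENDER, DX, and AE) and assumes that the CLEAN libref and the $MISSCNT format from Program 3-1 are already in effect.

```sas
/* Sketch: name the character variables instead of using _CHARACTER_. */
/* Assumes the CLEAN libref and the $MISSCNT format from Program 3-1. */
PROC FREQ DATA=CLEAN.PATIENTS;
   TABLES PATNO GENDER DX AE / NOCUM MISSING;
   FORMAT PATNO GENDER DX AE $MISSCNT.;
RUN;
```

An explicit list is useful when only some character variables need checking, at the cost of having to update the list whenever the data set changes.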

