Dealing Practically with Missing Data

Missing Data Steps

As discussed in Chapter 4, you first need to diagnose missing data and then deal with it. Your first task is to identify observations with too much missing data; your second is to assess each variable for missing data.
SAS has various programs that can assess the issue. As we saw in the “Code07a” file, the standard descriptive statistics modules report missing data per variable (as do the frequencies modules).
In addition, I have written a SAS program that automatically analyzes both observation-level and variable-level missing data. Open and run “Code09a Gregs missing data analysis,” which is based on the dataset “Data01_Initial,” to see the result.

Assessing Missing Data in Observations

Having run my SAS missing data analysis, you will see observation-level output, sorted from the highest percentage of missing data downward. The left side of Figure 9.2 shows this observation-level output.
Figure 9.2 My missing data analysis program
Clearly, observations 6, 40 and 59 may be serious problems, with 67% missing data each. You would want to look at the original data for these observations. If the major focus variables such as sales are missing, or too many other variables are missing, you might favor deleting these observations.
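The observation-level calculation described above can be illustrated with a minimal sketch. It is written in Python rather than SAS, it is not the Code09a program, and the toy rows and column names are assumptions made up for illustration (they are not taken from Data01_Initial). For each observation it counts the missing values, converts the count to a percentage of all variables, and sorts from the highest percentage downward.

```python
# Hypothetical toy data: None marks a missing value. The column names are
# illustrative only and are not taken from Data01_Initial.
rows = [
    {"Sales": 120.0, "Trust01": 4, "Satisfaction01": 5},
    {"Sales": None, "Trust01": None, "Satisfaction01": 3},
    {"Sales": 95.0, "Trust01": None, "Satisfaction01": None},
    {"Sales": 110.0, "Trust01": 5, "Satisfaction01": 4},
]


def observation_missing_report(rows):
    """Return (observation number, percent missing) pairs, highest first."""
    report = []
    for number, row in enumerate(rows, start=1):
        n_missing = sum(1 for value in row.values() if value is None)
        pct_missing = 100.0 * n_missing / len(row)
        report.append((number, round(pct_missing, 1)))
    # Sort by percent missing, descending; ties keep observation order.
    report.sort(key=lambda pair: pair[1], reverse=True)
    return report


obs_report = observation_missing_report(rows)
for number, pct in obs_report:
    print(f"Observation {number}: {pct}% missing")
```

Observations at the top of such a list are the ones to inspect in the original data before deciding whether to delete them.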

Assessing Missing Data in Variables

Second, you need to analyze the variables (columns) for missing data. The second part of my SAS missing data program gives the variable-level analysis. As can be seen there, the only missing data is in the survey variables (trust and satisfaction items), because the other data comes from databases that gather data through more automated systems. The right side of Figure 9.2 shows the number and percentage missing within each variable, sorted from the highest number. You will note that Satisfaction01 is missing for 25 observations, which is 9% of the data, and so on.
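The variable-level counts work the same way, only by column instead of by row. The following is a minimal Python sketch under the same caveats: it is not the SAS program, and the toy rows and column names are hypothetical. It reports the number and percentage of missing values for each variable, sorted from the highest number.

```python
# Hypothetical toy data: None marks a missing value. The column names are
# illustrative only and are not taken from Data01_Initial.
survey_rows = [
    {"Sales": 120.0, "Trust01": 4, "Satisfaction01": 5},
    {"Sales": 98.5, "Trust01": None, "Satisfaction01": None},
    {"Sales": 101.0, "Trust01": 3, "Satisfaction01": None},
    {"Sales": 87.0, "Trust01": 5, "Satisfaction01": 4},
    {"Sales": 110.0, "Trust01": 4, "Satisfaction01": 2},
]


def variable_missing_report(rows):
    """Return (variable, n missing, percent missing) tuples, highest count first."""
    variables = list(rows[0].keys())
    report = []
    for name in variables:
        n_missing = sum(1 for row in rows if row.get(name) is None)
        pct_missing = round(100.0 * n_missing / len(rows), 1)
        report.append((name, n_missing, pct_missing))
    # Sort by the number missing, descending.
    report.sort(key=lambda item: item[1], reverse=True)
    return report


var_report = variable_missing_report(survey_rows)
for name, n_missing, pct in var_report:
    print(f"{name}: {n_missing} missing ({pct}%)")
```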
The following assessments and steps would then apply:
  1. No variable has excessive missing data. The greatest is 9%, which is not especially high. Also, because the variables with the most missing data are part of multi-item scales, we may be able to deal with the missing data in that context.
  2. Again, more complex techniques for addressing missing data exist, such as multiple imputation and full information maximum likelihood (FIML). SAS has very good multiple imputation and FIML options, but learning these techniques is somewhat beyond the scope of this course.
  3. The following simpler options might be considered:
    1. Simple imputation: Some researchers might be tempted to replace the missing data with a central score (e.g., replace missing data in Satisfaction01 with the average score for that variable). When the variable stands alone (it is not part of a multi-item scale) and there is not too much missing data, you might consider this. However, replacing individual items in a multi-item scale may not be necessary.
    2. The multi-item variables: It is probably acceptable to aggregate multi-item scales using averages, which would smooth over some of the missing data. We need to follow all the multi-item scale steps before making this decision. These steps are discussed below.
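The two simpler options above can be sketched in a few lines. This is an illustrative Python sketch, not SAS code and not a recommended procedure. The function names, the toy values, and especially the `min_items` threshold are assumptions introduced here; the text does not specify how many items must be present before a scale average is computed.

```python
from statistics import mean


def impute_with_mean(values):
    """Simple imputation: replace each missing value with the variable's mean."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def scale_score(items, min_items=2):
    """Average the observed items of one respondent's multi-item scale.

    min_items is a hypothetical threshold (an assumption, not from the text):
    if fewer items are observed, the scale score itself is left missing.
    """
    observed = [v for v in items if v is not None]
    if len(observed) < min_items:
        return None
    return mean(observed)


# A stand-alone variable with one missing value (toy numbers).
imputed = impute_with_mean([4, None, 2])

# One respondent's answers on a three-item scale (toy numbers).
partial_score = scale_score([5, None, 4])
too_sparse = scale_score([None, None, 3])

print(imputed, partial_score, too_sparse)
```

Averaging only the observed items is what lets a scale score survive one missing item, which is why imputing individual items within a scale may not be necessary.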
Last updated: April 18, 2017