Using the LSMEANS Statement to Analyze Data from Unbalanced Designs

As discussed in Chapter 9, an experimental design is balanced if the same number of observations (participants) appears in each cell of the design. For example, Figure 10.7 (presented earlier in this chapter) illustrates the research design used in the investment model study. It shows that there are five participants in each cell of the design. When a research design is balanced, it is generally appropriate to use the MEANS statement with PROC GLM to request group means, multiple comparison procedures, and confidence intervals.

In contrast, a research design is said to be unbalanced if some cells in the design contain a different number of observations (participants) than other cells. For example, consider Figure 10.7 again, where the research design contains six cells. If there were 20 participants in one of the cells but just five participants in each of the remaining five cells, the research design would be unbalanced.
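You can check the definition above mechanically by counting the observations in each cell. The sketch below is a minimal Python illustration, not SAS code. It assumes a two-factor design with two levels of one factor and three of the other, giving six cells as in the example. The factor labels ("A1", "B1", and so on) and the function names `cell_counts` and `is_balanced` are hypothetical.

```python
from collections import Counter
from itertools import product


def cell_counts(records, levels_a, levels_b):
    """Count observations in every cell of a two-factor design.

    `records` is a list of (level_of_A, level_of_B) pairs, one per participant.
    Cells with no observations are reported with a count of 0.
    """
    counts = Counter(records)
    return {(a, b): counts[(a, b)] for a, b in product(levels_a, levels_b)}


def is_balanced(counts):
    """A design is balanced when every cell holds the same number of observations."""
    return len(set(counts.values())) == 1


LEVELS_A = ["A1", "A2"]          # hypothetical factor with 2 levels
LEVELS_B = ["B1", "B2", "B3"]    # hypothetical factor with 3 levels (2 x 3 = 6 cells)

# Balanced: five participants in each of the six cells.
balanced = [cell for cell in product(LEVELS_A, LEVELS_B) for _ in range(5)]

# Unbalanced: 15 extra participants in cell (A1, B1), giving 20 there and 5 elsewhere.
unbalanced = balanced + [("A1", "B1")] * 15

if __name__ == "__main__":
    print(is_balanced(cell_counts(balanced, LEVELS_A, LEVELS_B)))    # True
    print(is_balanced(cell_counts(unbalanced, LEVELS_A, LEVELS_B)))  # False
```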

When analyzing data from an unbalanced design, it is generally best not to use the MEANS statement. With unequal cell sizes, the MEANS statement reports marginal means computed from the raw observations. Each cell therefore contributes in proportion to its size, and a large cell can pull a marginal mean toward its own value. In this sense, those marginal means can be biased. It is generally preferable to use the LSMEANS statement instead, because it estimates the marginal means over a balanced population. In other words, LSMEANS estimates what the marginal means would be if you had equal cell sizes, giving every cell the same weight. As a result, the marginal means estimated by the LSMEANS statement are less likely to be biased.
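To see how cell sizes affect the marginal means, the following Python sketch compares two ways of averaging the same cell means. It assumes a full factorial model in which every cell contains data. Under that assumption, the least-squares marginal mean for a level is the unweighted average of its cell means. The marginal mean computed from the raw data instead weights each cell mean by its cell size. The cell means and cell sizes are invented for illustration, and the function names are hypothetical.

```python
from typing import List

# Hypothetical cell means for a 3 x 2 design (rows = levels of factor A,
# columns = levels of factor B). These numbers are invented for illustration.
CELL_MEANS: List[List[float]] = [
    [10.0, 20.0],  # A level 1
    [12.0, 22.0],  # A level 2
    [14.0, 24.0],  # A level 3
]
# Cell sizes: one cell has 20 participants and the other five cells have 5,
# which makes the design unbalanced.
CELL_N: List[List[int]] = [
    [20, 5],
    [5, 5],
    [5, 5],
]


def raw_row_means(means, counts):
    """Marginal means for the row factor as computed from the raw data.

    Pooling every observation in a row is equivalent to weighting each
    cell mean by its cell size, so large cells dominate.
    """
    result = []
    for row_means, row_counts in zip(means, counts):
        total = sum(m * n for m, n in zip(row_means, row_counts))
        result.append(total / sum(row_counts))
    return result


def ls_row_means(means):
    """LS-style marginal means for the row factor: every cell mean gets
    equal weight, as if all cells were the same size."""
    return [sum(row) / len(row) for row in means]


def transpose(matrix):
    """Swap rows and columns so the same functions give factor B margins."""
    return [list(col) for col in zip(*matrix)]


if __name__ == "__main__":
    print("Factor A, raw:", raw_row_means(CELL_MEANS, CELL_N))  # [12.0, 17.0, 19.0]
    print("Factor A, LS: ", ls_row_means(CELL_MEANS))           # [15.0, 17.0, 19.0]
    print("Factor B, raw:", raw_row_means(transpose(CELL_MEANS), transpose(CELL_N)))  # [11.0, 22.0]
    print("Factor B, LS: ", ls_row_means(transpose(CELL_MEANS)))                      # [12.0, 22.0]
```

With these numbers, the single 20-participant cell drags the raw marginal mean for A level 1 down to 12.0. The LS-style mean stays at 15.0, the value the margin would have if all six cells were equal in size. For levels whose cells are all the same size, the two approaches agree.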

Writing the LSMEANS Statements

The General Form

Below is the general form for the PROC step of a SAS program that uses the LSMEANS statement rather than the MEANS statement:

PROC GLM DATA = dataset-name;
   CLASS    predictorA  predictorB;
   MODEL    criterion-variable  =  predictorA predictorB
            predictorA*predictorB;
   LSMEANS  predictorA predictorB   predictorA*predictorB;
   LSMEANS  predictorA predictorB   / PDIFF  ADJUST=TUKEY  ALPHA=alpha-level;
RUN;

The preceding general form is very similar to the general form that used the MEANS statement earlier in this chapter. The primary difference is that the two MEANS statements are replaced with two LSMEANS statements. The first LSMEANS statement takes this form:

   LSMEANS  predictorA  predictorB  predictorA*predictorB;

You can see that this LSMEANS statement is identical to the earlier MEANS statement, except that “MEANS” is replaced with “LSMEANS.”

The second LSMEANS statement is a bit more complex:

   LSMEANS predictorA  predictorB  /  PDIFF  ADJUST=TUKEY
           ALPHA=alpha-level;

You can see that this second LSMEANS statement contains a forward slash, followed by several option keywords. Here is what the keywords request:

  • PDIFF requests that SAS print p values for significance tests related to the multiple comparison procedure. These p values tell you whether or not there are significant differences between the least-squares means for the different levels under the two predictor variables.

  • ADJUST=TUKEY requests a multiple comparison adjustment for the p values and confidence limits for the differences between the least-squares means. The adjustment is based on the Tukey HSD test; when cell sizes are unequal, SAS applies the Tukey-Kramer modification of that test. The adjustment can also be based on other multiple comparison procedures, for example ADJUST=BON (Bonferroni) or ADJUST=SCHEFFE.

  • ALPHA= specifies the significance level to be used for the multiple comparison procedure and the confidence level to be used with the confidence limits. Specifying ALPHA=0.05 requests that the significance level (alpha) be set at .05 for the Tukey tests. If you had wanted alpha set at .01, you would have used the option ALPHA=0.01, and if you had wanted alpha set at .10, you would have used the option ALPHA=0.1.
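PDIFF compares every pair of levels, so a predictor with k levels produces k(k - 1)/2 comparisons, and ADJUST=TUKEY then adjusts each of them. The Python sketch below only enumerates those pairwise differences between least-squares means. It does not compute the Tukey-adjusted p values, which require the studentized range distribution and an error variance estimate from the fitted model. The LS mean values and the function name are hypothetical.

```python
from itertools import combinations


def pairwise_differences(ls_means):
    """Return the difference between the LS means of every pair of levels.

    `ls_means` maps a level label to its least-squares mean. Each pair is
    listed once, in the order the levels appear in the mapping.
    """
    return {
        (first, second): ls_means[first] - ls_means[second]
        for first, second in combinations(ls_means, 2)
    }


# Hypothetical LS means for a predictor with three levels.
LS_MEANS = {"level 1": 15.0, "level 2": 17.0, "level 3": 19.0}

if __name__ == "__main__":
    diffs = pairwise_differences(LS_MEANS)
    for pair, diff in diffs.items():
        print(pair, diff)
    k = len(LS_MEANS)
    print(len(diffs) == k * (k - 1) // 2)  # True: 3 comparisons for 3 levels
```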

The Actual SAS Statements

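Below are the actual statements that you would include in a SAS program to request a factorial ANOVA using the LSMEANS statement rather than the MEANS statement. The following statements are appropriate to analyze data from the investment model study described in this chapter. The first PROC GLM step performs the factorial ANOVA. After the data are sorted by COSTGRP, the second PROC GLM step performs a separate one-way ANOVA on REWGRP within each level of COSTGRP. Notice that alpha is set at .05 for the Tukey tests.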

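PROC GLM DATA=D1;
   CLASS    REWGRP COSTGRP;
   MODEL    COMMIT = REWGRP COSTGRP REWGRP*COSTGRP;
   /* Least-squares means for both main effects and for each cell */
   LSMEANS  REWGRP COSTGRP REWGRP*COSTGRP;
   /* Tukey-adjusted pairwise comparisons of the main-effect LS means */
   LSMEANS  REWGRP COSTGRP / PDIFF ADJUST=TUKEY  ALPHA=0.05;
RUN;
QUIT;

/* BY-group processing requires the data to be sorted by the BY variable */
PROC SORT DATA=D1;
   BY COSTGRP;
RUN;

/* One-way ANOVA on REWGRP, run separately within each level of COSTGRP */
PROC GLM DATA=D1;
   CLASS REWGRP;
   MODEL COMMIT = REWGRP;
   LSMEANS REWGRP / PDIFF ADJUST=TUKEY  ALPHA=0.05;
   LSMEANS REWGRP;
   BY COSTGRP;
RUN;
QUIT;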
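The second PROC GLM step depends on PROC SORT because BY-group processing expects the data to be ordered by the BY variable. Python's `itertools.groupby` has the same requirement, since it only groups adjacent rows. The sketch below mimics the sort-then-BY pattern on a few invented records of the form (REWGRP, COSTGRP, COMMIT). It reports the mean COMMIT score for each REWGRP level within each COSTGRP level. Because the model inside each BY group contains only REWGRP, its least-squares means are simply these group means. The sketch does not reproduce the F tests or the Tukey comparisons, and the function name is hypothetical.

```python
from itertools import groupby
from operator import itemgetter
from statistics import mean

# Hypothetical records: (REWGRP, COSTGRP, COMMIT). Values are invented.
RECORDS = [
    (1, 1, 10.0), (1, 2, 8.0), (2, 1, 14.0),
    (2, 2, 11.0), (1, 1, 12.0), (2, 2, 13.0),
]


def rewgrp_means_by_costgrp(records):
    """Mean COMMIT for each REWGRP level, computed separately per COSTGRP level."""
    # Like PROC SORT ... BY COSTGRP: groupby only merges adjacent rows,
    # so the records must be sorted on the BY variable first.
    ordered = sorted(records, key=itemgetter(1))
    results = {}
    for costgrp, group in groupby(ordered, key=itemgetter(1)):
        rows = list(group)
        rew_levels = sorted({rewgrp for rewgrp, _, _ in rows})
        results[costgrp] = {
            level: mean(commit for rewgrp, _, commit in rows if rewgrp == level)
            for level in rew_levels
        }
    return results


if __name__ == "__main__":
    print(rewgrp_means_by_costgrp(RECORDS))
    # {1: {1: 11.0, 2: 14.0}, 2: {1: 8.0, 2: 12.0}}
```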

Output Produced by LSMEANS

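The output produced by the LSMEANS statements is very similar to the output produced by the MEANS statements. The main difference is that it reports least-squares means (estimates of the marginal means for a balanced population) rather than means computed directly from the raw observations.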

There are a few additional differences. For example, the MEANS statement prints means and standard deviations, while the LSMEANS statement prints least-squares means without standard deviations. For the most part, however, if you have read the sections of this chapter that show how to interpret the results of the MEANS statements, you should have little difficulty interpreting the results produced by LSMEANS. Because our example uses a balanced design, the means obtained with the MEANS and LSMEANS statements are identical and the conclusions are the same, so those results are not reproduced here. This discussion of when to use the LSMEANS statement, along with the accompanying code, is intended to give you an example to follow should you be faced with an unbalanced design.
