Chapter 5 Unbalanced Data Analysis: Basic Methods

5.1 Introduction

5.2 Applied Concepts of Analyzing Unbalanced Data

5.2.1 ANOVA for Unbalanced Data

5.2.2 Using the CONTRAST and ESTIMATE Statements with Unbalanced Data

5.2.3 The LSMEANS Statement

5.2.4 More on Comparing Means: Other Hypotheses and Types of Sums of Squares

5.3 Issues Associated with Empty Cells

5.3.1 The Effect of Empty Cells on Types of Sums of Squares

5.3.2 The Effect of Empty Cells on CONTRAST, ESTIMATE, and LSMEANS Results

5.4 Some Problems with Unbalanced Mixed-Model Data

5.5 Using the GLM Procedure to Analyze Unbalanced Mixed-Model Data

5.5.1 Approximate F-Statistics from ANOVA Mean Squares with Unbalanced Mixed-Model Data

5.5.2 Using the CONTRAST, ESTIMATE, and LSMEANS Statements in GLM with Unbalanced Mixed-Model Data

5.6 Using the MIXED Procedure to Analyze Unbalanced Mixed-Model Data

5.7 Using the GLM and MIXED Procedures to Analyze Mixed-Model Data with Empty Cells

5.8 Summary and Conclusions about Using the GLM and MIXED Procedures to Analyze Unbalanced Mixed-Model Data

5.1 Introduction

Most persons who have analyzed data have experienced “unbalanced data,” although the term is hard to precisely define. Usually, it refers to having different numbers of observations in different groups of data. For example, a data set contained final exam scores from a statistics class at a university. The class had 66 students from four colleges, but there were different numbers of students from the various colleges. The set of exam scores would make up an unbalanced one-way classification of data. However, the main problems associated with analyzing unbalanced data do not occur in a one-way classification. In the statistics class, there were both part-time and full-time students within each college, and only coincidentally were there the same number of each from a given college. Thus, the scores of final exams constitute a two-way classification of data with differing numbers of observations in the combinations of college and status. A combination of college and status makes a cell of data. With this terminology, we could say that unbalanced data refers to data sets with different numbers of observations in the cells. You might consider this situation an “observational study” because the numbers of students in the various groups were not controlled. The instructor would be faced with problems of analyzing unbalanced data if he or she wanted to compare mean exam scores of full-time with part-time students, averaged across colleges. The basic problem is to decide what weight to attach to the mean scores for each status group in each college.

In another example, a pharmaceutical company compared the effects of two drugs, A and B, on a clinical measurement called “flush.” The study utilized patients in 10 clinics. Multiple clinics were used in order to obtain representation of diverse patient populations. The original plan called for each clinic to assign the two drugs to 15 patients. However, there were not enough patients at the clinics, so all available patients were randomly assigned to the two drugs. This plan was largely followed, but a few patients abandoned the trial before completion, leaving unequal numbers of patients on the two drugs within some of the clinics. In addition, the number of available patients varied among the clinics, ranging from 3 to 28. Thus, even though this was a designed experiment, realities of the situation resulted in unbalanced data. A statistical comparison of mean flush between the two drugs would raise the question of how to assign weights to the individual means.

Analysis of unbalanced data received sporadic attention for several decades, but the attention intensified in the 1970s when computer programs such as PROC GLM became readily accessible. Most of the writing focused on the fixed-effects case, prompted in part by the different types of sums of squares in PROC GLM. Popular texts that discuss fixed-effects issues include Milliken and Johnson (1991), Hocking (1986), and Searle (1987). Analysis of unbalanced mixed-model data still contains many mysteries. The GLM procedure contains certain capabilities that are adaptations of fixed-effects computations, but there has been relatively little concrete description of how to use them. This prompted a re-evaluation of how to analyze mixed-model data in the late 1980s, and PROC MIXED implemented newer methodology based on generalized least squares and likelihood-based methods.

The purpose of this chapter is to illustrate methods that are available in PROC GLM and PROC MIXED for analyzing unbalanced data. In Sections 5.2 and 5.3 you will see the issues of analyzing fixed-effects unbalanced data presented on a conceptual level, using the clinical trial example described above. Then, in later sections, you will see ANOVA methods as well as generalized least squares and likelihood-based methods for the analysis of unbalanced mixed-model data.

5.2 Applied Concepts of Analyzing Unbalanced Data

The FLUSH measurements from the pharmaceutical study are recorded in a SAS data set named DRUGS. A portion of the data set is printed in Output 5.1. Variables in the data set include STUDY, TRT, PATIENT, FLUSH0, and FLUSH. The values of FLUSH0 were obtained prior to administration of the drugs, but are not used in the discussions in this chapter.

Output 5.1 Data Set DRUGS

Unbalanced Two-way Classification
 
OBS  STUDY   TRT   PATIENT  FLUSH0      FLUSH
 
1 42 A 201 50.5 70.3333
2 42 A 203 84.5 16.1429
3 42 B 202 33.5 28.3333
4 43 A 302 22.0 14.5000
5 43 A 305 23.0 25.5000
6 43 A 306 22.0 12.2500
7 43 A 307 13.0 3.1250
8 43 A 310 50.5 51.1250
9 43 A 313 57.0 49.2500
10 43 A 316 13.5 1.6250
11 43 A 317 36.5 29.5000
12 43 A 321 59.0 30.5000
13 43 A 322 30.5 33.5000
14 43 A 323 10.5 2.2500
15 43 A 325 37.0 13.8750
16 43 A 327 35.5 21.0000
17 43 A 329 28.0 16.0000
18 43 B 301 40.5 17.5000
19 43 B 303 12.5 8.8333
20 43 B 304 47.5 40.0000
21 43 B 308 34.5 23.1429
22 43 B 309 15.5 3.1250
23 43 B 311 43.0 35.8750
24 43 B 314 30.0 31.6250
25 43 B 315 27.5 16.0000
26 43 B 318 62.0 41.1250
27 43 B 319 105.0 44.7500
28 43 B 324 38.5 43.1250
29 43 B 326 7.0 15.0000
30 43 B 328 8.0 4.5000
31 43 B 330 30.5 19.0000
32 44 A 401 46.0 14.8750
33 44 A 405 36.5 2.9231
34 44 A 406 22.5 2.8000
35 44 A 408 21.5 1.3750
36 44 A 409 27.0 22.0000
37 44 A 411 46.5  ⋅    
38 44 B 402 14.0 4.7778
39 44 B 403 23.0 3.6667
40 44 B 404 30.0 17.1250
41 44 B 407 19.0 22.3636
42 44 B 410 67.5 18.1667
43 44 B 412 12.0 2.0000
44 45 A 502 60.0 62.0000
45 45 A 503 36.0 13.6250
46 45 A 506 24.0 1.0000
47 45 A 507 29.0 24.1250
48 45 A 510 12.5 11.5000
49 45 A 512 82.5 84.0000
50 45 A 513 31.5 0.6250
51 45 A 515 53.0 45.0000
52 45 A 518 56.0 43.7500
53 45 A 519 23.0  7.3750
54 45 A 520 48.5  43.1429
55 45 A 527 16.0   ⋅    
56 45 B 501 34.0  30.0000
57 45 B 504 74.5  38.3750
58 45 B 505 22.0  25.3750
59 45 B 508 7.0  2.8750
60 45 B 509 13.0  8.1250
61 45 B 511 34.5  28.8750
62 45 B 514 20.5  22.6000
63 45 B 516 75.5  37.0000
64 45 B 517 50.0  59.1250
65 45 B 529 27.5  26.8000
66 45 B 530 49.0  33.0000
67 46 A 601 31.0  5.0000
68 46 A 602 53.0  20.8750
69 46 A 605 28.0  16.0000
70 46 A 608 21.5  7.5000
71 46 A 609 11.5  3.3750
72 46 A 611 59.0  35.6250
73 46 B 603 39.0  50.0000
74 46 B 604 65.0  43.0000
75 46 B 606 43.5  41.0000
76 46 B 607 25.0  8.5000
77 46 B 610 26.5  0.5000
78 46 B 629 27.5  15.5000
79 46 B 630 19.5  11.1250

Output 5.2 shows summary statistics for FLUSH for each combination of STUDY and TRT.
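The statements that produced this summary are not shown in the text; a minimal sketch that would produce statistics in this layout (assuming the default behavior of PROC MEANS with a CLASS statement) is

proc means data=drugs mean n std min max;
   class study trt;
   var flush;
run;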

Output 5.2 Summary Statistics for the FLUSH Data Set

Unbalanced Two-way Classification
 
Analysis Variable : FLUSH
 
 
STUDY  TRT  N Obs        Mean   N     Std Dev     Minimum      Maximum

 42     A      2  43.2381000   2  38.3183993  16.1429000   70.3333000
        B      1  28.3333000   1           .  28.3333000   28.3333000

 43     A     14  21.7142857  14  15.8805206   1.6250000   51.1250000
        B     14  24.5429429  14  14.6750757   3.1250000   44.7500000

 44     A      6   8.7946200   5   9.1762473   1.3750000   22.0000000
        B      6  11.3499667   6   8.8404391   2.0000000   22.3636000

 45     A     12  30.5584455  11  27.1736184   0.6250000   84.0000000
        B     11  28.3772727  11  14.9979268   2.8750000   59.1250000

 46     A      6  14.7291667   6  12.2625998   3.3750000   35.6250000
        B      7  24.2321429   7  19.8164006   0.5000000   50.0000000

 47     A      6  20.8777833   6   7.8619415   6.8750000   28.7500000
        B      7  49.0178571   7  30.9056384   4.3750000   92.1250000

 48     A      8  21.7857143   7  13.8723185   7.0000000   42.5000000
        B      8  22.5732500   8  14.7692870   3.8750000   49.2500000

 49     A     10  32.1554000  10  31.0770557   0.1250000   79.7500000
        B     10  27.3953000  10  24.3228740           0   59.6250000

 50     A      4   9.4687500   4   7.4486961   2.1250000   18.7500000
        B      3  71.8750000   3  53.6849898  32.3750000  133.0000000

There are different numbers of observations in the TRT-by-STUDY cells, meaning we have unbalanced data. Note that there is at least one observation for each combination of the factors.
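A quick way to display these cell sizes, for example (these statements are a suggestion, not part of the original program), is to cross-tabulate STUDY and TRT:

proc freq data=drugs;
   tables study*trt / norow nocol nopercent;
run;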

5.2.1 ANOVA for Unbalanced Data

The four types of sums of squares available in PROC GLM are designed to deal with unbalanced data. Run the statements

proc glm data=drugs;
   class study trt;
   model flush=trt study trt*study / ss1 ss2 ss3;
run;

Results appear in Output 5.3. Types I, II, and III are selected with the options ss1 ss2 ss3 in the MODEL statement. Type IV was not selected because Types III and IV are equal in situations such as this one, with at least one observation for each factor combination, that is, with no empty cells. Types I, II, and III have different values for TRT, but Types I and II are the same for STUDY, and all three types are the same for TRT*STUDY. The technical reasons why some of these sums of squares coincide and others differ are explained in Chapter 6.

The primary objective of the study is to compare TRT means. Before making comparisons of the drugs averaged over clinics, note that the TRT*STUDY interaction is significant at the p=.0178 level. This means that differences between drug A and drug B vary across clinics. In Output 5.2 you see that drug B has a larger mean than drug A for six of the nine clinics. If there is additional information about the clinics, you might want to investigate whether characteristics of the clinics can explain the interaction between STUDY and TRT. (Recall that in Section 3.7 METHODS were compared separately for each VARIETY due to the presence of METHOD*VARIETY interaction.) Depending on the situation, it may or may not be meaningful to compare the drugs averaged across clinics. For the present situation, we have no other information about the clinics. Also, the clinics were used to obtain representation of the drug differences over a set of clinics. Therefore, even in the face of TRT*STUDY interaction, it could be important and meaningful to compare the drugs averaged over clinics. This would be the case if, for example, you must choose one of the drugs to be used at all clinics.

Output 5.3 Three Types of ANOVA Tables for the FLUSH Data Set

Unbalanced Two-way Classification
 
The GLM Procedure
 
Dependent Variable: FLUSH        
 
   Sum of  
Source DF  Squares  Mean Square F Value Pr > F
 
Model 17  16618.75357 977.57374 2.24 0.0063
Error 114  49684.09084 435.82536
Corrected Total 131  66302.84440
 
R-Square Coeff Var Root MSE FLUSH Mean
 
0.250649 80.31125 20.87643 25.99440
 
Source DF Type I SS Mean Square F Value Pr > F
 
TRT 1 1134.560964 1134.560964 2.60 0.1094
STUDY 8 6971.606045 871.450756 2.00 0.0526
TRT*STUDY 8 8512.586561 1064.073320 2.44 0.0178
 
Source DF Type II SS Mean Square F Value Pr > F
 
TRT 1 1377.550724 1377.550724 3.16 0.0781
STUDY 8 6971.606045 871.450756 2.00 0.0526
TRT*STUDY 8 8512.586561 1064.073320 2.44 0.0178
 
Source DF Type III SS Mean Square F Value Pr > F
 
TRT 1 1843.572090 1843.572090 4.23 0.0420
STUDY 8 7081.377266 885.172158 2.03 0.0488
TRT*STUDY 8 8512.586561 1064.073320 2.44 0.0178

Next you must decide which of the three F-tests from the three types of sums of squares is most appropriate for comparing the drugs. The first consideration is to select a test statistic that tests the hypothesis you want to test. Of course, the hypothesis you want to test should have been prescribed at the planning stage of the study, not in the middle of data analysis.

Let μij denote the population mean for drug i and clinic j. If you are equally interested in each clinic, then a reasonable hypothesis to test is

H0: μA. = μB.

where

μA. = (1/9)(μA1 + μA2 + μA3 + μA4 + μA5 + μA6 + μA7 + μA8 + μA9)

and

μB. = (1/9)(μB1 + μB2 + μB3 + μB4 + μB5 + μB6 + μB7 + μB8 + μB9)

The F-test based on the Type III sum of squares in Output 5.3 gives a test of this hypothesis, which we will refer to as a “Type III hypothesis.” This means that if you test this hypothesis at the .05 level using the Type III F-test, the probability you will make a Type I error is exactly .05.

The Type III hypothesis is a statement of equality of the drug means, averaged across the clinics, with equal weight attached to each clinic. Other hypotheses could be formulated by attaching different weights to different clinics. Without overriding reasons for attaching different weights, the Type III hypothesis is often reasonable. However, other considerations enter into the decision. For example, the power of the Type III test can be very low if the sample sizes for some cells are small compared to the sample sizes of other cells.

5.2.2 Using the CONTRAST and ESTIMATE Statements with Unbalanced Data

If there are no empty cells you can use CONTRAST and ESTIMATE statements in the same way you used them with balanced data in Chapter 3. For example, you can test the significance of the difference between the drug means with the CONTRAST statement

contrast 'trtB-trtA' trt -1 1;

and you can estimate the difference between the drug means with the statement

estimate 'trtB-trtA' trt -1 1;

Results of the CONTRAST and ESTIMATE statements appear in Output 5.4.

Output 5.4 Results of the CONTRAST and ESTIMATE Statements

Unbalanced Two-way Classification
 
Contrast DF Contrast SS Mean Square F Value Pr > F
 
trtB-trtA 1 1843.572090 1843.572090 4.23 0.0420
 
  Standard  
Parameter Estimate Error  t Value  Pr > |t|
 
trtB-trtA 9.37497409 4.55823028 2.06 0.0420

The CONTRAST statement produces the same sum of squares, mean square, F-test, and p-value for the difference between drug means that you obtained from the Type III ANOVA F-test in Output 5.3. It is a test of H0: μA. = μB..

The difference between the estimates of μA. and μB. is 9.375, and the standard error of the estimate is 4.558. A t-statistic for testing H0: μA. = μB. is t=9.375/4.558=2.06. The p-value for the t-statistic is .0420, the same p-value you got from the Type III F-test in Output 5.3. The t-test from an ESTIMATE statement is equivalent to the Type III ANOVA F-test whenever the comparison has a single degree of freedom, as TRT does here (t2 = F).
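If you want to verify this arithmetic yourself, a small DATA step (a sketch, not part of the original program; the error degrees of freedom, 114, are taken from Output 5.3) reproduces the t-statistic and its two-sided p-value:

data check_t;
   est = 9.37497409;   /* estimated difference from Output 5.4      */
   se  = 4.55823028;   /* standard error of the estimate            */
   df  = 114;          /* error degrees of freedom from Output 5.3  */
   t   = est / se;
   p   = 2*(1 - probt(abs(t), df));
   put t= p=;
run;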

5.2.3 The LSMEANS Statement

You can calculate means from unbalanced data using the LSMEANS statement. With balanced data, the LSMEANS statement computes ordinary means. LSMEANS for the drugs are obtained from the statement

lsmeans trt / pdiff;

The PDIFF option is a request for t-tests to compare the LSMEANS. Results appear in Output 5.5.

Output 5.5 Results of the LSMEANS Statement

Unbalanced Two-way Classification
 
The GLM Procedure
Least Squares Means
 
  H0:LSMean1=
Standard   H0:LSMEAN=0 LSMean2 
TRT FLUSH LSMEAN Error Pr > |t| Pr > |t|
A 22.5913628   3.0141710 <.0001 0.0420
B 31.9663369 3.4193912 <.0001

The estimate of μA. is 22.591, with standard error 3.014, and the estimate of μB. is 31.966, with standard error 3.419. The p-value for comparing the means is .0420, which is the same as the p-value you got from the ESTIMATE statement in Output 5.4. In fact, the difference between the LSMEANS in Output 5.5 is the same as the estimated difference in Output 5.4. More information about LSMEANS is given in Chapter 6.
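If you also want confidence limits for the LS means and for their difference, the CL option can be added to the LSMEANS statement (a small extension, not shown in the original program):

lsmeans trt / pdiff cl;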

5.2.4 More on Comparing Means: Other Hypotheses and Types of Sums of Squares

In Section 5.2.2 you learned that the F-statistic based on the Type III sum of squares gives a test of the null hypothesis

H0: μA. = μB.

where

μA. = (1/9)(μA1 + μA2 + μA3 + μA4 + μA5 + μA6 + μA7 + μA8 + μA9)

and

μB. = (1/9)(μB1 + μB2 + μB3 + μB4 + μB5 + μB6 + μB7 + μB8 + μB9)

This null hypothesis states that the average of the drug A means is equal to the average of the drug B means, where the averages are computed across the clinics with equal weights for each clinic. In some circumstances you might want to compare averages of the drug means, but with different weights for the clinics. For example, the clinics might serve different patient populations, and you might want to weight the means proportional to the patient population sizes. Let wj, j = 1, … ,9, denote the relative population sizes, with w1 + … + w9= 1. Then the weighted hypothesis would be

H0: μ*A. = μ*B.             (5.2)

where

μ*A. = w1μA1 + w2μA2 + … + w9μA9

and

μ*B. = w1μB1 + w2μB2 + … + w9μB9

You could test this hypothesis with the CONTRAST statement. To illustrate, suppose the weights are .03, .20, .09, .17, .1, .1, .1, .14, and .07. The CONTRAST statement would be

contrast 'trtB-trtA wtd' trt -1 1
          trt*study -.03 -.20 -.09 -.17 -.1 -.1 -.1 -.14 -.07
                     .03  .20  .09  .17  .1  .1  .1  .14  .07;

Results appear in Output 5.6.

Output 5.6 Results of the CONTRAST Statement for Weighted Hypothesis

Unbalanced Two-way Classification
 
The GLM Procedure
 
Dependent Variable: FLUSH  
 
  Contrast DF   Contrast SS   Mean Square   F Value   Pr > F
 
  trtB-trtA wtd 1 1829.354286 1829.354286 4.20 0.0428

You see that the sum of squares for this CONTRAST statement is different from the sum of squares in Output 5.4 for the equally weighted hypothesis, although not by very much. You also see that the sum of squares for the weighted hypothesis in Output 5.6 is different from the TRT sum of squares for any of the ANOVA tables in Output 5.3. This illustrates that there are different values of sums of squares for TRT depending on the weights assigned to the means. Each type of sum of squares for TRT in Output 5.3 is associated with a certain set of weights. These are explained in detail in Chapter 6.

5.3 Issues Associated with Empty Cells

The data set discussed in Section 5.2 had at least one observation for each cell corresponding to combinations of TRT and STUDY. In the original data set there was another clinic, STUDY=41, that had patients only for drug B. So the cell for that clinic and drug A was empty. Empty cells create another layer of complications in analyzing unbalanced data. In this section we illustrate some of these issues using the original data set, which we call DRUGS1. The first several observations are printed in Output 5.7.

Output 5.7 Partial Printout of Data Set DRUGS1

Unbalanced Two-way Classification
 
OBS  STUDY   TRT   PATIENT  FLUSH0      FLUSH
 
1 41 B 102 77.5 72.0000
2 41 B 104 23.5 5.6250
3 41 B 105 63.5 81.8750
4 41 B 106 72.5 83.5000
5 41 B 107 58.0 75.5000
6 41 B 108 49.0 13.7500
7 41 B 109 7.5 9.3750
8 41 B 110 13.5 7.8750
9 41 B 111 13.5 6.0000
10 41 B 112 76.5 61.6000
11 41 B 113 78.5 98.1250
12 41 B 114 56.5 46.1250
13 41 B 115 61.0 24.2500
14 41 B 116 91.0 64.4000
15 41 B 117 13.5 7.3333
16 41 B 118 63.5 79.2500
17 42 A 201 50.5 70.3333
18 42 A 203 84.5 16.1429

You see that there are 16 observations for STUDY=41 and TRT=B, but no observations for TRT=A. Now we briefly review the effects of the empty cell on the analysis methods shown in Section 5.2.

5.3.1 The Effect of Empty Cells on Types of Sums of Squares

Empty cells create problems with ANOVA computations associated with difficulties in specifying meaningful hypotheses. Run the statements

proc glm data=drugs1;
   class study trt;
   model flush=trt study trt*study / ss1 ss2 ss3 ss4;
run;

Results appear in Output 5.8.

Output 5.8 Four Types of ANOVA Tables for Data Set with Empty Cell

Unbalanced Two-way Classification
 
The GLM Procedure
 
Dependent Variable: FLUSH  
 
  Sum of  
Source DF Squares  Mean Square F Value Pr > F
 
Model 18 22350.89135 1241.71619 2.38 0.0027
 
Error 129 67361.38451 522.18128    
 
Corrected Total 147 89712.27586      
 
R-Square Coeff Var Root MSE FLUSH Mean
 
0.249140 81.14483 22.85129 28.16111
 
  Source DF Type I SS Mean Square F Value Pr > F
 
  TRT 1 3065.96578 3065.96578 5.87 0.0168
  STUDY 9 10772.33900 1196.92656 2.29 0.0202
  TRT*STUDY 8 8512.58656 1064.07332 2.04 0.0468
 
  Source DF Type II SS Mean Square F Value Pr > F
 
  TRT 1 1377.55072 1377.55072 2.64 0.1068
  STUDY 9 10772.33900 1196.92656 2.29 0.0202
  TRT*STUDY 8 8512.58656 1064.07332 2.04 0.0468
 
  Source DF Type III SS Mean Square F Value Pr > F
 
  TRT 1 1843.57209 1843.57209 3.53 0.0625
  STUDY 9 10261.36525 1140.15169 2.18 0.0272
  TRT*STUDY 8 8512.58656 1064.07332 2.04 0.0468
 
  Source DF Type IV SS Mean Square F Value Pr > F
 
  TRT 1* 1843.572090 1843.572090 3.53 0.0625
  STUDY 9* 7462.538828 829.170981 1.59 0.1254
  TRT*STUDY 8 8512.586561 1064.073320 2.04 0.0468
 
* NOTE: Other Type IV Testable Hypotheses exist which may yield different SS.

Compare Output 5.8 with Output 5.3. Several of the sums of squares change when the data from STUDY=41 are included: the Type I sum of squares for TRT is different, and all of the sums of squares for STUDY are different (STUDY now has 9 degrees of freedom). Also, the Types III and IV sums of squares for STUDY are different from each other in Output 5.8. These differences illustrate the fact that the associated hypotheses are different. Details of the specific hypotheses are discussed in Chapter 6.

5.3.2 The Effect of Empty Cells on CONTRAST, ESTIMATE, and LSMEANS Results

Run the statements

estimate 'trtB-trtA' trt -1 1;
contrast 'trtB-trtA' trt -1 1;
lsmeans trt / stderr pdiff;
run;

Results appear in Output 5.9.

Output 5.9 Effects of Empty Cell on LSMEANS

Unbalanced Two-way Classification
 
The GLM Procedure
Least Squares Means
 
  Standard  
TRT FLUSH LSMEAN Error  Pr > |t|
 
A Non-est  ⋅         ⋅    
B 33.3733489 3.4166700 <.0001

No output is given from the CONTRAST and ESTIMATE statements because the underlying linear combinations of parameters are non-estimable. This message appears in the SAS log. The LSMEAN for TRT=A also is non-estimable because the empty cell was for drug A. Non-estimability is discussed in detail in Chapter 6.
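One way to see why the LSMEAN for drug A is non-estimable is to print the coefficients that PROC GLM uses for each LS mean. The E option on the LSMEANS statement does this (adding it here is a suggestion, not part of the original program); the coefficients for drug A involve the empty STUDY=41, TRT=A cell:

proc glm data=drugs1;
   class study trt;
   model flush = trt study trt*study;
   lsmeans trt / stderr pdiff e;   /* E prints the coefficients behind each LS mean */
run;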

5.4 Some Problems with Unbalanced Mixed-Model Data

In Chapter 4 you read about statistical analysis of data with random effects. The methods discussed there were in the setting of balanced data. The statistical issues concerned construction of F-tests and standard errors of estimates that take into account multiple sources of random variation in the data. Applications in Chapter 4 illustrated both analysis-of-variance methods using the GLM procedure, and mixed model methods using the MIXED procedure. In Sections 5.2 and 5.3 you read about statistical analysis of unbalanced data. The critical issues were constructing meaningful linear combinations of model parameters for estimation and hypothesis testing. In the present section we address problems of analyzing unbalanced data with random effects. We must identify statistical procedures that simultaneously define meaningful linear combinations of model parameters and account for multiple sources of random variation. As in the case of balanced data, we illustrate two approaches, analysis of variance using the GLM procedure, and mixed-model methodology using PROC MIXED.

We return to the clinical trial example of Section 5.1. Now we assume that the clinics were chosen from a population of clinics, and that the objective is to make inference about the drugs that is relevant to the entire population of clinics. Thus, we consider clinics to be a random factor. Ideally, the clinics would be chosen as a random sample from the population of clinics, but this is not realistic. Instead, we assume that the clinics in the data set reasonably represent the population of clinics as would a truly random sample of clinics. The statistical model is

yijk = μ + αi + bj + (αb)ij + eijk

where

yijk

is the FLUSH measurement on the kth patient assigned to drug i in clinic j.

μ + αi

is the mean FLUSH for drug i.

bj

is the random effect associated with clinic j.

(αb)ij

is the random interaction effect associated with drug i and clinic j.

eijk

is the random error associated with the kth patient assigned to drug i in clinic j.

We assume the bj random variables for clinics are normally and independently distributed with mean 0 and variance σSTUDY2 and the (αb)ij random variables for DRUG*CLINIC interaction are normally and independently distributed with mean 0 and variance σSTUDY*TRT2. Also, we assume the eijk random variables for patients are normally and independently distributed with mean 0 and variance σ2.
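One consequence of these assumptions, stated here for clarity (it is implied rather than written out in the text): the variance of a single observation is Var(yijk) = σSTUDY2 + σSTUDY*TRT2 + σ2, and measurements on two patients from the same clinic are correlated because they share the same bj (and also the same (αb)ij if the patients receive the same drug).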

This is the same model introduced in Chapter 4 for a two-way classification mixed model. The only distinction is that the number of observations in each clinic-drug cell may change from cell to cell. The objectives are the same as in the balanced case. This is an important point: The failure to obtain the same number of observations in each cell should not influence the objectives of the research.

There are basically two approaches to analyzing unbalanced mixed-model data—ANOVA and mixed-model methods. In the context of SAS/STAT procedures, ANOVA means using the GLM procedure, and mixed-model methods means using the MIXED procedure. You saw both approaches applied to balanced data in Chapter 4. Both approaches result in approximate methods for unbalanced mixed-model data. We illustrate ANOVA methods in Section 5.5 and mixed-model methods in Section 5.6.

5.5 Using the GLM Procedure to Analyze Unbalanced Mixed-Model Data

The term “ANOVA methods” refers to adapting analysis-of-variance computations for statistical inference with mixed-model data. The computations have their basis in comparing fixed-effects models, but have been found useful in comparing mixed models. However, there are some troublesome difficulties. First, it is not clear how to choose a mean square to measure the effect we want to test. Second, it is not clear how to choose a mean square for the denominator of the test. As in the balanced-data case, expected mean squares are used to determine appropriate denominators for F-tests, but the coefficients of the variance components in the available mean squares do not match the coefficients in the numerator. Third, the two mean squares are usually not independent, so the ratio does not have a true F-distribution. The same difficulties carry over to contrasts and to standard errors of linear combinations.

5.5.1 Approximate F-Statistics from ANOVA Mean Squares with Unbalanced Mixed-Model Data

Run the statements

proc glm data=drugs;
   class study trt;
   model flush=trt study trt*study / ss1 ss2 ss3;
run;

Results appear in Output 5.10.

Output 5.10 Three Types of ANOVA Tables for the FLUSH Data Set

Unbalanced Two-way Classification
 
The GLM Procedure
 
Dependent Variable: FLUSH          
  Sum of  
  Source DF Squares   Mean Square  F Value  Pr > F
 
  Model 17   16618.75357 977.57374 2.24 0.0063
 
  Error 114 49684.09084 435.82536
 
  Corrected Total 131 66302.84440
 
  Source DF Type I SS Mean Square F Value Pr > F
 
  TRT 1 1134.560964 1134.560964 2.60 0.1094
  STUDY 8 6971.606045 871.450756 2.00 0.0526
  TRT*STUDY 8 8512.586561 1064.073320 2.44 0.0178
 
  Source DF Type II SS Mean Square F Value Pr > F
 
  TRT 1 1377.550724 1377.550724 3.16 0.0781
  STUDY 8 6971.606045 871.450756 2.00 0.0526
  TRT*STUDY 8 8512.586561 1064.073320 2.44 0.0178
 
  Source DF Type III SS Mean Square F Value Pr > F
 
  TRT 1 1843.572090 1843.572090 4.23 0.0420
  STUDY 8 7081.377266 885.172158 2.03 0.0488
  TRT*STUDY 8 8512.586561 1064.073320 2.44 0.0178

The Types I, II, and III sums of squares are the same as in Output 5.3. The first task is to choose a mean square for the numerator of an F-statistic to test for the difference between drug means. The technical considerations in doing so are very different from those faced with fixed-effects models. Essentially, we want to select a mean square that measures the effect we want to test with the least amount of random variation. In general, it is not clear which of Types I, II, or III mean squares to use for this purpose. See Littell (1996) for details. Without further justification, we will use the Type III mean square for TRT as the numerator of an F-statistic. However, we return to this problem in Section 6.5.1.

The next task is to select a denominator for the test. While this choice is not totally clear, at least we have some useful criteria for the choice in the expected mean squares. You learned in Chapter 4 how to use the expected mean squares to select a mean square for the denominator of an F-statistic whose expectation matches the expectation of the numerator mean square under the null hypothesis. Run the RANDOM statement to obtain the expected mean squares:

random study trt*study / test;

Results appear in Output 5.11.

Output 5.11 Results of the RANDOM Statement

Unbalanced Two-way Classification
 
The GLM Procedure
 
Source Type III Expected Mean Square
 
TRT Var(Error) + 4.6613 Var(TRT*STUDY) + Q(TRT)
 
STUDY Var(Error) + 7.0585 Var(TRT*STUDY) + 14.117 Var(STUDY)
 
TRT*STUDY Var(Error) + 7.0585 Var(TRT*STUDY)
 
Tests of Hypotheses for Mixed Model Analysis of Variance
 
Dependent Variable: FLUSH  
 
  Source DF   Type III SS  Mean Square  F Value  Pr > F
 
  TRT 1 1843.572090 1843.572090 2.17 0.1674
 
  Error 11.689 9943.710652 850.710718
  Error: 0.6604*MS(TRT*STUDY) + 0.3396*MS(Error)

You see that the expected mean square for TRT is σ2 + 4.66 σSTUDY*TRT2 + ϕ2 (TRT). The Q option (see Chapter 4) could be used to discover that ϕ2(TRT)=20.97 (α1 – α2)2. Thus, under the null hypothesis H0: α1 – α2 = 0, the expected mean square for TRT is σ2 + 4.66 σSTUDY*TRT2. We want to obtain another mean square with this expectation to use as the denominator for the F-statistic. Unfortunately, none is directly available, so a combination of mean squares must be used. The TEST option in the RANDOM statement instructs GLM to compute such a combination, shown in Output 5.11 as 0.66*MS(STUDY*TRT) + 0.34*MS(Error), with Satterthwaite’s approximate DF=11.69. The approximate F-statistic has value F=2.17 and significance probability p=0.1674. Thus, the difference between drugs is less significant when making inference to the population of clinics instead of to the set of clinics in the data set.

You should remember that the significance probability for an ANOVA F-test is only approximate due to the complications of unbalanced data. The F-statistic does not have a true F-distribution for two reasons: One, the denominator is a linear combination of mean squares, but is not distributed as a constant-times-a-chi-squared random variable. Two, the numerator and denominator of the F-ratio are not independent. Nonetheless, statistics obtained in this manner are very useful and sometimes provide the only available means of statistical inference.

Expected mean squares also can be used to estimate variance components, as you learned in Chapter 4 with balanced data. To do this, equate the mean squares to their expectations and solve for the variance component estimates. These are called ANOVA variance component estimates. Here are the equations to solve to obtain the ANOVA estimates:

σ^2 + 7.06σ^STUDY*TRT2 + 14.12σ^STUDY2 =  885.17
σ^2 + 7.06 σ^STUDY*TRT2 = 1064.07
σ^2 =  435.83

The last equation gives σ^2 = 435.83. Next, substitute σ^2 = 435.83 into the second equation to obtain 435.83 + 7.06 σ^STUDY*TRT2 = 1064.07 and solve for σ^STUDY*TRT2 = 88.99. Finally, substitute σ^2 = 435.83 and σ^STUDY*TRT2 = 88.99 into the first equation to obtain 435.83 + 7.06 (88.99) + 14.12 σ^STUDY2 = 885.17. Solving this equation gives σ^STUDY2 = –12.67. Since the variances are positive numbers by definition, a negative estimate is not satisfactory. Zero is often substituted instead of the negative estimate. However, this has ripple effects on other issues, such as the standard errors of estimates and test statistics that utilize the variance component estimates in their computations. From this perspective, there are legitimate reasons for not routinely setting negative estimates to zero. This problem is not limited to unbalanced data. Refer to Section 4.4.2.
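The back-substitution above can also be carried out in a short DATA step (a sketch, not part of the original program; the mean squares and coefficients are copied from Outputs 5.3 and 5.11, so the results differ slightly from the hand-rounded values in the text):

data anova_vc;
   ms_error    = 435.82536;     /* MS(Error) from Output 5.3    */
   ms_trtstudy = 1064.073320;   /* Type III MS(TRT*STUDY)       */
   ms_study    = 885.172158;    /* Type III MS(STUDY)           */
   sigma2_e        = ms_error;
   sigma2_trtstudy = (ms_trtstudy - sigma2_e) / 7.0585;
   sigma2_study    = (ms_study - sigma2_e - 7.0585*sigma2_trtstudy) / 14.117;
   put sigma2_e= sigma2_trtstudy= sigma2_study=;
run;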

5.5.2 Using the CONTRAST, ESTIMATE, and LSMEANS Statements in GLM with Unbalanced Mixed-Model Data

You must be very careful in using the CONTRAST, ESTIMATE, and LSMEANS statements with mixed-model data because their output is not automatically modified to accommodate random effects when you specify a RANDOM statement. In some cases it is not possible to appropriately modify the results. These comments pertain to both the balanced and unbalanced situations.

Run the statements

contrast 'trtB-trtA' trt -1 1;
estimate 'trtB-trtA' trt -1 1;
lsmeans trt / pdiff;
random study trt*study;
run;

Results appear in Output 5.12.

Output 5.12 Results of CONTRAST, ESTIMATE, and LSMEANS Statements with the RANDOM Statement

Unbalanced Two-way Classification
 
Contrast DF Contrast SS Mean Square F Value Pr > F
 
trtB-trtA 1 1843.572090 1843.572090 4.23 0.0420
 
    Standard    
Parameter Estimate Error t Value Pr > |t|
 
 
trtB-trtA 9.37497409 4.55823028 2.06 0.0420
 
Least Squares Means
 
  H0:LSMean1=
Standard   H0:LSMEAN=0 LSMean2 
TRT FLUSH LSMEAN Error Pr > |t| Pr > |t|
 
A 22.5913628   3.0141710 <.0001 0.0420
B 31.9663369 3.4193912 <.0001
 
Contrast   Contrast Expected Mean Square
 
trtB-trtA   Var(Error) + 4.6613 Var(TRT*STUDY) + Q(TRT)

You see that the F-test for trtB-trtA is the same as in the fixed case in Output 5.4. The expected mean square for the CONTRAST statement is printed when the RANDOM statement follows the CONTRAST statement. The expected mean square for trtB-trtA in Output 5.12 indicates an appropriate denominator for the F-statistic to be σ2 + 4.66 σSTUDY*TRT2. (Since there is only one degree of freedom for TRT, the sum of squares for the contrast trtB-trtA is the same as the Type III sum of squares for TRT in the ANOVA table. This would not be true with more degrees of freedom for TRT.) There is no mean square with this expectation. Thus, you cannot directly obtain an F-statistic for the contrast that has an appropriate denominator. You can use the expected mean squares to determine an appropriate combination of mean squares and then compute the F-statistic by hand. In this case, we know from the ANOVA results in Section 5.5.1 that the appropriate combination is 0.66*MS(STUDY*TRT) + 0.34*MS(Error), and has Satterthwaite’s approximate DF = 11.69. The appropriate F-statistic is then F=1843.57 / (0.66(1064.07) + 0.34(435.82)) = 2.17 and it has significance probability p=0.1674.
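The hand computation can be checked with a short DATA step (a sketch, not part of the original program; the mean squares, coefficients, and Satterthwaite degrees of freedom are copied from Outputs 5.3, 5.11, and 5.12):

data approx_f;
   ms_num = 1843.572090;                              /* contrast mean square          */
   ms_den = 0.6604*1064.073320 + 0.3396*435.82536;    /* combined denominator          */
   df_den = 11.689;                                   /* Satterthwaite approximate DF  */
   f      = ms_num / ms_den;
   p      = 1 - probf(f, 1, df_den);
   put f= p=;
run;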

The ESTIMATE statement cannot be modified to accommodate random effects. Therefore, the standard error and t-statistic from an ESTIMATE statement are usually invalid with mixed-model data.

You can specify the E=effect option in the LSMEANS statement, where effect is an effect in the MODEL statement. This sometimes is useful for declaring an appropriate mean square for computing standard errors of differences between LS means, but it almost certainly does not specify appropriate computations for standard errors of individual LS means. With unbalanced data there usually is no effect whose expected mean square provides the correct linear combination of variance components.
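For example, you could request that the TRT*STUDY mean square be used as the error term for comparing the TRT LS means (a sketch; as the discussion above indicates, whether this mean square is appropriate depends on the expected mean squares, and with unbalanced data it usually is not exactly right):

lsmeans trt / pdiff e=trt*study;   /* uses MS(TRT*STUDY) as the error term for the comparison */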

In summary, the expected mean squares printed for CONTRAST statements can be useful for determining appropriate combinations of mean squares for the denominators of F-statistics. Unfortunately, these combinations are not computed automatically for contrasts; the TEST option in the RANDOM statement applies only to effects in the MODEL statement. Standard errors for ESTIMATE and LSMEANS statements cannot be computed correctly in most cases.

Many of the shortcomings of the GLM procedure for analyzing unbalanced mixed-model data can be overcome by using the MIXED procedure, as you will see in the next section.

5.6 Using the MIXED Procedure to Analyze Unbalanced Mixed-Model Data

The MIXED procedure is used in the same way with unbalanced data as it is with balanced data. Run the statements

proc mixed data=drugs;
   class study trt;
   model flush=trt / ddfm=satterth;
   random study study*trt;
   contrast 'trtB-trtA' trt -1 1;
   estimate 'trtB-trtA' trt -1 1;
   lsmeans trt;
run;

Results appear in Output 5.13.

Output 5.13 Results of the MIXED Procedure with Unbalanced Data

Unbalanced Two-way Classification
 
The Mixed Procedure
 
Model Information
 
Data Set WORK.DRUGS
Dependent Variable FLUSH
Covariance Structure Variance Components
Estimation Method REML
Residual Variance Method Profile
Fixed Effects SE Method Model-Based
Degrees of Freedom Method Satterthwaite
 
Covariance Parameter
Estimates
 
Cov Parm Estimate
 
STUDY 0
TRT*STUDY 75.3629
Residual 447.57
 
Type 3 Tests of Fixed Effects
 
  Num Den    
Effect DF DF F Value Pr > F
 
TRT 1 9.3 1.88 0.2028
 
Estimates
 
    Standard   
Label Estimate Error   DF   t Value   Pr > |t|
 
trtB-trtA 7.8198 5.7076 9.3 1.37 0.2028
 
Contrasts
 
  Num Den    
Label DF DF F Value Pr > F
 
trtB-trtA 1 9.3 1.88 0.2028
 
Least Squares Means
 
  Standard  
Effect   TRT Estimate Error DF  t Value  Pr > |t|
 
TRT   A 22.3593 4.0316 9.6 5.55 0.0003
TRT   B 30.1791 4.0401 9 7.47 <.0001
 
        Standard      
Effect TRT _TRT Estimate Error  DF   t Value   Pr > |t|
 
TRT A B -7.8198 5.7076 9.3 -1.37 0.2028

First of all, you see that the REML estimate of σSTUDY2 is 0. (Recall that the ANOVA estimate you would obtain using the expected mean squares in Output 5.11 was negative.) The REML estimates of σSTUDY*TRT2 and σ2 are 75.36 and 447.57, respectively.

You see that the F-statistic is equal to 1.88, with p-value equal to 0.2028 in the “Type 3 Tests of Fixed Effects.” This is the test for the TRT null hypothesis H0: α1 – α2 = 0. The results are similar to the test using expected mean squares from the GLM procedure in Output 5.11.

The ESTIMATE statement produces an estimated difference equal to 7.82, with standard error 5.71. The resulting t-statistic for testing the null hypothesis H0: α1 - α2 = 0 has value t=1.37 and significance probability p=0.2028.

The CONTRAST statement produces results equivalent to the F-test in “Type 3 Tests of Fixed Effects.”

The LSMEANS statement produces LS means of 22.36 for drug A and 30.18 for drug B. These are slightly different from the LS means produced by GLM in Output 5.12. More importantly, the standard errors of the LS means in Output 5.13 are larger than the standard errors in Output 5.12 because the MIXED procedure computes the standard errors correctly. Likewise, the difference between LS means in Output 5.13 is less significant than in Output 5.12 because the MIXED procedure computes a t-statistic that is more nearly valid than does the GLM procedure. Details on these computations are described in Chapter 6.

5.7 Using the GLM and MIXED Procedures to Analyze Mixed-Model Data with Empty Cells

Refer once more to the DRUGS1 data set, which has no data for TRT=A in STUDY=41 (See Output 5.7). Run the statements

proc glm data=drugs1;
   class study trt;
   model flush=trt study trt*study / ss1 ss2 ss3 ss4;
   random study trt*study / test;
run;

You would get exactly the same ANOVA results from the MODEL statement that you saw in Output 5.8.

The RANDOM statement produces Types III and IV expected mean squares and the associated tests, as shown in Output 5.14.

Output 5.14 Results of the RANDOM Statement with Empty Cells

Unbalanced Two-way Classification
 
The GLM Procedure
 
Source   Type III Expected Mean Square
 
TRT   Var(Error) + 4.6613 Var(TRT*STUDY) + Q(TRT)
 
STUDY   Var(Error) + 7.8109 Var(TRT*STUDY) + 14.111 Var(STUDY)
 
TRT*STUDY   Var(Error) + 7.0585 Var(TRT*STUDY)
 
The GLM Procedure
Tests of Hypotheses for Mixed Model Analysis of Variance
 
Source DF Type III SS Mean Square F Value Pr > F
 
TRT 1 1843.572090 1843.572090 2.09 0.1724
 
Error 12.498 10999 880.038506
Error: 0.6604*MS(TRT*STUDY) + 0.3396*MS(Error)
 
Source   Type IV Expected Mean Square
 
TRT   Var(Error) + 4.6613 Var(TRT*STUDY) + Q(TRT)
 
STUDY   Var(Error) + 7.0961 Var(TRT*STUDY) + 13.16 Var(STUDY)
 
TRT*STUDY   Var(Error) + 7.0585 Var(TRT*STUDY)
 
Tests of Hypotheses for Mixed Model Analysis of Variance
 
Source DF Type IV SS Mean Square F Value Pr > F
 
TRT 1 1843.572090 1843.572090 2.09 0.1724
 
Error 12.498 10999 880.038506
Error: 0.6604*MS(TRT*STUDY) + 0.3396*MS(Error)

Types III and IV expected mean squares are the same for TRT because there are only two levels of the factor. Types III and IV expected mean squares for STUDY differ only slightly. A greater prevalence of empty cells would tend to cause greater differences between all aspects of Types III and IV, including the expected mean squares. Additional detail on the distinction between Types III and IV is presented in Chapter 6.

The F-tests based on the Types III and IV mean squares for TRT are the same, with F=2.09 and p=0.1724. The results could differ if TRT had more levels. As discussed following Output 5.11, there are no definitive reasons for using one of these tests instead of the other. This point is discussed further in Chapter 6.

CONTRAST, ESTIMATE, and LSMEANS statements in PROC GLM would produce the same results as in the fixed-effects case: the contrast and estimate of the drug difference are non-estimable, as is the LSMEAN for TRT A (see Output 5.9).

The MIXED procedure is used in the same way with unbalanced data as it is with balanced data, even with empty cells. Run the statements

proc mixed data=drugs1;
   class study trt;
   model flush=trt / ddfm=satterth;
   random study study*trt;
   contrast 'trtB-trtA' trt -1 1;
   estimate 'trtB-trtA' trt -1 1;
   lsmeans trt;
run;

Edited results appear in Output 5.15.

Output 5.15 Results of the MIXED Procedure with Empty Cells

The Mixed Procedure
 
Model Information
 
Covariance Parameter
Estimates
 
Cov Parm Estimate
 
STUDY 0
TRT*STUDY 77.0369
Residual 530.50
 
Type 3 Tests of Fixed Effects
 
  Num Den    
Effect DF DF F Value Pr > F
 
TRT 1 12.4 2.96 0.1103
 
Estimates
 
  Standard   
Label Estimate Error DF t Value Pr > |t|
 
trtB-trtA 9.9035 5.7585 12.4 1.72 0.1103
 
Contrasts
 
  Num Den    
Label DF DF F Value Pr > F
 
trtB-trtA 1 12.4 2.96 0.1103
 
Least Squares Means
 
    Standard  
Effect   TRT Estimate Error DF  t Value  Pr > |t|
 
TRT   A 22.3908 4.2200 13.4 5.31 0.0001
TRT   B 32.2943 3.9182 11.4 8.24 <.0001

Results of the tests, estimates, and standard errors are similar, but not identical, to those for the data with no empty cell in Output 5.13. Do not expect this to always happen; generally speaking, the more prevalent the empty cells, the greater the differences you can expect.

There is a very important point to observe in comparing the results of unbalanced data analyses with and without empty cells. Empty cells cause non-estimability of certain LS means and linear functions of model parameters. Thus, ESTIMATE and CONTRAST statements will not produce output if they specify non-estimable linear functions. This occurred when using PROC GLM with the data set DRUGS1, for both the fixed and mixed-model analyses, because GLM makes the same essential computations with or without a RANDOM statement. In other words, estimability is judged by GLM considering all effects fixed, even though a RANDOM statement is used. PROC MIXED, in contrast, judges estimability only in terms of the fixed effects. That is why complete results were presented for the LSMEANS, ESTIMATE, and CONTRAST statements in Output 5.15.

5.8 Summary and Conclusions about Using the GLM and MIXED Procedures to Analyze Unbalanced Mixed-Model Data

The GLM and MIXED procedures both have certain capabilities for analysis of mixed-model data, as described in Chapter 4. The GLM capabilities are oriented around analysis of variance, based on an ordinary least squares fit of the model in which random effects are treated as fixed effects. The RANDOM statement in PROC GLM produces expected mean squares, which can be used to construct F-statistics for tests of hypotheses. In balanced data situations, these F-statistics are often “exact,” meaning that, under the null hypothesis, the statistic has a true F-distribution. In unbalanced data applications, the distributions are only approximate; the tests are still useful, but must be used with caution. Moreover, there are no definitive guidelines for selecting a “type” of sum of squares for the numerator of the F-statistic. Standard errors computed by PROC GLM for LS means, differences between LS means, and ESTIMATE statements are generally unreliable. There are methods for determining appropriate standard errors of estimates from ESTIMATE and LSMEANS statements using the CONTRAST and RANDOM statements (Littell and Linda 1990; Milliken and Johnson 1994, Chapter 28), but these are tedious and are not feasible for many users.

The MIXED procedure, on the other hand, uses true mixed-model methodology. It builds the parameters for the random effects into the statistical model through the covariance structure, using either the RANDOM or the REPEATED statement. Test statistics, estimates, and standard errors of estimates for fixed effects are computed from principles of generalized least squares, with the random-effects parameters replaced by their estimates (see Chapter 6). Estimates computed in this manner are called estimated generalized least squares estimates. They are unbiased, and their standard errors are computed from a valid formula, except that the standard errors do not account for the variation in the estimates of the random-effects parameters. In most cases, this is not a serious problem. Test statistics for fixed effects are also computed using basically sound methodology, with the same caveat that variation due to estimation of the random-effects parameters is ignored. Determination of degrees of freedom for these statistics is complicated, especially with unbalanced data. PROC MIXED allows several options for computing degrees of freedom.
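For example (a brief illustration, not from the text), the DDFM= option in the MODEL statement selects the degrees-of-freedom method; DDFM=SATTERTH was used in Section 5.6, and DDFM=CONTAIN (the containment method) is the default when a RANDOM statement is present:

proc mixed data=drugs;
   class study trt;
   model flush = trt / ddfm=satterth;   /* alternatives include ddfm=contain and ddfm=betwithin */
   random study study*trt;
run;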
