Chapter 5 Unbalanced Data Analysis: Basic Methods

5.1 Introduction

5.2 Applied Concepts of Analyzing Unbalanced Data

5.2.1 ANOVA for Unbalanced Data

5.2.2 Using the CONTRAST and ESTIMATE Statements with Unbalanced Data

5.2.3 The LSMEANS Statement

5.2.4 More on Comparing Means: Other Hypotheses and Types of Sums of Squares

5.3 Issues Associated with Empty Cells

5.3.1 The Effect of Empty Cells on Types of Sums of Squares

5.3.2 The Effect of Empty Cells on CONTRAST, ESTIMATE, and LSMEANS Results

5.4 Some Problems with Unbalanced Mixed-Model Data

5.5 Using the GLM Procedure to Analyze Unbalanced Mixed-Model Data

5.5.1 Approximate F-Statistics from ANOVA Mean Squares with Unbalanced Mixed-Model Data

5.5.2 Using the CONTRAST, ESTIMATE, and LSMEANS Statements in GLM with Unbalanced Mixed-Model Data

5.6 Using the MIXED Procedure to Analyze Unbalanced Mixed-Model Data

5.7 Using the GLM and MIXED Procedures to Analyze Mixed-Model Data with Empty Cells

5.8 Summary and Conclusions about Using the GLM and MIXED Procedures to Analyze Unbalanced Mixed-Model Data

5.1 Introduction

Most persons who have analyzed data have experienced “unbalanced data,” although the term is hard to precisely define. Usually, it refers to having different numbers of observations in different groups of data. For example, a data set contained final exam scores from a statistics class at a university. The class had 66 students from four colleges, but there were different numbers of students from the various colleges. The set of exam scores would make up an unbalanced one-way classification of data. However, the main problems associated with analyzing unbalanced data do not occur in a one-way classification. In the statistics class, there were both part-time and full-time students within each college, and only coincidentally were there the same number of each from a given college. Thus, the scores of final exams constitute a two-way classification of data with differing numbers of observations in the combinations of college and status. A combination of college and status makes a cell of data. With this terminology, we could say that unbalanced data refers to data sets with different numbers of observations in the cells. You might consider this situation an “observational study” because the numbers of students in the various groups were not controlled. The instructor would be faced with problems of analyzing unbalanced data if he or she wanted to compare mean exam scores of full-time with part-time students, averaged across colleges. The basic problem is to decide what weight to attach to the mean scores for each status group in each college.

In another example, a pharmaceutical company compared the effects of two drugs, A and B, on a clinical measurement called “flush.” The study utilized patients in 10 clinics. Multiple clinics were used in order to obtain representation of diverse patient populations. The original plan called for each clinic to assign the two drugs to 15 patients. However, there were not enough patients at the clinics, so all available patients were randomly assigned to the two drugs. This plan was largely followed, but a few patients abandoned the trial before completion, leaving unequal numbers of patients on the two drugs within some of the clinics. In addition, the number of available patients varied among the clinics, ranging from 3 to 28. Thus, even though this was a designed experiment, realities of the situation resulted in unbalanced data. A statistical comparison of mean flush between the two drugs would raise the question of how to assign weights to the individual means.

Analysis of unbalanced data received sporadic attention for several decades, but the attention intensified in the 1970s when computer programs such as PROC GLM became readily accessible. Most of the writing focused on the fixed-effects case, prompted in part by the different types of sums of squares in PROC GLM. Popular texts that discuss fixed-effects issues include Milliken and Johnson (1991), Hocking (1986), and Searle (1987). Analysis of unbalanced mixed-model data still contains many mysteries. The GLM procedure contains certain capabilities that are adaptations of fixed-effects computations, but there has been relatively little concrete description of how to use them. This prompted a re-evaluation of how to analyze mixed-model data in the late 1980s, and PROC MIXED implemented newer methodology based on generalized least squares and likelihood-based methods.

The purpose of this chapter is to illustrate methods that are available in PROC GLM and PROC MIXED for analyzing unbalanced data. In Sections 5.2 and 5.3 you will see the issues of analyzing fixed-effects unbalanced data presented on a conceptual level, using the clinical trial example described above. Then, in later sections, you will see ANOVA methods as well as generalized least squares and likelihood-based methods for the analysis of unbalanced mixed-model data.

5.2 Applied Concepts of Analyzing Unbalanced Data

The FLUSH measurements from the pharmaceutical study are recorded in a SAS data set named DRUGS. A portion of the data set is printed in Output 5.1. Variables in the data set include STUDY, TRT, PATIENT, FLUSH0, and FLUSH. The values of FLUSH0 were obtained prior to administration of the drugs, but are not used in the discussions in this chapter.

Output 5.1 Data Set DRUGS

Unbalanced Two-way Classification
 
OBS  STUDY   TRT   PATIENT  FLUSH0      FLUSH
 
1 42 A 201 50.5 70.3333
2 42 A 203 84.5 16.1429
3 42 B 202 33.5 28.3333
4 43 A 302 22.0 14.5000
5 43 A 305 23.0 25.5000
6 43 A 306 22.0 12.2500
7 43 A 307 13.0 3.1250
8 43 A 310 50.5 51.1250
9 43 A 313 57.0 49.2500
10 43 A 316 13.5 1.6250
11 43 A 317 36.5 29.5000
12 43 A 321 59.0 30.5000
13 43 A 322 30.5 33.5000
14 43 A 323 10.5 2.2500
15 43 A 325 37.0 13.8750
16 43 A 327 35.5 21.0000
17 43 A 329 28.0 16.0000
18 43 B 301 40.5 17.5000
19 43 B 303 12.5 8.8333
20 43 B 304 47.5 40.0000
21 43 B 308 34.5 23.1429
22 43 B 309 15.5 3.1250
23 43 B 311 43.0 35.8750
24 43 B 314 30.0 31.6250
25 43 B 315 27.5 16.0000
26 43 B 318 62.0 41.1250
27 43 B 319 105.0 44.7500
28 43 B 324 38.5 43.1250
29 43 B 326 7.0 15.0000
30 43 B 328 8.0 4.5000
31 43 B 330 30.5 19.0000
32 44 A 401 46.0 14.8750
33 44 A 405 36.5 2.9231
34 44 A 406 22.5 2.8000
35 44 A 408 21.5 1.3750
36 44 A 409 27.0 22.0000
37 44 A 411 46.5  ⋅    
38 44 B 402 14.0 4.7778
39 44 B 403 23.0 3.6667
40 44 B 404 30.0 17.1250
41 44 B 407 19.0 22.3636
42 44 B 410 67.5 18.1667
43 44 B 412 12.0 2.0000
44 45 A 502 60.0 62.0000
45 45 A 503 36.0 13.6250
46 45 A 506 24.0 1.0000
47 45 A 507 29.0 24.1250
48 45 A 510 12.5 11.5000
49 45 A 512 82.5 84.0000
50 45 A 513 31.5 0.6250
51 45 A 515 53.0 45.0000
52 45 A 518 56.0 43.7500
53 45 A 519 23.0  7.3750
54 45 A 520 48.5  43.1429
55 45 A 527 16.0   ⋅    
56 45 B 501 34.0  30.0000
57 45 B 504 74.5  38.3750
58 45 B 505 22.0  25.3750
59 45 B 508 7.0  2.8750
60 45 B 509 13.0  8.1250
61 45 B 511 34.5  28.8750
62 45 B 514 20.5  22.6000
63 45 B 516 75.5  37.0000
64 45 B 517 50.0  59.1250
65 45 B 529 27.5  26.8000
66 45 B 530 49.0  33.0000
67 46 A 601 31.0  5.0000
68 46 A 602 53.0  20.8750
69 46 A 605 28.0  16.0000
70 46 A 608 21.5  7.5000
71 46 A 609 11.5  3.3750
72 46 A 611 59.0  35.6250
73 46 B 603 39.0  50.0000
74 46 B 604 65.0  43.0000
75 46 B 606 43.5  41.0000
76 46 B 607 25.0  8.5000
77 46 B 610 26.5  0.5000
78 46 B 629 27.5  15.5000
79 46 B 630 19.5  11.1250

Output 5.2 shows summary statistics for FLUSH for each combination of STUDY and TRT.
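The statements that produced this summary are not shown in the text; a minimal sketch that would produce statistics in this layout (assuming the default behavior of PROC MEANS with a CLASS statement) is

proc means data=drugs mean n std min max;
   class study trt;
   var flush;
run;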

Output 5.2 Summary Statistics for the FLUSH Data Set

Unbalanced Two-way Classification
 
Analysis Variable : FLUSH
 
 
STUDY  TRT  N Obs        Mean   N     Std Dev     Minimum      Maximum

 42     A      2  43.2381000   2  38.3183993  16.1429000   70.3333000
        B      1  28.3333000   1           .  28.3333000   28.3333000

 43     A     14  21.7142857  14  15.8805206   1.6250000   51.1250000
        B     14  24.5429429  14  14.6750757   3.1250000   44.7500000

 44     A      6   8.7946200   5   9.1762473   1.3750000   22.0000000
        B      6  11.3499667   6   8.8404391   2.0000000   22.3636000

 45     A     12  30.5584455  11  27.1736184   0.6250000   84.0000000
        B     11  28.3772727  11  14.9979268   2.8750000   59.1250000

 46     A      6  14.7291667   6  12.2625998   3.3750000   35.6250000
        B      7  24.2321429   7  19.8164006   0.5000000   50.0000000

 47     A      6  20.8777833   6   7.8619415   6.8750000   28.7500000
        B      7  49.0178571   7  30.9056384   4.3750000   92.1250000

 48     A      8  21.7857143   7  13.8723185   7.0000000   42.5000000
        B      8  22.5732500   8  14.7692870   3.8750000   49.2500000

 49     A     10  32.1554000  10  31.0770557   0.1250000   79.7500000
        B     10  27.3953000  10  24.3228740           0   59.6250000

 50     A      4   9.4687500   4   7.4486961   2.1250000   18.7500000
        B      3  71.8750000   3  53.6849898  32.3750000  133.0000000

There are different numbers of observations in the TRT-by-STUDY cells, meaning we have unbalanced data. Note that there is at least one observation for each combination of the factors.
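A quick way to display these cell sizes, for example (these statements are a suggestion, not part of the original program), is to cross-tabulate STUDY and TRT:

proc freq data=drugs;
   tables study*trt / norow nocol nopercent;
run;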

5.2.1 ANOVA for Unbalanced Data

The four types of sums of squares available in PROC GLM are designed to deal with unbalanced data. Run the statements

proc glm data=drugs;
   class study trt;
   model flush=trt study trt*study / ss1 ss2 ss3;
run;

Results appear in Output 5.3. Types I, II, and III are selected with the options ss1 ss2 ss3 in the MODEL statement. Type IV was not selected because Types III and IV are equal in situations such as this one, with at least one observation for each factor combination, that is, with no empty cells. Types I, II, and III have different values for TRT, but Types I and II are the same for STUDY, and all three types are the same for TRT*STUDY. The technical reasons why some of these sums of squares coincide and others differ are explained in Chapter 6.

The primary objective of the study is to compare TRT means. Before making comparisons of the drugs averaged over clinics, note that the TRT*STUDY interaction is significant at the p=.0178 level. This means that differences between drug A and drug B vary across clinics. In Output 5.2 you see that drug B has a larger mean than drug A for six of the nine clinics. If there is additional information about the clinics, you might want to investigate whether characteristics of the clinics can explain the interaction between STUDY and TRT. (Recall that in Section 3.7 METHODS were compared separately for each VARIETY due to the presence of METHOD*VARIETY interaction.) Depending on the situation, it may or may not be meaningful to compare the drugs averaged across clinics. For the present situation, we have no other information about the clinics. Also, the clinics were used to obtain representation of the drug differences over a set of clinics. Therefore, even in the face of TRT*STUDY interaction, it could be important and meaningful to compare the drugs averaged over clinics. This would be the case if, for example, you must choose one of the drugs to be used at all clinics.

Output 5.3 Three Types of ANOVA Tables for the FLUSH Data Set

Unbalanced Two-way Classification
 
The GLM Procedure
 
Dependent Variable: FLUSH        
 
   Sum of  
Source DF  Squares  Mean Square F Value Pr > F
 
Model 17  16618.75357 977.57374 2.24 0.0063
Error 114  49684.09084 435.82536
Corrected Total 131  66302.84440
 
R-Square Coeff Var Root MSE FLUSH Mean
 
0.250649 80.31125 20.87643 25.99440
 
Source DF Type I SS Mean Square F Value Pr > F
 
TRT 1 1134.560964 1134.560964 2.60 0.1094
STUDY 8 6971.606045 871.450756 2.00 0.0526
TRT*STUDY 8 8512.586561 1064.073320 2.44 0.0178
 
Source DF Type II SS Mean Square F Value Pr > F
 
TRT 1 1377.550724 1377.550724 3.16 0.0781
STUDY 8 6971.606045 871.450756 2.00 0.0526
TRT*STUDY 8 8512.586561 1064.073320 2.44 0.0178
 
Source DF Type III SS Mean Square F Value Pr > F
 
TRT 1 1843.572090 1843.572090 4.23 0.0420
STUDY 8 7081.377266 885.172158 2.03 0.0488
TRT*STUDY 8 8512.586561 1064.073320 2.44 0.0178

Next you must decide which of the three F-tests from the three types of sums of squares is most appropriate for comparing the drugs. The first consideration is to select a test statistic that tests the hypothesis you want to test. Of course, the hypothesis you want to test should have been prescribed at the planning stage of the study, not in the middle of data analysis.

Let μij denote the population mean for drug i and clinic j. If you are equally interested in each clinic, then a reasonable hypothesis to test is

H0: μA. = μB.

where

μA. = (1/9)(μA1 + μA2 + μA3 + μA4 + μA5 + μA6 + μA7 + μA8 + μA9)

and

μB. = (1/9)(μB1 + μB2 + μB3 + μB4 + μB5 + μB6 + μB7 + μB8 + μB9)

The F-test based on the Type III sum of squares in Output 5.3 gives a test of this hypothesis, which we will refer to as a “Type III hypothesis.” This means that if you test this hypothesis at the .05 level using the Type III F-test, the probability you will make a Type I error is exactly .05.

The Type III hypothesis is a statement of equality of the drug means, averaged across the clinics, with equal weight attached to each clinic. Other hypotheses could be formulated by attaching different weights to different clinics. Without overriding reasons for attaching different weights, the Type III hypothesis is often reasonable. However, other considerations enter into the decision. For example, the power of the Type III test can be very low if the sample sizes for some cells are small compared to the sample sizes of other cells.

5.2.2 Using the CONTRAST and ESTIMATE Statements with Unbalanced Data

If there are no empty cells you can use CONTRAST and ESTIMATE statements in the same way you used them with balanced data in Chapter 3. For example, you can test the significance of the difference between the drug means with the CONTRAST statement

contrast 'trtB-trtA' trt -1 1;

and you can estimate the difference between the drug means with the statement

estimate 'trtB-trtA' trt -1 1;

Results of the CONTRAST and ESTIMATE statements appear in Output 5.4.

Output 5.4 Results of the CONTRAST and ESTIMATE Statements

Unbalanced Two-way Classification
 
Contrast DF Contrast SS Mean Square F Value Pr > F
 
trtB-trtA 1 1843.572090 1843.572090 4.23 0.0420
 
  Standard  
Parameter Estimate Error  t Value  Pr > |t|
 
trtB-trtA 9.37497409 4.55823028 2.06 0.0420

The CONTRAST statement produces the same sum of squares, mean square, F-test, and p-value for the difference between drug means that you obtained from the Type III ANOVA F-test in Output 5.3. It is a test of H0: μA. = μB..

The difference between the estimates of μA. and μB. is 9.375, and the standard error of the estimate is 4.558. A t-statistic for testing H0: μA. = μB. is t=9.375/4.558=2.06. The p-value for the t-statistic is .0420, the same p-value you got from the Type III F-test in Output 5.3. The t-test from an ESTIMATE statement is equivalent to the Type III ANOVA F-test whenever the comparison has a single degree of freedom, as TRT does here (t2 = F).
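If you want to verify this arithmetic yourself, a small DATA step (a sketch, not part of the original program; the error degrees of freedom, 114, are taken from Output 5.3) reproduces the t-statistic and its two-sided p-value:

data check_t;
   est = 9.37497409;   /* estimated difference from Output 5.4      */
   se  = 4.55823028;   /* standard error of the estimate            */
   df  = 114;          /* error degrees of freedom from Output 5.3  */
   t   = est / se;
   p   = 2*(1 - probt(abs(t), df));
   put t= p=;
run;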

5.2.3 The LSMEANS Statement

You can calculate means from unbalanced data using the LSMEANS statement. With balanced data, the LSMEANS statement computes ordinary means. LSMEANS for the drugs are obtained from the statement

lsmeans trt / pdiff;

The PDIFF option is a request for t-tests to compare the LSMEANS. Results appear in Output 5.5.

Output 5.5 Results of the LSMEANS Statement

Unbalanced Two-way Classification
 
The GLM Procedure
Least Squares Means
 
  H0:LSMean1=
Standard   H0:LSMEAN=0 LSMean2 
TRT FLUSH LSMEAN Error Pr > |t| Pr > |t|
A 22.5913628   3.0141710 <.0001 0.0420
B 31.9663369 3.4193912 <.0001

The estimate of μA. is 22.591, with standard error 3.014, and the estimate of μB. is 31.966, with standard error 3.419. The p-value for comparing the means is .0420, which is the same as the p-value you got from the ESTIMATE statement in Output 5.4. In fact, the difference between the LSMEANS in Output 5.5 is the same as the estimated difference in Output 5.4. More information about LSMEANS is given in Chapter 6.
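If you also want confidence limits for the LS means and for their difference, the CL option can be added to the LSMEANS statement (a small extension, not shown in the original program):

lsmeans trt / pdiff cl;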

5.2.4 More on Comparing Means: Other Hypotheses and Types of Sums of Squares

In Section 5.2.2 you learned that the F-statistic based on the Type III sum of squares gives a test of the null hypothesis

H0: μA. = μB.

where

μA. = (1/9)(μA1 + μA2 + μA3 + μA4 + μA5 + μA6 + μA7 + μA8 + μA9)

and

μB. = (1/9)(μB1 + μB2 + μB3 + μB4 + μB5 + μB6 + μB7 + μB8 + μB9)

This null hypothesis states that the average of the drug A means is equal to the average of the drug B means, where the averages are computed across the clinics with equal weights for each clinic. In some circumstances you might want to compare averages of the drug means, but with different weights for the clinics. For example, the clinics might serve different patient populations, and you might want to weight the means proportional to the patient population sizes. Let wj, j = 1, … ,9, denote the relative population sizes, with w1 + … + w9= 1. Then the weighted hypothesis would be

H0: μ*A. = μ*B.             (5.2)

where

μ*A. = w1μA1 + w2μA2 + … + w9μA9

and

μ*B. = w1μB1 + w2μB2 + … + w9μB9

You could test this hypothesis with the CONTRAST statement. To illustrate, suppose the weights are .03, .20, .09, .17, .1, .1, .1, .14, and .07. The CONTRAST statement would be

contrast 'trtB-trtA wtd' trt -1 1
          trt*study -.03 -.20 -.09 -.17 -.1 -.1 -.1 -.14 -.07
                     .03  .20  .09  .17  .1  .1  .1  .14  .07;

Results appear in Output 5.6.

Output 5.6 Results of the CONTRAST Statement for Weighted Hypothesis

Unbalanced Two-way Classification
 
The GLM Procedure
 
Dependent Variable: FLUSH  
 
  Contrast DF   Contrast SS   Mean Square   F Value   Pr > F
 
  trtB-trtA wtd 1 1829.354286 1829.354286 4.20 0.0428

You see that the sum of squares for this CONTRAST statement is different from the sum of squares in Output 5.4 for the equally weighted hypothesis, although not by very much. You also see that the sum of squares for the weighted hypothesis in Output 5.6 is different from the TRT sum of squares for any of the ANOVA tables in Output 5.3. This illustrates that there are different values of sums of squares for TRT depending on the weights assigned to the means. Each type of sum of squares for TRT in Output 5.3 is associated with a certain set of weights. These are explained in detail in Chapter 6.

5.3 Issues Associated with Empty Cells

The data set discussed in Section 5.2 had at least one observation for each cell corresponding to combinations of TRT and STUDY. In the original data set there was another clinic, STUDY=41, that had patients only for drug B. So the cell for that clinic and drug A was empty. Empty cells create another layer of complications in analyzing unbalanced data. In this section we illustrate some of these issues using the original data set, which we call DRUGS1. The first several observations are printed in Output 5.7.

Output 5.7 Partial Printout of Data Set DRUGS1

Unbalanced Two-way Classification
 
OBS  STUDY   TRT   PATIENT  FLUSH0      FLUSH
 
1 41 B 102 77.5 72.0000
2 41 B 104 23.5 5.6250
3 41 B 105 63.5 81.8750
4 41 B 106 72.5 83.5000
5 41 B 107 58.0 75.5000
6 41 B 108 49.0 13.7500
7 41 B 109 7.5 9.3750
8 41 B 110 13.5 7.8750
9 41 B 111 13.5 6.0000
10 41 B 112 76.5 61.6000
11 41 B 113 78.5 98.1250
12 41 B 114 56.5 46.1250
13 41 B 115 61.0 24.2500
14 41 B 116 91.0 64.4000
15 41 B 117 13.5 7.3333
16 41 B 118 63.5 79.2500
17 42 A 201 50.5 70.3333
18 42 A 203 84.5 16.1429

You see that there are 16 observations for STUDY=41 and TRT=B, but no observations for TRT=A. Now we briefly review the effects of the empty cell on the analysis methods shown in Section 5.2.

5.3.1 The Effect of Empty Cells on Types of Sums of Squares

Empty cells create problems with ANOVA computations associated with difficulties in specifying meaningful hypotheses. Run the statements

proc glm data=drugs1;
   class study trt;
   model flush=trt study trt*study / ss1 ss2 ss3 ss4;
run;

Results appear in Output 5.8.

Output 5.8 Four Types of ANOVA Tables for Data Set with Empty Cell

Unbalanced Two-way Classification
 
The GLM Procedure
 
Dependent Variable: FLUSH  
 
  Sum of  
Source DF Squares  Mean Square F Value Pr > F
 
Model 18 22350.89135 1241.71619 2.38 0.0027
 
Error 129 67361.38451 522.18128    
 
Corrected Total 147 89712.27586      
 
R-Square Coeff Var Root MSE FLUSH Mean
 
0.249140 81.14483 22.85129 28.16111
 
  Source DF Type I SS Mean Square F Value Pr > F
 
  TRT 1 3065.96578 3065.96578 5.87 0.0168
  STUDY 9 10772.33900 1196.92656 2.29 0.0202
  TRT*STUDY 8 8512.58656 1064.07332 2.04 0.0468
 
  Source DF Type II SS Mean Square F Value Pr > F
 
  TRT 1 1377.55072 1377.55072 2.64 0.1068
  STUDY 9 10772.33900 1196.92656 2.29 0.0202
  TRT*STUDY 8 8512.58656 1064.07332 2.04 0.0468
 
  Source DF Type III SS Mean Square F Value Pr > F
 
  TRT 1 1843.57209 1843.57209 3.53 0.0625
  STUDY 9 10261.36525 1140.15169 2.18 0.0272
  TRT*STUDY 8 8512.58656 1064.07332 2.04 0.0468
 
  Source DF Type IV SS Mean Square F Value Pr > F
 
  TRT 1* 1843.572090 1843.572090 3.53 0.0625
  STUDY 9* 7462.538828 829.170981 1.59 0.1254
  TRT*STUDY 8 8512.586561 1064.073320 2.04 0.0468
 
* NOTE: Other Type IV Testable Hypotheses exist which may yield different SS.

Compare Output 5.8 with Output 5.3. Several of the sums of squares change when the data from STUDY=41 are included: the Type I sum of squares for TRT is different, and all of the sums of squares for STUDY are different (STUDY now has 9 degrees of freedom). Also, the Types III and IV sums of squares for STUDY are different from each other in Output 5.8. These differences illustrate the fact that the associated hypotheses are different. Details of the specific hypotheses are discussed in Chapter 6.

5.3.2 The Effect of Empty Cells on CONTRAST, ESTIMATE, and LSMEANS Results

Run the statements

estimate 'trtB-trtA' trt -1 1;
contrast 'trtB-trtA' trt -1 1;
lsmeans trt / stderr pdiff;
run;

Results appear in Output 5.9.

Output 5.9 Effects of Empty Cell on LSMEANS

Unbalanced Two-way Classification
 
The GLM Procedure
Least Squares Means
 
  Standard  
TRT FLUSH LSMEAN Error  Pr > |t|
 
A Non-est  ⋅         ⋅    
B 33.3733489 3.4166700 <.0001

No output is given from the CONTRAST and ESTIMATE statements because the underlying linear combinations of parameters are non-estimable. This message appears in the SAS log. The LSMEAN for TRT=A also is non-estimable because the empty cell was for drug A. Non-estimability is discussed in detail in Chapter 6.
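One way to see why the LSMEAN for drug A is non-estimable is to print the coefficients that PROC GLM uses for each LS mean. The E option on the LSMEANS statement does this (adding it here is a suggestion, not part of the original program); the coefficients for drug A involve the empty STUDY=41, TRT=A cell:

proc glm data=drugs1;
   class study trt;
   model flush = trt study trt*study;
   lsmeans trt / stderr pdiff e;   /* E prints the coefficients behind each LS mean */
run;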

5.4 Some Problems with Unbalanced Mixed-Model Data

In Chapter 4 you read about statistical analysis of data with random effects. The methods discussed there were in the setting of balanced data. The statistical issues concerned construction of F-tests and standard errors of estimates that take into account multiple sources of random variation in the data. Applications in Chapter 4 illustrated both analysis-of-variance methods using the GLM procedure, and mixed model methods using the MIXED procedure. In Sections 5.2 and 5.3 you read about statistical analysis of unbalanced data. The critical issues were constructing meaningful linear combinations of model parameters for estimation and hypothesis testing. In the present section we address problems of analyzing unbalanced data with random effects. We must identify statistical procedures that simultaneously define meaningful linear combinations of model parameters and account for multiple sources of random variation. As in the case of balanced data, we illustrate two approaches, analysis of variance using the GLM procedure, and mixed-model methodology using PROC MIXED.

We return to the clinical trial example of Section 5.1. Now we assume that the clinics were chosen from a population of clinics, and that the objective is to make inference about the drugs that is relevant to the entire population of clinics. Thus, we consider clinics to be a random factor. Ideally, the clinics would be chosen as a random sample from the population of clinics, but this is not realistic. Instead, we assume that the clinics in the data set reasonably represent the population of clinics as would a truly random sample of clinics. The statistical model is

yijk = μ + αi + bj + (αb)ij + eijk

where

yijk

is the FLUSH measurement on the kth patient assigned to drug i in clinic j.

μ + αi

is the mean FLUSH for drug i.

bj

is the random effect associated with clinic j.

(αb)ij

is the random interaction effect associated with drug i and clinic j.

eijk

is the random error associated with the kth patient assigned to drug i in clinic j.

We assume the bj random variables for clinics are normally and independently distributed with mean 0 and variance σSTUDY2 and the (αb)ij random variables for DRUG*CLINIC interaction are normally and independently distributed with mean 0 and variance σSTUDY*TRT2. Also, we assume the eijk random variables for patients are normally and independently distributed with mean 0 and variance σ2.
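One consequence of these assumptions, stated here for clarity (it is implied rather than written out in the text): the variance of a single observation is Var(yijk) = σSTUDY2 + σSTUDY*TRT2 + σ2, and measurements on two patients from the same clinic are correlated because they share the same bj (and also the same (αb)ij if the patients receive the same drug).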

This is the same model introduced in Chapter 4 for a two-way classification mixed model. The only distinction is that the number of observations in each clinic-drug cell may change from cell to cell. The objectives are the same as in the balanced case. This is an important point: The failure to obtain the same number of observations in each cell should not influence the objectives of the research.

There are basically two approaches to analyzing unbalanced mixed-model data—ANOVA and mixed-model methods. In the context of SAS/STAT procedures, ANOVA means using the GLM procedure, and mixed-model methods means using the MIXED procedure. You saw both approaches applied to balanced data in Chapter 4. Both approaches result in approximate methods for unbalanced mixed-model data. We illustrate ANOVA methods in Section 5.5 and mixed-model methods in Section 5.6.

5.5 Using the GLM Procedure to Analyze Unbalanced Mixed-Model Data

The term “ANOVA methods” refers to adapting analysis-of-variance computations for statistical inference with mixed-model data. The computations have their basis in comparing fixed-effects models, but have been found useful in comparing mixed models. However, there are some troublesome difficulties. First, it is not clear how to choose a mean square to measure the effect we want to test. Second, it is not clear how to choose a mean square for the denominator of the test. As in the balanced-data case, expected mean squares are used to determine appropriate denominators for F-tests, but the coefficients of the variance components in the available mean squares do not match the coefficients in the numerator. Third, the two mean squares are usually not independent, so the ratio does not have a true F-distribution. The same difficulties carry over to contrasts and to standard errors of linear combinations.

5.5.1 Approximate F-Statistics from ANOVA Mean Squares with Unbalanced Mixed-Model Data

Run the statements

proc glm data=drugs;
   class study trt;
   model flush=trt study trt*study / ss1 ss2 ss3;
run;

Results appear in Output 5.10.

Output 5.10 Three Types of ANOVA Tables for the FLUSH Data Set

Unbalanced Two-way Classification
 
The GLM Procedure
 
Dependent Variable: FLUSH          
  Sum of  
  Source DF Squares   Mean Square  F Value  Pr > F
 
  Model 17   16618.75357 977.57374 2.24 0.0063
 
  Error 114 49684.09084 435.82536
 
  Corrected Total 131 66302.84440
 
  Source DF Type I SS Mean Square F Value Pr > F
 
  TRT 1 1134.560964 1134.560964 2.60 0.1094
  STUDY 8 6971.606045 871.450756 2.00 0.0526
  TRT*STUDY 8 8512.586561 1064.073320 2.44 0.0178
 
  Source DF Type II SS Mean Square F Value Pr > F
 
  TRT 1 1377.550724 1377.550724 3.16 0.0781
  STUDY 8 6971.606045 871.450756 2.00 0.0526
  TRT*STUDY 8 8512.586561 1064.073320 2.44 0.0178
 
  Source DF Type III SS Mean Square F Value Pr > F
 
  TRT 1 1843.572090 1843.572090 4.23 0.0420
  STUDY 8 7081.377266 885.172158 2.03 0.0488
  TRT*STUDY 8 8512.586561 1064.073320 2.44 0.0178

The Types I, II, and III sums of squares are the same as in Output 5.3. The first task is to choose a mean square for the numerator of an F-statistic to test for the difference between drug means. The technical considerations in doing so are very different from those faced with fixed-effects models. Essentially, we want to select a mean square that measures the effect we want to test with the least amount of random variation. In general, it is not clear which of Types I, II, or III mean squares to use for this purpose. See Littell (1996) for details. Without further justification, we will use the Type III mean square for TRT as the numerator of an F-statistic. However, we return to this problem in Section 6.5.1.

The next task is to select a denominator for the test. While this choice is not totally clear, at least we have some useful criteria for the choice in the expected mean squares. You learned in Chapter 4 how to use the expected mean squares to select a mean square for the denominator of an F-statistic whose expectation matches the expectation of the numerator mean square under the null hypothesis. Run the RANDOM statement to obtain the expected mean squares:

random study trt*study / test;

Results appear in Output 5.11.

Output 5.11 Results of the RANDOM Statement

Unbalanced Two-way Classification
 
The GLM Procedure
 
Source Type III Expected Mean Square
 
TRT Var(Error) + 4.6613 Var(TRT*STUDY) + Q(TRT)
 
STUDY Var(Error) + 7.0585 Var(TRT*STUDY) + 14.117 Var(STUDY)
 
TRT*STUDY Var(Error) + 7.0585 Var(TRT*STUDY)
 
Tests of Hypotheses for Mixed Model Analysis of Variance
 
Dependent Variable: FLUSH  
 
  Source DF   Type III SS  Mean Square  F Value  Pr > F
 
  TRT 1 1843.572090 1843.572090 2.17 0.1674
 
  Error 11.689 9943.710652 850.710718
  Error: 0.6604*MS(TRT*STUDY) + 0.3396*MS(Error)

You see that the expected mean square for TRT is σ2 + 4.66 σSTUDY*TRT2 + ϕ2 (TRT). The Q option (see Chapter 4) could be used to discover that ϕ2(TRT)=20.97 (α1 – α2)2. Thus, under the null hypothesis H0: α1 – α2 = 0, the expected mean square for TRT is σ2 + 4.66 σSTUDY*TRT2. We want to obtain another mean square with this expectation to use as the denominator for the F-statistic. Unfortunately, none is directly available, so a combination of mean squares must be used. The TEST option in the RANDOM statement instructs GLM to compute such a combination, shown in Output 5.11 as 0.66*MS(STUDY*TRT) + 0.34*MS(Error), with Satterthwaite’s approximate DF=11.69. The approximate F-statistic has value F=2.17 and significance probability p=0.1674. Thus, the difference between drugs is less significant when making inference to the population of clinics instead of to the set of clinics in the data set.

You should remember that the significance probability for an ANOVA F-test is only approximate due to the complications of unbalanced data. The F-statistic does not have a true F-distribution for two reasons: One, the denominator is a linear combination of mean squares, but is not distributed as a constant-times-a-chi-squared random variable. Two, the numerator and denominator of the F-ratio are not independent. Nonetheless, statistics obtained in this manner are very useful and sometimes provide the only available means of statistical inference.

Expected mean squares also can be used to estimate variance components, as you learned in Chapter 4 with balanced data. To do this, equate the mean squares to their expectations and solve for the variance component estimates. These are called ANOVA variance component estimates. Here are the equations to solve to obtain the ANOVA estimates:

σ^2 + 7.06σ^STUDY*TRT2 + 14.12σ^STUDY2 =  885.17
σ^2 + 7.06 σ^STUDY*TRT2 = 1064.07
σ^2 =  435.83

The last equation gives σ^2 = 435.83. Next, substitute σ^2 = 435.83 into the second equation to obtain 435.83 + 7.06 σ^STUDY*TRT2 = 1064.07 and solve for σ^STUDY*TRT2 = 88.99. Finally, substitute σ^2 = 435.83 and σ^STUDY*TRT2 = 88.99 into the first equation to obtain 435.83 + 7.06 (88.99) + 14.12 σ^STUDY2 = 885.17. Solving this equation gives σ^STUDY2 = –12.67. Since the variances are positive numbers by definition, a negative estimate is not satisfactory. Zero is often substituted instead of the negative estimate. However, this has ripple effects on other issues, such as the standard errors of estimates and test statistics that utilize the variance component estimates in their computations. From this perspective, there are legitimate reasons for not routinely setting negative estimates to zero. This problem is not limited to unbalanced data. Refer to Section 4.4.2.
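The back-substitution above can also be carried out in a short DATA step (a sketch, not part of the original program; the mean squares and coefficients are copied from Outputs 5.3 and 5.11, so the results differ slightly from the hand-rounded values in the text):

data anova_vc;
   ms_error    = 435.82536;     /* MS(Error) from Output 5.3    */
   ms_trtstudy = 1064.073320;   /* Type III MS(TRT*STUDY)       */
   ms_study    = 885.172158;    /* Type III MS(STUDY)           */
   sigma2_e        = ms_error;
   sigma2_trtstudy = (ms_trtstudy - sigma2_e) / 7.0585;
   sigma2_study    = (ms_study - sigma2_e - 7.0585*sigma2_trtstudy) / 14.117;
   put sigma2_e= sigma2_trtstudy= sigma2_study=;
run;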

5.5.2 Using the CONTRAST, ESTIMATE, and LSMEANS Statements in GLM with Unbalanced Mixed-Model Data

You must be very careful in using the CONTRAST, ESTIMATE, and LSMEANS statements with mixed-model data because their output is not automatically modified to accommodate random effects when you specify a RANDOM statement. In some cases it is not possible to appropriately modify the results. These comments pertain to both the balanced and unbalanced situations.

Run the statements

contrast 'trtB-trtA' trt -1 1;
estimate 'trtB-trtA' trt -1 1;
lsmeans trt / pdiff;
random study trt*study;
run;

Results appear in Output 5.12.

Output 5.12 Results of CONTRAST, ESTIMATE, and LSMEANS Statements with the RANDOM Statement

Unbalanced Two-way Classification
 
Contrast DF Contrast SS Mean Square F Value Pr > F
 
trtB-trtA 1 1843.572090 1843.572090 4.23 0.0420
 
    Standard    
Parameter Estimate Error t Value Pr > |t|
 
 
trtB-trtA 9.37497409 4.55823028 2.06 0.0420
 
Least Squares Means
 
  H0:LSMean1=
Standard   H0:LSMEAN=0 LSMean2 
TRT FLUSH LSMEAN Error Pr > |t| Pr > |t|
 
A 22.5913628   3.0141710 <.0001 0.0420
B 31.9663369 3.4193912 <.0001
 
Contrast   Contrast Expected Mean Square
 
trtB-trtA   Var(Error) + 4.6613 Var(TRT*STUDY) + Q(TRT)

You see that the F-test for trtB-trtA is the same as in the fixed case in Output 5.4. The expected mean square for the CONTRAST statement is printed when the RANDOM statement follows the CONTRAST statement. The expected mean square for trtB-trtA in Output 5.12 indicates an appropriate denominator for the F-statistic to be σ2 + 4.66 σSTUDY*TRT2. (Since there is only one degree of freedom for TRT, the sum of squares for the contrast trtB-trtA is the same as the Type III sum of squares for TRT in the ANOVA table. This would not be true with more degrees of freedom for TRT.) There is no mean square with this expectation. Thus, you cannot directly obtain an F-statistic for the contrast that has an appropriate denominator. You can use the expected mean squares to determine an appropriate combination of mean squares and then compute the F-statistic by hand. In this case, we know from the ANOVA results in Section 5.5.1 that the appropriate combination is 0.66*MS(STUDY*TRT) + 0.34*MS(Error), and has Satterthwaite’s approximate DF = 11.69. The appropriate F-statistic is then F=1843.57 / (0.66(1064.07) + 0.34(435.82)) = 2.17 and it has significance probability p=0.1674.
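The hand computation can be checked with a short DATA step (a sketch, not part of the original program; the mean squares, coefficients, and Satterthwaite degrees of freedom are copied from Outputs 5.3, 5.11, and 5.12):

data approx_f;
   ms_num = 1843.572090;                              /* contrast mean square          */
   ms_den = 0.6604*1064.073320 + 0.3396*435.82536;    /* combined denominator          */
   df_den = 11.689;                                   /* Satterthwaite approximate DF  */
   f      = ms_num / ms_den;
   p      = 1 - probf(f, 1, df_den);
   put f= p=;
run;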

The ESTIMATE statement cannot be modified to accommodate random effects. Therefore, the standard error and t-statistic from an ESTIMATE statement are usually invalid with mixed-model data.

You can specify the E=effect option in the LSMEANS statement, where effect is an effect in the MODEL statement. This sometimes is useful for declaring an appropriate mean square for computing standard errors of differences between LS means, but it almost certainly does not specify appropriate computations for standard errors of individual LS means. With unbalanced data there usually is no effect whose expected mean square provides the correct linear combination of variance components.
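For example, you could request that the TRT*STUDY mean square be used as the error term for comparing the TRT LS means (a sketch; as the discussion above indicates, whether this mean square is appropriate depends on the expected mean squares, and with unbalanced data it usually is not exactly right):

lsmeans trt / pdiff e=trt*study;   /* uses MS(TRT*STUDY) as the error term for the comparison */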

In summary, the expected mean squares printed for CONTRAST statements can be useful for determining appropriate combinations of mean squares for the denominators of F-statistics. Unfortunately, these combinations are not computed automatically for contrasts; the TEST option in the RANDOM statement applies only to effects in the MODEL statement. Standard errors for ESTIMATE and LSMEANS statements cannot be computed correctly in most cases.

Many of the shortcomings of the GLM procedure for analyzing unbalanced mixed-model data can be overcome by using the MIXED procedure, as you will see in the next section.

5.6 Using the MIXED Procedure to Analyze Unbalanced Mixed-Model Data

The MIXED procedure is used in the same way with unbalanced data as it is with balanced data. Run the statements

proc mixed data=drugs;
   class study trt;
   model flush=trt / ddfm=satterth;
   random study study*trt;
   contrast 'trtB-trtA' trt -1 1;
   estimate 'trtB-trtA' trt -1 1;
   lsmeans trt;
run;

Results appear in Output 5.13.

Output 5.13 Results of the MIXED Procedure with Unbalanced Data

Unbalanced Two-way Classification
 
The Mixed Procedure
 
Model Information
 
Data Set WORK.DRUGS
Dependent Variable FLUSH
Covariance Structure Variance Components
Estimation Method REML
Residual Variance Method Profile
Fixed Effects SE Method Model-Based
Degrees of Freedom Method Satterthwaite
 
Covariance Parameter
Estimates
 
Cov Parm Estimate
 
STUDY 0
TRT*STUDY 75.3629
Residual 447.57
 
Type 3 Tests of Fixed Effects
 
  Num Den    
Effect DF DF F Value Pr > F
 
TRT 1 9.3 1.88 0.2028
 
Estimates
 
    Standard   
Label Estimate Error   DF   t Value   Pr > |t|
 
trtB-trtA 7.8198 5.7076 9.3 1.37 0.2028
 
Contrasts
 
  Num Den    
Label DF DF F Value Pr > F
 
trtB-trtA 1 9.3 1.88 0.2028
 
Least Squares Means
 
  Standard  
Effect   TRT Estimate Error DF  t Value  Pr > |t|
 
TRT   A 22.3593 4.0316 9.6 5.55 0.0003
TRT   B 30.1791 4.0401 9 7.47 <.0001
 
        Standard      
Effect TRT _TRT Estimate Error  DF   t Value   Pr > |t|
 
TRT A B -7.8198 5.7076 9.3 -1.37 0.2028

First of all, you see that the REML estimate of σSTUDY2 is 0. (Recall that the ANOVA estimate you would obtain using the expected mean squares in Output 5.11 was negative.) The REML estimates of σSTUDY*TRT2 and σ2 are 75.36 and 447.57, respectively.

You see that the F-statistic is equal to 1.88, with p-value equal to 0.2028 in the “Type 3 Tests of Fixed Effects.” This is the test for the TRT null hypothesis H0: α1 – α2 = 0. The results are similar to the test using expected mean squares from the GLM procedure in Output 5.11.

The ESTIMATE statement produces an estimated difference equal to 7.82, with standard error 5.71. The resulting t-statistic for testing the null hypothesis H0: α1 - α2 = 0 has value t=1.37 and significance probability p=0.2028.

The CONTRAST statement produces results equivalent to the F-test in “Type 3 Tests of Fixed Effects.”

The LSMEANS statement produces LS means of 22.36 for drug A and 30.18 for drug B. These are slightly different from the LS means produced by GLM in Output 5.12. More importantly, the standard errors of the LS means in Output 5.13 are larger than the standard errors in Output 5.12 because the MIXED procedure computes the standard errors correctly. Likewise, the difference between LS means in Output 5.13 is less significant than in Output 5.12 because the MIXED procedure computes a t-statistic that is more nearly valid than does the GLM procedure. Details on these computations are described in Chapter 6.

5.7 Using the GLM and MIXED Procedures to Analyze Mixed-Model Data with Empty Cells

Refer once more to the DRUGS1 data set, which has no data for TRT=A in STUDY=41 (See Output 5.7). Run the statements

proc glm data=drugs1;
   class study trt;
   model flush=trt study trt*study / ss1 ss2 ss3 ss4;
   random study trt*study / test;
run;

You would get exactly the same ANOVA results from the MODEL statement that you saw in Output 5.8.

The RANDOM statement produces Types III and IV expected mean squares and the associated tests, as shown in Output 5.14.

Output 5.14 Results of the RANDOM Statement with Empty Cells

Unbalanced Two-way Classification
 
The GLM Procedure
 
Source   Type III Expected Mean Square
 
TRT   Var(Error) + 4.6613 Var(TRT*STUDY) + Q(TRT)
 
STUDY   Var(Error) + 7.8109 Var(TRT*STUDY) + 14.111 Var(STUDY)
 
TRT*STUDY   Var(Error) + 7.0585 Var(TRT*STUDY)
 
The GLM Procedure
Tests of Hypotheses for Mixed Model Analysis of Variance
 
Source DF Type III SS Mean Square F Value Pr > F
 
TRT 1 1843.572090 1843.572090 2.09 0.1724
 
Error 12.498 10999 880.038506
Error: 0.6604*MS(TRT*STUDY) + 0.3396*MS(Error)
 
Source   Type IV Expected Mean Square
 
TRT   Var(Error) + 4.6613 Var(TRT*STUDY) + Q(TRT)
 
STUDY   Var(Error) + 7.0961 Var(TRT*STUDY) + 13.16 Var(STUDY)
 
TRT*STUDY   Var(Error) + 7.0585 Var(TRT*STUDY)
 
Tests of Hypotheses for Mixed Model Analysis of Variance
 
Source DF Type IV SS Mean Square F Value Pr > F
 
TRT 1 1843.572090 1843.572090 2.09 0.1724
 
Error 12.498 10999 880.038506
Error: 0.6604*MS(TRT*STUDY) + 0.3396*MS(Error)

Types III and IV expected mean squares are the same for TRT because there are only two levels of the factor. Types III and IV expected mean squares for STUDY differ only slightly. A greater prevalence of empty cells would tend to cause greater differences between all aspects of Types III and IV, including the expected mean squares. Additional detail on the distinction between Types III and IV is presented in Chapter 6.

The F-tests based on the Types III and IV mean squares for TRT are the same, with F=2.09 and p=0.1724. The results could differ if TRT had more levels. As discussed following Output 5.11, there are no definitive reasons for using one of these tests instead of the other. This point is discussed further in Chapter 6.

CONTRAST, ESTIMATE, and LSMEANS statements in PROC GLM would produce the same results as in the fixed-effects case: the contrast and estimate of the drug difference are non-estimable, as is the LSMEAN for TRT A (see Output 5.9).

The MIXED procedure is used in the same way with unbalanced data as it is with balanced data, even with empty cells. Run the statements

proc mixed data=drugs1;
   class study trt;
   model flush=trt / ddfm=satterth;
   random study study*trt;
   contrast 'trtB-trtA' trt -1 1;
   estimate 'trtB-trtA' trt -1 1;
   lsmeans trt;
run;

Edited results appear in Output 5.15.

Output 5.15 Results of the MIXED Procedure with Empty Cells

The Mixed Procedure
 
Model Information
 
Covariance Parameter
Estimates
 
Cov Parm Estimate
 
STUDY 0
TRT*STUDY 77.0369
Residual 530.50
 
Type 3 Tests of Fixed Effects
 
  Num Den    
Effect DF DF F Value Pr > F
 
TRT 1 12.4 2.96 0.1103
 
Estimates
 
  Standard   
Label Estimate Error DF t Value Pr > |t|
 
trtB-trtA 9.9035 5.7585 12.4 1.72 0.1103
 
Contrasts
 
  Num Den    
Label DF DF F Value Pr > F
 
trtB-trtA 1 12.4 2.96 0.1103
 
Least Squares Means
 
    Standard  
Effect   TRT Estimate Error DF  t Value  Pr > |t|
 
TRT   A 22.3908 4.2200 13.4 5.31 0.0001
TRT   B 32.2943 3.9182 11.4 8.24 <.0001

Results of the tests, estimates, and standard errors are similar, but not identical, to those for the data with no empty cell in Output 5.13. Do not expect this to always happen; generally speaking, the more prevalent the empty cells, the greater the differences you can expect.

There is a very important point to observe in comparing the results of unbalanced data analyses with and without empty cells. Empty cells cause non-estimability of certain LS means and linear functions of model parameters. Thus, ESTIMATE and CONTRAST statements will not produce output if they specify non-estimable linear functions. This occurred when using PROC GLM with the data set DRUGS1, for both the fixed and mixed-model analyses, because GLM makes the same essential computations with or without a RANDOM statement. In other words, estimability is judged by GLM considering all effects fixed, even though a RANDOM statement is used. PROC MIXED, in contrast, judges estimability only in terms of the fixed effects. That is why complete results were presented for the LSMEANS, ESTIMATE, and CONTRAST statements in Output 5.15.

5.8 Summary and Conclusions about Using the GLM and MIXED Procedures to Analyze Unbalanced Mixed-Model Data

The GLM and MIXED procedures both have certain capabilities for analysis of mixed-model data, as described in Chapter 4. The GLM capabilities are oriented around analysis of variance, based on an ordinary least squares fit of the model in which random effects are treated as fixed effects. The RANDOM statement in PROC GLM produces expected mean squares, which can be used to construct F-statistics for tests of hypotheses. In balanced data situations, these F-statistics are often “exact,” meaning that, under the null hypothesis, the statistic has a true F-distribution. In unbalanced data applications, the distributions are only approximate; the tests are still useful, but must be used with caution. Moreover, there are no definitive guidelines for selecting a “type” of sum of squares for the numerator of the F-statistic. Standard errors computed by PROC GLM for LS means, differences between LS means, and ESTIMATE statements are generally unreliable. There are methods for determining appropriate standard errors of estimates from ESTIMATE and LSMEANS statements using the CONTRAST and RANDOM statements (Littell and Linda 1990; Milliken and Johnson 1994, Chapter 28), but these are tedious and are not feasible for many users.

The MIXED procedure, on the other hand, uses true mixed-model methodology. It builds the parameters for the random effects into the statistical model through the covariance structure, using either the RANDOM or the REPEATED statement. Test statistics, estimates, and standard errors of estimates for fixed effects are computed from principles of generalized least squares, with the random-effects parameters replaced by their estimates (see Chapter 6). Estimates computed in this manner are called estimated generalized least squares estimates. They are unbiased, and their standard errors are computed from a valid formula, except that the standard errors do not account for the variation in the estimates of the random-effects parameters. In most cases, this is not a serious problem. Test statistics for fixed effects are also computed using basically sound methodology, with the same caveat that variation due to estimation of the random-effects parameters is ignored. Determination of degrees of freedom for these statistics is complicated, especially with unbalanced data. PROC MIXED allows several options for computing degrees of freedom.
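For example (a brief illustration, not from the text), the DDFM= option in the MODEL statement selects the degrees-of-freedom method; DDFM=SATTERTH was used in Section 5.6, and DDFM=CONTAIN (the containment method) is the default when a RANDOM statement is present:

proc mixed data=drugs;
   class study trt;
   model flush = trt / ddfm=satterth;   /* alternatives include ddfm=contain and ddfm=betwithin */
   random study study*trt;
run;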
