Chapter 9 Multivariate Linear Models

9.1 Introduction

9.2 A One-Way Multivariate Analysis of Variance

9.3 Hotelling’s T² Test

9.4 A Two-Factor Factorial

9.5 Multivariate Analysis of Covariance

9.6 Contrasts in Multivariate Analyses

9.7 Statistical Background

9.1 Introduction

Although the methods presented in earlier chapters are sufficient for studying one dependent variable at a time, there are many situations in which several dependent variables are studied simultaneously. For example, in monitoring the growth of animals, a researcher might measure the length, weight, and girth of animals receiving different treatments. The goal of the experiment would be to see if the treatments had an effect on the growth of the animals. One way to determine this would be to use analysis of variance or regression methods to analyze the effects of the treatment on the length, weight, and girth of the animals. There are two problems with this approach. First, trying to interpret results produced by separate univariate (one variable at a time) analyses of each of the variables can be unwieldy. In the example with three dependent variables, this may not seem like a problem. But if you have ten or twenty dependent variables, the task could be substantial. Second, and more importantly, when many variables are studied simultaneously, they are almost always correlated—that is, the value of each variable may be related to the values of others. This is true for measurements such as height and weight, or responses to similar questions on a questionnaire. In cases like these, considering the univariate analyses separately would not take into account information contained in the data due to the correlation. Moreover, this approach could mislead a naïve researcher into believing that a factor has a very significant effect, when in fact it does not. On the other hand, a significant effect that only becomes apparent when all the dependent variables are studied simultaneously may not be discovered from the univariate analyses alone. In most multivariate data applications, you should usually examine the results of the multivariate tests first, then examine the univariate analyses cautiously if significant results do not appear in the multivariate analysis.

Although some of the details of a multivariate analysis differ from those of a univariate analysis, the two are similar in many ways. Experimental factors of interest are related to the dependent variables by a linear model, and functions of sums of squares are computed to test hypotheses about these factors. In general, if you have designed an experiment with only one dependent variable, the extension of the analysis to the multivariate case can be carried out in a very straightforward manner.

We present examples of several types of multivariate analyses. Basic theory for the methods is presented in Section 9.7, “Statistical Background.”

9.2 A One-Way Multivariate Analysis of Variance

Test scores from two exams taken by students with three different teachers are analyzed in Output 9.1.

Output 9.1 Two Exam Scores for Students in Three Teachers’ Classes

Obs    teach    score1    score2

  1    JAY          69        75
  2    JAY          69        70
  3    JAY          71        73
  4    JAY          78        82
  5    JAY          79        81
  6    JAY          73        75
  7    PAT          69        70
  8    PAT          68        74
  9    PAT          75        80
 10    PAT          78        85
 11    PAT          68        68
 12    PAT          63        68
 13    PAT          72        74
 14    PAT          63        66
 15    PAT          71        76
 16    PAT          72        78
 17    PAT          71        73
 18    PAT          70        73
 19    PAT          56        59
 20    PAT          77        79
 22    ROBIN        64        65
 23    ROBIN        74        74
 24    ROBIN        72        75
 25    ROBIN        82        84
 26    ROBIN        69        68
 27    ROBIN        76        76
 28    ROBIN        68        65
 29    ROBIN        78        79
 30    ROBIN        70        71
 31    ROBIN        60        61

We first perform an analysis of variance to compare the teacher means for each variable, SCORE1 and SCORE2. Run the SAS statements for these analyses:

proc glm;
   class teach;
   model score1 score2=teach;
run;

Results are shown in Output 9.2. Neither univariate analysis detects a significant difference among teachers: p = 0.4143 for SCORE1 and p = 0.5776 for SCORE2.

Output 9.2 Results of Univariate One-Way Analysis

The GLM Procedure

 

Dependent Variable: score1

Source DF Sum of Squares Mean Square F Value Pr > F
 
Model 2 60.6050831 30.3025415 0.91 0.4143
 
Error 28 932.8787879 33.3170996    
Corrected Total 30 993.4838710      
 
R-Square Coeff Var Root MSE score1 Mean
 
0.061003 8.144515 5.772097 70.87097
 
 
Source DF Type III SS  Mean Square F Value Pr > F
 
teach 2 60.60508309 30.30254154 0.91 0.4143

 

The GLM Procedure

 

Dependent Variable: score2

Source DF Sum of Squares Mean Square F Value Pr > F
 
Model 2 49.735861 24.867930 0.56 0.5776
 
Error 28 1243.941558 44.426484    
 
Corrected Total 30 1293.677419      
 
R-Square Coeff Var Root MSE score2 Mean
 
0.038445 9.062496 6.665320 73.54839
 
Source DF Type III SS  Mean Square F Value Pr > F
 
teach 2 49.73586091 24.86793046 0.56 0.5776

The objective now is to see if a model using both scores shows a difference among teachers. The following SAS statements are used for this analysis:

proc glm;
   class teach;
   model score1 score2=teach;
   manova h=teach / printh printe;
run;

The first three statements produce the usual univariate analyses of the two scores as shown in Output 9.2. The MANOVA statement produces results in Output 9.3.

Output 9.3 Results of One-Way Multivariate Analysis: The MANOVA Statement

The GLM Procedure
Multivariate Analysis of Variance

 

➊ E = Error SSCP Matrix

 

score1 score2
 
score1 932.87878788 1018.6818182
score2 1018.6818182 1243.9415584

 

➋ Partial Correlation Coefficients from the Error SSCP Matrix / Prob > |r|

 

DF = 28    score1      score2

score1     1.000000    0.945640
                       <.0001

score2     0.945640    1.000000
           <.0001

 

➌ H = Type III SSCP Matrix for teach

 

  score1 score2
 
score1 60.605083089 31.511730205
score2 31.511730205 49.735860913

 

Characteristic Roots and Vectors of: E Inverse * H, where
H = Type III SSCP Matrix for teach
E = Error SSCP Matrix

 

➍ Characteristic        Characteristic Vector V'EV=1

Root Percent score1 score2
 
0.43098027 91.86 -0.10044686 0.08416103
0.03821194 8.14 0.00675930 0.02275380

 

MANOVA Test Criteria and F Approximations for the Hypothesis of No Overall teach Effect

➎      H = Type III SSCP Matrix for teach
E = Error SSCP Matrix

 

S=2     M=-0.5     N=12.5

 

Statistic Value F Value Num DF Den DF Pr > F
 
Wilks' Lambda 0.67310116 2.95 4 54 0.0279
Pillai's Trace 0.33798387 2.85 4 56 0.0322
Hotelling-Lawley Trace 0.46919220 3.13 4 31.389 0.0281
Roy's Greatest Root 0.43098027 6.03 2 28 0.0066

 

NOTE: F Statistic for Roy's Greatest Root is an upper bound.
NOTE: F Statistic for Wilks' Lambda is exact.

 

The PRINTH and PRINTE options cause the printing of the hypothesis and error matrices, respectively. In addition, the PRINTE option produces a matrix of partial correlation coefficients derived from the error SSCP matrix. This correlation matrix represents the correlations of the dependent variables corrected for all the independent factors in the MODEL statement.

The results in Output 9.3 are described below. The callout numbers have been added to the output to key the following descriptions:

➊ The elements of the error matrix. The diagonal elements of this matrix represent the error sums of squares from the corresponding univariate analyses (see Output 9.2).

➋ The associated partial correlation matrix. In this example it appears that SCORE1 and SCORE2 are highly correlated (r=0.945640).

➌ The elements of the hypothesis matrix, H. Again, the diagonal elements correspond to the hypothesis sums of squares from the corresponding univariate analysis.

➍ The characteristic roots and vectors of E⁻¹H. The elements of the characteristic vector describe a linear combination of the analysis variables that produces the largest possible univariate F-ratio.

➎ The four multivariate test statistics, which are discussed in Section 9.7, “Statistical Background.” The values of S, M, and N, printed above the table of statistics, provide information that is used in constructing the F-approximations for the criteria. (For more information, see Morrison (1976).) All four tests give similar results, although this is not always the case. Note that the p-values for the “Hypothesis of No Overall TEACH Effect” are much lower for the multivariate tests than any of the univariate tests would indicate. This is an example of how viewing a set of variables together can help you detect differences that you would not detect by looking at the individual variables.
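You can verify the role of the first characteristic vector directly. The following sketch, which assumes the exam scores of Output 9.1 are stored in a data set named EXAMS (a hypothetical name; the original DATA step is not shown), computes the linear combination defined by that vector and analyzes it as a single dependent variable:

proc glm;
   /* hypothetical data set name for the scores in Output 9.1 */
   data canon;
      set exams;
      /* linear combination given by the first characteristic vector */
      canvar = -0.10044686*score1 + 0.08416103*score2;
   run;

   proc glm data=canon;
      class teach;
      model canvar=teach;
   run;

The F-ratio for CANVAR is the largest that any linear combination of SCORE1 and SCORE2 can attain for the TEACH effect; it matches the F-value reported for Roy’s Greatest Root in Output 9.3.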

9.3 Hotelling’s T² Test

Consider a common situation in multivariate analysis. You have several different measurements taken on each of several subjects, and you want to know if the means of the different variables are all the same. For example, you may have used different recording devices to measure the same phenomenon, or you may have observed subjects under a variety of conditions and administered a test under each of the conditions. If you had only two means to compare, you could use the familiar t-test, but it is important to use an analysis that takes into account the correlations among the dependent variables, just as in the previous examples, even though there are no independent factors in the model—that is, no terms on the right side of the MODEL statement. In this situation you could use Hotelling’s T² test. As an example, consider the following data taken from Morrison (1976). Weight gains in rats given a special diet were measured at one (GAIN1), two (GAIN2), three (GAIN3), and four (GAIN4) weeks after administration of the diet. The question of interest is whether the rats’ weight gains stayed constant over the course of the experiment; in other words, were the mean weight gains of the rats the same at each of the four weeks? Output 9.4 shows the data.

Output 9.4 Data for Hotelling’s T² Test

Obs gain1 gain2 gain3 gain4
 
1 29 28 25 33
2 33 30 23 31
3 25 34 33 41
4 18 33 29 35
5 25 23 17 30
6 24 32 29 22
7 20 23 16 31
8 28 21 18 24
9 18 23 22 28
10 25 28 29 30

Note that the following MODEL statement fits a model with only an intercept:

model gain1 gain2 gain3 gain4 = ;

Following this statement with a MANOVA statement to test the intercept tests the hypothesis that the means of the four weight gains are all 0. It does not test the hypothesis of interest—that the four means are equal. To test that hypothesis, the dependent variables must be transformed so that their transformed means are all 0 exactly when the original means are equal. It turns out that many different transformations have this effect. (This problem is discussed in detail in Chapter 8, “Repeated-Measures Analysis.”) One simple transformation that achieves this goal is subtracting one of the variables from each of the other variables; in this example, the first gain can be subtracted from each of the other gains. All four means are equal if and only if each of these differences is 0. In this way, Hotelling’s T² test for equality of means can be performed using the MANOVA statement.

One way of producing the transformed variables necessary to perform Hotelling’s T2 test is to produce new variables in a DATA step and to perform an analysis on the new variables. A quicker and more efficient way, however, is to use the M= option in the MANOVA statement. Using the M= option, you can perform an analysis on a set of variables that is a linear transformation of the original variables as listed on the left side of the equal sign in the MODEL statement.
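For comparison, here is a minimal sketch of the DATA-step approach, assuming the weight gains are stored in the data set WTGAIN that is analyzed below; it creates the difference variables explicitly and then tests whether their joint mean is 0:

data wtdiff;
   set wtgain;
   diff2=gain2-gain1;   /* differences from the first week's gain */
   diff3=gain3-gain1;
   diff4=gain4-gain1;
run;

proc glm data=wtdiff;
   model diff2 diff3 diff4= / nouni;
   manova h=intercept;
run;

The M= option shown next produces the same multivariate test without creating a new data set.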

The following SAS statements perform the appropriate analysis:

proc glm data=wtgain;
   model gain1 gain2 gain3 gain4 = / nouni;
   manova h=intercept
          m=gain2-gain1, gain3-gain1, gain4-gain1
          mnames=diff2 diff3 diff4 / summary;
run;

The NOUNI option in the MODEL statement suppresses the individual analyses of the gain variables. This is done because the multivariate hypothesis of equality of the four means is the hypothesis of interest. The SUMMARY option in the MANOVA statement produces analysis-of-variance tables for the transformed variables. The MNAMES= option provides labels for these transformed variables; if it is omitted, the procedure uses the names MVAR1, MVAR2, and so on. The results appear in Output 9.5.

Output 9.5 Analysis of Transformed Variables: Hotelling's T² Test

The GLM Procedure

 

Number of Observations    10

The GLM Procedure
Multivariate Analysis of Variance

 

M Matrix Describing Transformed Variables

 

  gain1     gain2     gain3     gain4
diff2 -1 1 0 0
diff3 -1 0 1 0
diff4 -1 0 0 1

Characteristic Roots and Vectors of: E Inverse * H, where
H = Type III SSCP Matrix for Intercept
E = Error SSCP Matrix

Variables have been transformed by the M Matrix

 

Characteristic   Characteristic Vector V'EV=1  
Root Percent diff2 diff3 diff4
 
2.97211676 100.00 -0.11825538 0.11174598 -0.02428445
0.00000000 0.00 -0.11610475 0.05331366 0.06160662
0.00000000 0.00 0.00519921 0.03899407 0.00000000
 

MANOVA Test Criteria and Exact F Statistics for the Hypothesis of No Overall Intercept Effect

on the Variables Defined by the M Matrix Transformation
H = Type III SSCP Matrix for Intercept
E = Error SSCP Matrix

S=1      M=0.5      N=2.5

 

Statistic Value F Value Num DF Den DF Pr > F
 
Wilks' Lambda 0.25175494 6.93 3 7 0.0167
Pillai's Trace 0.74824506 6.93 3 7 0.0167
Hotelling-Lawley Trace 2.97211676 6.93 3 7 0.0167
Roy's Greatest Root 2.97211676 6.93 3 7 0.0167

 

Dependent Variable: diff2

Source DF Type III SS Mean Square F Value Pr > F
Intercept 1 90.0000000 90.0000000 2.10 0.1814
Error 9 386.0000000 42.8888889    

 

Dependent Variable: diff3

Source DF Type III SS Mean Square F Value Pr > F
Intercept 1 1.6000000 1.6000000 0.03 0.8735
Error 9 536.4000000 59.6000000    

 

Dependent Variable: diff4

Source DF Type III SS Mean Square F Value Pr > F
Intercept 1 360.0000000 360.0000000 6.53 0.0309
Error 9 496.0000000 55.1111111    

Although PROC GLM does not print Hotelling’s T² statistic itself, it produces the correct F-test and probability level. Note that all the multivariate test statistics result in the same F-values and that they are labeled as “Exact F Statistics.” These tests are always exact when the hypothesis being tested has only 1 degree of freedom, such as the hypothesis of no INTERCEPT effect in this example. To calculate the actual value of the T² statistic, use the following formula:

T² = (nobs − 1)((1 / Λ) − 1)

Here nobs is the number of observations and Λ is the value of Wilks’ Lambda printed by PROC GLM. (See Section 9.7, “Statistical Background.”) In this case, the calculation gives T² = (10 − 1)(1/0.251755 − 1) = 26.749.
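The arithmetic can be checked with a short DATA step; the value of Λ is copied from Output 9.5:

data t2;
   nobs=10;
   lambda=0.25175494;         /* Wilks' Lambda from Output 9.5 */
   t2=(nobs-1)*(1/lambda-1);  /* Hotelling's T-square = 26.749 */
   put t2=;
run;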

9.4 A Two-Factor Factorial

The total weight of a mature cotton boll can be divided into three parts: the weight of the seeds, the weight of the lint, and the weight of the bract. Lint and seed constitute the economic yield of cotton.

In the following data, the differences in the three components of the cotton bolls due to two varieties (VARIETY) and two plant spacings (SPACING) are studied. Five plants are chosen at random from each of the four treatment combinations. Two bolls are picked from each plant, and the weights of the seeds, lint, and bract are recorded. The most appropriate error term for testing VARIETY, SPACING, and the interaction of the two is the variation among plants. In univariate analyses, the TEST statement is used to specify this alternative error term; for multivariate analyses, the error term is specified in the MANOVA statement or statements. (Each MANOVA statement can have, at most, one error term specified.)

The following SAS statements are used:

proc glm;
   class variety spacing plant;
   model seed lint bract=variety spacing variety*spacing
                         plant(variety spacing)/ss3;
   test h=variety|spacing e=plant(variety spacing);
   means variety|spacing;
   manova h=variety|spacing e=plant(variety spacing);
run;

The data used in this analysis appear in Output 9.6.

Output 9.6 Data for Two-Way Multivariate Analysis

Obs   variety spacing plant seed lint bract
 
1   213 30 3 3.1 1.7 2.0
2   213 30 3 1.5 1.7 1.4
3   213 30 5 3.0 1.9 1.8
4   213 30 5 1.4 0.9 1.3
5   213 30 6 2.3 1.7 1.5
6   213 30 6 2.2 2.0 1.4
7   213 30 8 0.4 0.9 1.2
8   213 30 8 1.7 1.6 1.3
9   213 30 9 1.8 1.2 1.0
10   213 30 9 1.2 0.8 1.0
11   213 40 0 2.0 1.0 1.9
12   213 40 0 1.5 1.5 1.7
13   213 40 1 1.8 1.1 2.1
14   213 40 1 1.0 1.3 1.1
15   213 40 2 1.3 1.1 1.3
16   213 40 2 2.9 1.9 1.7
17   213 40 3 2.8 1.2 1.3
18   213 40 3 1.8 1.2 1.2
19   213 40 4 3.2 1.8 2.0
20   213 40 4 3.2 1.6 1.9
21   37 30 1 3.2 2.6 1.4
22   37 30 1 2.8 2.1 1.2
23   37 30 2 3.6 2.4 1.5
24   37 30 2 0.9 0.8 0.8
25   37 30 3 4.0 3.1 1.8
26   37 30 3 4.0 2.9 1.5
27   37 30 5 3.7 2.7 1.6
28   37 30 5 2.6 1.5 1.3
29   37 30 8 2.8 2.2 1.2
30   37 30 8 2.9 2.3 1.2
31   37 40 1 4.1 2.9 2.0
32   37 40 1 3.4 2.0 1.6
33   37 40 3 3.7 2.3 2.0
34   37 40 3 3.2 2.2 1.8
35   37 40 4 3.4 2.7 1.5
36   37 40 4 2.9 2.1 1.2
37   37 40 5 2.5 1.4 0.8
38   37 40 5 3.6 2.4 1.6
39   37 40 6 3.1 2.3 1.4
40   37 40 6 2.5 1.5 1.5

Note that the TEST statement is used to obtain the appropriate error terms. The results of the three univariate analyses appear in Output 9.7.

Output 9.7 Results of Two-Way Multivariate Analysis: Univariate Analyses

Dependent Variable: seed

Source DF Sum of Squares Mean Square F Value Pr > F
 
Model 19 24.06500000 1.26657895 2.22 0.0425
 
Error 20 11.43000000 0.57150000    
 
Corrected Total 39 35.49500000      
 
R-Square Coeff Var Root MSE seed Mean
 
0.677983 29.35830 0.755976 2.575000
 
Source DF Type III SS Mean Square F Value Pr > F
 
variety 1 12.99600000 12.99600000 22.74 0.0001
spacing 1 0.57600000 0.57600000 1.01 0.3274
variety*spacing 1 0.02500000 0.02500000 0.04 0.8364
plant(variet*spacin) 16 10.46800000 0.65425000 1.14 0.3823
 

Tests of Hypotheses Using the Type III MS for plant(variet*spacin) as an Error Term

 
Source DF Type III SS Mean Square F Value Pr > F
 
variety 1 12.99600000 12.99600000 19.86 0.0004
spacing 1 0.57600000 0.57600000 0.88 0.3620
variety*spacing 1 0.02500000 0.02500000 0.04 0.8475

 

Dependent Variable: lint

Source DF Sum of Squares Mean Square F Value Pr > F
 
Model 19 10.62875000 0.55940789 2.28 0.0377
 
Error 20 4.91500000 0.24575000    
 
Corrected Total 39 15.54375000      
 
R-Square Coeff Var Root MSE lint Mean
 
0.683796 27.35072 0.495732 1.812500
 
Source DF Type III SS Mean Square F Value Pr > F
 
variety 1 6.64225000 6.64225000 27.03 <.0001
spacing 1 0.05625000 0.05625000 0.23 0.6375
variety*spacing 1 0.00025000 0.00025000 0.00 0.9749
plant(variet*spacin) 16 3.93000000 0.24562500 1.00 0.4934
 

Tests of Hypotheses Using the Type III MS for plant(variet*spacin) as an Error Term

 
Source DF Type III SS Mean Square F Value Pr > F
 
variety 1 6.64225000 6.64225000 27.04 <.0001
spacing 1 0.05625000 0.05625000 0.23 0.6387
variety*spacing 1 0.00025000 0.00025000 0.00 0.9749
 

Dependent Variable: bract

Source DF Sum of Squares Mean Square F Value Pr > F
 
Model 19 2.70500000 0.14236842 1.63 0.1442
 
Error 20 1.75000000 0.08750000    
 
Corrected Total 39 4.45500000      
 
R-Square Coeff Var Root MSE bract Mean
 
0.607183 20.05451 0.295804 1.475000
 
Source DF Type III SS Mean Square F Value Pr > F
 
variety 1 0.03600000 0.03600000 0.41 0.5285
spacing 1 0.44100000 0.44100000 5.04 0.0362
variety*spacing 1 0.00400000 0.00400000 0.05 0.8329
plant(variet*spacin) 16 2.22400000 0.13900000 1.59 0.1626
 

Tests of Hypotheses Using the Type III MS for plant(variet*spacin) as an Error Term

 
Source DF Type III SS Mean Square F Value Pr > F
 
variety 1 0.03600000 0.03600000 0.26 0.6178
spacing 1 0.44100000 0.44100000 3.17 0.0939
variety*spacing 1 0.00400000 0.00400000 0.03 0.8674

VARIETY has a statistically significant effect on SEED and LINT; no effects are statistically significant for BRACT.

The means for all levels and combinations of levels of VARIETY and SPACING produced by the MEANS statement appear in Output 9.8.

Output 9.8 Results of Two-Way Multivariate Analysis: The MEANS Statement

The GLM Procedure

 

Level of          ----seed----              ----lint----              ----bract----
variety     N     Mean         Std Dev      Mean         Std Dev      Mean         Std Dev

37         20     3.14500000   0.72799291   2.22000000   0.57087191   1.44500000   0.32843328
213        20     2.00500000   0.80881655   1.40500000   0.37763112   1.50500000   0.35314378

Level of          ----seed----              ----lint----              ----bract----
spacing     N     Mean         Std Dev      Mean         Std Dev      Mean         Std Dev

30         20     2.45500000   1.04351380   1.85000000   0.70150215   1.37000000   0.29037181
40         20     2.69500000   0.86540225   1.77500000   0.56835404   1.58000000   0.35629674

Level of   Level of          ----seed----              ----lint----
variety    spacing     N     Mean         Std Dev      Mean         Std Dev

37         30         10     3.05000000   0.91439111   2.26000000   0.68182761
37         40         10     3.24000000   0.51251016   2.18000000   0.46856756
213        30         10     1.86000000   0.82219219   1.44000000   0.44771022
213        40         10     2.15000000   0.81137743   1.37000000   0.31287200

Level of   Level of          ----bract----
variety    spacing     N     Mean         Std Dev

37         30         10     1.35000000   0.27588242
37         40         10     1.54000000   0.36270588
213        30         10     1.39000000   0.31780497
213        40         10     1.62000000   0.36453928

The results of the multivariate analyses appear in Output 9.9.

Output 9.9 Results of Two-Way Multivariate Analysis

The GLM Procedure
Multivariate Analysis of Variance

 

Characteristic Roots and Vectors of: E Inverse * H, where
H = Type III SSCP Matrix for variety
E = Type III SSCP Matrix for plant(variet*spacin)

 

Characteristic   Characteristic Vector V'EV=1  
Root Percent seed lint bract
 
3.43919116 100.00 0.13061027 0.48969379 -0.64083380
0.00000000 0.00 -0.52400501 0.74035630 0.10041133
0.00000000 0.00 0.01704640 0.02228579 0.62659680

 

MANOVA Test Criteria and Exact F Statistics for the Hypothesis of No Overall variety Effect

H = Type III SSCP Matrix for variety
E = Type III SSCP Matrix for plant(variet*spacin)

 

S=1     M=0.5     N=6

 

Statistic Value F Value Num DF Den DF Pr > F
 
Wilks' Lambda 0.22526626 16.05 3 14 <.0001
Pillai's Trace 0.77473374 16.05 3 14 <.0001
Hotelling-Lawley Trace 3.43919116 16.05 3 14 <.0001
Roy's Greatest Root 3.43919116 16.05 3 14 <.0001

 

The GLM Procedure
Multivariate Analysis of Variance

 

Characteristic Roots and Vectors of: E Inverse * H, where
H = Type III SSCP Matrix for spacing
E = Type III SSCP Matrix for plant(variet*spacin)

 

Characteristic   Characteristic Vector V'EV=1  
Root Percent seed lint bract
 
0.63209472 100.00 0.27027458 -0.73247531 0.62673044
0.00000000 0.00 0.24001974 0.22618207 -0.19352896
0.00000000 0.00 -0.40158816 0.44804654 0.61897451
 

MANOVA Test Criteria and Exact F Statistics for the Hypothesis of No Overall spacing Effect
H = Type III SSCP Matrix for spacing
E = Type III SSCP Matrix for plant(variet*spacin)

 

S=1     M=0.5     N=6

 

Statistic Value F Value Num DF Den DF Pr > F
 
Wilks' Lambda 0.61270954 2.95 3 14 0.0692
Pillai's Trace 0.38729046 2.95 3 14 0.0692
Hotelling-Lawley Trace 0.63209472 2.95 3 14 0.0692
Roy's Greatest Root 0.63209472 2.95 3 14 0.0692

 

The GLM Procedure
Multivariate Analysis of Variance

 

Characteristic Roots and Vectors of: E Inverse * H, where
H = Type III SSCP Matrix for variety*spacing
E = Type III SSCP Matrix for plant(variet*spacin)

 

Characteristic   Characteristic Vector V'EV=1  
Root Percent seed lint bract
0.00616711 100.00 0.42143581 -0.67443149 0.35670210
0.00000000 0.00 -0.33315217 0.01818488 0.82833421
0.00000000 0.00 -0.05772656 0.57726561 0.00000000

 

MANOVA Test Criteria and Exact F Statistics for the Hypothesis of No Overall variety*spacing Effect
H = Type III SSCP Matrix for variety*spacing
E = Type III SSCP Matrix for plant(variet*spacin)

 

S=1     M=0.5     N=6

 

Statistic Value F Value Num DF Den DF Pr > F
 
Wilks' Lambda 0.99387069 0.03 3 14 0.9931
Pillai's Trace 0.00612931 0.03 3 14 0.9931
Hotelling-Lawley Trace 0.00616711 0.03 3 14 0.9931
Roy's Greatest Root 0.00616711 0.03 3 14 0.9931

The only highly significant effect is VARIETY, and all four multivariate statistics are the same because there is only one hypothesis degree of freedom.

9.5 Multivariate Analysis of Covariance

This section illustrates multivariate analysis of covariance using the data on orange sales presented in Output 7.6. Sales of two types of oranges are related to experimentally determined prices (PRICE) as well as stores (STORE) and days of the week (DAY). The analysis is expanded here to consider the simultaneous multivariate relationship of the price to both types of oranges.

The following SAS statements are used for the analysis:

proc glm;
   class store day;
   model q1 q2=store day p1 p2 / nouni;
   manova h=store day p1 p2 / printh printe;
run;

Note that PROC ANOVA is not appropriate in this situation because of the presence of the covariates. The NOUNI option in the MODEL statement suppresses printing of the univariate analyses that are already shown in Chapter 7, “Analysis of Covariance.” Results of the multivariate analysis appear in Output 9.10.

Output 9.10 Results of Multivariate Analysis of Covariance

The GLM Procedure
Multivariate Analysis of Variance

 

E = Error SSCP Matrix

 

  q1 q2
q1 408.30824182 74.603217758
q2 74.603217758 706.94116552

 

Partial Correlation Coefficients from the Error SSCP Matrix / Prob > |r|

 

DF = 23    q1          q2

q1         1.000000    0.138858
                       0.5176

q2         0.138858    1.000000
           0.5176

 

H = Type III SSCP Matrix for store

 

  q1 q2
 
q1 223.83267344 93.801152319
q2 93.801152319 155.09933793

 

Characteristic Roots and Vectors of: E Inverse * H, where
H = Type III SSCP Matrix for store
E = Error SSCP Matrix

 

Characteristic   Characteristic Vector V'EV=1
Root Percent q1 q2
 
0.57363829 78.23 0.04622384 0.00941459
0.15960322 21.77 -0.01899048 0.03679295

 

MANOVA Test Criteria and F Approximations for the Hypothesis of No Overall store Effect
H = Type III SSCP Matrix for store
E = Error SSCP Matrix

 

S=2     M=1     N=10

 

Statistic Value F Value Num DF Den DF Pr > F
 
Wilks' Lambda 0.54800645 1.54 10 44 0.1564
Pillai's Trace 0.50216601 1.54 10 46 0.1553
Hotelling-Lawley Trace 0.73324151 1.57 10 30.372 0.1634
Roy's Greatest Root 0.57363829 2.64 5 23 0.0501

 

NOTE: F Statistic for Roy's Greatest Root is an upper bound.
NOTE: F Statistic for Wilks' Lambda is exact.

 

H = Type III SSCP Matrix for day

 

  q1 q2
 
q1 433.09686996 461.05064188
q2 461.05064188 614.4088834

 

Characteristic Roots and Vectors of: E Inverse * H, where
H = Type III SSCP Matrix for day
E = Error SSCP Matrix

 

Characteristic   Characteristic Vector V'EV=1
Root Percent q1 q2
1.60708776 93.18 0.03517603 0.02300242
0.11766546 6.82 -0.03549548 0.03021993

 

MANOVA Test Criteria and F Approximations for the Hypothesis of No Overall day Effect
H = Type III SSCP Matrix for day
E = Error SSCP Matrix

 

S=2     M=1     N=10

 

Statistic Value F Value Num DF Den DF Pr > F
 
Wilks' Lambda 0.34318834 3.11 10 44 0.0044
Pillai's Trace 0.72170813 2.60 10 46 0.0137
Hotelling-Lawley Trace 1.72475321 3.69 10 30.372 0.0026
Roy's Greatest Root 1.60708776 7.39 5 23 0.0003

 

NOTE: F Statistic for Roy's Greatest Root is an upper bound.
NOTE: F Statistic for Wilks' Lambda is exact.

 

H = Type III SSCP Matrix for p1

 

  q1 q2
q1 538.16885116 -212.5196287
q2 -212.5196287 83.922717744

 

Characteristic Roots and Vectors of: E Inverse * H, where
H = Type III SSCP Matrix for p1
E = Error SSCP Matrix

 

Characteristic   Characteristic Vector V'EV=1
Root Percent q1 q2
1.57701930 100.00 0.04805513 -0.01539025
0.00000000 0.00 0.01371082 0.03472026

 

The GLM Procedure
Multivariate Analysis of Variance

 

MANOVA Test Criteria and Exact F Statistics for the Hypothesis of No Overall p1 Effect
H = Type III SSCP Matrix for p1
E = Error SSCP Matrix

 

S=1     M=0     N=10

 

Statistic Value F Value Num DF Den DF Pr > F
 
Wilks' Lambda 0.38804521 17.35 2 22 <.0001
Pillai's Trace 0.61195479 17.35 2 22 <.0001
Hotelling-Lawley Trace 1.57701930 17.35 2 22 <.0001
Roy's Greatest Root 1.57701930 17.35 2 22 <.0001

 

H = Type III SSCP Matrix for p2

 

  q1 q2
 
q1 39.542251923 -183.5850939
q2 -183.5850939 852.34110489

 

Characteristic Roots and Vectors of: E Inverse * H, where
H = Type III SSCP Matrix for p2
E = Error SSCP Matrix

 

Characteristic   Characteristic Vector V'EV=1
Root Percent q1 q2
1.42489030 100.00 -0.01960102 0.03666503
0.00000000 0.00 0.04596827 0.00990107

 

MANOVA Test Criteria and Exact F Statistics for the Hypothesis of No Overall p2 Effect
H = Type III SSCP Matrix for p2
E = Error SSCP Matrix

 

S=1     M=0     N=10

 

Statistic Value F Value Num DF Den DF Pr > F
 
Wilks' Lambda 0.41238979 15.67 2 22 <.0001
Pillai's Trace 0.58761021 15.67 2 22 <.0001
Hotelling-Lawley Trace 1.42489030 15.67 2 22 <.0001
Roy's Greatest Root 1.42489030 15.67 2 22 <.0001

For the STORE effect, none of the statistics produces significant results. This is not totally consistent with the univariate results, in which the STORE effect on sales of the first type of oranges is nearly significant at the 5% level.

The DAY effect is quite significant according to all statistics, although there is a considerable difference in the level of significance (Pr > F) among the statistics. Also, there is one dominant eigenvalue, indicating that the trend in sales over days is roughly parallel for the two types of oranges. This can be verified by printing and plotting the least-squares means.
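For example, adding an LSMEANS statement to the analysis prints the adjusted daily means for both types of oranges, which can then be plotted (a minimal sketch repeating the model from the beginning of this section):

proc glm;
   class store day;
   model q1 q2=store day p1 p2 / nouni;
   lsmeans day;
run;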

The results for the PRICE effects are relatively straightforward. Even though P1 and P2 are each significant for only one dependent variable in the univariate analyses, the multivariate analysis indicates that their effects are substantial enough to be significant overall.

9.6 Contrasts in Multivariate Analyses

Chapter 3, “Analysis of Variance for Balanced Data,” discusses how specialized questions concerning certain levels of factors in univariate analyses can be answered by using the CONTRAST statement to define a hypothesis to be tested. This same technique can be useful in multivariate analyses of variance. The GLM procedure prints output for CONTRAST statements as part of its multivariate analysis. As an example, consider the study described in Section 9.5, “Multivariate Analysis of Covariance.” Assume you want to know if Saturday sales differ from weekday sales, averaged across the two types of oranges. Because the levels for DAY are coded as 1 through 6 corresponding to Monday through Saturday, you need to construct a contrast that compares the average of the first five levels of DAY to the sixth. The following SAS statement is required:

contrast 'sat vs weekdays' day .2 .2 .2 .2 .2 -1;

The label sat vs weekdays appears in the output to identify the contrast. If this CONTRAST statement is appended to the program of the previous section, preceding the MANOVA statement, then Output 9.11 is produced. The CONTRAST statement must precede the MANOVA statement so that PROC GLM knows that the multivariate test for the contrast is wanted.
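For reference, the complete program is sketched below; it simply combines the statements of Section 9.5 with the new CONTRAST statement:

proc glm;
   class store day;
   model q1 q2=store day p1 p2 / nouni;
   contrast 'sat vs weekdays' day .2 .2 .2 .2 .2 -1;
   manova h=store day p1 p2 / printh printe;
run;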

Output 9.11 Results of Multivariate Analysis: The CONTRAST Statement

The GLM Procedure
Multivariate Analysis of Variance

 

H = Contrast SSCP Matrix for sat vs weekdays

 

  q1 q2
 
q1 9.3680815712 7.8469238548
q2 7.8469238548 6.5727666348

 

Characteristic Roots and Vectors of: E Inverse * H, where
H = Contrast SSCP Matrix for sat vs weekdays
E = Error SSCP Matrix

 

Characteristic   Characteristic Vector V'EV=1
Root Percent q1 q2
 
0.02873910 100.00 0.04110205 0.01705466
0.00000000 0.00 -0.02842364 0.03393368

 

MANOVA Test Criteria and Exact F Statistics for the
Hypothesis of No Overall sat vs weekdays Effect
H = Contrast SSCP Matrix for sat vs weekdays
E = Error SSCP Matrix

 

S=1     M=0     N=10

 

Statistic Value F Value Num DF Den DF Pr > F
 
Wilks' Lambda 0.97206377 0.32 2 22 0.7322
Pillai's Trace 0.02793623 0.32 2 22 0.7322
Hotelling-Lawley Trace 0.02873910 0.32 2 22 0.7322
Roy's Greatest Root 0.02873910 0.32 2 22 0.7322

The results of the multivariate tests indicate no significant overall difference between the Saturday and weekday sales for the two types of oranges (Pr > F=0.7322).

9.7 Statistical Background

The multivariate linear model can be written as

Y = XB + U

where

Y

is an n × k matrix of observed values of k dependent variables or responses. Each column corresponds to a specific dependent variable and each row to an observation.

X

is an n × m matrix of n observations on the m independent variables (which may contain dummy variables).

B

is an m × k matrix of regression coefficients or parameters. Each column of B is a vector of coefficients corresponding to one of the k dependent variables, and each row contains the coefficients associated with one of the m independent variables.

U

is the n × k matrix of the n random errors, with columns corresponding to the dependent variables.

The matrix of estimated coefficients is

B̂ = (X'X)⁻¹X'Y

Each column of B̂ is the vector of estimated coefficients that would be obtained by fitting a separate univariate regression for the corresponding response variable.
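The following PROC IML sketch makes this concrete for a toy full-rank example with two responses. (The data are invented for illustration; for models containing dummy variables, PROC GLM actually uses a generalized inverse of X'X rather than the ordinary inverse shown here.)

proc iml;
   /* toy data: n=4 observations, intercept plus one regressor, k=2 responses */
   X = {1 1, 1 2, 1 3, 1 4};
   Y = {2 1, 4 3, 5 6, 8 7};
   Bhat = inv(X`*X) * X` * Y;   /* each column matches the separate univariate fit */
   print Bhat;
quit;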

The partitioning of sums of squares is parallel to that developed in previous chapters, except that the partitions consist of k × k matrices of sums of squares and crossproducts:

Source    SSCP Matrix

TOTAL     Y'Y
MODEL     B̂'X'Y
ERROR     Y'Y − B̂'X'Y

In the univariate analysis of variance, the F-statistic is the statistic of choice in most cases for testing hypotheses about the factors being considered. Recall that this statistic is derived by taking the ratio of two sums of squares, one derived from the hypothesis being tested and the other derived from an appropriate error term. In multivariate linear models, these sums of squares are replaced by matrices of sums of squares and crossproducts. These matrices are represented by H for the hypothesis, corresponding to the numerator sum of squares, and E for the error matrix, corresponding to the denominator sum of squares. Since division of matrices is not possible, E⁻¹H is the matrix that is the basis for test statistics for multivariate hypotheses. Four different functions of this matrix are used as test statistics and are available in the GLM, ANOVA, and other multivariate procedures in SAS. Each of these statistics is a function of the characteristic roots (also known as eigenvalues) of the matrix E⁻¹H. In the formulas below, λᵢ represents the characteristic roots.

Corresponding to each characteristic root is a characteristic vector, or eigenvector, that represents a linear combination of the dependent variables being analyzed. A function of the characteristic root, λ/(1 + λ), is the value of R² that would be obtained if the linear combination of dependent variables represented by the corresponding characteristic vector were used as the dependent variable in a univariate analysis of the same model. For this reason, this function of the characteristic root is called the squared canonical correlation. In the formulas below, rᵢ² represents the squared canonical correlations.

Hotelling-Lawley Trace

   Tr(E⁻¹H) = ∑ λᵢ = ∑ rᵢ²/(1 − rᵢ²)

Pillai's Trace

   Tr(H(H + E)⁻¹) = ∑ λᵢ/(1 + λᵢ) = ∑ rᵢ²

Wilks' Lambda

   Λ = |E| / |H + E| = ∏ 1/(1 + λᵢ) = ∏ (1 − rᵢ²)

Roy's Greatest Root

   max λᵢ (the largest characteristic root)
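As a numerical check on these formulas, the following PROC IML sketch reproduces all four statistics from the H and E matrices for the TEACH effect in Output 9.3. (The matrix values are copied from that output; E is factored so that a symmetric matrix with the same characteristic roots as E⁻¹H can be passed to the eigenvalue routine.)

proc iml;
   H = {60.605083089 31.511730205, 31.511730205 49.735860913};
   E = {932.87878788 1018.6818182, 1018.6818182 1243.9415584};
   U = root(E);                       /* Cholesky factor: E = U`*U         */
   M = inv(U`) * H * inv(U);          /* symmetric; same roots as inv(E)*H */
   lambda = eigval(M);                /* characteristic roots              */
   hlt    = sum(lambda);              /* Hotelling-Lawley Trace = 0.46919  */
   pillai = sum(lambda/(1+lambda));   /* Pillai's Trace         = 0.33798  */
   wilks  = prod(1/(1+lambda));       /* Wilks' Lambda          = 0.67310  */
   roy    = max(lambda);              /* Roy's Greatest Root    = 0.43098  */
   print lambda hlt pillai wilks roy;
quit;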

Not one of these criteria has been identified as being universally superior to the others, although there are hypothesized situations where one criterion may outperform the others. Because we generally do not know the exact form of the alternative hypotheses being studied, the decision of which test criterion to use often becomes a matter of personal choice. Wilks’ criterion is derived from a likelihood-ratio approach and appeals to some statisticians on those grounds.
