Chapter 9 Multivariate Linear Models
9.2 A One-Way Multivariate Analysis of Variance
9.5 Multivariate Analysis of Covariance
9.6 Contrasts in Multivariate Analyses
Although the methods presented in earlier chapters are sufficient for studying one dependent variable at a time, there are many situations in which several dependent variables are studied simultaneously. For example, in monitoring the growth of animals, a researcher might measure the length, weight, and girth of animals receiving different treatments. The goal of the experiment would be to see if the treatments had an effect on the growth of the animals. One way to determine this would be to use analysis of variance or regression methods to analyze the effects of the treatment on the length, weight, and girth of the animals.

There are two problems with this approach. First, trying to interpret results produced by separate univariate (one variable at a time) analyses of each of the variables can be unwieldy. In the example with three dependent variables, this may not seem like a problem. But if you have ten or twenty dependent variables, the task could be substantial. Second, and more importantly, when many variables are studied simultaneously, they are almost always correlated—that is, the value of each variable may be related to the values of others. This is true for measurements such as height and weight, or responses to similar questions on a questionnaire. In cases like these, considering the univariate analyses separately would not take into account information contained in the data due to the correlation. Moreover, this approach could mislead a naïve researcher into believing that a factor has a very significant effect, when in fact it does not. On the other hand, a significant effect that only becomes apparent when all the dependent variables are studied simultaneously may not be discovered from the univariate analyses alone.

In most multivariate data applications, you should usually examine the results of the multivariate tests first, then examine the univariate analyses cautiously if significant results do not appear in the multivariate analysis.
Although some of the details of a multivariate analysis differ from those of a univariate analysis, the two are similar in many ways. Experimental factors of interest are related to the dependent variables by a linear model, and functions of sums of squares are computed to test hypotheses about these factors. In general, if you have designed an experiment with only one dependent variable, the extension of the analysis to the multivariate case can be carried out in a very straightforward manner.
We present examples of several types of multivariate analyses. Basic theory for the methods is presented in Section 9.7, “Statistical Background.”
Test scores from two exams taken by students with three different teachers are analyzed in Output 9.1.
Output 9.1 Two Exam Scores for Students in Three Teachers’ Classes
Obs | teach | score1 | score2 |
1 | JAY | 69 | 75 |
2 | JAY | 69 | 70 |
3 | JAY | 71 | 73 |
4 | JAY | 78 | 82 |
5 | JAY | 79 | 81 |
6 | JAY | 73 | 75 |
7 | PAT | 69 | 70 |
8 | PAT | 68 | 74 |
9 | PAT | 75 | 80 |
10 | PAT | 78 | 85 |
11 | PAT | 68 | 68 |
12 | PAT | 63 | 68 |
13 | PAT | 72 | 74 |
14 | PAT | 63 | 66 |
15 | PAT | 71 | 76 |
16 | PAT | 72 | 78 |
17 | PAT | 71 | 73 |
18 | PAT | 70 | 73 |
19 | PAT | 56 | 59 |
20 | PAT | 77 | 79 |
22 | ROBIN | 64 | 65 |
23 | ROBIN | 74 | 74 |
24 | ROBIN | 72 | 75 |
25 | ROBIN | 82 | 84 |
26 | ROBIN | 69 | 68 |
27 | ROBIN | 76 | 76 |
28 | ROBIN | 68 | 65 |
29 | ROBIN | 78 | 79 |
30 | ROBIN | 70 | 71 |
31 | ROBIN | 60 | 61 |
We first perform an analysis of variance to compare the teacher means for each variable, SCORE1 and SCORE2. Run the SAS statements for these analyses:
proc glm;
class teach;
model score1 score2=teach;
run;
Results are shown in Output 9.2. Neither univariate analysis shows a significant difference among teachers, for either SCORE1 (p=0.4143) or SCORE2 (p=0.5776).
Output 9.2 Results of Univariate One-Way Analysis
The GLM Procedure
Dependent Variable: score1
Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
Model | 2 | 60.6050831 | 30.3025415 | 0.91 | 0.4143 |
Error | 28 | 932.8787879 | 33.3170996 | ||
Corrected Total | 30 | 993.4838710 | |||
R-Square | Coeff Var | Root MSE | score1 Mean |
0.061003 | 8.144515 | 5.772097 | 70.87097 |
Source | DF | Type III SS | Mean Square | F Value | Pr > F |
teach | 2 | 60.60508309 | 30.30254154 | 0.91 | 0.4143 |
The GLM Procedure
Dependent Variable: score2
Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
Model | 2 | 49.735861 | 24.867930 | 0.56 | 0.5776 |
Error | 28 | 1243.941558 | 44.426484 | ||
Corrected Total | 30 | 1293.677419 |
R-Square | Coeff Var | Root MSE | score2 Mean |
0.038445 | 9.062496 | 6.665320 | 73.54839 |
Source | DF | Type III SS | Mean Square | F Value | Pr > F |
teach | 2 | 49.73586091 | 24.86793046 | 0.56 | 0.5776 |
The objective now is to see if a model using both scores shows a difference among teachers. The following SAS statements are used for this analysis:
proc glm;
class teach;
model score1 score2=teach;
manova h=teach / printh printe;
run;
The first three statements produce the usual univariate analyses of the two scores as shown in Output 9.2. The MANOVA statement produces results in Output 9.3.
Output 9.3 Results of One-Way Multivariate Analysis: The MANOVA Statement
The GLM Procedure
Multivariate Analysis of Variance
E = Error SSCP Matrix
➊ | score1 | score2 |
score1 | 932.87878788 | 1018.6818182 |
score2 | 1018.6818182 | 1243.9415584 |
➋ Partial Correlation Coefficients from the Error SSCP Matrix / Prob > |r|
DF = 28 | score1 | score2 |
score1 | 1.000000 | 0.945640 <.0001 |
score2 | 0.945640 <.0001 | 1.000000 |
➌ H = Type III SSCP Matrix for teach
score1 | score2 | |
score1 | 60.605083089 | 31.511730205 |
score2 | 31.511730205 | 49.735860913 |
Characteristic Roots and Vectors of: E Inverse * H, where
H = Type III SSCP Matrix for teach
E = Error SSCP Matrix
➍ Characteristic | Characteristic Vector V'EV=1 |
Root | Percent | score1 | score2 |
0.43098027 | 91.86 | -0.10044686 | 0.08416103 |
0.03821194 | 8.14 | 0.00675930 | 0.02275380 |
MANOVA Test Criteria and F Approximations for the Hypothesis of No Overall teach Effect
➎ H = Type III SSCP Matrix for teach
E = Error SSCP Matrix
S=2 M=-0.5 N=12.5
Statistic | Value | F Value | Num DF | Den DF | Pr > F |
Wilks' Lambda | 0.67310116 | 2.95 | 4 | 54 | 0.0279 |
Pillai's Trace | 0.33798387 | 2.85 | 4 | 56 | 0.0322 |
Hotelling-Lawley Trace | 0.46919220 | 3.13 | 4 | 31.389 | 0.0281 |
Roy's Greatest Root | 0.43098027 | 6.03 | 2 | 28 | 0.0066 |
NOTE: F Statistic for Roy's Greatest Root is an upper bound.
NOTE: F Statistic for Wilks' Lambda is exact.
The PRINTH and PRINTE options cause the printing of the hypothesis and error matrices, respectively. In addition, the PRINTE option produces a matrix of partial correlation coefficients derived from the error SSCP matrix. This correlation matrix represents the correlations of the dependent variables corrected for all the independent factors in the MODEL statement.
The results in Output 9.3 are described below. The callout numbers have been added to the output to key the following descriptions:
➊ The elements of the error matrix. The diagonal elements of this matrix represent the error sums of squares from the corresponding univariate analyses (see Output 9.2).
➋ The associated partial correlation matrix. In this example it appears that SCORE1 and SCORE2 are highly correlated (r=0.945640).
➌ The elements of the hypothesis matrix, H. Again, the diagonal elements correspond to the hypothesis sums of squares from the corresponding univariate analysis.
➍ The characteristic roots and vectors of E⁻¹H. The elements of the characteristic vector describe a linear combination of the analysis variables that produces the largest possible univariate F-ratio.
➎ The four test statistics previously discussed. The values of S, M, and N, printed above the table of statistics, provide information that is used in constructing the F-approximations for the criteria. (For more information, see Morrison (1976).) All four tests give similar results, although this is not always the case. Note that the p-values for the “Hypothesis of No Overall TEACH Effect” are much lower for the multivariate tests than any of the univariate tests would indicate. This is an example of how viewing a set of variables together can help you detect differences that you would not detect by looking at the individual variables.
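To make the connection between the characteristic roots and the four criteria concrete, the criteria can be recomputed from the E and H matrices printed in Output 9.3. The following numpy sketch is a numerical check of the SAS output, not part of the analysis itself; the use of Python/numpy here is purely illustrative.

```python
import numpy as np

# Error (E) and hypothesis (H) SSCP matrices as printed in Output 9.3
E = np.array([[932.87878788, 1018.6818182],
              [1018.6818182, 1243.9415584]])
H = np.array([[60.605083089, 31.511730205],
              [31.511730205, 49.735860913]])

# Characteristic roots of E-inverse * H, largest first
roots = np.sort(np.linalg.eigvals(np.linalg.solve(E, H)).real)[::-1]

wilks = np.prod(1.0 / (1.0 + roots))    # Wilks' Lambda
pillai = np.sum(roots / (1.0 + roots))  # Pillai's Trace
hotelling = np.sum(roots)               # Hotelling-Lawley Trace
roy = roots[0]                          # Roy's Greatest Root

print(roots)   # approx. [0.43098 0.03821]
print(wilks)   # approx. 0.67310
```

Each criterion is a different summary of the same characteristic roots, which is why the four tests tend to agree when one root dominates.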
Consider a common situation in multivariate analysis. You have several different measurements taken on each of several subjects, and you want to know if the means of the different variables are all the same. For example, you may have used different recording devices to measure the same phenomenon, or you may have observed subjects under a variety of conditions and administered a test under each of the conditions. If you had only two means to compare, you could use the familiar t-test, but it is important to use an analysis that takes into account the correlations among the dependent variables, just as in the previous examples, even though there are no independent factors in the model—that is, no terms on the right side of the MODEL statement. In this situation you could use Hotelling’s T2 test. As an example, consider the following data taken from Morrison (1976). Weight gains in rats given a special diet were measured at one (GAIN1), two (GAIN2), three (GAIN3), and four (GAIN4) weeks after administration of the diet. The question of interest is whether the rats’ weight gains stayed constant over the course of the experiment; in other words, were the mean weight gains of the rats the same at each of the four weeks? Output 9.4 shows the data.
Output 9.4 Data for Hotelling’s T2 Test
Obs | gain1 | gain2 | gain3 | gain4 |
1 | 29 | 28 | 25 | 33 |
2 | 33 | 30 | 23 | 31 |
3 | 25 | 34 | 33 | 41 |
4 | 18 | 33 | 29 | 35 |
5 | 25 | 23 | 17 | 30 |
6 | 24 | 32 | 29 | 22 |
7 | 20 | 23 | 16 | 31 |
8 | 28 | 21 | 18 | 24 |
9 | 18 | 23 | 22 | 28 |
10 | 25 | 28 | 29 | 30 |
Note that the following MODEL statement fits a model with only an intercept:
model gain1 gain2 gain3 gain4 = ;
Following this statement with a MANOVA statement to test the intercept tests the hypothesis that the means of the four weight gains are all 0. It does not test the hypothesis of interest—that the four means are equal. To test the hypothesis that the four means are equal, the dependent variables must be transformed in such a way that their transformed means being 0 will imply that the original means are equal for all four variables. It turns out that there are many different transformations that have this effect. (This problem is discussed in detail in Chapter 8, “Repeated-Measures Analysis.”) One simple transformation that achieves this goal is subtracting one of the variables from each of the other variables; in this example, the first gain could be subtracted from each of the other gains. The four means are equal to each other if and only if each of these differences is 0. In this way, Hotelling’s T2 test for equality of means can be performed using the MANOVA statement.
One way of producing the transformed variables necessary to perform Hotelling’s T2 test is to produce new variables in a DATA step and to perform an analysis on the new variables. A quicker and more efficient way, however, is to use the M= option in the MANOVA statement. Using the M= option, you can perform an analysis on a set of variables that is a linear transformation of the original variables as listed on the left side of the equal sign in the MODEL statement.
The following SAS statements perform the appropriate analysis:
proc glm data=wtgain;
model gain1 gain2 gain3 gain4 = / nouni;
manova h=intercept
m=gain2-gain1, gain3-gain1, gain4-gain1
mnames=diff2 diff3 diff4 / summary;
run;
The NOUNI option in the MODEL statement suppresses the individual analyses of the gain variables. This is done because the multivariate hypothesis of equality of the four means is the hypothesis of interest. The SUMMARY option in the MANOVA statement produces analysis-of-variance tables of the transformed variables. The MNAMES= option provides labels for these transformed variables; if omitted, the procedure uses the names MVAR1, MVAR2, and so on. The results appear in Output 9.5.
Output 9.5 Analysis of Transformed Variables: Hotelling's T2 Test
The GLM Procedure
Number of observations 10
The GLM Procedure
Multivariate Analysis of Variance
M Matrix Describing Transformed Variables
gain1 | gain2 | gain3 | gain4 | |
diff2 | -1 | 1 | 0 | 0 |
diff3 | -1 | 0 | 1 | 0 |
diff4 | -1 | 0 | 0 | 1 |
Characteristic Roots and Vectors of: E Inverse * H, where
H = Type III SSCP Matrix for Intercept
E = Error SSCP Matrix
Variables have been transformed by the M Matrix
Characteristic | Characteristic Vector V'EV=1 | |||
Root | Percent | diff2 | diff3 | diff4 |
2.97211676 | 100.00 | -0.11825538 | 0.11174598 | -0.02428445 |
0.00000000 | 0.00 | -0.11610475 | 0.05331366 | 0.06160662 |
0.00000000 | 0.00 | 0.00519921 | 0.03899407 | 0.00000000 |
MANOVA Test Criteria and Exact F Statistics for the Hypothesis of No Overall Intercept Effect
on the Variables Defined by the M Matrix Transformation
H = Type III SSCP Matrix for Intercept
E = Error SSCP Matrix
S=1 M=0.5 N=2.5
Statistic | Value | F Value | Num DF | Den DF | Pr > F |
Wilks' Lambda | 0.25175494 | 6.93 | 3 | 7 | 0.0167 |
Pillai's Trace | 0.74824506 | 6.93 | 3 | 7 | 0.0167 |
Hotelling-Lawley Trace | 2.97211676 | 6.93 | 3 | 7 | 0.0167 |
Roy's Greatest Root | 2.97211676 | 6.93 | 3 | 7 | 0.0167 |
Dependent Variable: diff2
Source | DF | Type III SS | Mean Square | F Value | Pr > F |
Intercept | 1 | 90.0000000 | 90.0000000 | 2.10 | 0.1814 |
Error | 9 | 386.0000000 | 42.8888889 |
Dependent Variable: diff3
Source | DF | Type III SS | Mean Square | F Value | Pr > F |
Intercept | 1 | 1.6000000 | 1.6000000 | 0.03 | 0.8735 |
Error | 9 | 536.4000000 | 59.6000000 |
Dependent Variable: diff4
Source | DF | Type III SS | Mean Square | F Value | Pr > F |
Intercept | 1 | 360.0000000 | 360.0000000 | 6.53 | 0.0309 |
Error | 9 | 496.0000000 | 55.1111111 |
Although PROC GLM does not print the value of the Hotelling T2 statistic itself, it produces the correct F-test and probability level. Note that all the multivariate test statistics result in the same F-values and that they are labeled as “Exact F Statistics.” These tests are always exact when the hypothesis being tested has only 1 degree of freedom, such as the hypothesis of no INTERCEPT effect in this example. In order to calculate the actual value of the T2 statistic, the following formula can be used:
T2 = (nobs − 1)((1/Λ) − 1)
Here, nobs is the number of observations and Λ is the value of Wilks’ criterion printed by PROC GLM. (See Section 9.7, “Statistical Background.”) In this case, the calculation leads to T2 = (10 − 1)(1/0.25175494 − 1) = 26.749.
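As a check on this arithmetic, the same T2 value can be computed directly from the data in Output 9.4. The following numpy sketch applies the same differencing as the M= option and then forms Hotelling's statistic from the mean differences and their covariance matrix; numpy here is illustrative, and any matrix language would do.

```python
import numpy as np

# Weekly weight gains from Output 9.4 (columns gain1..gain4)
gains = np.array([[29, 28, 25, 33], [33, 30, 23, 31], [25, 34, 33, 41],
                  [18, 33, 29, 35], [25, 23, 17, 30], [24, 32, 29, 22],
                  [20, 23, 16, 31], [28, 21, 18, 24], [18, 23, 22, 28],
                  [25, 28, 29, 30]], dtype=float)

n = gains.shape[0]
# Same transformation as the M= option: subtract gain1 from the other gains
d = gains[:, 1:] - gains[:, [0]]           # columns diff2, diff3, diff4

dbar = d.mean(axis=0)                      # mean differences
S = np.cov(d, rowvar=False)                # sample covariance (n-1 divisor)
T2 = n * dbar @ np.linalg.solve(S, dbar)   # Hotelling's T-squared

lam = 1.0 / (1.0 + T2 / (n - 1))           # implied Wilks' Lambda
F = (n - 3) / ((n - 1) * 3) * T2           # exact F with 3 and n-3 df

print(T2)   # approx. 26.75
print(F)    # approx. 6.93
```

The implied Wilks' Lambda and F-value agree with the PROC GLM output in Output 9.5.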
The total weight of a mature cotton boll can be divided into three parts: the weight of the seeds, the weight of the lint, and the weight of the bract. Lint and seed constitute the economic yield of cotton.
In the following data, the differences in the three components of the cotton bolls due to two varieties (VARIETY) and two plant spacings (SPACING) are studied. Five plants are chosen at random from each of the four treatment combinations. Two bolls are picked from each plant, and the weights of the seeds, lint, and bract are recorded. The most appropriate error term for testing VARIETY, SPACING, and the interaction of the two is the variation among plants. In univariate analyses, the TEST statement is used to specify this alternative error term; for multivariate analyses, the error term is specified in the MANOVA statement or statements. (Each MANOVA statement can have, at most, one error term specified.)
The following SAS statements are used:
proc glm;
class variety spacing plant;
model seed lint bract=variety spacing variety*spacing
plant(variety spacing)/ss3;
test h=variety|spacing e=plant(variety spacing);
means variety|spacing;
manova h=variety|spacing e=plant(variety spacing);
run;
The data used in this analysis appear in Output 9.6.
Output 9.6 Data for Two-Way Multivariate Analysis
Obs | variety | spacing | plant | seed | lint | bract |
1 | 213 | 30 | 3 | 3.1 | 1.7 | 2.0 |
2 | 213 | 30 | 3 | 1.5 | 1.7 | 1.4 |
3 | 213 | 30 | 5 | 3.0 | 1.9 | 1.8 |
4 | 213 | 30 | 5 | 1.4 | 0.9 | 1.3 |
5 | 213 | 30 | 6 | 2.3 | 1.7 | 1.5 |
6 | 213 | 30 | 6 | 2.2 | 2.0 | 1.4 |
7 | 213 | 30 | 8 | 0.4 | 0.9 | 1.2 |
8 | 213 | 30 | 8 | 1.7 | 1.6 | 1.3 |
9 | 213 | 30 | 9 | 1.8 | 1.2 | 1.0 |
10 | 213 | 30 | 9 | 1.2 | 0.8 | 1.0 |
11 | 213 | 40 | 0 | 2.0 | 1.0 | 1.9 |
12 | 213 | 40 | 0 | 1.5 | 1.5 | 1.7 |
13 | 213 | 40 | 1 | 1.8 | 1.1 | 2.1 |
14 | 213 | 40 | 1 | 1.0 | 1.3 | 1.1 |
15 | 213 | 40 | 2 | 1.3 | 1.1 | 1.3 |
16 | 213 | 40 | 2 | 2.9 | 1.9 | 1.7 |
17 | 213 | 40 | 3 | 2.8 | 1.2 | 1.3 |
18 | 213 | 40 | 3 | 1.8 | 1.2 | 1.2 |
19 | 213 | 40 | 4 | 3.2 | 1.8 | 2.0 |
20 | 213 | 40 | 4 | 3.2 | 1.6 | 1.9 |
21 | 37 | 30 | 1 | 3.2 | 2.6 | 1.4 |
22 | 37 | 30 | 1 | 2.8 | 2.1 | 1.2 |
23 | 37 | 30 | 2 | 3.6 | 2.4 | 1.5 |
24 | 37 | 30 | 2 | 0.9 | 0.8 | 0.8 |
25 | 37 | 30 | 3 | 4.0 | 3.1 | 1.8 |
26 | 37 | 30 | 3 | 4.0 | 2.9 | 1.5 |
27 | 37 | 30 | 5 | 3.7 | 2.7 | 1.6 |
28 | 37 | 30 | 5 | 2.6 | 1.5 | 1.3 |
29 | 37 | 30 | 8 | 2.8 | 2.2 | 1.2 |
30 | 37 | 30 | 8 | 2.9 | 2.3 | 1.2 |
31 | 37 | 40 | 1 | 4.1 | 2.9 | 2.0 |
32 | 37 | 40 | 1 | 3.4 | 2.0 | 1.6 |
33 | 37 | 40 | 3 | 3.7 | 2.3 | 2.0 |
34 | 37 | 40 | 3 | 3.2 | 2.2 | 1.8 |
35 | 37 | 40 | 4 | 3.4 | 2.7 | 1.5 |
36 | 37 | 40 | 4 | 2.9 | 2.1 | 1.2 |
37 | 37 | 40 | 5 | 2.5 | 1.4 | 0.8 |
38 | 37 | 40 | 5 | 3.6 | 2.4 | 1.6 |
39 | 37 | 40 | 6 | 3.1 | 2.3 | 1.4 |
40 | 37 | 40 | 6 | 2.5 | 1.5 | 1.5 |
Note that the TEST statement is used to obtain the appropriate error terms. The results of the three univariate analyses appear in Output 9.7.
Output 9.7 Results of Two-Way Multivariate Analysis: Univariate Analyses
Dependent Variable: seed
Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
Model | 19 | 24.06500000 | 1.26657895 | 2.22 | 0.0425 |
Error | 20 | 11.43000000 | 0.57150000 | ||
Corrected Total | 39 | 35.49500000 |
R-Square | Coeff Var | Root MSE | seed Mean |
0.677983 | 29.35830 | 0.755976 | 2.575000 |
Source | DF | Type III SS | Mean Square | F Value | Pr > F |
variety | 1 | 12.99600000 | 12.99600000 | 22.74 | 0.0001 |
spacing | 1 | 0.57600000 | 0.57600000 | 1.01 | 0.3274 |
variety*spacing | 1 | 0.02500000 | 0.02500000 | 0.04 | 0.8364 |
plant(variet*spacin) | 16 | 10.46800000 | 0.65425000 | 1.14 | 0.3823 |
Tests of Hypotheses Using the Type III MS for plant(variet*spacin) as an Error Term
Source | DF | Type III SS | Mean Square | F Value | Pr > F |
variety | 1 | 12.99600000 | 12.99600000 | 19.86 | 0.0004 |
spacing | 1 | 0.57600000 | 0.57600000 | 0.88 | 0.3620 |
variety*spacing | 1 | 0.02500000 | 0.02500000 | 0.04 | 0.8475 |
Dependent Variable: lint
Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
Model | 19 | 10.62875000 | 0.55940789 | 2.28 | 0.0377 |
Error | 20 | 4.91500000 | 0.24575000 | ||
Corrected Total | 39 | 15.54375000 |
R-Square | Coeff Var | Root MSE | lint Mean |
0.683796 | 27.35072 | 0.495732 | 1.812500 |
Source | DF | Type III SS | Mean Square | F Value | Pr > F |
variety | 1 | 6.64225000 | 6.64225000 | 27.03 | <.0001 |
spacing | 1 | 0.05625000 | 0.05625000 | 0.23 | 0.6375 |
variety*spacing | 1 | 0.00025000 | 0.00025000 | 0.00 | 0.9749 |
plant(variet*spacin) | 16 | 3.93000000 | 0.24562500 | 1.00 | 0.4934 |
Tests of Hypotheses Using the Type III MS for plant(variet*spacin) as an Error Term
Source | DF | Type III SS | Mean Square | F Value | Pr > F |
variety | 1 | 6.64225000 | 6.64225000 | 27.04 | <.0001 |
spacing | 1 | 0.05625000 | 0.05625000 | 0.23 | 0.6387 |
variety*spacing | 1 | 0.00025000 | 0.00025000 | 0.00 | 0.9749 |
Dependent Variable: bract
Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
Model | 19 | 2.70500000 | 0.14236842 | 1.63 | 0.1442 |
Error | 20 | 1.75000000 | 0.08750000 | ||
Corrected Total | 39 | 4.45500000 | |||
R-Square | Coeff Var | Root MSE | bract Mean |
0.607183 | 20.05451 | 0.295804 | 1.475000 |
Source | DF | Type III SS | Mean Square | F Value | Pr > F |
variety | 1 | 0.03600000 | 0.03600000 | 0.41 | 0.5285 |
spacing | 1 | 0.44100000 | 0.44100000 | 5.04 | 0.0362 |
variety*spacing | 1 | 0.00400000 | 0.00400000 | 0.05 | 0.8329 |
plant(variet*spacin) | 16 | 2.22400000 | 0.13900000 | 1.59 | 0.1626 |
Tests of Hypotheses Using the Type III MS for plant(variet*spacin) as an Error Term
Source | DF | Type III SS | Mean Square | F Value | Pr > F |
variety | 1 | 0.03600000 | 0.03600000 | 0.26 | 0.6178 |
spacing | 1 | 0.44100000 | 0.44100000 | 3.17 | 0.0939 |
variety*spacing | 1 | 0.00400000 | 0.00400000 | 0.03 | 0.8674 |
VARIETY has a statistically significant effect on SEED and LINT; no effects are statistically significant for BRACT.
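To see how the TEST statement forms these F-ratios, note that each F is simply the Type III mean square for the effect divided by the mean square for plant(variety*spacing). A minimal check for VARIETY on SEED, using the mean squares from Output 9.7:

```python
# Type III mean squares for SEED, from Output 9.7
ms_variety = 12.99600000
ms_plant = 0.65425000   # plant(variety*spacing), the error term given with E=
f_variety = ms_variety / ms_plant
print(round(f_variety, 2))   # 19.86, matching the TEST statement output
```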
The means for all levels and combinations of levels of VARIETY and SPACING produced by the MEANS statement appear in Output 9.8.
Output 9.8 Results of Two-Way Multivariate Analysis: The MEANS Statement
The GLM Procedure
Level of | ---seed--- | ---lint--- | ---bract--- | ||||
variety | N | Mean | Std Dev | Mean | Std Dev | Mean | Std Dev |
37 | 20 | 3.14500000 | 0.72799291 | 2.22000000 | 0.57087191 | 1.44500000 | 0.32843328 |
213 | 20 | 2.00500000 | 0.80881655 | 1.40500000 | 0.37763112 | 1.50500000 | 0.35314378 |
Level of | ---seed--- | ---lint--- | ---bract--- | ||||
spacing | N | Mean | Std Dev | Mean | Std Dev | Mean | Std Dev |
30 | 20 | 2.45500000 | 1.04351380 | 1.85000000 | 0.70150215 | 1.37000000 | 0.29037181 |
40 | 20 | 2.69500000 | 0.86540225 | 1.77500000 | 0.56835404 | 1.58000000 | 0.35629674 |
Level of | Level of | ---seed--- | ---lint--- | |||
variety | spacing | N | Mean | Std Dev | Mean | Std Dev |
37 | 30 | 10 | 3.05000000 | 0.91439111 | 2.26000000 | 0.68182761 |
37 | 40 | 10 | 3.24000000 | 0.51251016 | 2.18000000 | 0.46856756 |
213 | 30 | 10 | 1.86000000 | 0.82219219 | 1.44000000 | 0.44771022 |
213 | 40 | 10 | 2.15000000 | 0.81137743 | 1.37000000 | 0.31287200 |
Level of | Level of | ---bract--- | ||
variety | spacing | N | Mean | Std Dev |
37 | 30 | 10 | 1.35000000 | 0.27588242 |
37 | 40 | 10 | 1.54000000 | 0.36270588 |
213 | 30 | 10 | 1.39000000 | 0.31780497 |
213 | 40 | 10 | 1.62000000 | 0.36453928 |
The results of the multivariate analyses appear in Output 9.9.
Output 9.9 Results of Two-Way Multivariate Analysis
The GLM Procedure
Multivariate Analysis of Variance
Characteristic Roots and Vectors of: E Inverse * H, where
H = Type III SSCP Matrix for variety
E = Type III SSCP Matrix for plant(variet*spacin)
Characteristic | Characteristic Vector V'EV=1 | |||
Root | Percent | seed | lint | bract |
3.43919116 | 100.00 | 0.13061027 | 0.48969379 | -0.64083380 |
0.00000000 | 0.00 | -0.52400501 | 0.74035630 | 0.10041133 |
0.00000000 | 0.00 | 0.01704640 | 0.02228579 | 0.62659680 |
MANOVA Test Criteria and Exact F Statistics for the Hypothesis of No Overall variety Effect
H = Type III SSCP Matrix for variety
E = Type III SSCP Matrix for plant(variet*spacin)
S=1 M=0.5 N=6
Statistic | Value | F Value | Num DF | Den DF | Pr > F |
Wilks' Lambda | 0.22526626 | 16.05 | 3 | 14 | <.0001 |
Pillai's Trace | 0.77473374 | 16.05 | 3 | 14 | <.0001 |
Hotelling-Lawley Trace | 3.43919116 | 16.05 | 3 | 14 | <.0001 |
Roy's Greatest Root | 3.43919116 | 16.05 | 3 | 14 | <.0001 |
The GLM Procedure
Multivariate Analysis of Variance
Characteristic Roots and Vectors of: E Inverse * H, where
H = Type III SSCP Matrix for spacing
E = Type III SSCP Matrix for plant(variet*spacin)
Characteristic | Characteristic Vector V'EV=1 | |||
Root | Percent | seed | lint | bract |
0.63209472 | 100.00 | 0.27027458 | -0.73247531 | 0.62673044 |
0.00000000 | 0.00 | 0.24001974 | 0.22618207 | -0.19352896 |
0.00000000 | 0.00 | -0.40158816 | 0.44804654 | 0.61897451 |
MANOVA Test Criteria and Exact F Statistics for the Hypothesis of No Overall spacing Effect
H = Type III SSCP Matrix for spacing
E = Type III SSCP Matrix for plant(variet*spacin)
S=1 M=0.5 N=6
Statistic | Value | F Value | Num DF | Den DF | Pr > F |
Wilks' Lambda | 0.61270954 | 2.95 | 3 | 14 | 0.0692 |
Pillai's Trace | 0.38729046 | 2.95 | 3 | 14 | 0.0692 |
Hotelling-Lawley Trace | 0.63209472 | 2.95 | 3 | 14 | 0.0692 |
Roy's Greatest Root | 0.63209472 | 2.95 | 3 | 14 | 0.0692 |
The GLM Procedure
Multivariate Analysis of Variance
Characteristic Roots and Vectors of: E Inverse * H, where
H = Type III SSCP Matrix for variety*spacing
E = Type III SSCP Matrix for plant(variet*spacin)
Characteristic | Characteristic Vector V'EV=1 | |||
Root | Percent | seed | lint | bract |
0.00616711 | 100.00 | 0.42143581 | -0.67443149 | 0.35670210 |
0.00000000 | 0.00 | -0.33315217 | 0.01818488 | 0.82833421 |
0.00000000 | 0.00 | -0.05772656 | 0.57726561 | 0.00000000 |
MANOVA Test Criteria and Exact F Statistics for the Hypothesis of No Overall variety*spacing Effect
H = Type III SSCP Matrix for variety*spacing
E = Type III SSCP Matrix for plant(variet*spacin)
S=1 M=0.5 N=6
Statistic | Value | F Value | Num DF | Den DF | Pr > F |
Wilks' Lambda | 0.99387069 | 0.03 | 3 | 14 | 0.9931 |
Pillai's Trace | 0.00612931 | 0.03 | 3 | 14 | 0.9931 |
Hotelling-Lawley Trace | 0.00616711 | 0.03 | 3 | 14 | 0.9931 |
Roy's Greatest Root | 0.00616711 | 0.03 | 3 | 14 | 0.9931 |
The only highly significant effect is VARIETY, and all four multivariate statistics are the same because there is only one hypothesis degree of freedom.
9.5 Multivariate Analysis of Covariance
This section illustrates multivariate analysis of covariance using the data on orange sales presented in Output 7.6. Sales of two types of oranges are related to experimentally determined prices (PRICE) as well as stores (STORE) and days of the week (DAY). The analysis is expanded here to consider the simultaneous multivariate relationship of the price to both types of oranges.
The following SAS statements are used for the analysis:
proc glm;
class store day;
model q1 q2=store day p1 p2 / nouni;
manova h=store day p1 p2 / printh printe;
run;
Note that PROC ANOVA is not appropriate in this situation because of the presence of the covariates. The NOUNI option in the MODEL statement suppresses printing of the univariate analyses that are already shown in Chapter 7, “Analysis of Covariance.” Results of the multivariate analysis appear in Output 9.10.
Output 9.10 Results of Multivariate Analysis of Covariance
The GLM Procedure
Multivariate Analysis of Variance
E = Error SSCP Matrix
q1 | q2 | |
q1 | 408.30824182 | 74.603217758 |
q2 | 74.603217758 | 706.94116552 |
Partial Correlation Coefficients from the Error SSCP Matrix / Prob > |r|
DF = 23 | q1 | q2 |
q1 | 1.000000 | 0.138858 0.5176 |
q2 | 0.138858 0.5176 | 1.000000 |
H = Type III SSCP Matrix for store
q1 | q2 | |
q1 | 223.83267344 | 93.801152319 |
q2 | 93.801152319 | 155.09933793 |
Characteristic Roots and Vectors of: E Inverse * H, where
H = Type III SSCP Matrix for store
E = Error SSCP Matrix
Characteristic | Characteristic Vector V'EV=1 | ||
Root | Percent | q1 | q2 |
0.57363829 | 78.23 | 0.04622384 | 0.00941459 |
0.15960322 | 21.77 | -0.01899048 | 0.03679295 |
MANOVA Test Criteria and F Approximations for the Hypothesis of No Overall store Effect
H = Type III SSCP Matrix for store
E = Error SSCP Matrix
S=2 M=1 N=10
Statistic | Value | F Value | Num DF | Den DF | Pr > F |
Wilks' Lambda | 0.54800645 | 1.54 | 10 | 44 | 0.1564 |
Pillai's Trace | 0.50216601 | 1.54 | 10 | 46 | 0.1553 |
Hotelling-Lawley Trace | 0.73324151 | 1.57 | 10 | 30.372 | 0.1634 |
Roy's Greatest Root | 0.57363829 | 2.64 | 5 | 23 | 0.0501 |
NOTE: F Statistic for Roy's Greatest Root is an upper bound.
NOTE: F Statistic for Wilks' Lambda is exact.
H = Type III SSCP Matrix for day
q1 | q2 | |
q1 | 433.09686996 | 461.05064188 |
q2 | 461.05064188 | 614.4088834 |
Characteristic Roots and Vectors of: E Inverse * H, where
H = Type III SSCP Matrix for day
E = Error SSCP Matrix
Characteristic | Characteristic Vector V'EV=1 | ||
Root | Percent | q1 | q2 |
1.60708776 | 93.18 | 0.03517603 | 0.02300242 |
0.11766546 | 6.82 | -0.03549548 | 0.03021993 |
MANOVA Test Criteria and F Approximations for the Hypothesis of No Overall day Effect
H = Type III SSCP Matrix for day
E = Error SSCP Matrix
S=2 M=1 N=10
Statistic | Value | F Value | Num DF | Den DF | Pr > F |
Wilks' Lambda | 0.34318834 | 3.11 | 10 | 44 | 0.0044 |
Pillai's Trace | 0.72170813 | 2.60 | 10 | 46 | 0.0137 |
Hotelling-Lawley Trace | 1.72475321 | 3.69 | 10 | 30.372 | 0.0026 |
Roy's Greatest Root | 1.60708776 | 7.39 | 5 | 23 | 0.0003 |
NOTE: F Statistic for Roy's Greatest Root is an upper bound.
NOTE: F Statistic for Wilks' Lambda is exact.
H = Type III SSCP Matrix for p1
q1 | q2 | |
q1 | 538.16885116 | -212.5196287 |
q2 | -212.5196287 | 83.922717744 |
Characteristic Roots and Vectors of: E Inverse * H, where
H = Type III SSCP Matrix for p1
E = Error SSCP Matrix
Characteristic | Characteristic Vector V'EV=1 | ||
Root | Percent | q1 | q2 |
1.57701930 | 100.00 | 0.04805513 | -0.01539025 |
0.00000000 | 0.00 | 0.01371082 | 0.03472026 |
The GLM Procedure
Multivariate Analysis of Variance
MANOVA Test Criteria and Exact F Statistics for the Hypothesis of No Overall p1 Effect
H = Type III SSCP Matrix for p1
E = Error SSCP Matrix
S=1 M=0 N=10
Statistic | Value | F Value | Num DF | Den DF | Pr > F |
Wilks' Lambda | 0.38804521 | 17.35 | 2 | 22 | <.0001 |
Pillai's Trace | 0.61195479 | 17.35 | 2 | 22 | <.0001 |
Hotelling-Lawley Trace | 1.57701930 | 17.35 | 2 | 22 | <.0001 |
Roy's Greatest Root | 1.57701930 | 17.35 | 2 | 22 | <.0001 |
H = Type III SSCP Matrix for p2
q1 | q2 | |
q1 | 39.542251923 | -183.5850939 |
q2 | -183.5850939 | 852.34110489 |
Characteristic Roots and Vectors of: E Inverse * H, where
H = Type III SSCP Matrix for p2
E = Error SSCP Matrix
Characteristic | Characteristic Vector V'EV=1 | ||
Root | Percent | q1 | q2 |
1.42489030 | 100.00 | -0.01960102 | 0.03666503 |
0.00000000 | 0.00 | 0.04596827 | 0.00990107 |
MANOVA Test Criteria and Exact F Statistics for the Hypothesis of No Overall p2 Effect
H = Type III SSCP Matrix for p2
E = Error SSCP Matrix
S=1 M=0 N=10
Statistic | Value | F Value | Num DF | Den DF | Pr > F |
Wilks' Lambda | 0.41238979 | 15.67 | 2 | 22 | <.0001 |
Pillai's Trace | 0.58761021 | 15.67 | 2 | 22 | <.0001 |
Hotelling-Lawley Trace | 1.42489030 | 15.67 | 2 | 22 | <.0001 |
Roy's Greatest Root | 1.42489030 | 15.67 | 2 | 22 | <.0001 |
For the STORE effect, none of the statistics produces a significant result. This is not entirely consistent with the univariate results, in which the effect on sales of the first type of orange is nearly significant at the 5% level.
The DAY effect is quite significant according to all statistics, although there is a considerable difference in the level of significance (Pr > F) among the statistics. Also, there is one dominant eigenvalue, indicating that the trend in sales over days is roughly parallel for the two types of oranges. This can be verified by printing and plotting the least-squares means.
The results for the PRICE effects are relatively straightforward. Even though both P1 and P2 are significant for only one dependent variable in the univariate analyses, the multivariate analysis indicates that their effects are substantial enough to be significant overall.
9.6 Contrasts in Multivariate Analyses
Chapter 3, “Analysis of Variance for Balanced Data,” discusses how specialized questions concerning certain levels of factors in univariate analyses can be answered by using the CONTRAST statement to define a hypothesis to be tested. This same technique can be useful in multivariate analyses of variance. The GLM procedure prints output for CONTRAST statements as part of its multivariate analysis. As an example, consider the study described in Section 9.5, “Multivariate Analysis of Covariance.” Assume you want to know if Saturday sales differ from weekday sales, averaged across the two types of oranges. Because the levels for DAY are coded as 1 through 6 corresponding to Monday through Saturday, you need to construct a contrast that compares the average of the first five levels of DAY to the sixth. The following SAS statement is required:
contrast 'SAT. vs. WEEKDAYS' day .2 .2 .2 .2 .2 -1;
The label SAT. vs. WEEKDAYS appears in the output to identify the contrast. If this CONTRAST statement is added to the program of the previous section, it produces Output 9.11. The statement must precede the MANOVA statement so that PROC GLM knows that the multivariate test for the contrast is wanted.
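The contrast coefficients are chosen so that they sum to zero: each of the five weekdays receives weight 1/5 = .2 and Saturday receives −1, so the contrast estimates the average of the weekday means minus the Saturday mean. A minimal sketch of that arithmetic for a single response, using hypothetical least-squares means (the values below are illustrative only, not from the orange-sales data):

```python
# Contrast weights: average of Mon-Fri minus Saturday (they sum to zero).
weights = [0.2, 0.2, 0.2, 0.2, 0.2, -1.0]

# Hypothetical least-squares means for DAY (Mon..Sat) for one response.
day_means = [10.0, 11.0, 9.5, 10.5, 10.0, 12.0]

estimate = sum(w * m for w, m in zip(weights, day_means))

# The estimate equals mean(Mon..Fri) minus the Saturday mean.
weekday_avg = sum(day_means[:5]) / 5
print(estimate, weekday_avg - day_means[5])
```

Because the weights sum to zero, the contrast is unaffected by adding a constant to all six day means, which is what makes it a valid comparison among levels of DAY.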
Output 9.11 Results of Multivariate Analysis: The CONTRAST Statement
The GLM Procedure
Multivariate Analysis of Variance
H = Contrast SSCP Matrix for sat vs weekdays
  | q1 | q2 |
q1 | 9.3680815712 | 7.8469238548 |
q2 | 7.8469238548 | 6.5727666348 |
Characteristic Roots and Vectors of: E Inverse * H, where
H = Contrast SSCP Matrix for sat vs weekdays
E = Error SSCP Matrix
Characteristic Root | Percent | q1 | q2 |
(Characteristic vectors are scaled so that V′EV = 1.)
0.02873910 | 100.00 | 0.04110205 | 0.01705466 |
0.00000000 | 0.00 | -0.02842364 | 0.03393368 |
MANOVA Test Criteria and Exact F Statistics for the
Hypothesis of No Overall sat vs weekdays Effect
H = Contrast SSCP Matrix for sat vs weekdays
E = Error SSCP Matrix
S=1 M=0 N=10
Statistic | Value | F Value | Num DF | Den DF | Pr > F |
Wilks' Lambda | 0.97206377 | 0.32 | 2 | 22 | 0.7322 |
Pillai's Trace | 0.02793623 | 0.32 | 2 | 22 | 0.7322 |
Hotelling-Lawley Trace | 0.02873910 | 0.32 | 2 | 22 | 0.7322 |
Roy's Greatest Root | 0.02873910 | 0.32 | 2 | 22 | 0.7322 |
The results of the multivariate tests indicate no significant overall difference between the Saturday and weekday sales for the two types of oranges (Pr > F=0.7322).
The multivariate linear model can be written as
Y = XB + U
where
Y | is an n × k matrix of observed values of k dependent variables or responses. Each column corresponds to a specific dependent variable and each row to an observation. |
X | is an n × m matrix of n observations on the m independent variables (which may contain dummy variables). |
B | is an m × k matrix of regression coefficients or parameters. Each column of B is a vector of coefficients corresponding to one of the k dependent variables, and each row contains the coefficients associated with one of the m independent variables. |
U | is the n × k matrix of the n random errors, with columns corresponding to the dependent variables. |
The matrix of estimated coefficients is

B̂ = (X′X)⁻¹X′Y

Each column of B̂ is the vector of estimated coefficients that would be obtained by estimating coefficients for each response variable separately.
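This column-by-column equivalence can be checked numerically. The sketch below uses small hand-coded matrix routines (pure Python, no libraries) and arbitrary illustrative data: the full multivariate coefficient matrix is computed at once, then compared with the coefficients obtained by fitting each response column on its own.

```python
def transpose(A):
    return [list(row) for row in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def inv2(M):
    # Inverse of a 2 x 2 matrix.
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# Illustrative data: n = 4 observations, intercept plus one regressor (m = 2),
# and k = 2 response variables.
X = [[1, 0], [1, 1], [1, 2], [1, 3]]
Y = [[1.0, 2.0], [2.1, 2.9], [2.9, 4.1], [4.2, 5.0]]

XtXi = inv2(matmul(transpose(X), X))
B = matmul(XtXi, matmul(transpose(X), Y))   # m x k matrix of estimates

# Fitting each response column separately reproduces the columns of B.
for j in range(2):
    yj = [[row[j]] for row in Y]
    bj = matmul(XtXi, matmul(transpose(X), yj))
    assert all(abs(bj[i][0] - B[i][j]) < 1e-9 for i in range(2))
```

The estimates themselves carry no multivariate information; what distinguishes the multivariate analysis is how the sums of squares and crossproducts are combined in the test statistics that follow.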
The partitioning of sums of squares parallels that developed in previous chapters, except that the partitions consist of k × k matrices of sums of squares and crossproducts:

Sums of Squares | SSCP Matrix |
TOTAL | Y′Y |
MODEL | B̂′X′Y |
ERROR | Y′Y − B̂′X′Y |
In the univariate analysis of variance, the F statistic is the statistic of choice in most cases for testing hypotheses about the factors being considered. Recall that this statistic is derived by taking the ratio of two sums of squares, one derived from the hypothesis being tested and the other from an appropriate error term. In multivariate linear models, these sums of squares are replaced by matrices of sums of squares and crossproducts. These matrices are represented by H for the hypothesis, corresponding to the numerator sum of squares, and E for the error matrix, corresponding to the denominator sum of squares. Since division of matrices is not possible, the matrix E⁻¹H is the basis for test statistics for multivariate hypotheses. Four different functions of this matrix are used as test statistics and are available in the GLM, ANOVA, and other multivariate procedures in SAS. Each of these statistics is a function of the characteristic roots (also known as eigenvalues) of E⁻¹H. In the formulas below, λᵢ represents the characteristic roots.
Corresponding to each characteristic root is a characteristic vector, or eigenvector, that represents a linear combination of the dependent variables being analyzed. A function of the characteristic root, λᵢ/(1 + λᵢ), is the value of R² that would be obtained if the linear combination of dependent variables represented by the corresponding characteristic vector were used as the dependent variable in a univariate analysis of the same model. For this reason, this function of the characteristic root is the squared canonical correlation. In the formulas below, rᵢ² = λᵢ/(1 + λᵢ) represents the squared canonical correlations.
Hotelling-Lawley Trace = Tr(E⁻¹H) = ∑ λᵢ = ∑ rᵢ² / (1 − rᵢ²)

Pillai's Trace = Tr(H(H + E)⁻¹) = ∑ λᵢ / (1 + λᵢ) = ∑ rᵢ²

Wilks' Lambda: Λ = |E| / |H + E| = ∏ 1 / (1 + λᵢ) = ∏ (1 − rᵢ²)

Roy's Greatest Root = max λᵢ
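When there is a single nonzero characteristic root (the S = 1 case), all four statistics are one-to-one functions of that root, which is why the output tables in this chapter report the same exact F value for every statistic. A minimal pure-Python check of these relationships, using for illustration the root λ = 1.4248903 (the Hotelling-Lawley Trace value reported earlier) and numerator and denominator degrees of freedom 2 and 22:

```python
# Single nonzero characteristic root of E^-1 * H (S = 1 case).
lam = 1.4248903

hotelling = lam                   # Tr(E^-1 H): sum of the roots
roy = lam                         # largest root
wilks = 1.0 / (1.0 + lam)         # product of 1/(1 + lambda_i)
pillai = lam / (1.0 + lam)        # sum of lambda_i/(1 + lambda_i)

# With S = 1 the F statistic is exact; for Wilks' Lambda with
# numerator df = 2 and denominator df = 22:
num_df, den_df = 2, 22
F = (1.0 - wilks) / wilks * (den_df / num_df)

print(round(wilks, 8), round(pillai, 8), round(F, 2))
```

These values reproduce the Wilks' Lambda of 0.41238979, the Pillai's Trace of 0.58761021, and the F value of 15.67 shown in the output table, confirming that with one nonzero root the four criteria carry the same information.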
None of these criteria has been identified as universally superior to the others, although there are hypothesized situations in which one criterion may outperform the rest. Because the exact form of the alternative hypothesis being studied is generally unknown, the choice of test criterion often becomes a matter of personal preference. Wilks' criterion is derived from a likelihood-ratio approach and appeals to some statisticians on those grounds.