Chapter 9 Multivariate Linear Models
9.2 A One-Way Multivariate Analysis of Variance
9.5 Multivariate Analysis of Covariance
9.6 Contrasts in Multivariate Analyses
Although the methods presented in earlier chapters are sufficient for studying one dependent variable at a time, there are many situations in which several dependent variables are studied simultaneously. For example, in monitoring the growth of animals, a researcher might measure the length, weight, and girth of animals receiving different treatments. The goal of the experiment would be to see if the treatments had an effect on the growth of the animals. One way to determine this would be to use analysis of variance or regression methods to analyze the effects of the treatment on the length, weight, and girth of the animals.

There are two problems with this approach. First, trying to interpret results produced by separate univariate (one variable at a time) analyses of each of the variables can be unwieldy. In the example with three dependent variables, this may not seem like a problem. But if you have ten or twenty dependent variables, the task could be substantial. Second, and more importantly, when many variables are studied simultaneously, they are almost always correlated—that is, the value of each variable may be related to the values of others. This is true for measurements such as height and weight, or responses to similar questions on a questionnaire. In cases like these, considering the univariate analyses separately would not take into account information contained in the data due to the correlation. Moreover, this approach could mislead a naïve researcher into believing that a factor has a very significant effect, when in fact it does not. On the other hand, a significant effect that only becomes apparent when all the dependent variables are studied simultaneously may not be discovered from the univariate analyses alone.

In most multivariate data applications, you should usually examine the results of the multivariate tests first, then examine the univariate analyses cautiously if significant results do not appear in the multivariate analysis.
Although some of the details of a multivariate analysis differ from those of a univariate analysis, the two are similar in many ways. Experimental factors of interest are related to the dependent variables by a linear model, and functions of sums of squares are computed to test hypotheses about these factors. In general, if you have designed an experiment with only one dependent variable, the extension of the analysis to the multivariate case can be carried out in a very straightforward manner.
We present examples of several types of multivariate analyses. Basic theory for the methods is presented in Section 9.7, “Statistical Background.”
Test scores from two exams taken by students with three different teachers are analyzed in Output 9.1.
Output 9.1 Two Exam Scores for Students in Three Teachers’ Classes
Obs | teach | score1 | score2 |
1 | JAY | 69 | 75 |
2 | JAY | 69 | 70 |
3 | JAY | 71 | 73 |
4 | JAY | 78 | 82 |
5 | JAY | 79 | 81 |
6 | JAY | 73 | 75 |
7 | PAT | 69 | 70 |
8 | PAT | 68 | 74 |
9 | PAT | 75 | 80 |
10 | PAT | 78 | 85 |
11 | PAT | 68 | 68 |
12 | PAT | 63 | 68 |
13 | PAT | 72 | 74 |
14 | PAT | 63 | 66 |
15 | PAT | 71 | 76 |
16 | PAT | 72 | 78 |
17 | PAT | 71 | 73 |
18 | PAT | 70 | 73 |
19 | PAT | 56 | 59 |
20 | PAT | 77 | 79 |
22 | ROBIN | 64 | 65 |
23 | ROBIN | 74 | 74 |
24 | ROBIN | 72 | 75 |
25 | ROBIN | 82 | 84 |
26 | ROBIN | 69 | 68 |
27 | ROBIN | 76 | 76 |
28 | ROBIN | 68 | 65 |
29 | ROBIN | 78 | 79 |
30 | ROBIN | 70 | 71 |
31 | ROBIN | 60 | 61 |
We first perform an analysis of variance to compare the teacher means for each variable, SCORE1 and SCORE2. Run the SAS statements for these analyses:
proc glm;
class teach;
model score1 score2=teach;
run;
Results are shown in Output 9.2. Neither univariate analysis shows a significant difference among teachers, for either SCORE1 (p=0.4143) or SCORE2 (p=0.5776).
Output 9.2 Results of Univariate One-Way Analysis
The GLM Procedure
Dependent Variable: score1
Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
Model | 2 | 60.6050831 | 30.3025415 | 0.91 | 0.4143 |
Error | 28 | 932.8787879 | 33.3170996 | ||
Corrected Total | 30 | 993.4838710 | |||
R-Square | Coeff Var | Root MSE | score1 Mean |
0.061003 | 8.144515 | 5.772097 | 70.87097 |
Source | DF | Type III SS | Mean Square | F Value | Pr > F |
teach | 2 | 60.60508309 | 30.30254154 | 0.91 | 0.4143 |
The GLM Procedure
Dependent Variable: score2
Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
Model | 2 | 49.735861 | 24.867930 | 0.56 | 0.5776 |
Error | 28 | 1243.941558 | 44.426484 | ||
Corrected Total | 30 | 1293.677419 |
R-Square | Coeff Var | Root MSE | score2 Mean |
0.038445 | 9.062496 | 6.665320 | 73.54839 |
Source | DF | Type III SS | Mean Square | F Value | Pr > F |
teach | 2 | 49.73586091 | 24.86793046 | 0.56 | 0.5776 |
The objective now is to see if a model using both scores shows a difference among teachers. The following SAS statements are used for this analysis:
proc glm;
class teach;
model score1 score2=teach;
manova h=teach / printh printe;
run;
The first three statements produce the usual univariate analyses of the two scores as shown in Output 9.2. The MANOVA statement produces results in Output 9.3.
Output 9.3 Results of One-Way Multivariate Analysis: The MANOVA Statement
The GLM Procedure
Multivariate Analysis of Variance
E = Error SSCP Matrix
➊ | score1 | score2 |
score1 | 932.87878788 | 1018.6818182 |
score2 | 1018.6818182 | 1243.9415584 |
➋ Partial Correlation Coefficients from the Error SSCP Matrix / Prob > |r|
DF = 28 | score1 | score2 |
score1 | 1.000000 | 0.945640 <.0001 |
score2 | 0.945640 <.0001 | 1.000000 |
➌ H = Type III SSCP Matrix for teach
score1 | score2 | |
score1 | 60.605083089 | 31.511730205 |
score2 | 31.511730205 | 49.735860913 |
Characteristic Roots and Vectors of: E Inverse * H, where
H = Type III SSCP Matrix for teach
E = Error SSCP Matrix
➍ Characteristic | Characteristic Vector V'EV=1 |
Root | Percent | score1 | score2 |
0.43098027 | 91.86 | -0.10044686 | 0.08416103 |
0.03821194 | 8.14 | 0.00675930 | 0.02275380 |
MANOVA Test Criteria and F Approximations for the Hypothesis of No Overall teach Effect
➎ H = Type III SSCP Matrix for teach
E = Error SSCP Matrix
S=2 M=-0.5 N=12.5
Statistic | Value | F Value | Num DF | Den DF | Pr > F |
Wilks' Lambda | 0.67310116 | 2.95 | 4 | 54 | 0.0279 |
Pillai's Trace | 0.33798387 | 2.85 | 4 | 56 | 0.0322 |
Hotelling-Lawley Trace | 0.46919220 | 3.13 | 4 | 31.389 | 0.0281 |
Roy's Greatest Root | 0.43098027 | 6.03 | 2 | 28 | 0.0066 |
NOTE: F Statistic for Roy's Greatest Root is an upper bound.
NOTE: F Statistic for Wilks' Lambda is exact.
The PRINTH and PRINTE options cause the printing of the hypothesis and error matrices, respectively. In addition, the PRINTE option produces a matrix of partial correlation coefficients derived from the error SSCP matrix. This correlation matrix represents the correlations of the dependent variables corrected for all the independent factors in the MODEL statement.
The results in Output 9.3 are described below. The callout numbers have been added to the output to key the following descriptions:
➊ The elements of the error matrix. The diagonal elements of this matrix represent the error sums of squares from the corresponding univariate analyses (see Output 9.2).
➋ The associated partial correlation matrix. In this example it appears that SCORE1 and SCORE2 are highly correlated (r=0.945640).
➌ The elements of the hypothesis matrix, H. Again, the diagonal elements correspond to the hypothesis sums of squares from the corresponding univariate analysis.
➍ The characteristic roots and vectors of E⁻¹H. The elements of the characteristic vector describe a linear combination of the analysis variables that produces the largest possible univariate F-ratio.
➎ The four test statistics previously discussed. The values of S, M, and N, printed above the table of statistics, provide information that is used in constructing the F-approximations for the criteria. (For more information, see Morrison (1976).) All four tests give similar results, although this is not always the case. Note that the p-values for the “Hypothesis of No Overall TEACH Effect” are much lower for the multivariate tests than any of the univariate tests would indicate. This is an example of how viewing a set of variables together can help you detect differences that you would not detect by looking at the individual variables.
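To make the connection between the characteristic roots and the four criteria concrete, the criteria can be recomputed from the E and H matrices printed in Output 9.3. The following numpy sketch is a numerical check of the SAS output, not part of the analysis itself; the use of Python/numpy here is purely illustrative.

```python
import numpy as np

# Error (E) and hypothesis (H) SSCP matrices as printed in Output 9.3
E = np.array([[932.87878788, 1018.6818182],
              [1018.6818182, 1243.9415584]])
H = np.array([[60.605083089, 31.511730205],
              [31.511730205, 49.735860913]])

# Characteristic roots of E-inverse * H, largest first
roots = np.sort(np.linalg.eigvals(np.linalg.solve(E, H)).real)[::-1]

wilks = np.prod(1.0 / (1.0 + roots))    # Wilks' Lambda
pillai = np.sum(roots / (1.0 + roots))  # Pillai's Trace
hotelling = np.sum(roots)               # Hotelling-Lawley Trace
roy = roots[0]                          # Roy's Greatest Root

print(roots)   # approx. [0.43098 0.03821]
print(wilks)   # approx. 0.67310
```

Each criterion is a different summary of the same characteristic roots, which is why the four tests tend to agree when one root dominates.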
Consider a common situation in multivariate analysis. You have several different measurements taken on each of several subjects, and you want to know if the means of the different variables are all the same. For example, you may have used different recording devices to measure the same phenomenon, or you may have observed subjects under a variety of conditions and administered a test under each of the conditions. If you had only two means to compare, you could use the familiar t-test, but it is important to use an analysis that takes into account the correlations among the dependent variables, just as in the previous examples, even though there are no independent factors in the model—that is, no terms on the right side of the MODEL statement. In this situation you could use Hotelling’s T2 test. As an example, consider the following data taken from Morrison (1976). Weight gains in rats given a special diet were measured at one (GAIN1), two (GAIN2), three (GAIN3), and four (GAIN4) weeks after administration of the diet. The question of interest is whether the rats’ weight gains stayed constant over the course of the experiment; in other words, were the mean weight gains of the rats the same at each of the four weeks? Output 9.4 shows the data.
Output 9.4 Data for Hotelling’s T2 Test
Obs | gain1 | gain2 | gain3 | gain4 |
1 | 29 | 28 | 25 | 33 |
2 | 33 | 30 | 23 | 31 |
3 | 25 | 34 | 33 | 41 |
4 | 18 | 33 | 29 | 35 |
5 | 25 | 23 | 17 | 30 |
6 | 24 | 32 | 29 | 22 |
7 | 20 | 23 | 16 | 31 |
8 | 28 | 21 | 18 | 24 |
9 | 18 | 23 | 22 | 28 |
10 | 25 | 28 | 29 | 30 |
Note that the following MODEL statement fits a model with only an intercept:
model gain1 gain2 gain3 gain4 = ;
Following this statement with a MANOVA statement to test the intercept tests the hypothesis that the means of the four weight gains are all 0. It does not test the hypothesis of interest—that the four means are equal. To test the hypothesis that the four means are equal, the dependent variables must be transformed in such a way that their transformed means being 0 will imply that the original means are equal for all four variables. It turns out that there are many different transformations that have this effect. (This problem is discussed in detail in Chapter 8, “Repeated-Measures Analysis.”) One simple transformation that achieves this goal is subtracting one of the variables from each of the other variables; in this example, the first gain could be subtracted from each of the other gains. The four means are equal to each other if and only if each of these differences is 0. In this way, Hotelling’s T2 test for equality of means can be performed using the MANOVA statement.
One way of producing the transformed variables necessary to perform Hotelling’s T2 test is to produce new variables in a DATA step and to perform an analysis on the new variables. A quicker and more efficient way, however, is to use the M= option in the MANOVA statement. Using the M= option, you can perform an analysis on a set of variables that is a linear transformation of the original variables as listed on the left side of the equal sign in the MODEL statement.
The following SAS statements perform the appropriate analysis:
proc glm data=wtgain;
model gain1 gain2 gain3 gain4 = / nouni;
manova h=intercept
m=gain2-gain1, gain3-gain1, gain4-gain1
mnames=diff2 diff3 diff4 / summary;
run;
The NOUNI option in the MODEL statement suppresses the individual analyses of the gain variables. This is done because the multivariate hypothesis of equality of the four means is the hypothesis of interest. The SUMMARY option in the MANOVA statement produces analysis-of-variance tables of the transformed variables. The MNAMES= option provides labels for these transformed variables; if omitted, the procedure uses the names MVAR1, MVAR2, and so on. The results appear in Output 9.5.
Output 9.5 Analysis of Transformed Variables: Hotelling's T2 Test
The GLM Procedure
Number of observations 10
The GLM Procedure
Multivariate Analysis of Variance
M Matrix Describing Transformed Variables
gain1 | gain2 | gain3 | gain4 | |
diff2 | -1 | 1 | 0 | 0 |
diff3 | -1 | 0 | 1 | 0 |
diff4 | -1 | 0 | 0 | 1 |
Characteristic Roots and Vectors of: E Inverse * H, where
H = Type III SSCP Matrix for Intercept
E = Error SSCP Matrix
Variables have been transformed by the M Matrix
Characteristic | Characteristic Vector V'EV=1 | |||
Root | Percent | diff2 | diff3 | diff4 |
2.97211676 | 100.00 | -0.11825538 | 0.11174598 | -0.02428445 |
0.00000000 | 0.00 | -0.11610475 | 0.05331366 | 0.06160662 |
0.00000000 | 0.00 | 0.00519921 | 0.03899407 | 0.00000000 |
MANOVA Test Criteria and Exact F Statistics for the Hypothesis of No Overall Intercept Effect
on the Variables Defined by the M Matrix Transformation
H = Type III SSCP Matrix for Intercept
E = Error SSCP Matrix
S=1 M=0.5 N=2.5
Statistic | Value | F Value | Num DF | Den DF | Pr > F |
Wilks' Lambda | 0.25175494 | 6.93 | 3 | 7 | 0.0167 |
Pillai's Trace | 0.74824506 | 6.93 | 3 | 7 | 0.0167 |
Hotelling-Lawley Trace | 2.97211676 | 6.93 | 3 | 7 | 0.0167 |
Roy's Greatest Root | 2.97211676 | 6.93 | 3 | 7 | 0.0167 |
Dependent Variable: diff2
Source | DF | Type III SS | Mean Square | F Value | Pr > F |
Intercept | 1 | 90.0000000 | 90.0000000 | 2.10 | 0.1814 |
Error | 9 | 386.0000000 | 42.8888889 |
Dependent Variable: diff3
Source | DF | Type III SS | Mean Square | F Value | Pr > F |
Intercept | 1 | 1.6000000 | 1.6000000 | 0.03 | 0.8735 |
Error | 9 | 536.4000000 | 59.6000000 |
Dependent Variable: diff4
Source | DF | Type III SS | Mean Square | F Value | Pr > F |
Intercept | 1 | 360.0000000 | 360.0000000 | 6.53 | 0.0309 |
Error | 9 | 496.0000000 | 55.1111111 |
Although PROC GLM does not print the value of the Hotelling T2 statistic itself, it produces the correct F-test and probability level. Note that all the multivariate test statistics result in the same F-values and that they are labeled as “Exact F Statistics.” These tests are always exact when the hypothesis being tested has only 1 degree of freedom, such as the hypothesis of no INTERCEPT effect in this example. In order to calculate the actual value of the T2 statistic, the following formula can be used:
T2 = (nobs − 1)((1/Λ) − 1)
Here, nobs is the number of observations and Λ is the value of Wilks’ criterion printed by PROC GLM. (See Section 9.7, “Statistical Background.”) In this case, the calculation leads to T2 = (10 − 1)(1/0.25175494 − 1) = 26.749.
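As a check on this arithmetic, the same T2 value can be computed directly from the data in Output 9.4. The following numpy sketch applies the same differencing as the M= option and then forms Hotelling's statistic from the mean differences and their covariance matrix; numpy here is illustrative, and any matrix language would do.

```python
import numpy as np

# Weekly weight gains from Output 9.4 (columns gain1..gain4)
gains = np.array([[29, 28, 25, 33], [33, 30, 23, 31], [25, 34, 33, 41],
                  [18, 33, 29, 35], [25, 23, 17, 30], [24, 32, 29, 22],
                  [20, 23, 16, 31], [28, 21, 18, 24], [18, 23, 22, 28],
                  [25, 28, 29, 30]], dtype=float)

n = gains.shape[0]
# Same transformation as the M= option: subtract gain1 from the other gains
d = gains[:, 1:] - gains[:, [0]]           # columns diff2, diff3, diff4

dbar = d.mean(axis=0)                      # mean differences
S = np.cov(d, rowvar=False)                # sample covariance (n-1 divisor)
T2 = n * dbar @ np.linalg.solve(S, dbar)   # Hotelling's T-squared

lam = 1.0 / (1.0 + T2 / (n - 1))           # implied Wilks' Lambda
F = (n - 3) / ((n - 1) * 3) * T2           # exact F with 3 and n-3 df

print(T2)   # approx. 26.75
print(F)    # approx. 6.93
```

The implied Wilks' Lambda and F-value agree with the PROC GLM output in Output 9.5.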
The total weight of a mature cotton boll can be divided into three parts: the weight of the seeds, the weight of the lint, and the weight of the bract. Lint and seed constitute the economic yield of cotton.
In the following data, the differences in the three components of the cotton bolls due to two varieties (VARIETY) and two plant spacings (SPACING) are studied. Five plants are chosen at random from each of the four treatment combinations. Two bolls are picked from each plant, and the weights of the seeds, lint, and bract are recorded. The most appropriate error term for testing VARIETY, SPACING, and the interaction of the two is the variation among plants. In univariate analyses, the TEST statement is used to specify this alternative error term; for multivariate analyses, the error term is specified in the MANOVA statement or statements. (Each MANOVA statement can have, at most, one error term specified.)
The following SAS statements are used:
proc glm;
class variety spacing plant;
model seed lint bract=variety spacing variety*spacing
plant(variety spacing)/ss3;
test h=variety|spacing e=plant(variety spacing);
means variety|spacing;
manova h=variety|spacing e=plant(variety spacing);
run;
The data used in this analysis appear in Output 9.6.
Output 9.6 Data for Two-Way Multivariate Analysis
Obs | variety | spacing | plant | seed | lint | bract |
1 | 213 | 30 | 3 | 3.1 | 1.7 | 2.0 |
2 | 213 | 30 | 3 | 1.5 | 1.7 | 1.4 |
3 | 213 | 30 | 5 | 3.0 | 1.9 | 1.8 |
4 | 213 | 30 | 5 | 1.4 | 0.9 | 1.3 |
5 | 213 | 30 | 6 | 2.3 | 1.7 | 1.5 |
6 | 213 | 30 | 6 | 2.2 | 2.0 | 1.4 |
7 | 213 | 30 | 8 | 0.4 | 0.9 | 1.2 |
8 | 213 | 30 | 8 | 1.7 | 1.6 | 1.3 |
9 | 213 | 30 | 9 | 1.8 | 1.2 | 1.0 |
10 | 213 | 30 | 9 | 1.2 | 0.8 | 1.0 |
11 | 213 | 40 | 0 | 2.0 | 1.0 | 1.9 |
12 | 213 | 40 | 0 | 1.5 | 1.5 | 1.7 |
13 | 213 | 40 | 1 | 1.8 | 1.1 | 2.1 |
14 | 213 | 40 | 1 | 1.0 | 1.3 | 1.1 |
15 | 213 | 40 | 2 | 1.3 | 1.1 | 1.3 |
16 | 213 | 40 | 2 | 2.9 | 1.9 | 1.7 |
17 | 213 | 40 | 3 | 2.8 | 1.2 | 1.3 |
18 | 213 | 40 | 3 | 1.8 | 1.2 | 1.2 |
19 | 213 | 40 | 4 | 3.2 | 1.8 | 2.0 |
20 | 213 | 40 | 4 | 3.2 | 1.6 | 1.9 |
21 | 37 | 30 | 1 | 3.2 | 2.6 | 1.4 |
22 | 37 | 30 | 1 | 2.8 | 2.1 | 1.2 |
23 | 37 | 30 | 2 | 3.6 | 2.4 | 1.5 |
24 | 37 | 30 | 2 | 0.9 | 0.8 | 0.8 |
25 | 37 | 30 | 3 | 4.0 | 3.1 | 1.8 |
26 | 37 | 30 | 3 | 4.0 | 2.9 | 1.5 |
27 | 37 | 30 | 5 | 3.7 | 2.7 | 1.6 |
28 | 37 | 30 | 5 | 2.6 | 1.5 | 1.3 |
29 | 37 | 30 | 8 | 2.8 | 2.2 | 1.2 |
30 | 37 | 30 | 8 | 2.9 | 2.3 | 1.2 |
31 | 37 | 40 | 1 | 4.1 | 2.9 | 2.0 |
32 | 37 | 40 | 1 | 3.4 | 2.0 | 1.6 |
33 | 37 | 40 | 3 | 3.7 | 2.3 | 2.0 |
34 | 37 | 40 | 3 | 3.2 | 2.2 | 1.8 |
35 | 37 | 40 | 4 | 3.4 | 2.7 | 1.5 |
36 | 37 | 40 | 4 | 2.9 | 2.1 | 1.2 |
37 | 37 | 40 | 5 | 2.5 | 1.4 | 0.8 |
38 | 37 | 40 | 5 | 3.6 | 2.4 | 1.6 |
39 | 37 | 40 | 6 | 3.1 | 2.3 | 1.4 |
40 | 37 | 40 | 6 | 2.5 | 1.5 | 1.5 |
Note that the TEST statement is used to obtain the appropriate error terms. The results of the three univariate analyses appear in Output 9.7.
Output 9.7 Results of Two-Way Multivariate Analysis: Univariate Analyses
Dependent Variable: seed
Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
Model | 19 | 24.06500000 | 1.26657895 | 2.22 | 0.0425 |
Error | 20 | 11.43000000 | 0.57150000 | ||
Corrected Total | 39 | 35.49500000 |
R-Square | Coeff Var | Root MSE | seed Mean |
0.677983 | 29.35830 | 0.755976 | 2.575000 |
Source | DF | Type III SS | Mean Square | F Value | Pr > F |
variety | 1 | 12.99600000 | 12.99600000 | 22.74 | 0.0001 |
spacing | 1 | 0.57600000 | 0.57600000 | 1.01 | 0.3274 |
variety*spacing | 1 | 0.02500000 | 0.02500000 | 0.04 | 0.8364 |
plant(variet*spacin) | 16 | 10.46800000 | 0.65425000 | 1.14 | 0.3823 |
Tests of Hypotheses Using the Type III MS for plant(variet*spacin) as an Error Term
Source | DF | Type III SS | Mean Square | F Value | Pr > F |
variety | 1 | 12.99600000 | 12.99600000 | 19.86 | 0.0004 |
spacing | 1 | 0.57600000 | 0.57600000 | 0.88 | 0.3620 |
variety*spacing | 1 | 0.02500000 | 0.02500000 | 0.04 | 0.8475 |
Dependent Variable: lint
Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
Model | 19 | 10.62875000 | 0.55940789 | 2.28 | 0.0377 |
Error | 20 | 4.91500000 | 0.24575000 | ||
Corrected Total | 39 | 15.54375000 |
R-Square | Coeff Var | Root MSE | lint Mean |
0.683796 | 27.35072 | 0.495732 | 1.812500 |
Source | DF | Type III SS | Mean Square | F Value | Pr > F |
variety | 1 | 6.64225000 | 6.64225000 | 27.03 | <.0001 |
spacing | 1 | 0.05625000 | 0.05625000 | 0.23 | 0.6375 |
variety*spacing | 1 | 0.00025000 | 0.00025000 | 0.00 | 0.9749 |
plant(variet*spacin) | 16 | 3.93000000 | 0.24562500 | 1.00 | 0.4934 |
Tests of Hypotheses Using the Type III MS for plant(variet*spacin) as an Error Term
Source | DF | Type III SS | Mean Square | F Value | Pr > F |
variety | 1 | 6.64225000 | 6.64225000 | 27.04 | <.0001 |
spacing | 1 | 0.05625000 | 0.05625000 | 0.23 | 0.6387 |
variety*spacing | 1 | 0.00025000 | 0.00025000 | 0.00 | 0.9749 |
Dependent Variable: bract
Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
Model | 19 | 2.70500000 | 0.14236842 | 1.63 | 0.1442 |
Error | 20 | 1.75000000 | 0.08750000 | ||
Corrected Total | 39 | 4.45500000 | |||
R-Square | Coeff Var | Root MSE | bract Mean |
0.607183 | 20.05451 | 0.295804 | 1.475000 |
Source | DF | Type III SS | Mean Square | F Value | Pr > F |
variety | 1 | 0.03600000 | 0.03600000 | 0.41 | 0.5285 |
spacing | 1 | 0.44100000 | 0.44100000 | 5.04 | 0.0362 |
variety*spacing | 1 | 0.00400000 | 0.00400000 | 0.05 | 0.8329 |
plant(variet*spacin) | 16 | 2.22400000 | 0.13900000 | 1.59 | 0.1626 |
Tests of Hypotheses Using the Type III MS for plant(variet*spacin) as an Error Term
Source | DF | Type III SS | Mean Square | F Value | Pr > F |
variety | 1 | 0.03600000 | 0.03600000 | 0.26 | 0.6178 |
spacing | 1 | 0.44100000 | 0.44100000 | 3.17 | 0.0939 |
variety*spacing | 1 | 0.00400000 | 0.00400000 | 0.03 | 0.8674 |
VARIETY has a statistically significant effect on SEED and LINT; no effects are statistically significant for BRACT.
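To see how the TEST statement forms these F-ratios, note that each F is simply the Type III mean square for the effect divided by the mean square for plant(variety*spacing). A minimal check for VARIETY on SEED, using the mean squares from Output 9.7:

```python
# Type III mean squares for SEED, from Output 9.7
ms_variety = 12.99600000
ms_plant = 0.65425000   # plant(variety*spacing), the error term given with E=
f_variety = ms_variety / ms_plant
print(round(f_variety, 2))   # 19.86, matching the TEST statement output
```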
The means for all levels and combinations of levels of VARIETY and SPACING produced by the MEANS statement appear in Output 9.8.
Output 9.8 Results of Two-Way Multivariate Analysis: The MEANS Statement
The GLM Procedure
Level of | ---seed--- | ---lint--- | ---bract--- | ||||
variety | N | Mean | Std Dev | Mean | Std Dev | Mean | Std Dev |
37 | 20 | 3.14500000 | 0.72799291 | 2.22000000 | 0.57087191 | 1.44500000 | 0.32843328 |
213 | 20 | 2.00500000 | 0.80881655 | 1.40500000 | 0.37763112 | 1.50500000 | 0.35314378 |
Level of | ---seed--- | ---lint--- | ---bract--- | ||||
spacing | N | Mean | Std Dev | Mean | Std Dev | Mean | Std Dev |
30 | 20 | 2.45500000 | 1.04351380 | 1.85000000 | 0.70150215 | 1.37000000 | 0.29037181 |
40 | 20 | 2.69500000 | 0.86540225 | 1.77500000 | 0.56835404 | 1.58000000 | 0.35629674 |
Level of | Level of | ---seed--- | ---lint--- | |||
variety | spacing | N | Mean | Std Dev | Mean | Std Dev |
37 | 30 | 10 | 3.05000000 | 0.91439111 | 2.26000000 | 0.68182761 |
37 | 40 | 10 | 3.24000000 | 0.51251016 | 2.18000000 | 0.46856756 |
213 | 30 | 10 | 1.86000000 | 0.82219219 | 1.44000000 | 0.44771022 |
213 | 40 | 10 | 2.15000000 | 0.81137743 | 1.37000000 | 0.31287200 |
Level of | Level of | ---bract--- | ||
variety | spacing | N | Mean | Std Dev |
37 | 30 | 10 | 1.35000000 | 0.27588242 |
37 | 40 | 10 | 1.54000000 | 0.36270588 |
213 | 30 | 10 | 1.39000000 | 0.31780497 |
213 | 40 | 10 | 1.62000000 | 0.36453928 |
The results of the multivariate analyses appear in Output 9.9.
Output 9.9 Results of Two-Way Multivariate Analysis
The GLM Procedure
Multivariate Analysis of Variance
Characteristic Roots and Vectors of: E Inverse * H, where
H = Type III SSCP Matrix for variety
E = Type III SSCP Matrix for plant(variet*spacin)
Characteristic | Characteristic Vector V'EV=1 | |||
Root | Percent | seed | lint | bract |
3.43919116 | 100.00 | 0.13061027 | 0.48969379 | -0.64083380 |
0.00000000 | 0.00 | -0.52400501 | 0.74035630 | 0.10041133 |
0.00000000 | 0.00 | 0.01704640 | 0.02228579 | 0.62659680 |
MANOVA Test Criteria and Exact F Statistics for the Hypothesis of No Overall variety Effect
H = Type III SSCP Matrix for variety
E = Type III SSCP Matrix for plant(variet*spacin)
S=1 M=0.5 N=6
Statistic | Value | F Value | Num DF | Den DF | Pr > F |
Wilks' Lambda | 0.22526626 | 16.05 | 3 | 14 | <.0001 |
Pillai's Trace | 0.77473374 | 16.05 | 3 | 14 | <.0001 |
Hotelling-Lawley Trace | 3.43919116 | 16.05 | 3 | 14 | <.0001 |
Roy's Greatest Root | 3.43919116 | 16.05 | 3 | 14 | <.0001 |
The GLM Procedure
Multivariate Analysis of Variance
Characteristic Roots and Vectors of: E Inverse * H, where
H = Type III SSCP Matrix for spacing
E = Type III SSCP Matrix for plant(variet*spacin)
Characteristic | Characteristic Vector V'EV=1 | |||
Root | Percent | seed | lint | bract |
0.63209472 | 100.00 | 0.27027458 | -0.73247531 | 0.62673044 |
0.00000000 | 0.00 | 0.24001974 | 0.22618207 | -0.19352896 |
0.00000000 | 0.00 | -0.40158816 | 0.44804654 | 0.61897451 |
MANOVA Test Criteria and Exact F Statistics for the Hypothesis of No Overall spacing Effect
H = Type III SSCP Matrix for spacing
E = Type III SSCP Matrix for plant(variet*spacin)
S=1 M=0.5 N=6
Statistic | Value | F Value | Num DF | Den DF | Pr > F |
Wilks' Lambda | 0.61270954 | 2.95 | 3 | 14 | 0.0692 |
Pillai's Trace | 0.38729046 | 2.95 | 3 | 14 | 0.0692 |
Hotelling-Lawley Trace | 0.63209472 | 2.95 | 3 | 14 | 0.0692 |
Roy's Greatest Root | 0.63209472 | 2.95 | 3 | 14 | 0.0692 |
The GLM Procedure
Multivariate Analysis of Variance
Characteristic Roots and Vectors of: E Inverse * H, where
H = Type III SSCP Matrix for variety*spacing
E = Type III SSCP Matrix for plant(variet*spacin)
Characteristic | Characteristic Vector V'EV=1 | |||
Root | Percent | seed | lint | bract |
0.00616711 | 100.00 | 0.42143581 | -0.67443149 | 0.35670210 |
0.00000000 | 0.00 | -0.33315217 | 0.01818488 | 0.82833421 |
0.00000000 | 0.00 | -0.05772656 | 0.57726561 | 0.00000000 |
MANOVA Test Criteria and Exact F Statistics for the Hypothesis of No Overall variety*spacing Effect
H = Type III SSCP Matrix for variety*spacing
E = Type III SSCP Matrix for plant(variet*spacin)
S=1 M=0.5 N=6
Statistic | Value | F Value | Num DF | Den DF | Pr > F |
Wilks' Lambda | 0.99387069 | 0.03 | 3 | 14 | 0.9931 |
Pillai's Trace | 0.00612931 | 0.03 | 3 | 14 | 0.9931 |
Hotelling-Lawley Trace | 0.00616711 | 0.03 | 3 | 14 | 0.9931 |
Roy's Greatest Root | 0.00616711 | 0.03 | 3 | 14 | 0.9931 |
The only highly significant effect is VARIETY, and all four multivariate statistics are the same because there is only one hypothesis degree of freedom.
9.5 Multivariate Analysis of Covariance
This section illustrates multivariate analysis of covariance using the data on orange sales presented in Output 7.6. Sales of two types of oranges are related to experimentally determined prices (PRICE) as well as stores (STORE) and days of the week (DAY). The analysis is expanded here to consider the simultaneous multivariate relationship of the price to both types of oranges.
The following SAS statements are used for the analysis:
proc glm;
class store day;
model q1 q2=store day p1 p2 / nouni;
manova h=store day p1 p2 / printh printe;
run;
Note that PROC ANOVA is not appropriate in this situation because of the presence of the covariates. The NOUNI option in the MODEL statement suppresses printing of the univariate analyses that are already shown in Chapter 7, “Analysis of Covariance.” Results of the multivariate analysis appear in Output 9.10.
Output 9.10 Results of Multivariate Analysis of Covariance
The GLM Procedure
Multivariate Analysis of Variance
E = Error SSCP Matrix
q1 | q2 | |
q1 | 408.30824182 | 74.603217758 |
q2 | 74.603217758 | 706.94116552 |
Partial Correlation Coefficients from the Error SSCP Matrix / Prob > |r|
DF = 23 | q1 | q2 |
q1 | 1.000000 | 0.138858 0.5176 |
q2 | 0.138858 0.5176 | 1.000000 |
H = Type III SSCP Matrix for store
q1 | q2 | |
q1 | 223.83267344 | 93.801152319 |
q2 | 93.801152319 | 155.09933793 |
Characteristic Roots and Vectors of: E Inverse * H, where
H = Type III SSCP Matrix for store
E = Error SSCP Matrix
Characteristic | Characteristic Vector V'EV=1 | ||
Root | Percent | q1 | q2 |
0.57363829 | 78.23 | 0.04622384 | 0.00941459 |
0.15960322 | 21.77 | -0.01899048 | 0.03679295 |
MANOVA Test Criteria and F Approximations for the Hypothesis of No Overall store Effect
H = Type III SSCP Matrix for store
E = Error SSCP Matrix
S=2 M=1 N=10
Statistic | Value | F Value | Num DF | Den DF | Pr > F |
Wilks' Lambda | 0.54800645 | 1.54 | 10 | 44 | 0.1564 |
Pillai's Trace | 0.50216601 | 1.54 | 10 | 46 | 0.1553 |
Hotelling-Lawley Trace | 0.73324151 | 1.57 | 10 | 30.372 | 0.1634 |
Roy's Greatest Root | 0.57363829 | 2.64 | 5 | 23 | 0.0501 |
NOTE: F Statistic for Roy's Greatest Root is an upper bound.
NOTE: F Statistic for Wilks' Lambda is exact.
H = Type III SSCP Matrix for day
q1 | q2 | |
q1 | 433.09686996 | 461.05064188 |
q2 | 461.05064188 | 614.4088834 |
Characteristic Roots and Vectors of: E Inverse * H, where
H = Type III SSCP Matrix for day
E = Error SSCP Matrix
Characteristic | Characteristic Vector V'EV=1 | ||
Root | Percent | q1 | q2 |
1.60708776 | 93.18 | 0.03517603 | 0.02300242 |
0.11766546 | 6.82 | -0.03549548 | 0.03021993 |
MANOVA Test Criteria and F Approximations for the Hypothesis of No Overall day Effect
H = Type III SSCP Matrix for day
E = Error SSCP Matrix
S=2 M=1 N=10
Statistic | Value | F Value | Num DF | Den DF | Pr > F |
Wilks' Lambda | 0.34318834 | 3.11 | 10 | 44 | 0.0044 |
Pillai's Trace | 0.72170813 | 2.60 | 10 | 46 | 0.0137 |
Hotelling-Lawley Trace | 1.72475321 | 3.69 | 10 | 30.372 | 0.0026 |
Roy's Greatest Root | 1.60708776 | 7.39 | 5 | 23 | 0.0003 |
NOTE: F Statistic for Roy's Greatest Root is an upper bound.
NOTE: F Statistic for Wilks' Lambda is exact.
H = Type III SSCP Matrix for p1
q1 | q2 | |
q1 | 538.16885116 | -212.5196287 |
q2 | -212.5196287 | 83.922717744 |
Characteristic Roots and Vectors of: E Inverse * H, where
H = Type III SSCP Matrix for p1
E = Error SSCP Matrix
Characteristic | Characteristic Vector V'EV=1 | ||
Root | Percent | q1 | q2 |
1.57701930 | 100.00 | 0.04805513 | -0.01539025 |
0.00000000 | 0.00 | 0.01371082 | 0.03472026 |
The GLM Procedure
Multivariate Analysis of Variance
MANOVA Test Criteria and Exact F Statistics for the Hypothesis of No Overall p1 Effect
H = Type III SSCP Matrix for p1
E = Error SSCP Matrix
S=1 M=0 N=10
Statistic | Value | F Value | Num DF | Den DF | Pr > F |
Wilks' Lambda | 0.38804521 | 17.35 | 2 | 22 | <.0001 |
Pillai's Trace | 0.61195479 | 17.35 | 2 | 22 | <.0001 |
Hotelling-Lawley Trace | 1.57701930 | 17.35 | 2 | 22 | <.0001 |
Roy's Greatest Root | 1.57701930 | 17.35 | 2 | 22 | <.0001 |
H = Type III SSCP Matrix for p2
q1 | q2 | |
q1 | 39.542251923 | -183.5850939 |
q2 | -183.5850939 | 852.34110489 |
Characteristic Roots and Vectors of: E Inverse * H, where
H = Type III SSCP Matrix for p2
E = Error SSCP Matrix
Characteristic | Characteristic Vector V'EV=1 | ||
Root | Percent | q1 | q2 |
1.42489030 | 100.00 | -0.01960102 | 0.03666503 |
0.00000000 | 0.00 | 0.04596827 | 0.00990107 |
MANOVA Test Criteria and Exact F Statistics for the Hypothesis of No Overall p2 Effect
H = Type III SSCP Matrix for p2
E = Error SSCP Matrix
S=1 M=0 N=10
Statistic | Value | F Value | Num DF | Den DF | Pr > F |
Wilks' Lambda | 0.41238979 | 15.67 | 2 | 22 | <.0001 |
Pillai's Trace | 0.58761021 | 15.67 | 2 | 22 | <.0001 |
Hotelling-Lawley Trace | 1.42489030 | 15.67 | 2 | 22 | <.0001 |
Roy's Greatest Root | 1.42489030 | 15.67 | 2 | 22 | <.0001 |
For the STORE effect, none of the statistics produces a significant result. This is not entirely consistent with the univariate results, in which the effect on sales of the first type of orange is nearly significant at the 5% level.
The DAY effect is quite significant according to all statistics, although there is a considerable difference in the level of significance (Pr > F) among the statistics. Also, there is one dominant eigenvalue, indicating that the trend in sales over days is roughly parallel for the two types of oranges. This can be verified by printing and plotting the least-squares means.
The results for the PRICE effects are relatively straightforward. Even though both P1 and P2 are significant for only one dependent variable in the univariate analyses, the multivariate analysis indicates that their effects are substantial enough to be significant overall.
9.6 Contrasts in Multivariate Analyses
Chapter 3, “Analysis of Variance for Balanced Data,” discusses how specialized questions concerning certain levels of factors in univariate analyses can be answered by using the CONTRAST statement to define a hypothesis to be tested. This same technique can be useful in multivariate analyses of variance. The GLM procedure prints output for CONTRAST statements as part of its multivariate analysis. As an example, consider the study described in Section 9.5, “Multivariate Analysis of Covariance.” Assume you want to know if Saturday sales differ from weekday sales, averaged across the two types of oranges. Because the levels for DAY are coded as 1 through 6 corresponding to Monday through Saturday, you need to construct a contrast that compares the average of the first five levels of DAY to the sixth. The following SAS statement is required:
contrast 'SAT. vs. WEEKDAYS' day .2 .2 .2 .2 .2 -1;
The label SAT. vs. WEEKDAYS appears in the output to identify the contrast. If this CONTRAST statement is added to the program of the previous section, it produces Output 9.11. The statement must precede the MANOVA statement so that PROC GLM knows that the multivariate test for the contrast is wanted.
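The contrast coefficients are chosen so that they sum to zero: each of the five weekdays receives weight 1/5 = .2 and Saturday receives −1, so the contrast estimates the average of the weekday means minus the Saturday mean. A minimal sketch of that arithmetic for a single response, using hypothetical least-squares means (the values below are illustrative only, not from the orange-sales data):

```python
# Contrast weights: average of Mon-Fri minus Saturday (they sum to zero).
weights = [0.2, 0.2, 0.2, 0.2, 0.2, -1.0]

# Hypothetical least-squares means for DAY (Mon..Sat) for one response.
day_means = [10.0, 11.0, 9.5, 10.5, 10.0, 12.0]

estimate = sum(w * m for w, m in zip(weights, day_means))

# The estimate equals mean(Mon..Fri) minus the Saturday mean.
weekday_avg = sum(day_means[:5]) / 5
print(estimate, weekday_avg - day_means[5])
```

Because the weights sum to zero, the contrast is unaffected by adding a constant to all six day means, which is what makes it a valid comparison among levels of DAY.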
Output 9.11 Results of Multivariate Analysis: The CONTRAST Statement
The GLM Procedure
Multivariate Analysis of Variance
H = Contrast SSCP Matrix for sat vs weekdays
  | q1 | q2 |
q1 | 9.3680815712 | 7.8469238548 |
q2 | 7.8469238548 | 6.5727666348 |
Characteristic Roots and Vectors of: E Inverse * H, where
H = Contrast SSCP Matrix for sat vs weekdays
E = Error SSCP Matrix
Characteristic Root | Percent | q1 | q2 |
(Characteristic vectors are scaled so that V′EV = 1.)
0.02873910 | 100.00 | 0.04110205 | 0.01705466 |
0.00000000 | 0.00 | -0.02842364 | 0.03393368 |
MANOVA Test Criteria and Exact F Statistics for the
Hypothesis of No Overall sat vs weekdays Effect
H = Contrast SSCP Matrix for sat vs weekdays
E = Error SSCP Matrix
S=1 M=0 N=10
Statistic | Value | F Value | Num DF | Den DF | Pr > F |
Wilks' Lambda | 0.97206377 | 0.32 | 2 | 22 | 0.7322 |
Pillai's Trace | 0.02793623 | 0.32 | 2 | 22 | 0.7322 |
Hotelling-Lawley Trace | 0.02873910 | 0.32 | 2 | 22 | 0.7322 |
Roy's Greatest Root | 0.02873910 | 0.32 | 2 | 22 | 0.7322 |
The results of the multivariate tests indicate no significant overall difference between the Saturday and weekday sales for the two types of oranges (Pr > F=0.7322).
The multivariate linear model can be written as
Y = XB + U
where
Y | is an n × k matrix of observed values of k dependent variables or responses. Each column corresponds to a specific dependent variable and each row to an observation. |
X | is an n × m matrix of n observations on the m independent variables (which may contain dummy variables). |
B | is an m × k matrix of regression coefficients or parameters. Each column of B is a vector of coefficients corresponding to one of the k dependent variables, and each row contains the coefficients associated with one of the m independent variables. |
U | is the n × k matrix of the n random errors, with columns corresponding to the dependent variables. |
The matrix of estimated coefficients is

B̂ = (X′X)⁻¹X′Y

Each column of B̂ is the vector of estimated coefficients that would be obtained by estimating coefficients for each response variable separately.
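This column-by-column equivalence can be checked numerically. The sketch below uses small hand-coded matrix routines (pure Python, no libraries) and arbitrary illustrative data: the full multivariate coefficient matrix is computed at once, then compared with the coefficients obtained by fitting each response column on its own.

```python
def transpose(A):
    return [list(row) for row in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def inv2(M):
    # Inverse of a 2 x 2 matrix.
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# Illustrative data: n = 4 observations, intercept plus one regressor (m = 2),
# and k = 2 response variables.
X = [[1, 0], [1, 1], [1, 2], [1, 3]]
Y = [[1.0, 2.0], [2.1, 2.9], [2.9, 4.1], [4.2, 5.0]]

XtXi = inv2(matmul(transpose(X), X))
B = matmul(XtXi, matmul(transpose(X), Y))   # m x k matrix of estimates

# Fitting each response column separately reproduces the columns of B.
for j in range(2):
    yj = [[row[j]] for row in Y]
    bj = matmul(XtXi, matmul(transpose(X), yj))
    assert all(abs(bj[i][0] - B[i][j]) < 1e-9 for i in range(2))
```

The estimates themselves carry no multivariate information; what distinguishes the multivariate analysis is how the sums of squares and crossproducts are combined in the test statistics that follow.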
The partitioning of sums of squares parallels that developed in previous chapters, except that the partitions consist of k × k matrices of sums of squares and crossproducts:

Sums of Squares | SSCP Matrix |
TOTAL | Y′Y |
MODEL | B̂′X′Y |
ERROR | Y′Y − B̂′X′Y |
In the univariate analysis of variance, the F statistic is the statistic of choice in most cases for testing hypotheses about the factors being considered. Recall that this statistic is derived by taking the ratio of two sums of squares, one derived from the hypothesis being tested and the other from an appropriate error term. In multivariate linear models, these sums of squares are replaced by matrices of sums of squares and crossproducts. These matrices are represented by H for the hypothesis, corresponding to the numerator sum of squares, and E for the error matrix, corresponding to the denominator sum of squares. Since division of matrices is not possible, the matrix E⁻¹H is the basis for test statistics for multivariate hypotheses. Four different functions of this matrix are used as test statistics and are available in the GLM, ANOVA, and other multivariate procedures in SAS. Each of these statistics is a function of the characteristic roots (also known as eigenvalues) of E⁻¹H. In the formulas below, λᵢ represents the characteristic roots.
Corresponding to each characteristic root is a characteristic vector, or eigenvector, that represents a linear combination of the dependent variables being analyzed. A function of the characteristic root, λᵢ/(1 + λᵢ), is the value of R² that would be obtained if the linear combination of dependent variables represented by the corresponding characteristic vector were used as the dependent variable in a univariate analysis of the same model. For this reason, this function of the characteristic root is the squared canonical correlation. In the formulas below, rᵢ² = λᵢ/(1 + λᵢ) represents the squared canonical correlations.
Hotelling-Lawley Trace = Tr(E⁻¹H) = ∑ λᵢ = ∑ rᵢ² / (1 − rᵢ²)

Pillai's Trace = Tr(H(H + E)⁻¹) = ∑ λᵢ / (1 + λᵢ) = ∑ rᵢ²

Wilks' Lambda: Λ = |E| / |H + E| = ∏ 1 / (1 + λᵢ) = ∏ (1 − rᵢ²)

Roy's Greatest Root = max λᵢ
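When there is a single nonzero characteristic root (the S = 1 case), all four statistics are one-to-one functions of that root, which is why the output tables in this chapter report the same exact F value for every statistic. A minimal pure-Python check of these relationships, using for illustration the root λ = 1.4248903 (the Hotelling-Lawley Trace value reported earlier) and numerator and denominator degrees of freedom 2 and 22:

```python
# Single nonzero characteristic root of E^-1 * H (S = 1 case).
lam = 1.4248903

hotelling = lam                   # Tr(E^-1 H): sum of the roots
roy = lam                         # largest root
wilks = 1.0 / (1.0 + lam)         # product of 1/(1 + lambda_i)
pillai = lam / (1.0 + lam)        # sum of lambda_i/(1 + lambda_i)

# With S = 1 the F statistic is exact; for Wilks' Lambda with
# numerator df = 2 and denominator df = 22:
num_df, den_df = 2, 22
F = (1.0 - wilks) / wilks * (den_df / num_df)

print(round(wilks, 8), round(pillai, 8), round(F, 2))
```

These values reproduce the Wilks' Lambda of 0.41238979, the Pillai's Trace of 0.58761021, and the F value of 15.67 shown in the output table, confirming that with one nonzero root the four criteria carry the same information.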
None of these criteria has been identified as universally superior to the others, although there are hypothesized situations in which one criterion may outperform the rest. Because the exact form of the alternative hypothesis being studied is generally unknown, the choice of test criterion often becomes a matter of personal preference. Wilks' criterion is derived from a likelihood-ratio approach and appeals to some statisticians on those grounds.