Chapter 7 Analysis of Covariance

7.1 Introduction

7.2 A One-Way Structure

7.2.1 Covariance Model

7.2.2 Means and Least-Squares Means

7.2.3 Contrasts

7.2.4 Multiple Covariates

7.3 Unequal Slopes

7.3.1 Testing the Heterogeneity of Slopes

7.3.2 Estimating Different Slopes

7.3.3 Testing Treatment Differences with Unequal Slopes

7.4 A Two-Way Structure without Interaction

7.5 A Two-Way Structure with Interaction

7.6 Orthogonal Polynomials and Covariance Methods

7.6.1 A 2×3 Example

7.6.2 Use of the IML ORPOL Function to Obtain Orthogonal Polynomial Contrast Coefficients

7.6.3 Use of Analysis of Covariance to Compute ANOVA and Fit Regression

7.1 Introduction

Analysis of covariance can be described as a combination of the methods of regression and analysis of variance. Regression models use direct independent variables—that is, variables whose values appear directly in the model—for example, the linear regression y = β0 + β1x. Analysis of variance models use class variables—that is, the independent variable’s classifications appear in the model—for example, the one-way ANOVA model y = μ + τi. In more theoretical terms, ANOVA class variables set up dummy 0-1 columns in the X matrix (see Chapter 6). Analysis-of-covariance models use both direct and class variables. A simple example combines linear regression and one-way ANOVA, yielding y = μ + τi + βx.

Analysis of covariance uses at least two measurements on each unit: the response variable y, and another variable x, called a covariable. You may have more than one covariable. The basic objective is to use information about y that is contained in x in order to refine inference about the response. This is done primarily in three ways:

❏ In all applications, variation in y that is associated with x is removed from the error variance, resulting in more precise estimates and more powerful tests.

❏ In some applications, group means of the y variable are adjusted to correspond to a common value of x, thereby producing an equitable comparison of the groups.

❏ In other applications, the regression of y on x for each group is of intrinsic interest, either to predict the effect of x on y for each group, or to compare the effect of x on y among groups.

Textbook discussions of covariance analysis focus on the first two points, with the main goal of establishing differences among adjusted treatment means. By including a related variable that accounts for substantial variation in the dependent variable of interest, you can reduce error. This increases the precision of the model parameter estimates. Textbooks discuss separate-slopes versus common-slope models—that is, covariance models with a separate regression slope coefficient for each treatment group versus a single slope coefficient for all treatment groups. In these discussions, the main role of the separate-slopes model is to test for differences in slopes among the treatments. Typically, this test should be conducted as a preliminary step before an analysis of covariance because, aside from carefully defined exceptions, the validity of comparing adjusted means using the analysis of covariance requires that the slopes be homogeneous. The beginning sections of this chapter present the textbook approach to analysis of covariance.

There are broader uses of covariance models, such as the study of partial regression coefficients adjusted for treatment effects. Applied to factorial experiments with qualitative and quantitative factors, covariance models provide a convenient alternative to orthogonal polynomial and related contrasts that are often tedious and awkward. Later sections of this chapter present these methods.

To give a practical definition, analysis of covariance refers to models containing both continuous variables and group indicators (CLASS variables in the GLM procedure). Because CLASS variables create less-than-full-rank models, covariance models are typically more complex and hence involve more difficulties in interpretation than regression-only or ANOVA-only models. These issues are addressed throughout this chapter.

7.2 A One-Way Structure

Analysis of covariance can be applied in any data classification whenever covariables are measured. This section deals with the simplest type of classification, the one-way structure.

7.2.1 Covariance Model

The simplest covariance model is written

yij = μ + τi + β(xij - x̄..) + εij

and combines a one-way treatment structure with parameters τi, one independent covariate xij, and associated regression parameter β.

Two equivalent models are

yij = β0 + τi + βxij + εij

where β0 = (μ - βx̄..), and

yij = β0i + βxij + εij

where β0i is the intercept for the ith treatment. This expression reveals a model that represents a set of parallel lines; the common slope of the lines is β, and the intercepts are β0i = (β0 + τi). The model contains all the elements of an analysis-of-variance model of less-than-full rank, requiring restrictions on the τi or the use of generalized inverses and estimable functions. Note, however, that the regression coefficient β is not affected by the singularity of the X’X matrix; hence, the estimate of β is unique.

The following example, using data on the growth of oysters, illustrates the basic features of analysis of covariance. The goal is to determine

❏ if exposure to water heated artificially affects growth

❏ if the position in the water column (surface or bottom) affects growth.

Four bags with ten oysters in each bag are randomly placed at each of five stations in the cooling water canal of a power-generating plant. Each location, or station, is considered a treatment and is represented by the variable TRT. Each bag is considered to be one experimental unit. Two stations are located in the intake canal and two in the discharge canal, one at the surface (TOP) and the other at the bottom (BOTTOM) of each canal. A single mid-depth station is located in a shallow portion of the bay near the power plant. The treatments are described below:

Treatment (TRT)   Station

1                 INTAKE-BOTTOM
2                 INTAKE-SURFACE
3                 DISCHARGE-BOTTOM
4                 DISCHARGE-SURFACE
5                 BAY

Stations in the intake canal act as controls for those in the discharge canal, which has a higher water temperature. The station in the bay is an overall control in case some factor other than water temperature, depth, or location is responsible for an observed change in growth rate.

The oysters are cleaned and measured at the beginning of the experiment and then again about one month later. The initial weight and the final weight are recorded for each bag. The data appear in Output 7.1.

Output 7.1 Data for Analysis of Covariance

Obs trt rep initial final
 
1 1 1 27.2 32.6
2 1 2 32.0 36.6
3 1 3 33.0 37.7
4 1 4 26.8 31.0
5 2 1 28.6 33.8
6 2 2 26.8 31.7
7 2 3 26.5 30.7
8 2 4 26.8 30.4
9 3 1 28.6 35.2
10 3 2 22.4 29.1
11 3 3 23.2 28.9
12 3 4 24.4 30.2
13 4 1 29.3 35.0
14 4 2 21.8 27.0
15 4 3 30.3 36.4
16 4 4 24.3 30.5
17 5 1 20.4 24.6
18 5 2 19.6 23.4
19 5 3 25.1 30.3
20 5 4 18.1 21.8

You can address the objectives given above by analysis of covariance. The response variable is final weight, but the analysis must also account for initial weight. You can do this using initial weight as the covariate. The following SAS statements are required to compute the basic analysis:

proc glm;
   class trt;
   model final=trt initial / solution;

The CLASS statement specifies that TRT is a classification variable. The variable INITIAL is the covariate. The MODEL statement defines the model yij = β0 + τi + βxij + εij. Specifying the SOLUTION option requests printing of the coefficient vector.

Results of these statements appear in Output 7.2.

Output 7.2 Results of Analysis of Covariance

The GLM Procedure
 
Dependent Variable: final  
 
  Sum of  
Source DF Squares  Mean Square F Value Pr > F
 
Model 5  354.4471767 70.8894353 235.05 <.0001
 
Error 14 4.2223233 0.3015945
 
Corrected Total 19 358.6695000
 
R-Square Coeff Var Root MSE final Mean
 
0.988228 1.780438 0.549176 30.84500
 
Source DF Type I SS Mean Square F Value Pr > F
 
trt 4 198.4070000 49.6017500 164.47 <.0001
initial 1 156.0401767 156.0401767 517.38 <.0001
 
Source DF Type III SS Mean Square F Value Pr > F
 
trt 4 12.0893593 3.0223398 10.02 0.0005
initial 1 156.0401767 156.0401767 517.38 <.0001
 
      Standard    
Parameter   Estimate Error   t Value  Pr > |t|
 
Intercept   2.494859769 B   1.02786287    2.43 0.0293
trt 1 -0.244459378 B 0.57658196    -0.42 0.6780
trt 2 -0.280271345 B 0.49290825    -0.57 0.5786
trt 3 1.654757698 B 0.42943036    3.85 0.0018
trt 4 1.107113519 B 0.47175112    2.35 0.0342
trt 5 0.000000000 B  ⋅            ⋅    ⋅      
initial   1.083179819   0.04762051    22.75 <.0001
 
NOTE: The X'X matrix has been found to be singular, and a generalized inverse was used to solve the normal equations. Terms whose estimates are followed by the letter 'B' are not uniquely estimable.

Consider the Type I and Type III SS (Type II and Type IV would be the same as Type III here). The Type I SS for TRT is the unadjusted treatment sum of squares. The ERROR SS for a simple analysis of variance can be reconstructed by subtracting the Type I SS from the TOTAL SS, for example,

Source   DF   SS        MS       F

TRT       4   198.407   49.602   4.642
ERROR    15   160.263   10.684
TOTAL    19   358.670

The resulting F-value indicates that p is less than .01. Thus, a simple analysis of variance leads to the conclusion that statistically significant treatment differences in final weight exist even when initial weights are not considered.

Now compare these results with the analysis of covariance. The Type III TRT SS is 12.089, whereas the Type I TRT SS equals the one-way ANOVA TRT SS of 198.407. The Type III TRT SS reflects differences among treatment means that have been adjusted to a common value of the covariate, INITIAL. In analysis of covariance, the Type III TRT SS is the adjusted treatment sum of squares; the Type I TRT SS is the unadjusted treatment sum of squares because it reflects the differences among treatment means prior to adjustment for the covariate. In this example, the unadjusted TRT SS is much larger than the adjusted one. However, the reduction in the error mean square from 10.684 to 0.302 allows an increase in the F-statistic from 4.642 in the simple analysis of variance to 10.02 in Output 7.2. The power of the test for treatment differences increases when the covariate is included because most of the error in the simple analysis of variance is due to variation in INITIAL values.

The last part of Output 7.2 contains the SOLUTION vector. In this one-factor case, the TRT estimates are obtained by setting the estimate for the last treatment (TRT 5) to 0. Therefore, the INTERCEPT estimate is the intercept for TRT 5, and the other four treatment effects are differences between each TRT and TRT 5. Because TRT 5 is the control, the output estimates, standard errors, and t-tests are for treatment versus control. Note that the means of TRT 3 and TRT 4, the stations in the discharge canal, differ significantly from TRT 5.

The coefficient associated with INITIAL is the pooled within-groups regression coefficient relating FINAL to INITIAL. The coefficient estimate is a weighted average of the regression coefficients of FINAL on INITIAL, estimated separately for each of the five treatment groups. The estimate indicates that a one-unit difference in INITIAL is associated with a difference of 1.083 units in FINAL.

7.2.2 Means and Least-Squares Means

A MEANS statement requests the unadjusted treatment means of all continuous (non-CLASS) variables in the model, that is, the response variable and the covariate. You can suppress printing the covariate means by using the DEPONLY option. These means are not strictly relevant to an analysis of covariance unless they are used to determine the effect of the covariance adjustment. The DUNCAN and WALLER options, among others, for multiple-comparisons tests are also available, but they are not useful here.
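For example, a minimal sketch of the statement, placed in the PROC GLM step shown above (the DEPONLY option is what suppresses the covariate means):

means trt / deponly;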

The LSMEANS (least-squares means) statement produces the estimates that are usually called adjusted treatment means. They are defined as

ȳi. - β̂(x̄i. - x̄..), or, equivalently, β̂0 + τ̂i + β̂x̄..

Consistent with the MODEL statement in the SAS statements above, this example uses the latter form. Recall that you also obtain adjusted means for the unbalanced two-way classification using the same LSMEANS statement, which is

lsmeans trt / stderr tdiff;

The TDIFF option requests LSD tests among the adjusted means. You can use the ADJUST= option to obtain alternative multiple comparison tests, depending on the relative seriousness of Type I and Type II error. These issues were discussed in more detail in Section 3.3.3. The LSMEANS and TDIFF results appear in Output 7.3.

Output 7.3 Results of Analysis of Covariance: Adjusted Treatment Means (Least-Squares Means)

Least Squares Means

                     Standard               LSMEAN
trt   final LSMEAN   Error       Pr > |t|   Number

1     30.1531125     0.3339174   <.0001     1
2     30.1173006     0.2827350   <.0001     2
3     32.0523296     0.2796295   <.0001     3
4     31.5046854     0.2764082   <.0001     4
5     30.3975719     0.3621988   <.0001     5

t for H0: LSMean(i)=LSMean(j) / Pr > |t|

i/j          1          2          3          4          5

1                0.087941   -4.1466    -3.22289   -0.42398
                 0.9312      0.0010     0.0061     0.6780
2   -0.08794                -4.76003   -3.55771   -0.56861
     0.9312                  0.0003     0.0032     0.5786
3    4.146599    4.76003                1.378002   3.853378
     0.0010      0.0003                 0.1898     0.0018
4    3.222892    3.557715   -1.378                 2.346817
     0.0061      0.0032      0.1898                0.0342
5    0.42398     0.568608   -3.8533    -2.34682
     0.6780      0.5786      0.0018     0.0342
 
NOTE: To ensure overall protection level, only probabilities associated with preplanned comparisons should be used.

The estimated least-squares means are followed by their standard errors, which are printed because of the STDERR option. The t-values and associated significance probabilities for all pairwise tests of treatment differences are printed because of the TDIFF option.

The table below shows the unadjusted and adjusted means of the response variable FINAL and the mean of the covariate INITIAL for each treatment group:

      FINAL              Adjusted              INITIAL
TRT   Unadjusted Means   Least-Squares Means   Covariate Mean

1     34.475             30.153                29.750
2     31.650             30.117                27.175
3     30.850             32.052                24.650
4     32.225             31.504                26.425
5     25.025             30.398                20.800

Figure 7.1 illustrates the distinction between adjusted and unadjusted means. The five treatment regression lines are parallel, sharing a common slope. Four points appear on each regression line: the end points are the predicted values of the response variable FINAL for the minimum and maximum values of INITIAL in the data set. The other two are 1) the predicted value of FINAL at the INITIAL mean for that treatment, and 2) the predicted value of FINAL at the mean value of the covariate INITIAL for the entire data set (shown by the solid vertical line at the mean value of INITIAL = 25.76). The former are the unadjusted sample means of FINAL; the latter are the adjusted, or LS, means. The light-shaded vertical lines represent the mean of the covariate INITIAL for each treatment; the light-shaded horizontal lines correspond to the unadjusted sample means.

Figure 7.1 Regressions for Five Oyster Data Treatments Showing Means and LS Means


You can see that there are large changes from the unadjusted to the adjusted treatment means for the variable FINAL. These changes result from the large treatment differences in the covariable INITIAL. Apparently, the random assignment of oysters to treatments did not result in equal mean initial weights. Some treatments, particularly TRT 5, received smaller oysters than other treatments. This biases the unadjusted treatment means. Computation of the adjusted treatment means is intended to remove the bias.

Note: Although the purpose of adjusted means is to remove bias resulting from unequal covariate means among the treatments, adjusted means are not always appropriate. The basic rule is that if the covariate means themselves depend on the treatments, adjustment is likely to be misleading, whereas if there is no reason to believe covariate means depend on treatment, failing to adjust is likely to be misleading. In the oyster growth example, there is no reason initial weights of oysters should depend on TRT. Therefore, you should use adjusted means. Consider, however, a typical example in plant breeding. Plant yield is affected by the population density of the plants. Accounting for plant density is essential to reduce error variance and hence allow manageable experiments to provide adequate power and precision. However, different plant varieties have inherently different plant densities, for a number of well-known reasons. Adjusting mean yield to a common plant density would distort differences among varieties you would actually see under realistic conditions. In this case, you should use unadjusted means. However, analysis of covariance is still useful because it improves precision.

You can use ESTIMATE statements to provide further insight into the mean and LS mean. The following SAS statements illustrate treatments 1 and 2 and their difference:

estimate 'trt 1 adj mean'
     intercept 1 trt 1 0 0 0 0 initial 25.76;
estimate 'trt 2 adj mean'
     intercept 1 trt 0 1 0 0 0 initial 25.76;
estimate 'adj trt diff' trt 1 -1 0 0 0;

The overall mean of the covariable INITIAL is x̄ = 25.76, hence ESTIMATE computes the adjusted mean β̂0 + τ̂i + β̂(25.76). Because β̂0 and β̂x̄ are the same for all adjusted means, the adjusted treatment difference is τ̂1 - τ̂2. The unadjusted means, ȳi, estimate β0 + τi + βx̄i, where x̄i is the sample mean of the covariate for the ith treatment. Use the following ESTIMATE statements to compute the unadjusted means:

estimate 'trt 1 unadj mean'
      intercept 1 trt 1 0 0 0 0 initial 29.75;
estimate 'trt 2 unadj mean'
      intercept 1 trt 0 1 0 0 0 initial 27.175;
estimate 'unadj diff' trt 1 -1 0 0 0 initial 2.575;

For the unadjusted means, β̂x̄i is different for each treatment, so the unadjusted treatment difference estimates τ̂1 - τ̂2 + β̂(x̄1 - x̄2). This shows how the unadjusted means are confounded with the β̂x̄i. The results of both sets of ESTIMATE statements appear in Output 7.4.

Output 7.4 ESTIMATE Statements for Adjusted and Unadjusted Means

    Standard    
Parameter Estimate Error   t Value  Pr > |t|
 
trt 1 adj mean 30.1531125 0.33391743 90.30 <.0001
trt 2 adj mean 30.1173006 0.28273504 106.52 <.0001
adj diff 0.0358120 0.40722674 0.09 0.9312
trt 1 unadj mean 34.4750000 0.27458811 125.55 <.0001
trt 2 unadj mean 31.6500000 0.27458811 115.26 <.0001
unadj diff 2.8250000 0.38832623 7.27 <.0001

You can see that the apparent difference between treatments 1 and 2 in unadjusted means results from their different mean initial weights. When these are adjusted, the difference disappears. Notice that, with an equal number of observations per treatment, the standard errors of the unadjusted treatment means are equal, whereas for the adjusted means, they depend on the difference between x̄i and x̄. As a final note, you can modify the LSMEANS statement to obtain results similar to Output 7.3, but for the unadjusted means. You use the statement

lsmeans trt / bylevel stderr tdiff;

The BYLEVEL option causes x̄i to be used in place of x̄ in computing the LS mean. These results are not shown, but they would simply complete the results in Output 7.4 for the unadjusted means and differences.

7.2.3 Contrasts

This section illustrates comparing means with contrasts, using the oyster growth example discussed in Section 7.2.1, “Covariance Model.” The five treatments can also be looked upon as a 2×2 factorial (BOTTOM/TOP × INTAKE/DISCHARGE) plus a CONTROL. The adjusted treatment means from the analysis of covariance can be analyzed further with four orthogonal contrasts implemented by the following CONTRAST statements:

contrast 'CONTROL VS. TREATMENT' TRT -1 -1 -1 -1  4;
contrast 'BOTTOM VS. TOP'        TRT -1  1 -1  1  0;
contrast 'INTAKE VS. DISCHARGE'  TRT -1 -1  1  1  0;
contrast 'BOT/TOP*INT/DIS'       TRT  1 -1 -1  1  0;

The output that results from these statements follows the partitioning of sums of squares in Output 7.5. Note that the only significant contrast is INTAKE VS. DISCHARGE. Also, note that these are comparisons among adjusted means. If the objectives of your study compel you to define contrasts among unadjusted means, these must include the contribution from the β̂x̄i, which do not cancel out as the β̂x̄ do for the adjusted means. For example, for the first contrast above, CONTROL VS. TREATMENT, you need to include β̂(-x̄1 - x̄2 - x̄3 - x̄4 + 4x̄5) = β̂(-24.85), and thus the required SAS statement is

contrast 'CTL V TRT UNADJUSTED'
   TRT -1 -1 -1 -1 4 INITIAL -24.85;

Notice that the unadjusted CONTROL VS. TREATMENT contrast is far more significant than the adjusted one, given the large discrepancy between adjusted and unadjusted means, especially for the CONTROL, noted earlier.

Equivalent results can be obtained with the ESTIMATE statement, which also gives the estimated coefficients for the contrasts. All options for CONTRAST and ESTIMATE statements discussed in Chapter 3, “Analysis of Variance for Balanced Data,” and in Chapter 6, “Understanding Linear Models Concepts,” apply here. Although constructed to be orthogonal, these contrasts are not orthogonal to the covariable; hence, their sums of squares do not add to the adjusted treatment sums of squares.
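For example, a sketch of an ESTIMATE analogue of the first contrast; the DIVISOR= option, an illustrative choice here, rescales the result to a difference of means:

estimate 'CONTROL VS. TREATMENT'
   trt -1 -1 -1 -1 4 / divisor=4;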

Output 7.5 Results of Analysis of Covariance: Orthogonal Contrasts Plus an Unadjusted Example

Contrast  DF   Contrast SS   Mean Square  F Value  Pr > F
 
CONTROL VS. TREATMENT 1 0.52000411 0.52000411 1.72 0.2103
BOTTOM VS. TOP 1 0.33879074 0.33879074 1.12 0.3071
INTAKE VS. DISCHARGE 1 8.59108077 8.59108077 28.49 0.0001
BOT/TOP*INT/DIS 1 0.22934155 0.22934155 0.76 0.3979
CTL V TRT UNADJ 1 169.9923582 169.9923582 563.65 <.0001

7.2.4 Multiple Covariates

Multiple covariates are specified as continuous (non-CLASS) variables in the MODEL statement. If the CLASS variable is designated as the first independent variable, the Type I sums of squares for individual covariates can be added to get the adjusted sums of squares due to all covariates. The Type III sums of squares are the fully adjusted sums of squares for the individual regression coefficients as well as those for the adjusted treatment means.
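For example, a minimal sketch for the oyster data, assuming a second, hypothetical covariate HEIGHT had also been recorded:

proc glm;
   class trt;
   model final=trt initial height / solution;

With TRT listed first, the Type I SS for INITIAL and HEIGHT can be summed to give the adjusted sum of squares due to both covariates.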

7.3 Unequal Slopes

Section 7.2 presented analysis of covariance assuming the equal slopes model. The unequal slopes model is a natural extension of covariance analysis. Both models are usually applied to data characterized by treatment groups and one or more covariates. The unequal slopes model allows you to test for heterogeneity of slopes—that is, it tests whether or not the regression coefficients are constant over groups. The analysis in Section 7.2 assumes constant regression coefficients and is invalid if this assumption fails. You can draw valid inference from the unequal slopes model, but it requires considerable care. This section presents the test for heterogeneity of slopes and inference strategies appropriate for the unequal slopes model.

Extending the one independent variable (covariate) and one-way treatment structure used in Section 7.2, the unequal slopes analysis-of-covariance model can be written

yij = β0i + β1ixij + εij

where i denotes the treatment group. The hypothesis of equal slopes is

H0: β1i = β1i′ for all i ≠ i′

An alternate formulation of the model is

yij = β0 + αi + β1xij + δixij + εij

where β0 and β1 are overall intercept and slope coefficients, and αi and δi are coefficients for the treatment effect on intercept and slope, respectively. Comparing the two formulations of the model, β0i = β0 + αi and β1i = β1 + δi. Under the alternate formulation, the hypothesis of equal slopes becomes

H0: δi = 0 for all i = 1, 2,...,t

Note that any possible intercept differences are irrelevant to both hypotheses.

Regression relationships that differ among treatment groups actually reflect an interaction between the treatment groups and the covariates. In fact, the GLM procedure specifies and analyzes this phenomenon as an interaction. Thus, if you use the following statements, the expression X*A produces the appropriate statistics for estimating different regressions of Y on X for the different values, or classes, specified by A:

proc glm;
   class a;
   model y=a x x*a / solution;

This MODEL fits the formulation yij = β0 + αi + β1xij + δixij + εij. The αi correspond to A, β1xij corresponds to X, and δixij corresponds to X*A. In this application, the Type I sums of squares for this model provide the most useful information:

X     is the sum of squares due to a single regression of Y on X, ignoring the groups.

A     is the sum of squares due to different intercepts (adjusted treatment differences), assuming equal slopes.

X*A   is an additional sum of squares due to different regression coefficients for the groups specified by the factor A.

The associated sequence of tests provides a logical stepwise analysis to determine the most appropriate model. Equivalent results can also be obtained by fitting the nested-effects formulation yij = β0i + β1ixij + εij. Use the following statements:

proc glm;
   class a;
   model y=a x(a) / noint solution;

Here, the β0i correspond to A, and the β1i correspond to X(A). This formulation is more convenient than the alternative for obtaining estimates of the slopes for each treatment group. You can write a CONTRAST statement that generates a test equivalent to X*A in the previous model. However, X(A) does not test for the heterogeneity of slopes. Instead, it tests the hypothesis that all regression coefficients are 0. Also, A tests the hypothesis that all intercepts are equal to 0. For models like this one, the Type III (or Type IV) sums of squares have little meaning; it is not instructive to consider the effect of the CLASS variable over and above the effect of different regressions.
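As noted above, a CONTRAST statement can generate a test equivalent to X*A. Assuming, for illustration, that A has three levels, a sketch of such a contrast:

contrast 'equal slopes' x(a) 1 0 -1,
                        x(a) 0 1 -1;

Each row compares one slope to the last; together, the rows test that all three slopes are equal.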

7.3.1 Testing the Heterogeneity of Slopes

This section uses the oyster growth data from Output 7.1 to demonstrate the test for the homogeneity of slopes. In a practical situation, you would do this testing before proceeding to the inference based on equal slopes shown in Section 7.2. Use the following SAS statements to obtain the relevant test statistics:

proc glm;
 class trt;
 model final=initial trt initial*trt;

Output 7.6 shows the results.

Output 7.6 Unequal Slopes Analysis of Covariance for Oyster Growth Data

  Sum of  
Source DF Squares  Mean Square F Value Pr > F
 
Model 9 355.8354908 39.5372768 139.51 <.0001
Error 10 2.8340092 0.2834009    
Corrected Total 19 358.6695000      
 
Source DF Type I SS  Mean Square F Value Pr > F
 
initial 1 342.3578175 342.3578175 1208.03 <.0001
trt 4 12.0893593 3.0223398 10.66 0.0012
initial*trt 4 1.3883141 0.3470785 1.22 0.3602

The Type I sums of squares show that

❏ the INITIAL weight has an effect on FINAL weight (F=1208.03, p<0.0001)

❏ TRT has an effect on FINAL weight, at any given INITIAL weight (F=10.66, p=0.0012)

❏ there is no significant difference in the INITIAL/FINAL relationship among the different levels of TRT (F=1.22, p=0.3602). That is, there is no evidence to contradict the null hypothesis of homogeneous slopes.

The last result is especially important, as it validates the analysis using the equal slopes model. The first two results essentially reiterate the results in Section 7.2.

7.3.2 Estimating Different Slopes

In many cases, the individual regression slopes for each treatment contain useful information. Output 7.7 contains data to illustrate this. The data are from a study of the relationship between the price of oranges and sales per customer. The hypothesis is that sales vary as a function of price, possibly differently across stores (STORE) and days of the week (DAY). The price is varied daily for two varieties of oranges. The variables P1 and P2 denote the prices for the two varieties, respectively. Q1 and Q2 are the sales per customer of the corresponding varieties.

Output 7.7 Orange Sales Data

Obs STORE DAY P1 P2   Q1   Q2
 
1 1 1 37 61 11.3208 0.0047
2 1 2 37 37 12.9151 0.0037
3 1 3 45 53 18.8947 7.5429
4 1 4 41 41 14.6739 7.0652
5 1 5 57 41 8.6493 21.2085
6 1 6 49 33 9.5238 16.6667
7 2 1 49 49 7.6923 7.1154
8 2 2 53 53 0.0017 1.0000
9 2 3 53 45 8.0477 24.2176
10 2 4 53 53 6.7358 2.9361
11 2 5 61 37 6.1441 40.5720
12 2 6 49 65 21.7939 2.8324
13 3 1 53 45 4.2553 6.0284
14 3 2 57 57 0.0017 2.0906
15 3 3 49 49 11.0196 13.9329
16 3 4 53 53 6.2762 6.5551
17 3 5 53 45 13.2316 10.6870
18 3 6 53 53 5.0676 5.1351
19 4 1 57 57 5.6235 3.9120
20 4 2 49 49 14.9893 7.2805
21 4 3 53 53 13.7233 16.3105
22 4 4 53 45 6.0669 23.8494
23 4 5 53 53 8.1602 4.1543
24 4 6 61 37 1.4423 21.1538
25 5 1 45 45 6.9971 6.9971
26 5 2 53 45 5.2308 3.6923
27 5 3 57 57 8.2560 10.6679
28 5 4 49 49 14.5000 16.7500
29 5 5 53 53 20.7627 15.2542
30 5 6 53 45 3.6115 21.5442
31 6 1 53 53 11.3475 4.9645
32 6 2 53 45 9.4650 11.7284
33 6 3 53 53 22.6103 14.8897
34 6 4 61 37 0.0020 19.2000
35 6 5 49 65 20.5997 2.3468
36 6 6 37 37 28.1828 17.9543

In this example, consider variety 1 only—that is, the response variable, sales, is Q1 and the covariable, price, is P1. Examples in Section 7.4 also involve variety 2. Here, you compute the unequal slopes covariance model using the SAS statements

proc glm;
 class day;
 model q1=p1 day p1*day/solution;

Output 7.8 shows the results of this analysis.

Output 7.8 Unequal Slopes Analysis of Covariance for Orange Sales Data

  Sum of   
Source DF Squares  Mean Square F Value Pr > F
 
Model 11  1111.522562  101.047506 4.64 0.0008
Error 24 522.153228 21.756384
Corrected Total 35 1633.675790
 
Source DF Type I SS  Mean Square F Value Pr > F
 
P1 1 516.5921408 516.5921408 23.74 <.0001
DAY 5 430.5384175 86.1076835 3.96 0.0093
P1*DAY 5 164.3920040 32.8784008 1.51 0.2236
 
      Standard    
Parameter   Estimate Error   t Value  Pr > |t|
 
Intercept   73.27263578 B   13.48373708 5.43 <.0001
P1   -1.22521164 B 0.26520396 -4.62 0.0001
DAY 1   -54.59714671 B 19.73545845 -2.77 0.0107
DAY 2 -34.78570099 B 20.25105926 -1.72 0.0987
DAY 3 -27.94295765 B 29.42842946 -0.95 0.3518
DAY 4 -24.12342640 B 21.39334761 -1.13 0.2706
DAY 5 4.62631110 B 30.62842608 0.15 0.8812
DAY 6 0.00000000 B ⋅         ⋅    ⋅      
P1*DAY 1 1.00474758 B 0.39410534 2.55 0.0176
P1*DAY 2 0.60164207 B 0.39876566 1.51 0.1444
P1*DAY 3 0.61415851 B 0.57034268 1.08 0.2923
P1*DAY 4 0.42959726 B 0.41510986 1.03 0.3110
P1*DAY 5 0.02936476 B 0.57034268 0.05 0.9594
P1*DAY 6 0.00000000 B ⋅         ⋅    ⋅      

From Output 7.8, you can see that there is no evidence to reject the null hypothesis of equal slopes (F for P1*DAY is 1.51 and p=0.2236). Ordinarily, you would then proceed as with the oyster growth data, using an equal slopes model. The Type I results here indicate a significant effect of price (P1) on sales and a significant effect of DAY on sales at any given price. For these data, however, a closer look at the estimates of the daily regression coefficients reveals additional information. Although the differences in the daily regressions are not statistically significant, it is instructive to look at their estimates.

The estimated daily regression slope coefficients are β̂1i = β̂1 + δ̂i. The estimate corresponding to P1 is β̂1, and the estimate corresponding to P1*DAY is δ̂i. For example, for DAY 1, the estimated slope is β̂11 = β̂1 + δ̂1 = -1.2252 + 1.0047 = -0.2205. You can use ESTIMATE statements to obtain the daily regression coefficients:

estimate 'P1:DAY 1' p1 1 p1*day 1 0 0 0 0 0;
estimate 'P1:DAY 2' p1 1 p1*day 0 1 0 0 0 0;
estimate 'P1:DAY 3' p1 1 p1*day 0 0 1 0 0 0;
estimate 'P1:DAY 4' p1 1 p1*day 0 0 0 1 0 0;
estimate 'P1:DAY 5' p1 1 p1*day 0 0 0 0 1 0;
estimate 'P1:DAY 6' p1 1 p1*day 0 0 0 0 0 1;

The results appear in Output 7.9.

Output 7.9 Estimated Regression Coefficients for Each DAY

    Standard    
Parameter Estimate  Error    t Value  Pr > |t|
 
P1:DAY 1 -0.22046406 0.29152337 -0.76 0.4569
P1:DAY 2 -0.62356957 0.29779341 -2.09 0.0470
P1:DAY 3 -0.61105313 0.50493329 -1.21 0.2380
P1:DAY 4 -0.79561438 0.31934785 -2.49 0.0200
P1:DAY 5 -1.19584688 0.50493329 -2.37 0.0263
P1:DAY 6 -1.22521164 0.26520396 -4.62 0.0001

Note that these estimated coefficients are larger in absolute value toward the end of the week. This is quite reasonable given the higher level of overall sales activity near the end of the week, which may result in a proportionately larger response in sales to changes in price. Thus, it is likely that a coefficient specifically testing for a linear trend in price response during the week would be significant.
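A sketch of such a test, assuming linear orthogonal polynomial coefficients for the six equally spaced days, applied to the slope effects in the model of Output 7.8:

contrast 'linear slope trend' p1*day -5 -3 -1 1 3 5;

Because the coefficients sum to zero, this contrast measures a linear trend in the δi, and hence in the daily slopes β1 + δi.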

You can obtain the results in Output 7.9 more conveniently using the nested-effects formulation, whose SAS statements are

proc glm;
 class day;
 model q1=day p1(day)/noint solution;
 contrast 'equal slopes' p1(day) 1 0 0 0 0 -1,
                         p1(day) 0 1 0 0 0 -1,
                         p1(day) 0 0 1 0 0 -1,
                         p1(day) 0 0 0 1 0 -1,
                         p1(day) 0 0 0 0 1 -1;

The CONTRAST statement contains one independent comparison of the daily regressions for each degree of freedom among the daily regressions. In this case, there are 6 days and hence 5 DF. If all five comparisons are true, then H0: all β1i equal must be true. Hence the contrast is equivalent to the test generated by P1*DAY in Output 7.8. Output 7.10 contains the results.

Output 7.10 Analysis of Orange Sales Data Using a Nested Covariance Model

Source DF Type I SS  Mean Square F Value Pr > F
 
DAY 6 4008.414213 668.069035 30.71 <.0001
P1(DAY) 6 861.125290 143.520882 6.60 0.0003
 
Source DF Type III SS  Mean Square F Value Pr > F
 
DAY 6 1250.581757 208.430293 9.58 <.0001
P1(DAY) 6 861.125290 143.520882 6.60 0.0003
 
Contrast DF Contrast SS   Mean Square   F Value   Pr > F
 
equal slopes  5 164.3920040 32.8784008 1.51 0.2236
 
      Standard    
Parameter   Estimate Error   t Value  Pr > |t|
 
DAY 1   18.67548906   14.41100810 1.30 0.2073
DAY 2 38.48693478 15.10940884 2.55 0.0177
DAY 3 45.32967813 26.15762403 1.73 0.0959
DAY 4 49.14920937 16.60915881 2.96 0.0068
DAY 5 77.89894687 27.50071487 2.83 0.0092
DAY 6 73.27263578 13.48373708 5.43 <.0001
P1(DAY) 1 -0.22046406 0.29152337 -0.76 0.4569
P1(DAY) 2 -0.62356957 0.29779341 -2.09 0.0470
P1(DAY) 3 -0.61105313 0.50493329 -1.21 0.2380
P1(DAY) 4 -0.79561437 0.31934785 -2.49 0.0200
P1(DAY) 5 -1.19584687 0.50493329 -2.37 0.0263
P1(DAY) 6 -1.22521164 0.26520396 -4.62 0.0001

As noted earlier, the Type I and Type III sums of squares provide no useful information. P1(DAY) tests the null hypothesis that all daily regression slopes are equal to zero. You can see that the contrast for equal slopes is identical to the equal slopes test given by P1*DAY in Output 7.8. The parameter estimates are the most useful output. You can see that the daily regression coefficients, P1(DAY), are identical to the estimates obtained via the ESTIMATE statements in Output 7.9, although Output 7.10 is easier to compute because no ESTIMATE statements are required. Output 7.10 also gives the intercept of the regression for each day. For example, using DAY 1 and P1(DAY) 1, you can see that the regression for DAY 1 is Q1 = 18.675 - 0.2205×P1.

Extending the model to more than one independent variable is straightforward and lets you determine which variables have a different coefficient for each treatment group; see the sketch below. More complex designs are not difficult to implement but may be difficult to interpret.
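For instance, a sketch that nests both prices within DAY, giving each price a separate coefficient for each day:

proc glm;
 class day;
 model q1=day p1(day) p2(day) / noint solution;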

7.3.3 Testing Treatment Differences with Unequal Slopes

Tests among adjusted means with the equal slopes model apply to all values of the covariable. For example, Section 7.2.2 showed that the difference between the adjusted means of TRT 1 and TRT 2 is (β0 + τ1 + βx̄) - (β0 + τ2 + βx̄) = τ1 - τ2. If you use any other value of the covariable in the expression, the difference is still τ1 - τ2. However, for the unequal slopes model, the adjusted mean is β0 + αi + β1ix, and hence the difference depends on x. Typically x̄ is used for x. If you compare TRT 1 and TRT 2 at x̄, the difference is α1 - α2 + (β11 - β12)x̄. If you evaluate the treatment difference at a different value of x, the difference changes. Thus, in the unequal slopes model, you can compare treatments, but only conditional upon a specified value of the covariable.

PROC GLM and PROC MIXED offer a great deal of flexibility for unequal slopes models, but you must be careful, because some defaults do not necessarily result in sensible tests.

Output 7.11 contains the Type III sums of squares and the LS means for the unequal slopes models for the orange sales data. The SAS statements are

proc glm;
 class day;
 model q1=p1 day p1*day;
 lsmeans day;

Output 7.11 Type III SS and Default LS Means for Unequal Slopes Covariance Analysis of Orange Sales Data

Source DF Type III SS  Mean Square F Value Pr > F
 
P1 1 554.7860985 554.7860985 25.50 <.0001
DAY 5 201.1717701 40.2343540 1.85 0.1412
P1*DAY 5 164.3920040 32.8784008 1.51 0.2236
 
Least Squares Means
 
DAY Q1 LSMEAN
 
1 7.3828299
2 6.5463159
3 14.0301792
4 8.3960731
5 16.6450125
6 10.5145730

The least-squares means are computed using the overall covariable mean, x¯, which for these data is equal to 51.222. You can use the AT MEANS option with the LSMEANS statement to print the value of the covariable means being used. The SAS statement is

lsmeans day/at means;

Output 7.12 shows the results.

Output 7.12 Least-Squares Means for Orange Sales Data Using the AT MEANS Option

 
Least Squares Means at P1=51.22222
 
DAY Q1 LSMEAN
 
1 7.3828299
2 6.5463159
3 14.0301792
4 8.3960731
5 16.6450125
6 10.5145730

Suppose you want to test the null hypothesis of equal adjusted means—that is, equality among the means adjusted to a common value x̄. Clearly, the Type I sum of squares for DAY from Output 7.8 does not do this: it tests the means assuming a common slope, with no adjustment for P1*DAY, the differences among slopes. What about the Type III sum of squares for DAY in Output 7.11? The Type III sum of squares tests the means adjusted not to x̄ but to x=0. You can see this from the following CONTRAST statement:

contrast 'trt' day 1 -1 0 0 0 0,
               day 1 0 -1 0 0 0,
               day 1 0 0 -1 0 0,
               day 1 0 0 0 -1 0,
               day 1 0 0 0 0 -1;

Output 7.13 shows the results.

Output 7.13 Contrast Testing the Equality of Sales Means Adjusted to Covariable Price=0

Contrast DF Contrast SS  Mean Square F Value Pr > F
 
trt 5 201.1717701 40.2343540 1.85 0.1412

You can see that the F=1.85 and p-value of 0.1412 are identical to the Type III SS results in Output 7.11.

You can use the following LSMEANS statement to compute adjusted means at x=0 to see what you are testing:

lsmeans day/at p1=0;

Output 7.14 shows the results.

Output 7.14 Orange Sales Adjusted Means at Price Covariable=0

Least Squares Means at P1=0
 
DAY Q1 LSMEAN
 
1 18.6754891
2 38.4869348
3 45.3296781
4 49.1492094
5 77.8989469
6 73.2726358

Recalling Output 7.10, these adjusted means are in fact the intercepts of the separate daily regression equations. Testing makes no sense in this context, because the oranges are not going to be sold for a price P1=0—that is, they are not going to be given away.

You can test the adjusted means at x¯ using the following CONTRAST statement:

contrast 'trt' day 1 -1 0 0 0 0
   p1*day 51.2222 -51.2222 0 0 0 0,
    day 1 0 -1 0 0 0 p1*day 51.2222 0 -51.2222 0 0 0,
    day 1 0 0 -1 0 0 p1*day 51.2222 0 0 -51.2222 0 0,
    day 1 0 0 0 -1 0 p1*day 51.2222 0 0 0 -51.2222 0,
    day 1 0 0 0 0 -1 p1*day 51.2222 0 0 0 0 -51.2222;

This statement uses a set of independent comparisons of the unequal slopes model differences, whose form is α1 - α2 + (β11 - β12)x̄. The number of comparisons in the contrast equals the DF for DAY. Output 7.15 gives the results.

Output 7.15 Contrast to Test the Equality of Means Adjusted to Mean Covariable Price=51.22

Contrast DF Contrast SS  Mean Square F Value Pr > F
 
trt 5 376.3758925 75.2751785 3.46 0.0170

You can change the value of the AT P1= option in the LSMEANS statement and the coefficients for P1*DAY in the above contrast to test the equality of the adjusted means at any value of the covariable deemed reasonable. Alternatively, you can center the covariable in the DATA step, as sketched below. For example, you can define a new covariable X=P1-51.22 and use X in place of P1 in the analysis. The default Type III sum of squares then tests the equality of adjusted means at X=0, which corresponds to P1=51.22, the overall mean. The crucial thing to keep in mind with unequal slopes models is that the treatment difference changes with the covariable, so the test is only valid if it is done at a value of the covariable agreed to be reasonable.
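A minimal sketch of the centering approach, assuming the orange sales data are stored in a data set named SALES:

data centered;
   set sales;          /* assumed data set name */
   x = p1 - 51.22;     /* center price at its overall mean */
run;

proc glm;
   class day;
   model q1=x day x*day;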

7.4 A Two-Way Structure without Interaction

Analysis of covariance can be applied to other experimental and treatment structures. This section illustrates covariance analysis of a two-factor factorial experiment with two covariates.

This example uses the data from the study of the relationship between the price of oranges and sales per customer described in Section 7.3 and presented in Output 7.7. Recall that the data set has two varieties. This section uses only the response variable Q1, sales per customer for the first variety, to illustrate the main ideas. You can easily adapt these methods to Q2, the response variable for the second variety.

A model for the sales of oranges for variety 1 is

Q1 = μ + τi + δj + β1P1 + β2P2 + e

where

Q1    is the sales per customer for the first variety.

τi    is the effect of the ith STORE, i = 1, 2, . . . , 6.

δj    is the effect of the jth DAY, j = 1, 2, . . . , 6.

β1    is the coefficient of the relationship between sales Q1 and P1 (the price of one variety of oranges).

β2    is the coefficient of the relationship between Q1 and P2 (the price of the other variety of oranges).

e     is the random error term.

Note that because there is no replication, the interaction between STORE and DAY must be used as the error term. In this example, the primary focus is on the influence of the prices, P1 and P2, on sales, Q1. The DAY and STORE differences are of secondary importance.

To implement the model, use the following SAS statements:

proc glm;
   class store day;
   model q1 q2=store day p1 p2 / solution;
   lsmeans day / stderr;

The results appear in Output 7.16.

Output 7.16 Results of Analysis of Covariance: Two-Way Structure without Interaction

The GLM Procedure
 
Dependent Variable: q1  
 
  Sum of  
Source DF Squares  Mean Square  F Value  Pr > F
 
Model 12  1225.367548  102.113962 5.75 0.0002
 
Error 23 408.308242 17.752532
 
Corrected Total 35 1633.675790
 
R-Square Coeff Var Root MSE q1 Mean
 
0.750068 41.23842 4.213375 10.21711
 
Source DF Type I SS Mean Square F Value Pr > F
 
store 5 313.4198071 62.6839614 3.53 0.0163
day 5 250.3972723 50.0794545 2.82 0.0396
p1 1 622.0082168 622.0082168 35.04 <.0001
p2 1 39.5422519 39.5422519 2.23 0.1492
 
Source DF Type III SS  Mean Square F Value Pr > F
 
store 5 223.8326734 44.7665347 2.52 0.0583
day 5 433.0968700 86.6193740 4.88 0.0035
p1 1 538.1688512 538.1688512 30.32 <.0001
p2 1 39.5422519 39.5422519 2.23 0.1492
 
      Standard    
Parameter   Estimate Error   t Value  Pr > |t|
 
Intercept   51.69987930 B   9.79103443 5.28 <.0001
store 1 -7.64532641 B 2.69194414 -2.84 0.0093
store 2 -5.60226472 B 2.46416942 -2.27 0.0327
store 3 -7.36284806 B 2.46416942 -2.99 0.0066
store 4 -4.36498239 B 2.48754952 -1.75 0.0926
store 5 -5.02052157 B 2.43612208 -2.06 0.0508
store 6 0.00000000 B ⋅         ⋅    ⋅      
day 1 -5.83036664 B 2.51932754 -2.31 0.0299
day 2 -4.89997548 B 2.44708866 -2.00 0.0572
day 3 2.26978922 B 2.54028189 0.89 0.3808
day 4 -2.65249315 B 2.44667751 -1.08 0.2895
day 5 4.04702055 B 2.55655852 1.58 0.1271
day 6 0.00000000 B ⋅         ⋅    ⋅      
p1 -0.83036470   0.15081334 -5.51 <.0001
p2 0.14884706   0.09973319 1.49 0.1492
 

NOTE: The X'X matrix has been found to be singular, and a generalized inverse was used to solve the normal equations. Terms whose estimates are followed by the letter 'B' are not uniquely estimable.

 
Least Squares Means
 
    Standard  
DAY Q1 LSMEAN Error  Pr > |t|
 
1   5.5644154   1.7680833 0.0045
2 6.4948065 1.7289585 0.0010
3 13.6645712 1.7515046 <.0001
4 8.7422889 1.7339197 <.0001
5 15.4418026 1.7858085 <.0001
6 11.3947820 1.7667260 <.0001

In addition to the details previously discussed, the following points are also of interest:

❏ The Type I SS for P1 and P2 can be summed to obtain a test for the partial contribution of both prices:

F2,23 = [(622.01 + 39.54)/2] / 17.75 = 18.63

❏ The Type III SS show that all effects are highly significant except P2, the price of the competing orange.

❏ Each coefficient estimate is the mean difference between each CLASS variable value (STORE, DAY) and the last CLASS variable value, because there is no interaction.

❏ The P1 coefficient is negative, indicating the expected negatively sloping price response (demand function). The P2 coefficient, although not significant, has the expected positive sign for the price response of a competing product.

❏ Least-squares means are requested only for DAY; they show the expected higher sales toward the end of the week.

Contrasts and estimates of linear functions could, of course, be requested with this analysis.
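For instance, a hypothetical contrast comparing the first three days of the week to the last three, adjusted for both prices:

contrast 'early vs late week' day 1 1 1 -1 -1 -1;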

7.5 A Two-Way Structure with Interaction

The most complex covariance model discussed in this chapter is a two-factor factorial with two stages of subsampling. Output 7.17 shows data from a study whose objective is to estimate y, the weight of usable lint, from x, the total weight of cotton bolls. In addition, the researcher wants to see if lint estimation is affected by varieties of cotton (VARIETY) and the distance between planting rows (SPACING), using x, the boll weight (BOLLWT), as a covariate in the analysis of y, the lint weight. The study is a factorial experiment with two levels of VARIETY (37 and 213) and two levels of SPACING (30 and 40). There are two plants for each VARIETY×SPACING treatment combination, and there are from five to nine bolls per plant (PLANT).

Output 7.17 Data for Analysis of Covariance: Two-Way Structure with Interaction

Obs variety spacing plant bollwt lint
 
1  37 30 3 8.4 2.9
2  37 30 3 8.0 2.5
3  37 30 3 7.4 2.7
4  37 30 3 8.9 3.1
5  37 30 5 5.6 2.1
6  37 30 5 8.0 2.7
7  37 30 5 7.6 2.5
8  37 30 5 5.4 1.5
9  37 30 5 6.9 2.5
10  37 40 3 4.5 1.3
11  37 40 3 9.1 3.1
12  37 40 3 9.0 3.1
13  37 40 3 8.0 2.3
14  37 40 3 7.2 2.2
15  37 40 3 7.6 2.5
16  37 40 3 9.0 3.0
17  37 40 3 2.3 0.6
18  37 40 3 8.7 3.0
19  37 40 5 8.0 2.6
20  37 40 5 7.2 2.5
21  37 40 5 7.6 2.4
22  37 40 5 6.9 2.2
23  37 40 5 6.9 2.5
24  37 40 5 7.6 2.4
25  37 40 5 4.7 1.4
26 213 30 3 4.6 1.7
27 213 30 3 6.8 1.7
28 213 30 3 3.5 1.3
29 213 30 3 2.4 1.0
30 213 30 3 3.0 1.0
31 213 30 5 2.8 0.5
32 213 30 5 3.6 0.9
33 213 30 5 6.7 1.9
34 213 40 0 7.4 2.1
35 213 40 0 4.9 1.0
36 213 40 0 5.7 1.0
37 213 40 0 3.0 0.7
38 213 40 0 4.7 1.5
39 213 40 0 5.0 1.3
40 213 40 0 2.8 0.4
41 213 40 0 5.2 1.2
42 213 40 0 5.6 1.0
43 213 40 3 4.5 1.0
44 213 40 3 5.6 1.2
45 213 40 3 2.0 0.7
46 213 40 3 1.2 0.2
47 213 40 3 4.2 1.2
48 213 40 3 5.3 1.2
49 213 40 3 7.0 1.7

The model for the analysis is

yijkl = μ + vi + τj + (vτ)ij + γ(vτ)ijk + βxijkl + εijkl

where

yijkl      is the weight of the lint for the lth boll of the kth PLANT in the ith VARIETY and jth SPACING.

μ          is the intercept.

vi         is the effect of the ith VARIETY.

τj         is the effect of the jth SPACING.

(vτ)ij     is the VARIETY×SPACING interaction.

γ(vτ)ijk   is the effect of the kth plant in the (i,j)th VARIETY and SPACING combination.

xijkl      is the total weight of each boll, the covariate.

β          is the regression effect of the covariate.

εijkl      is the error variation among bolls within plants.

The primary focus of this study is on estimating lint weight from boll weight (that is, the regression) and only secondarily on determining whether this relationship is affected by the VARIETY and SPACING factors. In the SAS program to analyze the data, the order of variables in the MODEL statement is changed so that the Type I sums of squares provide the appropriate information:

proc glm;
 class variety spacing plant;
 model lint=bollwt variety spacing variety*spacing
    plant(variety*spacing) / solution;
 random plant(variety*spacing)/test;

Note that the RANDOM statement with the TEST option has been added because the plant-to-plant variation provides the appropriate error term. Results of the analysis appear in Output 7.18.

Because PLANT(VARIETY*SPACING) is a random effect, an alternative is to use PROC MIXED. You use the following SAS statements:

proc mixed;
 class variety spacing plant;
 model lint=bollwt variety spacing
       variety*spacing/solution;
 random plant(variety*spacing);

The results for this analysis appear in Output 7.19. Littell et al. (1996) discuss analysis of covariance for mixed models in much greater detail.

Output 7.18 Results of Analysis of Covariance: Two-Way Structure with Interaction

Dependent Variable: lint  
 
  Sum of  
Source DF Squares  Mean Square  F Value  Pr > F
 
Model 8 31.16009287 3.89501161 80.70 <.0001
Error 40 1.93051938 0.04826298
Corrected Total 48 33.09061224
 
Source DF Type I SS  Mean Square F Value Pr > F
 
 bollwt 1 29.06931406 29.06931406 602.31 <.0001
 variety 1 1.26353553 1.26353553 26.18 <.0001
 spacing 1 0.46664798 0.46664798 9.67 0.0034
 variety*spacing 1 0.09326994 0.09326994 1.93 0.1722
 plant(variet*spacin) 4 0.26732535 0.06683134 1.38 0.2565
 
Source DF Type III SS  Mean Square F Value Pr > F
 
bollwt 1 11.11855999 11.11855999 230.37 <.0001
variety 1 0.94242614 0.94242614 19.53 <.0001
spacing 1 0.37483940 0.37483940 7.77 0.0081
variety*spacing 1 0.04785515 0.04785515 0.99 0.3253
plant(variet*spacin) 4 0.26732535 0.06683134 1.38 0.2565
 
Tests of Hypotheses for Mixed Model Analysis of Variance
 
Source DF Type III SS  Mean Square F Value Pr > F
 
bollwt 1 11.118560 11.118560 230.37 <.0001
Error: MS(Error) 40 1.930519 0.048263
 
Source DF   Type III SS   Mean Square   F Value   Pr > F
 
 * variety 1 0.942426 0.942426 16.27 0.0021
 
Error   10.657 0.617126 0.057907
 Error: 0.5194*MS(plant(variet*spacin)) + 0.4806*MS(Error)
 * This test assumes one or more other fixed effects are zero.
 
Source DF Type III SS  Mean Square F Value Pr > F
 
 * spacing 1 0.374839 0.374839 5.76 0.0660
 
Error 4.6073 0.300008 0.065116
 Error: 0.9076*MS(plant(variet*spacin)) + 0.0924*MS(Error)
 * This test assumes one or more other fixed effects are zero.
 
Source DF   Type III SS   Mean Square   F Value   Pr > F
 
variety*spacing 1 0.047855 0.047855 0.74 0.4324
 
Error   4.6791 0.303859 0.064939
Error: 0.8981*MS(plant(variet*spacin)) + 0.1019*MS(Error)
      Standard    
Parameter   Estimate Error   t Value  Pr > |t|
 
Intercept   -.2724440749 B   0.11934010 -2.28  0.0278
bollwt   0.3056076686 0.02013479 15.18  <.0001
variety 37 0.4232705043 B 0.12964467  3.26  0.0022
variety 213 0.0000000000 B  ⋅          ⋅    ⋅      
spacing 30 0.0379572553 B 0.15161542  0.25  0.8036
spacing 40 0.0000000000 B  ⋅          ⋅    ⋅      
variety*spacing 37 30 0.0236449357 B 0.19897993  0.12  0.9060
variety*spacing 37 40 0.0000000000 B  ⋅          ⋅    ⋅      
variety*spacing 213 30 0.0000000000 B  ⋅          ⋅    ⋅      
variety*spacing 213 40 0.0000000000 B  ⋅          ⋅    ⋅      
plant(variet*spacin) 3 37 30 0.0892286888 B 0.15033417  0.59  0.5562
plant(variet*spacin) 5 37 30 0.0000000000 B  ⋅          ⋅    ⋅      
plant(variet*spacin) 3 37 40 -.0271310434 B 0.11085696 -0.24  0.8079
plant(variet*spacin) 5 37 40 0.0000000000 B  ⋅          ⋅    ⋅      
plant(variet*spacin) 3 213 30 0.3337196850 B 0.16055649  2.08  0.0441
plant(variet*spacin) 5 213 30 0.0000000000 B  ⋅          ⋅    ⋅      
plant(variet*spacin) 0 213 40 -.0984914494 B 0.11151946 -0.88  0.3824
plant(variet*spacin) 3 213 40 0.0000000000 B  ⋅          ⋅    ⋅      

NOTE: The X'X matrix has been found to be singular, and a generalized inverse was used to solve the normal equations. Terms whose estimates are followed by the letter 'B' are not uniquely estimable.

The Type I SS for BOLLWT is what would be obtained by a simple linear regression of LINT on BOLLWT. If you ran this simple linear regression, you would get an R2 value of (29.069/33.091) = 0.878, a residual mean square of (33.091 - 29.069)/47 = 0.08557, and an F-statistic of 339.69, thus indicating a strong relationship of lint weight to boll weight.

The Type III tests produced by the RANDOM statement with the TEST option show a non-significant contribution from the VARIETY*SPACING interaction (F=0.74, p=0.4324). The VARIETY effect (F=16.27, p=0.0021) is statistically significant, whereas the SPACING main effect (F=5.76, p=0.0660) is only marginally significant. Note that the error terms for the VARIETY and SPACING main effects and their interaction use linear combinations of the PLANT(VARIETY*SPACING) and ERROR mean squares. This follows from the complex set of expected mean squares that result in analysis of covariance.

It might seem that it would be simpler to assume from inspection of the analysis of covariance sources of variance that PLANT(VARIETY*SPACING) is the proper error term for VARIETY, SPACING, and VARIETY*SPACING and use the statement

test h=variety spacing variety*spacing
   e=plant(variety*spacing);

in place of the RANDOM statement. If you do this, the resulting test statistics will be affected and there is the possibility of drawing erroneous conclusions. Now consider the results obtained using PROC MIXED.

Output 7.19 Analysis of Covariance Results Using PROC MIXED

Covariance Parameter Estimates
 
Cov Parm Estimate
 
plant(variet*spacin) 0
Residual 0.04995
 
Solution for Fixed Effects
 
          Standard      
Effect variety spacing   Estimate Error   DF   t Value  Pr > |t|
 
Intercept        -0.3210 0.1078 4 -2.98 0.0408
bollwt        0.3041 0.01990 40 15.28 <.0001
variety 37      0.4671 0.09351 4 5.00 0.0075
variety 213      0 ⋅  ⋅     ⋅       
spacing      30 0.3013 0.09720 4 3.10 0.0362
spacing      40 0 ⋅  ⋅     ⋅       
variety*spacing 37    30 -0.1844 0.1350 4 -1.37 0.2436
variety*spacing 37    40 0 ⋅  ⋅     ⋅       
variety*spacing 213    30 0 ⋅  ⋅     ⋅       
variety*spacing 213    40 0 ⋅  ⋅     ⋅       
 
Type 3 Tests of Fixed Effects
 
  Num Den    
Effect DF DF F Value Pr > F
 
bollwt 1 40 233.51 <.0001
variety 1 4 18.21 0.0130
spacing 1 4 9.68 0.0358
variety*spacing 1 4 1.87 0.2436

Compared to Output 7.18 for PROC GLM, the parameter estimates and F-values differ somewhat. This is partly because PROC MIXED recovers information from the random-model effects, similar to the recovery of interblock information for incomplete-blocks designs discussed in Chapter 4, and partly because the variance component estimate for PLANT(VARIETY*SPACING) is zero. The MIXED default for computing F-values in the presence of zero or negative variance component estimates is somewhat different from the F-ratios derived from the expected mean squares by PROC GLM.

Section 4.4.2 discussed the possible bias to F-statistics that the MIXED default of setting negative variance component estimates to zero may introduce. This is especially evident in this example with the main effect test for SPACING. The p-value here is 0.0358, whereas it was 0.0660 with the GLM analysis. The MIXED result reflects potential bias. Here, as in Section 4.4.2, you can avoid this problem by using the METHOD=TYPE3 option, as sketched below. The results are not shown here, but they are very similar to the results in Output 7.18. Slight discrepancies result from the recovery of random-effects information present in PROC MIXED but not in PROC GLM.
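A sketch of the earlier PROC MIXED step with this option added:

proc mixed method=type3;
 class variety spacing plant;
 model lint=bollwt variety spacing
       variety*spacing/solution;
 random plant(variety*spacing);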

Both the GLM and MIXED results suggest dropping the terms VARIETY*SPACING and PLANT(VARIETY*SPACING). The justification for dropping VARIETY*SPACING is the same in both analyses, namely the F-test. However, with MIXED, there is no test for PLANT(VARIETY*SPACING); the zero variance component estimate provides equivalent justification. Given these results, the model has more terms than necessary; the superfluous coefficient estimates can make further inference needlessly awkward. Drop VARIETY*SPACING and PLANT(VARIETY*SPACING), and use the following GLM statements to re-estimate:

proc glm;
   class variety spacing;
   model lint=bollwt variety spacing / solution;
   lsmeans variety spacing / stderr;

You can use equivalent MIXED statements and obtain the same results. The results appear in Output 7.20. They differ only slightly from those in Outputs 7.18 and 7.19.

Output 7.20 Results of a Simplified Covariance Analysis: Two-Way Structure with Interaction

  Sum of  
Source DF Squares  Mean Square F Value Pr > F
 
Model 3   30.79949757 10.26649919 201.65 <.0001
Error 45 2.29111467 0.05091366
Corrected Total 48 33.09061224
 
Source DF Type I SS  Mean Square F Value Pr > F
 
bollwt 1 29.06931406 29.06931406 570.95 <.0001
variety 1 1.26353553 1.26353553 24.82 <.0001
spacing 1 0.46664798 0.46664798 9.17 0.0041
 
Source DF Type III SS  Mean Square F Value Pr > F
 
bollwt 1 11.57173388 11.57173388 227.28 <.0001
variety 1 1.19732512 1.19732512 23.52 <.0001
spacing 1 0.46664798 0.46664798 9.17 0.0041
                                   Standard
Parameter         Estimate            Error  t Value  Pr > |t|

Intercept    -.2769483300 B     0.10384452    -2.67    0.0106
bollwt       0.3014429094       0.01999507    15.08    <.0001
variety 37   0.4106564020 B     0.08468173     4.85    <.0001
variety 213  0.0000000000 B              .        .         .
spacing 30   0.2052058951 B     0.06778167     3.03    0.0041
spacing 40   0.0000000000 B              .        .         .

NOTE: The X'X matrix has been found to be singular, and a generalized inverse was used to solve the normal equations. Terms whose estimates are followed by the letter 'B' are not uniquely estimable.

 
Least Squares Means

                           Standard
variety  lint LSMEAN          Error  Pr > |t|

37       2.00805710     0.05320406    <.0001
213      1.59740070     0.05523778    <.0001

                           Standard
spacing  lint LSMEAN          Error  Pr > |t|

30       1.90533185     0.05479483    <.0001
40       1.70012595     0.03988849    <.0001

This model specifies a single regression coefficient that relates LINT to BOLLWT (0.3014), but with different intercepts for the four treatment combinations. These intercepts can be constructed from the SOLUTION vector by summing appropriate component values.

For example, for VARIETY=37, SPACING=30, the model estimate is

y = μ + v1 + τ1 + βx = −0.2769 + 0.4107 + 0.2052 + 0.3014x = 0.3390 + 0.3014x

For the other treatment combinations, the results are

VARIETY  SPACING  Model Estimate, Where x = BOLLWT, the Covariate

37        40       0.1338 + 0.3014x
213       30      -0.0715 + 0.3014x
213       40      -0.2769 + 0.3014x

Note that these results can be obtained with ESTIMATE statements, as sketched below. Least-squares means appear in Output 7.20; other statistics can be obtained but are not needed in this situation.
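A minimal sketch of such ESTIMATE statements follows, assuming the default class-level ordering shown in the SOLUTION output (variety 37 before 213, spacing 30 before 40); each statement sums the intercept and the appropriate treatment effects:

proc glm;
   class variety spacing;
   model lint=bollwt variety spacing / solution;
   estimate 'intercept: v=37,  s=40' intercept 1 variety 1 0 spacing 0 1;
   estimate 'intercept: v=213, s=30' intercept 1 variety 0 1 spacing 1 0;
   estimate 'intercept: v=213, s=40' intercept 1 variety 0 1 spacing 0 1;
run;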

7.6 Orthogonal Polynomials and Covariance Methods

A standard method for analyzing treatments with quantitative levels is to decompose the treatment sum of squares using orthogonal polynomial contrasts, that is, contrasts whose coefficients measure the linear, quadratic, and higher-order regression effects associated with treatment level. Most statistical methods textbooks have tables of orthogonal polynomial coefficients for balanced data with equally spaced treatment levels. You can use the ORPOL function in PROC IML to determine coefficients when you have a design that standard tables do not cover, such as one with unequally spaced treatment levels. Section 7.6.2 shows you how to use the ORPOL function.

In many practical applications, orthogonal polynomials are awkward to use. Often, you want to estimate the regression equation, not merely decide what is “significant.” Even with treatment designs covered by standard tables, extracting the regression equation from orthogonal polynomials is laborious. For factorial experiments, interest usually centers on interaction. That is, are the regressions over the quantitative factor the same for all levels of the other factor? Except for very simple factorial treatment designs, trying to use orthogonal polynomials to measure interaction can become a daunting task.

This section presents analysis-of-covariance methods that are equivalent to orthogonal polynomial contrasts. The main advantage of the covariance, or direct regression, approach is that in most cases, it is easier to implement using SAS.

7.6.1 A 2×3 Example

Output 7.21 contains data from an experiment designed to compare response to increasing dosage for two types of drug. There were three levels of the actual dosage, DOSE in the SAS data set—1, 10, and 100 units. The data were analyzed using LOGDOSE, the base 10 logs of the dosages. Note that the levels of LOGDOSE are equally spaced. The experiment was conducted as a randomized-complete-blocks design. BLOC denotes the blocks and Y denotes the response variable.

Output 7.21 Data for a Type-Dose Factorial Orthogonal Polynomial Example

Obs   bloc   type  dose logdose y
 
1 1 1 1 0 63
2 1 2 1 0 59
3 1 1 10 1 62
4 1 2 10 1 62
5 1 1 100 2 62
6 1 2 100 2 68
7 2 1 1 0 50
8 2 2 1 0 49
9 2 1 10 1 49
10 2 2 10 1 55
11 2 1 100 2 48
12 2 2 100 2 58
13 3 1 1 0 53
14 3 2 1 0 47
15 3 1 10 1 52
16 3 2 10 1 51
17 3 1 100 2 51
18 3 2 100 2 50
19 4 1 1 0 52
20 4 2 1 0 48
21 4 1 10 1 54
22 4 2 10 1 49
23 4 1 100 2 55
24 4 2 100 2 72
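
If you want to reproduce the analyses that follow, a DATA step along these lines creates the data set; the name DRUG is an assumption, since the original program is not shown:

data drug;
   input bloc type dose y;
   logdose=log10(dose);  /* base 10 log of dose: 0, 1, 2 */
   datalines;
1 1 1 63
1 2 1 59
1 1 10 62
1 2 10 62
1 1 100 62
1 2 100 68
2 1 1 50
2 2 1 49
2 1 10 49
2 2 10 55
2 1 100 48
2 2 100 58
3 1 1 53
3 2 1 47
3 1 10 52
3 2 10 51
3 1 100 51
3 2 100 50
4 1 1 52
4 2 1 48
4 1 10 54
4 2 10 49
4 1 100 55
4 2 100 72
;
run;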

The analysis of variance of these data is as follows:

SOURCE OF VARIATION    DF

Block                   3
Type                    1
Log Dose                2
Type × Log Dose         2
Error                  15

The dose main effect can be partitioned into linear and quadratic components by orthogonal polynomials whose contrast coefficients are

             COEFFICIENT FOR LOGDOSE
CONTRAST       0     1     2

Linear        -1     0     1
Quadratic     -1     2    -1
Similarly, the Type×Log Dose interaction can be partitioned into a Linear×Type and a Quadratic×Type component. Because the log dose levels are equally spaced, you can look up the contrast coefficients shown above; most statistical methods texts have such a table. If you don’t have a table readily available, or if you want contrasts for a situation not covered by the table, for instance, partitioning DOSE rather than Log Dose into linear and quadratic components, you can use the ORPOL function in PROC IML, demonstrated in Section 7.6.2.

For analysis with LOGDOSE, use the following SAS statements:

proc glm;
 class bloc type logdose;
 model y=bloc type|logdose;
 contrast 'linear logdose' logdose -1 0 1;
 contrast 'quadratic logdose' logdose -1 2 -1;
 contrast 'linear logdose x type' type*logdose 1 0 -1 -1 0 1;
 contrast 'quad logdose x type' type*logdose 1 -2 1 -1 2 -1;

Output 7.22 contains the results.

Output 7.22 Analysis of Variance for Type-Dose Data

  Sum of  
Source DF Squares  Mean Square F Value Pr > F
 
Model 8 816.500000 102.062500 6.06 0.0014
Error 15 252.458333 16.830556
Corrected Total 23 1068.958333
 
Source DF Type I SS  Mean Square F Value Pr > F
 
bloc 3 538.7916667 179.5972222 10.67 0.0005
type 1 12.0416667 12.0416667 0.72 0.4109
logdose 2 121.5833333 60.7916667 3.61 0.0524
type*logdose 2 144.0833333 72.0416667 4.28 0.0338
 
Source DF Type III SS  Mean Square F Value Pr > F
 
bloc 3 538.7916667 179.5972222 10.67 0.0005
type 1 12.0416667 12.0416667 0.72 0.4109
logdose 2 121.5833333 60.7916667 3.61 0.0524
type*logdose 2 144.0833333 72.0416667 4.28 0.0338
 
Contrast DF Contrast SS  Mean Square F Value Pr > F
 
linear logdose 1 115.5625000 115.5625000 6.87 0.0193
quadratic logdose 1 6.0208333 6.0208333 0.36 0.5587
linear logdose x type 1 138.0625000 138.0625000 8.20 0.0118
quad logdose x type 1 6.0208333 6.0208333 0.36 0.5587

From this output you can see that there is a significant Type×Log Dose interaction. The Linear Logdose×Type interaction explains most of the interaction. A look at the Type×Log Dose least-squares means (Output 7.23) reveals why. For Type 1, LOGDOSE does not affect mean response, whereas for Type 2, mean response increases approximately linearly with increasing LOGDOSE.

Output 7.23 Least-Squares Means for Type-Dose Data

Least Squares Means
 
type logdose y LSMEAN
 
1 0 54.5000000
1 1 54.2500000
1 2 54.0000000
2 0 50.7500000
2 1 54.2500000
2 2 62.0000000

By inspection, a linear regression for each type appears sufficient to explain the LOGDOSE effect. You can use the following SAS code to formally confirm this:

proc glm;
   class bloc type logdose;
   model y=bloc type logdose(type);
   contrast 'lin in type 1' logdose(type) 1 0 -1 0 0 0;
   contrast 'lin in type 2' logdose(type) 0 0 0 1 0 -1;
   contrast 'quad in type 1' logdose(type) 1 -2 1 0 0 0;
   contrast 'quad in type 2' logdose(type) 0 0 0 1 -2 1;

Output 7.24 shows the results.

Output 7.24 Orthogonal Polynomial Contrast Results within Each Type

Contrast DF Contrast SS  Mean Square F Value Pr > F
 
lin in type 1 1 0.5000000 0.5000000 0.03 0.8655
lin in type 2 1 253.1250000 253.1250000 15.04 0.0015
quad in type 1 1 0.0000000 0.0000000 0.00 1.0000
quad in type 2 1 12.0416667 12.0416667 0.72 0.4109

You can see that neither type has a significant quadratic regression; the F-values are 0.00 and 0.72 for Types 1 and 2, respectively. Type 2 does have a highly significant linear regression, as shown by the Lin In Type 2 contrast: the F-value is 15.04, with an associated p-value of 0.0015. You would then fit the linear regression equation over LOGDOSE for Type 2.

7.6.2 Use of the IML ORPOL Function to Obtain Orthogonal Polynomial Contrast Coefficients

The contrast coefficients used in the previous section are the standard orthogonal polynomial coefficients for equally spaced treatments. You can find these coefficients in tables contained in many statistical methods textbooks. Suppose, however, that you want to partition treatment effects into linear effects, quadratic effects, and so forth, but your treatment levels are not equally spaced. Or suppose that you simply don’t have convenient access to a table of standard orthogonal polynomial contrasts. The interactive matrix algebra procedure PROC IML has a function, ORPOL, that computes orthogonal polynomial contrasts for any set of quantitative treatment levels. The function is simple to use, and does not require knowledge of matrix algebra.

To illustrate the ORPOL function, suppose you want to use DOSE rather than LOGDOSE in the example in Section 7.6.1. The levels are 1, 10, and 100, which are not equally spaced, so you won’t find the correct coefficients in standard tables. Use the following PROC IML statements:

proc iml;
 levels={1,10,100};
 coef=orpol(levels);
 print coef;

Output 7.25 shows the results. The LEVELS={level 1, level 2,...} statement defines a variable named LEVELS that contains a list of the treatment levels. Note that the treatment levels are separated by commas. The name of the variable is your choice; here it is called LEVELS, but you can give it any name you like within the conventions of allowable SAS variable names. The variable named COEF will contain the contrast coefficients; its name is your choice as well. You set it equal to the ORPOL function and put the variable with the treatment levels in parentheses. The PRINT statement causes the variable COEF to be printed.

Output 7.25 Contrast Coefficients for Unequally Spaced Levels of DOSE Using ORPOL

COEF
 
0.5773503 -0.464991 0.6711561
0.5773503 -0.348743 -0.738272
0.5773503 0.8137335 0.0671156

The first column of numbers, all 0.577, is the contrast for the mean, which is rarely, if ever, used. The second column gives you the coefficients for the linear contrasts. The third column gives you the quadratic contrast coefficients. Note that for each contrast, there is one coefficient per treatment (in this case dose) level. Also, there is one contrast per treatment degree of freedom. If you had four treatment levels, there would be four coefficients per contrast and a cubic as well as a linear and quadratic contrast.
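For example, with four unequally spaced levels (the values below are hypothetical), ORPOL returns a 4×4 matrix whose columns are the mean, linear, quadratic, and cubic contrast coefficients:

proc iml;
   levels={1,10,100,1000};  /* hypothetical four-level dose series */
   coef=orpol(levels);      /* columns: mean, linear, quadratic, cubic */
   print coef;
quit;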

The information from Output 7.25 allows you to write the appropriate CONTRAST statements. For example, for linear and quadratic dose effects, use the statements:

contrast 'linear dose' dose -0.465 -0.349 0.814;
contrast 'quadratic dose' dose 0.671 -0.738 0.067;

Pay attention to the sum of the contrast coefficients. Occasionally, rounding causes the coefficients not to sum exactly to zero; you might get 0.001, for example. This causes both the GLM and MIXED procedures to declare the contrast nonestimable, and you get no output. Simply adjust one coefficient so that the sum is exactly zero; for example, if rounding had produced -0.465, -0.349, and 0.815, you would change 0.815 to 0.814. The impact of this adjustment on the resulting computations is negligible.

You can use ORPOL for equally spaced treatments. For example, add the following statements to the IML program given above to compute the coefficients for the equally spaced levels of LOGDOSE:

log_lev=log10(levels);
coef=orpol(log_lev);
print log_lev;
print coef;
fuzzed_coef=fuzz(coef);
print fuzzed_coef;

Output 7.26 shows the results. The LOG10 function takes the base 10 log of each element of the vector variable LEVELS, defined above. The name of the new variable is your choice; here it is called LOG_LEV. Occasionally, machine-rounding error causes numbers that are supposed to be zero to be computed as very small nonzero numbers, as with the coefficient for the second treatment level in the linear contrast (second column). The FUZZ function cleans up these rounding errors and sets them to zero, as shown in the variable FUZZED_COEF, the COEF variable with FUZZ applied.

Output 7.26 ORPOL Results for Equally Spaced Treatment Level (LOGDOSE)

LOG_LEV
 
0
1
2
 
COEF
 
0.5773503 -0.707107 0.4082483
0.5773503 8.194E-17 -0.816497
0.5773503 0.7071068 0.4082483
 
FUZZED_COEF
 
0.5773503 -0.707107 0.4082483
0.5773503 0 -0.816497
0.5773503 0.7071068 0.4082483

Note that the coefficients are given in orthonormal form, that is, the squared coefficients for each contrast sum to one; for example, (−0.707)² + 0² + (0.707)² = 1 for the linear contrast (second column). You can rescale these coefficients to integer values, for example, −1, 0, and 1 for the linear contrast and 1, −2, and 1 for the quadratic, without affecting the sums of squares or F-values for the contrasts. This gives you the same coefficients you used in the previous section for LOGDOSE. A sketch of this rescaling appears below.
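The sketch below rescales each contrast column by dividing by its smallest nonzero absolute coefficient; the matrix holds the FUZZED_COEF values from Output 7.26:

proc iml;
   coef={0.5773503 -0.707107   0.4082483,
         0.5773503  0         -0.816497,
         0.5773503  0.7071068  0.4082483};
   lin =coef[,2]/0.7071068;  /* -1  0  1, up to rounding */
   quad=coef[,3]/0.4082483;  /*  1 -2  1, up to rounding */
   print lin quad;
quit;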

7.6.3 Use of Analysis of Covariance to Compute ANOVA and Fit Regression

In the previous sections you saw how to assess regression over treatment levels using orthogonal polynomial contrasts and how to use ORPOL to obtain the needed contrast coefficients. However, orthogonal polynomials can be awkward to use, especially when you want to estimate the regression equation in addition to merely partitioning variation for testing purposes. In practical situations, orthogonal polynomials are something of a holdover from pre-computer-era statistical analysis. You can use analysis-of-covariance methods to compute the same statistics you got using orthogonal polynomials, as well as additional statistics that are often useful. This section uses the example from Section 7.6.1 to illustrate.

You can reproduce the essential elements of the analysis in Output 7.22 using analysis-of-covariance methods. First, you need to define a new variable for the square of LOGDOSE. In the SAS program below, the new variable is called LOGD2, defined in the DATA step as LOGDOSE*LOGDOSE.
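A minimal sketch of this step, assuming the data set is named DRUG as in the earlier sketch:

data drug;
   set drug;
   logd2=logdose*logdose;  /* square of the log dose, used as a direct variable */
run;

Then use the following SAS statements: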

proc glm;
 class bloc type;
 model y=bloc type logdose logd2
    type*logdose type*logd2;

The results appear in Output 7.27. Notice that LOGDOSE does not appear in the CLASS statement. You use it, as well as LOGD2, as a direct regression variable, exactly as you would if LOGDOSE were a covariable and you suspected heterogeneous quadratic regressions of y on the covariable.

Output 7.27 ANOVA for Type-Dose Data Using Analysis-of-Covariance Methods

Source DF Type I SS  Mean Square F Value Pr > F
 
bloc 3 538.7916667 179.5972222 10.67 0.0005
type 1 12.0416667 12.0416667 0.72 0.4109
logdose 1 115.5625000 115.5625000 6.87 0.0193
logd2 1 6.0208333 6.0208333 0.36 0.5587
logdose*type 1 138.0625000 138.0625000 8.20 0.0118
logd2*type 1 6.0208333 6.0208333 0.36 0.5587
 
Source DF Type III SS  Mean Square F Value Pr > F
 
bloc 3 538.7916667 179.5972222 10.67 0.0005
type 1 28.1250000 28.1250000 1.67 0.2157
logdose 1 0.3894231 0.3894231 0.02 0.8811
logd2 1 6.0208333 6.0208333 0.36 0.5587
logdose*type 1 0.8125000 0.8125000 0.05 0.8291
logd2*type 1 6.0208333 6.0208333 0.36 0.5587

Comparing Output 7.27 to Output 7.22, you can see that the Type I SS for LOGDOSE and LOGD2 are identical to the contrast results for Linear Logdose and Quadratic Logdose; the sums of squares are 115.5625 and 6.0208, respectively. Also, the TYPE*LOGDOSE and TYPE*LOGD2 Type I sums of squares are identical to the Linear Logdose×Type and Quadratic Logdose×Type contrasts from Output 7.22. In both cases, the significant difference between the linear regressions of y on LOGDOSE for the two types is the main result. Note that the Type III sums of squares produce nonsense results for the LOGDOSE main-effect and interaction terms because they are adjusted for the quadratic effects. In general, the Type III SS should be ignored when using analysis of covariance in lieu of orthogonal polynomials.

Outputs 7.22 and 7.27 both lead to the conclusion that you should fit a linear regression over LOGDOSE for each type. You can use the following SAS statements to do so:

proc glm;
 class bloc type;
 model y=bloc type logdose(type)/solution;
 estimate 'beta-0, type 1'
    intercept 4 bloc 1 1 1 1 type 4 0/divisor=4;
 estimate 'beta-0, type 2'
    intercept 4 bloc 1 1 1 1 type 0 4/divisor=4;

These statements are similar to those used in Section 7.3 to fit the unequal slopes. The additional ESTIMATE statements allow you to compute the β0i terms for the ith type, which the MODEL statement implicitly defines to be the sum of the intercept, the average block effect, and the type effect. The results appear in Output 7.28.

Output 7.28 Parameter Estimates for Type-Dose Data Using Analysis-of-Covariance Methods

                              Standard
Parameter        Estimate        Error  t Value  Pr > |t|

beta-0, type 1  54.5000000   1.80039484    30.27   <.0001
beta-0, type 2  50.0416667   1.80039484    27.79   <.0001
                                 Standard
Parameter           Estimate        Error  t Value  Pr > |t|

Intercept        50.08333333 B  2.27733935    21.99   <.0001
bloc 1            7.66666667 B  2.27733935     3.37   0.0037
bloc 2           -3.50000000 B  2.27733935    -1.54   0.1427
bloc 3           -4.33333333 B  2.27733935    -1.90   0.0741
bloc 4            0.00000000 B           .        .        .
type 1            4.45833333 B  2.54614280     1.75   0.0980
type 2            0.00000000 B           .        .        .
logdose(type) 1  -0.25000000    1.39457984    -0.18   0.8598
logdose(type) 2   5.62500000    1.39457984     4.03   0.0009
 

NOTE: The X'X matrix has been found to be singular, and a generalized inverse was used to solve the normal equations. Terms whose estimates are followed by the letter 'B' are not uniquely estimable.

From Output 7.28, the main results are the regression equations. For Type 1, the equation is y = 54.5 - 0.25*LOGDOSE; for Type 2, it is y = 50.042 + 5.625*LOGDOSE.

PROC MIXED makes it somewhat more convenient to obtain the regression equations. You can use the SAS statements

proc mixed;
 class bloc type;
 model y=type logdose(type)/noint solution;
 random bloc;

The results appear in Output 7.29.

Output 7.29 Estimate of Linear Regression Equations for Each Type

      Standard    
Parameter   Estimate Error   t Value  Pr > |t|
 
type 1 54.50000000    2.89268414 18.84 <.0001
type 2 50.04166667 2.89268414 17.30 <.0001
logdose(type) 1 -0.25000000 2.24066350 -0.11 0.9123
logdose(type) 2 5.62500000 2.24066350 2.51 0.0208

The RANDOM BLOC statement and the NOINT option cause the β0i terms to be estimated directly, rather than requiring ESTIMATE statements. Treating BLOC as random does not affect the point estimates, but it does change the standard errors of the regression coefficients. For PROC MIXED, the standard errors are 2.89 and 2.24 for the intercept and slope, respectively; these are valid if it is reasonable to assume that blocks are random. The PROC GLM results, 1.80 and 1.39, assume fixed blocks.
