Chapter 11 Examples of Special Applications

11.1 Introduction

11.2 Confounding in a Factorial Experiment

11.2.1 Confounding with Blocks

11.2.2 A Fractional Factorial Example

11.3 A Balanced Incomplete-Blocks Design

11.4 A Crossover Design with Residual Effects

11.5 Models for Experiments with Qualitative and Quantitative Variables

11.6 A Lack-of-Fit Analysis

11.7 An Unbalanced Nested Structure

11.8 An Analysis of Multi-Location Data

11.8.1 An Analysis Assuming No Location×Treatment Interaction

11.8.2 A Fixed-Location Analysis with an Interaction

11.8.3 A Random-Location Analysis

11.8.4 Further Analysis of a Location×Treatment Interaction Using a Location Index

11.9 Absorbing Nesting Effects

11.1 Introduction

As already noted, the GLM and MIXED procedures can be used to analyze a multitude of data structures. In this chapter several applications are presented that utilize tools discussed in the previous chapters. Some of these applications involve statistical topics that are not discussed in great detail in this book. References are given to provide the necessary background information.

11.2 Confounding in a Factorial Experiment

Experiments use confounding in two forms. The first is the factorial treatment design in which all factorial combinations appear in the experiment, but they appear in incomplete blocks containing only a subset of the factor combinations. Thus, within a given block, one or more treatment effects are confounded with block effects. The second is the fractional factorial experiment in which only a subset of the factor combinations appears in the experiment. Thus, some of the factorial effects are not estimable but are aliased with other effects, meaning that the same estimable function estimates both effects. Confounding is covered in most textbooks on the design of experiments (for example, Hicks and Turner 2000).

11.2.1 Confounding with Blocks

The first example for this topic is a 2³ factorial with factors labeled A, B, and C in blocks of size four. There are three replications, with interactions ABC, AC, and BC confounded with blocks in replications 1, 2, and 3, respectively. These interactions are thus partially confounded with blocks. The data appear in Output 11.1.

Output 11.1 Data for a Two-Cube Factorial in Blocks of Size Four

Obs rep blk a b c y
 
1 1 1 1 1 1 3.99
2 1 1 1 0 0 1.14
3 1 1 0 1 0 1.52
4 1 1 0 0 1 3.33
5 1 2 1 1 0 2.06
6 1 2 1 0 1 5.58
7 1 2 0 1 1 2.06
8 1 2 0 0 0 -0.17
9 2 1 1 1 1 3.77
10 2 1 1 0 1 6.69
11 2 1 0 1 0 2.17
12 2 1 0 0 0 -0.01
13 2 2 1 1 0 2.43
14 2 2 0 1 1 1.22
15 2 2 1 0 0 0.37
16 2 2 0 0 1 2.06
17 3 1 1 1 1 4.53
18 3 1 0 1 1 1.90
19 3 1 1 0 0 1.62
20 3 1 0 0 0 -0.70
21 3 2 1 1 0 1.56
22 3 2 1 0 1 5.99
23 3 2 0 1 0 1.44
24 3 2 0 0 1 2.42
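
The blocking follows directly from the defining interaction in each replication. For example, with ABC confounded in replication 1, block 1 contains the four treatment combinations for which a+b+c is odd (111, 100, 010, 001) and block 2 contains those for which a+b+c is even (110, 101, 011, 000). Analogously, the blocks in replications 2 and 3 separate the combinations according to the parity of a+c and b+c, respectively; you can verify these assignments directly in the data listing above.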

Contrasts corresponding to confounded effects can be estimated only from those replications in which they are not confounded. In this example, they are estimated from only two-thirds of the data; thus their standard errors should be larger by a factor of √(3/2) than those of the unconfounded effects.

The analysis using PROC GLM is straightforward. You can generate contrasts in the DATA step instead of specifying classes for treatments and using CONTRAST statements, as the following code shows:

data confound;
   input rep blk a b c y;
   /* convert the 0-1 factor levels to -1/+1 contrast variables */
   ca = -(a=0) + (a=1);
   cb = -(b=0) + (b=1);
   cc = -(c=0) + (c=1);
datalines;
   ·
   data
   ·
;

By sorting the data and running the analysis by REP, you can use the ALIASING option in PROC GLM to print out the confounding pattern. Use the following statements:

proc sort;
   by rep;
proc glm;
   by rep;
   class blk;
   model y=blk ca|cb|cc/solution aliasing;

The results appear in Output 11.2.

Output 11.2 Aliasing Output Showing a Confounding Pattern for a 2³ Factorial in Blocks of Size Four

----------- rep=1 -----------
Parameter Expected Value
 
Intercept Intercept + [blk 2] - ca*cb*cc
blk       1 [blk 1] - [blk 2] + 2*ca*cb*cc
blk       2  
ca ca
cb cb
ca*cb ca*cb
cc cc
ca*cc ca*cc
cb*cc cb*cc
ca*cb*cc  
----------- rep=2 -----------
Parameter Expected Value
 
Intercept Intercept + [blk 2] - ca*cc
blk       1 [blk 1] - [blk 2] + 2*ca*cc
blk       2  
ca ca
cb cb
ca*cb ca*cb
cc cc
ca*cc  
cb*cc cb*cc
ca*cb*cc ca*cb*cc
----------- rep=3 -----------
Parameter Expected Value
 
Intercept Intercept + [blk 2] - cb*cc
blk       1 [blk 1] - [blk 2] + 2*cb*cc
blk       2  
ca ca
cb cb
ca*cb ca*cb
cc cc
ca*cc ca*cc
cb*cc  
ca*cb*cc ca*cb*cc

The contents of Output 11.2 appear immediately after the parameter estimates generated by the SOLUTION option in the MODEL statement. For REP=1, you can see that the three-way interaction CA*CB*CC has a blank under “Expected Value” but the INTERCEPT and BLK 1 effects estimate their usual estimable functions plus the CA*CB*CC effect. This indicates that the ABC interaction effect is confounded with blocks in REP=1. Similarly, the output indicates that the AC interaction is confounded with blocks in REP=2, and the BC interaction is confounded with blocks in REP=3. Although in this example the ALIASING option merely confirms the confounding pattern stated in the introduction, it can be very useful in data sets where the confounding pattern is not obvious and needs to be investigated.

For a complete analysis of the data, combined over all replications, use the following SAS statements:

proc glm;
   class rep blk;
   model y=rep blk(rep) ca|cb|cc / solution;

The results appear in Output 11.3.

Output 11.3 ANOVA for a Two-Cube Factorial in Blocks of Size Four

The GLM Procedure
Class Level Information
Class Levels   Values
rep 3   1 2 3
blk 2   1 2

 

  Number of observations     24
The GLM Procedure

 

   Dependent Variable: y

  Sum of  
Source DF Squares  Mean Square F Value Pr > F
 
Model 12 81.74957500 6.81246458 33.60 <.0001
 
Error 11 2.23018750 0.20274432    
 
Corrected Total 23 83.97976250      
 
R-Square Coeff Var Root MSE y Mean
 
0.973444 18.96878 0.450271 2.373750
 
Source DF Type I SS Mean Square F Value Pr > F
 
rep 2 0.05092500 0.02546250 0.13 0.8832
blk(rep) 3 7.43221250 2.47740417 12.22 0.0008
ca 1 21.07500417 21.07500417 103.95 <.0001
cb 1 0.00453750 0.00453750 0.02 0.8838
ca*cb 1 1.72270417 1.72270417 8.50 0.0141
cc 1 37.77550417 37.77550417 186.32 <.0001
ca*cc 1 2.31800625 2.31800625 11.43 0.0061
cb*cc 1 11.34005625 11.34005625 55.93 <.0001
ca*cb*cc 1 0.03062500 0.03062500 0.15 0.7049
 
Source DF Type III SS Mean Square F Value Pr > F
 
rep 2 0.05092500 0.02546250 0.13 0.8832
blk(rep) 3 1.66755417 0.55585139 2.74 0.0938
ca 1 21.07500417 21.07500417 103.95 <.0001
cb 1 0.00453750 0.00453750 0.02 0.8838
ca*cb 1 1.72270417 1.72270417 8.50 0.0141
cc 1 37.77550417 37.77550417 186.32 <.0001
ca*cc 1 2.31800625 2.31800625 11.43 0.0061
cb*cc 1 11.34005625 11.34005625 55.93 <.0001
ca*cb*cc 1 0.03062500 0.03062500 0.15 0.7049
 
    Standard    
Parameter Estimate   Error t Value Pr > |t|
 
Intercept 2.010625000 B 0.25170936 7.99 <.0001
rep        1 0.328125000 B 0.35597078 0.92 0.3764
rep        2 -0.110000000 B 0.35597078 -0.31 0.7631
rep        3 0.000000000 B  ⋅          ⋅    ⋅    
blk(rep)   1 1 0.200000000 B 0.38994646 0.51 0.6182
blk(rep)   2 1 0.000000000 B  ⋅          ⋅    ⋅    
blk(rep)   1 2 0.873750000 B 0.38994646 2.24 0.0466
blk(rep)   2 2 0.000000000 B  ⋅          ⋅    ⋅    
blk(rep)   1 3 0.668750000 B 0.38994646 1.71 0.1143
blk(rep)   2 3 0.000000000 B  ⋅          ⋅    ⋅    
ca 0.937083333   0.09191126 10.20 <.0001
cb 0.013750000   0.09191126 0.15 0.8838
ca*cb -0.267916667   0.09191126 -2.91 0.0141
cc 1.254583333   0.09191126 13.65 <.0001
ca*cc 0.380625000   0.11256785 3.38 0.0061
cb*cc -0.841875000   0.11256785 -7.48 <.0001
ca*cb*cc -0.043750000   0.11256785 -0.39 0.7049
 

NOTE: The X'X matrix has been found to be singular, and a generalized inverse was used to solve the normal equations. Terms whose estimates are followed by the letter ‘B’ are not uniquely estimable.

The standard errors of the coefficients of the confounded effects (ABC, AC, and BC) are indeed larger by a factor of √(3/2) than those of the unconfounded effects. You can verify that the sums of squares of the confounded effects, based on data from the replications in which they are not confounded, are identical to the sums of squares in Output 11.3.
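
As a quick arithmetic check, the ratio of the two standard errors reported in Output 11.3 is 0.11256785 / 0.09191126 ≈ 1.225, which equals √(3/2), as expected for effects estimated from only two of the three replications.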

The estimable functions option can be used to indicate the nature of the confounding. Requesting the Type I functions for effects in the same order as in the MODEL statement above gives the effects for BLK(REP) unadjusted for the factorial effects and reveals how the blocks are related to the factorial effects.
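
A minimal sketch of the statements that produce these coefficients follows; the E1 option on the MODEL statement requests the Type I estimable functions.

proc glm;
   class rep blk;
   model y=rep blk(rep) ca|cb|cc / e1 solution;
run;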

Output 11.4 Estimable Functions for a Two-Cube Factorial in Blocks of Size Four

   Type I Estimable Functions

 

Effect Coefficients blk(rep)
 
Intercept 0
 
rep     1 0
rep     2 0
rep     3 0
 
blk(rep) 1 1 L5
blk(rep) 2 1 -L5
blk(rep) 1 2 L7
blk(rep) 2 2 -L7
blk(rep) 1 3 L9
blk(rep) 2 3 -L9
 
ca 0
 
cb 0
 
ca*cb 0
 
cc 0
 
ca*cc 2L7
 
cb*cc 2L9
 
ca*cb*cc 2L5

Output 11.4 gives the nonzero coefficients of BLK(REP). The coefficient L5 appears on the terms for BLK in REP 1 and also on the CA*CB*CC interaction term. This happens because CA*CB*CC is confounded with BLK in REP 1. This is apparent from the data set shown in Output 11.1. The product CA*CB*CC is equal to 1 for all observations in BLK 1 of REP 1, and CA*CB*CC = –1 for all observations in BLK 2 of REP 1. In some data sets, the confounding pattern is not so obvious. Using the coefficients for estimable functions in conjunction with the output from the ALIASING option shown above, you can discover the confounding pattern.

11.2.2 A Fractional Factorial Example

The second example is a ½ fraction of a 2⁴ factorial experiment. The defining contrast is ABCD. The data appear in Output 11.5.

Output 11.5 Data for a ½ Fraction of a 2⁴ Factorial Experiment

Obs a b c d y ca cb cc cd
 
1 0 0 0 0 2.29 -1 -1 -1 -1
2 0 0 1 1 1.51 -1 -1 1 1
3 0 1 0 1 1.49 -1 1 -1 1
4 0 1 1 0 3.43 -1 1 1 -1
5 1 0 0 1 3.78 1 -1 -1 1
6 1 0 1 0 2.08 1 -1 1 -1
7 1 1 0 0 3.30 1 1 -1 -1
8 1 1 1 1 3.63 1 1 1 1

The data in Output 11.5 include the factor levels in their original form (A, B, C, and D) and in contrast (–1,1) form (CA, CB, CC, and CD).
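
You can verify the defining contrast directly from Output 11.5: for every one of the eight runs the product CA*CB*CC*CD equals +1, so the design is the half of the 2⁴ factorial defined by I = ABCD.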

You can compute the analysis with the aliasing pattern by using PROC GLM statements similar to those used in the previous example:

proc glm;
   model y=ca|cb|cc|cd/solution aliasing;

Output 11.6 shows the results.

Output 11.6 PROC GLM Analysis of Data from a ½ Fraction of a 2⁴ Factorial Experiment

Source DF Sum of Squares Mean Square F Value Pr > F
 
Model 7 6.35588750 0.90798393 . .
Error 0 0.00000000  .            
Corrected Total 7 6.35588750      
 
R-Square Coeff Var Root MSE y Mean
 
1.000000 . . 2.688750
 
Source DF Type I SS Mean Square F Value Pr > F
 
ca 1 2.07061250 2.07061250 . .
cb 1 0.59951250 0.59951250 . .
ca*cb 1 0.00031250 0.00031250 . .
cc 1 0.00551250 0.00551250 . .
ca*cc 1 0.80011250 0.80011250 . .
cb*cc 1 2.82031250 2.82031250 . .
ca*cb*cc 1 0.05951250 0.05951250 . .
cd 0 0.00000000  .         . .
ca*cd 0 0.00000000  .         . .
cb*cd 0 0.00000000  .         . .
ca*cb*cd 0 0.00000000  .         . .
cc*cd 0 0.00000000  .         . .
ca*cc*cd 0 0.00000000  .         . .
cb*cc*cd 0 0.00000000  .         . .
ca*cb*cc*cd 0 0.00000000  .         . .
 
Source DF Type III SS Mean Square F Value Pr > F
 
ca 0 0 . . .
cb 0 0 . . .
ca*cb 0 0 . . .
cc 0 0 . . .
ca*cc 0 0 . . .
cb*cc 0 0 . . .
ca*cb*cc 0 0 . . .
cd 0 0 . . .
ca*cd 0 0 . . .
cb*cd 0 0 . . .
ca*cb*cd 0 0 . . .
cc*cd 0 0 . . .
ca*cc*cd 0 0 . . .
cb*cc*cd 0 0 . . .
ca*cb*cc*cd 0 0 . . .
 
Parameter Estimate Standard Error t Value Pr > |t|
 
Intercept  2.688750000 B . . .
ca  0.508750000 B . . .
cb  0.273750000 B . . .
ca*cb -0.006250000 B . . .
cc -0.026250000 B . . .
ca*cc -0.316250000 B . . .
cb*cc  0.593750000 B . . .
ca*cb*cc -0.086250000 B . . .
cd  0.000000000 B . . .
ca*cd  0.000000000 B . . .
cb*cd  0.000000000 B . . .
ca*cb*cd  0.000000000 B . . .
cc*cd  0.000000000 B . . .
ca*cc*cd  0.000000000 B . . .
cb*cc*cd  0.000000000 B . . .
ca*cb*cc*cd  0.000000000 B . . .
 
Parameter Expected Value
 
Intercept Intercept + ca*cb*cc*cd
ca ca + cb*cc*cd
cb cb + ca*cc*cd
ca*cb ca*cb + cc*cd
cc cc + ca*cb*cd
ca*cc ca*cc + cb*cd
cb*cc cb*cc + ca*cd
ca*cb*cc ca*cb*cc + cd
cd  
ca*cd  
cb*cd  
ca*cb*cd  
cc*cd  
ca*cc*cd  
cb*cc*cd  
ca*cb*cc*cd  

From Output 11.6, you can see that because there are only eight observations, only the first seven parameters in the model plus the intercept can be estimated. Also, each estimate is confounded—aliased—with one other factorial effect. The tables of “Parameter” and “Expected Value” at the end of the printout give the aliases. For example, the estimate of the intercept is aliased with the ABCD interaction, indicated on the printout by the fact that the expected value of the intercept is INTERCEPT + CA*CB*CC*CD. Similarly, the output indicates that the expected value of the parameter CA is CA+CB*CC*CD, that is, the main effect of A is aliased with the BCD interaction. You can apply analogous interpretations to the remaining parameters. You can see that this aliasing pattern agrees with the pattern you would derive from standard fractional factorial methods. In this case, which uses a very basic design, the ALIASING option merely restates information someone familiar with fractional factorial design would already know. However, for nonstandard incomplete factorial designs, for instance those you could generate with PROC OPTEX, the ALIASING option can provide useful information that usually is not obvious.

There are three important additional points about the analysis in Output 11.6. First, the default order of effects from the CA|CB|CC|CD syntax used in the MODEL statement causes all effects involving only factors A, B, and C to be estimated first, before any effects involving D appear in the model. This is not very realistic. Normally, you would not use a fractional factorial design unless you expect higher-order interaction effects to be negligible. For example, the output gives an estimate of the ABC interaction, which is aliased with the main effect of D. In practice, you would assume this to be an estimate of the main effect of D. That is, you would use this design only if you could assume that the ABC interaction is essentially zero. Also, because all the two-factor interactions are aliased with other two-factor interactions, you must be sure which ones you can assume to be negligible so that you do not alias two potentially important effects.

The second point is that there are no error degrees of freedom, and hence no F-values or p-values, in Output 11.6. The model used to compute the analysis is saturated. There are various strategies to get around this. A common approach is to assume that all interactions are zero and compute a main-effects-only model, using the three degrees of freedom for the two-factor interactions to estimate experimental error. You can do this by using the following statements:

proc glm;
   model y=ca cb cc cd/solution aliasing;

The results appear in Output 11.7. However, you can easily question whether the results in Output 11.7 are valid, because in Output 11.6, the largest single source of variation was the BC (aliased with AD) interaction. For these data, at least, the assumption that all interaction effects are zero is questionable. If there is a non-negligible BC (or AD) interaction, then the MS(ERROR) in Output 11.7 overestimates σ² and hence the F-values are too low. An alternative strategy, not shown here, uses half-normal plots to estimate σ² and construct approximate tests for the model effects. See Milliken and Johnson (1989, Chapter 4) for an explanation of how to implement half-normal plot analysis using SAS. Under the half-normal plot method, the main effect of A and the BC (or AD) interaction are statistically significant. You would need sufficient understanding of the data to decide whether the interaction is a BC or an AD interaction.
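
The following statements show one way to construct such a plot. This is only a minimal sketch, not the Milliken and Johnson implementation; the input data set name FRAC and the intermediate data set names are assumed for illustration, and the effect estimates are taken from the ODS ParameterEstimates table produced by the SOLUTION option.

ods output ParameterEstimates=pe;
proc glm data=frac;
   model y=ca|cb|cc|cd / solution aliasing;
run;

/* keep the seven estimable effects and rank their absolute values */
data effects;
   set pe;
   if Parameter ne 'Intercept' and Estimate ne 0;
   abseff = abs(Estimate);
run;
proc rank data=effects out=effects;
   var abseff;
   ranks r;
run;

/* half-normal quantiles: points far above the line suggest real effects */
data effects;
   set effects nobs=m;
   q = quantile('NORMAL', 0.5 + 0.5*(r - 0.5)/m);
run;
proc sgplot data=effects;
   scatter x=q y=abseff / datalabel=Parameter;
run;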

Output 11.7 Main-Effects-Only Analysis of Fractional Factorial Data

  Sum of  
Source DF Squares Mean Square F Value Pr > F
 
Model 4 2.73515000 0.68378750 0.57 0.7075
 
Error 3 3.62073750 1.20691250    
 
Corrected Total 7 6.35588750      
 
R-Square Coeff Var Root MSE y Mean
 
0.430333 40.85898 1.098596 2.688750
 
Source DF Type I SS Mean Square F Value Pr > F
 
ca 1 2.07061250 2.07061250 1.72 0.2815
cb 1 0.59951250 0.59951250 0.50 0.5317
cc 1 0.00551250 0.00551250 0.00 0.9504
cd 1 0.05951250 0.05951250 0.05 0.8385
 
Source DF Type III SS Mean Square F Value Pr > F
 
ca 1 2.07061250 2.07061250 1.72 0.2815
cb 1 0.59951250 0.59951250 0.50 0.5317
cc 1 0.00551250 0.00551250 0.00 0.9504
cd 1 0.05951250 0.05951250 0.05 0.8385
Parameter Estimate Standard Error t Value Pr > |t| Expected Value
 
Intercept 2.688750000 0.38841223 6.92 0.0062   Intercept
ca 0.508750000 0.38841223 1.31 0.2815   ca
cb 0.273750000 0.38841223 0.70 0.5317   cb
cc -0.026250000 0.38841223 -0.07 0.9504   cc
cd -0.086250000 0.38841223 -0.22 0.8385   cd

The final point concerns the use of the (–1,1) contrasts CA through CD instead of the original (0,1) coding of A through D. If you use the variables A through D in the model, the ALIASING option assesses the aliasing pattern based on the estimable functions that follow from the (0,1) coding. These do not correspond to the standard aliasing pattern for fractional factorial experiments, and can be difficult to interpret. For example, these SAS statements yield the results shown in Output 11.8:

proc glm;
   model y=a|b|c|d/ aliasing;

Output 11.8 An Analysis of Fractional Factorial Data Using 0-1 Coding

Source DF Type I SS Mean Square F Value Pr > F
 
a 1 2.07061250 2.07061250 . .
b 1 0.59951250 0.59951250 . .
a*b 1 0.00031250 0.00031250 . .
c 1 0.00551250 0.00551250 . .
a*c 1 0.80011250 0.80011250 . .
b*c 1 2.82031250 2.82031250 . .
a*b*c 1 0.05951250 0.05951250 . .
d 0 0.00000000  .         . .
a*d 0 0.00000000  .         . .
b*d 0 0.00000000  .         . .
a*b*d 0 0.00000000  .         . .
c*d 0 0.00000000  .         . .
a*c*d 0 0.00000000  .         . .
b*c*d 0 0.00000000  .         . .
a*b*c*d 0 0.00000000  .         . .
 
Parameter Estimate Standard Error t Value Pr > |t|
 
Intercept  2.290000000   . . .
a  1.490000000 B . . .
b -0.800000000 B . . .
a*b  0.320000000 B . . .
c -0.780000000 B . . .
a*c -0.920000000 B . . .
b*c  2.720000000 B . . .
a*b*c -0.690000000 B . . .
d  0.000000000 B . . .
a*d  0.000000000 B . . .
b*d  0.000000000 B . . .
a*b*d  0.000000000 B . . .
c*d  0.000000000 B . . .
a*c*d  0.000000000 B . . .
b*c*d  0.000000000 B . . .
a*b*c*d  0.000000000 B . . .
 
Expected Value
 
Intercept
a + d + a*d
b + d + b*d
a*b - 2*d - a*d - b*d
c + d + c*d
a*c - 2*d - a*d - c*d
b*c - 2*d - b*d - c*d
a*b*c + 4*d + 2*a*d + 2*b*d + a*b*d + 2*c*d + a*c*d + b*c*d + a*b*c*d
 

NOTE: The X'X matrix has been found to be singular, and a generalized inverse was used to solve the normal equations. Terms whose estimates are followed by the letter 'B' are not uniquely estimable.

You can see that the sums of squares are the same as those computed from the contrast (–1,1) coding. The parameter estimates are different, as you would expect, because the different coding changes the intercept and hence the other coefficients. The aliasing pattern shown in the “Expected Value” column of the parameter estimates is also quite different. This reflects the fact that the (0,1) coding results in a different set of estimable functions. As shown in the theory section of Chapter 6, GLM determines estimable functions from the nonzero rows of the (X′X)⁻(X′X) matrix. The contrast coding results in estimable functions in standard form for assessing aliasing patterns in incomplete factorials. On the other hand, the (0,1) coding results in a different, and unfamiliar, form.

11.3 A Balanced Incomplete-Blocks Design

Incomplete-blocks designs are used whenever there are not enough experimental units per block to accommodate all treatments. Perhaps the best-known incomplete-blocks design is the so-called balanced incomplete-blocks (BIB) design. This design is not balanced in the sense that we have used the word in previous chapters because, in fact, not all treatments are assigned in all blocks. Instead, balance in the context of incomplete-blocks designs has the specific definition that all treatments appear in the same number of blocks, and all pairs of treatments appear together in the same number of blocks. These requirements result in certain conditions on the numbers of blocks, treatments, and treatments per block. For the BIB design with four treatments in blocks of size two, six blocks (three replications) are required (Cochran and Cox 1957). The data appear in Output 11.9. The design is shown below:

BIB Design Example Data
(Numbers in parentheses indicate treatment number)

Block        1         2         3         4         5         6

           1.2(1)    7.1(3)    7.1(1)    8.8(2)    9.7(1)   13.0(2)
           2.7(2)    8.6(4)    9.7(3)   15.1(4)   17.4(4)   16.6(3)

Output 11.9 Data for a Balanced Incomplete-Blocks Design

Obs      blk     trt y
 
1      1     1 1.2
2      1     2 2.7
3      2     3 7.1
4      2     4 8.6
5      3     1 7.1
6      3     3 9.7
7      4     2 8.8
8      4     4 15.1
9      5     1 9.7
10      5     4 17.4
11      6     2 13.0
12      6     3 16.6
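
You can verify the balance requirements from this layout. With t=4 treatments, b=6 blocks of size k=2, and r=3 replications of each treatment, the counts satisfy bk = rt (6×2 = 3×4 = 12 observations), and each pair of treatments appears together in λ = r(k−1)/(t−1) = 3(1)/3 = 1 block.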

Consider the following statements:

proc glm data=bibd;
   class blk trt;
   model y=trt blk / e1 ss3;
   random blk;
   means trt blk;
   lsmeans trt / stderr pdiff cl;
run;

The analysis-of-variance portion appears in Output 11.10.

Output 11.10 ANOVA for a Balanced Incomplete-Blocks Design

The GLM Procedure
 
Source DF Sum of
Squares
Mean Square F Value Pr > F
 
Model 8 281.1275000 35.1409375 40.82 0.0056
Error 3 2.5825000 0.8608333    
Corrected Total 11 283.7100000      
 
Source DF Type I SS Mean Square F Value Pr > F
 
trt 3 102.2566667 34.0855556 39.60 0.0065
blk 5 178.8708333 35.7741667 41.56 0.0057
 
Source DF Type III SS Mean Square F Value Pr > F
 
trt 3 59.0175000 19.6725000 22.85 0.0144
blk 5 178.8708333 35.7741667 41.56 0.0057
 
Least Squares Means
 
trt y LSMEAN Standard
 Error  
95% Confidence Limits
 
1 6.8000000 0.6281310 4.801007 8.798993
2 7.6500000 0.6281310 5.651007 9.648993
3 10.9250000 0.6281310 8.926007 12.923993
4 13.6250000 0.6281310 11.626007 15.623993
 
i j Difference
Between
Means
  95% Confidence Limits for LSMean(i)-LSMean(j)  
 
1 2 -0.850000 -3.802709 2.102709
1 3 -4.125000 -7.077709 -1.172291
1 4 -6.825000 -9.777709 -3.872291
2 3 -3.275000 -6.227709 -0.322291
2 4 -5.975000 -8.927709 -3.022291
3 4 -2.700000 -5.652709 0.252709

The Type I sum of squares is the unadjusted treatment sum of squares, based on the ordinary treatment means. Therefore, the unadjusted treatment sum of squares contains both treatment differences and block differences. The Type III treatment sum of squares is adjusted for blocks. This means that block effects have been removed from the sum of squares. Thus, the adjusted treatment mean square measures only differences between treatment means and random error. These concepts are revealed in the estimable functions. Table 11.1 shows the Type I estimable functions.

Table 11.1 Type I Estimable Functions for Treatments

Effect        Symbolic Expression        Coefficients for TRT 1 Effect

TRT    1      L2                          +.75
       2      L3                          –.25
       3      L4                          –.25
       4      –L2–L3–L4                   –.25

BLK    1       .333L2 + .333L3             .167
       2      –.333L2 – .333L3            –.167
       3       .333L2 + .333L4             .167
       4      –.333L2 – .333L4            –.167
       5      –.333L3 – .333L4             .167
       6       .333L3 + .333L4            –.167

The Type I estimable function for treatments (TRT) is of some interest. Consider the contrast

TRT1 − 1/4(TRT1 + TRT2 + TRT3 + TRT4)

This is often called the effect of treatment 1, or the difference between the treatment 1 mean and the mean of all treatments. Simplification gives

3/4(TRT1) − 1/4(TRT2) − 1/4 (TRT3) − 1/4(TRT4)

This expression is obtained by defining

L2 = 3/4

L3 = –1/4

L4 = –1/4

and results in the coefficients that appear in the right-hand column of Table 11.1. You can see that the Type I (unadjusted) estimate of the TRT 1 effect is also a contrast between blocks 1, 3, and 5, which contain treatment 1, and blocks 2, 4, and 6, which do not.

The least-squares means (see Output 11.10) have been “adjusted” for block effects. The corresponding estimable functions (not reproduced here) show that the LS means contain equal representation of block parameters even though individual treatments do not appear in all the blocks. Differences between LS means provide the so-called intra-block comparisons of treatments. There is information about differences between the treatment means contained in the block means that is not used in the intra-block comparisons. This is called the inter-block information.

Expected mean squares from the RANDOM statement reveal the presence of block effects in the Type I mean squares, but not in the Type III mean squares, as shown in Output 11.11. The Type I EMS for TRT contains VAR(BLK), but the Type III EMS does not.

Output 11.11 Expected Mean Squares for a Balanced Incomplete-Blocks Design

The GLM Procedure

Source Type I Expected Mean Square
 
trt Var(Error) + 0.6667 Var(blk) + Q(trt)
 
blk Var(Error) + 1.6 Var(blk)
 
Source Type III Expected Mean Square
 
trt Var(Error) + Q(trt)
 
blk Var(Error) + 1.6 Var(blk)

The MIXED procedure can be used to obtain the combined inter- and intra-block information about differences between treatment means. Run the following statements:

proc mixed data=bibd;
   class blk trt;
   model y=trt / ddfm=satterth;
   random blk;
   lsmeans trt / pdiff cl;
run;

The results appear in Output 11.12.

Output 11.12 A Mixed-Model Analysis of a Balanced Incomplete-Blocks Design

The Mixed Procedure

 

Covariance Parameter
Estimates

 

Cov Parm Estimate
 
blk 17.8543
Residual 0.8518
 

Type 3 Tests of Fixed Effects

 

Effect Num
DF
Den
DF
F Value Pr > F
 
trt 3 3.13 23.46 0.0121
 

Least Squares Means

Effect trt Estimate Standard
Error
DF t Value Pr > |t| Alpha Lower Upper
 
trt 1 6.7724 1.8337 5.96 3.69 0.0103 0.05 2.2773 11.2674
trt 2 7.6678 1.8337 5.96 4.18 0.0059 0.05 3.1728 12.1629
trt 3 10.9322 1.8337 5.96 5.96 0.0010 0.05 6.4371 15.4273
trt 4 13.6276 1.8337 5.96 7.43 0.0003 0.05 9.1325 18.1227
 

Differences of Least Squares Means

 
Effect trt _trt Estimate Standard
Error
DF t Value Pr > |t| Alpha Lower Upper
trt 1 2 -0.8955 0.9176 3.13 -0.98 0.3983 0.05 -3.7462 1.9552
trt 1 3 -4.1598 0.9176 3.13 -4.53 0.0183 0.05 -7.0105 -1.3092
trt 1 4 -6.8552 0.9176 3.13 -7.47 0.0043 0.05 -9.7059 -4.0045
trt 2 3 -3.2643 0.9176 3.13 -3.56 0.0353 0.05 -6.1150 -0.4137
trt 2 4 -5.9597 0.9176 3.13 -6.49 0.0065 0.05 -8.8104 -3.1091
trt 3 4 -2.6954 0.9176 3.13 -2.94 0.0574 0.05 -5.5461 0.1553

You can see the distinction between the intra-block and the combined inter- and intra-block comparisons of treatments by comparing results in Output 11.10 and Output 11.12. First of all, the TRT LS means are slightly different in the two output tables. Also, the confidence interval for the difference between TRT 1 and TRT 2 in Output 11.10 is (–3.802709, 2.102709), whereas the confidence interval in Output 11.12 is (–3.7462, 1.9552). The confidence interval using the combined information in Output 11.12 is slightly narrower. However, this can be misleading. The standard error in Output 11.12 does not take into account the variation induced by estimating the variance-covariance matrix to obtain the estimated GLS estimates of differences between treatment means. If you use DDFM=KENWARDROGER in the MODEL statement, you will get a better assessment of the true error of estimation.
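
A minimal sketch of the Kenward-Roger version of the analysis (identical to the statements above except for the DDFM= option) is

proc mixed data=bibd;
   class blk trt;
   model y=trt / ddfm=kenwardroger;
   random blk;
   lsmeans trt / pdiff cl;
run;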

11.4 A Crossover Design with Residual Effects

Crossover designs are used in animal nutrition and pharmaceutical studies to compare two or more treatments (diets or drugs). The treatments are administered sequentially to each subject over a set of time periods. This enables the comparison of treatments on a within-subjects basis. However, there is a possibility that the response obtained after a particular time period might be influenced by the treatment assigned not only in that period but also in previous periods. If so, then the response contains residual effects from the previous periods. Some authors call these “carry-over” effects. Certain crossover designs permit the residual effects to be estimated, and thus to be effectively removed from estimates of treatment means and comparisons of means.

Cochran and Cox (1957) present two 3×3 Latin squares as a design for estimating, in a study of milk yields, the residual effects of the treatment applied in the preceding period. The treatment allocation is shown in the table below. The columns of the two squares contain the six possible sequences.

                Square 1                Square 2
                  Cow                     Cow
              I     II    III         IV     V     VI

Period  1     A     B     C            A     B     C
        2     B     C     A            C     A     B
        3     C     A     B            B     C     A

Output 11.13 contains data from a study that was conducted to compare the effects on heart rate of three treatments: a test drug, a standard drug, and a placebo. Treatments were assigned in the six possible sequences to four patients each. The treatment design for the data in Output 11.13 is equivalent to the Cochran and Cox design in the table above, with sequences A–F in Output 11.13 corresponding to cows I–VI in the table, respectively.

Heart rate was measured one hour following the administration of treatment in each of three visits. The visits are labeled 2, 3, and 4, because visit 1 was a preliminary visit for baseline data. Thus, in the general terminology of crossover designs, period 1 is visit 2, period 2 is visit 3, and period 3 is visit 4. Baseline heart rate was measured, but it is not used in the illustrative analysis.

A model for the data is

yijk = μ + αi + dj + βk + τl(ik) + ρm(ik) + eijk

where αi is the effect of sequence i, dj is the random effect of patient j, βk is the effect of visit k, τl(ik) is the direct effect of treatment l, ρm(ik) is the residual effect of treatment m, and eijk is a random effect associated with patient j in visit k. The subscript l(ik) on the treatment direct effect indicates that the treatment (l) is a function of the visit (k) and the sequence (i). The same is true of the treatment residual effect subscript m(ik).

When using PROC GLM to analyze data from a crossover design, the sequence, patient, period, and direct treatment effects can be incorporated into the model with the dummy variables that result from using a CLASS statement. However, it is more convenient to use explicitly created covariates in the model for the residual effects. In the data set for the heart rate data, we create covariates for the standard and test drug residual effects, named RESIDS and RESIDT, respectively. Their values in the first period (visit 2) are zero because there is no period prior to the first period that would contribute a residual effect. In periods 2 and 3 (visits 3 and 4), the values of RESIDS and RESIDT are 0 or ±1 depending on the treatment in the preceding visit. This particular coding provides estimates of the residual effects corresponding to those prescribed by Cochran and Cox (1957). For example, patient number 2 is in sequence F (test, placebo, standard). The values of RESIDS and RESIDT are both 0 in the first period (visit 2). Patient 2 received the test drug in period 1, so in period 2 (visit 3), the covariates have values RESIDS=0 and RESIDT=1. This specifies that the residual effect ρT for test is contained in the observation on patient 2 in period 2. In period 3 (visit 4), the covariates both have values of –1. This coding specifies a sum-to-zero constraint on the residual effects. Thus, the residual effect ρP of the placebo satisfies the equation ρP = −ρT − ρS, and hence the residual effect of the placebo can be represented with –1 times the residual effects of test and standard.
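
If the covariates were not already in the data set, they could be derived from the treatment given in the preceding visit. The following DATA step is only a minimal sketch of one way to do this; it assumes the observations are sorted by PATIENT and VISIT and that the DRUG values are stored exactly as printed in Output 11.13.

data hrtrate;
   set hrtrate;
   by patient;
   length prevdrug $ 8;
   prevdrug = lag(drug);                  /* treatment in the preceding visit */
   if first.patient then prevdrug = ' ';  /* no residual effect in period 1   */
   resids = (prevdrug='standard') - (prevdrug='placebo');
   residt = (prevdrug='test')     - (prevdrug='placebo');
run;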

Output 11.13 Data for a Crossover Design with Residual Effects

PATIENT   SEQUENCE VISIT   BASEHR HR     DRUG    RESIDT    RESIDS
 
1 B 2 86 86     placebo     0     0
1 B 3 86 106     test    -1    -1
1 B 4 62 79     standard     1     0
2 F 2 48 66     test     0     0
2 F 3 58 56     placebo     1     0
2 F 4 74 79     standard    -1    -1
3 B 2 78 84     placebo     0     0
3 B 3 78 76     test    -1    -1
3 B 4 82 91     standard     1     0
4 D 2 66 79     standard     0     0
4 D 3 72 100     test     0     1
4 D 4 90 82     placebo     1     0
5 C 2 74 74     test     0     0
5 C 3 90 71     standard     1     0
5 C 4 66 62     placebo     0     1
6 B 2 62 64     placebo     0     0
6 B 3 74 90     test    -1    -1
6 B 4 58 85     standard     1     0
7 A 2 94 75     standard     0     0
7 A 3 72 82     placebo     0     1
7 A 4 100 102     test    -1    -1
8 A 2 54 63     standard     0     0
8 A 3 54 58     placebo     0     1
8 A 4 66 62     test    -1    -1
9 D 2 82 91     standard     0     0
9 D 3 96 86     test     0     1
9 D 4 78 88     placebo     1     0
10 C 2 86 82     test     0     0
10 C 3 70 71     standard     1     0
10 C 4 58 62     placebo     0     1
11 F 2 82 80     test     0     0
11 F 3 80 78     placebo     1     0
11 F 4 72 75     standard    -1    -1
12 E 2 96 90     placebo     0     0
12 E 3 92 93     standard    -1    -1
12 E 4 82 88     test     0     1
13 D 2 78 87     standard     0     0
13 D 3 72 80     test     0     1
13 D 4 76 78     placebo     1     0
14 F 2 98 86     test     0     0
14 F 3 86 86     placebo     1     0
14 F 4 70 79     standard    -1    -1
15 A 2 86 71     standard     0     0
15 A 3 66 70     placebo     0     1
15 A 4 74 90     test    -1    -1
16 E 2 86 86     placebo     0     0
16 E 3 90 103     standard    -1    -1
16 E 4 82 86     test     0     1
17 A 2 66 83     standard     0     0
17 A 3 82 86     placebo     0     1
17 A 4 86 102     test    -1    -1
18 F 2 66 82     test     0     0
18 F 3 78 80     placebo     1     0
18 F 4 74 95     standard    -1    -1
19 E 2 74 80     placebo     0     0
19 E 3 78 79     standard    -1    -1
19 E 4 70 74     test     0     1
20 B 2 66 70     placebo     0     0
20 B 3 74 62     test    -1    -1
20 B 4 62 67     standard     1     0
21 C 2 82 90     test     0     0
21 C 3 90 103     standard     1     0
21 C 4 76 82     placebo     0     1
22 C 2 82 82     test     0     0
22 C 3 66 83     standard     1     0
22 C 4 90 82     placebo     0     1
23 E 2 82 66     placebo     0     0
23 E 3 74 87     standard    -1    -1
23 E 4 82 82     test     0     1
24 D 2 72 75     standard     0     0
24 D 3 82 86     test     0     1
24 D 4 74 82     placebo     1     0

The following SAS statements can be used to construct an analysis of variance and parameter estimates similar to those proposed by Cochran and Cox (1957):

proc glm data=hrtrate;
   class sequence patient visit drug;
   model hr = sequence patient(sequence) visit drug
      resids residt / solution;
   random patient(sequence);
run;

ANOVA results appear in Output 11.14.

Output 11.14 ANOVA for a Crossover Design

Source DF Sum of
Squares
Mean Square F Value Pr > F
 
Model 29 6408.694444 220.989464 3.91 <.0001
Error 42 2372.583333 56.490079    
Corrected Total 71 8781.277778      
 
R-Square Coeff Var Root MSE HR Mean
0.729813 9.301326 7.515988 80.80556
Source DF Type I SS Mean Square F Value Pr > F
 
SEQUENCE 5 508.944444 101.788889 1.80 0.1333
PATIENT(SEQUENCE) 18 4692.333333 260.685185 4.61 <.0001
VISIT 2 146.777778 73.388889 1.30 0.2835
DRUG 2 668.777778 334.388889 5.92 0.0054
resids 1 391.020833 391.020833 6.92 0.0119
residt 1 0.840278 0.840278 0.01 0.9035
 
Source DF Type III SS Mean Square F Value Pr > F
 
SEQUENCE 5 701.183333 140.236667 2.48 0.0466
PATIENT(SEQUENCE) 18 4692.333333 260.685185 4.61 <.0001
VISIT 2 146.777778 73.388889 1.30 0.2835
DRUG 2 343.950000 171.975000 3.04 0.0583
resids 1 309.173611 309.173611 5.47 0.0241
residt 1 0.840278 0.840278 0.01 0.9035

The desired ANOVA table is constructed as follows:

Source of Variation                                        DF        SS
Sequence                                                    5     508.94   (Type I)
Patient(Sequence)                                          18    4692.33   (Type I)
Visits                                                      2     146.78   (Type III)
Direct effect of drugs (adjusted for residual effects)      2     343.95   (Type III)
Residual effects (adjusted)                                 2     391.86   (Type I SS RESIDS + Type I SS RESIDT)

Expected mean squares shown in Output 11.15 show that appropriate tests for VISIT, DRUG, and the carry-over effect covariates use the residual mean square as an error term. A test for SEQUENCE would use PATIENT(SEQUENCE) as the error term.

Output 11.15 Expected Mean Squares for a Crossover Design

Source Type III Expected Mean Square
 
SEQUENCE Var(Error) + 2.76 Var(PATIENT(SEQUENCE)) + Q(SEQUENCE)
 
PATIENT(SEQUENCE) Var(Error) + 3 Var(PATIENT(SEQUENCE))
 
VISIT Var(Error) + Q(VISIT)
 
DRUG Var(Error) + Q(DRUG)
 
resids Var(Error) + Q(resids)
 
residt Var(Error) + Q(residt)
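
To have GLM compute the F-test for SEQUENCE against PATIENT(SEQUENCE) directly, you can add a TEST statement to the PROC GLM step shown earlier (or specify the TEST option in the RANDOM statement). A minimal sketch:

proc glm data=hrtrate;
   class sequence patient visit drug;
   model hr=sequence patient(sequence) visit drug resids residt;
   test h=sequence e=patient(sequence);
run;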

The effect of SEQUENCE is clearly not significant, since the F-ratio would be less than 1 using either a Type I or a Type III mean square in the numerator. The Type III test for DRUG has a significance level of p=0.0583. The Type III mean square for DRUG has been adjusted for the residual effects. The Type I mean square for DRUG is not adjusted for the residual effects, and an F-test based on it has a significance probability of p=0.0054. Thus, results from tests for DRUG depend on whether residual effects have been removed or not. Estimates of the direct and residual effect parameters can be obtained from Output 11.16.

Output 11.16 Parameter Estimates for a Crossover Design

Parameter   Estimate   Standard
Error
t Value Pr > |t|
 
Intercept   82.06250000 B 4.72870558 17.35 <.0001
SEQUENCE A 6.20833333 B 6.23192824 1.00 0.3249
SEQUENCE B -19.33333333 B 6.23192824 -3.15 0.0030
SEQUENCE C -0.47916667 B 6.23192824 -0.08 0.9391
SEQUENCE D -1.81250000 B 6.23192824 -0.29 0.7726
SEQUENCE E -5.79166667 B 6.23192824 -0.93 0.3580
SEQUENCE F 0.00000000 B  .           .    .    
PATIENT(SEQUENCE)   7 A -4.00000000 B 6.13677871 -0.65 0.5181
PATIENT(SEQUENCE) 8 A -29.33333333 B 6.13677871 -4.78 <.0001
PATIENT(SEQUENCE) 15 A -13.33333333 B 6.13677871 -2.17 0.0355
PATIENT(SEQUENCE) 17 A 0.00000000 B  .           .    .    
...
PATIENT(SEQUENCE) 2 F -18.66666667 B 6.13677871 -3.04 0.0040
PATIENT(SEQUENCE) 11 F -8.00000000 B 6.13677871 -1.30 0.1995
PATIENT(SEQUENCE) 14 F -2.00000000 B 6.13677871 -0.33 0.7461
PATIENT(SEQUENCE) 18 F 0.00000000 B  .           .    .    
VISIT 2 -2.58333333 B 2.16967892 -1.19 0.2405
VISIT 3 0.75000000 B 2.16967892 0.35 0.7313
VISIT 4 0.00000000 B  .           .    .    
DRUG standard 2.31250000 B 2.42577478 0.95 0.3459
DRUG test 5.93750000 B 2.42577478 2.45 0.0186
DRUG placebo 0.00000000 B  .           .    .    
resids   -4.39583333   1.87899706 -2.34 0.0241
residt   0.22916667   1.87899706 0.12 0.9035

First of all, the residual effects presented by Cochran and Cox (1957) are obtained from the parameter estimates for RESIDS and RESIDT. The values are

STD: –4.396

TST: 0.229

PCB: – (–4.396 + 0.229) = 4.167

Notice that these estimates come from the sum-to-zero coding for the residual effect dummy variables.
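
If you prefer to have GLM print these values directly, ESTIMATE statements for the covariates can be added to the PROC GLM step; a minimal sketch (the labels are arbitrary) is

estimate 'RESIDUAL EFFECT OF STD' resids  1;
estimate 'RESIDUAL EFFECT OF TST' residt  1;
estimate 'RESIDUAL EFFECT OF PCB' resids -1 residt -1;

The first two simply reproduce the RESIDS and RESIDT parameter estimates, and the third applies the sum-to-zero constraint to give the placebo residual effect, 4.167.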

The direct treatment effects reported by Cochran and Cox (1957) can be obtained from the DRUG parameter estimates according to the following equations:

STD: –0.4375 = 2.3125 – (1/3)(2.3125 + 5.9375 + 0.0000)

TST:  3.1875 = 5.9375 – (1/3)(2.3125 + 5.9375 + 0.0000)

PCB: –2.7500 = 0.0000 – (1/3)(2.3125 + 5.9375 + 0.0000)

Thus, the direct effects can be obtained from the following ESTIMATE statements:

estimate 'DIRECT EFFECT OF STD'
          drug  2 -1 -1 / divisor=3;
estimate 'DIRECT EFFECT OF TST'
          drug -1  2 -1 / divisor=3;
estimate 'DIRECT EFFECT OF PCB'
          drug -1 -1  2 / divisor=3;

Results from these ESTIMATE statements appear in Output 11.17.

Output 11.17 Direct Effect Estimates

The GLM Procedure

 

Parameter Estimate Standard
Error
t Value Pr > |t|
 
DIRECT EFFECT OF STD -0.43750000 1.40052172 -0.31 0.7563
DIRECT EFFECT OF TST 3.18750000 1.40052172 2.28 0.0280
DIRECT EFFECT OF PCB -2.75000000 1.40052172 -1.96 0.0562

The direct effect means reported by Cochran and Cox (1957) are equal to the overall mean 80.8056 (printed as HR mean in Output 11.14) added to the direct effects. They are also equal to the GLM least-squares means, obtained from the following statement:

lsmeans drug / pdiff cl e;

The results appear in Output 11.18. You can see from the estimable functions that the LS means contain the INTERCEPT, and average across the SEQUENCE, PATIENT(SEQUENCE), and VISIT parameters. Thus, the correct standard error of these LS means would contain variance due to PATIENT(SEQUENCE). However, this variance is not contained in the standard error computed by PROC GLM for the LS means. (That is why we specified the STDERR option in the LSMEANS statement.) As a consequence, the confidence intervals for LS means displayed in Output 11.18 are not valid. However, the confidence intervals for the differences between LS means in Output 11.18 are valid because the INTERCEPT, SEQUENCE, PATIENT(SEQUENCE), and VISIT parameters would drop out of the differences.

Output 11.18 Least-Squares Means for a Crossover Design

Least Squares Means

 

Coefficients for DRUG Least Square Means

 

Effect   DRUG Level standard test placebo
 
Intercept   1 1 1
SEQUENCE A 0.16666667 0.16666667 0.16666667
...
SEQUENCE F 0.16666667 0.16666667 0.16666667
PATIENT(SEQUENCE) 7 A 0.04166667 0.04166667 0.04166667
PATIENT(SEQUENCE) 8 A 0.04166667 0.04166667 0.04166667
PATIENT(SEQUENCE) 16 A 0.04166667 0.04166667 0.04166667
PATIENT(SEQUENCE) 18 A 0.04166667 0.04166667 0.04166667
...
PATIENT(SEQUENCE) 12 F 0.04166667 0.04166667 0.04166667
PATIENT(SEQUENCE) 17 F 0.04166667 0.04166667 0.04166667
PATIENT(SEQUENCE) 20 F 0.04166667 0.04166667 0.04166667
PATIENT(SEQUENCE) 24 F 0.04166667 0.04166667 0.04166667
VISIT 2 0.33333333 0.33333333 0.33333333
VISIT 3 0.33333333 0.33333333 0.33333333
VISIT 4 0.33333333 0.33333333 0.33333333
DRUG standard 1 0 0
DRUG test 0 1 0
DRUG placebo 0 0 1
resids   0 0 0
residt   0 0 0

 

Least Squares Means for Effect DRUG

 

DRUG HR LSMEAN 95% Confidence Limits
 
standard 80.368056 77.023853 83.712258
test 83.993056 80.648853 87.337258
placebo 78.055556 74.711353 81.399758
 
i j Difference
Between
Means
95% Confidence Limits for LSMean(i)-LSMean(j)
 
1 2 -3.625000 -8.520412 1.270412
1 3 2.312500 -2.582912 7.207912
2 3 5.937500 1.042088 10.832912

PROC MIXED can be used to analyze the crossover design data. Run the following statements:

proc mixed data=hrtrate order=internal;
   class sequence patient visit drug;
   model hr=sequence visit drug resids residt / solution ddfm=satterth;
   random patient(sequence);
   lsmeans drug / pdiff cl e;
run;

Edited results appear in Output 11.19.

Output 11.19 Partial Mixed-Model Results for a Crossover Design

The Mixed Procedure

 

Covariance Parameter Estimates

 

Cov Parm Estimate
 
PATIENT(SEQUENCE) 68.0650
Residual 56.4901

 

Type 3 Tests of Fixed Effects

Effect Num
DF
Den
DF
F Value Pr > F
 
SEQUENCE 5 18.7 0.58 0.7165
VISIT 2 42 1.30 0.2835
DRUG 2 42 3.04 0.0583
resids 1 42 5.47 0.0241
residt 1 42 0.01 0.9035

 

Least Squares Means

 

Effect DRUG Estimate Standard
Error
DF t Value Pr > |t| Alpha Lower Upper
 
DRUG standard 80.3681 2.3626 38 34.02 <.0001 0.05 75.5852 85.1510
DRUG test 83.9931 2.3626 38 35.55 <.0001 0.05 79.2102 88.7760
DRUG placebo 78.0556 2.3626 38 33.04 <.0001 0.05 73.2727 82.8385

 

Differences of Least Squares Means

 

Effect DRUG _DRUG Estimate Standard
Error
DF t Value Pr > |t| Alpha
DRUG standard test -3.6250 2.4258 42 -1.49 0.1426 0.05
DRUG standard placebo 2.3125 2.4258 42 0.95 0.3459 0.05
DRUG test placebo 5.9375 2.4258 42 2.45 0.0186 0.05

 

Differences of Least Squares Means

 

Effect DRUG _DRUG Lower Upper
 
DRUG standard test -8.5204 1.2704
DRUG standard placebo -2.5829 7.2079
DRUG test placebo 1.0421 10.8329

The test of significance for DRUG in “Type 3 Tests of Fixed Effects” in Output 11.19 is the same as the Type III test from GLM in Output 11.14. Likewise, the least-squares means are equal in the two analyses. This illustrates that ordinary least-squares analyses, as performed by GLM, can be equivalent to generalized least-squares analyses, as performed by MIXED. This phenomenon occurs in this example because the within-patients effects are orthogonal to the between-patients effects. Notice that the confidence intervals for differences between LS means are the same in Outputs 11.18 and 11.19, but the confidence intervals for the LS means themselves are wider in Output 11.19 than in Output 11.18, because PROC MIXED computes standard errors of LS means that incorporate the PATIENT(SEQUENCE) variance.

11.5 Models for Experiments with Qualitative and Quantitative Variables

The material in this section is related to the discussions of regression analysis in Chapter 2 and analysis of covariance in Chapter 7. This section concerns details of certain models that contain dummy variables generated from the CLASS statement along with a continuous variable. These models combine several regression equations in a single model. Of particular interest are cases for which the regressions have a common intercept. Models of this type are frequently used, for example, in relative potency and relative bioavailability studies (Littell et al. 1997).

Many experiments involve both qualitative and quantitative factors. For example, the tensile strength (TS) of a monofilament fiber depends on the amount (AMT) of a chemical used in the manufacturing process. This chemical can be obtained from three different sources (SOURCE), with values A, B, or C. SOURCE is a qualitative variable and AMT is a quantitative variable. Measurements of TS were obtained from samples from different amounts and sources. The SAS data set named MONOFIL appears in Output 11.20.

Output 11.20 Data for an Experiment with Qualitative and Quantitative Variables

Obs SOURCE AMT TS
 
1 A 1 11.5
2 A 2 13.8
3 A 3 14.4
4 A 4 16.8
5 A 5 18.7
6 B 1 10.8
7 B 2 12.3
8 B 3 13.7
9 B 4 14.2
10 B 5 16.6
11 C 1 13.1
12 C 2 16.2
13 C 3 19.0
14 C 4 22.9
15 C 5 26.5

A simple linear regression model relating TS to AMT can be written for each source:

TS = αA + βA AMT + ε  (SOURCE A)

TS = αB + βB AMT + ε  (SOURCE B)

TS = αC + βC AMT + ε  (SOURCE C)

The parameters αA and βA are the intercept and slope, respectively, for SOURCE=A.

The following statements produce the analysis of variance and parameter estimates in Output 11.21.

proc glm data=monofil;
   class source;
   model ts=source amt source*amt / solution;
run;

Output 11.21 A Model with Main Effects and Interactions

The GLM Procedure

 

    Dependent Variable: ts

 
Source DF Sum of
Squares
Mean Square F Value Pr > F
 
Model 5 258.7273333 51.7454667 263.71 <.0001
 
Error 9 1.7660000 0.1962222    
 
Corrected Total 14 260.4933333      
 
R-Square Coeff Var Root MSE ts Mean
 
0.993221 2.762805 0.442970 16.03333
 
Source DF Type I SS Mean Square F Value Pr > F
 
Source 2 98.0013333 49.0006667 249.72 <.0001
amt 1 138.2453333 138.2453333 704.53 <.0001
amt*source 2 22.4806667 11.2403333 57.28 <.0001
 
Source DF Type III SS Mean Square F Value Pr > F
 
Source 2 0.0702424 0.0351212 0.18 0.8390
amt 1 138.2453333 138.2453333 704.53 <.0001
amt*source 2 22.4806667 11.2403333 57.28 <.0001
 
Parameter Estimate Standard
Error
t Value Pr > |t|
 
Intercept 9.490000000 B 0.46459062 20.43 <.0001
source     A 0.330000000 B 0.65703036 0.50 0.6275
source     B -0.020000000 B 0.65703036 -0.03 0.9764
source     C 0.000000000 B  .           .    .    
amt 3.350000000 B 0.14007934 23.92 <.0001
amt*source A -1.610000000 B 0.19810211 -8.13 <.0001
amt*source B -2.000000000 B 0.19810211 -10.10 <.0001
amt*source C 0.000000000 B  .           .    .    

 

NOTE: The X'X matrix has been found to be singular, and a generalized inverse was used to solve the normal equations. Terms whose estimates are followed by the letter 'B' are not uniquely estimable.

These parameter estimates pertain to the integrated model

TS = αC + α′A DA + α′B DB + βC AMT + β′A DA AMT + β′B DB AMT + ε

The parameters α′ and β′ are defined as

α′A = αA − αC    α′B = αB − αC
β′A = βA − βC    β′B = βB − βC

The variable DA is a dummy variable equal to 1 for SOURCE=A and equal to 0 otherwise, and DB has a corresponding definition with respect to SOURCE=B. Thus, the regression models for the three sources are

TS = (αC + α′A) + (βC + β′A) AMT + ε  (SOURCE A)
TS = (αC + α′B) + (βC + β′B) AMT + ε  (SOURCE B)
TS = αC + βC AMT + ε                  (SOURCE C)

Therefore, the fitted equations are

TS = 9.49 + 0.33 + (3.35 - 1.61) AMT (SOURCE A)
      = 9.82 + 1.74 AMT
TS = 9.49 - 0.02 + (3.35 - 2.00) AMT (SOURCE B)
      = 9.47 + 1.35 AMT
TS = 9.49 + 3.35 AMT                         (SOURCE C)

The GLM parameter estimates, in effect, treat the regression line for SOURCE=C as a reference line, and the parameters α′A, α′B, β′A, and β′B are parameters for lines A and B minus the corresponding parameters for line C. The AMT*SOURCE parameters β′A and β′B measure differences between the slopes for regression lines A and B and the slope of line C, respectively. Thus, a test that these parameters are 0 is a test that the lines are parallel, that is, that they have equal slopes. The appropriate statistic is the F=57.28 for the AMT*SOURCE effect, which has a significance probability of p<0.0001.
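
To see the correspondence between the CLASS-statement coding and the dummy-variable formulation explicitly, you could construct the dummy variables yourself and fit the same integrated model with PROC REG. This is only an illustrative sketch; the data set name MONOFIL_D and the constructed variable names are assumptions.

data monofil_d;
   set monofil;
   da = (source='A');    db = (source='B');    /* 0-1 dummy variables      */
   da_amt = da*amt;      db_amt = db*amt;      /* slope-difference columns */
run;
proc reg data=monofil_d;
   model ts = da db amt da_amt db_amt;
run;

Because this parameterization is full rank, the estimates from this fit reproduce the B-flagged estimates in Output 11.21: the intercept corresponds to αC, the DA coefficient to α′A, and so on.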

Caution is advised in using the Type III F-test for SOURCE. It is a test of the equality of the intercepts (H0: αA = αB = αC), which probably has no practical interpretation because the intercepts are simply extrapolations of the lines to AMT=0. The Type I F-test, on the other hand, tests the equality of the midpoints of the regression lines (H0: αA + 3βA = αB + 3βB = αC + 3βC).

You can compare two sources at a given amount with an ESTIMATE statement. Suppose you want to compare SOURCE=A with SOURCE=B at AMT=3.5. This difference is

(αA + βA(3.5)) − (αB + βB(3.5))
  = ((αC + α′A) + (βC + β′A)(3.5)) − ((αC + α′B) + (βC + β′B)(3.5))
  = α′A − α′B + (β′A − β′B)(3.5)

So the appropriate ESTIMATE statement is

estimate 'A vs B at AMT=3.5'
   source 1 -1 0
   source*amt 3.5 -3.5 0;

The results appear in Output 11.22.

Output 11.22 The Difference between SOURCE=A and SOURCE=B at AMT=3.5

The GLM Procedure

 

 Dependent Variable: ts

Parameter Estimate Standard
Error
t Value Pr > |t|
 
A vs B at AMT=3.5 1.71500000 0.29715316 5.77 0.0003

Suppose TS also is measured for AMT=0. This variation of the experiment is commonly mishandled by data analysts. Since AMT=0 means there is no chemical, the intercepts for the models are all equal, αA = αB = αC. Thus, a correct analysis should provide equal estimates of the intercepts. The regressions can be written simultaneously as

TS = α + γA DA AMT + γB DB AMT + γC DC AMT + ε

where DA is a dummy variable equal to 1 for SOURCE=A and equal to 0 otherwise, and DB and DC have corresponding definitions with respect to SOURCE=B and SOURCE=C. Use PROC GLM to create DA, DB, and DC by including the SOURCE variable in a CLASS statement.

Look at the data set MONOFIL2 printed in Output 11.23. The value C is arbitrarily assigned to SOURCE when AMT=0.

Output 11.23 Data with AMT=0

Qual and Quant Variables

 

Obs source      amt ts
 
1 A      1 11.5
2 A      2 13.8
3 A      3 14.4
4 A      4 16.8
5 A      5 18.7
6 B      1 10.8
7 B      2 12.3
8 B      3 13.7
9 B      4 14.2
10 B      5 16.6
11 C      1 13.1
12 C      2 16.2
13 C      3 19.0
14 C      4 22.9
15 C      5 26.5
16 C      0 10.1
17 C      0 10.2
18 C      0 9.8
19 C      0 9.9
20 C      0 10.2

The following statements produce Output 11.24:

proc glm data=monofil2;
   class source;
   model ts=amt*source / solution;
run;

Output 11.24 Parameter Estimates for Data with AMT=0

The GLM Procedure

 

    Dependent Variable: ts

 
Source DF Sum of
Squares
Mean Square F Value Pr > F
 
Model 3 393.0051791 131.0017264 903.34 <.0001
 
Error 16 2.3203209 0.1450201    
 
Corrected Total 19 395.3255000      
 
R-Square Coeff Var Root MSE ts Mean
 
0.994131 2.619986 0.380815 14.53500
 
Source DF Type I SS Mean Square F Value Pr > F
 
amt*source 3 393.0051791 131.0017264 903.34 <.0001
 
 
Source DF Type III SS Mean Square F Value Pr > F
 
amt*source 3 393.0051791 131.0017264 903.34 <.0001
Parameter Estimate Standard
Error
t Value Pr > |t|
 
Intercept 9.882352941 0.13699380 72.14 <.0001
amt*source A 1.722994652 0.06350310 27.13 <.0001
amt*source B 1.237540107 0.06350310 19.49 <.0001
amt*source C 3.242994652 0.06350310 51.07 <.0001

Parameter estimates in Output 11.24 yield the three prediction equations

TS = 9.88 + 1.72 AMT      (SOURCE A)
TS = 9.88 + 1.24 AMT      (SOURCE B)
TS = 9.88 + 3.24 AMT      (SOURCE C)

The relative effect of one source compared to another can be measured by the ratio of the slopes of the regression lines. For example, the strength of SOURCE B relative to SOURCE A is the ratio 1.24/1.72 = 0.72. This means that one unit of the chemical from SOURCE B has the same effect on tensile strength as 0.72 units of the chemical from SOURCE A.

Similar models are used in other types of applications. The potency of one drug relative to another in a drug study, or the bioavailability of one nutrient relative to another in a nutrition study, is measured in the same way.
