Chapter 11 Examples of Special Applications
11.2 Confounding in a Factorial Experiment
11.2.1 Confounding with Blocks
11.2.2 A Fractional Factorial Example
11.3 A Balanced Incomplete-Blocks Design
11.4 A Crossover Design with Residual Effects
11.5 Models for Experiments with Qualitative and Quantitative Variables
11.7 An Unbalanced Nested Structure
11.8 An Analysis of Multi-Location Data
11.8.1 An Analysis Assuming No Location×Treatment Interaction
11.8.2 A Fixed-Location Analysis with an Interaction
11.8.3 A Random-Location Analysis
11.8.4 Further Analysis of a Location×Treatment Interaction Using a Location Index
11.9 Absorbing Nesting Effects
As already noted, the GLM and MIXED procedures can be used to analyze a multitude of data structures. In this chapter several applications are presented that utilize tools discussed in the previous chapters. Some of these applications involve statistical topics that are not discussed in great detail in this book. References are given to provide the necessary background information.
Experiments use confounding in two forms. The first is the factorial treatment design in which all factorial combinations appear in the experiment, but they appear in incomplete blocks containing only a subset of the factor combinations. Thus, within a given block, one or more treatment effects are confounded with block effects. The second is the fractional factorial experiment, in which only a subset of the factor combinations appears in the experiment at all. Thus, some of the factorial effects are not separately estimable but are aliased with other effects, meaning that the same estimable function estimates both effects. Confounding is covered in most textbooks on the design of experiments (for example, Hicks and Turner 2000).
The first example for this topic is a 2³ factorial with factors labeled A, B, and C in blocks of size four. There are three replications, with the interactions ABC, AC, and BC confounded with blocks in replications 1, 2, and 3, respectively. These interactions are thus partially confounded with blocks. The data appear in Output 11.1.
Output 11.1 Data for a Two-Cube Factorial in Blocks of Size Four
Obs | rep | blk | a | b | c | y |
1 | 1 | 1 | 1 | 1 | 1 | 3.99 |
2 | 1 | 1 | 1 | 0 | 0 | 1.14 |
3 | 1 | 1 | 0 | 1 | 0 | 1.52 |
4 | 1 | 1 | 0 | 0 | 1 | 3.33 |
5 | 1 | 2 | 1 | 1 | 0 | 2.06 |
6 | 1 | 2 | 1 | 0 | 1 | 5.58 |
7 | 1 | 2 | 0 | 1 | 1 | 2.06 |
8 | 1 | 2 | 0 | 0 | 0 | -0.17 |
9 | 2 | 1 | 1 | 1 | 1 | 3.77 |
10 | 2 | 1 | 1 | 0 | 1 | 6.69 |
11 | 2 | 1 | 0 | 1 | 0 | 2.17 |
12 | 2 | 1 | 0 | 0 | 0 | -0.01 |
13 | 2 | 2 | 1 | 1 | 0 | 2.43 |
14 | 2 | 2 | 0 | 1 | 1 | 1.22 |
15 | 2 | 2 | 1 | 0 | 0 | 0.37 |
16 | 2 | 2 | 0 | 0 | 1 | 2.06 |
17 | 3 | 1 | 1 | 1 | 1 | 4.53 |
18 | 3 | 1 | 0 | 1 | 1 | 1.90 |
19 | 3 | 1 | 1 | 0 | 0 | 1.62 |
20 | 3 | 1 | 0 | 0 | 0 | -0.70 |
21 | 3 | 2 | 1 | 1 | 0 | 1.56 |
22 | 3 | 2 | 1 | 0 | 1 | 5.99 |
23 | 3 | 2 | 0 | 1 | 0 | 1.44 |
24 | 3 | 2 | 0 | 0 | 1 | 2.42 |
Contrasts corresponding to confounded effects can be estimated only from those replications in which they are not confounded. In this example, they are estimated from only two-thirds of the data; thus their standard errors should be larger by a factor of √(3/2) ≈ 1.22.
The analysis using PROC GLM is straightforward. You can generate contrasts in the DATA step instead of specifying classes for treatments and using CONTRAST statements, as the following code shows:
data confound;
input rep blk a b c y;
ca= -(a=0) + (a=1);
cb= -(b=0) + (b=1);
cc= -(c=0) + (c=1);
datalines;
...data lines as listed in Output 11.1...
;
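Because the recoding logic is easy to get wrong, it can help to check it outside SAS. The following Python sketch (not part of the original example) mirrors the DATA step's (0,1)-to-(-1,1) recoding:

```python
# Recode a 0/1 factor level to a -1/+1 contrast value,
# mirroring the ca, cb, cc assignments in the DATA step above.
def contrast(level):
    """Map a 0/1 factor level to a -1/+1 contrast value."""
    return -1 if level == 0 else 1

# First observation of Output 11.1: a=1, b=1, c=1
print([contrast(v) for v in (1, 1, 1)])   # -> [1, 1, 1]
# Observation 4: a=0, b=0, c=1
print([contrast(v) for v in (0, 0, 1)])   # -> [-1, -1, 1]
```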
By sorting the data and running the analysis by REP, you can use the ALIASING option in PROC GLM to print out the confounding pattern. Use the following statements:
proc sort;
by rep;
proc glm;
by rep;
class blk;
model y=blk ca|cb|cc/solution aliasing;
The results appear in Output 11.2.
Output 11.2 Aliasing Output Showing a Confounding Pattern for a 2³ Factorial in Blocks of Size Four
----------- rep=1 ----------- |
Parameter | Expected Value |
Intercept | Intercept + [blk 2] - ca*cb*cc |
blk 1 | [blk 1] - [blk 2] + 2*ca*cb*cc |
blk 2 | |
ca | ca |
cb | cb |
ca*cb | ca*cb |
cc | cc |
ca*cc | ca*cc |
cb*cc | cb*cc |
ca*cb*cc | |
----------- rep=2 ----------- |
Parameter | Expected Value |
Intercept | Intercept + [blk 2] - ca*cc |
blk 1 | [blk 1] - [blk 2] + 2*ca*cc |
blk 2 | |
ca | ca |
cb | cb |
ca*cb | ca*cb |
cc | cc |
ca*cc | |
cb*cc | cb*cc |
ca*cb*cc | ca*cb*cc |
----------- rep=3 ----------- |
Parameter | Expected Value |
Intercept | Intercept + [blk 2] - cb*cc |
blk 1 | [blk 1] - [blk 2] + 2*cb*cc |
blk 2 | |
ca | ca |
cb | cb |
ca*cb | ca*cb |
cc | cc |
ca*cc | ca*cc |
cb*cc | |
ca*cb*cc | ca*cb*cc |
The contents of Output 11.2 appear immediately after the parameter estimates generated by the SOLUTION option in the MODEL statement. For REP=1, you can see that the three-way interaction CA*CB*CC has a blank under “Expected Value,” but the INTERCEPT and BLK 1 effects estimate their usual estimable functions plus the CA*CB*CC effect. This indicates that the ABC interaction effect is confounded with blocks in REP=1. Similarly, the output indicates that the AC interaction is confounded with blocks in REP=2, and the BC interaction is confounded with blocks in REP=3. Although in this example the ALIASING option merely confirms the confounding pattern stated in the introduction, it can be very useful in data sets where the confounding pattern is not obvious and needs to be investigated.
For a complete analysis of the data, combined over all replications, use the following SAS statements:
proc glm;
class rep blk;
model y=rep blk(rep) ca|cb|cc/ solution;
The results appear in Output 11.3.
Output 11.3 ANOVA for a Two-Cube Factorial in Blocks of Size Four
The GLM Procedure | ||
Class Level Information | ||
Class | Levels | Values |
rep | 3 | 1 2 3 |
blk | 2 | 1 2 |
Number of observations 24
The GLM Procedure
Dependent Variable: y
Sum of | |||||
Source | DF | Squares | Mean Square | F Value | Pr > F |
Model | 12 | 81.74957500 | 6.81246458 | 33.60 | <.0001 |
Error | 11 | 2.23018750 | 0.20274432 | ||
Corrected Total | 23 | 83.97976250 |
R-Square | Coeff Var | Root MSE | y Mean | ||
0.973444 | 18.96878 | 0.450271 | 2.373750 | ||
Source | DF | Type I SS | Mean Square | F Value | Pr > F |
rep | 2 | 0.05092500 | 0.02546250 | 0.13 | 0.8832 |
blk(rep) | 3 | 7.43221250 | 2.47740417 | 12.22 | 0.0008 |
ca | 1 | 21.07500417 | 21.07500417 | 103.95 | <.0001 |
cb | 1 | 0.00453750 | 0.00453750 | 0.02 | 0.8838 |
ca*cb | 1 | 1.72270417 | 1.72270417 | 8.50 | 0.0141 |
cc | 1 | 37.77550417 | 37.77550417 | 186.32 | <.0001 |
ca*cc | 1 | 2.31800625 | 2.31800625 | 11.43 | 0.0061 |
cb*cc | 1 | 11.34005625 | 11.34005625 | 55.93 | <.0001 |
ca*cb*cc | 1 | 0.03062500 | 0.03062500 | 0.15 | 0.7049 |
Source | DF | Type III SS | Mean Square | F Value | Pr > F |
rep | 2 | 0.05092500 | 0.02546250 | 0.13 | 0.8832 |
blk(rep) | 3 | 1.66755417 | 0.55585139 | 2.74 | 0.0938 |
ca | 1 | 21.07500417 | 21.07500417 | 103.95 | <.0001 |
cb | 1 | 0.00453750 | 0.00453750 | 0.02 | 0.8838 |
ca*cb | 1 | 1.72270417 | 1.72270417 | 8.50 | 0.0141 |
cc | 1 | 37.77550417 | 37.77550417 | 186.32 | <.0001 |
ca*cc | 1 | 2.31800625 | 2.31800625 | 11.43 | 0.0061 |
cb*cc | 1 | 11.34005625 | 11.34005625 | 55.93 | <.0001 |
ca*cb*cc | 1 | 0.03062500 | 0.03062500 | 0.15 | 0.7049 |
Standard | |||||
Parameter | Estimate | Error | t Value | Pr > |t| | |
Intercept | 2.010625000 B | 0.25170936 | 7.99 | <.0001 | |
rep 1 | 0.328125000 B | 0.35597078 | 0.92 | 0.3764 | |
rep 2 | -0.110000000 B | 0.35597078 | -0.31 | 0.7631 | |
rep 3 | 0.000000000 B | ⋅ | ⋅ | ⋅ | |
blk(rep) 1 1 | 0.200000000 B | 0.38994646 | 0.51 | 0.6182 | |
blk(rep) 2 1 | 0.000000000 B | ⋅ | ⋅ | ⋅ | |
blk(rep) 1 2 | 0.873750000 B | 0.38994646 | 2.24 | 0.0466 | |
blk(rep) 2 2 | 0.000000000 B | ⋅ | ⋅ | ⋅ | |
blk(rep) 1 3 | 0.668750000 B | 0.38994646 | 1.71 | 0.1143 | |
blk(rep) 2 3 | 0.000000000 B | ⋅ | ⋅ | ⋅ | |
ca | 0.937083333 | 0.09191126 | 10.20 | <.0001 | |
cb | 0.013750000 | 0.09191126 | 0.15 | 0.8838 | |
ca*cb | -0.267916667 | 0.09191126 | -2.91 | 0.0141 | |
cc | 1.254583333 | 0.09191126 | 13.65 | <.0001 | |
ca*cc | 0.380625000 | 0.11256785 | 3.38 | 0.0061 | |
cb*cc | -0.841875000 | 0.11256785 | -7.48 | <.0001 | |
ca*cb*cc | -0.043750000 | 0.11256785 | -0.39 | 0.7049 | |
NOTE: The X'X matrix has been found to be singular, and a generalized inverse was used to solve the normal equations. Terms whose estimates are followed by the letter ‘B’ are not uniquely estimable.
The standard errors of the coefficients of the confounded effects (ABC, AC, and BC) are indeed larger by a factor of √(3/2) than those of the effects not confounded. You can verify that the sums of squares of the confounded effects, based on data from the replications in which they are not confounded, are identical to the sums of squares in Output 11.3.
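As a quick arithmetic check of this factor, the following Python sketch compares the two standard errors printed in Output 11.3:

```python
import math

# Standard errors from Output 11.3
se_unconfounded = 0.09191126   # ca, cb, ca*cb, cc (never confounded)
se_confounded   = 0.11256785   # ca*cc, cb*cc, ca*cb*cc (confounded in one rep)

ratio = se_confounded / se_unconfounded
print(round(ratio, 4))               # approximately 1.2247
print(round(math.sqrt(3 / 2), 4))    # sqrt(3/2) is also 1.2247
```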
The estimable functions option can be used to indicate the nature of the confounding. Requesting the Type I functions for effects in the same order as in the MODEL statement above gives the effects for BLK(REP) unadjusted for the factorial effects and reveals how the blocks are related to the factorial effects.
Output 11.4 Estimable Functions for a Two-Cube Factorial in Blocks of Size Four
Type I Estimable Functions
Effect | Coefficients blk(rep) |
Intercept | 0 |
rep 1 | 0 |
rep 2 | 0 |
rep 3 | 0 |
blk(rep) 1 1 | L5 |
blk(rep) 2 1 | -L5 |
blk(rep) 1 2 | L7 |
blk(rep) 2 2 | -L7 |
blk(rep) 1 3 | L9 |
blk(rep) 2 3 | -L9 |
ca | 0 |
cb | 0 |
ca*cb | 0 |
cc | 0 |
ca*cc | 2L7 |
cb*cc | 2L9 |
ca*cb*cc | 2L5 |
Output 11.4 gives the nonzero coefficients of BLK(REP). The coefficient L5 appears on the terms for BLK in REP 1 and also on the CA*CB*CC interaction term. This happens because CA*CB*CC is confounded with BLK in REP 1, which is apparent from the data set shown in Output 11.1: the product CA*CB*CC equals 1 for all observations in BLK 1 of REP 1, and CA*CB*CC = –1 for all observations in BLK 2 of REP 1. In some data sets, the confounding pattern is not so obvious. Using the coefficients for estimable functions in conjunction with the output from the ALIASING option shown above, you can discover the confounding pattern.
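You can confirm this block separation numerically. The following Python sketch (factor levels taken from REP 1 of Output 11.1) computes the three-way contrast product within each block:

```python
# Verify that ca*cb*cc separates the two blocks of rep 1.
# Factor levels (a, b, c) are taken from REP 1 of Output 11.1.
def c(x):
    """0/1 factor level -> -1/+1 contrast value."""
    return 2 * x - 1

rep1_blk1 = [(1, 1, 1), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
rep1_blk2 = [(1, 1, 0), (1, 0, 1), (0, 1, 1), (0, 0, 0)]

prods_blk1 = [c(a) * c(b) * c(cc) for a, b, cc in rep1_blk1]
prods_blk2 = [c(a) * c(b) * c(cc) for a, b, cc in rep1_blk2]
print(prods_blk1)   # [1, 1, 1, 1]
print(prods_blk2)   # [-1, -1, -1, -1]
```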
The second example is a ½ fraction of a 2⁴ factorial experiment. The defining contrast is ABCD. The data appear in Output 11.5.
Output 11.5 Data for a ½ Fraction of a 2⁴ Factorial Experiment
Obs | a | b | c | d | y | ca | cb | cc | cd |
1 | 0 | 0 | 0 | 0 | 2.29 | -1 | -1 | -1 | -1 |
2 | 0 | 0 | 1 | 1 | 1.51 | -1 | -1 | 1 | 1 |
3 | 0 | 1 | 0 | 1 | 1.49 | -1 | 1 | -1 | 1 |
4 | 0 | 1 | 1 | 0 | 3.43 | -1 | 1 | 1 | -1 |
5 | 1 | 0 | 0 | 1 | 3.78 | 1 | -1 | -1 | 1 |
6 | 1 | 0 | 1 | 0 | 2.08 | 1 | -1 | 1 | -1 |
7 | 1 | 1 | 0 | 0 | 3.30 | 1 | 1 | -1 | -1 |
8 | 1 | 1 | 1 | 1 | 3.63 | 1 | 1 | 1 | 1 |
The data in Output 11.5 include the factor levels in their original form (A, B, C, and D) and in contrast (–1,1) form (CA, CB, CC, and CD).
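The half fraction itself is easy to reconstruct. This Python sketch (an illustration, not SAS code) generates the eight runs whose contrast product ca*cb*cc*cd equals +1, which is exactly the set of runs in Output 11.5:

```python
from itertools import product

# In (0,1) coding, the contrast product ca*cb*cc*cd = +1 exactly when
# the number of factors at level 1 is even (defining contrast ABCD).
runs = [r for r in product((0, 1), repeat=4)
        if sum(r) % 2 == 0]           # even parity <=> contrast product +1

print(len(runs))                       # 8 runs in the half fraction
print(runs[0], runs[-1])               # (0, 0, 0, 0) and (1, 1, 1, 1)
```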
You can compute the analysis with the aliasing pattern by using PROC GLM statements similar to those used in the previous example:
proc glm;
model y=ca|cb|cc|cd/solution aliasing;
Output 11.6 shows the results.
Output 11.6 PROC GLM Analysis of Data from a ½ Fraction of a 2⁴ Factorial Experiment
Source | DF | Squares | Mean Square | F Value | Pr > F |
Model | 7 | 6.35588750 | 0.90798393 | . | . |
Error | 0 | 0.00000000 | . | ||
Corrected Total | 7 | 6.35588750 | |||
R-Square | Coeff Var | Root MSE | y Mean |
1.000000 | . | . | 2.688750 |
Source | DF | Type I SS | Mean Square | F Value | Pr > F |
ca | 1 | 2.07061250 | 2.07061250 | . | . |
cb | 1 | 0.59951250 | 0.59951250 | . | . |
ca*cb | 1 | 0.00031250 | 0.00031250 | . | . |
cc | 1 | 0.00551250 | 0.00551250 | . | . |
ca*cc | 1 | 0.80011250 | 0.80011250 | . | . |
cb*cc | 1 | 2.82031250 | 2.82031250 | . | . |
ca*cb*cc | 1 | 0.05951250 | 0.05951250 | . | . |
cd | 0 | 0.00000000 | . | . | . |
ca*cd | 0 | 0.00000000 | . | . | . |
cb*cd | 0 | 0.00000000 | . | . | . |
ca*cb*cd | 0 | 0.00000000 | . | . | . |
cc*cd | 0 | 0.00000000 | . | . | . |
ca*cc*cd | 0 | 0.00000000 | . | . | . |
cb*cc*cd | 0 | 0.00000000 | . | . | . |
ca*cb*cc*cd | 0 | 0.00000000 | . | . | . |
Source | DF | Type III SS | Mean Square | F Value | Pr > F |
ca | 0 | 0 | . | . | . |
cb | 0 | 0 | . | . | . |
ca*cb | 0 | 0 | . | . | . |
cc | 0 | 0 | . | . | . |
ca*cc | 0 | 0 | . | . | . |
cb*cc | 0 | 0 | . | . | . |
ca*cb*cc | 0 | 0 | . | . | . |
cd | 0 | 0 | . | . | . |
ca*cd | 0 | 0 | . | . | . |
cb*cd | 0 | 0 | . | . | . |
ca*cb*cd | 0 | 0 | . | . | . |
cc*cd | 0 | 0 | . | . | . |
ca*cc*cd | 0 | 0 | . | . | . |
cb*cc*cd | 0 | 0 | . | . | . |
ca*cb*cc*cd | 0 | 0 | . | . | . |
Parameter | Estimate | Standard Error | t Value | Pr > |t| | |
Intercept | 2.688750000 B | . | . | . | |
ca | 0.508750000 B | . | . | . | |
cb | 0.273750000 B | . | . | . | |
ca*cb | -0.006250000 B | . | . | . | |
cc | -0.026250000 B | . | . | . | |
ca*cc | -0.316250000 B | . | . | . | |
cb*cc | 0.593750000 B | . | . | . | |
ca*cb*cc | -0.086250000 B | . | . | . | |
cd | 0.000000000 B | . | . | . | |
ca*cd | 0.000000000 B | . | . | . | |
cb*cd | 0.000000000 B | . | . | . | |
ca*cb*cd | 0.000000000 B | . | . | . | |
cc*cd | 0.000000000 B | . | . | . | |
ca*cc*cd | 0.000000000 B | . | . | . | |
cb*cc*cd | 0.000000000 B | . | . | . | |
ca*cb*cc*cd | 0.000000000 B | . | . | . |
Parameter | Expected Value |
Intercept | Intercept + ca*cb*cc*cd |
ca | ca + cb*cc*cd |
cb | cb + ca*cc*cd |
ca*cb | ca*cb + cc*cd |
cc | cc + ca*cb*cd |
ca*cc | ca*cc + cb*cd |
cb*cc | cb*cc + ca*cd |
ca*cb*cc | ca*cb*cc + cd |
cd | |
ca*cd | |
cb*cd | |
ca*cb*cd | |
cc*cd | |
ca*cc*cd | |
cb*cc*cd | |
ca*cb*cc*cd |
From Output 11.6, you can see that because there are only eight observations, only the first seven parameters in the model plus the intercept can be estimated. Also, each estimate is confounded—aliased—with one other factorial effect. The tables of “Parameter” and “Expected Value” at the end of the printout give the aliases. For example, the estimate of the intercept is aliased with the ABCD interaction, indicated on the printout by the fact that the expected value of the intercept is INTERCEPT + CA*CB*CC*CD. Similarly, the output indicates that the expected value of the parameter CA is CA+CB*CC*CD, that is, the main effect of A is aliased with the BCD interaction. You can apply analogous interpretations to the remaining parameters. You can see that this aliasing pattern agrees with the pattern you would derive from standard fractional factorial methods. In this case, which uses a very basic design, the ALIASING option merely restates information someone familiar with fractional factorial design would already know. However, for nonstandard incomplete factorial designs, for instance those you could generate with PROC OPTEX, the ALIASING option can provide useful information that usually is not obvious.
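The alias of any effect follows from multiplying its letters by the defining word ABCD and cancelling repeated letters mod 2. A short Python sketch of this rule (illustrative only):

```python
# Alias of an effect in a half fraction with defining word ABCD:
# symmetric difference of the letter sets, i.e., cancel repeated letters mod 2.
def alias(effect, defining="ABCD"):
    """Return the alias of `effect`, in alphabetical order."""
    s = set(effect) ^ set(defining)
    return "".join(sorted(s)) or "I"   # empty word = identity (intercept)

print(alias("A"))      # BCD: main effect A aliased with the BCD interaction
print(alias("AB"))     # CD:  two-factor interactions aliased in pairs
print(alias("ABC"))    # D:   the ABC estimate carries the main effect of D
```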
There are three important additional points about the analysis in Output 11.6. First, the default order of effects from the CA|CB|CC|CD syntax used in the MODEL statement causes all estimates involving factors A, B, and C to be estimated first, before any effects involving D appear in the model. This is not very realistic. Normally, you would not use a fractional factorial design unless you expect higher-order interaction effects to be negligible. For example, the output gives an estimate of the ABC interaction, which is aliased with the main effect of D. In practice, you would assume this to be an estimate of the main effect of D. That is, you would use this design only if you could assume that the ABC interaction is essentially zero. Also, because all the two-factor interactions are aliased with other two-factor interactions, you must be sure which you can assume to be negligible, and not alias two potentially important effects.
The second point is that there are no degrees of freedom and hence no F-values or p-values given in Output 11.6. The model used to compute the analysis is saturated. There are various strategies to get around this. A common approach is to assume that all interactions are zero and compute a main-effects-only model using the three degrees of freedom for the two-way interactions to estimate experimental error. You can do this by using the following statements:
proc glm;
model y=ca cb cc cd/solution aliasing;
The results appear in Output 11.7. However, you can easily question whether the results in Output 11.7 are valid, because in Output 11.6, the largest single source of variation was the BC (aliased with AD) interaction. For these data, at least, the assumption that all interaction effects are zero is questionable. If there is a non-negligible BC (or AD) interaction, then the MS(ERROR) in Output 11.7 overestimates σ2 and hence the F-values are too low. An alternative strategy, not shown here, uses half-normal plots to estimate σ2 and construct approximate tests for the model effects. See Milliken and Johnson (1989, Chapter 4) for an explanation of how to implement half-normal plot analysis using SAS. Under the half-normal plot method, the main effects of A and the BC (or AD) interaction are statistically significant. You would need sufficient understanding of the data to decide whether the interaction is a BC or an AD interaction.
Output 11.7 Main-Effects-Only Analysis of Fractional Factorial Data
Sum of | |||||
Source | DF | Squares | Mean Square | F Value | Pr > F |
Model | 4 | 2.73515000 | 0.68378750 | 0.57 | 0.7075 |
Error | 3 | 3.62073750 | 1.20691250 | ||
Corrected Total | 7 | 6.35588750 | |||
R-Square | Coeff Var | Root MSE | y Mean |
0.430333 | 40.85898 | 1.098596 | 2.688750 |
Source | DF | Type I SS | Mean Square | F Value | Pr > F |
ca | 1 | 2.07061250 | 2.07061250 | 1.72 | 0.2815 |
cb | 1 | 0.59951250 | 0.59951250 | 0.50 | 0.5317 |
cc | 1 | 0.00551250 | 0.00551250 | 0.00 | 0.9504 |
cd | 1 | 0.05951250 | 0.05951250 | 0.05 | 0.8385 |
Source | DF | Type III SS | Mean Square | F Value | Pr > F |
ca | 1 | 2.07061250 | 2.07061250 | 1.72 | 0.2815 |
cb | 1 | 0.59951250 | 0.59951250 | 0.50 | 0.5317 |
cc | 1 | 0.00551250 | 0.00551250 | 0.00 | 0.9504 |
cd | 1 | 0.05951250 | 0.05951250 | 0.05 | 0.8385 |
Parameter | Estimate | Standard Error |
t Value | Pr > |t| | Expected Value |
Intercept | 2.688750000 | 0.38841223 | 6.92 | 0.0062 | Intercept |
ca | 0.508750000 | 0.38841223 | 1.31 | 0.2815 | ca |
cb | 0.273750000 | 0.38841223 | 0.70 | 0.5317 | cb |
cc | -0.026250000 | 0.38841223 | -0.07 | 0.9504 | cc |
cd | -0.086250000 | 0.38841223 | -0.22 | 0.8385 | cd |
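The effect estimates that a half-normal plot analysis would rank can be computed directly from the data in Output 11.5. The following Python sketch computes the seven contrast estimates and ranks them by magnitude; it is only the first step of the half-normal approach, not the full Milliken and Johnson procedure:

```python
import math

# Levels of A, B, C and responses y, in the row order of Output 11.5
# (D is determined by the defining contrast, so it is omitted here).
runs = [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1),
        (1, 0, 0), (1, 0, 1), (1, 1, 0), (1, 1, 1)]
y = [2.29, 1.51, 1.49, 3.43, 3.78, 2.08, 3.30, 3.63]

def estimate(cols):
    """Regression-scale effect estimate: mean of y times the contrast product."""
    return sum(yi * math.prod(2 * r[j] - 1 for j in cols)
               for r, yi in zip(runs, y)) / len(y)

effects = {"A": (0,), "B": (1,), "AB": (0, 1), "C": (2,),
           "AC": (0, 2), "BC": (1, 2), "ABC": (0, 1, 2)}
est = {name: estimate(cols) for name, cols in effects.items()}
ranked = sorted(est, key=lambda k: -abs(est[k]))
print(ranked)   # BC and A dominate, in line with the half-normal conclusion
```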
The final point concerns the use of the (–1,1) contrasts CA through CD instead of the original (0,1) coding of A through D. If you use the variables A through D in the model, the ALIASING option assesses the aliasing pattern based on the estimable functions that follow from the (0,1) coding. These do not correspond to the standard aliasing pattern for fractional factorial experiments, and can be difficult to interpret. For example, these SAS statements yield the results shown in Output 11.8:
proc glm;
model y=a|b|c|d/ aliasing;
Output 11.8 An Analysis of Fractional Factorial Data Using 0-1 Coding
Source | DF | Type I SS | Mean Square | F Value | Pr > F |
a | 1 | 2.07061250 | 2.07061250 | . | . |
b | 1 | 0.59951250 | 0.59951250 | . | . |
a*b | 1 | 0.00031250 | 0.00031250 | . | . |
c | 1 | 0.00551250 | 0.00551250 | . | . |
a*c | 1 | 0.80011250 | 0.80011250 | . | . |
b*c | 1 | 2.82031250 | 2.82031250 | . | . |
a*b*c | 1 | 0.05951250 | 0.05951250 | . | . |
d | 0 | 0.00000000 | . | . | . |
a*d | 0 | 0.00000000 | . | . | . |
b*d | 0 | 0.00000000 | . | . | . |
a*b*d | 0 | 0.00000000 | . | . | . |
c*d | 0 | 0.00000000 | . | . | . |
a*c*d | 0 | 0.00000000 | . | . | . |
b*c*d | 0 | 0.00000000 | . | . | . |
a*b*c*d | 0 | 0.00000000 | . | . | . |
Parameter | Estimate | Standard Error |
t Value | Pr > |t| |
Intercept | 2.290000000 | . | . | . |
a | 1.490000000 B | . | . | . |
b | -0.800000000 B | . | . | . |
a*b | 0.320000000 B | . | . | . |
c | -0.780000000 B | . | . | . |
a*c | -0.920000000 B | . | . | . |
b*c | 2.720000000 B | . | . | . |
a*b*c | -0.690000000 B | . | . | . |
d | 0.000000000 B | . | . | . |
a*d | 0.000000000 B | . | . | . |
b*d | 0.000000000 B | . | . | . |
a*b*d | 0.000000000 B | . | . | . |
c*d | 0.000000000 B | . | . | . |
a*c*d | 0.000000000 B | . | . | . |
b*c*d | 0.000000000 B | . | . | . |
a*b*c*d | 0.000000000 B | . | . | . |
Expected Value |
Intercept |
a + d + a*d |
b + d + b*d |
a*b - 2*d - a*d - b*d |
c + d + c*d |
a*c - 2*d - a*d - c*d |
b*c - 2*d - b*d - c*d |
a*b*c + 4*d + 2*a*d + 2*b*d + a*b*d + 2*c*d + a*c*d + b*c*d + a*b*c*d |
NOTE: The X'X matrix has been found to be singular, and a generalized inverse was used to solve the normal equations. Terms whose estimates are followed by the letter 'B' are not uniquely estimable.
You can see that the sums of squares are the same as those computed from the (–1,1) contrast coding. The parameter estimates are different, as you would expect, because the different coding changes the intercept and hence the other coefficients. The aliasing pattern shown in the “Expected Value” column of the parameter estimates is also quite different. This reflects the fact that the (0,1) coding results in a different set of estimable functions. As shown in the theory section of Chapter 6, GLM determines estimable functions from the nonzero rows of the (X′X)⁻(X′X) matrix. The contrast coding results in estimable functions in the standard form for assessing aliasing patterns in incomplete factorials. The (0,1) coding, on the other hand, results in a different, and unfamiliar, form.
Incomplete-blocks designs are used whenever there are not enough experimental units per block to accommodate all treatments. Perhaps the best-known incomplete-blocks design is the so-called balanced incomplete-blocks (BIB) design. This design is not balanced in the sense that we have used the word in previous chapters because, in fact, not all treatments appear in all blocks. Instead, balance in the context of incomplete-blocks designs has the specific definition that all treatments appear in the same number of blocks, and all pairs of treatments appear together in the same number of blocks. These requirements impose certain conditions on the numbers of blocks, treatments, and treatments per block. For the BIB design with four treatments in blocks of size two, six blocks (three replications) are required (Cochran and Cox 1957). The data appear in Output 11.9. The design is shown below:
BIB Design Example Data
Block | 1 | 2 | 3 | 4 | 5 | 6 |
 | 1.2(1) | 7.1(3) | 7.1(1) | 8.8(2) | 9.7(1) | 13.0(2) |
 | 2.7(2) | 8.6(4) | 9.7(3) | 15.1(4) | 17.4(4) | 16.6(3) |
Cell entries are y values, with treatment numbers in parentheses.
Output 11.9 Data for a Balanced Incomplete-Blocks Design
Obs | blk | trt | y |
1 | 1 | 1 | 1.2 |
2 | 1 | 2 | 2.7 |
3 | 2 | 3 | 7.1 |
4 | 2 | 4 | 8.6 |
5 | 3 | 1 | 7.1 |
6 | 3 | 3 | 9.7 |
7 | 4 | 2 | 8.8 |
8 | 4 | 4 | 15.1 |
9 | 5 | 1 | 9.7 |
10 | 5 | 4 | 17.4 |
11 | 6 | 2 | 13.0 |
12 | 6 | 3 | 16.6 |
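The BIB requirements can be checked with the standard counting identities bk = rt and λ(t − 1) = r(k − 1), where t is the number of treatments, k the block size, b the number of blocks, r the number of replications of each treatment, and λ the number of blocks in which each pair of treatments appears together. A quick check for this design (Python sketch):

```python
# Necessary conditions for a balanced incomplete-blocks design:
#   b*k = r*t   and   lambda*(t - 1) = r*(k - 1)
t, k = 4, 2           # treatments, block size
b, r, lam = 6, 3, 1   # blocks, replications, pairwise concurrences

print(b * k == r * t)                  # True: 12 experimental units both ways
print(lam * (t - 1) == r * (k - 1))    # True: each pair appears once
```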
Consider the following statements:
proc glm;
class blk trt;
model y=trt blk / e1 ss3;
means trt blk;
lsmeans trt / stderr pdiff cl;
run;
The analysis-of-variance portion appears in Output 11.10.
Output 11.10 ANOVA for a Balanced Incomplete-Blocks Design
The GLM Procedure | |||||
Source | DF | Sum of Squares |
Mean Square | F Value | Pr > F |
Model | 8 | 281.1275000 | 35.1409375 | 40.82 | 0.0056 |
Error | 3 | 2.5825000 | 0.8608333 | ||
Corrected Total | 11 | 283.7100000 | |||
Source | DF | Type I SS | Mean Square | F Value | Pr > F |
trt | 3 | 102.2566667 | 34.0855556 | 39.60 | 0.0065 |
blk | 5 | 178.8708333 | 35.7741667 | 41.56 | 0.0057 |
Source | DF | Type III SS | Mean Square | F Value | Pr > F |
trt | 3 | 59.0175000 | 19.6725000 | 22.85 | 0.0144 |
blk | 5 | 178.8708333 | 35.7741667 | 41.56 | 0.0057 |
Least Squares Means | ||||
trt | y LSMEAN | Standard Error |
95% Confidence Limits | |
1 | 6.8000000 | 0.6281310 | 4.801007 | 8.798993 |
2 | 7.6500000 | 0.6281310 | 5.651007 | 9.648993 |
3 | 10.9250000 | 0.6281310 | 8.926007 | 12.923993 |
4 | 13.6250000 | 0.6281310 | 11.626007 | 15.623993 |
i | j | Difference Between Means |
95% Confidence Limits for LSMean(i)-LSMean(j) | |
1 | 2 | -0.850000 | -3.802709 | 2.102709 |
1 | 3 | -4.125000 | -7.077709 | -1.172291 |
1 | 4 | -6.825000 | -9.777709 | -3.872291 |
2 | 3 | -3.275000 | -6.227709 | -0.322291 |
2 | 4 | -5.975000 | -8.927709 | -3.022291 |
3 | 4 | -2.700000 | -5.652709 | 0.252709 |
The Type I sum of squares is the unadjusted treatment sum of squares, based on the ordinary treatment means. Therefore, the unadjusted treatment sum of squares contains both treatment differences and block differences. The Type III treatment sum of squares is adjusted for blocks. This means that block effects have been removed from the sum of squares. Thus, the adjusted treatment mean square measures only differences between treatment means and random error. These concepts are revealed in the estimable functions. Table 11.1 shows the Type I estimable functions.
Table 11.1 Type I Estimable Functions for Treatments
Effect | Level | Symbolic | Coefficients |
TRT | 1 | L2 | +.75 |
TRT | 2 | L3 | –.25 |
TRT | 3 | L4 | –.25 |
TRT | 4 | –L2–L3–L4 | –.25 |
BLK | 1 | .333L2 + .333L3 | .167 |
BLK | 2 | –.333L2 – .333L3 | –.167 |
BLK | 3 | .333L2 + .333L4 | .167 |
BLK | 4 | –.333L2 – .333L4 | –.167 |
BLK | 5 | –.333L3 – .333L4 | .167 |
BLK | 6 | .333L3 + .333L4 | –.167 |
The Type I estimable function for treatments (TRT) is of some interest. Consider the contrast
TRT1 − 1/4(TRT1 + TRT2 + TRT3 + TRT4)
This is often called the effect of treatment 1, that is, the difference between the treatment 1 mean and the mean of all treatments. Simplification gives
3/4(TRT1) − 1/4(TRT2) − 1/4 (TRT3) − 1/4(TRT4)
This expression is obtained by defining
L2 = 3/4
L3 = –1/4
L4 = –1/4
and results in the coefficients that appear in the right-hand column of Table 11.1. You can see that the Type I (unadjusted) estimate of the TRT 1 effect is also a contrast between blocks 1, 3, and 5, which contain treatment 1, and blocks 2, 4, and 6, which do not.
The least-squares means (see Output 11.10) have been “adjusted” for block effects. The corresponding estimable functions (not reproduced here) show that the LS means contain equal representation of block parameters even though individual treatments do not appear in all the blocks. Differences between LS means provide the so-called intra-block comparisons of treatments. There is information about differences between the treatment means contained in the block means that is not used in the intra-block comparisons. This is called the inter-block information.
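The intra-block analysis can be reproduced by hand with the classical BIB estimator τ̂_t = kQ_t/(λt), where Q_t is the treatment total minus 1/k times the total of the blocks containing treatment t. The following Python sketch (using the data in Output 11.9) recovers the LS means shown in Output 11.10:

```python
# Intra-block adjusted (LS) means for the BIB data in Output 11.9,
# via the textbook estimator tau_t = k*Q_t/(lambda*t), added to the grand mean.
data = {1: [(1, 1.2), (2, 2.7)], 2: [(3, 7.1), (4, 8.6)],
        3: [(1, 7.1), (3, 9.7)], 4: [(2, 8.8), (4, 15.1)],
        5: [(1, 9.7), (4, 17.4)], 6: [(2, 13.0), (3, 16.6)]}  # block: [(trt, y)]

t, k, lam = 4, 2, 1
n = sum(len(obs) for obs in data.values())
grand_mean = sum(y for obs in data.values() for _, y in obs) / n
block_total = {blk: sum(y for _, y in obs) for blk, obs in data.items()}

lsmeans = {}
for trt in range(1, t + 1):
    T = sum(y for obs in data.values() for tr, y in obs if tr == trt)
    B = sum(tot for blk, tot in block_total.items()
            if any(tr == trt for tr, _ in data[blk]))
    Q = T - B / k                       # adjusted treatment total
    lsmeans[trt] = grand_mean + k * Q / (lam * t)

print({tr: round(m, 4) for tr, m in lsmeans.items()})
# matches the PROC GLM LS means: 6.8, 7.65, 10.925, 13.625
```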
Expected mean squares from the RANDOM statement reveal the presence of block effects in the Type I mean squares, but not in the Type III mean squares, as shown in Output 11.11. The Type I EMS for TRT contains VAR(BLK), but the Type III EMS does not.
Output 11.11 Expected Mean Squares for a Balanced Incomplete-Blocks Design
The GLM Procedure
Source | Type I Expected Mean Square |
trt | Var(Error) + 0.6667 Var(blk) + Q(trt) |
blk | Var(Error) + 1.6 Var(blk) |
Source | Type III Expected Mean Square |
trt | Var(Error) + Q(trt) |
blk | Var(Error) + 1.6 Var(blk) |
The MIXED procedure can be used to obtain the combined inter- and intra-block information about differences between treatment means. Run the following statements:
proc mixed data=bibd;
class blk trt;
model y=trt / ddfm=satterth;
random blk;
lsmeans trt / pdiff cl;
run;
The results appear in Output 11.12.
Output 11.12 A Mixed-Model Analysis of a Balanced Incomplete-Blocks Design
The Mixed Procedure
Covariance Parameter
Estimates
Cov Parm | Estimate |
blk | 17.8543 |
Residual | 0.8518 |
Type 3 Tests of Fixed Effects
Effect | Num DF |
Den DF |
F Value | Pr > F |
trt | 3 | 3.13 | 23.46 | 0.0121 |
Least Squares Means
Effect | trt | Estimate | Standard Error |
DF | t Value | Pr > |t| | Alpha | Lower | Upper |
trt | 1 | 6.7724 | 1.8337 | 5.96 | 3.69 | 0.0103 | 0.05 | 2.2773 | 11.2674 |
trt | 2 | 7.6678 | 1.8337 | 5.96 | 4.18 | 0.0059 | 0.05 | 3.1728 | 12.1629 |
trt | 3 | 10.9322 | 1.8337 | 5.96 | 5.96 | 0.0010 | 0.05 | 6.4371 | 15.4273 |
trt | 4 | 13.6276 | 1.8337 | 5.96 | 7.43 | 0.0003 | 0.05 | 9.1325 | 18.1227 |
Differences of Least Squares Means
Effect | trt | _trt | Estimate | Standard Error |
DF | t Value | Pr > |t| | Alpha | Lower | Upper |
trt | 1 | 2 | -0.8955 | 0.9176 | 3.13 | -0.98 | 0.3983 | 0.05 | -3.7462 | 1.9552 |
trt | 1 | 3 | -4.1598 | 0.9176 | 3.13 | -4.53 | 0.0183 | 0.05 | -7.0105 | -1.3092 |
trt | 1 | 4 | -6.8552 | 0.9176 | 3.13 | -7.47 | 0.0043 | 0.05 | -9.7059 | -4.0045 |
trt | 2 | 3 | -3.2643 | 0.9176 | 3.13 | -3.56 | 0.0353 | 0.05 | -6.1150 | -0.4137 |
trt | 2 | 4 | -5.9597 | 0.9176 | 3.13 | -6.49 | 0.0065 | 0.05 | -8.8104 | -3.1091 |
trt | 3 | 4 | -2.6954 | 0.9176 | 3.13 | -2.94 | 0.0574 | 0.05 | -5.5461 | 0.1553 |
You can see the distinction between the intra-block and the combined inter- and intra-block comparisons of treatments by comparing the results in Output 11.10 and Output 11.12. First, the TRT LS means are slightly different in the two outputs. Also, the confidence interval for the difference between TRT 1 and TRT 2 in Output 11.10 is (–3.802709, 2.102709), whereas the corresponding interval in Output 11.12 is (–3.7462, 1.9552). The interval using the combined information in Output 11.12 is slightly narrower. However, this can be misleading: the standard error in Output 11.12 does not take into account the variation induced by estimating the variance-covariance matrix to obtain the estimated GLS estimates of differences between treatment means. If you use DDFM=KENWARDROGER in the MODEL statement, you will get a better assessment of the true error of estimation.
Crossover designs are used in animal nutrition and pharmaceutical studies to compare two or more treatments (diets or drugs). The treatments are administered sequentially to each subject over a set of time periods. This enables the comparison of treatments on a within-subjects basis. However, there is a possibility that the response obtained after a particular time period might be influenced by the treatment assigned not only in that period but also in previous periods. If so, then the response contains residual effects from the previous periods. Some authors call these “carry-over” effects. Certain crossover designs permit the residual effects to be estimated, and thus to be effectively removed from estimates of treatment means and comparisons of means.
Cochran and Cox (1957) present two 3×3 Latin squares as a design for estimating the residual effects of treatments from the preceding period on milk yields. The treatment allocation is shown in the table below. The columns of the two squares contain the six possible sequences.
 | Square 1 | | | Square 2 | | |
Period | Cow I | Cow II | Cow III | Cow IV | Cow V | Cow VI |
1 | A | B | C | A | B | C |
2 | B | C | A | C | A | B |
3 | C | A | B | B | C | A |
Output 11.13 contains data from a study that was conducted to compare the effects on heart rate of three treatments: a test drug, a standard drug, and a placebo. Treatments were assigned in the six possible sequences to four patients each. The treatment design for the data in Output 11.13 is equivalent to the Cochran and Cox design in the table above, with sequences A-F in Output 11.13 corresponding to Cows I-VI in the table, respectively.
Heart rate was measured one hour following the administration of treatment in each of three visits. The visits are labeled 2, 3, and 4, because visit 1 was a preliminary visit for baseline data. Thus, in the general terminology of crossover designs, period 1 is visit 2, period 2 is visit 3, and period 3 is visit 4. Baseline heart rate was measured, but it is not used in the illustrative analysis.
A model for the data is
yijk = μ + αi + dj + βk + τl(ik) + ρm(ik) + eijk
where αi is the effect of sequence i, dj is the random effect of patient j, βk is the effect of visit k, τl(ik) is the direct effect of treatment l, ρm(ik) is the residual effect of treatment m, and eijk is a random error associated with patient j in visit k. The subscript l(ik) on the treatment direct effect indicates that the treatment (l) is a function of the visit (k) and sequence (i). The same is true of the treatment residual-effect subscript m(ik).
When using PROC GLM to analyze data from a crossover design, the sequence, patient, period, and direct treatment effects can be incorporated into the model with the dummy variables that result from using a CLASS statement. However, it is more convenient to use explicitly created covariates in the model for the residual effects. In the data set for the heart rate data, we create covariates for the standard and test drug residual effects, named RESIDS and RESIDT, respectively. Their values in the first period (visit 2) are zero because there is no period prior to the first period that could contribute a residual effect. In periods 2 and 3 (visits 3 and 4), the values of RESIDS and RESIDT are 0 or ±1, depending on the treatment in the preceding visit. This particular coding provides estimates of the residual effects corresponding to those prescribed by Cochran and Cox (1957). For example, patient number 2 is in sequence F (test, placebo, standard). The values of RESIDS and RESIDT are both 0 in the first period (visit 2). Patient 2 received the test drug in period 1, so in period 2 (visit 3), the covariates have values RESIDS=0 and RESIDT=1. This specifies that the residual effect ρT for test is contained in the observation on patient 2 in period 2. In period 3 (visit 4), the covariates both have values of -1. This coding specifies a sum-to-zero constraint on the residual effects. Thus, the residual effect ρP of the placebo satisfies the equation ρP = −ρT − ρS, and hence the residual effect of the placebo can be represented with -1 times the residual effects of test and standard.
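The coding rule for the residual-effect covariates can be sketched in Python. This is a hypothetical helper for illustration only; in the SAS analysis the covariates are simply stored in the data set:

```python
def residual_covariates(sequence):
    """Return (RESIDT, RESIDS) for each period of a treatment sequence,
    using the sum-to-zero coding described above."""
    # Coding keyed by the treatment given in the *preceding* period;
    # placebo carries -1 on both covariates (sum-to-zero constraint).
    codes = {"test": (1, 0), "standard": (0, 1), "placebo": (-1, -1)}
    rows = [(0, 0)]                      # period 1: no preceding treatment
    for prev in sequence[:-1]:           # periods 2, 3, ... carry the prior drug
        rows.append(codes[prev])
    return rows

# Sequence F (test, placebo, standard), as for patient 2 in Output 11.13:
print(residual_covariates(["test", "placebo", "standard"]))
# [(0, 0), (1, 0), (-1, -1)]
```

These values match the RESIDT and RESIDS columns for patient 2 in Output 11.13.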
Output 11.13 Data for Crossover Design with Residual Effects
PATIENT | SEQUENCE | VISIT | BASEHR | HR | DRUG | RESIDT | RESIDS |
1 | B | 2 | 86 | 86 | placebo | 0 | 0 |
1 | B | 3 | 86 | 106 | test | -1 | -1 |
1 | B | 4 | 62 | 79 | standard | 1 | 0 |
2 | F | 2 | 48 | 66 | test | 0 | 0 |
2 | F | 3 | 58 | 56 | placebo | 1 | 0 |
2 | F | 4 | 74 | 79 | standard | -1 | -1 |
3 | B | 2 | 78 | 84 | placebo | 0 | 0 |
3 | B | 3 | 78 | 76 | test | -1 | -1 |
3 | B | 4 | 82 | 91 | standard | 1 | 0 |
4 | D | 2 | 66 | 79 | standard | 0 | 0 |
4 | D | 3 | 72 | 100 | test | 0 | 1 |
4 | D | 4 | 90 | 82 | placebo | 1 | 0 |
5 | C | 2 | 74 | 74 | test | 0 | 0 |
5 | C | 3 | 90 | 71 | standard | 1 | 0 |
5 | C | 4 | 66 | 62 | placebo | 0 | 1 |
6 | B | 2 | 62 | 64 | placebo | 0 | 0 |
6 | B | 3 | 74 | 90 | test | -1 | -1 |
6 | B | 4 | 58 | 85 | standard | 1 | 0 |
7 | A | 2 | 94 | 75 | standard | 0 | 0 |
7 | A | 3 | 72 | 82 | placebo | 0 | 1 |
7 | A | 4 | 100 | 102 | test | -1 | -1 |
8 | A | 2 | 54 | 63 | standard | 0 | 0 |
8 | A | 3 | 54 | 58 | placebo | 0 | 1 |
8 | A | 4 | 66 | 62 | test | -1 | -1 |
9 | D | 2 | 82 | 91 | standard | 0 | 0 |
9 | D | 3 | 96 | 86 | test | 0 | 1 |
9 | D | 4 | 78 | 88 | placebo | 1 | 0 |
10 | C | 2 | 86 | 82 | test | 0 | 0 |
10 | C | 3 | 70 | 71 | standard | 1 | 0 |
10 | C | 4 | 58 | 62 | placebo | 0 | 1 |
11 | F | 2 | 82 | 80 | test | 0 | 0 |
11 | F | 3 | 80 | 78 | placebo | 1 | 0 |
11 | F | 4 | 72 | 75 | standard | -1 | -1 |
12 | E | 2 | 96 | 90 | placebo | 0 | 0 |
12 | E | 3 | 92 | 93 | standard | -1 | -1 |
12 | E | 4 | 82 | 88 | test | 0 | 1 |
13 | D | 2 | 78 | 87 | standard | 0 | 0 |
13 | D | 3 | 72 | 80 | test | 0 | 1 |
13 | D | 4 | 76 | 78 | placebo | 1 | 0 |
14 | F | 2 | 98 | 86 | test | 0 | 0 |
14 | F | 3 | 86 | 86 | placebo | 1 | 0 |
14 | F | 4 | 70 | 79 | standard | -1 | -1 |
15 | A | 2 | 86 | 71 | standard | 0 | 0 |
15 | A | 3 | 66 | 70 | placebo | 0 | 1 |
15 | A | 4 | 74 | 90 | test | -1 | -1 |
16 | E | 2 | 86 | 86 | placebo | 0 | 0 |
16 | E | 3 | 90 | 103 | standard | -1 | -1 |
16 | E | 4 | 82 | 86 | test | 0 | 1 |
17 | A | 2 | 66 | 83 | standard | 0 | 0 |
17 | A | 3 | 82 | 86 | placebo | 0 | 1 |
17 | A | 4 | 86 | 102 | test | -1 | -1 |
18 | F | 2 | 66 | 82 | test | 0 | 0 |
18 | F | 3 | 78 | 80 | placebo | 1 | 0 |
18 | F | 4 | 74 | 95 | standard | -1 | -1 |
19 | E | 2 | 74 | 80 | placebo | 0 | 0 |
19 | E | 3 | 78 | 79 | standard | -1 | -1 |
19 | E | 4 | 70 | 74 | test | 0 | 1 |
20 | B | 2 | 66 | 70 | placebo | 0 | 0 |
20 | B | 3 | 74 | 62 | test | -1 | -1 |
20 | B | 4 | 62 | 67 | standard | 1 | 0 |
21 | C | 2 | 82 | 90 | test | 0 | 0 |
21 | C | 3 | 90 | 103 | standard | 1 | 0 |
21 | C | 4 | 76 | 82 | placebo | 0 | 1 |
22 | C | 2 | 82 | 82 | test | 0 | 0 |
22 | C | 3 | 66 | 83 | standard | 1 | 0 |
22 | C | 4 | 90 | 82 | placebo | 0 | 1 |
23 | E | 2 | 82 | 66 | placebo | 0 | 0 |
23 | E | 3 | 74 | 87 | standard | -1 | -1 |
23 | E | 4 | 82 | 82 | test | 0 | 1 |
24 | D | 2 | 72 | 75 | standard | 0 | 0 |
24 | D | 3 | 82 | 86 | test | 0 | 1 |
24 | D | 4 | 74 | 82 | placebo | 1 | 0 |
The following SAS statements can be used to construct an analysis of variance and parameter estimates similar to those proposed by Cochran and Cox (1957):
proc glm data=hrtrate;
class sequence patient visit drug;
model hr = sequence patient(sequence) visit drug
resids residt / solution;
random patient(sequence);
run;
ANOVA results appear in Output 11.14.
Output 11.14 ANOVA for a Crossover Design
Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
Model | 29 | 6408.694444 | 220.989464 | 3.91 | <.0001 |
Error | 42 | 2372.583333 | 56.490079 | ||
Corrected Total | 71 | 8781.277778 |
R-Square | Coeff Var | Root MSE | HR Mean |
0.729813 | 9.301326 | 7.515988 | 80.80556 |
Source | DF | Type I SS | Mean Square | F Value | Pr > F |
SEQUENCE | 5 | 508.944444 | 101.788889 | 1.80 | 0.1333 |
PATIENT(SEQUENCE) | 18 | 4692.333333 | 260.685185 | 4.61 | <.0001 |
VISIT | 2 | 146.777778 | 73.388889 | 1.30 | 0.2835 |
DRUG | 2 | 668.777778 | 334.388889 | 5.92 | 0.0054 |
resids | 1 | 391.020833 | 391.020833 | 6.92 | 0.0119 |
residt | 1 | 0.840278 | 0.840278 | 0.01 | 0.9035 |
Source | DF | Type III SS | Mean Square | F Value | Pr > F |
SEQUENCE | 5 | 701.183333 | 140.236667 | 2.48 | 0.0466 |
PATIENT(SEQUENCE) | 18 | 4692.333333 | 260.685185 | 4.61 | <.0001 |
VISIT | 2 | 146.777778 | 73.388889 | 1.30 | 0.2835 |
DRUG | 2 | 343.950000 | 171.975000 | 3.04 | 0.0583 |
resids | 1 | 309.173611 | 309.173611 | 5.47 | 0.0241 |
residt | 1 | 0.840278 | 0.840278 | 0.01 | 0.9035 |
The desired ANOVA table is constructed as follows:

Source of Variation | DF | SS | Source in Output 11.14 |
Sequence | 5 | 508.94 | Type I |
Patient(Sequence) | 18 | 4692.33 | Type I |
Visits | 2 | 146.78 | Type III |
Direct effect of drugs (adjusted for residual effects) | 2 | 343.95 | Type III |
Residual effects (adjusted) | 2 | 391.86 | Type I SS RESIDS + Type I SS RESIDT |
The expected mean squares shown in Output 11.15 indicate that appropriate tests for VISIT, DRUG, and the carry-over effect covariates use the residual mean square as the error term. A test for SEQUENCE would use PATIENT(SEQUENCE) as the error term.
Output 11.15 Expected Mean Squares for a Crossover Design
Source | Type III Expected Mean Square |
SEQUENCE | Var(Error) + 2.76 Var(PATIENT(SEQUENCE)) + Q(SEQUENCE) |
PATIENT(SEQUENCE) | Var(Error) + 3 Var(PATIENT(SEQUENCE)) |
VISIT | Var(Error) + Q(VISIT) |
DRUG | Var(Error) + Q(DRUG) |
resids | Var(Error) + Q(resids) |
residt | Var(Error) + Q(residt) |
The effect of SEQUENCE is clearly not significant, since the F-ratio would be less than 1 using either a Type I or a Type III mean square in the numerator and the PATIENT(SEQUENCE) mean square in the denominator. The Type III test for DRUG has significance probability p=0.0583. The Type III mean square for DRUG has been adjusted for the residual effects. The Type I mean square for DRUG is not adjusted for the residual effects, and an F-test based on it has significance probability p=0.0054. Thus, results from tests for DRUG depend on whether residual effects have been removed or not. Estimates of the direct and residual effect parameters can be obtained from Output 11.16.
Output 11.16 Parameter Estimates for a Crossover Design
Parameter | Estimate | Standard Error |
t Value | Pr > |t| | |
Intercept | 82.06250000 B | 4.72870558 | 17.35 | <.0001 | |
SEQUENCE | A | 6.20833333 B | 6.23192824 | 1.00 | 0.3249 |
SEQUENCE | B | -19.33333333 B | 6.23192824 | -3.15 | 0.0030 |
SEQUENCE | C | -0.47916667 B | 6.23192824 | -0.08 | 0.9391 |
SEQUENCE | D | -1.81250000 B | 6.23192824 | -0.29 | 0.7726 |
SEQUENCE | E | -5.79166667 B | 6.23192824 | -0.93 | 0.3580 |
SEQUENCE | F | 0.00000000 B | . | . | . |
PATIENT(SEQUENCE) | 7 A | -4.00000000 B | 6.13677871 | -0.65 | 0.5181 |
PATIENT(SEQUENCE) | 8 A | -29.33333333 B | 6.13677871 | -4.78 | <.0001 |
PATIENT(SEQUENCE) | 15 A | -13.33333333 B | 6.13677871 | -2.17 | 0.0355 |
PATIENT(SEQUENCE) | 17 A | 0.00000000 B | . | . | . |
... | |||||
PATIENT(SEQUENCE) | 2 F | -18.66666667 B | 6.13677871 | -3.04 | 0.0040 |
PATIENT(SEQUENCE) | 11 F | -8.00000000 B | 6.13677871 | -1.30 | 0.1995 |
PATIENT(SEQUENCE) | 14 F | -2.00000000 B | 6.13677871 | -0.33 | 0.7461 |
PATIENT(SEQUENCE) | 18 F | 0.00000000 B | . | . | . |
VISIT | 2 | -2.58333333 B | 2.16967892 | -1.19 | 0.2405 |
VISIT | 3 | 0.75000000 B | 2.16967892 | 0.35 | 0.7313 |
VISIT | 4 | 0.00000000 B | . | . | . |
DRUG | standard | 2.31250000 B | 2.42577478 | 0.95 | 0.3459 |
DRUG | test | 5.93750000 B | 2.42577478 | 2.45 | 0.0186 |
DRUG | placebo | 0.00000000 B | . | . | . |
resids | -4.39583333 | 1.87899706 | -2.34 | 0.0241 | |
residt | 0.22916667 | 1.87899706 | 0.12 | 0.9035 |
First of all, the residual effects presented by Cochran and Cox (1957) are obtained from the parameter estimates for RESIDS and RESIDT. The values are
STD: –4.396
TST: 0.229
PCB: – (–4.396 + 0.229) = 4.167
Notice that these estimates come from the sum-to-zero coding for the residual effect dummy variables.
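The arithmetic behind the placebo estimate is just the sum-to-zero constraint applied to the two fitted coefficients. A quick check in Python, with the coefficients copied from Output 11.16:

```python
# Fitted residual-effect coefficients from Output 11.16
rho_std = -4.39583333   # resids
rho_tst = 0.22916667    # residt

# Sum-to-zero constraint: the placebo residual effect is minus the sum
rho_pcb = -(rho_std + rho_tst)
print(round(rho_pcb, 3))   # 4.167
```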
The direct treatment effects reported by Cochran and Cox (1957) can be obtained from the DRUG parameter estimates according to the following equations:

STD: -0.4375 = 2.3125 - (1/3)(2.3125 + 5.9375 + 0.0000)
TST: 3.1875 = 5.9375 - (1/3)(2.3125 + 5.9375 + 0.0000)
PCB: -2.7500 = 0.0000 - (1/3)(2.3125 + 5.9375 + 0.0000)
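The same centering can be verified in Python; the B-coded DRUG solution estimates from Output 11.16 (placebo is the reference level, hence 0) are re-expressed as deviations from their mean:

```python
# DRUG solution estimates from Output 11.16
est = {"STD": 2.3125, "TST": 5.9375, "PCB": 0.0}

# Centering each estimate about the mean gives the direct effects
mean = sum(est.values()) / 3
direct = {drug: e - mean for drug, e in est.items()}
print(direct)   # {'STD': -0.4375, 'TST': 3.1875, 'PCB': -2.75}
```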
Thus, the direct effects can be obtained from the following ESTIMATE statements:
estimate 'DIRECT EFFECT OF STD'
drug 2 -1 -1 / divisor=3;
estimate 'DIRECT EFFECT OF TST'
drug -1 2 -1 / divisor=3;
estimate 'DIRECT EFFECT OF PCB'
drug -1 -1 2 / divisor=3;
Results from these ESTIMATE statements appear in Output 11.17.
Output 11.17 Direct Effect Estimates
The GLM Procedure
Parameter | Estimate | Standard Error |
t Value | Pr > |t| | |
DIRECT EFFECT OF STD | -0.43750000 | 1.40052172 | -0.31 | 0.7563 | |
DIRECT EFFECT OF TST | 3.18750000 | 1.40052172 | 2.28 | 0.0280 | |
DIRECT EFFECT OF PCB | -2.75000000 | 1.40052172 | -1.96 | 0.0562 |
The direct effect means reported by Cochran and Cox (1957) are equal to the overall mean 80.8056 (printed as HR mean in Output 11.14) added to the direct effects. They are also equal to the GLM least-squares means, obtained from the following statement:
lsmeans drug / pdiff cl e;
The results appear in Output 11.18. You can see from the estimable functions that the LS means contain the INTERCEPT and average across the SEQUENCE, PATIENT(SEQUENCE), and VISIT parameters. Thus, the correct standard error of these LS means would contain variance due to PATIENT(SEQUENCE). However, this variance is not contained in the standard errors computed by PROC GLM for the LS means. As a consequence, the confidence intervals for the LS means displayed in Output 11.18 are not valid. The confidence intervals for the differences between LS means in Output 11.18 are valid, however, because the INTERCEPT, SEQUENCE, PATIENT(SEQUENCE), and VISIT parameters drop out of the differences.
Output 11.18 Least-Squares Means for a Crossover Design
Least Squares Means
Coefficients for DRUG Least Square Means
Effect | DRUG Level standard | test | placebo | ||
Intercept | 1 | 1 | 1 | ||
SEQUENCE | A | 0.16666667 | 0.16666667 | 0.16666667 | |
... | |||||
SEQUENCE | F | 0.16666667 | 0.16666667 | 0.16666667 | |
PATIENT(SEQUENCE) | 7 A | 0.04166667 | 0.04166667 | 0.04166667 | |
PATIENT(SEQUENCE) | 8 A | 0.04166667 | 0.04166667 | 0.04166667 | |
PATIENT(SEQUENCE) | 15 A | 0.04166667 | 0.04166667 | 0.04166667 | |
PATIENT(SEQUENCE) | 17 A | 0.04166667 | 0.04166667 | 0.04166667 | |
... | |||||
PATIENT(SEQUENCE) | 2 F | 0.04166667 | 0.04166667 | 0.04166667 | |
PATIENT(SEQUENCE) | 11 F | 0.04166667 | 0.04166667 | 0.04166667 | |
PATIENT(SEQUENCE) | 14 F | 0.04166667 | 0.04166667 | 0.04166667 | |
PATIENT(SEQUENCE) | 18 F | 0.04166667 | 0.04166667 | 0.04166667 | |
VISIT | 2 | 0.33333333 | 0.33333333 | 0.33333333 | |
VISIT | 3 | 0.33333333 | 0.33333333 | 0.33333333 | |
VISIT | 4 | 0.33333333 | 0.33333333 | 0.33333333 | |
DRUG | standard | 1 | 0 | 0 | |
DRUG | test | 0 | 1 | 0 | |
DRUG | placebo | 0 | 0 | 1 | |
resids | 0 | 0 | 0 | ||
residt | 0 | 0 | 0 |
Least Squares Means for Effect DRUG
DRUG | HR LSMEAN | 95% Confidence Limits | |
standard | 80.368056 | 77.023853 | 83.712258 |
test | 83.993056 | 80.648853 | 87.337258 |
placebo | 78.055556 | 74.711353 | 81.399758 |
i | j | Difference Between Means |
95% Confidence Limits for LSMean(i)-LSMean(j) | |
1 | 2 | -3.625000 | -8.520412 | 1.270412 |
1 | 3 | 2.312500 | -2.582912 | 7.207912 |
2 | 3 | 5.937500 | 1.042088 | 10.832912 |
PROC MIXED can be used to analyze the crossover design data. Run the following statements:
proc mixed data=hrtrate order=internal;
class sequence patient visit drug;
model hr=sequence visit drug resids residt/solution
ddfm=satterth;
random patient(sequence);
lsmeans drug / pdiff cl e;
run;
Edited results appear in Output 11.19.
Output 11.19 Partial Mixed-Model Results for a Crossover Design
The Mixed Procedure
Covariance Parameter Estimates
Cov Parm | Estimate |
PATIENT(SEQUENCE) | 68.0650 |
Residual | 56.4901 |
Type 3 Tests of Fixed Effects
Effect | Num DF | Den DF | F Value | Pr > F |
SEQUENCE | 5 | 18.7 | 0.58 | 0.7165 | |
VISIT | 2 | 42 | 1.30 | 0.2835 | |
DRUG | 2 | 42 | 3.04 | 0.0583 | |
resids | 1 | 42 | 5.47 | 0.0241 | |
residt | 1 | 42 | 0.01 | 0.9035 |
Least Squares Means
Effect | DRUG | Estimate | Standard Error |
DF | t Value | Pr > |t| | Alpha | Lower | Upper |
DRUG | standard | 80.3681 | 2.3626 | 38 | 34.02 | <.0001 | 0.05 | 75.5852 | 85.1510 |
DRUG | test | 83.9931 | 2.3626 | 38 | 35.55 | <.0001 | 0.05 | 79.2102 | 88.7760 |
DRUG | placebo | 78.0556 | 2.3626 | 38 | 33.04 | <.0001 | 0.05 | 73.2727 | 82.8385 |
Differences of Least Squares Means
Effect | DRUG | _DRUG | Estimate | Standard Error |
DF | t Value | Pr > |t| | Alpha |
DRUG | standard | test | -3.6250 | 2.4258 | 42 | -1.49 | 0.1426 | 0.05 |
DRUG | standard | placebo | 2.3125 | 2.4258 | 42 | 0.95 | 0.3459 | 0.05 |
DRUG | test | placebo | 5.9375 | 2.4258 | 42 | 2.45 | 0.0186 | 0.05 |
Differences of Least Squares Means
Effect | DRUG | _DRUG | Lower | Upper |
DRUG | standard | test | -8.5204 | 1.2704 |
DRUG | standard | placebo | -2.5829 | 7.2079 |
DRUG | test | placebo | 1.0421 | 10.8329 |
The test of significance for DRUG in "Type 3 Tests of Fixed Effects" in Output 11.19 is the same as the Type III test from GLM in Output 11.14. Likewise, the least-squares means are equal in the two analyses. This illustrates that ordinary least-squares analyses, as performed by GLM, can be equivalent to generalized least-squares analyses, as performed by MIXED. The equivalence occurs in this example because the within-patients effects are orthogonal to the between-patients effects. Notice that the confidence intervals for differences between LS means are the same in Outputs 11.18 and 11.19, but the confidence intervals for the LS means themselves are wider in Output 11.19 than in Output 11.18, because PROC MIXED computes standard errors of LS means that incorporate the PATIENT(SEQUENCE) variance.
The material in this section is related to the discussions of regression analysis in Chapter 2 and analysis of covariance in Chapter 7. This section concerns models that contain both dummy variables generated from the CLASS statement and a continuous variable; in effect, several regression models are combined into one equation. Of particular interest are cases for which the regressions have a common intercept. Models of this type are frequently used, for example, in relative potency and relative bioavailability studies (Littell et al. 1997).
Many experiments involve both qualitative and quantitative factors. For example, the tensile strength (TS) of a monofilament fiber depends on the amount (AMT) of a chemical used in the manufacturing process. This chemical can be obtained from three different sources (SOURCE), with values A, B, or C. SOURCE is a qualitative variable and AMT is a quantitative variable. Measurements of TS were obtained from samples from different amounts and sources. The SAS data set named MONOFIL appears in Output 11.20.
Output 11.20 Data for an Experiment with Qualitative and Quantitative Variables
Obs | SOURCE | AMT | TS |
1 | A | 1 | 11.5 |
2 | A | 2 | 13.8 |
3 | A | 3 | 14.4 |
4 | A | 4 | 16.8 |
5 | A | 5 | 18.7 |
6 | B | 1 | 10.8 |
7 | B | 2 | 12.3 |
8 | B | 3 | 13.7 |
9 | B | 4 | 14.2 |
10 | B | 5 | 16.6 |
11 | C | 1 | 13.1 |
12 | C | 2 | 16.2 |
13 | C | 3 | 19.0 |
14 | C | 4 | 22.9 |
15 | C | 5 | 26.5 |
A simple linear regression model relating TS to AMT for each source is

TS = αA + βA AMT + ε (SOURCE A)
TS = αB + βB AMT + ε (SOURCE B)
TS = αC + βC AMT + ε (SOURCE C)
The parameters αA and βA are the intercept and slope, respectively, for SOURCE=A.
The following statements produce the analysis of variance and parameter estimates in Output 11.21.
proc glm data=monofil;
class source;
model ts=source amt source*amt / solution;
run;
Output 11.21 A Model with Main Effects and Interactions
The GLM Procedure
Dependent Variable: ts
Source | DF | Sum of Squares |
Mean Square | F Value | Pr > F |
Model | 5 | 258.7273333 | 51.7454667 | 263.71 | <.0001 |
Error | 9 | 1.7660000 | 0.1962222 | ||
Corrected Total | 14 | 260.4933333 |
R-Square | Coeff Var | Root MSE | ts Mean |
0.993221 | 2.762805 | 0.442970 | 16.03333 |
Source | DF | Type I SS | Mean Square | F Value | Pr > F |
Source | 2 | 98.0013333 | 49.0006667 | 249.72 | <.0001 |
amt | 1 | 138.2453333 | 138.2453333 | 704.53 | <.0001 |
amt*source | 2 | 22.4806667 | 11.2403333 | 57.28 | <.0001 |
Source | DF | Type III SS | Mean Square | F Value | Pr > F |
Source | 2 | 0.0702424 | 0.0351212 | 0.18 | 0.8390 |
amt | 1 | 138.2453333 | 138.2453333 | 704.53 | <.0001 |
amt*source | 2 | 22.4806667 | 11.2403333 | 57.28 | <.0001 |
Parameter | Estimate | Standard Error |
t Value | Pr > |t| | |
Intercept | 9.490000000 B | 0.46459062 | 20.43 | <.0001 | |
source A | 0.330000000 B | 0.65703036 | 0.50 | 0.6275 | |
source B | -0.020000000 B | 0.65703036 | -0.03 | 0.9764 | |
source C | 0.000000000 B | . | . | . | |
amt | 3.350000000 B | 0.14007934 | 23.92 | <.0001 | |
amt*source A | -1.610000000 B | 0.19810211 | -8.13 | <.0001 | |
amt*source B | -2.000000000 B | 0.19810211 | -10.10 | <.0001 | |
amt*source C | 0.000000000 B | . | . | . |
NOTE: The X'X matrix has been found to be singular, and a generalized inverse was used to solve the normal equations. Terms whose estimates are followed by the letter 'B' are not uniquely estimable.
These parameter estimates pertain to the integrated model

TS = αC + α'A DA + α'B DB + (βC + β'A DA + β'B DB) AMT + ε

The parameters α' and β' are further defined as

α'A = αA - αC   α'B = αB - αC
β'A = βA - βC   β'B = βB - βC
The variable DA is a dummy variable equal to 1 for SOURCE=A and equal to 0 otherwise, and DB has a corresponding definition with respect to SOURCE=B. Thus, the regression models for the three sources are

TS = (αC + α'A) + (βC + β'A) AMT + ε (SOURCE A)
TS = (αC + α'B) + (βC + β'B) AMT + ε (SOURCE B)
TS = αC + βC AMT + ε (SOURCE C)
Therefore, the fitted equations are
TS = 9.49 + 0.33 + (3.35 - 1.61) AMT (SOURCE A)
= 9.82 + 1.74 AMT
TS = 9.49 - 0.02 + (3.35 - 2.00) AMT (SOURCE B)
= 9.47 + 1.35 AMT
TS = 9.49 + 3.35 AMT (SOURCE C)
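Because the model with SOURCE, AMT, and AMT*SOURCE fits a separate line to each source, the same fitted equations can be recovered by running a simple linear regression within each source. A small Python check, with the data from Output 11.20 keyed by source:

```python
# (AMT, TS) pairs for each source, from Output 11.20
data = {
    "A": [(1, 11.5), (2, 13.8), (3, 14.4), (4, 16.8), (5, 18.7)],
    "B": [(1, 10.8), (2, 12.3), (3, 13.7), (4, 14.2), (5, 16.6)],
    "C": [(1, 13.1), (2, 16.2), (3, 19.0), (4, 22.9), (5, 26.5)],
}

def fit_line(points):
    """Least-squares intercept and slope for one source."""
    n = len(points)
    xbar = sum(x for x, _ in points) / n
    ybar = sum(y for _, y in points) / n
    slope = (sum((x - xbar) * (y - ybar) for x, y in points)
             / sum((x - xbar) ** 2 for x, _ in points))
    return ybar - slope * xbar, slope

for src, pts in data.items():
    a, b = fit_line(pts)
    print(src, round(a, 2), round(b, 2))
# A 9.82 1.74
# B 9.47 1.35
# C 9.49 3.35
```

The three fitted lines agree with the equations derived from the GLM parameter estimates above.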
The GLM parameter estimates, in effect, treat the regression line for SOURCE=C as a reference line, and the parameters α′A, α′B, β′A, and β′B are the parameters for lines A and B minus the corresponding parameters for line C. The AMT*SOURCE parameters β′A and β′B measure the differences between the slopes of regression lines A and B and the slope of line C, respectively. Thus, a test that these parameters are 0 is a test that the lines are parallel, that is, that they have equal slopes. The appropriate statistic is F=57.28 for the AMT*SOURCE effect, which has significance probability p<0.0001.
Caution is advised in using the Type III F-test for SOURCE. It is a test of the equality of the intercepts (H0: αA = αB = αC), which probably has no practical interpretation because the intercepts are simply extrapolations of the lines to AMT=0. The Type I F-test, on the other hand, tests the equality of the regression lines at the mean amount, AMT=3 (H0: αA + 3βA = αB + 3βB = αC + 3βC).
You can compare two sources at a given amount with an ESTIMATE statement. Suppose you want to compare SOURCE=A with SOURCE=B using AMT=3.5. This difference is
(αA + βA(3.5)) - (αB + βB(3.5))
= ((αC + α'A) + (βC + β'A)(3.5)) - ((αC + α'B) + (βC + β'B)(3.5))
= α'A - α'B + (β'A - β'B)(3.5)
So the appropriate ESTIMATE statement is
estimate 'A vs B at AMT=3.5'
source 1 -1 0
source*amt 3.5 -3.5 0;
The results appear in Output 11.22.
Output 11.22 The Difference between SOURCE=A and SOURCE=B at AMT=3.5
The GLM Procedure
Dependent Variable: ts
Parameter | Estimate | Standard Error |
t Value | Pr > |t| | |
A vs B at AMT=3.5 | 1.71500000 | 0.29715316 | 5.77 | 0.0003 |
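The estimate can also be reproduced by hand from the solution output. A quick check in Python, with the B-coded coefficients copied from Output 11.21:

```python
# B-coded estimates from Output 11.21 (SOURCE C is the reference level)
alpha = {"A": 0.33, "B": -0.02}    # source A, source B intercept deviations
beta = {"A": -1.61, "B": -2.00}    # amt*source A, amt*source B slope deviations

# A vs B at AMT=3.5: (alpha'A - alpha'B) + (beta'A - beta'B) * 3.5
amt = 3.5
diff = (alpha["A"] - alpha["B"]) + (beta["A"] - beta["B"]) * amt
print(round(diff, 3))   # 1.715
```

This matches the estimate of 1.715 in Output 11.22.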
Suppose TS also is measured for AMT=0. This variation of the experiment is commonly mishandled by data analysts. Since AMT=0 means there is no chemical, the intercepts for the models are all equal, αA = αB = αC. Thus, a correct analysis should provide equal estimates of the intercepts. The regressions can be written simultaneously as
TS = α + γA DA AMT + γB DB AMT + γC DC AMT + ε
where DA is a dummy variable equal to 1 for SOURCE=A and equal to 0 otherwise, and DB and DC have corresponding definitions with respect to SOURCE=B and SOURCE=C. Use PROC GLM to create DA, DB, and DC by including the SOURCE variable in a CLASS statement.
Look at the data set MONOFIL2 printed in Output 11.23. The value C is arbitrarily assigned to SOURCE when AMT=0.
Output 11.23 Data with AMT=0
Qual and Quant Variables
Obs | source | amt | ts |
1 | A | 1 | 11.5 |
2 | A | 2 | 13.8 |
3 | A | 3 | 14.4 |
4 | A | 4 | 16.8 |
5 | A | 5 | 18.7 |
6 | B | 1 | 10.8 |
7 | B | 2 | 12.3 |
8 | B | 3 | 13.7 |
9 | B | 4 | 14.2 |
10 | B | 5 | 16.6 |
11 | C | 1 | 13.1 |
12 | C | 2 | 16.2 |
13 | C | 3 | 19.0 |
14 | C | 4 | 22.9 |
15 | C | 5 | 26.5 |
16 | C | 0 | 10.1 |
17 | C | 0 | 10.2 |
18 | C | 0 | 9.8 |
19 | C | 0 | 9.9 |
20 | C | 0 | 10.2 |
The following statements produce Output 11.24:
proc glm data=monofil2;
class source;
model ts=amt*source / solution;
run;
Output 11.24 Parameter Estimates for Data with AMT=0
The GLM Procedure
Dependent Variable: ts
Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
Model | 3 | 393.0051791 | 131.0017264 | 903.34 | <.0001 |
Error | 16 | 2.3203209 | 0.1450201 | ||
Corrected Total | 19 | 395.3255000 |
R-Square | Coeff Var | Root MSE | ts Mean |
0.994131 | 2.619986 | 0.380815 | 14.53500 |
Source | DF | Type I SS | Mean Square | F Value | Pr > F |
amt*source | 3 | 393.0051791 | 131.0017264 | 903.34 | <.0001 |
Source | DF | Type III SS | Mean Square | F Value | Pr > F |
amt*source | 3 | 393.0051791 | 131.0017264 | 903.34 | <.0001 |
Parameter | Estimate | Standard Error |
t Value | Pr > |t| |
Intercept | 9.882352941 | 0.13699380 | 72.14 | <.0001 |
amt*source A | 1.722994652 | 0.06350310 | 27.13 | <.0001 |
amt*source B | 1.237540107 | 0.06350310 | 19.49 | <.0001 |
amt*source C | 3.242994652 | 0.06350310 | 51.07 | <.0001 |
Parameter estimates in Output 11.24 yield the three prediction equations
TS = 9.88 + 1.72 AMT | (SOURCE A) |
TS = 9.88 + 1.24 AMT | (SOURCE B) |
TS = 9.88 + 3.24 AMT | (SOURCE C) |
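The common-intercept fit can be reproduced outside of SAS by solving the least-squares normal equations directly. The sketch below builds the four-column design matrix (an intercept plus one AMT column per source, matching the model above) from the MONOFIL2 data in Output 11.23 and solves X'Xb = X'y by Gaussian elimination:

```python
# (AMT, SOURCE, TS) rows from Output 11.23
rows = [(1, "A", 11.5), (2, "A", 13.8), (3, "A", 14.4), (4, "A", 16.8), (5, "A", 18.7),
        (1, "B", 10.8), (2, "B", 12.3), (3, "B", 13.7), (4, "B", 14.2), (5, "B", 16.6),
        (1, "C", 13.1), (2, "C", 16.2), (3, "C", 19.0), (4, "C", 22.9), (5, "C", 26.5),
        (0, "C", 10.1), (0, "C", 10.2), (0, "C", 9.8), (0, "C", 9.9), (0, "C", 10.2)]

def design(amt, source):
    # Intercept, then DA*AMT, DB*AMT, DC*AMT
    return [1.0] + [float(amt) if source == s else 0.0 for s in "ABC"]

# Accumulate the normal equations X'X b = X'y
p = 4
XtX = [[0.0] * p for _ in range(p)]
Xty = [0.0] * p
for amt, src, ts in rows:
    x = design(amt, src)
    for i in range(p):
        Xty[i] += x[i] * ts
        for j in range(p):
            XtX[i][j] += x[i] * x[j]

# Forward elimination, then back-substitution
for i in range(p):
    piv = XtX[i][i]
    for j in range(i + 1, p):
        f = XtX[j][i] / piv
        XtX[j] = [a - f * b for a, b in zip(XtX[j], XtX[i])]
        Xty[j] -= f * Xty[i]
b = [0.0] * p
for i in reversed(range(p)):
    b[i] = (Xty[i] - sum(XtX[i][j] * b[j] for j in range(i + 1, p))) / XtX[i][i]

print([round(v, 4) for v in b])   # [9.8824, 1.723, 1.2375, 3.243]
```

The solution reproduces the estimates in Output 11.24: a common intercept of 9.8824 and slopes 1.7230, 1.2375, and 3.2430 for sources A, B, and C.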
The relative effect of one source compared to another can be measured by the ratio of the slope parameters. For example, the strength of SOURCE B relative to SOURCE A is the ratio 1.24/1.72 = 0.72. This means that one unit of the chemical from SOURCE B has the same effect on tensile strength as 0.72 units of the chemical from SOURCE A.
Similar models are used in other types of applications. The potency of one drug relative to another in a drug study, or the bioavailability of one nutrient relative to another in a nutrition study, is measured in the same way.
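The ratio-of-slopes calculation can be checked numerically in Python, with the slope estimates copied from Output 11.24:

```python
# Fitted slopes from Output 11.24
slope = {"A": 1.72299465, "B": 1.23754011, "C": 3.24299465}

# Strength of SOURCE B relative to SOURCE A
rel_B_vs_A = slope["B"] / slope["A"]
print(round(rel_B_vs_A, 2))   # 0.72
```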