Chapter 11 Examples of Special Applications

11.1 Introduction

11.2 Confounding in a Factorial Experiment

11.2.1 Confounding with Blocks

11.2.2 A Fractional Factorial Example

11.3 A Balanced Incomplete-Blocks Design

11.4 A Crossover Design with Residual Effects

11.5 Models for Experiments with Qualitative and Quantitative Variables

11.6 A Lack-of-Fit Analysis

11.7 An Unbalanced Nested Structure

11.8 An Analysis of Multi-Location Data

11.8.1 An Analysis Assuming No Location×Treatment Interaction

11.8.2 A Fixed-Location Analysis with an Interaction

11.8.3 A Random-Location Analysis

11.8.4 Further Analysis of a Location×Treatment Interaction Using a Location Index

11.9 Absorbing Nesting Effects

11.1 Introduction

As already noted, the GLM and MIXED procedures can be used to analyze a multitude of data structures. In this chapter several applications are presented that utilize tools discussed in the previous chapters. Some of these applications involve statistical topics that are not discussed in great detail in this book. References are given to provide the necessary background information.

11.2 Confounding in a Factorial Experiment

Experiments use confounding in two forms. The first is the factorial treatment design in which all factorial combinations appear in the experiment, but they appear in incomplete blocks containing only a subset of the factor combinations. Thus, within a given block, one or more treatment effects are confounded with block effects. The second is the fractional factorial experiment in which only a subset of the factor combinations appears in the experiment. Thus, some of the factorial effects are not estimable but are aliased with other effects, meaning that the same estimable function estimates both effects. Confounding is covered in most textbooks on the design of experiments (for example, Hicks and Turner 2000).

11.2.1 Confounding with Blocks

The first example for this topic is a 2³ factorial with factors labeled A, B, and C in blocks of size four. There are three replications, with interactions ABC, AC, and BC confounded with blocks in replications 1, 2, and 3, respectively. These interactions are thus partially confounded with blocks. The data appear in Output 11.1.

Output 11.1 Data for a Two-Cube Factorial in Blocks of Size Four

Obs rep blk a b c y
 
1 1 1 1 1 1 3.99
2 1 1 1 0 0 1.14
3 1 1 0 1 0 1.52
4 1 1 0 0 1 3.33
5 1 2 1 1 0 2.06
6 1 2 1 0 1 5.58
7 1 2 0 1 1 2.06
8 1 2 0 0 0 -0.17
9 2 1 1 1 1 3.77
10 2 1 1 0 1 6.69
11 2 1 0 1 0 2.17
12 2 1 0 0 0 -0.01
13 2 2 1 1 0 2.43
14 2 2 0 1 1 1.22
15 2 2 1 0 0 0.37
16 2 2 0 0 1 2.06
17 3 1 1 1 1 4.53
18 3 1 0 1 1 1.90
19 3 1 1 0 0 1.62
20 3 1 0 0 0 -0.70
21 3 2 1 1 0 1.56
22 3 2 1 0 1 5.99
23 3 2 0 1 0 1.44
24 3 2 0 0 1 2.42
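
The blocking follows directly from the defining interaction in each replication. For example, with ABC confounded in replication 1, block 1 contains the four treatment combinations for which a+b+c is odd (111, 100, 010, 001) and block 2 contains those for which a+b+c is even (110, 101, 011, 000). Analogously, the blocks in replications 2 and 3 separate the combinations according to the parity of a+c and b+c, respectively; you can verify these assignments directly in the data listing above.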

Contrasts corresponding to confounded effects can be estimated only from those replications in which they are not confounded. In this example, they are estimated from only two-thirds of the data; thus their standard errors should be larger by a factor of √(3/2) than those of the unconfounded effects.

The analysis using PROC GLM is straightforward. You can generate contrasts in the DATA step instead of specifying classes for treatments and using CONTRAST statements, as the following code shows:

data confound;
   input rep blk a b c y;
   /* convert the 0-1 factor levels to -1/+1 contrast variables */
   ca = -(a=0) + (a=1);
   cb = -(b=0) + (b=1);
   cc = -(c=0) + (c=1);
datalines;
   ·
   data
   ·
;

By sorting the data and running the analysis by REP, you can use the ALIASING option in PROC GLM to print out the confounding pattern. Use the following statements:

proc sort;
   by rep;
proc glm;
   by rep;
   class blk;
   model y=blk ca|cb|cc/solution aliasing;

The results appear in Output 11.2.

Output 11.2 Aliasing Output Showing a Confounding Pattern for a 2³ Factorial in Blocks of Size Four

----------- rep=1 -----------
Parameter Expected Value
 
Intercept Intercept + [blk 2] - ca*cb*cc
blk       1 [blk 1] - [blk 2] + 2*ca*cb*cc
blk       2  
ca ca
cb cb
ca*cb ca*cb
cc cc
ca*cc ca*cc
cb*cc cb*cc
ca*cb*cc  
----------- rep=2 -----------
Parameter Expected Value
 
Intercept Intercept + [blk 2] - ca*cc
blk       1 [blk 1] - [blk 2] + 2*ca*cc
blk       2  
ca ca
cb cb
ca*cb ca*cb
cc cc
ca*cc  
cb*cc cb*cc
ca*cb*cc ca*cb*cc
----------- rep=3 -----------
Parameter Expected Value
 
Intercept Intercept + [blk 2] - cb*cc
blk       1 [blk 1] - [blk 2] + 2*cb*cc
blk       2  
ca ca
cb cb
ca*cb ca*cb
cc cc
ca*cc ca*cc
cb*cc  
ca*cb*cc ca*cb*cc

The contents of Output 11.2 appear immediately after the parameter estimates generated by the SOLUTION option in the MODEL statement. For REP=1, you can see that the three-way interaction CA*CB*CC has a blank under “Expected Value” but the INTERCEPT and BLK 1 effects estimate their usual estimable functions plus the CA*CB*CC effect. This indicates that the ABC interaction effect is confounded with blocks in REP=1. Similarly, the output indicates that the AC interaction is confounded with blocks in REP=2, and the BC interaction is confounded with blocks in REP=3. Although in this example the ALIASING option merely confirms the confounding pattern stated in the introduction, it can be very useful in data sets where the confounding pattern is not obvious and needs to be investigated.

For a complete analysis of the data, combined over all replications, use the following SAS statements:

proc glm;
   class rep blk;
   model y=rep blk(rep) ca|cb|cc / solution;

The results appear in Output 11.3.

Output 11.3 ANOVA for a Two-Cube Factorial in Blocks of Size Four

The GLM Procedure
Class Level Information
Class Levels   Values
rep 3   1 2 3
blk 2   1 2

 

  Number of observations     24
The GLM Procedure

 

   Dependent Variable: y

  Sum of  
Source DF Squares  Mean Square F Value Pr > F
 
Model 12 81.74957500 6.81246458 33.60 <.0001
 
Error 11 2.23018750 0.20274432    
 
Corrected Total 23 83.97976250      
 
R-Square Coeff Var Root MSE y Mean
 
0.973444 18.96878 0.450271 2.373750
 
Source DF Type I SS Mean Square F Value Pr > F
 
rep 2 0.05092500 0.02546250 0.13 0.8832
blk(rep) 3 7.43221250 2.47740417 12.22 0.0008
ca 1 21.07500417 21.07500417 103.95 <.0001
cb 1 0.00453750 0.00453750 0.02 0.8838
ca*cb 1 1.72270417 1.72270417 8.50 0.0141
cc 1 37.77550417 37.77550417 186.32 <.0001
ca*cc 1 2.31800625 2.31800625 11.43 0.0061
cb*cc 1 11.34005625 11.34005625 55.93 <.0001
ca*cb*cc 1 0.03062500 0.03062500 0.15 0.7049
 
Source DF Type III SS Mean Square F Value Pr > F
 
rep 2 0.05092500 0.02546250 0.13 0.8832
blk(rep) 3 1.66755417 0.55585139 2.74 0.0938
ca 1 21.07500417 21.07500417 103.95 <.0001
cb 1 0.00453750 0.00453750 0.02 0.8838
ca*cb 1 1.72270417 1.72270417 8.50 0.0141
cc 1 37.77550417 37.77550417 186.32 <.0001
ca*cc 1 2.31800625 2.31800625 11.43 0.0061
cb*cc 1 11.34005625 11.34005625 55.93 <.0001
ca*cb*cc 1 0.03062500 0.03062500 0.15 0.7049
 
    Standard    
Parameter Estimate   Error t Value Pr > |t|
 
Intercept 2.010625000 B 0.25170936 7.99 <.0001
rep        1 0.328125000 B 0.35597078 0.92 0.3764
rep        2 -0.110000000 B 0.35597078 -0.31 0.7631
rep        3 0.000000000 B  ⋅          ⋅    ⋅    
blk(rep)   1 1 0.200000000 B 0.38994646 0.51 0.6182
blk(rep)   2 1 0.000000000 B  ⋅          ⋅    ⋅    
blk(rep)   1 2 0.873750000 B 0.38994646 2.24 0.0466
blk(rep)   2 2 0.000000000 B  ⋅          ⋅    ⋅    
blk(rep)   1 3 0.668750000 B 0.38994646 1.71 0.1143
blk(rep)   2 3 0.000000000 B  ⋅          ⋅    ⋅    
ca 0.937083333   0.09191126 10.20 <.0001
cb 0.013750000   0.09191126 0.15 0.8838
ca*cb -0.267916667   0.09191126 -2.91 0.0141
cc 1.254583333   0.09191126 13.65 <.0001
ca*cc 0.380625000   0.11256785 3.38 0.0061
cb*cc -0.841875000   0.11256785 -7.48 <.0001
ca*cb*cc -0.043750000   0.11256785 -0.39 0.7049
 

NOTE: The X'X matrix has been found to be singular, and a generalized inverse was used to solve the normal equations. Terms whose estimates are followed by the letter ‘B’ are not uniquely estimable.

The standard errors of the coefficients of the confounded effects (ABC, AC, and BC) are indeed larger by a factor of √(3/2) than those of the unconfounded effects. You can verify that the sums of squares of the confounded effects, based on data from the replications in which they are not confounded, are identical to the sums of squares in Output 11.3.
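
As a quick arithmetic check, the ratio of the two standard errors reported in Output 11.3 is 0.11256785 / 0.09191126 ≈ 1.225, which equals √(3/2), as expected for effects estimated from only two of the three replications.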

The estimable functions option can be used to indicate the nature of the confounding. Requesting the Type I functions for effects in the same order as in the MODEL statement above gives the effects for BLK(REP) unadjusted for the factorial effects and reveals how the blocks are related to the factorial effects.
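
A minimal sketch of the statements that produce these coefficients follows; the E1 option on the MODEL statement requests the Type I estimable functions.

proc glm;
   class rep blk;
   model y=rep blk(rep) ca|cb|cc / e1 solution;
run;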

Output 11.4 Estimable Functions for a Two-Cube Factorial in Blocks of Size Four

   Type I Estimable Functions

 

Effect Coefficients blk(rep)
 
Intercept 0
 
rep     1 0
rep     2 0
rep     3 0
 
blk(rep) 1 1 L5
blk(rep) 2 1 -L5
blk(rep) 1 2 L7
blk(rep) 2 2 -L7
blk(rep) 1 3 L9
blk(rep) 2 3 -L9
 
ca 0
 
cb 0
 
ca*cb 0
 
cc 0
 
ca*cc 2L7
 
cb*cc 2L9
 
ca*cb*cc 2L5

Output 11.4 gives the nonzero coefficients of BLK(REP). The coefficient L5 appears on the terms for BLK in REP 1 and also on the CA*CB*CC interaction term. This happens because CA*CB*CC is confounded with BLK in REP 1. This is apparent from the data set shown in Output 11.1. The product CA*CB*CC is equal to 1 for all observations in BLK 1 of REP 1, and CA*CB*CC = –1 for all observations in BLK 2 of REP 1. In some data sets, the confounding pattern is not so obvious. Using the coefficients for estimable functions in conjunction with the output from the ALIASING option shown above, you can discover the confounding pattern.

11.2.2 A Fractional Factorial Example

The second example is a ½ fraction of a 2⁴ factorial experiment. The defining contrast is ABCD. The data appear in Output 11.5.

Output 11.5 Data for a ½ Fraction of a 2⁴ Factorial Experiment

Obs a b c d y ca cb cc cd
 
1 0 0 0 0 2.29 -1 -1 -1 -1
2 0 0 1 1 1.51 -1 -1 1 1
3 0 1 0 1 1.49 -1 1 -1 1
4 0 1 1 0 3.43 -1 1 1 -1
5 1 0 0 1 3.78 1 -1 -1 1
6 1 0 1 0 2.08 1 -1 1 -1
7 1 1 0 0 3.30 1 1 -1 -1
8 1 1 1 1 3.63 1 1 1 1

The data in Output 11.5 include the factor levels in their original form (A, B, C, and D) and in contrast (–1,1) form (CA, CB, CC, and CD).
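
You can verify the defining contrast directly from Output 11.5: for every one of the eight runs the product CA*CB*CC*CD equals +1, so the design is the half of the 2⁴ factorial defined by I = ABCD.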

You can compute the analysis with the aliasing pattern by using PROC GLM statements similar to those used in the previous example:

proc glm;
   model y=ca|cb|cc|cd/solution aliasing;

Output 11.6 shows the results.

Output 11.6 PROC GLM Analysis of Data from a ½ Fraction of a 2⁴ Factorial Experiment

Source DF Sum of Squares Mean Square F Value Pr > F
 
Model 7 6.35588750 0.90798393 . .
Error 0 0.00000000  .            
Corrected Total 7 6.35588750      
 
R-Square Coeff Var Root MSE y Mean
 
1.000000 . . 2.688750
 
Source DF Type I SS Mean Square F Value Pr > F
 
ca 1 2.07061250 2.07061250 . .
cb 1 0.59951250 0.59951250 . .
ca*cb 1 0.00031250 0.00031250 . .
cc 1 0.00551250 0.00551250 . .
ca*cc 1 0.80011250 0.80011250 . .
cb*cc 1 2.82031250 2.82031250 . .
ca*cb*cc 1 0.05951250 0.05951250 . .
cd 0 0.00000000  .         . .
ca*cd 0 0.00000000  .         . .
cb*cd 0 0.00000000  .         . .
ca*cb*cd 0 0.00000000  .         . .
cc*cd 0 0.00000000  .         . .
ca*cc*cd 0 0.00000000  .         . .
cb*cc*cd 0 0.00000000  .         . .
ca*cb*cc*cd 0 0.00000000  .         . .
 
Source DF Type III SS Mean Square F Value Pr > F
 
ca 0 0 . . .
cb 0 0 . . .
ca*cb 0 0 . . .
cc 0 0 . . .
ca*cc 0 0 . . .
cb*cc 0 0 . . .
ca*cb*cc 0 0 . . .
cd 0 0 . . .
ca*cd 0 0 . . .
cb*cd 0 0 . . .
ca*cb*cd 0 0 . . .
cc*cd 0 0 . . .
ca*cc*cd 0 0 . . .
cb*cc*cd 0 0 . . .
ca*cb*cc*cd 0 0 . . .
 
Parameter Estimate Standard Error t Value Pr > |t|
 
Intercept  2.688750000 B . . .
ca  0.508750000 B . . .
cb  0.273750000 B . . .
ca*cb -0.006250000 B . . .
cc -0.026250000 B . . .
ca*cc -0.316250000 B . . .
cb*cc  0.593750000 B . . .
ca*cb*cc -0.086250000 B . . .
cd  0.000000000 B . . .
ca*cd  0.000000000 B . . .
cb*cd  0.000000000 B . . .
ca*cb*cd  0.000000000 B . . .
cc*cd  0.000000000 B . . .
ca*cc*cd  0.000000000 B . . .
cb*cc*cd  0.000000000 B . . .
ca*cb*cc*cd  0.000000000 B . . .
 
Parameter Expected Value
 
Intercept Intercept + ca*cb*cc*cd
ca ca + cb*cc*cd
cb cb + ca*cc*cd
ca*cb ca*cb + cc*cd
cc cc + ca*cb*cd
ca*cc ca*cc + cb*cd
cb*cc cb*cc + ca*cd
ca*cb*cc ca*cb*cc + cd
cd  
ca*cd  
cb*cd  
ca*cb*cd  
cc*cd  
ca*cc*cd  
cb*cc*cd  
ca*cb*cc*cd  

From Output 11.6, you can see that because there are only eight observations, only the first seven parameters in the model plus the intercept can be estimated. Also, each estimate is confounded—aliased—with one other factorial effect. The tables of “Parameter” and “Expected Value” at the end of the printout give the aliases. For example, the estimate of the intercept is aliased with the ABCD interaction, indicated on the printout by the fact that the expected value of the intercept is INTERCEPT + CA*CB*CC*CD. Similarly, the output indicates that the expected value of the parameter CA is CA+CB*CC*CD, that is, the main effect of A is aliased with the BCD interaction. You can apply analogous interpretations to the remaining parameters. You can see that this aliasing pattern agrees with the pattern you would derive from standard fractional factorial methods. In this case, which uses a very basic design, the ALIASING option merely restates information someone familiar with fractional factorial design would already know. However, for nonstandard incomplete factorial designs, for instance those you could generate with PROC OPTEX, the ALIASING option can provide useful information that usually is not obvious.

There are three important additional points about the analysis in Output 11.6. First, the default order of effects from the CA|CB|CC|CD syntax used in the MODEL statement causes all effects involving only factors A, B, and C to be estimated first, before any effects involving D appear in the model. This is not very realistic. Normally, you would not use a fractional factorial design unless you expect higher-order interaction effects to be negligible. For example, the output gives an estimate of the ABC interaction, which is aliased with the main effect of D. In practice, you would assume this to be an estimate of the main effect of D. That is, you would use this design only if you could assume that the ABC interaction is essentially zero. Also, because all the two-factor interactions are aliased with other two-factor interactions, you must be sure which ones you can assume to be negligible so that you do not alias two potentially important effects.

The second point is that there are no error degrees of freedom, and hence no F-values or p-values, in Output 11.6. The model used to compute the analysis is saturated. There are various strategies to get around this. A common approach is to assume that all interactions are zero and compute a main-effects-only model, using the three degrees of freedom for the two-factor interactions to estimate experimental error. You can do this by using the following statements:

proc glm;
   model y=ca cb cc cd/solution aliasing;

The results appear in Output 11.7. However, you can easily question whether the results in Output 11.7 are valid, because in Output 11.6, the largest single source of variation was the BC (aliased with AD) interaction. For these data, at least, the assumption that all interaction effects are zero is questionable. If there is a non-negligible BC (or AD) interaction, then the MS(ERROR) in Output 11.7 overestimates σ² and hence the F-values are too low. An alternative strategy, not shown here, uses half-normal plots to estimate σ² and construct approximate tests for the model effects. See Milliken and Johnson (1989, Chapter 4) for an explanation of how to implement half-normal plot analysis using SAS. Under the half-normal plot method, the main effect of A and the BC (or AD) interaction are statistically significant. You would need sufficient understanding of the data to decide whether the interaction is a BC or an AD interaction.
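
The following statements show one way to construct such a plot. This is only a minimal sketch, not the Milliken and Johnson implementation; the input data set name FRAC and the intermediate data set names are assumed for illustration, and the effect estimates are taken from the ODS ParameterEstimates table produced by the SOLUTION option.

ods output ParameterEstimates=pe;
proc glm data=frac;
   model y=ca|cb|cc|cd / solution aliasing;
run;

/* keep the seven estimable effects and rank their absolute values */
data effects;
   set pe;
   if Parameter ne 'Intercept' and Estimate ne 0;
   abseff = abs(Estimate);
run;
proc rank data=effects out=effects;
   var abseff;
   ranks r;
run;

/* half-normal quantiles: points far above the line suggest real effects */
data effects;
   set effects nobs=m;
   q = quantile('NORMAL', 0.5 + 0.5*(r - 0.5)/m);
run;
proc sgplot data=effects;
   scatter x=q y=abseff / datalabel=Parameter;
run;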

Output 11.7 Main-Effects-Only Analysis of Fractional Factorial Data

  Sum of  
Source DF Squares Mean Square F Value Pr > F
 
Model 4 2.73515000 0.68378750 0.57 0.7075
 
Error 3 3.62073750 1.20691250    
 
Corrected Total 7 6.35588750      
 
R-Square Coeff Var Root MSE y Mean
 
0.430333 40.85898 1.098596 2.688750
 
Source DF Type I SS Mean Square F Value Pr > F
 
ca 1 2.07061250 2.07061250 1.72 0.2815
cb 1 0.59951250 0.59951250 0.50 0.5317
cc 1 0.00551250 0.00551250 0.00 0.9504
cd 1 0.05951250 0.05951250 0.05 0.8385
 
Source DF Type III SS Mean Square F Value Pr > F
 
ca 1 2.07061250 2.07061250 1.72 0.2815
cb 1 0.59951250 0.59951250 0.50 0.5317
cc 1 0.00551250 0.00551250 0.00 0.9504
cd 1 0.05951250 0.05951250 0.05 0.8385
Parameter Estimate Standard Error t Value Pr > |t| Expected Value
 
Intercept 2.688750000 0.38841223 6.92 0.0062   Intercept
ca 0.508750000 0.38841223 1.31 0.2815   ca
cb 0.273750000 0.38841223 0.70 0.5317   cb
cc -0.026250000 0.38841223 -0.07 0.9504   cc
cd -0.086250000 0.38841223 -0.22 0.8385   cd

The final point concerns the use of the (–1,1) contrasts CA through CD instead of the original (0,1) coding of A through D. If you use the variables A through D in the model, the ALIASING option assesses the aliasing pattern based on the estimable functions that follow from the (0,1) coding. These do not correspond to the standard aliasing pattern for fractional factorial experiments, and can be difficult to interpret. For example, these SAS statements yield the results shown in Output 11.8:

proc glm;
   model y=a|b|c|d/ aliasing;

Output 11.8 An Analysis of Fractional Factorial Data Using 0-1 Coding

Source DF Type I SS Mean Square F Value Pr > F
 
a 1 2.07061250 2.07061250 . .
b 1 0.59951250 0.59951250 . .
a*b 1 0.00031250 0.00031250 . .
c 1 0.00551250 0.00551250 . .
a*c 1 0.80011250 0.80011250 . .
b*c 1 2.82031250 2.82031250 . .
a*b*c 1 0.05951250 0.05951250 . .
d 0 0.00000000  .         . .
a*d 0 0.00000000  .         . .
b*d 0 0.00000000  .         . .
a*b*d 0 0.00000000  .         . .
c*d 0 0.00000000  .         . .
a*c*d 0 0.00000000  .         . .
b*c*d 0 0.00000000  .         . .
a*b*c*d 0 0.00000000  .         . .
 
Parameter Estimate Standard Error t Value Pr > |t|
 
Intercept  2.290000000   . . .
a  1.490000000 B . . .
b -0.800000000 B . . .
a*b  0.320000000 B . . .
c -0.780000000 B . . .
a*c -0.920000000 B . . .
b*c  2.720000000 B . . .
a*b*c -0.690000000 B . . .
d  0.000000000 B . . .
a*d  0.000000000 B . . .
b*d  0.000000000 B . . .
a*b*d  0.000000000 B . . .
c*d  0.000000000 B . . .
a*c*d  0.000000000 B . . .
b*c*d  0.000000000 B . . .
a*b*c*d  0.000000000 B . . .
 
Expected Value
 
Intercept
a + d + a*d
b + d + b*d
a*b - 2*d - a*d - b*d
c + d + c*d
a*c - 2*d - a*d - c*d
b*c - 2*d - b*d - c*d
a*b*c + 4*d + 2*a*d + 2*b*d + a*b*d + 2*c*d + a*c*d + b*c*d + a*b*c*d
 

NOTE: The X'X matrix has been found to be singular, and a generalized inverse was used to solve the normal equations. Terms whose estimates are followed by the letter 'B' are not uniquely estimable.

You can see that the sums of squares are the same as those computed from the contrast (–1,1) coding. The parameter estimates are different, as you would expect, because the different coding changes the intercept and hence the other coefficients. The aliasing pattern shown in the “Expected Value” column of the parameter estimates is also quite different. This reflects the fact that the (0,1) coding results in a different set of estimable functions. As shown in the theory section of Chapter 6, GLM determines estimable functions from the nonzero rows of the (X′X)⁻(X′X) matrix. The contrast coding results in estimable functions in standard form for assessing aliasing patterns in incomplete factorials. On the other hand, the (0,1) coding results in a different, and unfamiliar, form.

11.3 A Balanced Incomplete-Blocks Design

Incomplete-blocks designs are used whenever there are not enough experimental units per block to accommodate all treatments. Perhaps the best-known incomplete-blocks design is the so-called balanced incomplete-blocks (BIB) design. This design is not balanced in the sense that we have used the word in previous chapters because, in fact, not all treatments are assigned in all blocks. Instead, balance in the context of incomplete-blocks designs has the specific definition that all treatments appear in the same number of blocks, and all pairs of treatments appear together in the same number of blocks. These requirements result in certain conditions on the numbers of blocks, treatments, and treatments per block. For the BIB design with four treatments in blocks of size two, six blocks (three replications) are required (Cochran and Cox 1957). The data appear in Output 11.9. The design is shown below:

BIB Design Example Data
(Numbers in parentheses indicate treatment number)

Block        1         2         3         4         5         6

           1.2(1)    7.1(3)    7.1(1)    8.8(2)    9.7(1)   13.0(2)
           2.7(2)    8.6(4)    9.7(3)   15.1(4)   17.4(4)   16.6(3)

Output 11.9 Data for a Balanced Incomplete-Blocks Design

Obs      blk     trt y
 
1      1     1 1.2
2      1     2 2.7
3      2     3 7.1
4      2     4 8.6
5      3     1 7.1
6      3     3 9.7
7      4     2 8.8
8      4     4 15.1
9      5     1 9.7
10      5     4 17.4
11      6     2 13.0
12      6     3 16.6
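
You can verify the balance requirements from this layout. With t=4 treatments, b=6 blocks of size k=2, and r=3 replications of each treatment, the counts satisfy bk = rt (6×2 = 3×4 = 12 observations), and each pair of treatments appears together in λ = r(k−1)/(t−1) = 3(1)/3 = 1 block.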

Consider the following statements:

proc glm data=bibd;
   class blk trt;
   model y=trt blk / e1 ss3;
   random blk;
   means trt blk;
   lsmeans trt / stderr pdiff cl;
run;

The analysis-of-variance portion appears in Output 11.10.

Output 11.10 ANOVA for a Balanced Incomplete-Blocks Design

The GLM Procedure
 
Source DF Sum of
Squares
Mean Square F Value Pr > F
 
Model 8 281.1275000 35.1409375 40.82 0.0056
Error 3 2.5825000 0.8608333    
Corrected Total 11 283.7100000      
 
Source DF Type I SS Mean Square F Value Pr > F
 
trt 3 102.2566667 34.0855556 39.60 0.0065
blk 5 178.8708333 35.7741667 41.56 0.0057
 
Source DF Type III SS Mean Square F Value Pr > F
 
trt 3 59.0175000 19.6725000 22.85 0.0144
blk 5 178.8708333 35.7741667 41.56 0.0057
 
Least Squares Means
 
trt y LSMEAN Standard
 Error  
95% Confidence Limits
 
1 6.8000000 0.6281310 4.801007 8.798993
2 7.6500000 0.6281310 5.651007 9.648993
3 10.9250000 0.6281310 8.926007 12.923993
4 13.6250000 0.6281310 11.626007 15.623993
 
i j Difference
Between
Means
  95% Confidence Limits for LSMean(i)-LSMean(j)  
 
1 2 -0.850000 -3.802709 2.102709
1 3 -4.125000 -7.077709 -1.172291
1 4 -6.825000 -9.777709 -3.872291
2 3 -3.275000 -6.227709 -0.322291
2 4 -5.975000 -8.927709 -3.022291
3 4 -2.700000 -5.652709 0.252709

The Type I sum of squares is the unadjusted treatment sum of squares, based on the ordinary treatment means. Therefore, the unadjusted treatment sum of squares contains both treatment differences and block differences. The Type III treatment sum of squares is adjusted for blocks. This means that block effects have been removed from the sum of squares. Thus, the adjusted treatment mean square measures only differences between treatment means and random error. These concepts are revealed in the estimable functions. Table 11.1 shows the Type I estimable functions.

Table 11.1 Type I Estimable Functions for Treatments

Effect        Symbolic Expression        Coefficients for TRT 1 Effect

TRT    1      L2                          +.75
       2      L3                          –.25
       3      L4                          –.25
       4      –L2–L3–L4                   –.25

BLK    1       .333L2 + .333L3             .167
       2      –.333L2 – .333L3            –.167
       3       .333L2 + .333L4             .167
       4      –.333L2 – .333L4            –.167
       5      –.333L3 – .333L4             .167
       6       .333L3 + .333L4            –.167

The Type I estimable function for treatments (TRT) is of some interest. Consider the contrast

TRT1 − 1/4(TRT1 + TRT2 + TRT3 + TRT4)

This is often called the effect of treatment 1, or the difference between the treatment 1 mean and the mean of all treatments. Simplification gives

3/4(TRT1) − 1/4(TRT2) − 1/4 (TRT3) − 1/4(TRT4)

This expression is obtained by defining

L2 = 3/4

L3 = –1/4

L4 = –1/4

and results in the coefficients that appear in the right-hand column of Table 11.1. You can see that the Type I (unadjusted) estimate of the TRT 1 effect is also a contrast between blocks 1, 3, and 5, which contain treatment 1, and blocks 2, 4, and 6, which do not.

The least-squares means (see Output 11.10) have been “adjusted” for block effects. The corresponding estimable functions (not reproduced here) show that the LS means contain equal representation of block parameters even though individual treatments do not appear in all the blocks. Differences between LS means provide the so-called intra-block comparisons of treatments. There is information about differences between the treatment means contained in the block means that is not used in the intra-block comparisons. This is called the inter-block information.

Expected mean squares from the RANDOM statement reveal the presence of block effects in the Type I mean squares, but not in the Type III mean squares, as shown in Output 11.11. The Type I EMS for TRT contains VAR(BLK), but the Type III EMS does not.

Output 11.11 Expected Mean Squares for a Balanced Incomplete-Blocks Design

The GLM Procedure

Source Type I Expected Mean Square
 
trt Var(Error) + 0.6667 Var(blk) + Q(trt)
 
blk Var(Error) + 1.6 Var(blk)
 
Source Type III Expected Mean Square
 
trt Var(Error) + Q(trt)
 
blk Var(Error) + 1.6 Var(blk)

The MIXED procedure can be used to obtain the combined inter- and intra-block information about differences between treatment means. Run the following statements:

proc mixed data=bibd;
   class blk trt;
   model y=trt / ddfm=satterth;
   random blk;
   lsmeans trt / pdiff cl;
run;

The results appear in Output 11.12.

Output 11.12 A Mixed-Model Analysis of a Balanced Incomplete-Blocks Design

The Mixed Procedure

 

Covariance Parameter
Estimates

 

Cov Parm Estimate
 
blk 17.8543
Residual 0.8518
 

Type 3 Tests of Fixed Effects

 

Effect Num
DF
Den
DF
F Value Pr > F
 
trt 3 3.13 23.46 0.0121
 

Least Squares Means

Effect trt Estimate Standard
Error
DF t Value Pr > |t| Alpha Lower Upper
 
trt 1 6.7724 1.8337 5.96 3.69 0.0103 0.05 2.2773 11.2674
trt 2 7.6678 1.8337 5.96 4.18 0.0059 0.05 3.1728 12.1629
trt 3 10.9322 1.8337 5.96 5.96 0.0010 0.05 6.4371 15.4273
trt 4 13.6276 1.8337 5.96 7.43 0.0003 0.05 9.1325 18.1227
 

Differences of Least Squares Means

 
Effect trt _trt Estimate Standard
Error
DF t Value Pr > |t| Alpha Lower Upper
trt 1 2 -0.8955 0.9176 3.13 -0.98 0.3983 0.05 -3.7462 1.9552
trt 1 3 -4.1598 0.9176 3.13 -4.53 0.0183 0.05 -7.0105 -1.3092
trt 1 4 -6.8552 0.9176 3.13 -7.47 0.0043 0.05 -9.7059 -4.0045
trt 2 3 -3.2643 0.9176 3.13 -3.56 0.0353 0.05 -6.1150 -0.4137
trt 2 4 -5.9597 0.9176 3.13 -6.49 0.0065 0.05 -8.8104 -3.1091
trt 3 4 -2.6954 0.9176 3.13 -2.94 0.0574 0.05 -5.5461 0.1553

You can see the distinction between the intra-block and the combined inter- and intra-block comparisons of treatments by comparing results in Output 11.10 and Output 11.12. First of all, the TRT LS means are slightly different in the two output tables. Also, the confidence interval for the difference between TRT 1 and TRT 2 in Output 11.10 is (–3.802709, 2.102709), whereas the confidence interval in Output 11.12 is (–3.7462, 1.9552). The confidence interval using the combined information in Output 11.12 is slightly narrower. However, this can be misleading. The standard error in Output 11.12 does not take into account the variation induced by estimating the variance-covariance matrix to obtain the estimated GLS estimates of differences between treatment means. If you use DDFM=KENWARDROGER in the MODEL statement, you will get a better assessment of the true error of estimation.
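
A minimal sketch of the Kenward-Roger version of the analysis (identical to the statements above except for the DDFM= option) is

proc mixed data=bibd;
   class blk trt;
   model y=trt / ddfm=kenwardroger;
   random blk;
   lsmeans trt / pdiff cl;
run;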

11.4 A Crossover Design with Residual Effects

Crossover designs are used in animal nutrition and pharmaceutical studies to compare two or more treatments (diets or drugs). The treatments are administered sequentially to each subject over a set of time periods. This enables the comparison of treatments on a within-subjects basis. However, there is a possibility that the response obtained after a particular time period might be influenced by the treatment assigned not only in that period but also in previous periods. If so, then the response contains residual effects from the previous periods. Some authors call these “carry-over” effects. Certain crossover designs permit the residual effects to be estimated, and thus to be effectively removed from estimates of treatment means and comparisons of means.

Cochran and Cox (1957) present two 3×3 Latin squares as a design for estimating, in a study of milk yields, the residual effects of the treatment applied in the preceding period. The treatment allocation is shown in the table below. The columns of the two squares contain the six possible sequences.

                Square 1                Square 2
                  Cow                     Cow
              I     II    III         IV     V     VI

Period  1     A     B     C            A     B     C
        2     B     C     A            C     A     B
        3     C     A     B            B     C     A

Output 11.13 contains data from a study that was conducted to compare the effects on heart rate of three treatments: a test drug, a standard drug, and a placebo. Treatments were assigned in the six possible sequences to four patients each. The treatment design for the data in Output 11.13 is equivalent to the Cochran and Cox design in the table above, with sequences A–F in Output 11.13 corresponding to cows I–VI in the table, respectively.

Heart rate was measured one hour following the administration of treatment in each of three visits. The visits are labeled 2, 3, and 4, because visit 1 was a preliminary visit for baseline data. Thus, in the general terminology of crossover designs, period 1 is visit 2, period 2 is visit 3, and period 3 is visit 4. Baseline heart rate was measured, but it is not used in the illustrative analysis.

A model for the data is

yijk = μ + αi + dj + βk + τl(ik) + ρm(ik) + eijk

where αi is the effect of sequence i, dj is the random effect of patient j, βk is the effect of visit k, τl(ik) is the direct effect of treatment l, ρm(ik) is the residual effect of treatment m, and eijk is a random effect associated with patient j in visit k. The subscript l(ik) on the treatment direct effect indicates that the treatment (l) is a function of the visit (k) and the sequence (i). The same is true of the treatment residual effect subscript m(ik).

When using PROC GLM to analyze data from a crossover design, the sequence, patient, period, and direct treatment effects can be incorporated into the model with the dummy variables that result from using a CLASS statement. However, it is more convenient to use explicitly created covariates in the model for the residual effects. In the data set for the heart rate data, we create covariates for the standard and test drug residual effects, named RESIDS and RESIDT, respectively. Their values in the first period (visit 2) are zero because there is no period prior to the first period that would contribute a residual effect. In periods 2 and 3 (visits 3 and 4), the values of RESIDS and RESIDT are 0 or ±1 depending on the treatment in the preceding visit. This particular coding provides estimates of the residual effects corresponding to those prescribed by Cochran and Cox (1957). For example, patient number 2 is in sequence F (test, placebo, standard). The values of RESIDS and RESIDT are both 0 in the first period (visit 2). Patient 2 received the test drug in period 1, so in period 2 (visit 3), the covariates have values RESIDS=0 and RESIDT=1. This specifies that the residual effect ρT for test is contained in the observation on patient 2 in period 2. In period 3 (visit 4), the covariates both have values of –1. This coding specifies a sum-to-zero constraint on the residual effects. Thus, the residual effect ρP of the placebo satisfies the equation ρP = −ρT − ρS, and hence the residual effect of the placebo can be represented with –1 times the residual effects of test and standard.
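
If the covariates were not already in the data set, they could be derived from the treatment given in the preceding visit. The following DATA step is only a minimal sketch of one way to do this; it assumes the observations are sorted by PATIENT and VISIT and that the DRUG values are stored exactly as printed in Output 11.13.

data hrtrate;
   set hrtrate;
   by patient;
   length prevdrug $ 8;
   prevdrug = lag(drug);                  /* treatment in the preceding visit */
   if first.patient then prevdrug = ' ';  /* no residual effect in period 1   */
   resids = (prevdrug='standard') - (prevdrug='placebo');
   residt = (prevdrug='test')     - (prevdrug='placebo');
run;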

Output 11.13 Data for a Crossover Design with Residual Effects

PATIENT   SEQUENCE VISIT   BASEHR HR     DRUG    RESIDT    RESIDS
 
1 B 2 86 86     placebo     0     0
1 B 3 86 106     test    -1    -1
1 B 4 62 79     standard     1     0
2 F 2 48 66     test     0     0
2 F 3 58 56     placebo     1     0
2 F 4 74 79     standard    -1    -1
3 B 2 78 84     placebo     0     0
3 B 3 78 76     test    -1    -1
3 B 4 82 91     standard     1     0
4 D 2 66 79     standard     0     0
4 D 3 72 100     test     0     1
4 D 4 90 82     placebo     1     0
5 C 2 74 74     test     0     0
5 C 3 90 71     standard     1     0
5 C 4 66 62     placebo     0     1
6 B 2 62 64     placebo     0     0
6 B 3 74 90     test    -1    -1
6 B 4 58 85     standard     1     0
7 A 2 94 75     standard     0     0
7 A 3 72 82     placebo     0     1
7 A 4 100 102     test    -1    -1
8 A 2 54 63     standard     0     0
8 A 3 54 58     placebo     0     1
8 A 4 66 62     test    -1    -1
9 D 2 82 91     standard     0     0
9 D 3 96 86     test     0     1
9 D 4 78 88     placebo     1     0
10 C 2 86 82     test     0     0
10 C 3 70 71     standard     1     0
10 C 4 58 62     placebo     0     1
11 F 2 82 80     test     0     0
11 F 3 80 78     placebo     1     0
11 F 4 72 75     standard    -1    -1
12 E 2 96 90     placebo     0     0
12 E 3 92 93     standard    -1    -1
12 E 4 82 88     test     0     1
13 D 2 78 87     standard     0     0
13 D 3 72 80     test     0     1
13 D 4 76 78     placebo     1     0
14 F 2 98 86     test     0     0
14 F 3 86 86     placebo     1     0
14 F 4 70 79     standard    -1    -1
15 A 2 86 71     standard     0     0
15 A 3 66 70     placebo     0     1
15 A 4 74 90     test    -1    -1
16 E 2 86 86     placebo     0     0
16 E 3 90 103     standard    -1    -1
16 E 4 82 86     test     0     1
17 A 2 66 83     standard     0     0
17 A 3 82 86     placebo     0     1
17 A 4 86 102     test    -1    -1
18 F 2 66 82     test     0     0
18 F 3 78 80     placebo     1     0
18 F 4 74 95     standard    -1    -1
19 E 2 74 80     placebo     0     0
19 E 3 78 79     standard    -1    -1
19 E 4 70 74     test     0     1
20 B 2 66 70     placebo     0     0
20 B 3 74 62     test    -1    -1
20 B 4 62 67     standard     1     0
21 C 2 82 90     test     0     0
21 C 3 90 103     standard     1     0
21 C 4 76 82     placebo     0     1
22 C 2 82 82     test     0     0
22 C 3 66 83     standard     1     0
22 C 4 90 82     placebo     0     1
23 E 2 82 66     placebo     0     0
23 E 3 74 87     standard    -1    -1
23 E 4 82 82     test     0     1
24 D 2 72 75     standard     0     0
24 D 3 82 86     test     0     1
24 D 4 74 82     placebo     1     0

The following SAS statements can be used to construct an analysis of variance and parameter estimates similar to those proposed by Cochran and Cox (1957):

proc glm data=hrtrate;
   class sequence patient visit drug;
   model hr = sequence patient(sequence) visit drug
      resids residt / solution;
   random patient(sequence);
run;

ANOVA results appear in Output 11.14.

Output 11.14 ANOVA for a Crossover Design

Source DF Sum of
Squares
Mean Square F Value Pr > F
 
Model 29 6408.694444 220.989464 3.91 <.0001
Error 42 2372.583333 56.490079    
Corrected Total 71 8781.277778      
 
R-Square Coeff Var Root MSE HR Mean
0.729813 9.301326 7.515988 80.80556
Source DF Type I SS Mean Square F Value Pr > F
 
SEQUENCE 5 508.944444 101.788889 1.80 0.1333
PATIENT(SEQUENCE) 18 4692.333333 260.685185 4.61 <.0001
VISIT 2 146.777778 73.388889 1.30 0.2835
DRUG 2 668.777778 334.388889 5.92 0.0054
resids 1 391.020833 391.020833 6.92 0.0119
residt 1 0.840278 0.840278 0.01 0.9035
 
Source DF Type III SS Mean Square F Value Pr > F
 
SEQUENCE 5 701.183333 140.236667 2.48 0.0466
PATIENT(SEQUENCE) 18 4692.333333 260.685185 4.61 <.0001
VISIT 2 146.777778 73.388889 1.30 0.2835
DRUG 2 343.950000 171.975000 3.04 0.0583
resids 1 309.173611 309.173611 5.47 0.0241
residt 1 0.840278 0.840278 0.01 0.9035

The desired ANOVA table is constructed as follows:

Source of Variation                                        DF        SS
Sequence                                                    5     508.94   (Type I)
Patient(Sequence)                                          18    4692.33   (Type I)
Visits                                                      2     146.78   (Type III)
Direct effect of drugs (adjusted for residual effects)      2     343.95   (Type III)
Residual effects (adjusted)                                 2     391.86   (Type I SS RESIDS + Type I SS RESIDT)

Expected mean squares shown in Output 11.15 show that appropriate tests for VISIT, DRUG, and the carry-over effect covariates use the residual mean square as an error term. A test for SEQUENCE would use PATIENT(SEQUENCE) as the error term.

Output 11.15 Expected Mean Squares for a Crossover Design

Source Type III Expected Mean Square
 
SEQUENCE Var(Error) + 2.76 Var(PATIENT(SEQUENCE)) + Q(SEQUENCE)
 
PATIENT(SEQUENCE) Var(Error) + 3 Var(PATIENT(SEQUENCE))
 
VISIT Var(Error) + Q(VISIT)
 
DRUG Var(Error) + Q(DRUG)
 
resids Var(Error) + Q(resids)
 
residt Var(Error) + Q(residt)
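
To have GLM compute the F-test for SEQUENCE against PATIENT(SEQUENCE) directly, you can add a TEST statement to the PROC GLM step shown earlier (or specify the TEST option in the RANDOM statement). A minimal sketch:

proc glm data=hrtrate;
   class sequence patient visit drug;
   model hr=sequence patient(sequence) visit drug resids residt;
   test h=sequence e=patient(sequence);
run;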

The effect of SEQUENCE is clearly not significant, since the F-ratio would be less than 1 using either a Type I or a Type III mean square in the numerator. The Type III test for DRUG has a significance level of p=0.0583. The Type III mean square for DRUG has been adjusted for the residual effects. The Type I mean square for DRUG is not adjusted for the residual effects, and an F-test based on it has a significance probability of p=0.0054. Thus, results from tests for DRUG depend on whether residual effects have been removed or not. Estimates of the direct and residual effect parameters can be obtained from Output 11.16.

Output 11.16 Parameter Estimates for a Crossover Design

Parameter   Estimate   Standard
Error
t Value Pr > |t|
 
Intercept   82.06250000 B 4.72870558 17.35 <.0001
SEQUENCE A 6.20833333 B 6.23192824 1.00 0.3249
SEQUENCE B -19.33333333 B 6.23192824 -3.15 0.0030
SEQUENCE C -0.47916667 B 6.23192824 -0.08 0.9391
SEQUENCE D -1.81250000 B 6.23192824 -0.29 0.7726
SEQUENCE E -5.79166667 B 6.23192824 -0.93 0.3580
SEQUENCE F 0.00000000 B  .           .    .    
PATIENT(SEQUENCE)   7 A -4.00000000 B 6.13677871 -0.65 0.5181
PATIENT(SEQUENCE) 8 A -29.33333333 B 6.13677871 -4.78 <.0001
PATIENT(SEQUENCE) 15 A -13.33333333 B 6.13677871 -2.17 0.0355
PATIENT(SEQUENCE) 17 A 0.00000000 B  .           .    .    
...
PATIENT(SEQUENCE) 2 F -18.66666667 B 6.13677871 -3.04 0.0040
PATIENT(SEQUENCE) 11 F -8.00000000 B 6.13677871 -1.30 0.1995
PATIENT(SEQUENCE) 14 F -2.00000000 B 6.13677871 -0.33 0.7461
PATIENT(SEQUENCE) 18 F 0.00000000 B  .           .    .    
VISIT 2 -2.58333333 B 2.16967892 -1.19 0.2405
VISIT 3 0.75000000 B 2.16967892 0.35 0.7313
VISIT 4 0.00000000 B  .           .    .    
DRUG standard 2.31250000 B 2.42577478 0.95 0.3459
DRUG test 5.93750000 B 2.42577478 2.45 0.0186
DRUG placebo 0.00000000 B  .           .    .    
resids   -4.39583333   1.87899706 -2.34 0.0241
residt   0.22916667   1.87899706 0.12 0.9035

First of all, the residual effects presented by Cochran and Cox (1957) are obtained from the parameter estimates for RESIDS and RESIDT. The values are

STD: –4.396

TST: 0.229

PCB: – (–4.396 + 0.229) = 4.167

Notice that these estimates come from the sum-to-zero coding for the residual effect dummy variables.
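
If you prefer to have GLM print these values directly, ESTIMATE statements for the covariates can be added to the PROC GLM step; a minimal sketch (the labels are arbitrary) is

estimate 'RESIDUAL EFFECT OF STD' resids  1;
estimate 'RESIDUAL EFFECT OF TST' residt  1;
estimate 'RESIDUAL EFFECT OF PCB' resids -1 residt -1;

The first two simply reproduce the RESIDS and RESIDT parameter estimates, and the third applies the sum-to-zero constraint to give the placebo residual effect, 4.167.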

The direct treatment effects reported by Cochran and Cox (1957) can be obtained from the DRUG parameter estimates according to the following equations:

STD: –0.4375 = 2.3125 – (1/3)(2.3125 + 5.9375 + 0.0000)

TST:  3.1875 = 5.9375 – (1/3)(2.3125 + 5.9375 + 0.0000)

PCB: –2.7500 = 0.0000 – (1/3)(2.3125 + 5.9375 + 0.0000)

Thus, the direct effects can be obtained from the following ESTIMATE statements:

estimate 'DIRECT EFFECT OF STD'
          drug  2 -1 -1 / divisor=3;
estimate 'DIRECT EFFECT OF TST'
          drug -1  2 -1 / divisor=3;
estimate 'DIRECT EFFECT OF PCB'
          drug -1 -1  2 / divisor=3;

Results from these ESTIMATE statements appear in Output 11.17.

Output 11.17 Direct Effect Estimates

The GLM Procedure

 

Parameter Estimate Standard
Error
t Value Pr > |t|
 
DIRECT EFFECT OF STD -0.43750000 1.40052172 -0.31 0.7563
DIRECT EFFECT OF TST 3.18750000 1.40052172 2.28 0.0280
DIRECT EFFECT OF PCB -2.75000000 1.40052172 -1.96 0.0562

The direct effect means reported by Cochran and Cox (1957) are equal to the overall mean 80.8056 (printed as HR mean in Output 11.14) added to the direct effects. They are also equal to the GLM least-squares means, obtained from the following statement:

lsmeans drug / pdiff cl e;

The results appear in Output 11.18. You can see from the estimable functions that the LS means contain the INTERCEPT, and average across the SEQUENCE, PATIENT(SEQUENCE), and VISIT parameters. Thus, the correct standard error of these LS means would contain variance due to PATIENT(SEQUENCE). However, this variance is not contained in the standard error computed by PROC GLM for the LS means. (That is why we specified the STDERR option in the LSMEANS statement.) As a consequence, the confidence intervals for LS means displayed in Output 11.18 are not valid. However, the confidence intervals for the differences between LS means in Output 11.18 are valid because the INTERCEPT, SEQUENCE, PATIENT(SEQUENCE), and VISIT parameters would drop out of the differences.

Output 11.18 Least-Squares Means for a Crossover Design

Least Squares Means

 

Coefficients for DRUG Least Square Means

 

Effect   DRUG Level standard test placebo
 
Intercept   1 1 1
SEQUENCE A 0.16666667 0.16666667 0.16666667
...
SEQUENCE F 0.16666667 0.16666667 0.16666667
PATIENT(SEQUENCE) 7 A 0.04166667 0.04166667 0.04166667
PATIENT(SEQUENCE) 8 A 0.04166667 0.04166667 0.04166667
PATIENT(SEQUENCE) 16 A 0.04166667 0.04166667 0.04166667
PATIENT(SEQUENCE) 18 A 0.04166667 0.04166667 0.04166667
...
PATIENT(SEQUENCE) 12 F 0.04166667 0.04166667 0.04166667
PATIENT(SEQUENCE) 17 F 0.04166667 0.04166667 0.04166667
PATIENT(SEQUENCE) 20 F 0.04166667 0.04166667 0.04166667
PATIENT(SEQUENCE) 24 F 0.04166667 0.04166667 0.04166667
VISIT 2 0.33333333 0.33333333 0.33333333
VISIT 3 0.33333333 0.33333333 0.33333333
VISIT 4 0.33333333 0.33333333 0.33333333
DRUG standard 1 0 0
DRUG test 0 1 0
DRUG placebo 0 0 1
resids   0 0 0
residt   0 0 0

 

Least Squares Means for Effect DRUG

 

DRUG HR LSMEAN 95% Confidence Limits
 
standard 80.368056 77.023853 83.712258
test 83.993056 80.648853 87.337258
placebo 78.055556 74.711353 81.399758
 
i j Difference
Between
Means
95% Confidence Limits for LSMean(i)-LSMean(j)
 
1 2 -3.625000 -8.520412 1.270412
1 3 2.312500 -2.582912 7.207912
2 3 5.937500 1.042088 10.832912

PROC MIXED can be used to analyze the crossover design data. Run the following statements:

proc mixed data=hrtrate order=internal;
   class sequence patient visit drug;
   model hr=sequence visit drug resids residt / solution ddfm=satterth;
   random patient(sequence);
   lsmeans drug / pdiff cl e;
run;

Edited results appear in Output 11.19.

Output 11.19 Partial Mixed-Model Results for a Crossover Design

The Mixed Procedure

 

Covariance Parameter Estimates

 

Cov Parm Estimate
 
PATIENT(SEQUENCE) 68.0650
Residual 56.4901

 

Type 3 Tests of Fixed Effects

Effect Num
DF
Den
DF
F Value Pr > F
 
SEQUENCE 5 18.7 0.58 0.7165
VISIT 2 42 1.30 0.2835
DRUG 2 42 3.04 0.0583
resids 1 42 5.47 0.0241
residt 1 42 0.01 0.9035

 

Least Squares Means

 

Effect DRUG Estimate Standard
Error
DF t Value Pr > |t| Alpha Lower Upper
 
DRUG standard 80.3681 2.3626 38 34.02 <.0001 0.05 75.5852 85.1510
DRUG test 83.9931 2.3626 38 35.55 <.0001 0.05 79.2102 88.7760
DRUG placebo 78.0556 2.3626 38 33.04 <.0001 0.05 73.2727 82.8385

 

Differences of Least Squares Means

 

Effect DRUG _DRUG Estimate Standard
Error
DF t Value Pr > |t| Alpha
DRUG standard test -3.6250 2.4258 42 -1.49 0.1426 0.05
DRUG standard placebo 2.3125 2.4258 42 0.95 0.3459 0.05
DRUG test placebo 5.9375 2.4258 42 2.45 0.0186 0.05

 

Differences of Least Squares Means

 

Effect DRUG _DRUG Lower Upper
 
DRUG standard test -8.5204 1.2704
DRUG standard placebo -2.5829 7.2079
DRUG test placebo 1.0421 10.8329

The test of significance for DRUG in “Type 3 Tests of Fixed Effects” in Output 11.19 is the same as the Type III test from GLM in Output 11.14. Likewise, the least-squares means are equal in the two analyses. This illustrates that ordinary least-squares analyses, as performed by GLM, can be equivalent to generalized least-squares analyses, as performed by MIXED. This phenomenon occurs in this example because the within-patients effects are orthogonal to the between-patients effects. Notice that the confidence intervals for differences between LS means are the same in Outputs 11.18 and 11.19, but the confidence intervals for the LS means themselves are wider in Output 11.19 than in Output 11.18, because PROC MIXED computes standard errors of LS means that incorporate the PATIENT(SEQUENCE) variance.

11.5 Models for Experiments with Qualitative and Quantitative Variables

The material in this section is related to the discussions of regression analysis in Chapter 2 and analysis of covariance in Chapter 7. This section concerns details of certain models that contain dummy variables generated from the CLASS statement along with a continuous variable. These models combine several regression equations in a single model. Of particular interest are cases for which the regressions have a common intercept. Models of this type are frequently used, for example, in relative potency and relative bioavailability studies (Littell et al. 1997).

Many experiments involve both qualitative and quantitative factors. For example, the tensile strength (TS) of a monofilament fiber depends on the amount (AMT) of a chemical used in the manufacturing process. This chemical can be obtained from three different sources (SOURCE), with values A, B, or C. SOURCE is a qualitative variable and AMT is a quantitative variable. Measurements of TS were obtained from samples from different amounts and sources. The SAS data set named MONOFIL appears in Output 11.20.

Output 11.20 Data for an Experiment with Qualitative and Quantitative Variables

Obs SOURCE AMT TS
 
1 A 1 11.5
2 A 2 13.8
3 A 3 14.4
4 A 4 16.8
5 A 5 18.7
6 B 1 10.8
7 B 2 12.3
8 B 3 13.7
9 B 4 14.2
10 B 5 16.6
11 C 1 13.1
12 C 2 16.2
13 C 3 19.0
14 C 4 22.9
15 C 5 26.5

A simple linear regression model relating TS to AMT can be written for each source:

TS = αA + βA AMT + ε  (SOURCE A)

TS = αB + βB AMT + ε  (SOURCE B)

TS = αC + βC AMT + ε  (SOURCE C)

The parameters αA and βA are the intercept and slope, respectively, for SOURCE=A.

The following statements produce the analysis of variance and parameter estimates in Output 11.21.

proc glm data=monofil;
   class source;
   model ts=source amt source*amt / solution;
run;

Output 11.21 A Model with Main Effects and Interactions

The GLM Procedure

 

    Dependent Variable: ts

 
Source DF Sum of
Squares
Mean Square F Value Pr > F
 
Model 5 258.7273333 51.7454667 263.71 <.0001
 
Error 9 1.7660000 0.1962222    
 
Corrected Total 14 260.4933333      
 
R-Square Coeff Var Root MSE ts Mean
 
0.993221 2.762805 0.442970 16.03333
 
Source DF Type I SS Mean Square F Value Pr > F
 
Source 2 98.0013333 49.0006667 249.72 <.0001
amt 1 138.2453333 138.2453333 704.53 <.0001
amt*source 2 22.4806667 11.2403333 57.28 <.0001
 
Source DF Type III SS Mean Square F Value Pr > F
 
Source 2 0.0702424 0.0351212 0.18 0.8390
amt 1 138.2453333 138.2453333 704.53 <.0001
amt*source 2 22.4806667 11.2403333 57.28 <.0001
 
Parameter Estimate Standard
Error
t Value Pr > |t|
 
Intercept 9.490000000 B 0.46459062 20.43 <.0001
source     A 0.330000000 B 0.65703036 0.50 0.6275
source     B -0.020000000 B 0.65703036 -0.03 0.9764
source     C 0.000000000 B  .           .    .    
amt 3.350000000 B 0.14007934 23.92 <.0001
amt*source A -1.610000000 B 0.19810211 -8.13 <.0001
amt*source B -2.000000000 B 0.19810211 -10.10 <.0001
amt*source C 0.000000000 B  .           .    .    

 

NOTE: The X'X matrix has been found to be singular, and a generalized inverse was used to solve the normal equations. Terms whose estimates are followed by the letter 'B' are not uniquely estimable.

These parameter estimates pertain to the integrated model

TS = αC + α′A DA + α′B DB + βC AMT + β′A DA AMT + β′B DB AMT + ε

The parameters α′ and β′ are defined as

α′A = αA − αC    α′B = αB − αC
β′A = βA − βC    β′B = βB − βC

The variable DA is a dummy variable equal to 1 for SOURCE=A and equal to 0 otherwise, and DB has a corresponding definition with respect to SOURCE=B. Thus, the regression models for the three sources are

TS = (αC + α′A) + (βC + β′A) AMT + ε  (SOURCE A)
TS = (αC + α′B) + (βC + β′B) AMT + ε  (SOURCE B)
TS = αC + βC AMT + ε                  (SOURCE C)

Therefore, the fitted equations are

TS = 9.49 + 0.33 + (3.35 - 1.61) AMT (SOURCE A)
      = 9.82 + 1.74 AMT
TS = 9.49 - 0.02 + (3.35 - 2.00) AMT (SOURCE B)
      = 9.47 + 1.35 AMT
TS = 9.49 + 3.35 AMT                         (SOURCE C)

The GLM parameter estimates, in effect, treat the regression line for SOURCE=C as a reference line, and the parameters α′A, α′B, β′A, and β′B are parameters for lines A and B minus the corresponding parameters for line C. The AMT*SOURCE parameters β′A and β′B measure differences between the slopes for regression lines A and B and the slope of line C, respectively. Thus, a test that these parameters are 0 is a test that the lines are parallel, that is, that they have equal slopes. The appropriate statistic is the F=57.28 for the AMT*SOURCE effect, which has a significance probability of p<0.0001.
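
To see the correspondence between the CLASS-statement coding and the dummy-variable formulation explicitly, you could construct the dummy variables yourself and fit the same integrated model with PROC REG. This is only an illustrative sketch; the data set name MONOFIL_D and the constructed variable names are assumptions.

data monofil_d;
   set monofil;
   da = (source='A');    db = (source='B');    /* 0-1 dummy variables      */
   da_amt = da*amt;      db_amt = db*amt;      /* slope-difference columns */
run;
proc reg data=monofil_d;
   model ts = da db amt da_amt db_amt;
run;

Because this parameterization is full rank, the estimates from this fit reproduce the B-flagged estimates in Output 11.21: the intercept corresponds to αC, the DA coefficient to α′A, and so on.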

Caution is advised in using the Type III F-test for SOURCE. It is a test of the equality of the intercepts (H0: αA = αB = αC), which probably has no practical interpretation because the intercepts are simply extrapolations of the lines to AMT=0. The Type I F-test, on the other hand, tests the equality of the midpoints of the regression lines (H0: αA + 3βA = αB + 3βB = αC + 3βC).

You can compare two sources at a given amount with an ESTIMATE statement. Suppose you want to compare SOURCE=A with SOURCE=B at AMT=3.5. This difference is

(αA + βA(3.5)) − (αB + βB(3.5))
  = ((αC + α′A) + (βC + β′A)(3.5)) − ((αC + α′B) + (βC + β′B)(3.5))
  = α′A − α′B + (β′A − β′B)(3.5)

So the appropriate ESTIMATE statement is

estimate 'A vs B at AMT=3.5'
   source 1 -1 0
   source*amt 3.5 -3.5 0;

The results appear in Output 11.22.

Output 11.22 The Difference between SOURCE=A and SOURCE=B at AMT=3.5

The GLM Procedure

 

 Dependent Variable: ts

Parameter Estimate Standard
Error
t Value Pr > |t|
 
A vs B at AMT=3.5 1.71500000 0.29715316 5.77 0.0003

Suppose TS also is measured for AMT=0. This variation of the experiment is commonly mishandled by data analysts. Since AMT=0 means there is no chemical, the intercepts for the models are all equal, αA = αB = αC. Thus, a correct analysis should provide equal estimates of the intercepts. The regressions can be written simultaneously as

TS = α + γA DA AMT + γB DB AMT + γC DC AMT + ε

where DA is a dummy variable equal to 1 for SOURCE=A and equal to 0 otherwise, and DB and DC have corresponding definitions with respect to SOURCE=B and SOURCE=C. Use PROC GLM to create DA, DB, and DC by including the SOURCE variable in a CLASS statement.

Look at the data set MONOFIL2 printed in Output 11.23. The value C is arbitrarily assigned to SOURCE when AMT=0.

Output 11.23 Data with AMT=0

Qual and Quant Variables

 

Obs source      amt ts
 
1 A      1 11.5
2 A      2 13.8
3 A      3 14.4
4 A      4 16.8
5 A      5 18.7
6 B      1 10.8
7 B      2 12.3
8 B      3 13.7
9 B      4 14.2
10 B      5 16.6
11 C      1 13.1
12 C      2 16.2
13 C      3 19.0
14 C      4 22.9
15 C      5 26.5
16 C      0 10.1
17 C      0 10.2
18 C      0 9.8
19 C      0 9.9
20 C      0 10.2

The following statements produce Output 11.24:

proc glm data=monofil2;
   class source;
   model ts=amt*source / solution;
run;

Output 11.24 Parameter Estimates for Data with AMT=0

The GLM Procedure

 

    Dependent Variable: ts

 
Source DF Sum of
Squares
Mean Square F Value Pr > F
 
Model 3 393.0051791 131.0017264 903.34 <.0001
 
Error 16 2.3203209 0.1450201    
 
Corrected Total 19 395.3255000      
 
R-Square Coeff Var Root MSE ts Mean
 
0.994131 2.619986 0.380815 14.53500
 
Source DF Type I SS Mean Square F Value Pr > F
 
amt*source 3 393.0051791 131.0017264 903.34 <.0001
 
 
Source DF Type III SS Mean Square F Value Pr > F
 
amt*source 3 393.0051791 131.0017264 903.34 <.0001
Parameter Estimate Standard
Error
t Value Pr > |t|
 
Intercept 9.882352941 0.13699380 72.14 <.0001
amt*source A 1.722994652 0.06350310 27.13 <.0001
amt*source B 1.237540107 0.06350310 19.49 <.0001
amt*source C 3.242994652 0.06350310 51.07 <.0001

Parameter estimates in Output 11.24 yield the three prediction equations

TS = 9.88 + 1.72 AMT      (SOURCE A)
TS = 9.88 + 1.24 AMT      (SOURCE B)
TS = 9.88 + 3.24 AMT      (SOURCE C)

The relative effect of one source compared to another can be measured by the ratio of the slopes of the regression lines. For example, the strength of SOURCE B relative to SOURCE A is the ratio 1.24/1.72 = 0.72. This means that one unit of the chemical from SOURCE B has the same effect on tensile strength as 0.72 units of the chemical from SOURCE A.

Similar models are used in other types of applications. The potency of one drug relative to another in a drug study, or the bioavailability of one nutrient relative to another in a nutrition study, is measured in the same way.
