Chapter 6 Understanding Linear Models Concepts

6.1 Introduction

6.2 The Dummy-Variable Model

6.2.1 The Simplest Case: A One-Way Classification

6.2.2 Parameter Estimates for a One-Way Classification

6.2.3 Using PROC GLM for Analysis of Variance

6.2.4 Estimable Functions in a One-Way Classification

6.3 Two-Way Classification: Unbalanced Data

6.3.1 General Considerations

6.3.2 Sums of Squares Computed by PROC GLM

6.3.3 Interpreting Sums of Squares in Reduction Notation

6.3.4 Interpreting Sums of Squares in the μ-Model Notation

6.3.5 An Example of Unbalanced Two-Way Classification

6.3.6 The MEANS, LSMEANS, CONTRAST, and ESTIMATE Statements in a Two-Way Layout

6.3.7 Estimable Functions for a Two-Way Classification

6.3.7.1 The General Form of Estimable Functions

6.3.7.2 Interpreting Sums of Squares Using Estimable Functions

6.3.7.3 Estimating Estimable Functions

6.3.7.4 Interpreting LSMEANS, CONTRAST, and ESTIMATE Results Using Estimable Functions

6.3.8 Empty Cells

6.4 Mixed-Model Issues

6.4.1 Proper Error Terms

6.4.2 More on Expected Mean Squares

6.4.3 An Issue of Model Formulation Related to Expected Mean Squares

6.5 ANOVA Issues for Unbalanced Mixed Models

6.5.1 Using Expected Mean Squares to Construct Approximate F-Tests for Fixed Effects

6.6 GLS and Likelihood Methodology for the Mixed Model

6.6.1 An Overview of Generalized Least Squares Methodology

6.6.2 Some Practical Issues about Generalized Least Squares Methodology

6.1 Introduction

The purpose of this chapter is to provide detailed information about how PROC GLM and PROC MIXED work for certain applications. This is certainly not complete documentation; rather, it provides enough information for a basic understanding and a basis for further reading.

Both GLM and MIXED utilize “dummy” variables, which are also called “indicator” variables in mathematics. They are created whenever a CLASS statement is specified. The primary distinction between GLM and MIXED in this regard is that MIXED separates the sets of dummy variables into a group for fixed effects and a group for random effects, whereas the primary computations of GLM treat the dummy variables as representing fixed effects. The general linear model approach uses dummy variables in a regression model. Although this technique is useful in all situations, it is primarily applied to analysis of variance with unbalanced data, where the direct computation of sums of squares fails, and to analysis of covariance and associated techniques.

While the dummy variable approach is capable of handling a vast array of applications, it also presents some complications that must be overcome. Two of the principal complications regarding fixed effects are

❏ specifying model parameters and their estimates

❏ setting up meaningful combinations of parameters for testing and estimation.

Both of these are concerned with estimable functions. These complications must be dealt with in computer programs using general linear models. The purpose of this chapter is to explain, with the use of fairly simple examples, how the GLM procedure deals with the complications. A more technical description of GLM features is given in the SAS/STAT User’s Guide, Version 8, Volume 2.

This chapter describes the essence of general linear model and mixed-model computations. It is more or less self-contained, and you will notice some overlap with previous and subsequent chapters. In particular, the CONTRAST and ESTIMATE statements are discussed in Chapter 3, “Analysis of Variance for Balanced Data,” and the RANDOM statement is discussed in Chapter 4, “Analyzing Data with Random Effects.” This present chapter delves more deeply into some of the same topics. Section 6.2 provides essential concepts of using dummy variables in the context of a one-way classification. Section 6.3 does the same for a two-way classification with both factors fixed. Then Section 6.4 discusses technical issues for mixed models.

6.2 The Dummy-Variable Model

This section presents the analysis-of-variance model using dummy variables, methods for specifying model parameters, and the methods used by PROC GLM. For simplicity, an analysis-of-variance model with one-way classification that results from a completely randomized design illustrates the discussion. In application, however, such a structure might be adequately (and more efficiently) analyzed by using the ANOVA procedure (see Section 3.4, “Analysis of One-Way Classification of Data”).

6.2.1 The Simplest Case: A One-Way Classification

Data for the one-way classification consist of measurements classified according to a one-dimensional criterion. An example of this kind of structure is a set of student exam scores, where each student is taught by one of three teachers. The exam scores are thus grouped or classified according to TEACHER. The most straightforward model for data of this type is

yij = μi + εij

where

yij

represents the jth measurement in the ith group.

μi

represents the population mean for the ith group.

εij

represents the random error with mean=0 and variance=σ2.

i = 1,..., t

where t equals the number of groups.

j = 1,..., ni

where ni equals the number of observations in the ith group.

This is called the means or μ-model because it uses the means μ1,..., μt as the basic parameters in the mathematical expression for the model (Hocking and Speed 1975). The corresponding estimates of these parameters are

μ̂1 = ȳ1., ..., μ̂t = ȳt.

where ȳi. = (Σj yij)/ni is the mean of the ni observations in group i.

In these situations, the statistical inference of interest is often about differences between the means of the form (μi − μi′) or between the means and some reference or baseline value μ. Therefore, many statistical textbooks present a model for the one-way structure that employs these differences as basic parameters. This is the familiar analysis-of-variance model illustrated in Section 2.3.4:

yij = μ + τi + εij

where μ equals the reference value and

τi = μi − μ

Thus, the means can be expressed as

μi = μ + τi

This relates the set of t means μ1,..., μt to a set of t+1 parameters, μ, τ1,..., τt. Therefore, this model is said to be overspecified. Consequently, the parameters μ, τ1,..., τt are not well defined. For any set of values of μ1,..., μt, there are infinitely many choices for μ, τ1,..., τt that satisfy the basic equations μi = μ + τi, i = 1,..., t. The choice may depend on the situation at hand, or it may not be necessary to fully define the parameters.
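This overspecification is easy to demonstrate numerically. The following sketch (plain Python, not SAS, using hypothetical group means) shows that any choice of μ reproduces the same μi:

```python
# Hypothetical group means mu_1, ..., mu_t; any values work the same way.
mus = [8.0, 6.0, 8.0]

# For every arbitrary choice of mu, tau_i = mu_i - mu satisfies mu_i = mu + tau_i,
# so the mapping from (mu, tau_1, ..., tau_t) to the means is many-to-one.
for mu in (0.0, 1.5, -3.0):
    taus = [m - mu for m in mus]
    assert all(abs((mu + tau) - m) < 1e-12 for tau, m in zip(taus, mus))
```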

For the implementation of the dummy-variable model, the analysis-of-variance model

yij = μ + τi + εij

is rewritten as a regression model

yij = μ + τ1x1 + ... + τtxt + εij

where the dummy variables x1,...,xt are defined as follows:

x1 equals 1 for an observation in group 1 and 0 otherwise.
x2 equals 1 for an observation in group 2 and 0 otherwise.
·  
·  
·  
xt equals 1 for an observation in group t and 0 otherwise.
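The coding rules above can be sketched in a few lines. This is a minimal illustration in Python (not SAS syntax), using a hypothetical list of group labels; PROC GLM builds the same columns internally from a CLASS variable:

```python
# Hypothetical classification variable with t = 3 levels.
groups = ["A", "A", "B", "C", "C"]
levels = sorted(set(groups))   # levels in sorted order, as PROC GLM uses

# Each row: an intercept 1 followed by one dummy column per level.
X = [[1] + [1 if g == lev else 0 for lev in levels] for g in groups]

# Exactly one dummy is 1 in each row, so the intercept column equals
# the sum of the dummy columns.
for row in X:
    assert row[0] == sum(row[1:])
```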

In matrix notation, the model equations for the data become

Y = Xβ + ε

where

Y = (y11, ..., y1n1, y21, ..., y2n2, ..., yt1, ..., ytnt)′

is the vector of observations, β = (μ, τ1, ..., τt)′ is the parameter vector, and ε is the corresponding vector of random errors. The X matrix consists of a column of 1s (for μ) followed by the t dummy-variable columns x1, ..., xt; each row of X contains a 1 in the intercept column and a single additional 1 in the column for that observation's group.

Thus, the matrices of the normal equations are

X′X =

[ n.   n1   n2   ...  nt ]
[ n1   n1   0    ...  0  ]
[ n2   0    n2   ...  0  ]
[ ...                    ]
[ nt   0    0    ...  nt ]

and X′Y = (Y.., Y1., Y2., ..., Yt.)′

where Yi. is the total for the ith group and Y.. is the grand total. The normal equations (X′X)β̂ = X′Y are equivalent to the set

μ̂ + τ̂1 = ȳ1.
μ̂ + τ̂2 = ȳ2.
...
μ̂ + τ̂t = ȳt.

Because there are only t equations, there is no unique solution for the (t+1) estimates μ̂, τ̂1, ..., τ̂t. Correspondingly, the X′X matrix describing the set of normal equations is of dimension (t+1) × (t+1) but of rank t: the first row of X′X is equal to the sum of the other t rows, and the same relationship exists among the columns of X′X. Therefore, X′X is said to be of less than full rank.
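A small numeric check (plain Python, with an arbitrary choice of group sizes) confirms both the structure of X′X shown above and its rank deficiency:

```python
# Group sizes n1, n2, n3 (arbitrary); build the one-way X matrix.
sizes = [2, 1, 2]
t = len(sizes)
obs_group = [i for i, n in enumerate(sizes) for _ in range(n)]
X = [[1] + [1 if g == j else 0 for j in range(t)] for g in obs_group]

# X'X by direct summation.
p = t + 1
XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]

# First row is (n., n1, ..., nt) and equals the sum of the other t rows,
# so X'X is (t+1) x (t+1) but only of rank t.
assert XtX[0] == [sum(sizes)] + sizes
assert XtX[0] == [sum(col) for col in zip(*XtX[1:])]
```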

6.2.2 Parameter Estimates for a One-Way Classification

There are two popular methods for obtaining estimates with a less-than-full-rank model. Restrictions can be imposed on the parameters to obtain a full-rank model, or a generalized inverse of X′X can be obtained. PROC GLM uses the latter method. This section reviews both methods in order to put the approach used by PROC GLM into perspective.

The restrictions method is based on the fact that any definition of one of the parameters in the model (say the reference parameter) causes the other parameters to be uniquely defined. The definition can be restated in the form of a restriction. Another view of the term restriction is to define the parameters to have a unique interpretation. The corresponding estimates are then required to coincide with the definition of the parameters.

One type of restriction is to define one of the τi to equal 0, say τt = 0. In this case, μ becomes the mean of the tth group, μt = μ + τt = μ, and τi becomes the difference between the mean for the ith group and the mean for the tth group, τi = μi − μ = μi − μt.

The corresponding restriction on the solution to the normal equations is to require τ̂t = 0. Requiring τ̂t = 0 leads automatically to a unique set of values for the remaining estimates μ̂, τ̂1, ..., τ̂t−1, because τt is dropped from the linear model. Consequently, the column corresponding to τt is dropped from the X matrix, producing the following model equation:

Y = Xβ + ε

where β = (μ, τ1, ..., τt−1)′ and X is the earlier matrix with the column for τt deleted. Rows for observations in groups 1 through t−1 keep a 1 in the intercept column and a 1 in their group's dummy column; rows for observations in group t contain a 1 in the intercept column and 0s elsewhere.

The solution to the corresponding normal equations (X′X)β̂ = X′Y, where X′X is now nonsingular, results in

μ̂ = ȳt.
τ̂1 = ȳ1. − ȳt.
τ̂2 = ȳ2. − ȳt.
...
τ̂t−1 = ȳ(t−1). − ȳt.

Another approach defines μ to be the mean of μ1, μ2,..., μt; that is, μ = (μ1 + μ2 + ... + μt)/t. Then μ is called the grand mean and the τi are called the group effects. From this definition of μ, it follows that Σi τi = 0. Consequently,

τt = −τ1 − τ2 − ... − τt−1

Therefore, observations ytj = μ + τt + εtj in the tth group can be written

ytj = μ − τ1 − τ2 − ... − τt−1 + εtj

The parameter τt is dropped from the model, which now becomes

Y = Xβ + ε

where β = (μ, τ1, τ2, ..., τt−1)′. The X matrix is the same as before for observations in groups 1 through t−1, but each row for an observation in group t now contains a 1 in the intercept column and −1 in each of the τ1, ..., τt−1 columns.

The solution to the corresponding normal equation yields

μ̂ = (ȳ1. + ... + ȳt.)/t
τ̂1 = ȳ1. − ȳ..
τ̂2 = ȳ2. − ȳ..
...
τ̂t−1 = ȳ(t−1). − ȳ..

and the implementation of the condition τt = −τ1 − τ2 − ... − τt−1 yields

τ̂t = ȳt. − ȳ..

The use of generalized inverses and estimable functions may be preferable for a variety of reasons. In the restrictions method, it might not be clear which particular restriction is desired. In cases of empty cells in multiway classifications, it can be difficult to define the parameters. In fact, it is often hard to identify the empty cells in large, multiway classifications, let alone to define a set of parameters that adequately describe all pertinent effects and interactions. The generalized-inverse approach partially removes the burden of defining parameters from the data analyst.

Section 2.4.4, “Using the Generalized Inverse,” showed that there is no unique solution to a system of equations with a less-than-full-rank coefficient matrix and introduced the generalized inverse to obtain a nonunique solution. Although the set of parameter estimates produced using the generalized inverse is not unique, there is a class of linear functions of parameters called estimable functions for which unique estimates do exist. For example, the function (τi − τj) is estimable: its least-squares estimate is the same regardless of the particular solution obtained for the normal equations. For a discussion of the definition of estimable functions as it relates to the theory of linear models, see Graybill (1976) or Searle (1971).
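Both restriction schemes, and the invariance of estimable functions between them, can be verified numerically. A sketch in plain Python with hypothetical group data:

```python
from statistics import mean

# Hypothetical one-way data: group label -> observations.
data = {1: [7.0, 9.0], 2: [5.0, 6.0, 7.0], 3: [8.0]}
ybar = {i: mean(v) for i, v in data.items()}
t = len(data)

# Solution under the set-to-zero restriction tau_t = 0
# (the form PROC GLM's generalized inverse produces).
mu_a = ybar[t]
tau_a = {i: ybar[i] - ybar[t] for i in data}

# Solution under the sum-to-zero restriction (mu = mean of the group means).
mu_b = sum(ybar.values()) / t
tau_b = {i: ybar[i] - mu_b for i in data}

# tau_1 - tau_2 is estimable: both solutions give the same value.
d1 = tau_a[1] - tau_a[2]
d2 = tau_b[1] - tau_b[2]
assert abs(d1 - d2) < 1e-12

# mu by itself is not estimable: the two solutions disagree.
assert abs(mu_a - mu_b) > 1e-9
```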

PROC GLM uses a generalized inverse to obtain a solution that produces one set of estimates. The technique, in some respects, is parallel to using a set of restrictions that set some of the parameter estimates to 0. Quantities to be estimated or comparisons to be made are specified, and PROC GLM determines whether or not the estimates or comparisons represent estimable functions. PROC GLM then provides estimates, standard errors, and test statistics.

For certain applications, there is more than one set of hypotheses that can be tested. To cover these situations, PROC GLM provides four types of sums of squares and associated F-statistics, and also gives additional information to assist in interpreting the hypotheses tested.

6.2.3 Using PROC GLM for Analysis of Variance

Using PROC GLM for analysis of variance is similar to using PROC ANOVA; the statements listed for PROC ANOVA in Section 3.3.2, “Using the ANOVA and GLM Procedures,” are also used for PROC GLM. In addition to the statements listed for PROC ANOVA, the following SAS statements can be used with PROC GLM:

CONTRAST 'label' effect values < ... effect values> < / options>;
ESTIMATE 'label' effect values < ... effect values> < / options>;
ID variables;
LSMEANS effects< / options>;
OUTPUT <OUT=SAS-data-set> keyword= names < . . . keyword=names>;
RANDOM effects< / options>;
WEIGHT variable;

The CONTRAST statement provides a way of obtaining custom hypotheses tests. The ESTIMATE statement can be used to estimate linear functions of the parameters. The LSMEANS (least-squares means) statement specifies effects for which least-squares estimates of means are computed. The uses of these statements are illustrated in Section 6.2.4, “Estimable Functions in the One-Way Classification,” and Section 6.3.6, “MEANS, LSMEANS, CONTRAST, and ESTIMATE Statements in the Two-Way Layout.” The RANDOM statement specifies which effects in the model are random (see Section 6.4.1, “Proper Error Terms”). When predicted values are requested as a MODEL statement option, values of the variable specified in the ID statement are printed for identification beside each observed, predicted, and residual value. The OUTPUT statement produces an output data set that contains the original data set values along with predicted and residual values. The WEIGHT statement is used when a weighted residual sum of squares is needed. For more information, refer to Chapter 24 in the SAS/STAT User’s Guide, Version 8, Volume 2.

Implementing PROC GLM for an analysis-of-variance model is illustrated by an example of test scores made by students in three classes taught by three different teachers. The data appear in Output 6.1.

Output 6.1 Data for One-Way Analysis of Variance

The SAS System
 
Obs    teach score1 score2
 
1    JAY 69 75
2    JAY 69 70
3    JAY 71 73
4    JAY 78 82
5    JAY 79 81
6    JAY 73 75
7    PAT 69 70
8    PAT 68 74
9    PAT 75 80
10    PAT 78 85
11    PAT 68 68
12    PAT 63 68
13    PAT 72 74
14    PAT 63 66
15    PAT 71 76
16    PAT 72 78
17    PAT 71 73
18    PAT 70 73
19    PAT 56 59
20    PAT 77 83
21    ROBIN 72 79
22    ROBIN 64 65
23    ROBIN 74 74
24    ROBIN 72 75
25    ROBIN 82 84
26    ROBIN 69 68
27    ROBIN 76 76
28    ROBIN 68 65
29    ROBIN 78 79
30    ROBIN 70 71
31    ROBIN 60 61

In terms of the analysis-of-variance model described above, the τj are the parameters associated with the different teachers (TEACH)—τ1 is associated with JAY, τ2 with PAT, and τ3 with ROBIN. The following SAS statements are used to analyze SCORE2:

proc glm;
   class teach;
   model score2=teach / solution xpx i;
run;

In this example, the CLASS variable TEACH identifies the three classes. In effect, PROC GLM establishes a dummy variable (1 for presence, 0 for absence) for each level of each CLASS variable. In this example, the CLASS statement causes PROC GLM to create dummy variables corresponding to JAY, PAT, and ROBIN, resulting in the following X matrix:

      INTERCEPT  JAY  PAT  ROBIN
      [ 1         1    0    0 ]    6 rows for Jay's group
      [ ...                   ]
X =   [ 1         0    1    0 ]    14 rows for Pat's group
      [ ...                   ]
      [ 1         0    0    1 ]    11 rows for Robin's group
      [ ...                   ]

Note that the columns for the dummy variables are in alphabetical order; the column positioning depends only on the values of the CLASS variable. For example, the column for JAY would appear after the columns for PAT and ROBIN if the value JAY were replaced by ZJAY.

The MODEL statement has the same purpose in PROC GLM as it does in PROC REG and PROC ANOVA. Note that the MODEL statement contains the SOLUTION option. This option is used because PROC GLM does not automatically print the estimated parameter vector when a model contains a CLASS statement. The results of the SAS statements shown above appear in Output 6.2.

Output 6.2 One-Way Analysis of Variance from PROC GLM

The GLM Procedure
 
Dependent Variable: score2  
 
  Sum of  
  Source   DF Squares   Mean Square  F Value   Pr > F
  Model 2 49.735861 24.867930 0.56 0.5776
 
  Error 28   1243.941558 44.426484
 
  Corrected Total 30 1293.677419
 
R-Square Coeff Var Root MSE score2 Mean
 
0.038445 9.062496 6.665320 73.54839
 
Source DF Type I SS Mean Square F Value Pr > F
 
teach 2 49.73586091 24.86793046 0.56 0.5776
 
Source DF Type III SS Mean Square F Value Pr > F
 
teach 2 49.73586091 24.86793046 0.56 0.5776
 
      Standard    
Parameter   Estimate   Error  t Value Pr > |t|
 
Intercept   72.45454545 B 2.00966945 36.05  <.0001
teach JAY 3.54545455 B 3.38277775  1.05  0.3036
teach PAT 0.90259740 B 2.68553376  0.34  0.7393
teach ROBIN 0.00000000 B  ⋅          ⋅    ⋅      
 
NOTE: The X'X matrix has been found to be singular, and a generalized inverse was used to solve the normal equations. Terms whose estimates are followed by the letter 'B' are not uniquely estimable.

The first portion of the output, as in previous examples, shows the statistics for the overall model.

The second portion partitions the model sum of squares (MODEL SS) into portions corresponding to factors defined by the list of variables in the MODEL statement. In this model there is only one factor, TEACH, so the Type I and Type III SS are the same as the MODEL SS. Type II and Type IV have no special meaning here and would be the same as Type I and Type III.

The final portion of the output contains the parameter estimates obtained with the generalized inverse. Specifying XPX and I in the list of options in the MODEL statement causes the X′X and (X′X)− matrices to be printed. Results appear in Output 6.3.

Output 6.3 X′X and (X′X)− Matrices for a One-Way Classification

The GLM Procedure
 
The X'X Matrix
 
  Intercept  teach JAY  teach PAT  teach ROBIN  score2  
 
Intercept 31 6 14 11 2280
teach JAY 6 6 0 0 456
teach PAT 14 0 14 0 1027
teach ROBIN 11 0 0 11 797
score2 2280 456 1027 797 168984
X'X Generalized Inverse (g2)
 
  Intercept  teach JAY  teach PAT  teach ROBIN  score2
 
Intercept  0.0909090909  -0.090909091  -0.090909091 0  72.454545455
teach JAY -0.090909091 0.2575757576 0.0909090909 0 3.5454545455
teach PAT -0.090909091 0.0909090909 0.1623376623 0 0.9025974026
teach ROBIN 0 0 0 0 0
score2 72.454545455 3.5454545455 0.9025974026 0 1243.9415584

For this example, the matrix X′Y is

X′Y = (overall SCORE2 total, SCORE2 total for JAY, SCORE2 total for PAT, SCORE2 total for ROBIN)′ = (2280, 456, 1027, 797)′

Taking (X′X)− from the PROC GLM output and using X′Y above, the solution β̂ = (X′X)−X′Y is

[ β̂0 ]   [  .0909  −.0909  −.0909  .0000 ] [ 2280 ]   [ 72.45 ]
[ β̂1 ] = [ −.0909   .2575   .0909  .0000 ] [  456 ] = [  3.54 ]
[ β̂2 ]   [ −.0909   .0909   .1623  .0000 ] [ 1027 ]   [  0.90 ]
[ β̂3 ]   [  .0000   .0000   .0000  .0000 ] [  797 ]   [  0.00 ]

As pointed out in Section 2.4.4, the particular generalized inverse used by PROC GLM causes the last row and column of (X′X)− to be set to 0. This yields a set of parameter estimates equivalent in this example to the set given by the restriction that τ3 = 0. Using the principles discussed in Section 6.2.2, “Parameter Estimates for a One-Way Classification,” it follows that the INTERCEPT μ̂ is actually the mean for the reference group ROBIN. The estimate τ̂1 labeled JAY is the difference between the mean for Jay’s group and the mean for Robin’s group, and similarly, the estimate τ̂2 labeled PAT is the mean for Pat’s group minus the mean for Robin’s group. Finally, the estimate τ̂3 labeled ROBIN, which is set to 0, can be viewed as the mean for Robin’s group minus the mean for Robin’s group.
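These relationships can be checked directly from the SCORE2 data in Output 6.1. A plain-Python sketch of the arithmetic (not the SAS computation itself):

```python
from statistics import mean

# SCORE2 values from Output 6.1, grouped by teacher.
score2 = {
    "JAY":   [75, 70, 73, 82, 81, 75],
    "PAT":   [70, 74, 80, 85, 68, 68, 74, 66, 76, 78, 73, 73, 59, 83],
    "ROBIN": [79, 65, 74, 75, 84, 68, 76, 65, 79, 71, 61],
}
ybar = {t: mean(v) for t, v in score2.items()}

# The g2 inverse zeroes the ROBIN row and column, so the solution is the
# one obtained under the restriction tau_ROBIN = 0.
intercept = ybar["ROBIN"]
tau = {t: ybar[t] - ybar["ROBIN"] for t in score2}

assert abs(intercept - 72.45454545) < 1e-7   # matches Output 6.2
assert abs(tau["JAY"] - 3.54545455) < 1e-7
assert abs(tau["PAT"] - 0.90259740) < 1e-7
assert tau["ROBIN"] == 0.0
```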

Remember that these estimates are not unique—that is, they depend on the alphabetical order of the values of the CLASS variable. This fact is recognized in the output by the letter ‘B’ (for biased) following the estimates, which is explained in the note after the listing of estimates.

The other MODEL statement options (P, CLM, CLI, and TOLERANCE), as well as the BY, ID, WEIGHT, FREQ, and OUTPUT statements, are not affected by the use of CLASS variables and may be used as described in Section 2.2.4, “The SS1 and SS2 Options: Two Types of Sums of Squares” and Section 2.2.5, “Tests of Subsets and Linear Combinations of Coefficients.”

6.2.4 Estimable Functions in a One-Way Classification

It is often the case that the particular parameter estimates obtained by the SOLUTION option in PROC GLM are not the estimates of interest, or there may be additional functions of the parameters that you want to estimate. You can specify such other estimates with PROC GLM.

An estimable function is a member of a special class of linear functions of parameters (see Section 2.2.4). An estimable function of the parameters has a definite interpretation regardless of how the parameters themselves are specified. Denote with L a vector of coefficients (L1, L2,…, Lt, Lt+1). Then Lβ = L1μ + L2τ1 +…+ Lt+1τt is a linear function of the model parameters and is estimable (for this example) if it can be expressed as a linear function of the means μ1,…,μt. Let β̂ be a solution to the normal equations. The function Lβ is estimated by Lβ̂, the corresponding linear function of the parameter estimates. If Lβ is estimable, then Lβ̂ will have the same value regardless of the solution obtained from the normal equations. In the example,

β̂ = (μ̂, τ̂1, τ̂2, τ̂3)′ = (72.454, 3.545, 0.902, 0.000)′, labeled INTERCEPT, JAY, PAT, and ROBIN, respectively.

To illustrate, define

L = [1 1 0 0]

Then Lβ̂ = μ̂ + τ̂1 = μ̂1 = 76.0, which is the estimate of the mean score of Jay's group. Alternately, let

L = [0 1 −1 0]

Then Lβ̂ = (τ̂1 − τ̂2) = μ̂1 – μ̂2 = 2.643, the estimated mean difference between Jay’s and Pat’s groups. Because both of these are estimable functions, identical estimates would be obtained using a different generalized inverse—for example, if different names for the teachers changed the order of the dummy variables.
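A quick check of these two estimable functions, using the solution vector from Output 6.2 (plain Python):

```python
# Solution vector (intercept, JAY, PAT, ROBIN) from Output 6.2.
beta_hat = [72.4545455, 3.5454545, 0.9025974, 0.0]

def l_beta(L, beta):
    # L'beta for a coefficient vector L
    return sum(li * bi for li, bi in zip(L, beta))

jay_mean = l_beta([1, 1, 0, 0], beta_hat)     # mu + tau_1 = Jay's group mean
jay_vs_pat = l_beta([0, 1, -1, 0], beta_hat)  # tau_1 - tau_2 = mu_1 - mu_2

assert abs(jay_mean - 76.0) < 1e-6
assert abs(jay_vs_pat - 2.643) < 1e-3
```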

Variances of these estimates can be readily obtained with standard formulas that involve elements of the generalized inverse (see Section 2.2.4).

You can obtain the general form of the estimable functions with the E option in the MODEL statement:

model score2 = teach / e ;

Output 6.4 shows you that L4, the coefficient of τ3, must be equal to L1 − L2 − L3. Equivalently, L1 = L2 + L3 + L4. That is, the coefficient on μ must be the sum of the coefficients on τ1, τ2, and τ3.

Output 6.4 Obtaining the General Form of Estimable Functions Using the E Option

The GLM Procedure
 
General Form of Estimable Functions
 
Effect   Coefficients
 
Intercept   L1
teach JAY L2
teach PAT L3
teach ROBIN L1-L2-L3
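The condition in Output 6.4 can be wrapped in a small checker. A plain-Python sketch for this one-way model, with the coefficient order matching Output 6.4:

```python
def is_estimable(L, tol=1e-12):
    """General form from Output 6.4: L is estimable iff L1 = L2 + L3 + L4."""
    L1, L2, L3, L4 = L
    return abs(L1 - (L2 + L3 + L4)) < tol

assert is_estimable([1, 1, 0, 0])       # mu + tau_1, Jay's mean
assert is_estimable([0, 1, -1, 0])      # tau_1 - tau_2
assert is_estimable([0, -2, 1, 1])      # the JAY-vs-others contrast
assert not is_estimable([1, 0, 0, 0])   # mu alone
assert not is_estimable([0, 1, 0, 0])   # tau_1 alone
```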

PROC GLM calculates estimates and variances for several special types of estimable functions with LSMEANS, CONTRAST, or ESTIMATE statements as well as estimates of user-supplied functions.

The LSMEANS statement produces the least-squares estimates of CLASS variable means—these are sometimes referred to as adjusted means. For the one-way structure, these are simply the ordinary means. In terms of model parameter estimates, they are μ̂ + τ̂i. The following SAS statement lists the least-squares means for the three teachers for all dependent variables in the MODEL statement:

lsmeans teach / options;

The available options in the LSMEANS statement are

STDERR

prints the standard errors of each estimated least-squares mean and the t-statistic for a test of the hypothesis that the mean is 0

PDIFF

prints the p-values for the tests of equality of all pairs of CLASS means

E

prints a description of the linear function used to obtain each least-squares mean; this has importance in more complex situations

E=

specifies an effect in the model to use as an error term

ETYPE=

specifies the type (1, 2, 3, or 4) of the effect specified in the E= option

SINGULAR=

tunes the estimability checking

For more information, refer to Chapter 24 in the SAS/STAT User's Guide, Version 8, Volume 2. Output 6.5 shows results from the following SAS statement:

lsmeans teach / stderr pdiff;

Output 6.5 Results of the LSMEANS Statement

The GLM Procedure
Least Squares Means
 
  score2 Standard  LSMEAN
teach LSMEAN Error  Pr > |t| Number
 
JAY   76.0000000   2.7211053 <.0001 1
PAT 73.3571429 1.7813816 <.0001 2
ROBIN 72.4545455 2.0096694 <.0001 3
 
Least Squares Means for effect teach
Pr > |t| for H0: LSMean(i)=LSMean(j)
 
Dependent Variable: score2
 
i/j   1 2 3
 
1   0.4233 0.3036
2 0.4233   0.7393
3 0.3036 0.7393  

NOTE: To ensure overall protection level, only probabilities associated with pre-planned comparisons should be used.

The least-squares mean for JAY is computed as μ̂ + τ̂1 = 72.45 + 3.55 = 76.0. Note that this linear function has coefficients L1=1, L2=1, L3=0, and L4=0, so it meets the estimability condition L1 = L2 + L3 + L4.
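For a one-way classification, both the least-squares means and their standard errors in Output 6.5 can be reproduced by hand: the standard error of each mean is sqrt(MSE/ni). A plain-Python sketch using the SCORE2 data:

```python
from math import sqrt
from statistics import mean

score2 = {
    "JAY":   [75, 70, 73, 82, 81, 75],
    "PAT":   [70, 74, 80, 85, 68, 68, 74, 66, 76, 78, 73, 73, 59, 83],
    "ROBIN": [79, 65, 74, 75, 84, 68, 76, 65, 79, 71, 61],
}

# Error mean square: pooled within-group sum of squares over n - t = 28 df.
sse = sum(sum((y - mean(v)) ** 2 for y in v) for v in score2.values())
n = sum(len(v) for v in score2.values())
mse = sse / (n - len(score2))

lsmean = {t: mean(v) for t, v in score2.items()}
stderr = {t: sqrt(mse / len(v)) for t, v in score2.items()}

assert abs(mse - 44.426484) < 1e-5            # matches Output 6.2
assert abs(lsmean["JAY"] - 76.0) < 1e-9
assert abs(stderr["JAY"] - 2.7211053) < 1e-6  # matches Output 6.5
assert abs(stderr["PAT"] - 1.7813816) < 1e-6
```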

Least-squares means should not, in general, be confused with ordinary means, which are available with a MEANS statement. The MEANS statement produces simple, unadjusted means of all observations in each class or treatment. Except for one-way designs and some nested and balanced factorial structures that are normally analyzed with PROC ANOVA, these unadjusted means are generally not equal to the least-squares means. Note that for this example, the least-squares means are the same as the means obtained with the MEANS statement. (The MEANS statement is discussed in Section 3.4.2.)

A contrast is a linear function such that the elements of the coefficient vector sum to 0 for each effect. PROC GLM can be instructed to calculate a sum of squares and associated F-test due to one or more contrasts.

As an example, assume that teacher JAY used a special teaching method. You might then be interested in testing whether Jay’s students had mean scores different from the students of the other teachers, and whether PAT and ROBIN, using the same method, produced different mean scores. The corresponding contrasts are shown below:

Multipliers for TEACH

Contrast        JAY   PAT   ROBIN

JAY vs others   −2    +1    +1
PAT vs ROBIN     0    −1    +1

Taking β = (μ, τ1, τ2, τ3)′, the contrasts are

Lβ = −2μ1 + μ2 + μ3 = −2τ1 + τ2 + τ3   (JAY vs others)

and

Lβ = −μ2 + μ3 = −τ2 + τ3   (PAT vs ROBIN)

The corresponding CONTRAST statements are as follows:

contrast 'JAY vs others' teach -2 1 1;
contrast 'PAT vs ROBIN' teach 0 -1 1;

The results appear in Output 6.6.

Output 6.6 Results of the CONTRAST and ESTIMATE Statements

The GLM Procedure
 
Contrast DF Contrast SS Mean Square F Value Pr > F
 
JAY vs others 1 46.19421179 46.19421179 1.04 0.3166
PAT vs ROBIN 1 5.01844156 5.01844156 0.11 0.7393
    Standard    
Parameter Estimate Error t Value Pr > |t|
 
LSM JAY 76.0000000 2.72110530 27.93 <.0001
LSM PAT 73.3571429 1.78138157 41.18 <.0001
LSM ROBIN 72.4545455 2.00966945 36.05 <.0001
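For a one-way classification with unequal group sizes, each single-degree-of-freedom contrast sum of squares in Output 6.6 follows the standard formula SS = (Σi ci ȳi.)² / (Σi ci²/ni). A plain-Python check against the SCORE2 group means:

```python
# Group means and sizes for JAY, PAT, ROBIN (from the SCORE2 data).
ybar = [76.0, 1027 / 14, 797 / 11]
sizes = [6, 14, 11]

def contrast_ss(c, ybar, n):
    # Single-df contrast sum of squares for unequal group sizes.
    num = sum(ci * yi for ci, yi in zip(c, ybar)) ** 2
    den = sum(ci ** 2 / ni for ci, ni in zip(c, n))
    return num / den

ss_jay_vs_others = contrast_ss([-2, 1, 1], ybar, sizes)
ss_pat_vs_robin = contrast_ss([0, -1, 1], ybar, sizes)

assert abs(ss_jay_vs_others - 46.19421179) < 1e-5  # matches Output 6.6
assert abs(ss_pat_vs_robin - 5.01844156) < 1e-6
```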

Keep the following points in mind when using the CONTRAST statement:

❏ You must know how many classes (categories) are present in the effect and in what order they are sorted by PROC GLM. If there are more classes in the data than coefficients specified in the CONTRAST statement, PROC GLM adds trailing zeros. In other words, there is no check that the proper number of classes has been specified.

❏ The name or label of the contrast must be 20 characters or less.

❏ Available CONTRAST statement options are

    E prints the entire L vector.
    E=effect specifies an alternate error term.
    ETYPE=n specifies the type (1, 2, 3, or 4) of the E=effect.

❏ Multiple degrees-of-freedom contrasts can be specified by repeating the effect name and coefficients as needed. Thus, the statement

contrast 'ALL' teach -2 1 1, teach 0 -1 1;

produces a two DF sum of squares due to both contrasts. This feature can be used to obtain partial sums of squares for effects through the reduction principle, using sums of squares from multiple degrees-of-freedom contrasts that include and exclude the desired contrasts.

❏ If a non-estimable contrast has been specified, a message to that effect appears in the SAS log.

❏ Although only (t–1) linearly independent contrasts exist for t classes, any number of contrasts can be specified.

❏ The contrast sums of squares are not adjusted for (that is, not partial with respect to) other contrasts that may be specified for the same effect (see the fourth point above).

❏ The CONTRAST statement is not available with PROC ANOVA; thus, the computational inefficiency of PROC GLM for analyzing balanced data may be justified if contrasts are required. However, contrast variables can be defined in a DATA step and estimates and statistics can be obtained by a full-rank regression analysis.

The ESTIMATE statement is used to obtain statistics for estimable functions other than least-squares means and contrasts, although it can also be used for these. For the current example, the ESTIMATE statement is used to re-estimate the least-squares means.

The respective least-squares means for JAY, PAT, and ROBIN estimate μ1 = μ + τ1, μ2 = μ + τ2, and μ3 = μ + τ3. The following statements duplicate the least-squares means:

estimate 'LSM JAY' intercept 1 teach 1;
estimate 'LSM PAT' intercept 1 teach 0 1;
estimate 'LSM ROBIN' intercept 1 teach 0 0 1;

Note the use of the term INTERCEPT (referring to μ) and the fact that the procedure supplies trailing zero-valued coefficients. The results of these statements appear after the listing of parameter estimates at the bottom of Output 6.6 for convenient comparison with the results of the LSMEANS statement.

6.3 Two-Way Classification: Unbalanced Data

The major applications of the two-way structure are the two-factor factorial experiment and the randomized blocks. These applications usually have balanced data. In this section, the two-way classification with unbalanced data is explored. This introduces new questions, such as how means and sums of squares should be computed.

6.3.1 General Considerations

The two-way classification model is

yijk = μ + αi + βj + (αβ)ij + εijk

where

yijk

equals the kth observed score for the (i, j)th cell.

αi

equals the effect of the ith level of factor A.

βj

equals the effect of the jth level of factor B.

(αβ)ij

equals the interaction effect for the ith level of factor A and the jth level of factor B.

εijk

equals the random error associated with individual observations.

The model can be defined without the interaction term when appropriate. Let nij denote the number of observations in the cell for level i of A and level j of B. If μij denotes the population cell mean for level i of A and level j of B, then

μij = μ + αi + βj + (αβ)ij

At this point, no further restrictions on the parameters are assumed.

The computational formulas for PROC ANOVA that use the various treatment means provide correct statistics for balanced data—that is, data with an equal number of observations (nij=n for all i, j) for each treatment combination. When data are not balanced, sums of squares computed by PROC ANOVA can contain functions of the other parameters of the model, and thereby produce biased results.

To illustrate the effects of unbalanced data on the estimation of differences between means and computation of sums of squares, consider the data in this two-way table:

            B
         1       2
A   1   7, 9     5
    2   8       4, 6

Within level 1 of B, the cell mean for each level of A is 8—that is, ȳ11. = (7+9)/2 = 8 and ȳ21. = 8. Hence, there is no evidence of a difference between the levels of A within level 1 of B. Similarly, there is no evidence of a difference between levels of A within level 2 of B, because ȳ12. = 5 and ȳ22. = (4+6)/2 = 5. Therefore, you may conclude that there is no evidence in the table of a difference between the levels of A. However, the marginal means for A are

ȳ1.. = (7 + 9 + 5)/3 = 7

and

ȳ2.. = (8 + 4 + 6)/3 = 6

The difference of 7 − 6 = 1 between these marginal means may be erroneously interpreted as measuring an overall effect of the factor A. Actually, the observed difference between the marginal means for the two levels of A measures the effect of factor B in addition to the effect of factor A. This can be verified by expressing the observations in terms of the analysis-of-variance model yijk = μ + αi + βj. (For simplicity, the interaction and error terms have been left out of the model.)

                 B
         1                  2
A   1    7 = μ + α1 + β1    5 = μ + α1 + β2
         9 = μ + α1 + β1
    2    8 = μ + α2 + β1    4 = μ + α2 + β2
                            6 = μ + α2 + β2

The difference between marginal means for A1 and A2 is shown to be

ȳ1.. − ȳ2.. = (1/3)[(α1 + β1) + (α1 + β1) + (α1 + β2)] − (1/3)[(α2 + β1) + (α2 + β2) + (α2 + β2)]
           = (α1 − α2) + (1/3)(β1 − β2)

Thus, instead of estimating (α1 − α2), the difference between the marginal means of A estimates (α1 − α2) plus a function of the factor B parameters, (β1 − β2)/3. In other words, the difference between the A marginal means is biased by factor B effects.

The null hypothesis about A that would normally be tested is

H0: α1 – α2 = 0

However, for this example, the sum of squares for A computed by PROC ANOVA can be shown to equal 3(ȳ1.. − ȳ2..)²/2. Hence, the PROC ANOVA F-test for A actually tests the hypothesis

H0: (α1 − α2) + (β1 − β2)/3 = 0

which involves the factor B difference (β1 − β2) in addition to the factor A difference (α1 − α2).
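
The unadjusted sum of squares for A is easy to check by hand. The following Python sketch (an illustration of the arithmetic, not SAS code) reproduces the one-way formula and confirms that it equals 3(ȳ1.. − ȳ2..)²/2 for these data:

```python
import numpy as np

# Observations from the table: level 1 of A = {7, 9, 5}, level 2 of A = {8, 4, 6}
y1 = np.array([7.0, 9.0, 5.0])
y2 = np.array([8.0, 4.0, 6.0])
grand = np.concatenate([y1, y2]).mean()  # 6.5

# One-way (unadjusted) sum of squares for A, as PROC ANOVA would compute it
ss_a = 3 * (y1.mean() - grand) ** 2 + 3 * (y2.mean() - grand) ** 2

print(ss_a)                                  # 1.5
print(3 * (y1.mean() - y2.mean()) ** 2 / 2)  # 1.5, matching the formula in the text
```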

In terms of the μ model yijk = μij + εijk, you usually want to estimate (μ11 + μ12)/2 and (μ21 + μ22)/2 or the difference between these quantities. However, the A marginal means for the example are

ȳ1.. = (2μ11 + μ12)/3 + ε̄1..

and

ȳ2.. = (μ21 + 2μ22)/3 + ε̄2..

These means estimate (2μ11 + μ12)/3 and (μ21 + 2μ22)/3, which are functions of the cell frequencies and might not be meaningful.

In summary, a major problem in the analysis of unbalanced data is the contamination of differences between factor means by effects of other factors. The solution to this problem is to adjust the means to remove the contaminating effects.
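
The adjustment can be seen numerically on the small example above. This Python sketch (illustrative only; the actual adjustment in SAS is done by the LSMEANS statement) contrasts the raw marginal means with unweighted averages of the cell means:

```python
import numpy as np

# Cells of the small example: (A level, B level) -> observations
cells = {(1, 1): [7.0, 9.0], (1, 2): [5.0], (2, 1): [8.0], (2, 2): [4.0, 6.0]}

# Raw marginal means for A pool all observations in a level of A
raw_a1 = np.mean(cells[(1, 1)] + cells[(1, 2)])  # 7.0
raw_a2 = np.mean(cells[(2, 1)] + cells[(2, 2)])  # 6.0

# Adjusted means average the cell means with equal weight on each level of B
adj_a1 = (np.mean(cells[(1, 1)]) + np.mean(cells[(1, 2)])) / 2  # 6.5
adj_a2 = (np.mean(cells[(2, 1)]) + np.mean(cells[(2, 2)])) / 2  # 6.5

print(raw_a1 - raw_a2)  # 1.0 -- contaminated by the factor B difference
print(adj_a1 - adj_a2)  # 0.0 -- adjustment removes the contamination
```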

6.3.2 Sums of Squares Computed by PROC GLM

PROC GLM recognizes different theoretical approaches to analysis of variance by providing four types of sums of squares and associated statistics. The four types of sums of squares in PROC GLM are called Type I, Type II, Type III, and Type IV (SAS Institute). The four types of sums of squares are explained in general, conceptual terms, followed by more technical descriptions.

Type I sums of squares retain the properties discussed in Chapter 2, “Regression.” They correspond to adding each source (factor) sequentially to the model in the order listed. For example, the Type I sum of squares for the first factor listed is the same as PROC ANOVA would compute for that effect. It reflects differences between unadjusted means of that factor as if the data consisted of a one-way structure. The Type I SS may not be particularly useful for analysis of unbalanced multiway structures but may be useful for nested models, polynomial models, and certain tests involving the homogeneity of regression coefficients (see Chapter 7, “Analysis of Covariance”). Also, comparing Type I and other types of sums of squares provides some information on the effect of the lack of balance.

Type II sums of squares are more difficult to understand. Generally, the Type II SS for an effect U, which may be a main effect or interaction, is adjusted for an effect V if and only if V does not contain U. Specifically, for a two-factor structure with interaction, the main effects, A and B, are not adjusted for the A*B interactions because the symbol A*B contains both A and B. Factor A is adjusted for B because the symbol B does not contain A. Similarly, B is adjusted for A, and the A*B interaction is adjusted for the two main effects.

Type II sums of squares for the main effects A and B are mainly appropriate for situations in which no interaction is present. These are the sums of squares presented in many major statistical textbooks. Their method of computation is often referred to as the method of fitting constants.

The Type II analysis relates to the following general guidelines often given in applied statistical texts. First, test for the significance of the A*B interaction. If A*B is insignificant, delete it from the model and analyze main effects, each adjusted for the other. If A*B is significant, then abandon main-effects analysis and focus your attention on simple effects.
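
The “contains” rule can be stated compactly. Here is a hypothetical helper function (an illustration, not anything inside PROC GLM) that treats each effect as the set of factors appearing in its symbol:

```python
def adjusted_for(u, v):
    """Type II rule: effect u is adjusted for effect v iff v does not contain u.
    Effects are given as sets of factor names, e.g. {'A'} or {'A', 'B'}."""
    return not v.issuperset(u)

print(adjusted_for({'A'}, {'B'}))       # True:  A is adjusted for B
print(adjusted_for({'A'}, {'A', 'B'}))  # False: A is not adjusted for A*B
print(adjusted_for({'A', 'B'}, {'A'}))  # True:  A*B is adjusted for A
```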

Note that for full-rank regression models, the Type II sums of squares are adjusted for cross-product terms. This occurs because, for example,

y = β0 + β1x1 + β2x2 + β3x1x2 + ε

where the product x1x2 is dealt with simply as another independent variable with no concept of order of the term.

The Type III sums of squares correspond to Yates’s weighted squares of means analysis. Their principal use is in situations that require a comparison of main effects even in the presence of interaction. Type III sums of squares are partial sums of squares. In this sense, each effect is adjusted for all other effects. In particular, main effects A and B are adjusted for the interaction A*B if all these terms are in the model. If the model contains only main effects, then Type II and Type III analyses are the same. See Steel and Torrie (1980), Searle (1971), and Speed et al. (1978) for further discussion of the method of fitting constants and the method of weighted squares of means.

The Type IV functions were designed primarily for situations where there are empty cells. The principles underlying the Type IV sums of squares are quite involved and can be discussed only in a framework using the general construction of estimable functions. It should be noted that the Type IV functions are not necessarily unique when there are empty cells, but the functions are identical to those provided by Type III when there are no empty cells.

You can request four sums of squares in PROC GLM as options in the MODEL statement. For example, the following SAS statement specifies the printing of Type I and Type IV sums of squares:

model . . . / ss1 ss4;

Any or all types may be requested. If no sums of squares are specified, PROC GLM computes the Type I and Type III sums of squares by default.

The next two sections interpret the different sums of squares in terms of reduction notation and the μ-model.

6.3.3 Interpreting Sums of Squares in Reduction Notation

The types of sums of squares can be explained in terms of the reduction notation that is developed for regression models in Chapter 2. This requires writing the model as a regression model using dummy variables, with certain restrictions imposed on the parameters to give them unique interpretation.

As an example, consider a 2×3 factorial structure with nij observations in the cell in row i, column j. The equation for the model is

yijk = μ + αi + βj + αβij + εijk

where i = 1, 2, j = 1, 2, 3, and k = 1, . . . , nij. Assume nij > 0 for all i, j. An expression of the form R(α | μ, β) means the same as R(α1, α2 | μ, β1, β2, β3). The sums of squares printed by PROC GLM can be interpreted in reduction notation most easily under the restrictions

Σi αi = Σj βj = Σi αβij = Σj αβij = 0    (6.1)

that is, by taking an X matrix with full-column rank given by

          μ   α1   β1   β2  αβ11 αβ12
X  =  [   1    1    1    0    1    0  ]   n11 rows for observations in cell 11
      [   1    1    0    1    0    1  ]   n12 rows for observations in cell 12
      [   1    1   −1   −1   −1   −1  ]   n13 rows for observations in cell 13
      [   1   −1    1    0   −1    0  ]   n21 rows for observations in cell 21
      [   1   −1    0    1    0   −1  ]   n22 rows for observations in cell 22
      [   1   −1   −1   −1    1    1  ]   n23 rows for observations in cell 23

(Each distinct row is repeated for all of the observations in its cell.)

With this set of restrictions or definitions of the parameters, the sums of squares that result from the following MODEL statement are summarized below:

model y=a b a*b / ss1 ss2 ss3 ss4;

Effect   Type I           Type II          Type III = Type IV

A        R(α|μ)           R(α|μ,β)         R(α|μ,β,αβ)
B        R(β|μ,α)         R(β|μ,α)         R(β|μ,α,αβ)
A*B      R(αβ|μ,α,β)      R(αβ|μ,α,β)      R(αβ|μ,α,β)

You should be careful when using reduction notation with less-than-full-rank models. If no restrictions had been specified on the model for the two-way structure above, then R(α | μ, β, αβ) = 0 because the columns of the X matrix corresponding to the αi would be linearly dependent on the columns corresponding to μ and the αβij.

In addition, the dependence of reduction notation on the restrictions imposed cannot be overemphasized. For example, imposing the restriction

α2 = β3 = αβ21 = αβ22 = αβ13 = αβ23 = 0    (6.2)

results in a different value for R(α | μ, β, αβ). Although the restrictions of equation (6.1) are those that correspond to the sums of squares computed by PROC GLM, the restrictions of equation (6.2) are those that correspond to the (biased) parameter estimates computed by PROC GLM.

There is a relationship between the four types of sums of squares and the four types of data structures in a two-way classification. The relationship derives from the principles of adjustment that the sums-of-squares types obey. Letting nij denote the number of observations in level i of factor A and level j of factor B, the four types of data structures are

❏ equal cell frequencies: nij=common value for all i, j

❏ proportionate cell frequencies: nij / nil = nkj / nkl for all i, j, k, l

❏ disproportionate, nonzero cell frequencies: nij / nil ≠ nkj / nkl for some i, j, k, l, but nij > 0 for all i, j

❏ empty cell(s): nij=0 for some i, j.

The display below shows the relationship between sums-of-squares types and data structure types pertaining to the following MODEL statement:

model y=a b a*b / ss1 ss2 ss3 ss4;

For example, writing III=IV indicates that Type III is equal to Type IV.

Data Structure Type

Effect   1 (Equal nij)    2 (Proportionate nij)   3 (Disproportionate,   4 (Empty Cell)
                                                     nonzero nij)

A        I=II=III=IV      I=II, III=IV            III=IV
B        I=II=III=IV      I=II, III=IV            I=II, III=IV           I=II
A*B      I=II=III=IV      I=II=III=IV             I=II=III=IV            I=II=III=IV

6.3.4 Interpreting Sums of Squares in the μ-Model Notation

The μ model for the two-way structure takes the form

yijk = μij + εijk    (6.3)

The parameters of the model relate to the parameters of the standard analysis-of-variance model according to the equation

μij = μ + αi + βj + αβij

This relation holds regardless of any restriction that may be imposed upon the α, β, and αβ parameters. The advantage of using the μ-model notation over standard analysis-of-variance notation is that all of the μij parameters are clearly defined without specifying restrictions; thus, a hypothesis stated in terms of the μij can be easily understood.

Speed et al. (1978) give interpretations of the different types of sums of squares (I, II, III, and IV) computed by PROC GLM using the μ-model notation. It is assumed that all nij > 0, making Type III equal to Type IV.

Using their results, the sums of squares obtained from the following MODEL statement are expressed in terms of the μij as given in Table 6.1.

model response = a b a*b / ss1 ss2 ss3 ss4;

Table 6.1 Interpretation of Sums of Squares in the μ-Model Notation

Effect A
  Type I         (Σj n1j μ1j)/n1. = ... = (Σj naj μaj)/na.
  Type II        Σj n1j μ1j = Σi Σj (n1j nij μij)/n.j , ..., Σj naj μaj = Σi Σj (naj nij μij)/n.j
  Type III & IV  μ11 + ... + μ1b = ... = μa1 + ... + μab
                 that is, μ̄1. = ... = μ̄a., where μ̄i. = Σj μij/b

Effect B
  Type I & II    Σi ni1 μi1 = Σi Σj (ni1 nij μij)/ni. , ..., Σi nib μib = Σi Σj (nib nij μij)/ni.
  Type III & IV  μ11 + ... + μa1 = ... = μ1b + ... + μab
                 that is, μ̄.1 = ... = μ̄.b, where μ̄.j = Σi μij/a

Effect A*B
  Types I, II, III, IV   μij − μim − μlj + μlm = 0 for all i, j, l, m

Table 6.1 shows that the tests can be expressed in terms of equalities of weighted cell means, only some of which are easily interpretable. Considering the Type I A effect, the weights nij/ni. attached to μij are simply the fraction of the ni. observations in level i of A that were in level j of B. If these weights reflect the distribution across the levels of B in the population of units in level i of A, then the Type I test may have meaningful interpretation. That is, suppose the population of units in level i of A is made up of a fraction ρi1 of units in level 1 of B, of a fraction ρi2 of units in level 2 of B, and so on, where ρi1 + … + ρib = 1. Then it may be reasonable to test

H0 : Σjρ1jμ1j = ... = Σjρajμaj

which would be the Type I test in case nij/ni. = ρij.

Practical interpretation of the Type II weights is more difficult—refer to Section 6.3.7.2, “Interpreting Sums of Squares Using Estimable Functions.” Recall that the Type II tests are primarily for main effects with no interaction. You can see from Table 6.1 that the Type II hypothesis clearly depends on the nij, the numbers of observations in the cells.

The interpretation of Type III and Type IV tests is clear because all weights are unity. When the hypotheses are stated in terms of the μ model, the benefit of the Type III test is more apparent because the Type III hypothesis does not depend on the nij, the numbers of observations in the cells. Type I and Type II hypotheses do depend on the nij, and this may not be desirable.

For example, suppose a scientist sets up an experiment with ten plants in each combination of four levels of nitrogen (N) and three levels of lime (P). Suppose also that some plants die in some of the cells for reasons unrelated to the effects of N and P, leaving some cells with nij <10. A hypothesis test concerning the effects of N and P, which depends on the values of nij, would be contaminated by the accidental variation in the nij. The scientific method declares that the hypotheses to be tested should be stated before data are collected. It would be impossible to state Type I and Type II hypotheses prior to data collection because the hypotheses depend on the nij, which are known only after data are collected.

Note that the Type I and Type II hypotheses are different for effect A but the same for effect B. This occurs because the Type I sums of squares are model-order dependent. Being sequential, the Type I sums of squares are A (unadjusted), B (adjusted for A), and A*B (adjusted for A and B). Thus, the Type I sums of squares for the effects A and B listed in the MODEL statement in the order A, B, A*B would not be the same as in the order B, A, A*B. The Type II hypotheses are not model-order dependent because, for the two-way structure, both Type II main-effect sums of squares are adjusted for each other—that is, A (adjusted for B), B (adjusted for A), and A*B (adjusted for A and B). These Type II sums of squares are the partial sums of squares if no A*B interaction is specified in the model, in which case Type II, Type III, and Type IV would be the same.

Another interpretation of these tests is given by Hocking and Speed (1980), who point out that Type I, Type II, and Type III=Type IV tests for effect A each represent a test of

H0: μ11 +…+ μ1b =…= μa1 +…+ μab

subject to certain conditions on the cell means. The conditions are

Type I

no B effect, μ̄.1 = … = μ̄.b, and
no A*B effect, μij − μim − μlj + μlm = 0 for all i, j, l, m

Type II

no A*B effect, μij − μim − μlj + μlm = 0 for all i, j, l, m

Type III=Type IV

none (provided nij > 0 for all i, j).

6.3.5 An Example of Unbalanced Two-Way Classification

This example is a two-factor layout with data presented by Harvey (1975). Two types of feed rations (factor A) are given to calves from three different sires (factor B). The observed dependent variable yijk (variable Y) is the coded amount of weight gained by each calf. Because unequal numbers of calves of each sire are fed each ration, this is an unbalanced experiment. However, there are observations for each ration-sire combination; that is, there are no empty cells. The data appear in Output 6.7.

Output 6.7 Data for an Unbalanced Two-Way Classification

Obs a b y
 
1 1 1 5
2 1 1 6
3 1 2 2
4 1 2 3
5 1 2 5
6 1 2 6
7 1 2 7
8 1 3 1
9 2 1 2
10 2 1 3
11 2 2 8
12 2 2 8
13 2 2 9
14 2 3 4
15 2 3 4
16 2 3 6
17 2 3 6
18 2 3 7

The analysis-of-variance model for these data is

yijk = μ + αi + βj + αβij + εijk

where

i

equals 1, 2

j

equals 1, 2, 3.

The parameters μ, αi, βj, and αβij are as defined in Section 6.3.1, “General Considerations.” The model contains twelve parameters, which are more than can be estimated uniquely from the six cell means that are the basis for estimating parameters. The analysis is implemented with the following SAS statements:

proc glm;
   class a b;
   model y=a b a*b / ss1 ss2 ss3 ss4 solution;

The statements above cause PROC GLM to create the following twelve dummy variables:

❏ 1 dummy variable for the mean (or intercept)

❏ 2 dummy variables for factor A (ration)

❏ 3 dummy variables for factor B (sire)

❏ 6 dummy variables for the interaction A*B (all six possible pairwise products of the variables from factor A with those from factor B).
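
The overspecification can be seen directly. The following Python sketch (an illustration, not GLM's internal code) builds one row of the twelve dummy variables for each of the six cells and checks the rank of the resulting matrix:

```python
import numpy as np

# Build the twelve 0/1 dummy columns that CLASS a b creates for a=2, b=3 levels
a_levels, b_levels = 2, 3
cells = [(i, j) for i in range(a_levels) for j in range(b_levels)]
rows = []
for (i, j) in cells:
    r = [1]                                      # 1 column for the intercept
    r += [int(i == k) for k in range(a_levels)]  # 2 columns for factor A
    r += [int(j == k) for k in range(b_levels)]  # 3 columns for factor B
    r += [int((i, j) == c) for c in cells]       # 6 columns for A*B
    rows.append(r)
X = np.array(rows, dtype=float)

print(X.shape[1])                 # 12 columns
print(np.linalg.matrix_rank(X))   # rank 6: only six estimable degrees of freedom
```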

The options requested are SOLUTION, and for purposes of illustration, all four types of sums of squares. The results appear in Output 6.8.

Output 6.8 Sums of Squares for an Unbalanced Two-Way Classification

The GLM Procedure
 
Dependent Variable: y  
 
  Sum of  
  Source DF Squares   Mean Square  F Value  Pr > F
 
  Model 5 63.71111111 12.74222222 5.87 0.0057
 
  Error 12  26.06666667  2.17222222
 
  Corrected Total 17 89.77777778
 
 
R-Square Coeff Var Root MSE yield Mean
 
0.709653 28.83612 1.473846 5.111111
 
Source DF Type I SS Mean Square F Value Pr > F
 
a 1 7.80277778 7.80277778 3.59 0.0824
b 2 20.49185393 10.24592697 4.72 0.0308
a*b 2 35.41647940 17.70823970 8.15 0.0058
 
Source DF Type II SS Mean Square F Value Pr > F
 
a 1 15.85018727 15.85018727 7.30 0.0193
b 2 20.49185393 10.24592697 4.72 0.0308
a*b 2 35.41647940 17.70823970 8.15 0.0058
 
Source DF Type III SS Mean Square F Value Pr > F
 
a 1 9.64065041 9.64065041 4.44 0.0569
b 2 30.86591760 15.43295880 7.10 0.0092
a*b 2 35.41647940 17.70823970 8.15 0.0058
 
Source DF Type IV SS Mean Square F Value Pr > F
 
a 1 9.64065041 9.64065041 4.44 0.0569
b 2 30.86591760 15.43295880 7.10 0.0092
a*b 2 35.41647940 17.70823970 8.15 0.0058
 
      Standard    
Parameter   Estimate Error  t Value  Pr > |t|
 
Intercept     5.400000000 B   0.65912400 8.19 <.0001
a 1    -4.400000000 B 1.61451747 -2.73 0.0184
a 2    0.000000000 B  ⋅         ⋅    ⋅      
b 1    -2.900000000 B 1.23310809 -2.35 0.0366
b 2    2.933333333 B 1.07634498 2.73 0.0184
b 3    0.000000000 B  ⋅         ⋅    ⋅      
a*b 1  1 7.400000000 B 2.18606699 3.39 0.0054
a*b 1  2 0.666666667 B 1.94040851 0.34 0.7371
a*b 1  3 0.000000000 B  ⋅         ⋅    ⋅      
a*b 2  1 0.000000000 B  ⋅         ⋅    ⋅      
a*b 2  2 0.000000000 B  ⋅         ⋅    ⋅      
a*b 2  3 0.000000000 B  ⋅         ⋅    ⋅      
NOTE: The X'X matrix has been found to be singular, and a generalized inverse was used to solve the normal equations. Terms whose estimates are followed by the letter 'B' are not uniquely estimable.

The first portion shows the statistics for the overall model. The overspecification of the model is obvious: The twelve dummy variables generate only six degrees of freedom (five for the terms listed in the MODEL statement plus one for the intercept).

The next portion of the output shows the four types of sums of squares. Note that Types III and IV give identical results. This is because nij>0 for all i, j.
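
The Type I sums of squares in Output 6.8 can be reproduced as reductions in residual sums of squares from sequentially fitted dummy-variable regressions. The following Python/NumPy sketch (an illustration of the computation, not SAS code) uses the data from Output 6.7:

```python
import numpy as np

# Harvey (1975) data from Output 6.7: (ration a, sire b, gain y)
data = [(1,1,5),(1,1,6),(1,2,2),(1,2,3),(1,2,5),(1,2,6),(1,2,7),(1,3,1),
        (2,1,2),(2,1,3),(2,2,8),(2,2,8),(2,2,9),
        (2,3,4),(2,3,4),(2,3,6),(2,3,6),(2,3,7)]
a = np.array([d[0] for d in data])
b = np.array([d[1] for d in data])
y = np.array([d[2] for d in data], dtype=float)

def dummies(f):
    """One 0/1 indicator column per level, as the CLASS statement creates."""
    return np.column_stack([(f == lev).astype(float) for lev in np.unique(f)])

def rss(X):
    """Residual sum of squares; lstsq handles the singular (less-than-full-rank) X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ coef) ** 2))

mu = np.ones((len(y), 1))
A, B = dummies(a), dummies(b)
AB = np.column_stack([A[:, i] * B[:, j] for i in range(2) for j in range(3)])

# Type I SS: reduction in RSS as each term is added in the order listed
r0 = rss(mu)
r1 = rss(np.hstack([mu, A]))
r2 = rss(np.hstack([mu, A, B]))
r3 = rss(np.hstack([mu, A, B, AB]))
print(round(r0 - r1, 5))  # Type I SS for a   (7.80278 in Output 6.8)
print(round(r1 - r2, 5))  # Type I SS for b   (20.49185)
print(round(r2 - r3, 5))  # Type I SS for a*b (35.41648)
print(round(r3, 5))       # error SS          (26.06667)
```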

As noted in Section 6.3.3, “Interpreting Sums of Squares in Reduction Notation,” the parameter estimates printed by PROC GLM are a solution to the normal equations corresponding to the restrictions in equation (6.2). The corresponding conditions on the parameter estimates are

α̂2 = β̂3 = αβ̂13 = αβ̂21 = αβ̂22 = αβ̂23 = 0    (6.4)

These values, which now equal 0, appear in Output 6.8.

Section 6.3.4, “Interpreting Sums of Squares in the μ-Model Notation,” also shows that the parameters of the μ model relate to the parameters of the standard analysis-of-variance model (see equation 6.3). A corresponding relation holds between the respective parameter estimates, namely

ŷij = μ̂ij = μ̂ + α̂i + β̂j + αβ̂ij    (6.5)

Putting equations (6.4) and (6.5) together as shown in the table below gives the interpretation of the parameter estimates printed by PROC GLM. The table below also shows the relationship between means and parameter estimates.

          Sire 1                            Sire 2                            Sire 3
Ration 1  μ̂11 = ȳ11. = μ̂ + α̂1 + β̂1 + αβ̂11   μ̂12 = ȳ12. = μ̂ + α̂1 + β̂2 + αβ̂12   μ̂13 = ȳ13. = μ̂ + α̂1
Ration 2  μ̂21 = ȳ21. = μ̂ + β̂1               μ̂22 = ȳ22. = μ̂ + β̂2               μ̂23 = ȳ23. = μ̂

Note especially the following items in Output 6.8:

❏ The intercept μ̂ printed by PROC GLM is the cell mean ȳ23. for the lower right-hand cell (ration 2, sire 3).

❏ The estimate α̂1 = −4.4 is the difference between the cell means for the two rations fed to sire 3, α̂1 = ȳ13. − ȳ23..

❏ The interaction parameter estimate αβ̂11 = 7.4 is the interaction of rations 1 and 2 by sires 1 and 3, αβ̂11 = ȳ11. − ȳ13. − ȳ21. + ȳ23.. Generally, in a two-way layout with a rows and b columns, the interaction parameter estimate αβ̂ij = ȳij. − ȳib. − ȳaj. + ȳab. measures the interaction of rows i and a by columns j and b.

6.3.6 The MEANS, LSMEANS, CONTRAST, and ESTIMATE Statements in a Two-Way Layout

The parameter estimates printed by PROC GLM are the result of a computational algorithm and may or may not be the estimates with the greatest practical value. However, there is no single choice of estimates (corresponding to a particular generalized inverse or set of restrictions) that satisfies the requirements of all applications. In most instances, specific estimable functions of these parameter estimates, like the estimates obtained with the LSMEANS, CONTRAST, and ESTIMATE statements, can be used to provide more useful estimates. The CONTRAST and ESTIMATE statements for balanced data applications are discussed in Chapter 2.

The CONTRAST, LSMEANS, and ESTIMATE statements are similar for one- and two-way models, but principles and interpretations become more complex. Consider the results from the following SAS statements:

proc glm;
   class a b;
   model y=a b a*b;
   means a b;
   lsmeans a b / stderr;
   contrast 'A EFFECT' a -1 1;
   contrast 'B 1 vs 2 & 3' b -2 1 1;
   contrast 'B 2 vs 3' b 0 -1 1;
   contrast 'ALL B' b -2 1 1, b 0 -1 1;
   contrast 'A*B 2 vs 3' a*b 0 1 -1 0 -1 1;
   estimate 'B2, B3 MEAN' intercept 1 a .5 .5 b 0 .5 .5
                                 a*b 0 .25 .25 0 .25 .25;
   estimate 'A in B1' a -1 1 a*b -1 0 0 1;

The MEANS statement provides the raw or unadjusted main-effect and interaction means. The LSMEANS statement produces least-squares (adjusted) means for main effects together with their standard errors. The results of these two statements are combined in Output 6.9. (PROC GLM prints these results on separate pages.)

Output 6.9 Results of the MEANS and LSMEANS Statements for a Two-Way Classification

The GLM Procedure
 
Level of   --------------y--------------
a N     Mean Std Dev
 
1 8     4.37500000 2.13390989
2 10     5.70000000 2.35937845
 
Level of   --------------y--------------
b N     Mean Std Dev
 
1 4     4.00000000 1.82574186
2 8     6.00000000 2.50713268
3 6     5.00000000 1.54919334
 
The GLM Procedure
Least Squares Means
 
    Standard     
a y LSMEAN Error   Pr > |t|
 
1 3.70000000 0.64055339   <.0001
2 5.41111111 0.49940294   <.0001
 
 
    Standard    
b y LSMEAN Error   Pr > |t|
 
1 4.00000000 0.73692303   0.0002
2 6.46666667 0.53817249   <.0001
3 3.20000000 0.80725874   0.0019

The raw and least-squares means are different for all levels except B1, which is balanced with respect to factor A.

Quantities estimated by the raw means and least-squares means can be expressed in terms of the μ model. For level 1 of factor A, the raw mean (4.375) is an estimate of (2μ11 + 5μ12 + μ13)/(2 + 5 + 1), whereas the least-squares mean (3.700) is an estimate of (μ11 + μ12 + μ13)/3. The raw means estimate weighted averages of the μij whose weights are a function of sample sizes. The least-squares means estimate unweighted averages of the μij.
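
This distinction can be checked from the cell statistics of the data in Output 6.7. The following Python sketch (illustrative, not SAS) reproduces the factor A values in Output 6.9:

```python
# Cell means and cell sizes from the Harvey data in Output 6.7
ybar = {(1, 1): 5.5, (1, 2): 4.6, (1, 3): 1.0,
        (2, 1): 2.5, (2, 2): 25 / 3, (2, 3): 5.4}
n = {(1, 1): 2, (1, 2): 5, (1, 3): 1, (2, 1): 2, (2, 2): 3, (2, 3): 5}

for i in (1, 2):
    # Raw mean: cell means weighted by cell sizes (MEANS statement)
    raw = sum(n[i, j] * ybar[i, j] for j in (1, 2, 3)) / sum(n[i, j] for j in (1, 2, 3))
    # Least-squares mean: unweighted average of cell means (LSMEANS statement)
    lsm = sum(ybar[i, j] for j in (1, 2, 3)) / 3
    print(i, round(raw, 4), round(lsm, 4))
# 1 4.375 3.7
# 2 5.7 5.4111
```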

The results of the five CONTRAST statements appear in Output 6.10.

Output 6.10 Contrast in a Two-Way Classification

Dependent Variable: y  
 
 Contrast  DF  Contrast SS  Mean Square  F Value  Pr > F
 
 A effect 1 9.64065041 9.64065041 4.44 0.0569
 B 1 vs 2 & 3 1 1.93798450 1.93798450 0.89 0.3635
 B 2 vs 3 1 24.62564103 24.62564103 11.34 0.0056
 ALL B 2 30.86591760 15.43295880 7.10 0.0092
 A*B 2 vs 3 1 0.25641026 0.25641026 0.12 0.7371

The first four CONTRAST statements are similar to those presented for the one-way structure. Note that when a contrast uses all available degrees of freedom for the factor (such as the ALL B contrast), the sums of squares are the same as the Type III sums of squares for the factor.

The fifth CONTRAST statement requests the interaction between the factor A contrast and the B2 vs 3 contrast. It is constructed by computing the product of corresponding main-effect contrasts for each AB treatment combination. The procedure is illustrated in the table below:

Construction of Interaction Contrast

                              Level of Factor B
                                1      2      3
              Factor A
Level of      Contrast    Factor B Contrast
Factor A                        0      1     −1

    1             1             0      1     −1
    2            −1             0     −1      1

Main-effect contrasts are given on the top and left, and interaction contrasts (products of marginal entries) are given in the body of the table. These are inserted into the CONTRAST statement in the same order of interaction cells as indicated by the CLASS statement (levels of B within levels of A).

In terms of the μ model, the hypothesis tested by the F-statistic for this interaction contrast is

H0: μ12 − μ13 − μ22 + μ23 = 0
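
The coefficient products and the resulting estimate are easy to verify. This Python sketch (an illustration, not SAS) forms the products and evaluates the contrast at the cell means from Output 6.7:

```python
# Interaction contrast coefficients: products of the main-effect contrasts
a_con = [1, -1]          # factor A contrast
b_con = [0, 1, -1]       # B 2 vs 3 contrast
ab_con = [ai * bj for ai in a_con for bj in b_con]
print(ab_con)            # [0, 1, -1, 0, -1, 1] -- as in the CONTRAST statement

# Evaluate the contrast at the cell means (order 11, 12, 13, 21, 22, 23)
cell_means = [5.5, 4.6, 1.0, 2.5, 25 / 3, 5.4]
est = sum(c * m for c, m in zip(ab_con, cell_means))
print(round(est, 4))     # 0.6667: estimate of mu12 - mu13 - mu22 + mu23
```

Note that this value, 0.6667, matches the αβ̂12 estimate in Output 6.8, and the p-value for the A*B 2 vs 3 contrast (0.7371) matches the t-test for that parameter.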

The two ESTIMATE statements request estimates of linear functions of the model parameters. The first function to be estimated is the average of the cell means for levels 2 and 3 of factor B. The other statement requests an estimate of the effect of factor A within level 1 of factor B, or an estimate of μ21 – μ11. Output 6.11 summarizes results from the ESTIMATE statements.

Output 6.11 Parameter Estimates in a Two-Way Classification

    Standard    
Parameter Estimate Error  t Value  Pr > |t|
 
B2, B3 MEAN 4.83333333 0.48510213 9.96 <.0001
A in B1 -3.00000000 1.47384606 -2.04 0.0645

Expressing each comparison in terms of model parameters αi, βj, (αβ)ij is the key to filling in the coefficients of the CONTRAST and ESTIMATE statements. Consider the ESTIMATE A in B1 statement, which is used to estimate μ21 – μ11. Writing this expression as a function of the model parameters by substituting

μij = μ + αi + βj + αβij

yields

μ21 − μ11 = −α1 + α2 − αβ11 + αβ21

The −α1 + α2 term tells you to insert A -1 1 into the ESTIMATE statement. There are no β’s in the function, so no B expression appears in the ESTIMATE statement. The −αβ11 + αβ21 term tells you to insert A*B -1 0 0 1 0 0 or, equivalently, A*B -1 0 0 1 into the ESTIMATE statement. The ordering in the statement

class a b;

specifies that the ordering of the coefficients following A*B corresponds to αβ11 αβ12 αβ13 αβ21 αβ22 αβ23. The SAS statement

class b a;

would indicate an ordering that corresponds to αβ11 αβ21 αβ12 αβ22 αβ13 αβ23.

Now consider the CONTRAST A statement. The hypothesis to be tested is

H0: −(μ11 + μ12 + μ13)/3 + (μ21 + μ22 + μ23)/ 3 = 0

Substituting μij = μ + αi + βj + αβij gives the equivalent hypothesis:

H0: −α1 + α2 − (αβ11 + αβ12 + αβ13)/3
  + (αβ21 + αβ22 + αβ23) / 3 = 0

Again the −α1 + α2 term in the function tells you to insert A -1 1. This brings up an important usage note: Specifying A -1 1 causes the coefficients of the A*B interaction term to be automatically included by PROC GLM. That is, the SAS statement

contrast 'A' a -1 1;

is equivalent to the statement

contrast 'A' a -1 1 a*b -.333333 -.333333 -.333333 .333333 .333333 .333333;

Similarly, the SAS statement

estimate 'A EFFECT' a -1 1;

provides an estimate of

– (μ11 + μ12 + μ13)/3 + (μ21 + μ22 + μ23)/3

without explicitly specifying the coefficients of the αβij terms. However, you should note that specifying the αβij coefficients does not cause PROC GLM to automatically include coefficients for αi or βj. For example, the term A -1 1 must appear in the ESTIMATE A in B1 statement. Similarly, a contrast to test H0: −μ11 + μ21 = 0 requires the following statement:

contrast 'A in B1' a -1 1 a*b -1 0 0 1;

The A -1 1 term must be included.

6.3.7 Estimable Functions for a Two-Way Classification

The previous section discussed the application of the CONTRAST statement, which employs the concept of estimable functions. PROC GLM can display the construction of estimable functions as an optional request in the MODEL, LSMEANS, CONTRAST, and ESTIMATE statements. This section discusses the construction of estimable functions and their relation to the sums of squares and associated hypotheses available in the GLM procedure, and to CONTRAST, ESTIMATE, and LSMEANS statements. The presentation of estimable functions consists of results obtained using the unbalanced factorial data given in Output 6.7. For more thorough discussions of these principles, see Graybill (1976) and Searle (1971).

6.3.7.1 The General Form of Estimable Functions

The general form of estimable functions is a vector of elements that are the building blocks for generating specific estimable functions. The number of unique symbols in the vector is the maximum number of linearly independent estimable functions, which is equal to the rank of the X′X matrix. In the GLM procedure, the general form is displayed by the E option in the MODEL statement:

proc glm;
   class a b;
   model y=a b a*b / e solution;

Table 6.2 gives the vector of coefficients of the general form of estimable functions for our example. There are only six elements (L1, L2, L4, L5, L7, L8), which correspond to the number of degrees of freedom in the model (including the intercept). The number of elements for an effect corresponds to the degrees of freedom for that effect; for example, L4 and L5 are introduced opposite the effect B, indicating B has 2 degrees of freedom.

Table 6.2 General Form of Estimable Functions

Effect           Parameters*   Coefficients
Intercept        μ             L1
A          1     α1            L2
           2     α2            L1 − L2
B          1     β1            L4
           2     β2            L5
           3     β3            L1 − L4 − L5
A*B      1 1     αβ11          L7
         1 2     αβ12          L8
         1 3     αβ13          L2 − L7 − L8
         2 1     αβ21          L4 − L7
         2 2     αβ22          L5 − L8
         2 3     αβ23          L1 − L2 − L4 − L5 + L7 + L8

* These are implied by the output but not printed in this manner.

According to Table 6.2, any estimable function Lβ must be of the form

Lβ = L1μ + L2α1 + (L1 − L2)α2 + L4β1 + L5β2 + (L1 − L4 − L5)β3 + L7αβ11 + L8αβ12 + (L2 − L7 − L8)αβ13 + (L4 − L7)αβ21 + (L5 − L8)αβ22 + (L1 − L2 − L4 − L5 + L7 + L8)αβ23    (6.6)

for some specific values of L1 through L8. The various tests in PROC GLM test hypotheses of the form H0: Lβ =0.

Coefficients for any specific estimable function are constructed by assigning values to the individual L’s. For example, setting L2=1 and all others equal to 0 provides the estimable function α1 − α2 + αβ13 − αβ23. It is clear, however, that no estimable function can be constructed in this manner to equal α1 or α2 individually. That is, no matter what values you choose for L1 through L8, you cannot make Lβ = α1 or Lβ = α2. This is because α1 and α2 are non-estimable; without additional restrictions there is no linear function of the data whose expected value is α1 or α2.
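The general form lends itself to a quick computational check. The following Python sketch is not from the text; it simply encodes the coefficients of Table 6.2 so that specific estimable functions can be generated by assigning values to L1 through L8:

```python
# Encode Table 6.2 (a sketch, not SAS output): the coefficient of each
# model parameter in the general form of estimable functions.
def estimable_coeffs(L1=0, L2=0, L4=0, L5=0, L7=0, L8=0):
    return {
        "mu":   L1,
        "a1":   L2,         "a2":   L1 - L2,
        "b1":   L4,         "b2":   L5,        "b3":   L1 - L4 - L5,
        "ab11": L7,         "ab12": L8,        "ab13": L2 - L7 - L8,
        "ab21": L4 - L7,    "ab22": L5 - L8,
        "ab23": L1 - L2 - L4 - L5 + L7 + L8,
    }

# Setting L2=1 (all others 0) yields alpha1 - alpha2 + alphabeta13 - alphabeta23:
nonzero = {k: v for k, v in estimable_coeffs(L2=1).items() if v != 0}
print(nonzero)   # {'a1': 1, 'a2': -1, 'ab13': 1, 'ab23': -1}
```

No assignment of L1 through L8 leaves only the α1 coefficient nonzero, which is the computational face of the non-estimability of α1 alone.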

6.3.7.2 Interpreting Sums of Squares Using Estimable Functions

The coefficients required to construct estimable functions for each effect in the MODEL statement are available for any type of sum of squares requested as an option in the MODEL statement. For example,

model y = a b a*b / e e1 e2 e3;

will provide the general form, the coefficients of estimable functions for Types I, II, and III, and the corresponding sums of squares for each effect listed in the MODEL statement.

Table 6.3 gives the coefficients of the Type I, Type II, and Type III estimable functions associated with factor A. Types III and IV are identical for this example because all nij > 0.

Table 6.3 Estimable Functions for Factor A

Effect           Type I         Type II        Type III
                 Coefficients   Coefficients   Coefficients
Intercept        0              0              0
A          1     L2             L2             L2
           2     −L2            −L2            −L2
B          1     0.05*L2        0              0
           2     0.325*L2       0              0
           3     −0.375*L2      0              0
A*B      1 1     0.25*L2        0.2697*L2      0.3333*L2
         1 2     0.625*L2       0.5056*L2      0.3333*L2
         1 3     0.125*L2       0.2247*L2      0.3333*L2
         2 1     −0.2*L2        −0.2697*L2     −0.3333*L2
         2 2     −0.3*L2        −0.5056*L2     −0.3333*L2
         2 3     −0.5*L2        −0.2247*L2     −0.3333*L2

All coefficients involve only one element (L2), since the A effect has only 1 degree of freedom. Estimable functions are constructed by assigning specific values to the elements. For factor A, with only one element, the natural choice is L2=1. Applying this to the Type I coefficients generates the estimable function

Lβ = α1 − α2 + 0.05β1 + 0.325β2 − 0.375β3 + 0.25αβ11 + 0.625αβ12 + 0.125αβ13 − 0.2αβ21 − 0.3αβ22 − 0.5αβ23

Thus, using the Type I sum of squares in the numerator of an F-statistic tests the hypothesis Lβ = 0 for this particular L. In addition to α1 − α2, the function involves the factor B parameters

0.05β1 + 0.325β2 − 0.375β3

as well as a function of the interaction parameters. This is to be expected, since the Type I function for A is unadjusted: it is based on the difference between the two A factor means (ȳ1.. − ȳ2..).

As explained in Section 6.3.1, the mean for A level 1 is

ȳ1.. = (1/8)(2ȳ11. + 5ȳ12. + ȳ13.)

Each cell mean ȳij. is an estimate of the function μ + αi + βj + αβij. Omitting for this discussion the interaction parameters, ȳ1.. is an estimate of

(1/8)[2(μ + α1 + β1) + 5(μ + α1 + β2) + (μ + α1 + β3)] = μ + α1 + (0.25β1 + 0.625β2 + 0.125β3)

Likewise, ȳ2.. is an estimate of

μ + α2 + (0.2β1 + 0.3β2 + 0.5β3)

Hence, (ȳ1.. − ȳ2..) is an estimate of

α1 − α2 + 0.05β1 + 0.325β2 − 0.375β3

which is the function provided by Type I.
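As an arithmetic check, the Type I coefficients of the B parameters can be reproduced from the cell frequencies implied by the example (n11, n12, n13 = 2, 5, 1 and n21, n22, n23 = 2, 3, 5). A Python sketch, not from the text:

```python
# Cell frequencies assumed from the text's weights.
n1 = [2, 5, 1]   # A=1 cells at B=1,2,3
n2 = [2, 3, 5]   # A=2 cells at B=1,2,3

p1 = [n / sum(n1) for n in n1]   # weights inside ybar1.. : 0.25, 0.625, 0.125
p2 = [n / sum(n2) for n in n2]   # weights inside ybar2.. : 0.2, 0.3, 0.5
beta_coeffs = [round(a - b, 4) for a, b in zip(p1, p2)]
print(beta_coeffs)   # [0.05, 0.325, -0.375]
```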

The coefficients associated with A*B provide the interaction terms in the Type I estimable functions and are useful for expressing these functions and interpreting the tests in terms of the μ-model. Recall that

μij = μ + αi + βj + αβij

A little algebra shows that any estimable function Lβ with coefficients as given in Table 6.3 can also be written as

Lβ = L7μ11 + L8μ12 + (L2 − L7 − L8)μ13 + (L4 − L7)μ21 + (L5 − L8)μ22 + (L1 − L2 − L4 − L5 + L7 + L8)μ23    (6.7)

This is easily verified by starting with Lβ in equation (6.7), replacing each μij with μ + αi + βj + αβij, and combining terms to end up with the original expression for Lβ in equation (6.6). For example, after factoring out L2, we see that the Type I estimable function for A is

Lβ = L2(0.25μ11 + 0.625μ12 + 0.125μ13 − 0.2μ21 − 0.3μ22 − 0.5μ23)

Thus the Type I F-test for A tests the hypothesis H0: Lβ = 0, or equivalently,

H0: 0.25μ11 + 0.625μ12 + 0.125μ13 = 0.2μ21 + 0.3μ22 + 0.5μ23

This is the hypothesis that is tested in Table 6.1. Since the coefficients are functions of the frequencies of the cells, the hypothesis might not be particularly useful.

Applying the same method to the Type II coefficients for A, we have, after setting L2=1,

Lβ = 0.2697μ11 + 0.5056μ12 + 0.2247μ13 − 0.2697μ21 − 0.5056μ22 − 0.2247μ23
   = 0.2697(μ11 − μ21) + 0.5056(μ12 − μ22) + 0.2247(μ13 − μ23)

This expression sheds some light on the meaning of the Type II coefficients. Recall that the Type II SS are based on a main-effects model. With no interaction we have, for example, μ11 − μ21 = μ12 − μ22 = μ13 − μ23. Let Δ denote the common value of these differences. The Type II coefficients are the coefficients of the best linear unbiased estimate Δ̂ of Δ, given by

Δ̂ = Σj wj(ȳ1j. − ȳ2j.)

where

wj = [n1j n2j/(n1j + n2j)] / Σk [n1k n2k/(n1k + n2k)]

For example,

0.2697 = [(2)(2)/(2+2)] / [(2)(2)/(2+2) + (5)(3)/(5+3) + (1)(5)/(1+5)]

Note that these are functions of cell frequencies and thus do not necessarily generate meaningful hypotheses.
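The weight computation can be verified numerically. This Python sketch, not from the text, uses the example's cell frequencies:

```python
# Type II weights w_j computed from the example's cell frequencies.
n1 = [2, 5, 1]
n2 = [2, 3, 5]

h = [a * b / (a + b) for a, b in zip(n1, n2)]   # n1j*n2j/(n1j+n2j)
w = [round(x / sum(h), 4) for x in h]
print(w)   # [0.2697, 0.5056, 0.2247]
```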

Type III (and Type IV) estimable functions for A (Table 6.3) likewise do not involve the parameters of the B factor. Further, in terms of the parameters of the cell means (μ-model),

Lβ = (1/3)(μ11 + μ12 + μ13) − (1/3)(μ21 + μ22 + μ23)

Thus the Type III F-statistic tests H0: μ̄1. = μ̄2., as stated in Table 6.1. Note again that this hypothesis does not involve the cell frequencies nij.

Table 6.4 gives the coefficients of estimable functions for factor B. There are 2 degrees of freedom for factor B, and thus two elements, L4 and L5. Consider first the Type III coefficients because they are more straightforward. The Type III F-test for factor B simultaneously tests that two linearly independent functions are equal to 0; the functions are obtained by selecting two choices of values for L4 and L5.

The simplest choices are to take L4=1, L5=0 and L4=0, L5=1. This gives the estimable functions

L1β = β1 − β3 + (αβ11 − αβ13 + αβ21 − αβ23)/2

and

L2β = β2 − β3 + (αβ12 − αβ13 + αβ22 − αβ23)/2

In terms of the μ model, this gives

L1β = (μ11 − μ13 + μ21 − μ23) / 2

and

L2β = (μ12 − μ13 + μ22 − μ23) / 2

Thus, the Type III F-statistic tests

H0: μ̄.1 = μ̄.3 and μ̄.2 = μ̄.3

or, in equivalent form

H0: μ̄.1 = μ̄.2 = μ̄.3    (6.8)

Another set of choices is L4=1, L5=−1 and L4=L5=1. These lead to

H0: μ̄.1 = μ̄.2 and (μ̄.1 + μ̄.2)/2 = μ̄.3

which is also equivalent to equation (6.8). Therefore, both sets of choices lead to the same H0. Importantly, the H0 in equation (6.8) does not depend on the cell frequencies and thus is desirable for the usual case where cell frequencies are unrelated to the effects of the factors. Table 6.4 gives the coefficients of the Type I & Type II and Type III & Type IV estimable functions associated with factor B.
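That the two sets of (L4, L5) choices define the same hypothesis can be confirmed by checking that the two pairs span the same space. A Python sketch, not from the text:

```python
import numpy as np

# Rows are (L4, L5) choices; each row determines one estimable function,
# so two pairs define the same hypothesis iff they span the same space.
set1 = np.array([[1, 0], [0, 1]])     # L4=1,L5=0  and  L4=0,L5=1
set2 = np.array([[1, -1], [1, 1]])    # L4=1,L5=-1 and  L4=L5=1

r1 = np.linalg.matrix_rank(set1)
r2 = np.linalg.matrix_rank(set2)
r_both = np.linalg.matrix_rank(np.vstack([set1, set2]))
same_hypothesis = (r1 == r2 == r_both)
print(same_hypothesis)   # True
```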

Table 6.4 Estimable Functions for Factor B

Effect           Type I & Type II          Type III & Type IV
                 Coefficients              Coefficients
Intercept        0                         0
A          1     0                         0
           2     0                         0
B          1     L4                        L4
           2     L5                        L5
           3     −L4 − L5                  −L4 − L5
A*B      1 1     0.4101*L4 − 0.1236*L5     0.5*L4
         1 2     −0.1658*L4 + 0.3933*L5    0.5*L5
         1 3     −0.2416*L4 − 0.2697*L5    −0.5*L4 − 0.5*L5
         2 1     0.5899*L4 + 0.1236*L5     0.5*L4
         2 2     0.1658*L4 + 0.6067*L5     0.5*L5
         2 3     −0.7584*L4 − 0.7303*L5    −0.5*L4 − 0.5*L5

Recall that since B followed A in the MODEL statement, the Type I SS for B is the same as the Type II SS for B. The coefficients are again functions of the cell frequencies. The nature of the function is not easy to determine, but it is similar to that of the Type II coefficients for factor A (see Table 6.3).

As a matter of computational interest, the Type II estimable functions for B are equal to the Type III estimable functions if there is no interaction. For then αβij = 0, and Table 6.4 shows that the coefficients for αi and βj are the same for Types II and III. This is not to say that the Type II SS and Type III SS will be equal, but rather that they give tests of the same hypothesis when there is no interaction. If, indeed, there is no interaction, then the Type II F-test is more powerful than the Type III F-test. The assumption of no interaction is, however, probably rarely satisfied in nature.

Table 6.5 Estimable Functions for A*B

Effect           Coefficients for All Types
Intercept        0
A          1     0
           2     0
B          1     0
           2     0
           3     0
A*B      1 1     L7
         1 2     L8
         1 3     −L7 − L8
         2 1     −L7
         2 2     −L8
         2 3     L7 + L8

Table 6.5 gives the coefficients of the estimable functions for the A*B interaction; again, two elements (L7 and L8) are available. In this case all types give the same results, since for each type the interaction effects are adjusted for all other effects. The estimable functions can be readily interpreted if the coefficients are recorded in the 2×3 cell format implied by the factorial array. For example, let L7 = −1 and L8 = 0; the resulting function can be illustrated as follows:

              B
          1     2     3
A    1   −1     0    +1
     2   +1     0    −1

This is the interaction in the 2×2 subtable consisting of the columns for B1 and B3, or the interaction of the contrast (α1 − α2) with the contrast (β1 − β3).
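The same coefficients can be generated as the outer product of a row contrast for A and a column contrast for B; this Python sketch assumes nothing beyond Table 6.5:

```python
import numpy as np

# With L7 = -1 and L8 = 0, the A*B coefficients of Table 6.5 are the
# outer product of the A contrast (-1, +1) and the B contrast (+1, 0, -1).
cell_coeffs = np.outer([-1, 1], [1, 0, -1])
print(cell_coeffs)
# [[-1  0  1]
#  [ 1  0 -1]]
```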

6.3.7.3 Estimating Estimable Functions

Estimates of estimable functions can be obtained by multiplying the vector of coefficients by the vector of parameter estimates from the SOLUTION option. For example, letting L2=1 in Table 6.3 for Type I results in the vector

L = (0, 1, −1, .05, .325, −.375, .25, .625, .125, −.2, −.3, −.5)

The vector of parameter estimates (see Output 6.7) is

β̂ = (5.4 −2.4 .0 −2.9 2.933 .0 5.4 −1.33 .0 .0 .0 .0)

The estimate Lβ̂ = 1.075 is equal to ȳ1.. − ȳ2.., the unadjusted treatment difference. Likewise, using the Type III coefficients gives an estimate of 1.044, which is the difference between the two least-squares means of the A factor (see Table 6.8).

Variances of these estimates can be obtained by the standard formula for the variance of a linear function. The estimated variance of Lβ̂ is s²L(X′X)⁻L′, where s², the estimated error variance, is the residual mean square from the overall analysis of variance, and (X′X)⁻ is a generalized inverse of the X′X matrix generated by the dummy variables. The square root of this variance provides the standard error of the estimated function; hence, a t-test is readily constructed.
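The following Python sketch illustrates the computation on a toy one-way layout (the data and cell sizes are invented for illustration, not the book's example); numpy's pseudoinverse serves as the generalized inverse:

```python
import numpy as np

# Dummy-variable X for y = mu + alpha_i, levels A1 (3 obs) and A2 (2 obs).
X = np.array([[1, 1, 0],
              [1, 1, 0],
              [1, 1, 0],
              [1, 0, 1],
              [1, 0, 1]], dtype=float)
y = np.array([3.0, 4.0, 5.0, 7.0, 9.0])

XtX_inv = np.linalg.pinv(X.T @ X)          # a generalized inverse of X'X
beta_hat = XtX_inv @ X.T @ y               # one solution to the normal equations
resid = y - X @ beta_hat
s2 = resid @ resid / (len(y) - np.linalg.matrix_rank(X))   # residual MS

L = np.array([0.0, 1.0, -1.0])             # alpha1 - alpha2 (estimable)
estimate = L @ beta_hat                    # invariant to choice of g-inverse
std_err = np.sqrt(s2 * (L @ XtX_inv @ L))  # sqrt of s^2 * L (X'X)^- L'
t_stat = estimate / std_err
```

Because L is estimable, both the estimate and its standard error are invariant to the particular generalized inverse used.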

Since the LSMEANS, CONTRAST, and ESTIMATE statements offer methods of estimating and testing the most desired functions, the preceding technique is seldom employed. However, if these statements produce functions that are nonestimable, the generation of estimates from scratch may provide otherwise unobtainable estimates.

6.3.7.4 Interpreting LSMEANS, CONTRAST, and ESTIMATE Results Using Estimable Functions

Sometimes it may be useful to examine the construction of estimable functions associated with the LSMEANS, CONTRAST, and ESTIMATE statements. Information on the construction of these functions is available by specifying E as one of the options in the statement. (Don’t confuse this with the E=effect option, which specifies an alternate error term.) Output 6.12 shows the results from the E option:

proc glm;
   class a b;
   model y=a b a*b / solution;
   contrast 'B 2 vs 3' b 0 -1 1 / e;
   estimate 'A in B1' a -1 1 a*b -1 0 0 1 / e;
   lsmeans a / stderr e;

Output 6.12 Estimable Functions for the LSMEANS, CONTRAST, and ESTIMATE Statements

Coefficients for Contrast B 2 vs 3
 
Row 1
 
Intercept 0
 
a 1 0
a 2 0
 
b 1 0
b 2 -1
b 3 1
 
a*b 1 1 0
a*b 1 2 -0.5
a*b 1 3 0.5
a*b 2 1 0
a*b 2 2 -0.5
a*b 2 3 0.5
 
Coefficients for Estimate A in B1
 
Row 1
 
Intercept 0
a 1 -1
a 2 1
 
b 1 0
b 2 0
b 3 0
 
a*b 1 1 -1
a*b 1 2 0
a*b 1 3 0
a*b 2 1 1
a*b 2 2 0
a*b 2 3 0
 
Coefficients for a Least Square Means
 
a Level
Effect 1 2
 
Intercept 1 1
a 1   1 0
a 2   0 1
b 1   0.33333333 0.33333333
b 2   0.33333333 0.33333333
b 3   0.33333333 0.33333333
a*b 1 1   0.33333333 0
a*b 1 2   0.33333333 0
a*b 1 3   0.33333333 0
a*b 2 1   0 0.33333333
a*b 2 2   0 0.33333333
a*b 2 3   0 0.33333333

The hypothesis tested by the CONTRAST statement is

H0: − β2 + β3 − .5αβ12 + .5αβ13 − .5αβ22 + .5αβ23 = 0

or in the μ-model notation

H0: −.5μ12 + .5μ13 − .5μ22 + .5μ23 = 0

Note that the coefficients of the interaction effects are supplied by the procedure.

The function estimated by the ESTIMATE statement is

Lβ = −α1 + α2 − αβ11 + αβ21

or in μ-model notation

Lβ = −μ11 + μ21

The least-squares mean for A1 estimates

μ + α1 + (β1 + β2 + β3 + αβ11 + αβ12 + αβ13) / 3

or, in μ-model notation

(μ11 + μ12 + μ13) / 3
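The distinction between the least-squares mean (an unweighted average of cell means) and the raw factor mean (weighted by cell size) can be illustrated with a short Python sketch; the cell means here are invented, while the cell sizes follow the A1 row of the example:

```python
# Invented cell means for illustration; cell sizes 2, 5, 1 as in the example.
cell_means = [10.0, 20.0, 30.0]        # ybar_11., ybar_12., ybar_13.
n = [2, 5, 1]

lsmean = sum(cell_means) / 3           # unweighted: (mu11 + mu12 + mu13)/3
raw_mean = sum(m * k for m, k in zip(cell_means, n)) / sum(n)
print(lsmean, raw_mean)   # 20.0 18.75
```

With unbalanced data the two generally differ, which is why the LSMEANS statement, rather than the MEANS statement, estimates the μ-model quantity above.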
