One-Way Classification

4.3. One-Way Classification

In the previous section, we considered an interlaboratory study in which four bivariate observations, corresponding to two different methods, were made in each of three laboratories. The purpose of the study was to compare the three laboratories and decide whether they provide, on the average, the same bivariate measurements. The three groups or classes of interest were the three laboratories, which define a categorical variable (or factor) with three levels. In general, a one-way classification model can be defined for a variable with a levels or groups. If we denote by y_ij the p by 1 vector of responses on the j^th unit of the i^th group, then we can write

$$ y_{ij} = \mu + \tau_i + \epsilon_{ij}, \qquad j = 1, \ldots, n_i, \quad i = 1, \ldots, a, $$

where ε_ij is the p by 1 random vector corresponding to error, and is assumed to have a zero vector as the mean and the variance-covariance matrix Σ. The effect of the i^th group, that is, its surplus or deficit relative to the overall mean, is represented by the p by 1 vector τ_i, and the p by 1 vector μ is the overall mean. The n_i is the number of observations in the i^th group. If n₁ = ... = n_a, then the design is balanced. A usual assumption, not crucial but convenient, is to take

$$ \sum_{i=1}^{a} n_i \tau_i = 0. $$

This assumption implies that the weighted sum of treatment effects is zero, thereby making μ the overall average across all treatment groups. In fact, any other linear restriction on τ₁,...,τ_a can be used instead, so long as it provides an additional equation that is linearly independent of the system of normal equations given in Equation 3.3. The purpose of making such an assumption is to devise a convenient method to find an appropriate generalized inverse of X′X, where X is the corresponding design matrix when the above model is represented as a multivariate linear model given in Equation 3.1. In fact, PROC GLM makes the alternative assumption of τ_a = 0 rather than the traditional assumption of $\sum_{i=1}^{a} n_i \tau_i = 0$ adopted by various multivariate analysis and experimental design books. This assumption amounts to setting the effect of the last treatment to zero, thereby making μ the mean of the last group. A model with this assumption is often referred to as the reference cell model.
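The two restrictions can be compared numerically. The following is a minimal sketch in Python (not part of the original SAS programs), using only the METHOD1 values of the laboratories data from Program 4.1 as a single response for simplicity. The variable names are illustrative assumptions. It computes μ and τ_i under the weighted sum-to-zero restriction and under the reference cell restriction τ_a = 0, and then compares the fitted group means μ + τ_i and the contrast τ₁ - τ₂ obtained from each.

```python
# METHOD1 values by laboratory, copied from the data in Program 4.1.
method1 = {
    1: [10.1, 9.3, 9.7, 10.9],
    2: [10.0, 9.5, 9.7, 10.8],
    3: [11.3, 10.7, 10.8, 10.5],
}


def mean(values):
    return sum(values) / len(values)


group_means = {lab: mean(obs) for lab, obs in method1.items()}
n = {lab: len(obs) for lab, obs in method1.items()}
total = sum(n.values())

# Restriction 1: sum_i n_i * tau_i = 0, so mu is the weighted overall mean.
mu_weighted = sum(n[lab] * group_means[lab] for lab in n) / total
tau_weighted = {lab: group_means[lab] - mu_weighted for lab in n}

# Restriction 2 (reference cell): tau_a = 0, so mu is the last group's mean.
last = max(n)
mu_ref = group_means[last]
tau_ref = {lab: group_means[lab] - mu_ref for lab in n}

# Estimable functions: fitted group means and a contrast of effects.
fitted_weighted = {lab: mu_weighted + tau_weighted[lab] for lab in n}
fitted_ref = {lab: mu_ref + tau_ref[lab] for lab in n}
contrast_weighted = tau_weighted[1] - tau_weighted[2]
contrast_ref = tau_ref[1] - tau_ref[2]

print("mu:", mu_weighted, mu_ref)
print("fitted means:", fitted_weighted, fitted_ref)
print("tau_1 - tau_2:", contrast_weighted, contrast_ref)
```

The individual values of μ and τ_i differ between the two restrictions, while the fitted group means and the contrast agree up to floating-point rounding.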

As mentioned earlier, since the choice of the generalized inverse is immaterial when estimating an estimable linear function or testing a testable linear hypothesis, the particular linear restriction placed on τ_i does not affect the subsequent analysis in any way. For a one-way classification model, the design matrix X is of order n by (a+1) and Rank(X) = a, so the rank of X falls short of full column rank by only one. As a result, only one linear restriction on τ_i is needed. For the higher order classifications, the number of linearly independent restrictions needed is equal to the rank deficiency of X.
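The rank statement can be checked directly. The sketch below, written in Python as an illustration outside the original SAS code, builds the n by (a+1) design matrix with an intercept column and one indicator column per group, and computes its rank by exact Gaussian elimination. The helper names `one_way_design` and `matrix_rank` are hypothetical.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a matrix via Gaussian elimination over exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below the current pivot position with a nonzero entry.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from every other row.
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[rank])]
        rank += 1
    return rank


def one_way_design(group_sizes):
    """Design matrix: intercept column followed by one indicator per group."""
    a = len(group_sizes)
    rows = []
    for i, n_i in enumerate(group_sizes):
        indicator = [1 if k == i else 0 for k in range(a)]
        rows.extend([[1] + indicator for _ in range(n_i)])
    return rows


# Three laboratories with four observations each (a = 3, n = 12).
X = one_way_design([4, 4, 4])
print(len(X), len(X[0]), matrix_rank(X))
```

The intercept column is the sum of the indicator columns, which is why the rank is a rather than a+1, for balanced and unbalanced group sizes alike.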

To test the multivariate null hypothesis of no differences in the group means, that is, H₀ : τ₁ = τ₂ = ... = τ_a, it is possible to use any of the four multivariate tests described in Chapter 3, after making the appropriate modifications in the degrees of freedom. Specifically, the quantity (a+1), written as (k+1) in the notation of Chapter 3, where it was the rank of X in the full rank model, would be replaced by a, the actual rank of the matrix X.

EXAMPLE 1

Hypothesis Testing, Laboratories Comparison Data (continued)

We return to the Jackson (1991) data as presented in Table 4.1. The objective of simultaneously comparing the three laboratories translates to the bivariate null hypothesis,

H₀ : τ₁ = τ₂ = τ₃

against the alternative

H₁ : At least two τ_i are different from each other.

The null hypothesis is testable, as seen earlier, and the four different multivariate tests, namely Wilks' Λ, Pillai's trace, Hotelling-Lawley's trace, and Roy's maximum root test, are available to test H₀. Further, the design is balanced with no missing values and hence the four types of analyses are equivalent, all resulting in identical SS&CP matrices (in fact, for one-way classification, this is true even for unbalanced data). The SAS code to do this analysis is presented in Program 4.1. The program produces Output 4.1.

/* Program 4.1 */

options ls = 64 ps=45 nodate nonumber;
    data jack;
    input lab method1 method2;
    lines;
    1 10.1 10.5
    1 9.3 9.5
    1 9.7 10.0
    1 10.9 11.4
    2 10.0 9.8
    2 9.5 9.7
    2 9.7 9.8
    2 10.8 10.7
    3 11.3 10.1
    3 10.7 9.8
    3 10.8 10.1
    3 10.5 9.6
    ;
    /* Source: Jackson (1991, p. 301). Principal Components.  Copyright
       1991 John Wiley & Sons, Inc.  Reprinted by permission of
       John Wiley & Sons, Inc. */

    Title1 'Output 4.1' ;
    title2 'Balanced One-Way MANOVA';
    proc glm data = jack;
    class lab;
    model method1 method2 = lab/nouni;
    manova h = lab/printe printh ;
    run;
    /* proc glm data = jack ;
    class lab ;
    model method1 method2 = lab/nouni;
    contrast 'Test: lab eff.' lab 1 -1  0,
                              lab 1  0  -1;
    manova/printe printh;
    run; */

The independent variable which defines the classification is denoted by LAB and the two methods specified as METHOD1 and METHOD2 are the dependent variables. We perform the analysis using the GLM procedure. The MANOVA statement performs multivariate analysis. It is important that the variable LAB is specified in the CLASS statement. This enables SAS to create the appropriate X matrix. We could have used any other numeric or nonnumeric coding for the values taken by the class variable LAB, since classification variables can be either character or numeric.

To test the null hypothesis, it suffices to indicate the variable LAB as H=LAB in the MANOVA statement. The PRINTE and PRINTH options enable us to print the SS&CP matrices corresponding to the error and the null hypothesis H₀. We could have also specified the type of SS&CP matrices to be used in the analysis in the MODEL statement but since the four types of analyses are identical in this case, it is not necessary to specify one type over another. As a result, SAS uses the default, Type III analysis.

Example 4.1. Output 4.1

Balanced One-Way MANOVA

                     E = Error SS&CP Matrix

                            METHOD1           METHOD2

          METHOD1            2.7275              2.63
          METHOD2              2.63              2.81


               H = Type III SS&CP Matrix for LAB

                            METHOD1           METHOD2

          METHOD1             1.815            -0.605
          METHOD2            -0.605      0.4466666667


         Manova Test Criteria and F Approximations for
            the Hypothesis of no Overall LAB Effect
   H = Type III SS&CP Matrix for LAB   E = Error SS&CP Matrix

                      S=2    M=-0.5    N=3

Statistic               Value        F    Num DF  Den DF  Pr > F

Wilks' Lambda          0.069895    11.13       4      16  0.0002
Pillai's Trace         0.971691   4.2522       4      18  0.0135
Hotelling-Lawley Trace 12.71214   22.246       4      14  0.0001
Roy's Greatest Root    12.66516   56.993       2       9  0.0001

  NOTE: F Statistic for Roy's Greatest Root is an upper bound.
         NOTE: F Statistic for Wilks' Lambda is exact.
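The E and H matrices printed above can be reproduced from the raw data. The following Python sketch is an independent check, not the computation PROC GLM performs internally. It accumulates the within-group SS&CP matrix as E and the between-group SS&CP matrix as H, which is what the one-way MANOVA decomposition reduces to. The function names are illustrative assumptions.

```python
# (lab, method1, method2), copied from the data lines of Program 4.1.
data = [
    (1, 10.1, 10.5), (1, 9.3, 9.5), (1, 9.7, 10.0), (1, 10.9, 11.4),
    (2, 10.0, 9.8), (2, 9.5, 9.7), (2, 9.7, 9.8), (2, 10.8, 10.7),
    (3, 11.3, 10.1), (3, 10.7, 9.8), (3, 10.8, 10.1), (3, 10.5, 9.6),
]


def mean_vector(vectors):
    n = len(vectors)
    return [sum(v[k] for v in vectors) / n for k in range(len(vectors[0]))]


def sscp(vectors, center):
    """Sum of outer products of deviations from `center`."""
    p = len(center)
    out = [[0.0] * p for _ in range(p)]
    for v in vectors:
        d = [v[k] - center[k] for k in range(p)]
        for r in range(p):
            for c in range(p):
                out[r][c] += d[r] * d[c]
    return out


def one_way_sscp(rows):
    """Return (E, H): within-group and between-group SS&CP matrices."""
    groups = {}
    for label, *y in rows:
        groups.setdefault(label, []).append(y)
    all_obs = [y for obs in groups.values() for y in obs]
    p = len(all_obs[0])
    grand = mean_vector(all_obs)
    E = [[0.0] * p for _ in range(p)]
    H = [[0.0] * p for _ in range(p)]
    for obs in groups.values():
        center = mean_vector(obs)
        within = sscp(obs, center)
        between = sscp([center], grand)
        for r in range(p):
            for c in range(p):
                E[r][c] += within[r][c]
                H[r][c] += len(obs) * between[r][c]
    return E, H


E, H = one_way_sscp(data)
print("E =", E)
print("H =", H)
```

Up to floating-point rounding, the printed matrices should agree with the Error and Type III SS&CP matrices shown in Output 4.1.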

For the present data set, the number of data points n = 12 and the number of dependent variables p = 2. With B denoting the matrix whose rows are μ′, τ₁′, τ₂′, and τ₃′, the null hypothesis can be written as

$$ H_0 : LB = \begin{pmatrix} 0 & 1 & -1 & 0 \\ 0 & 1 & 0 & -1 \end{pmatrix} \begin{pmatrix} \mu' \\ \tau_1' \\ \tau_2' \\ \tau_3' \end{pmatrix} = 0, $$

and since the left-most matrix in H₀, that is L, has rank 2, the value of r = Rank(L) = 2 (see Table 3.2). The four test statistics corresponding to the null hypothesis are shown in Output 4.1. Recall that according to Table 3.3, the transformation of Wilks' Λ to F statistic is exact, since p = 2 here. As a result,

$$ F = \frac{1 - \sqrt{\Lambda}}{\sqrt{\Lambda}} \cdot \frac{n - a - 1}{r} = \frac{1 - \sqrt{0.069895}}{\sqrt{0.069895}} \cdot \frac{8}{2} $$

follows an F (4, 16) distribution. Corresponding to the observed value of F = 11.1299 with df (4,16), the p value is 0.0002. Consequently, we conclude that there is sufficient evidence against H₀ and that there is a significant difference between the laboratories. We reach essentially the same conclusions under the other three test criteria. The output also presents the corresponding SS&CP matrices for error and the hypothesis as results of the PRINTE and PRINTH options in the MANOVA statement. These were respectively denoted by E and H in the previous chapter. It may also be noted that as an alternative to H=LAB, one could also use the following CONTRAST statement,

contrast 'Test: lab eff.'  lab 1 -1 0,
                           lab 1 0 -1;

This is so because the above statement specifies the hypotheses τ₁ = τ₂ and τ₁ = τ₃ together, which are equivalent to our H₀ stated earlier. For completeness we have included this code in Program 4.1 but have commented it out to suppress the corresponding output.
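The Wilks' Λ value and its exact F transformation for p = 2 can be verified from the E and H matrices printed in Output 4.1. The sketch below is in Python and is an illustration, not part of the original program. It assumes Λ = |E| / |E + H| and the p = 2 transformation with hypothesis degrees of freedom a - 1 and error degrees of freedom n - a. The function names are hypothetical.

```python
import math

# SS&CP matrices as printed in Output 4.1.
E = [[2.7275, 2.63], [2.63, 2.81]]
H = [[1.815, -0.605], [-0.605, 0.4466666667]]


def det2(m):
    """Determinant of a 2 by 2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def wilks_lambda(error, hypothesis):
    """Wilks' Lambda = |E| / |E + H| for 2 by 2 matrices."""
    total = [[error[r][c] + hypothesis[r][c] for c in range(2)] for r in range(2)]
    return det2(error) / det2(total)


def exact_f_two_responses(lam, n, a):
    """Exact F transformation of Wilks' Lambda when p = 2.

    Returns (F, numerator df, denominator df).
    """
    hyp_df = a - 1   # rank of L in the one-way layout
    err_df = n - a   # error degrees of freedom
    root = math.sqrt(lam)
    f_value = (1 - root) / root * (err_df - 1) / hyp_df
    return f_value, 2 * hyp_df, 2 * (err_df - 1)


lam = wilks_lambda(E, H)
f_value, num_df, den_df = exact_f_two_responses(lam, n=12, a=3)
print(round(lam, 6), round(f_value, 4), num_df, den_df)
```

The computed Λ, F value, and degrees of freedom should match the Wilks' Lambda row of Output 4.1 up to rounding.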

EXAMPLE 2

An Unbalanced One-Way Classification, Diabetic Patients Study Data

Crowder and Hand (1990, p. 8) provided this example of unbalanced data. Two groups of subjects, an eight-member normal control group and a six-member group of diabetic patients without complications, were to be compared as part of a medical experiment. The subjects performed a small physical task, and measurements were recorded on each subject at various subsequent time points. The data in Table 4.2 are the measurements taken one minute, five minutes, and ten minutes after performing the task. The question of interest concerns differences between the two groups. In other words, we want to investigate whether the two groups differ from each other in their responses to the specified physical task.

These one-way classification data have GROUP as the CLASS variable. On each of the 14 subjects, a trivariate vector of data representing the three measurements at one, five, and ten minutes after performing the physical task is available. If the respective population mean vectors for the two groups on these measurements are represented as μ⁽¹⁾ = μ + τ₁ and μ⁽²⁾ = μ + τ₂, then the matrix B of regression coefficients can be written as

$$ B = \begin{pmatrix} \mu' \\ \tau_1' \\ \tau_2' \end{pmatrix} = \begin{pmatrix} \mu_1 & \mu_2 & \mu_3 \\ \tau_{11} & \tau_{12} & \tau_{13} \\ \tau_{21} & \tau_{22} & \tau_{23} \end{pmatrix}. $$

Table 4.2. Effect of a Physical Task on Hospital Patients

                        Time (minutes)
Group     Subject      1       5      10
Group 1      1        7.6     8.7     7.0
             2       10.1     8.9     8.6
             3       11.2     9.5     9.4
             4       10.8    11.5    11.4
             5        3.9     4.1     3.7
             6        6.7     7.3     6.6
             7        2.2     2.5     2.4
             8        2.1     2.0     2.0
Group 2      9        8.5     5.6     8.4
            10        7.5     5.0     9.5
            11       12.9    13.6    15.3
            12        8.8     7.9     7.3
            13        5.5     6.4     6.4
            14        3.2     3.4     3.2

To test the equality of the treatment effects between the two groups at all three time points, the null hypothesis is

$$ H_0 : (\tau_{11}, \tau_{12}, \tau_{13}) = (\tau_{21}, \tau_{22}, \tau_{23}), $$

where τ_ik denotes the effect of Group i at the k^th time point, that is,

$$ H_0 : LB = 0, \qquad L = (0 \;\; 1 \;\; -1), $$

which can be tested as in Example 1. However, in the present context a more realistic hypothesis may be that the amount of change in measurements from one minute to five minutes is equal for the two groups and that the change between five minutes and ten minutes is equal for the two groups. These can be represented as

$$ H_0 : LBM = (0 \;\; 1 \;\; -1)\, B \begin{pmatrix} 1 & 0 \\ -1 & 1 \\ 0 & -1 \end{pmatrix} = 0. \qquad (4.5) $$

The above representation deserves some further explanation. Let us first postmultiply B by M, resulting in

$$ BM = \begin{pmatrix} \mu_1 - \mu_2 & \mu_2 - \mu_3 \\ \tau_{11} - \tau_{12} & \tau_{12} - \tau_{13} \\ \tau_{21} - \tau_{22} & \tau_{22} - \tau_{23} \end{pmatrix}. $$

The entries in the first row of the above matrix represent the successive differences in the intercept or the overall mean for the three time points. The second row represents the successive treatment differences for Group 1 and the third row represents the same for Group 2. Since we want to compare these differences for the two groups, this is accomplished by premultiplying BM by L = (0 1 -1) and equating the product to zero. This results in the simplification of H₀ : LBM = 0 to

$$ \left( (\tau_{11} - \tau_{12}) - (\tau_{21} - \tau_{22}), \;\; (\tau_{12} - \tau_{13}) - (\tau_{22} - \tau_{23}) \right) = (0, \; 0). $$

The choices of L and M indicated here are not unique. For example, L = (0 -1 1), paired with any M whose columns express the same two successive differences (for instance, with the signs of both columns reversed), is an equally legitimate choice. The tests for hypotheses of the type H₀ : LBM = 0 were described in Chapter 3. In SAS, this objective is attained by specifying the M matrix in the MANOVA statement. SAS automatically identifies the corresponding L matrix from the specification H = GROUP.

The M matrix can be specified using one of the two different yet equivalent ways. We can either explicitly specify all the entries of M in the M= specification of the MANOVA statement as

m = (1 -1 0,
     0 1 -1);

or ask SAS to create it so as to correspond to the measurement differences of interest. The latter is achieved by using the algebraic statements which, in the present context, are

m = min1 - min5,
    min5 - min10;

where MIN1, MIN5, and MIN10 are the names assigned in Program 4.2 to the measurements at 1, 5, and 10 minutes after performing the physical task. It may also be pointed out that when using the former choice, the entries of M are listed column by column, with the columns separated by commas. Consequently, when the respective columns are written on different lines of the program, the matrix in the SAS code may visually resemble M′ and not M. In Program 4.2, we have used the latter alternative. See Output 4.2 for the results.

/* Program 4.2 */

options ls=64 ps=45 nonumber nodate;
    data phytask ;
    input group min1 min5 min10 ;
    lines ;
    1 7.6  8.7  7.0
    1 10.1 8.9  8.6
    1 11.2 9.5  9.4
    1 10.8 11.5 11.4
    1 3.9  4.1  3.7
    1 6.7  7.3  6.6
    1 2.2  2.5  2.4
    1 2.1  2.0  2.0
    2 8.5  5.6  8.4
    2 7.5  5.0  9.5
    2 12.9 13.6 15.3
    2 8.8  7.9  7.3
    2 5.5  6.4  6.4
    2 3.2  3.4  3.2
    ;
    /* Source: Crowder and Hand (1990, p. 8). */

    title1 'Output 4.2';
    title2 'Unbalanced One-Way MANOVA';
    proc glm data = phytask;
    class group;
    model min1 min5 min10 = group/nouni;
    manova h = group m = min1-min5,
                         min5-min10/printe printh ;
    manova h = intercept m = min1-min5,
                             min5-min10/printe printh ;
    run;

Example 4.2. Output 4.2

Unbalanced One-Way MANOVA

        Manova Test Criteria and Exact F Statistics for
           the Hypothesis of no Overall GROUP Effect
    on the variables defined by the M Matrix Transformation
  H = Type III SS&CP Matrix for GROUP   E = Error SS&CP Matrix

                      S=1    M=0    N=4.5

Statistic               Value        F    Num DF  Den DF  Pr > F

Wilks' Lambda           0.65534   2.8926       2      11  0.0979
Pillai's Trace          0.34466   2.8926       2      11  0.0979
Hotelling-Lawley Trace 0.525925   2.8926       2      11  0.0979
Roy's Greatest Root    0.525925   2.8926       2      11  0.0979


        Manova Test Criteria and Exact F Statistics for
         the Hypothesis of no Overall INTERCEPT Effect
    on the variables defined by the M Matrix Transformation
H = Type III SS&CP Matrix for INTERCEPT   E = Error SS&CP Matrix

                      S=1    M=0    N=4.5

Statistic               Value        F    Num DF  Den DF  Pr > F

Wilks' Lambda          0.872037   0.8071       2      11  0.4709
Pillai's Trace         0.127963   0.8071       2      11  0.4709
Hotelling-Lawley Trace 0.146741   0.8071       2      11  0.4709
Roy's Greatest Root    0.146741   0.8071       2      11  0.4709

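Suppose, in addition, that we are also interested in testing the null hypothesis that the changes with respect to time in the levels of overall means (intercepts) are zero. This amounts to testing H₀ : μ₁ = μ₂ = μ₃ or

$$ H_0 : (1 \;\; 0 \;\; 0)\, B \begin{pmatrix} 1 & 0 \\ -1 & 1 \\ 0 & -1 \end{pmatrix} = (\mu_1 - \mu_2, \;\; \mu_2 - \mu_3) = (0, \; 0). \qquad (4.6) $$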

With the choice of M the same as earlier and L = (1 0 0), this hypothesis also reduces to the form H₀ : LBM = 0. As earlier, the corresponding M will be specified through the M= specification of the MANOVA statement. However, the choice of L in this case is specified by indicating H = INTERCEPT.

The null hypotheses in Equations 4.5 and 4.6 are tested using Program 4.2. Output 4.2 presents portions of the resulting output. We use the default Type III analysis since all four types of analyses are identical in this case.

In both cases, since L is a nonzero row vector, it is of rank 1. Consequently, all four multivariate test criteria lead to an exact and identical F test statistic. For the hypothesis in Equation 4.5, the p value corresponding to the test statistic is 0.0979, which indicates that there is some evidence, though not very strong, against the null hypothesis. However, with respect to the null hypothesis in Equation 4.6, there is not enough evidence to reject H₀ (p value = 0.4709), and hence the data are consistent with the levels of the overall mean being the same at the three time points.
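The GROUP test on the transformed variables can be reproduced outside SAS. The following Python sketch is an illustration under stated assumptions, not the original computation. It applies the M matrix of successive differences to each observation, forms the within-group (E) and between-group (H) SS&CP matrices of the transformed data, and evaluates Wilks' Λ together with the exact F for a rank 1 hypothesis on q = 2 transformed variables, F = ((1 - Λ)/Λ)(n - a - q + 1)/q. Only the GROUP hypothesis is checked here; the INTERCEPT test is not reproduced. All names are hypothetical.

```python
# (group, min1, min5, min10), copied from the data lines of Program 4.2.
data = [
    (1, 7.6, 8.7, 7.0), (1, 10.1, 8.9, 8.6), (1, 11.2, 9.5, 9.4),
    (1, 10.8, 11.5, 11.4), (1, 3.9, 4.1, 3.7), (1, 6.7, 7.3, 6.6),
    (1, 2.2, 2.5, 2.4), (1, 2.1, 2.0, 2.0),
    (2, 8.5, 5.6, 8.4), (2, 7.5, 5.0, 9.5), (2, 12.9, 13.6, 15.3),
    (2, 8.8, 7.9, 7.3), (2, 5.5, 6.4, 6.4), (2, 3.2, 3.4, 3.2),
]

# Columns of M give min1 - min5 and min5 - min10.
M = [[1, 0], [-1, 1], [0, -1]]


def transform(y, m):
    """Row vector y postmultiplied by the matrix m."""
    return [sum(y[k] * m[k][j] for k in range(len(y))) for j in range(len(m[0]))]


def mean_vector(vectors):
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(len(vectors[0]))]


def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


groups = {}
for label, *y in data:
    groups.setdefault(label, []).append(transform(y, M))

all_obs = [z for obs in groups.values() for z in obs]
grand = mean_vector(all_obs)

E = [[0.0, 0.0], [0.0, 0.0]]
H = [[0.0, 0.0], [0.0, 0.0]]
for obs in groups.values():
    center = mean_vector(obs)
    for r in range(2):
        for c in range(2):
            # Within-group deviations go to E, group-mean deviations to H.
            E[r][c] += sum((z[r] - center[r]) * (z[c] - center[c]) for z in obs)
            H[r][c] += len(obs) * (center[r] - grand[r]) * (center[c] - grand[c])

total = [[E[r][c] + H[r][c] for c in range(2)] for r in range(2)]
lam = det2(E) / det2(total)

n, a, q = len(data), len(groups), 2
f_value = (1 - lam) / lam * (n - a - q + 1) / q
print(round(lam, 5), round(f_value, 4), q, n - a - q + 1)
```

The resulting Λ, F value, and degrees of freedom should agree with the GROUP portion of Output 4.2 up to rounding.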

Certain other ways of analyzing repeated measures data are discussed in Chapter 5.
