5.4. Power Evaluation in a Two-Factor Model for QT Interval

This section describes how to estimate the power of a trend test in a two-factor ANOVA model. The application is a general toxicology study in beagle dogs with a vehicle control (Dose 0) and three groups receiving increasing doses of a compound (Doses 1, 2 and 3). A sample size of three or four animals per sex per group is generally used. The purpose of the analysis is to evaluate the treatment effects on heart rate-corrected QT intervals by identifying the highest no-effect dose level (NOEL). The power evaluation is important because the recent ICH S7B guideline (2004) recommends that the sensitivity and reproducibility of the in vivo test system be characterized. See also Chiang et al. (2004).

The QT interval of the electrocardiogram (ECG) is a common endpoint for characterizing potential drug-associated delayed ventricular repolarization in vivo (ICH S7B, 2004). It has been conjectured that delayed ventricular repolarization caused by a compound may lead to serious ventricular tachyarrhythmias in humans (see, for example, Kinter, Siegl and Bass, 2004). Statistical analysis of QT interval data is complicated by the fact that the QT interval is inversely correlated with heart rate (HR). Therefore, analysis of QT interval data generally includes an adjustment for the RR interval (the RR interval, expressed in seconds, is equal to 60 times the reciprocal of heart rate, i.e., RR = 60/HR). Fridericia's formula, $QTc = QT/RR^{1/3}$ (Fridericia, 1920), is commonly used to obtain the heart rate-corrected QT intervals in nonclinical evaluation.
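As a small illustration of the correction, the following DATA step sketch applies Fridericia's cube-root adjustment to a single made-up observation. The data set name QTCF_SKETCH, the variable names, the input values, and the assumption that QT is recorded in milliseconds are all hypothetical and are not taken from the study described here.

```sas
/* Minimal sketch: Fridericia correction for one hypothetical record */
data qtcf_sketch;
    input hr qt;             /* hr in beats per minute, qt assumed in ms */
    rr   = 60 / hr;          /* RR interval in seconds */
    qtcf = qt / rr**(1/3);   /* QT divided by the cube root of RR */
    datalines;
120 200
;
run;

proc print data=qtcf_sketch noobs;
run;
```

For a heart rate of 120 beats per minute, RR is 0.5 seconds, so a QT value of 200 is corrected to roughly 252.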

5.4.1. Sequential Testing Method

At Eli Lilly and Company, QT intervals in a general toxicology study are collected at pre-specified time points both before and after treatment on selected dosing dates. QTc data at each time point are analyzed using a two-factor ANOVA model. Factors in the model include treatment, sex, and their interaction. Effects associated with treatment and treatment-by-sex interaction are tested using an F-test at the 0.05 significance level. Monotonicity of dose response is examined by first testing for an interaction between the treatment linear trend and sex group at the 0.05 significance level. If this interaction is significant, a sequential trend test (Tukey et al., 1985) on treatment means is performed at the 0.05 significance level for each sex group and for the two sex groups combined. Otherwise, the sequential trend test is performed only for the combined group. For a detailed description of linear and other trend tests in dose-ranging studies, see Chapter 11, "Design and Analysis of Dose-Ranging Clinical Studies".

To define the sequential testing procedure in the two-factor ANOVA model, consider the following four tests:


Test A. An interaction between the treatment linear trend and sex.

Test B. The overall treatment linear trend for combined sexes.

Test C. The treatment linear trend for each sex.

Test D. Test C when Test A is significant or Test B when Test A is not significant.

Figure 5-1. Flow chart for analysis of two-factor ANOVA

Figure 5.1 is a flow chart of the sequential testing procedure. Test D represents an overall assessment of the study and is of primary interest.

The sequential trend test by Tukey et al. (1985) is carried out as follows. Let μ0 denote the mean QTc effect in the control group (Dose 0) and μ1, μ2, μ3 denote the mean QTc effects at Doses 1, 2 and 3, respectively. The linear trend based on an ordinal dosing scale is examined by testing the following linear contrast:

$$-3\mu_0 - \mu_1 + \mu_2 + 3\mu_3$$

This trend test is performed in a sequential fashion to identify the highest NOEL. If the linear contrast of the four means is not significant, one concludes that the high dose (Dose 3) is the NOEL and no further testing is performed. If the linear contrast is significant, the test procedure continues to assess the significance of a linear trend in the first three means using the following contrast:

$$-\mu_0 + \mu_2$$

If this linear contrast is not significant, one concludes that the medium dose (Dose 2) is the NOEL and testing stops. However, if the test is significant, the test procedure continues to compare the control and low dose (Dose 1):

$$-\mu_0 + \mu_1$$

If this linear contrast is not significant, the low dose (Dose 1) is the NOEL. Otherwise, the NOEL is not established for this response variable.
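The stepwise decision rule described above can be sketched as a short DATA step. The data set name NOEL_SKETCH, the variable names P_FULL, P_THREE and P_LOW (p-values for the four-mean, three-mean and control-versus-low-dose contrasts), and the input values are hypothetical and serve only to illustrate the logic.

```sas
/* Minimal sketch of the sequential NOEL decision (hypothetical p-values) */
data noel_sketch;
    length noel $ 15;
    input p_full p_three p_low;
    /* Stop at the first contrast that is not significant at 0.05 */
    if p_full >= 0.05 then noel = 'Dose 3';
    else if p_three >= 0.05 then noel = 'Dose 2';
    else if p_low >= 0.05 then noel = 'Dose 1';
    else noel = 'Not established';
    datalines;
0.012 0.300 0.650
;
run;

proc print data=noel_sketch noobs;
run;
```

With these illustrative inputs the four-mean contrast is significant but the three-mean contrast is not, so the sketch reports Dose 2 as the NOEL.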

5.4.2. Power Evaluation

We evaluate the statistical power of the sequential testing method via simulation. The following parameters are needed: the number of simulations, the sample size in each treatment group, the effect size in each treatment group, and the common variance.

The simulation study uses 2000 simulations with n = 3 per sex or n = 4 per sex in each treatment group. Let $y_{ijk}$ denote the observation for the k-th animal in the i-th treatment group (i = 0, 1, 2, 3) and j-th sex group (j = 1 denotes males and j = 2 denotes females). Suppose that $y_{ijk}$ is normally distributed with mean $\mu_{ij}$ and variance $\sigma^2$.

The dose-response profile is defined to be flat from Dose 0 up to Dose 2 and assumed to show an increase at Dose 3. In other words,

$$\mu_{0j} = \mu_{1j} = \mu_{2j}$$

for j = 1, 2. To allow for a sex difference at Dose 3, we make the following assumptions. The treatment effects in male and female animals at Dose 3 are given by

$$\mu_{31} = \mu_{01}(1 + \delta_1), \qquad \mu_{32} = \mu_{02}(1 + \delta_2)$$

Here δ1 is the relative treatment difference in the male animals at Dose 3 compared to Dose 0 and, similarly, δ2 is the relative treatment difference in the female animals at Dose 3 compared to Dose 0. It is worth noting that the assumptions given above result in conservative power estimates. If drug effects were present at Doses 1 and 2, the power of the sequential testing method would be greater.

The control means in males and females as well as the variance of QTc interval were estimated from a historical database including pre-treatment QTc values from 91 male and 91 female beagle dogs:

$$\mu_{01} = 236.35, \qquad \mu_{02} = 237.88, \qquad \sigma^2 = 102.98$$

Power simulations are performed under 12 scenarios defined by six combinations of δ1 and δ2 (δ1 = 0.05, δ2 = 0; δ1 = 0.05, δ2 = 0.05; δ1 = 0.1, δ2 = 0; δ1 = 0.1, δ2 = 0.1; δ1 = 0.15, δ2 = 0; δ1 = 0.15, δ2 = 0.15) and two values of the sample size per sex per treatment group (n = 3 and n = 4). Simulated data are generated by calling the %SIMULQT macro, which can be found on the book's companion Web site. For example, the following call simulates QTc interval data for δ1 = δ2 = 0.05 and three animals per sex per treatment group (the parameters of the %SIMULQT macro are defined in the code available on the book's companion Web site):

%simulqt(n_sim=2000, avgmale=236.35, avgfemale=237.88, var=102.98,
    n=3, delta1=0.05, delta2=0.05, out=simul1, seed=2631);
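The %SIMULQT macro itself is not reproduced in this section. As a rough idea of what such a generator might do, the DATA step below draws normally distributed QTc values under the model assumptions stated above. It is a hypothetical stand-in, not the macro's actual code: the data set name SIMUL_SKETCH, the helper variables, and the assumed form of the Dose 3 mean (control mean multiplied by one plus the relative effect) are illustrative, and the random number stream will not reproduce the results shown in this section.

```sas
/* Hypothetical stand-in for %SIMULQT (the actual macro may differ) */
data simul_sketch;
    call streaminit(2631);
    sd = sqrt(102.98);                  /* common standard deviation */
    do simul = 1 to 2000;               /* simulated studies */
        do dose = 0 to 3;
            do sexnum = 1 to 2;
                if sexnum = 1 then do;
                    sex = 'M'; base = 236.35; delta = 0.05;
                end;
                else do;
                    sex = 'F'; base = 237.88; delta = 0.05;
                end;
                /* Assumed mean: flat for Doses 0-2, relative increase at Dose 3 */
                mu = base * (1 + delta * (dose = 3));
                do animal = 1 to 3;     /* n = 3 per sex per dose group */
                    qtc = rand('normal', mu, sd);
                    output;
                end;
            end;
        end;
    end;
    keep simul dose sex animal qtc;
run;
```

The output variables (SIMUL, DOSE, SEX, ANIMAL and QTC) are named to match those referenced by the analysis code in this section.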

Program 5.3 analyzes the simulated data using the MIXED procedure, with the ESTIMATE statements specifying the linear contrasts of treatment means. The p-values for Tests A, B and C are saved in the TESTS data set. To evaluate the power of Test D, the FINALCOUNT data set converts the three p-values into binary variables (TESTA, TESTB and TESTC) based on the 0.05 significance level. The TESTD variable is then defined as the sum of TESTA*TESTC and (1-TESTA)*TESTB. Finally, the binary variables are summarized using the MEANS procedure to compute the power of each test.

Example 5-3. Power evaluation in the two-factor model for QT interval
/* Fit the two-factor ANOVA model to each simulated data set */
proc mixed data=simul1;
    by simul;
    class simul dose sex animal;
    model qtc=sex|dose;
    /* Test A */
    estimate 'Linear trend*sex'      dose*sex 3 -3 1 -1 -1 1 -3 3;
    /* Test B */
    estimate 'Combined trend test'   dose -3 -1 1 3;
    /* Test C */
    estimate 'Trend test in females' dose -3 -1 1 3
                                     dose*sex -3 0 -1 0 1 0 3 0;
    estimate 'Trend test in males'   dose -3 -1 1 3
                                     dose*sex 0 -3 0 -1 0 1 0 3;
    ods output estimates=tests;
run;
/* Keep the p-values and floor very small ones at 0.0001 */
data tests (keep=simul label probt);
    set tests;
    if probt<0.0001 then probt=0.0001;
run;
/* Arrange the four p-values of each simulation in columns COL1-COL4 */
proc transpose data=tests out=testfinal;
    by simul;
    var label probt;
run;
/* Keep the row holding the p-values and give the columns meaningful names */
data testfinal (drop=_name_ _label_);
    set testfinal;
    rename col1=flagp col2=combp col3=femalep col4=malep;
    if _name_='Probt';
run;
/* Convert the p-values into rejection indicators at the 0.05 level */
data finalcount;
    set testfinal;
    retain testA testB testC testD 0;
    testA=(flagp<0.05);
    testB=(combp<0.05);
    testC=max((femalep<0.05),(malep<0.05));
    /* Test D: Test C if Test A is significant, Test B otherwise */
    testD=testA*testC+(1-testA)*testB;
run;
/* The mean of each indicator estimates the power of that test */
proc means data=finalcount noprint;
    var testA testB testC testD;
    output out=power;
run;
data summary (keep=testA testB testC testD);
    set power;
    if _stat_='MEAN';
run;
proc print data=summary noobs;
    format testA testB testC testD 5.3;
run;

Example. Output from Program 5.3
testA    testB    testC    testD

0.051    0.421    0.414    0.445

Output 5.3 displays the estimated power of the four tests. The power of the overall analysis (Test D) in this scenario (δ1 = δ2 = 0.05 and n = 3) is clearly too low (44.5%). The estimated powers for all 12 scenarios are summarized in Table 5.1. The power of the sequential testing procedure is above 95% when Dose 3 is expected to prolong the QTc interval by 10% in both male and female beagle dogs.

Table 5-1. Estimated Power of Sequential Testing Procedure in the Two-Factor Model for QT Interval

Treatment effect    Treatment effect     Sample size per sex per treatment group
in males (δ1)       in females (δ2)      n = 3       n = 4
0.05                0                    23.2%       30.2%
0.05                0.05                 44.5%       59.2%
0.1                 0                    63.4%       78.7%
0.1                 0.1                  95.7%       98.8%
0.15                0                    93.2%       98.4%
0.15                0.15                 100.0%      100.0%
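Because each power value is a proportion estimated from 2000 simulated studies, it carries some Monte Carlo error. As a rough guide, and assuming the simulated studies are independent, the usual binomial standard error can be applied to the estimate of 44.5% (δ1 = δ2 = 0.05, n = 3). The symbols $\hat{p}$ (estimated power) and $N$ (number of simulations) are notation introduced here for illustration.

```latex
\mathrm{SE}(\hat{p}) = \sqrt{\frac{\hat{p}\,(1-\hat{p})}{N}}
                     = \sqrt{\frac{0.445 \times 0.555}{2000}}
                     \approx 0.011
```

An approximate 95% interval is therefore about 44.5% plus or minus 2.2 percentage points, so differences of a percentage point or two between scenarios should not be over-interpreted.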

5.4.2.1. Evaluation of Type I Error Rate in a Two-Factor ANOVA Model

Another important characteristic of the sequential testing method is the probability of a Type I error. The Type I error rates for Tests A and B are 5% and, under an additional assumption of independent multiple tests, the Type I error rate for Test C is 1 − (1 − 0.05)² = 9.75%. Although calculation of the Type I error rate for Test D is not as straightforward, it can be evaluated by simulation.
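One way to see why Test D is harder to handle analytically is to write its rejection event in terms of the other tests, following the definition of Test D given in Section 5.4.1. In the sketch below, $A$, $B$ and $C$ denote the events that Tests A, B and C reject (with $C$ meaning that at least one sex-specific trend test rejects); this notation is introduced here for illustration. The two terms depend on the joint behavior of the tests, so only a simple upper bound follows directly from the 5% levels of Tests A and B.

```latex
P(D) = P(A \cap C) + P(A^{c} \cap B)
     \le P(A) + P(B) = 0.05 + 0.05 = 0.10
```

The bound indicates that the Type I error rate of Test D can exceed 5% but cannot exceed 10% under these assumptions; the simulation supplies the actual value.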

To estimate the Type I error rate associated with Test D, one needs to create a simulated data set under the global null hypothesis of no drug effect (δ1 = δ2 = 0) as shown below:

%simulqt(n_sim=2000, avgmale=236.35, avgfemale=237.88, var=102.98,
    n=3, delta1=0, delta2=0, out=simul1, seed=4641);

The Type I error rate of Test D is computed using Program 5.3.

Example. Output from Program 5.3 (Computation of the Type I error rate)
testA    testB    testC    testD

0.054    0.054    0.098    0.084

Output 5.3 (Computation of the Type I error rate) shows the estimated Type I error probabilities of Tests A, B, C and D. The estimated Type I error rate of the overall analysis (Test D) is 8.4%. Since this rate is greater than 5%, one can consider adjusting the significance levels for Tests A and B downward, i.e., carrying them out at a level that is lower than 0.05.
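As a rough sketch of such an adjustment, the rejection indicators can be recomputed from the TESTFINAL data set created by Program 5.3, using a lower significance level for Tests A and B. The macro variable ALPHA_AB, the data set name FINALCOUNT_ADJ, and the value 0.025 are hypothetical choices for illustration only and are not recommendations from the text; in practice several candidate levels would be tried until the estimated rate for Test D is acceptably close to 5%.

```sas
/* Hypothetical adjusted level for Tests A and B (illustration only) */
%let alpha_ab = 0.025;

data finalcount_adj;
    set testfinal;                        /* p-values from Program 5.3 */
    testA = (flagp < &alpha_ab);
    testB = (combp < &alpha_ab);
    /* Test C is left at the 0.05 level in this sketch */
    testC = max((femalep < 0.05), (malep < 0.05));
    testD = testA*testC + (1 - testA)*testB;
run;

/* Mean of each indicator = estimated rejection rate under the null */
proc means data=finalcount_adj mean maxdec=3;
    var testA testB testC testD;
run;
```

Because the p-values are reused, no new simulation is needed to compare alternative levels on the same simulated data.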
