10.1. Introduction

In their "Perspectives on Large-Scale Cardiovascular Clinical Trials for the New Millennium," Dr. Eric Topol and colleagues (1997) provide a fine preamble to our discussions:

The calculation and justification of sample size is at the crux of the design of a trial. Ideally, clinical trials should have adequate power, ≈90%, to detect a clinically relevant difference between the experimental and control therapies. Unfortunately, the power of clinical trials is frequently influenced by budgetary concerns as well as pure biostatistical principles. Yet an underpowered trial is, by definition, unlikely to demonstrate a difference between the interventions assessed and may ultimately be considered of little or no clinical value. From an ethical standpoint, an underpowered trial may put patients needlessly at risk of a new therapy without being able to come to a clear conclusion.

In addition, it must be stressed that investigators do not plan studies in a vacuum. They design them based on their knowledge and thoughtful conjectures about the subject matter, on results from previous studies, and on sheer speculation. They may already be far along in answering a research question, or they may be only beginning. Richard Feynman, the 1965 Nobel Laureate in Physics and self-described "curious character," stated this somewhat poetically (1999, p. 146):

Scientific knowledge is a body of statements of varying degrees of certainty,
some most unsure,
some nearly sure,
none absolutely certain.

This reflects what we will call The March of Science, which for clinical research is sketched in Figure 10.1.

As we step forward, our sample-size considerations need to reflect what we know. At any point, but especially at the beginning, the curious character inside of us should be free to conduct observational, exploratory, or pilot studies, because as Feynman said, "something wonderful can come from them." Such studies are still "scientific," but they are for generating new and more specific hypotheses, not testing them. Accordingly, little or no formal sample-size analysis may be called for. But to become "nearly sure" about our answers, we typically conduct convincing confirmatory studies under specific protocols. This often requires innovative and sophisticated statistical planning, which is usually heavily scrutinized by all concerned, especially by the reviewers. No protocol is ever perfect, but as the New York Yankees catcher and populist sage Yogi Berra counseled, "Don't make the wrong mistake."

Figure 10.1. The March of Science in clinical research

Medical research is still dominated by traditional (frequentist) hypothesis testing and classical power analysis. Here, investigators and reviewers typically ask, "What is the chance (inferential power) that some given key p-value will be significant, i.e., less than some specified Type I error rate, α?" Thus, one cannot understand inferential power without knowing what p-values are and what they are not. Researchers rely on them to assess whether a given null hypothesis is true, but p-values are random variables, so they can mislead us into making Type I and Type II errors. The respective classical error rates are called α and β = 1 − power. All of this is reviewed in detail.

This chapter also considers other error rates that relate directly to two crucial questions that researchers should address. First, if a test turns out to be significant, what is the chance that its null hypothesis is actually true (a Type I error)? A great many researchers think that this chance is at most α. They might say something like, "We will use α = 0.05 as our level for statistical significance, so if we get a significant result, we will be more than 95% confident that the treatments are different with respect to this outcome." Researchers want to be able to make statements like this, but this particular logic is wrong. Second, if a test turns out to be non-significant, they might ask, "What is the chance that its null hypothesis is actually false (a Type II error) to some particular degree?" Many researchers think this is the usual Type II error rate, β. It is not.

So, what is an appropriate way to quantify these chances? We describe something we call the crucial Type I error rate (here, α*), which is the chance that the null hypothesis is true even after obtaining significance, p ≤ α. Similarly, the crucial Type II error rate (β*) is the chance that the null hypothesis is false in some particular way even though a p > α result has occurred. We argue that α* and β* are just as relevant as α and β, if not more so. We demonstrate how crucial error rates can be guesstimated if investigators are willing to state and justify their current belief about the chance that the null hypothesis is indeed false. Importantly, for a given α level, greater inferential power reduces both crucial error rates.
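To make the arithmetic concrete: if π denotes the investigators' stated prior probability that the null hypothesis is false, then Bayes' rule gives α* = α(1 − π) / [α(1 − π) + power × π] and β* = (1 − power)π / [(1 − power)π + (1 − α)(1 − π)]. The short DATA step below is a minimal sketch of this calculation; the prior π = 0.25 and the variable names are our illustrative assumptions, not values taken from the malaria trial discussed later.

   data crucial;
      alpha   = 0.05;   /* planned Type I error rate                */
      power   = 0.90;   /* planned power for the conjectured effect */
      beta    = 1 - power;
      priorH1 = 0.25;   /* assumed prior belief that H0 is false    */
      /* crucial Type I error rate: P(H0 true | p <= alpha)  */
      alphaStar = alpha*(1 - priorH1) /
                  (alpha*(1 - priorH1) + power*priorH1);
      /* crucial Type II error rate: P(H0 false | p > alpha) */
      betaStar  = beta*priorH1 /
                  (beta*priorH1 + (1 - alpha)*(1 - priorH1));
   run;

   proc print data=crucial noobs;
      var alphaStar betaStar;
   run;

With these illustrative numbers, α* ≈ 0.14: even a significant result leaves roughly a 1-in-7 chance that the null hypothesis is true, far above the nominal α = 0.05. Raising the power (at a fixed α) shrinks both α* and β*.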

All these concepts will be developed and illustrated by carrying out a sample-size analysis for a basic two-group trial to compare two treatments for children with severe malaria: usual care only versus giving an adjuvant drug known to reduce high levels of lactic acid. Two planned analyses will be covered. The first compares the groups with respect to a binary outcome, death within the first ten days. The second compares them on a continuous outcome, the ratio of two amino acids measured in plasma, using baseline values as covariates. The principles covered apply to any traditional statistical test being used to try to reject a null hypothesis, including analyses far more complex than those discussed here.
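As a preview of how such an analysis is specified, here is a minimal PROC POWER sketch for the first planned analysis, the comparison of ten-day mortality rates. The 20% and 10% mortality conjectures are placeholders for illustration only, not the values developed later in the chapter.

   proc power;
      twosamplefreq test=pchi             /* Pearson chi-square test     */
         groupproportions = (0.20 0.10)   /* conjectured mortality rates */
         alpha            = 0.05
         power            = 0.90
         npergroup        = .;            /* solve for n per group       */
   run;

Setting NPERGROUP to a missing value (.) tells the procedure to solve for the per-group sample size that achieves the requested power.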

While obtaining an appropriate and justifiable sample size is important, going through the analytical process itself may be just as vital in that it forces the research team to work collaboratively with the statistician to delineate and critique the rationale undergirding the study and all the components of the research protocol. The investigators must specify tight research questions, the specific research design, the various measures, and an analysis plan. They must come to agree on and justify reasonable conjectures for what the "infinite dataset" may be for their study. In essence, they must imagine how the entire study will proceed before the first subject is recruited. The "group think" on this can be invaluable.

Our intended audience includes both collaborating statisticians and content investigators. While the examples given here involve clinical trials, the principles apply broadly across all of science. Accordingly, we present almost no mathematical details.

The SAS procedures POWER and GLMPOWER are the primary computational engines, but we only use a small portion of their capabilities. Far more information can be found in the current SAS/STAT User's Guide.
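For instance, the second planned analysis (the continuous amino-acid ratio, adjusted for its baseline value) could be handled with a GLMPOWER sketch along the following lines. The exemplary cell means, residual standard deviation, and covariate correlation shown here are illustrative assumptions only.

   /* Exemplary data set: conjectured cell means for the outcome */
   data exemplary;
      input Treatment $ MeanRatio;
      datalines;
   UsualCare 1.20
   Adjuvant  1.00
   ;

   proc glmpower data=exemplary;
      class Treatment;
      model MeanRatio = Treatment;
      power
         stddev      = 0.45   /* conjectured residual standard deviation */
         ncovariates = 1      /* baseline ratio used as a covariate      */
         corrxy      = 0.60   /* assumed covariate-outcome correlation   */
         alpha       = 0.05
         power       = 0.90
         ntotal      = .;     /* solve for total sample size             */
   run;

Here the exemplary data set conveys the conjectured pattern of cell means, and CORRXY= adjusts the error variance for the baseline covariate.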

To save space, some SAS code has been shortened and some output is not shown. The complete SAS code and data sets used in this book are available on the book's companion Web site at http://support.sas.com/publishing/bbu/companion_site/60622.html.
