8.10 Interpreting p-Values

In many fields it is common to report p-values from significance tests in scientific publications. Some statisticians are critical of this focus on the p-value because its meaning is frequently overinterpreted or misinterpreted. It is important to remember that the p-value is connected with the null hypothesis and nothing else. It measures the probability of obtaining data that are at least as extreme as the actual sample, under the assumption that the null hypothesis is true. This is not the same as the probability that the null hypothesis is true.

As explained in the example about Sven's fuel consumption, a significance test helps us to draw a more robust conclusion where a subjective judgment of the data could lead to ambiguities. Although the significance level is a somewhat arbitrary number, it is a quantitative means for motivating a conclusion. Nonetheless, the p-value is not entirely objective. Consider the t-test, for example. Here, the p-value is directly related to the observed t-value (a large magnitude of t_obs corresponds to a small p). Equation 8.1 shows that the magnitude of t_obs depends on two things. It becomes large if the sample mean deviates substantially from the hypothesized population mean, that is, if there is a large effect in the data. But it also becomes large if the sample size n increases, because the standard error of the mean shrinks as n grows. In other words, t_obs measures the effect in relation to the uncertainty in the data.
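The dependence on both the effect and the sample size can be made concrete with a small sketch. It assumes that Equation 8.1, which is not reproduced in this section, is the usual one-sample t statistic: the difference between the sample mean and the hypothesized mean, divided by the standard error s/√n. The function name and all numbers are hypothetical and chosen only for illustration.

```python
import math

def t_observed(sample_mean, mu0, sample_sd, n):
    """One-sample t statistic: effect divided by the standard error of the mean.

    Assumed form of Equation 8.1: (x_bar - mu0) / (s / sqrt(n)).
    """
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - mu0) / standard_error

# Hypothetical numbers: the same effect (0.5) and the same spread (2.0),
# observed with two different sample sizes.
t_small = t_observed(10.5, 10.0, 2.0, 16)    # standard error = 2/4  = 0.5
t_large = t_observed(10.5, 10.0, 2.0, 400)   # standard error = 2/20 = 0.1

print(f"n =  16: t_obs = {t_small:.2f}")
print(f"n = 400: t_obs = {t_large:.2f}")
```

Under these assumptions the effect is identical in both cases, yet t_obs grows by a factor of five simply because n is 25 times larger.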

With a large sample size the uncertainty decreases and even small effects become statistically significant. Smaller samples, on the other hand, require the effect to be greater to stand out from the noise. In connection with the teatime experiment we expressed this by saying that a larger experiment is more sensitive than a small one. The same conclusion was reached in the section about the power of a test: to detect a small difference we need a large sample size. This means that the p-value says more about the precision in the data than about the reality of the investigated effect. A high p-value does not say that the effect is nonexistent. It says only that the experiment was not sensitive enough to detect the effect, if it exists.
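The following sketch illustrates how a fixed, small effect moves from "not significant" to "significant" as the sample grows. To stay within the Python standard library it uses a normal approximation to the t-test, which is reasonable only for fairly large n; an exact calculation would use the t-distribution instead. The effect size, standard deviation, and sample sizes are hypothetical.

```python
import math
from statistics import NormalDist

def approx_two_sided_p(effect, sd, n):
    """Approximate two-sided p-value for a mean difference.

    Uses the standard normal distribution in place of the t-distribution,
    so it is a large-sample approximation, not an exact t-test.
    """
    z = effect / (sd / math.sqrt(n))
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

# Hypothetical setting: a true difference of 0.1 with a standard deviation of 1.0.
effect, sd = 0.1, 1.0
p_values = {n: approx_two_sided_p(effect, sd, n) for n in (100, 400, 1600)}

for n, p in p_values.items():
    print(f"n = {n:5d}  approximate p = {p:.4f}")
```

In this hypothetical setting the effect is the same throughout, but the approximate p-value is above 0.05 at n = 100 and below it at n = 400. Only the precision of the experiment has changed.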

For this reason the term “significant” is not entirely adequate, because even a small effect can produce a low p-value if the sample size is sufficiently large. We should at least say that an effect is statistically significant at a specified significance level. Above all, we should remember that, to scientists, magnitudes of effects and experimental errors are of primary interest. We should not substitute these with a p-value but rather supplement them with one. After all, the p-value is a statement about your particular experiment more than a statement about reality.
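One way to follow this advice is to report the effect, its standard error, and an interval estimate, with the p-value added as a supplement. The sketch below is a minimal illustration under stated assumptions: the data are made up, the helper name `summarize` is hypothetical, and the interval and p-value use a normal approximation from the standard library. For a sample this small a t quantile would be the appropriate choice.

```python
import math
from statistics import NormalDist, mean, stdev

def summarize(sample, mu0, confidence=0.95):
    """Report effect size and uncertainty first, with an approximate p-value as a supplement."""
    n = len(sample)
    effect = mean(sample) - mu0                  # magnitude of the effect
    std_error = stdev(sample) / math.sqrt(n)     # experimental uncertainty of the mean
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    p_approx = 2.0 * (1.0 - NormalDist().cdf(abs(effect / std_error)))
    return {
        "effect": effect,
        "std_error": std_error,
        "ci": (effect - z_crit * std_error, effect + z_crit * std_error),
        "p_approx": p_approx,
    }

# Made-up measurements, compared with a hypothesized mean of 10.0.
sample = [10.2, 9.8, 10.5, 10.1, 9.9, 10.4, 10.3, 10.0]
report = summarize(sample, mu0=10.0)

print(f"effect     = {report['effect']:.3f}")
print(f"std. error = {report['std_error']:.3f}")
print(f"approx. 95% interval = ({report['ci'][0]:.3f}, {report['ci'][1]:.3f})")
print(f"approx. p  = {report['p_approx']:.3f}")
```

A report of this kind lets the reader judge whether the effect is large enough to matter, which the p-value alone cannot show.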
