Also available for most statistics are single values that in some way reflect relative inaccuracy. For instance, many statistics come with another value called the standard error, which reflects how inaccurate that statistic is (i.e. how well or badly the statistic represents the population). The bigger the standard error or other inaccuracy number, the less accurate we believe the statistic to be. In Figure 12.8 (SAS example of statistics assessing accuracy of a mean) we see a column called “Std. Error”, which gives the measure of relative inaccuracy for the average spending figure of R1,500.
Interpreting a raw measure of inaccuracy such as the standard error directly can be difficult or close to impossible. As a result, we usually translate this inaccuracy number into a standardized value called a p-value (“probability value”).
The most important thing to know is that p-values always test the null hypothesis discussed above, where the null hypothesis is that the statistic is zero. Unlike confidence intervals, which can assess accuracy for its own sake or can test benchmarks other than zero, p-values always refer to the “Statistic = 0” null hypothesis.
As an illustration of
p-values, let’s continue with the drug testing scenario. Say
that you wish to test an HIV/AIDS drug. You give one group of HIV
patients the drug, and compare their reactions to a control group
that is not given the drug. Most importantly, you wish to compare
a blood statistic called the “CD4 count” between the
groups. If your drug works then you expect to see a better change
over time in CD4 counts for the group that gets the drug than for
the group that does not. Therefore, your null hypothesis (which you
hope to reject) is:
Null hypothesis: The drug makes no difference in CD4 counts between the two groups (or, more formally, the difference between the two groups = 0).
The alternate hypothesis (for which you hope to find evidence) is:
Alternate hypothesis: The drug leads to better CD4 counts in the group given the drug (or, more formally, CD4 change for the group given the drug – CD4 change for the group not given the drug > 0).
Where does the p-value come into all this? Well, say that your trial concludes that changes in CD4 counts are on average 10 points better in the group given the drug. Assuming you have designed the trial to screen out all other possible causes, there are two remaining possible explanations for this difference: a) the drug caused the difference, or b) the drug made little or no difference and the observed gap is just due to sheer random chance (people react randomly to things all the time).
The p-value is used for such hypothesis-based decisions. The p-value essentially tells you how likely it is that you would see a statistic like yours if the null hypothesis were true – in other words, how plausibly your result could be due to random chance alone. Say I tell you that, given the CD4 differences between the two groups and their inaccuracies, there is only a 3% chance of seeing a difference this large if the true difference were actually zero. In other words, although you found a difference of 10, there is only a small chance that this is merely the result of random reactions around a true difference of zero. The 3% is the p-value, and you may essentially say: “I’m happy with the 3% chance of randomness. On the flip side, this means that I can be 97% confident that my drug trial result is sufficiently far away from zero, and sufficiently accurate, to be bigger than zero!”
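If you wanted to produce such a p-value yourself, a two-sample t-test is one standard way to do it in SAS. The sketch below uses made-up data (the dataset TRIAL, the groups "drug" and "ctrl", and the variable CD4CHG are all hypothetical); the SIDES=U option requests the one-sided alternative (drug better than control) described above:

data trial;                                /* made-up CD4 changes, not real trial data */
   input group $ cd4chg @@;
   datalines;
drug 12 drug 8 drug 15 drug 9 drug 11
ctrl 2 ctrl -1 ctrl 4 ctrl 0 ctrl 3
;
run;

proc ttest data=trial sides=U order=data;  /* one-sided test: drug - ctrl > 0 */
   class group;                            /* ORDER=DATA keeps the drug group listed first */
   var cd4chg;                             /* the p-value appears in the "Pr > t" column */
run;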
This, then, is the primary use of the p-value: to provide an assessment of your statistic versus random chance. The lower the p-value, the less likely it is that your statistic is merely the product of random chance.
If we are not using confidence intervals, p-values are what we usually use to assess the relative accuracy of a statistic. Appendix A of this chapter explains how we get from the measure of inaccuracy to the p-value, but you do not really need to know this: the basic analyst can bypass the measure of inaccuracy and look directly at the p-value. The rest of this section therefore explains the use of p-values.
The following bullets
summarize the use of p-values:
- P-values are expressed as proportions that equate to the opposite of confidence relative to some benchmark, essentially measuring lack of confidence. For example, we may obtain a p-value of 0.02. In simple terms, this equates to 2% “lack of confidence” in our result; therefore we would be able to say that we have 98% statistical confidence in our result rather than the benchmark.
- Therefore, the smaller the p-value, the more accurate our statistic. We usually say that if the p-value is less than .05 or .01 then the statistic is significant at the 5% or 1% level, which equates to 95% or 99% confidence respectively (see the short sketch after this list).
- A p-value of 5% or 1% means that there is a 5% or 1% chance of wrongly rejecting the null hypothesis – that is, of concluding that the statistic differs from zero when in fact it does not. When you do research and measure a statistic, you are implicitly proposing that your statistic is accurate. The standard error provides a measure of how variable, and therefore how potentially inaccurate, your statistic might be. If our statistic is so variable that we would be unlikely to find a similar estimate again, then we would reject our research proposition that it is accurate. On this basis, we can conclude whether our statistic is sufficiently accurate.
- Normally we want a low p-value (less than, say, 0.05), because this reflects the conclusion that, if the true value were actually zero, we would expect to find an estimate as far from zero as ours in fewer than (say) 5% of similar studies.
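As a small sketch of the arithmetic in these bullets (the p-values are hypothetical, and the usual .05 and .01 thresholds are assumed), the following SAS step converts p-values into “confidence” figures and significance flags:

data pvals;                         /* hypothetical p-values for illustration */
   input pvalue @@;
   confidence = 1 - pvalue;         /* e.g. p = .02 -> 98% confidence */
   sig05 = (pvalue < 0.05);         /* 1 if significant at the 5% level */
   sig01 = (pvalue < 0.01);         /* 1 if significant at the 1% level */
   datalines;
0.02 0.12 0.004 0.051
;
run;

proc print data=pvals;
run;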
P-values with SAS as an Example
In SAS, p-values are designated with labels beginning “Pr >”. Figure 12.12 (Mean and correlations illustrating standard error and p-values) shows two outputs. On the top is the earlier statistical output for our mean, highlighting the standard error. We would typically not try to interpret this directly; instead, we would usually look to a p-value. The lower picture shows the p-values for some correlations (see the “Pr > |r|” rows). The lower the p-value, the more accurate our statistic; here the p-value is less than .0001, so we believe the statistic is very accurate, as discussed in the figure.
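For instance, here is a minimal sketch of how such correlation p-values might be produced; the dataset SURVEY and its variables are made up, not the figure's actual data:

data survey;                    /* hypothetical spending and income data */
   input spend income @@;
   datalines;
1450 52 1620 61 1390 48 1580 58
1475 55 1510 54 1530 57 1445 50
;
run;

proc corr data=survey;          /* each correlation is printed with its p-value */
   var spend income;            /* underneath, in the "Prob > |r|" row */
run;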
Therefore, if you are not using confidence intervals (or if you supplement the use of intervals with p-values), look for the p-value to be lower than .05 or .01 and interpret accordingly. (Note that some fields of study allow for higher p-values, such as .10, which equates to significance at the 10% level and therefore 90% confidence.)