So far, I have said that when you look for patterns in data, a certain class of
statistical tests or numbers, often called fit statistics, helps you to judge whether
a pattern is there.
However, the point that has been made above is that assessing whether your data has
a certain pattern or not is only a step along the path. The characteristics of the
pattern – if there is one – are what finally count, not the fact that there is a pattern.
The characteristics of the pattern are also described by statistical numbers, and
it is these final statistical parameters and coefficients that we care about.
Readers often confuse the intermediate fit statistics that help identify patterns
with the final statistical coefficients that actually describe the pattern.
In any advanced statistical technique there are pages of numbers. Much of the output
is often intermediate fit assessment, which just tells you whether there is an
appropriate pattern in the data. But don’t forget the “holy grail”: the final few
statistics that describe the pattern itself, if there is a pattern, and therefore tell
you what you wanted to know about the world.
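To make the distinction concrete, here is a minimal sketch using a simple straight-line regression, which is my choice of example rather than a technique named above. The data values and the function name fit_line are invented for illustration. The R-squared value plays the role of an intermediate fit statistic (is there a pattern?), while the slope and intercept are the final coefficients that describe the pattern.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar

    # R-squared: the share of variation in y that the fitted line accounts for.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    r_squared = 1 - ss_res / ss_tot

    return {"slope": slope, "intercept": intercept, "r_squared": r_squared}

# Made-up illustrative data, not taken from any real study.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

result = fit_line(xs, ys)

# Intermediate fit statistic: tells you whether a straight-line pattern exists.
print(f"Fit statistic (R-squared): {result['r_squared']:.3f}")

# Final coefficients: tell you what the pattern actually is.
print(f"Slope: {result['slope']:.2f}, intercept: {result['intercept']:.2f}")
```

A high R-squared here only says that a line describes the data well. It is the slope that answers the substantive question of how much y changes when x increases by one unit.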
In case this still seems a little confusing, let me give a metaphor for the difference
between intermediate fit statistics and final statistical parameters. Say that you
want to buy a car. You ultimately care about the combinations of price, age, type,
safety rating, etc. It is the statistics on these final characteristics (price etc.)
that you care about. Being an individual, you may have preferences on the optimal
combination of price, age, and the like.
Now, there are hundreds of thousands of cars for sale out there. Say that you download
some database of cars for sale. You could look through it one record at a time, but
this might be tough. Instead, you write a computer algorithm that identifies all the
cars that are close to your preferences (this is a bit like data mining).
The algorithm might tell you that a certain Subaru is 88% close to your specifications.
This is an intermediate fit statistic that helps you understand how well the data fits
your preferred pattern. The 88% does not actually tell you what the price is exactly, or
whether the high score has more to do with safety than with age, for example. However, the 88%
fit statistic has helped you along the way. Now you can look at the actual price,
safety and other characteristics that are the final statistical coefficients of interest.
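The car search can be sketched in a few lines of code. Everything below is hypothetical: the listings, the preferences, the scaling values, and the scoring rule (one minus the average scaled gap between a car and your preferences) are all assumptions made for illustration, and a real matching algorithm could be built quite differently.

```python
# Hypothetical preferences; all values are invented for illustration.
preferences = {"price": 20000, "age": 3, "safety": 5}

# Scale used to put each attribute's gap on a comparable 0-1 footing
# (an assumption of this sketch, not a standard rule).
scales = {"price": 20000, "age": 10, "safety": 5}

# Hypothetical listings standing in for a downloaded database.
cars = [
    {"name": "Car A", "price": 22000, "age": 4, "safety": 5},
    {"name": "Car B", "price": 35000, "age": 1, "safety": 4},
    {"name": "Car C", "price": 12000, "age": 9, "safety": 3},
]

def closeness(car):
    """Return a score from 0 to 1; 1 means an exact match to the preferences."""
    gaps = [
        min(abs(car[key] - preferences[key]) / scales[key], 1.0)
        for key in preferences
    ]
    return 1.0 - sum(gaps) / len(gaps)

# Rank the listings from closest to furthest.
ranked = sorted(cars, key=closeness, reverse=True)
best = ranked[0]

# Intermediate fit statistic: how close is the best match overall?
print(f"{best['name']} is {closeness(best):.0%} close to your specifications")

# Final characteristics: the numbers you actually care about.
print({key: best[key] for key in preferences})
```

The closeness score narrows the search, but it hides which attribute drove the match. Only the second printout, with the actual price, age and safety rating, tells you whether this is the car you want.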