Step 2: Getting Data

The Importance of Data in Statistics

Data is at the heart of statistics. Put simply, data is information: any type of information about the things you are trying to analyze. It may describe customers, companies, or shares, as with the Accu-Phi sales and other customer data. It may come to you as numbers, words, phrases, sentences, pictures, or other formats. If you can record it in a consistent and retrievable way, it is data.
For instance, say you manage an automobile manufacturing plant and want to understand your production efficiency better. To do so, you need information, perhaps how quickly each car is produced, how many defects each car has, and the like. This is raw data.
Gathering and cleaning data is a crucial step, because gathering the wrong information means you will get the wrong answers. (You have doubtless heard the expression “GIGO,” which stands for “Garbage in, garbage out.” This is especially true in statistics, where wrong data may well turn your study into nonsense.)
Continuing with the automobile manufacturing example, if you get inaccurate data on production speeds, any further analysis will produce wrong answers, and you will base your decisions on that wrong information.
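To make the GIGO point concrete, here is a minimal Python sketch using made-up production times (minutes per car) for the plant example. The numbers and variable names are hypothetical, chosen only for illustration. A single mistyped entry is enough to pull the average far away from the true value.

```python
from statistics import mean

# Hypothetical production times in minutes per car (made-up values).
clean_speeds = [52.0, 49.5, 51.0, 50.5, 48.0]

# The same data with one recording error: 50.5 was mistyped as 505.0.
dirty_speeds = [52.0, 49.5, 51.0, 505.0, 48.0]

print(mean(clean_speeds))  # 50.2
print(mean(dirty_speeds))  # 141.1
```

Any decision based on the second average would rest on a single typing mistake rather than on how the plant actually performs.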
I discuss the critical data step in far more detail in Chapter 3 and Chapter 4. For now, I summarize the major data challenges as follows:
  • Data challenge 1: Focusing on the right observations (your population and samples). For instance, who or what are you studying? Which people, companies, and the like?
  • Data challenge 2: Choosing issues to analyze (constructs). Are you interested in the demographics of people, the profitability of companies, or the economic variables of countries? It’s important to pick the right constructs, the ones that really matter.
  • Data challenge 3: Once you have gathered your data, making sure it has been cleaned, that is, that it has no major faults that could derail your analysis.
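Part of data challenge 3 can be automated with simple sanity checks. The sketch below is only an illustration: the records, the `car_id`, `minutes`, and `defects` fields, the helper `find_faults`, and the 120-minute plausibility cutoff are all hypothetical assumptions. Real cleaning rules depend on your own data and process.

```python
# Hypothetical raw records from the plant example; fields and values are made up.
records = [
    {"car_id": 1, "minutes": 52.0, "defects": 0},
    {"car_id": 2, "minutes": None, "defects": 1},    # missing measurement
    {"car_id": 3, "minutes": 505.0, "defects": 2},   # implausibly slow, likely a typo
    {"car_id": 4, "minutes": 49.5, "defects": -1},   # a negative count is impossible
    {"car_id": 4, "minutes": 49.5, "defects": -1},   # the same record entered twice
]

MAX_PLAUSIBLE_MINUTES = 120.0  # assumed cutoff, chosen only for illustration


def find_faults(rows):
    """Return (car_id, problem) pairs for rows that fail simple sanity checks."""
    faults = []
    seen_ids = set()
    for row in rows:
        car_id = row["car_id"]
        # Flag any car ID that has already appeared.
        if car_id in seen_ids:
            faults.append((car_id, "duplicate id"))
        seen_ids.add(car_id)
        # Flag missing or out-of-range production times.
        minutes = row["minutes"]
        if minutes is None:
            faults.append((car_id, "missing minutes"))
        elif minutes <= 0 or minutes > MAX_PLAUSIBLE_MINUTES:
            faults.append((car_id, "implausible minutes"))
        # Defect counts must be present and non-negative.
        if row["defects"] is None or row["defects"] < 0:
            faults.append((car_id, "invalid defect count"))
    return faults


for car_id, problem in find_faults(records):
    print(car_id, problem)
```

Checks like these only flag suspicious records; deciding whether to correct, drop, or re-measure each one still takes judgment about the data.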

Data Are Not Statistics

One thing that people often fail to understand at first is that data itself is not statistics, and data has limited use without statistics.
In the Chapter 1 Accu-Phi example, the company has a customer dataset describing the demographics, sales, and other features of each customer. This is certainly useful: your customer representatives can look up a particular customer’s details when meeting him or her, which may lead to better interactions.
However, individual raw data points do not really help answer the bigger, broader questions about your situation. As we’ll see in the next section, statistics are essentially summaries of the data. They explore and describe many data points simultaneously, as a whole, which in turn gives a more general view of whatever you are studying. For example, Accu-Phi wants a general view of what the data says about the drivers of sales as a whole, which could help it create more targeted and effective marketing initiatives. The statistics would be a few summary numbers that inform us about this question.
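The difference between raw data and statistics can be sketched in a few lines of Python. The customer records, field names, and figures below are invented for illustration and are not the actual Accu-Phi data; averaging sales by region is likewise just one simple kind of summary, picked as an assumption. Looking up a single record is the raw-data use, while the averages collapse many records into a few numbers.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical customer records (invented, not the Accu-Phi dataset).
customers = [
    {"name": "A", "region": "North", "sales": 120.0},
    {"name": "B", "region": "North", "sales": 80.0},
    {"name": "C", "region": "South", "sales": 200.0},
    {"name": "D", "region": "South", "sales": 160.0},
    {"name": "E", "region": "South", "sales": 180.0},
]

# Raw data: useful for looking up one customer's details.
lookup = {c["name"]: c for c in customers}
print(lookup["C"])

# Statistics: a few numbers that describe all customers at once.
by_region = defaultdict(list)
for c in customers:
    by_region[c["region"]].append(c["sales"])

summary = {region: mean(sales) for region, sales in by_region.items()}
print(summary)                              # {'North': 100.0, 'South': 180.0}
print(mean(c["sales"] for c in customers))  # 148.0
```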
Last updated: April 18, 2017