CHAPTER 8

Data-Based Decision Making

Data Collection and Analysis

Everyone would like to make data-based decisions. In fact, why would you not want to use data to make decisions? But as easily as “data-based decisions” rolls off the tongue, getting real meaning out of your data does not happen without some significant effort.

Data–Information–Knowledge–Wisdom

To make data truly useful, it needs to go through a transformation process. Data in its rawest form is generally not particularly useful. Take, for example, the output from a test run: it may be barely readable to humans. Some type of processing (parsing, organizing, sorting) needs to be done to turn it into actual information. Information provides answers to who, what, when, and where. To move from information to knowledge, additional processing or analysis needs to take place. Knowledge is when information supports some type of conclusion or drives some type of action; it often answers the question of how. Knowledge eventually becomes wisdom when it is absorbed and understood, and it then drives wiser decisions.
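To make the first two transformations concrete, here is a minimal sketch in Python. The log format, test names, and the 95 percent pass threshold are all invented for illustration; the point is only the shape of the data-to-information-to-knowledge flow.

```python
# Hypothetical raw test-run output: barely human-readable "data."
raw_log = """\
2024-01-15T10:02:11 test_boot_sequence PASS 1.2s
2024-01-15T10:02:13 test_sensor_read FAIL timeout
2024-01-15T10:02:15 test_firmware_update PASS 8.4s
"""

# Data -> information: parse and organize the raw lines into records
# that answer who/what/when.
results = []
for line in raw_log.strip().splitlines():
    timestamp, name, status = line.split()[:3]
    results.append({"when": timestamp, "what": name, "status": status})

passed = sum(1 for r in results if r["status"] == "PASS")
pass_rate = passed / len(results)
print(f"Pass rate: {pass_rate:.0%}")

# Information -> knowledge: a conclusion that drives action.
# The 95% release threshold is an assumed rule, not from the text.
if pass_rate < 0.95:
    print("Conclusion: investigate the failures before release.")
```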

The data–information–knowledge–wisdom model (also known as the DIKW model) is shown in Figure 8.1.

Using a methodology like GQM, discussed earlier, gives your data collection some purpose. The more specific you can be about the questions you are trying to answer, the more targeted your data collection can be. As part of your data planning, it is also important to know who your target audience is and how to tailor the message to them. Figure 8.2 shows an example of laying out the data flow. In this example, the field data for return rate is compared to the goal. Drilling down into that data takes different paths for management and engineering: management wants to understand the costs associated with the returns, while engineering wants to understand the reasons for them.


Figure 8.1 Data–information–knowledge–wisdom model

Data Cleansing

Depending on the source, raw data often needs to go through some type of cleansing process. If your data is not reliable, you cannot extract the information you need, or worse, your information may be wrong. Cleansing is a form of processing or manipulation to “scrub” the data for issues such as incomplete records, inconsistent formats (e.g., different date types), duplicate records, or data that must be mapped from one source to another to complete a record. The more sophisticated your data collection system, the less cleansing you will likely need to do, but for smaller companies and new data collection efforts, the need for data cleansing is common. This work can often be automated, but the analysis team will first need to create a set of rules for how the data is checked and manipulated. Everyone should understand the cleansing process so they have the right expectations. How well you manage data cleansing will determine your level of data quality.
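A minimal sketch of such a rule set is shown below, assuming Python with pandas; the column names, date formats, and the two sources are invented for illustration.

```python
import pandas as pd

# Two hypothetical sources that must be combined for a complete record.
returns = pd.DataFrame({
    "serial": ["A100", "A101", "A101", "A102"],
    "return_date": ["2024-01-05", "01/07/2024", "01/07/2024", None],
})
products = pd.DataFrame({
    "serial": ["A100", "A101", "A102"],
    "product_line": ["Router", "Switch", "Router"],
})

# Rule 1: normalize mixed date formats into a single date type;
# unparseable values become NaT (missing).
returns["return_date"] = returns["return_date"].apply(
    lambda v: pd.to_datetime(v, errors="coerce"))

# Rule 2: drop incomplete records.
returns = returns.dropna(subset=["return_date"])

# Rule 3: drop duplicate records.
returns = returns.drop_duplicates()

# Rule 4: map data from one source to another for the complete record.
clean = returns.merge(products, on="serial", how="left")
print(clean)
```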


Figure 8.2 Data flow for field returns

Data Quality

Data can originate from spreadsheets, other databases, or enterprise business software. Some systems (manual or automated) allow data to be entered free form, that is, the user has the freedom to type arbitrary text into a blank text or numeric field. These types of free-form fields, if not restricted in some way, are responsible for many errors in data reporting. The same situation can occur in spreadsheets, particularly when more than one person works on the same spreadsheet. Even sophisticated programs allow for free-form text fields. Whenever possible, administrators of such programs should use drop-down choices to help ensure data integrity.

Some databases treat upper- and lower-case spellings as different values, so “Firmware,” “firmware,” and “firm ware” will be considered three different values, even if they appear under the same heading in a spreadsheet. If you are trying to display a count of “Firmware,” the fields that contain “firmware” and “firm ware” will be missed because the database is looking for an exact match. These differences in the way data is entered or exported will affect accuracy and the way your project will look in a new report or dashboard. It is important that the owner of the data take the time to review their work for:

Accuracy

Completeness

Relevance

Consistency from one load to another

Reliability in its collection and format

Free of duplications

Accuracy. Whether your data originates from a database, a spreadsheet, or another system, it needs to be checked for accuracy. Spelling is one area that needs to be checked carefully, as is the use of upper and lower case within data fields. Misspelled words or a mix of upper and lower case will cause errors in the final presentation. Avoid abbreviations if possible. Any issues in the data will be reflected in your report or dashboard.
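The “Firmware” example above can be made concrete with a short sketch. This assumes Python with pandas and an invented category column; normalizing case and spacing before counting recovers the rows an exact match would miss.

```python
import pandas as pd

# The same category entered three different ways, plus an unrelated row.
df = pd.DataFrame({"category": ["Firmware", "firmware", "firm ware", "Hardware"]})

# A naive exact-match count misses two of the three firmware rows.
print((df["category"] == "Firmware").sum())  # 1

# Normalize case and internal spacing before counting.
normalized = df["category"].str.lower().str.replace(" ", "", regex=False)
print((normalized == "firmware").sum())  # 3
```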

Completeness. Check to make sure that all fields are complete and that the data fields are consistent. If, for example, a date/time stamp is used in a report calculation and that field is blank in some records, there will be inaccuracies in the final presentation. Issues can occur whenever any field required for a calculation, such as return on investment, is left blank.
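A quick completeness check along these lines might look like the following sketch, assuming pandas and invented field names.

```python
import pandas as pd

# Hypothetical records where a required date field is sometimes blank.
df = pd.DataFrame({
    "project": ["A", "B", "C"],
    "closed_date": ["2024-03-01", None, "2024-03-09"],
})

# Flag incomplete rows before they silently skew a calculation.
required = ["project", "closed_date"]
incomplete = df[df[required].isna().any(axis=1)]
if not incomplete.empty:
    print(f"{len(incomplete)} record(s) missing required fields:")
    print(incomplete)
```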

Relevance. When extracting data for use in a new report, review it for relevance to the problem or issue you are addressing, and try to capture only what you absolutely need.

Consistency from one load to another. When providing data for an automated report or dashboard, be sure that you are providing data in exactly the same format each time. Any change, no matter how small, can affect the way the final output displays the data. For example, if your data is delivered in a spreadsheet and you add a column that was not there before, or change a column name, that change can break the report automation. You must inform your analyst or database person of an intended change well in advance of an expected publish date to allow for necessary reporting modifications.
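One way to catch such a change before it breaks the automation is a schema check on each load. The sketch below assumes pandas and an invented expected-column list.

```python
import pandas as pd

# The column layout the report automation was built against (assumed).
EXPECTED_COLUMNS = {"serial", "return_date", "reason", "cost"}

def check_load(df: pd.DataFrame) -> None:
    """Fail fast if a new load's columns differ from the expected layout."""
    added = set(df.columns) - EXPECTED_COLUMNS
    removed = EXPECTED_COLUMNS - set(df.columns)
    if added or removed:
        raise ValueError(
            f"Schema drift: added={sorted(added)}, removed={sorted(removed)}")

# A load with a renamed or extra column is caught before publish time.
check_load(pd.DataFrame(columns=["serial", "return_date", "reason", "cost"]))
```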

Reliability in its collection and format. You must be consistent in the way you collect and format your data. If you are extracting your data from a larger source, such as a customer relationship management (CRM) or enterprise resource planning (ERP) system, take care to ensure that the same methodology is used each time you run the extraction. If you need to make changes to your data or its format, send sample data along with information on what has changed to the reporting staff, so they can properly accommodate the change.

Free of duplications. Maintaining data quality requires going through the data periodically and scrubbing it. Typically, this involves updating it, standardizing it, and deduplicating records to create a single view of the data, even if it is stored in multiple disparate systems. There are many vendor applications on the market to make this job easier.
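Deduplicating on a business key is often the core of that scrub. Here is a minimal sketch, assuming pandas and invented fields for records held in two systems.

```python
import pandas as pd

# Hypothetical records for the same customer held in two systems.
df = pd.DataFrame({
    "customer_id": [17, 17, 42],
    "source": ["crm", "erp", "crm"],
    "email": ["a@example.com", "a@example.com", "b@example.com"],
})

# Deduplicate on the business key to create a single view per customer,
# keeping the first record seen for each key.
single_view = df.drop_duplicates(subset=["customer_id"], keep="first")
print(single_view)
```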

Data quality is critical to the process of transforming your information into a standard report or dashboard. This reporting will reflect your organization’s business analytics and business intelligence capabilities. The emphasis on data quality assures trust in the data, which ultimately results in trust in the individual or team that is providing the reporting.

Sharing Data

How you present your data will determine your success in reaching your intended audience. As discussed in earlier chapters, one of the keys to achieving sustainable quality is to consistently keep everyone informed of your progress, and the process of keeping everyone updated needs to be as painless and hassle-free as possible. So, consider the options:

1. You can send out a weekly report through e-mail.

2. You can set up a page on an internal collaboration tool, where you post your latest results.

3. You can create a quality dashboard that is updated real time.

I was fortunate enough to manage a top-notch analytics team. They were able to do some amazing things with data that originally came from a hodge-podge of disparate systems. It required a significant data cleansing effort before we had trustworthy data. We set up a central quality dashboard that everyone could access to see the latest quality data from across the company. It housed data from all the functional groups and all products. There were a few key benefits to having this centralized view of quality.

Created a common view across product lines. Prior to the central dashboard, each product line presented its data in a slightly different, and sometimes conflicting, manner. The dashboard allowed us to have a standard look and feel, as well as common rules for calculations and formulas.

We built a central repository (data warehouse) where data from different sources were combined and cleaned, as mentioned above.

Once the charts and graphs were designed, they were connected to (almost) live feeds from the data warehouse, so you were always seeing the latest data.

The dashboard was easily accessible from anywhere, so executives who were traveling could see the latest quality reporting on their phones.

There was central control for updates and enhancements. Because the use of the dashboard continued to grow across the company, we had to put a request system in place to handle all the requests for changes and new reports. We eventually started publishing a dashboard roadmap, to keep everyone informed of when new features and releases would be coming out.
