CHAPTER 2

Essential Data Analytics 101 and Beyond

Foundation of Data Analytics

There is no doubt that data is constantly growing, and we are ever more surrounded by it in our daily lives. Data, however, no matter its type or volume, is pointless by itself. Without processing techniques and analysis, data is just a jumble of strings and numbers. In a kitchen full of the best raw ingredients, you cannot produce a meal without knowing how to cook. In business, without knowing how to analyze data, you’re just wasting storage space and driving no value. Data tools work the same way. In the kitchen, you can’t simply start using all the modern gadgets, hoping to create something to eat. You must know which gadget to use for which step. Likewise, investing in data analytics tools with no idea of which to use for a particular scenario will not produce any outcome.

When it comes to data, there is no universal slam dunk for a quick win. What kind of analytics works depends on factors such as the volume, complexity, and structure of the data. And, since we’re speaking of slam dunks, a term borrowed from basketball, let’s consider a sports scenario. Sports analytics is an emerging trend, one that is estimated to grow to $3.44 billion by 2028.24 Players and coaches can use data analytics for everything from organizing better training routines to making better player selections. Player selection was highlighted in the acclaimed 2011 movie Moneyball, which was nominated for 73 awards and won 19.25

My favorite line from that film is “Your goal shouldn’t be to buy players; your goal should be to buy wins. And in order to buy wins, you need to buy runs.” Beane, the movie’s leading character, saw opportunities that other baseball experts failed to notice.

This approach to data can be broken down into two main questions:

1. What is the desired outcome?

2. What is needed, in terms of data and techniques, to achieve the desired outcome?

Data analytics and empowerment help to break patterns, to challenge conventional wisdom, and to avoid sticking to old ways of solving a problem. Approaching a problem from an analytical perspective and balancing it with conventional wisdom is key to succeeding in analytics.

The Lies We Tell About Data Analytics

Data analytics is a complex area, and its intricacy leads to several misconceptions and a good deal of misinformation. Not everyone understands the behind-the-scenes workings of data analytics, and many assume there is some mystery or magic recipe for succeeding with data. Many are not aware of the different domains and steps involved in the process, and the resulting knowledge gaps hinder a shared understanding. One of the growing pains of data analytics is that the same words can mean different things to different people. In the following, I’ve collected a handful of lies or myths about data in order to debunk them right from the get-go.

Lie #1: Data analytics is easy once there is a lot of good data. The term good data is very subjective and says nothing about what level of data quality is acceptable. Collecting quality data remains one of the biggest challenges in data collection. Even if you assume that the unicorn of “good data” exists, how will you analyze every data set the company collects? It is not the volume of data that companies struggle with, but its quality. And determining which data is best suited for analysis, and which will drive value, is the toughest part of data analytics. Therefore, starting with a business problem and a high-level understanding of your desired outcome is critical for success in data analytics.

Lie #2: No data means no data analytics. There is a common misconception that if you have no data, data analytics is not for your organization. But data analytics isn’t just about analyzing existing data; it’s about creating the data as well. If your organization has never collected data or retained it, it is high time to start the data collection process through pilot initiatives, customer feedback, and a move from paper to digital (e.g., via materials ordered, customers using the system). Data analytics could also mean identifying gaps, which you may need to fill with external data in order to get a holistic picture.

Lie #3: Once we implement data analytics and get insights, the entire organization will be data driven. People will adopt a new idea, concept, or technology only when they understand it, perceive it as useful, and want to use it. Data analytics is no different. How employees perceive the data analytics and technology surrounding them will influence whether, and how effectively, they use the resulting data insights, which will in turn influence business value. Hence companies with the greatest tools and analytics solutions can still fail to gain any real business benefit. Fred Davis developed the Technology Acceptance Model (TAM) to describe how people perceive a technology or solution.26 The model has four components: perceived ease of use, perceived usefulness, attitude, and intention to use. How these components interconnect influences the adoption of technology. A quantitative approach with a self-completed survey can help fine-tune data analytics solutions to better serve the organization.27

Lie #4: Gaining data insights will be easy after data analytics implementation. If you have read any data-related content or attended training, you probably know that insights, outcomes, and value are overused buzzwords. Everyone talks about getting insights from data, but there is little-to-no discussion about how to even define insight. In my experience, insight is a fancy word that most people struggle to define. To make matters worse, the word is now used so widely that people are likely uncomfortable questioning its meaning for fear of looking ignorant, and they mistakenly assume that everyone else somehow knows how to get insights.

Let us consider a few ways to define insight:

A complex, deep, qualitative, unexpected, or relevant revelation.28

A fact that is evaluated through a mental model to inspire a psychological state of enlightenment.29

A eureka moment, a light bulb moment, or an “aha” moment.30

Transition from a state of not knowing how to solve a problem to a state of knowing how to solve it.31, 23

A by-product of an analysis (and often not an achievable goal of a predefined task or procedure).32

Although these definitions vary, they share some common elements: insight is described as something sudden and unexpected, closely tied to a state of mind, and as knowing what to do about a problem we were previously struggling with. Data insights can be described in terms of seven characteristics: actionable, collaboratively refined, unexpected, confirmatory, spontaneous, trustworthy, and interconnecting.33

But the question should not be “What is an insight?” but rather “How can I gain data insight?”34 Because there is an intuitive, trial-and-error aspect to insights, tools that generate automated insights may help. Tableau’s Ask Data and Power BI’s Q&A are both steps in this direction. These features give users the ability to type a question in natural language (without any need for coding skills) and get results.

Lie #5: Data analytics requires a huge up-front investment. Advancements in technology have, in fact, brought down the cost of implementing data analytics. If you want to develop a cutting-edge, forward-thinking product with a highly talented team, the analytics cost will be high. But you can lower the cost of most other projects by using public, cloud, and open-source technologies. You do not need your own group of data scientists to start a data analytics initiative. There are plenty of easy-to-use tools on the market that allow a plug-and-play approach. With these, you can start analyzing your data with minimal effort.

Lie #6: Data needs to be accurate and complete in order to start data analytics. There is no such thing as complete or perfect data. Missing values and some inaccuracies are a fact of life for any real-world data. Organizations cannot put off their analytics journey in the hope of fixing every possible data issue first. There are a few ways to handle missing data, among them imputation (substituting missing values with the mean value, or with values estimated by a machine learning approach such as k-nearest neighbors) or simply ignoring the missing data. For inaccurate data, identify the root cause of the problem, fix the issue to improve data quality, and define an acceptable error rate.
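
The two imputation approaches mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not a production recipe: the data, the function names impute_mean and impute_knn, and the choice of temperature as the "neighbor" feature are all assumptions made for the example.

```python
from statistics import mean


def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def impute_knn(rows, k=2):
    """Fill a missing value with the average of its k nearest neighbors.

    Each row is a (feature, value) pair; "nearest" is measured on the feature.
    """
    known = [(x, y) for x, y in rows if y is not None]
    filled = []
    for x, y in rows:
        if y is None:
            nearest = sorted(known, key=lambda row: abs(row[0] - x))[:k]
            y = mean(value for _, value in nearest)
        filled.append((x, y))
    return filled


# Hypothetical data: cups sold per day, with two days missing.
cups_sold = [120, None, 130, 110, None]
print(impute_mean(cups_sold))

# Hypothetical data: (temperature in Celsius, cups sold); one value missing.
by_temperature = [(18, 100), (20, None), (22, 120), (30, 200)]
print(impute_knn(by_temperature, k=2))
```

Mean imputation is the simplest option but flattens variation in the data; the nearest-neighbor variant borrows from similar records instead, at the cost of having to decide what "similar" means for your data.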

Lie #7: Excel is in the past. Many people have a love-hate relationship with Excel. Some want Excel to go away forever, while others maintain it is still the best tool for the job. Regardless of your personal preference, Excel continues to evolve and is widely used. Excel remains the go-to tool for quick analysis and testing, for ruling out scenarios, and for situations in which there is no access to other tools. People interpret data when there is a story associated with it, and Excel connects data to its audience to tell that story, all with minimal effort. Excel is a flexible tool for analyzing data, and not everyone is aware that Power BI capabilities can be used from within Excel. With features like Power Pivot, Power Query, Power View, and the Microsoft Graph API, Excel is the Swiss Army knife of data play, regardless of your technical abilities with advanced tools. There have also been widespread requests to Microsoft to include Python as an official programming language for Excel. Until that happens, xlwings supports Python with Excel.35

Lie #8: Data analytics is the only thing I will need. Although many business functions can benefit from data and use data analytics, it is not a solution for everything. Do not be lured by the industry buzz; take the time to evaluate which areas of your business can benefit from data. It is important to understand the specific problem at hand and the various options available to address it. Can your organization benefit from using robotic process automation (RPA) or intelligent automation for some scenarios? RPA, for example (as discussed in Chapter 9), can supplement data analytics. Data analytics is an evolving area with technical advancements occurring constantly. The definition of analytics expands accordingly, and it is important to watch for the latest trends and techniques.

Steps Involved in Data Analytics

Teams across the organization should have a basic understanding of the steps involved in data analytics if the organization is to grow into a data-centric company. The following are the high-level steps involved.

Start With a Problem Statement

Start with an objective by asking, and answering, what you want to get out of your analytics initiative. In my experience, the expectations of technical teams, business stakeholders, and leadership do not always align when it comes to analytics. The result is wasted time and no added value. Your objective can be fine-tuned as you progress, but alignment on expectations is required to thrive in data analytics. How will you test your result and confirm its accuracy? What business problem are you solving? (One example of a business problem might be a declining number of users.) Do you have an idea of the data you need to solve this specific problem?

Collect the Data

How do you plan to source your data? Is the data internally available or does it come from an external source (e.g., Google Trends, social media application programming interfaces (APIs))? Will it consist of structured data or unstructured data (e.g., free-text customer reviews used in sentiment analysis to determine customer views about the product)? Do you need to experiment to create this data? How frequently will you collect data? (Not every problem requires real-time data.) Data collection is an important starting point, and it is important to streamline and automate the collection process. You cannot sustain a data initiative by collecting data manually at scale.

Clean the Data

There is overlap between data cleaning and its many near-synonyms, including data wrangling, data cooking, data preparation, and data transformation, which can lead to confusion. But whatever you call it, data cleaning is the heftiest step in the process, taking up 50 to 80 percent of total data analytics time.36 Regardless of your confidence in the efficiency of your data collection and quality checks, not all data collected is good. There are data anomalies, which will skew your data analysis if they are not handled during data cleaning.
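
As a small illustration of what handling anomalies can look like, the sketch below cleans a handful of hypothetical daily sales records. The record layout (date, store, revenue), the function name clean_sales, and the three rules it applies are assumptions chosen for the example; real cleaning rules depend on your data and your business problem.

```python
def clean_sales(records):
    """Apply three simple cleaning rules to (date, store, revenue) records."""
    seen = set()
    cleaned = []
    for date, store, revenue in records:
        # Rule 1: normalize inconsistent store labels (" store a" -> "Store A").
        store = store.strip().title()
        # Rule 2: drop rows with missing or impossible (negative) revenue.
        if revenue is None or revenue < 0:
            continue
        # Rule 3: drop exact duplicates once labels have been normalized.
        key = (date, store, revenue)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(key)
    return cleaned


# Hypothetical raw records with a duplicate, a negative value, and a gap.
raw = [
    ("2024-01-01", " store a", 500.0),
    ("2024-01-01", "Store A", 500.0),
    ("2024-01-02", "Store A", -20.0),
    ("2024-01-02", "Store B", None),
    ("2024-01-03", "Store B", 450.0),
]
print(clean_sales(raw))
```

Here rows are simply dropped; in practice you might instead flag them for review or impute the missing values, depending on how much data you can afford to lose.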

Analyze the Data

(For details, see the following section “Quantitative and Qualitative Analytics.”)

Understand and Communicate the Results

This step does not mean coming up with pretty visualizations. Rather this is about understanding the results and what actions need to be taken to solve the business problem. How do you plan to communicate your findings to business stakeholders so that they are easy to comprehend? Whether you use a dashboard, a report, a presentation, scorecards, or a sketch on the wall, data storytelling (Chapter 7) is a key skill to help you communicate data analysis findings in an effective way.

And remember: always evaluate and modify your data analytics steps according to the business problem at hand and the data maturity of the organization.

Quantitative Analytics and Qualitative Analytics

Using data is like creating a painting. You need various tools and techniques to paint in different mediums—one tool alone can’t do the job on its own. Similarly, a single method for analyzing data won’t allow you to explore all findings. There are several types of data analytics methods, and knowing them will help you choose the right method for a specific scenario.

At the highest level, analytics can be classified as quantitative or qualitative. Qualitative analytics examines the interconnections in complex data sources, which cannot be expressed as numbers. It describes tracking problems in which this event seen here has some relationship to that event shown there.37 Owing to its subjective nature, qualitative analysis is less tangible than quantitative analysis, which is based on numbers. How comfortable are these shoes? The answers in a customer feedback survey might range from how comfortable the shoes feel on your feet to how they feel after being worn for long hours. These findings describe the quality of the product and cannot be expressed as numbers. Qualitative analysis stemmed from research initiatives of psychologists like Carl Rogers, who sought to explore and better understand behavioral aspects of humans (answer the why).38 It can be considered the creative side of data analysis. Exploratory Data Analysis (EDA) is the investigation of data to find patterns, test assumptions, find relationships, and unearth a deeper understanding of the data.

Quantitative analysis is simpler than qualitative analysis, as it is about quantifying with actual numbers. It aims to filter out the noise and synthesize the relevant data into something that can be interpreted by humans.39 How many users purchased our product this month? How many users abandoned the cart without making a purchase? All these actions are quantifiable as numbers, and their quantitative analysis helps achieve an objective understanding of a situation.

This can be further categorized. To do this, let’s compare various analytics types through the example of a coffee store. You own a chain of a few coffee stores in the United States, and you want to take advantage of the data your stores have been collecting for the past three years. This data has never been analyzed or used for any business decision making. You are looking for ways to identify the low-revenue stores, to understand why revenue is dropping at some stores, and to determine the actions you can take to improve revenue at underperforming stores.

Descriptive Analytics

Descriptive analytics is the simple, foundational first level of analytics. It aims to aggregate data, reveal patterns in it, and explain what happened. For example, if you want to compare revenue at two of your coffee stores, descriptive analytics will be useful. Mean and median are examples of descriptive analytics, and visualizing these values could help identify trends in your organizational data. Descriptive analytics is a starting point that should be used with other, more advanced types of analytics to draw a holistic picture of your data.
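
To make the coffee store comparison concrete, here is a minimal sketch that computes the mean and median of daily revenue for two stores. The store names and revenue figures are invented for illustration.

```python
from statistics import mean, median

# Hypothetical daily revenue (in dollars) for one week at two stores.
daily_revenue = {
    "Store A": [480, 520, 500, 510, 1490],
    "Store B": [450, 470, 460, 455, 465],
}


def describe(values):
    """Summarize a list of numbers with two basic descriptive statistics."""
    return {"mean": mean(values), "median": median(values)}


summary = {store: describe(values) for store, values in daily_revenue.items()}

for store, stats in summary.items():
    print(store, stats)
```

In this made-up data, a single unusually strong day pulls Store A's mean well above its median, while Store B's mean and median agree. Comparing the two statistics side by side is a quick way to spot that kind of skew before drawing conclusions from an average.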

Diagnostic Analytics

This important type of analysis answers the “Why,” which descriptive analytics cannot answer, and it can be considered a form of root cause analysis. It can show, for example, that coffee store A’s revenue increased due to a significant influx of new customers just after a competitor store closed its doors. Methods such as data discovery, drill-down, and drill-through are used in diagnostic analytics.
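
A drill-down can be sketched as summarizing the same records at two levels of detail. In the example below, the sales records, the field names, and the helper total_by are all hypothetical; the point is only to show how moving from a monthly total to a month-by-customer-type breakdown can surface a likely cause.

```python
from collections import defaultdict

# Hypothetical revenue records for coffee store A.
sales = [
    {"month": "Jan", "customer": "returning", "revenue": 9000},
    {"month": "Jan", "customer": "new", "revenue": 1000},
    {"month": "Feb", "customer": "returning", "revenue": 9100},
    {"month": "Feb", "customer": "new", "revenue": 3900},
]


def total_by(rows, keys):
    """Sum revenue for each combination of the given fields."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[key] for key in keys)] += row["revenue"]
    return dict(totals)


# Top level: what happened? Revenue rose from January to February.
by_month = total_by(sales, ["month"])
print(by_month)

# Drill down: why? Split each month by customer type.
by_month_and_customer = total_by(sales, ["month", "customer"])
print(by_month_and_customer)
```

With these invented numbers, the monthly view shows only that revenue grew, while the drill-down shows that almost all of the growth came from new customers, which is the kind of clue that would prompt a look at what changed in the neighborhood.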

Predictive Analytics

After learning what happened and why it happened, predictive analytics helps explain what will happen next. It is a forecasting type of analysis that uses historical data along with advanced techniques like modeling and machine learning. Predictive analytics is best suited to organizations with the resources and expertise to implement statistical methods and interpret big data. Predictive analytics can create a variety of what-if scenarios. In our coffee store example, it can predict how store revenue will change when the price of a cup of coffee changes, or how many customers will stop or reduce buying coffee if that price increases.
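
The coffee price what-if scenario can be sketched with the simplest possible model, a straight line fitted by least squares. The historical prices and sales volumes below are invented, and a linear relationship between price and demand is an assumption made purely for illustration; real demand rarely behaves this neatly.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical history: price per cup (dollars) and cups sold per week.
prices = [3.0, 3.5, 4.0, 4.5]
cups_sold = [1000, 900, 800, 700]

slope, intercept = fit_line(prices, cups_sold)


def predict_cups(price):
    """Predicted weekly cups sold at a given price, under the fitted line."""
    return slope * price + intercept


# What-if scenario: raise the price to 5 dollars.
new_price = 5.0
predicted_cups = predict_cups(new_price)
predicted_revenue = new_price * predicted_cups
print(predicted_cups, predicted_revenue)
```

The same pattern scales up to richer models: fit on historical data, then ask the model about a scenario that has not happened yet. Predictions far outside the observed price range should be treated with caution.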

Prescriptive Analytics

The word prescriptive is derived from prescription, which is a health care word. How does a doctor write a prescription for a medicine? Based on research and experimentation during clinical trials, doctors establish what medicine will be effective to treat a certain illness. Similarly, prescriptive analytics identifies the actions to be taken to improve a business outcome, and these actions can be incremental. The often-incremental nature of these actions adds another layer of complexity to the task of determining the effectiveness of an individual action. For example, if customers stop or reduce their coffee purchases after a price increase, prescriptive analytics can help determine what actions can be taken to increase store revenue. But effective prescriptive analytics depends on the availability of high-quality data, and organizations should determine if the benefits of using prescriptive analytics are worth the effort required to set up and maintain it. It is one of the most complex and effort-intensive types of analytics discussed here.

The line between predictive and prescriptive analytics can be blurry and confusing. One of the simplest examples is A/B testing for a website or a product. What is A/B testing? A/B testing is a straightforward experiment in which two versions of a product or feature are released to a split user base to determine which version is preferred.40 Your aim is to predict which version users will like. But let’s say your testing also determines which version leads to more customer orders, and you implement that version at scale, thereby increasing your revenue. This is one of the simplest use cases to show that it is not always easy to differentiate between predictive and prescriptive analytics. A balanced mix of different analytics types is required to achieve the best data-driven discoveries and better decision making.
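
One common way to judge an A/B test on order rates is a two-proportion z-test, sketched below. The visitor and order counts are invented, and the choice of this particular test is an assumption for illustration; the chapter does not prescribe a specific statistical method.

```python
import math


def two_proportion_z_test(orders_a, visitors_a, orders_b, visitors_b):
    """Compare two conversion rates; return (z statistic, two-sided p-value)."""
    rate_a = orders_a / visitors_a
    rate_b = orders_b / visitors_b
    # Pooled rate under the assumption that both versions perform the same.
    pooled = (orders_a + orders_b) / (visitors_a + visitors_b)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (rate_b - rate_a) / std_error
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical results: version A vs. version B of an ordering page.
z, p_value = two_proportion_z_test(
    orders_a=100, visitors_a=1000,
    orders_b=130, visitors_b=1000,
)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

A positive z means version B converted at a higher rate than version A, and a small p-value suggests the difference is unlikely to be due to chance alone. Whether you then roll out version B at scale is the prescriptive half of the decision.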

Facets of Analytics

The word analytics is widely used and covers many different aspects. There is a lot of confusion around the various analytics branches and how they are used. Here are some of the most common analytics branches:

Data Analytics—Analyze raw data using mathematical and statistical methods to uncover trends. For example, sales at a coffee store over six months.

Business Analytics—Use of data to understand business performance and make strategic business decisions. For example, to increase sales at a coffee store.

Marketing Analytics—Analysis of data related to all types of marketing (e-mail, events), not just the web. For example, judging the effectiveness of a marketing campaign or personalizing a brand.

Web Analytics—Analytics related to web/online-related data. For example, page views per visit.

Operational Analytics—It is a category of Business Analytics. It puts data to work in business processes, for example, more efficient processing of patient claims to improve patient experience.

Embedded Analytics—As the word embed suggests, embedded analytics is about integrating analytics capability with applications like customer relationship management (CRM) and enterprise resource planning (ERP). For example, Netflix embeds recommendations into user profiles.

Edge Analytics—Relatively new branch related to the IoT, with the capability to, for example, process sensor data from a machine in real time to notify the user and physician about an issue or monitor health.

Augmented Analytics—Considered the future of analytics, it uses machine learning and natural language processing to automate analysis processes normally done by a specialist or data scientist. For example, business intelligence tools like Power BI use augmented analytics to provide automated insights.41

Next Steps

The best way to learn is to get your hands dirty. Pick a data set from your organization and try the various analytics methods described in this chapter. Even if you do not have much organizational data to experiment with yet, you can start your data analytics journey with an open data set from sources such as World Bank Open Data, the WHO, and many others. Try finding your insight or aha moment with the data. Team up with someone to compare their analysis to yours. How can you build synergies with others to perform data analysis collaboratively? What kind of team will bring out the best analysis of your data? All this is discussed in detail in Chapter 3, which addresses the composition of your analytics team.
