Omar Abdon and Randy Shi
In this chapter we describe the basics of data analytics and the migration to Big Data analytics as massive data volumes overwhelmed traditional methods of analysis, and detail the four process steps and the tools used in Big Data analytics. We show why Big Data is essential to manufacturing organizations of all sizes, and finally we demonstrate how Big Data analytics is helping small to midsize enterprises (SMEs), including best practices and affordable tools.
Data are facts and statistics collected together for reference or analysis. People have been analyzing data for thousands of years, but the Industrial Revolution (Industry 1.0) greatly increased the data available for analysis, aided by the printing press, which made mass literacy possible.
The growth of data continued under Industry 2.0, with the telegraph, the telephone, and faster travel by train and automobile. The first uses of data analytics in business date back to the turn of the twentieth century, when Frederick Winslow Taylor initiated his time management exercises. Henry Ford's measuring the speed of his assembly lines is another early example.
Industry 3.0 accelerated the growth further with computers, software applications, the Internet, smart machines, barcoding, robots, and so on. Computers were key in the evolution of data analytics, as they were embraced as trusted decision-making support systems. Industry 3.0 created so much data that it outgrew traditional data analysis, fostering the introduction of Big Data analytics. Data warehouses, the Cloud, and a wide variety of software tools have accelerated the growth of Big Data over the past 20 years.
The first references to Big Data date to 2003, in work by The Data Center Program created by the Massachusetts Institute of Technology (MIT). Prior to this, the phrase "data analytics" was the description commonly employed in early research conducted in the late 1990s. It is essential to clarify the terms Big Data and predictive analytics to avoid confusion.1
To begin, it is important to understand the difference between analysis and analytics.
Next, it is important to understand the difference between structured data and unstructured data, especially given the rapid growth in unstructured data.
Structured data usually resides in relational database management systems (RDBMSs). Examples include Social Security numbers, phone numbers, ZIP codes, sales orders, purchase orders, customer and supplier masters, item numbers in a bill of material, and so on. Relational databases are more than just structured data: the structure is meant to reflect the relationships among the data as well. The columns, tables, rows, sheets, and tabs of 2D, 3D, or 4D relational databases are designed to reveal subtle or hidden facts embedded in the data, and to make sorting, generating reports, and similar tasks much easier. A program evaluation and review technique (PERT)-style schedule in a tool like Microsoft Project is an example of a relational database: it includes not only data such as start and finish dates but also information about which events depend on earlier events and what kicks off what, along with other data such as costs and who is responsible.
Structured data may be generated by people or by machines, as long as it is created within an RDBMS structure. This format makes it easy to search, either with human-generated queries or by searching types of data and field names.3
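As a minimal sketch of this searchability, assuming Python with the standard sqlite3 module and a hypothetical item-master schema, a query by field name retrieves exactly the structured records requested:

```python
import sqlite3

# Build a small, in-memory item master (hypothetical schema for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE item_master (item_no TEXT PRIMARY KEY, description TEXT, supplier TEXT)"
)
conn.executemany(
    "INSERT INTO item_master VALUES (?, ?, ?)",
    [
        ("TW-100", "Twist tie, 4 inch", "Acme Wire"),
        ("TW-220", "Twist tie, 4 inch", "Baldwin Fasteners"),
        ("HX-310", "Hex bolt, M6", "Acme Wire"),
    ],
)

# Because the data is structured, a query by field name is trivial.
for row in conn.execute(
    "SELECT item_no, supplier FROM item_master WHERE description LIKE ?",
    ("Twist tie%",),
):
    print(row)
```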
While structured data is easy to search, maintaining its accuracy can be a challenge for anyone using material requirements planning (MRP) systems, enterprise resource planning (ERP) systems, and inventory/logistics systems. In an age of continuing mergers and acquisitions there is an ongoing merging of item, customer, and supplier masters, so the same items, customers, and suppliers are easily duplicated and not always easily discovered through queries or even with specially designed tools.
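A minimal sketch of that duplicate problem, assuming pandas and a hypothetical item master merged from several acquired companies; normalizing descriptions before comparing them surfaces duplicates a literal query would miss:

```python
import pandas as pd

# Hypothetical item master merged from several acquired companies.
items = pd.DataFrame(
    {
        "item_no": ["TW-100", "TW-220", "A-77", "Z-9"],
        "description": ["Twist Tie 4in", "twist tie, 4 in.", "TWIST-TIE 4IN", "Hex bolt M6"],
    }
)

# Normalize: lowercase, then strip punctuation and whitespace, so near-duplicates align.
normalized = (
    items["description"]
    .str.lower()
    .str.replace(r"[^a-z0-9]", "", regex=True)
)

# Any group sharing a normalized description but holding different item numbers
# is a candidate duplicate that a literal query would never find.
dupes = items[normalized.duplicated(keep=False)]
print(dupes)
```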
My favorite story about part or item number duplication comes from the time I was managing the supply chain for a facility making dental X-rays and operating tables. One day my production control manager advised that production had halted due to a shortage of simple twist ties used to secure wire harnesses. While I was walking the floor, one of the plant's more senior production workers told me that there were plenty of twist ties in-house but they were stocked under four different item numbers – one for each of the small companies they had acquired over the years. The engineering department had never bothered to update the item masters to show the equivalency or to merge numbers. Ironically, the largest inventory was stocked under an item number designated as excess inventory and being written off by our KPMG auditors.
Another problem in using structured data is in maintaining accurate promise or due dates for sales, purchase, and work orders. I have found this to be a chronic issue in both large and small organizations over the years. Past-due orders make it very difficult to execute production and meet customer commitment dates.
Unstructured data is basically everything else. While unstructured data has an internal structure, it is not structured using predefined schema or data models. It may be textual or nontextual, machine- or human-generated. Typical sources of machine- and human-generated data include:
What counts as a "large amount" of unstructured data is challenging to define. Data sets are increasing exponentially in size, becoming too vast, too raw, or too unstructured for analysis using traditional relational database methods. This is typically what is considered Big Data.
According to some projections, the quantity of accessible data is expected to double every two years.4 Data is coming from conventional sources such as industrial equipment, automobiles, electricity meters, and shipping containers. It is also coming from a variety of newer Smart Technology sources such as the Industrial Internet of Things (IIoT) and computer vision. Smart Manufacturing generates Big Data to measure location, movement, vibration, temperature, humidity, electric current, and vehicle identifications, to name a few.
Exhibit 7.1 shows the exponential growth in unstructured data from 2010 to 2025 in zettabytes.5 (A zettabyte is a measure of storage capacity equal to 1,000^7 bytes, or 1,000,000,000,000,000,000,000 bytes. One zettabyte is equal to a thousand exabytes, a billion terabytes, or a trillion gigabytes.)6
Binny Vyas, writing in Softweb Solutions, details the significant data challenges manufacturers face.7
Vyas goes on to describe how Big Data analytics can meet these challenges and the many benefits it brings to manufacturers.
There are four major levels of analytics: descriptive, diagnostic, predictive, and prescriptive. The value increases from the first to fourth level with a corresponding increase in the effort required to use it.
Exhibit 7.2 shows the four levels of data analytics, the value, and the effort required.8
Descriptive analytics is a frequently used data analysis methodology that utilizes historical data collection, organization, and display to convey information clearly and concisely. In contrast to other analysis techniques, descriptive analytics focuses on what has already occurred in an organization and is not used to draw conclusions or make predictions from its results. In other words, descriptive analytics is a fundamental building block to inform or prepare data for further analysis.
The simplest way to employ data analytics is to use straightforward mathematics and statistical techniques, such as arithmetic, averages, and percent changes. Using visual tools like line graphs and pie and bar charts, it is possible to communicate results so that a large corporate audience can understand them quickly.
How does descriptive analytics work? The two most often used techniques in descriptive analytics are data aggregation and data mining (also known as data discovery). Data aggregation is the act of gathering and organizing data in order to generate manageable data sets. During the data mining phase, patterns, trends, and meaning are found in the data, and then this information is rendered in an intelligible manner.
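A minimal sketch of those two steps in pandas, using hypothetical sales records: aggregation rolls the raw data up into a manageable summary, and simple statistics (averages and percent changes) render the pattern in an intelligible form:

```python
import pandas as pd

# Hypothetical daily sales records.
sales = pd.DataFrame(
    {
        "month": ["Jan", "Jan", "Feb", "Feb", "Mar", "Mar"],
        "region": ["East", "West", "East", "West", "East", "West"],
        "revenue": [1200, 900, 1350, 880, 1500, 860],
    }
)

# Data aggregation: roll raw records up into a manageable summary.
monthly = sales.groupby("month", sort=False)["revenue"].sum()

# Simple descriptive statistics: averages and percent changes.
print(monthly)
print("Average monthly revenue:", monthly.mean())
print("Percent change month over month:")
print(monthly.pct_change().mul(100).round(1))
```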
Descriptive analytics is broken down into five main phases.9
Descriptive analytics is widely utilized within organizations daily. Companies apply descriptive analytics to data on inventory, workflow, sales, and revenue to analyze their past operations. Reports of this type make it easy to get a comprehensive picture of an organization's activities.
The use of social analytics is usually an example of descriptive analytics, according to the DeZyre online learning platform.10 One way of understanding descriptive analytics is by studying what people share on social media such as Facebook and Instagram. This study would typically include capturing the number of followers, likes, dislikes, posts, reply posts, and so on.
The primary function of descriptive analytics is to collect surface data and perform a limited analysis of it. The insights it produces are not suited to prediction or inference. What this approach is capable of revealing, however, are patterns and significance when data from different time periods are compared.
Since descriptive analysis depends solely on historical data and simple computations, it can be performed easily and on a daily basis. Its application does not necessarily require considerable analytical skill. This means that companies can report on performance reasonably quickly and simply and obtain insight into improvements. In general, though, descriptive analytics is limited and usually cannot go beyond the surface of the data.
After learning what happened, the next stage is to understand the reason for what happened. Diagnostic analytics can be challenging because of the need for domain knowledge. To be successful, the analyst must understand a business at a detailed level, including its processes, regulations, policies, target markets, and so on.11
The analyst is like a sleuth. For example, suppose a grocery store experiences a large drop in vanilla ice cream sales in June. This fact was discovered using descriptive analytics. The next stage is for the analyst to investigate why it happened. The drop may have been caused by supply shortages, or by recent news reports of vanilla ice cream recalls over product contamination. This example demonstrates the hypothesis testing and the domain expertise required to understand why the drop in sales happened.12
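A minimal sketch of that investigation, assuming pandas and hypothetical weekly data that pairs sales with a stock-out flag; grouping on the flag gives a quick, informal test of the supply-shortage hypothesis:

```python
import pandas as pd

# Hypothetical weekly ice cream sales with a stock-out flag from the supply system.
weeks = pd.DataFrame(
    {
        "week": ["Jun-1", "Jun-2", "Jun-3", "Jun-4"],
        "vanilla_units": [480, 150, 140, 470],
        "vanilla_in_stock": [True, False, False, True],
    }
)

# Informal hypothesis test: do low-sales weeks line up with stock-outs?
by_stock = weeks.groupby("vanilla_in_stock")["vanilla_units"].mean()
print(by_stock)
# If out-of-stock weeks average far lower, supply shortage is the likelier cause;
# if not, the analyst moves on to the recall-news hypothesis.
```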
Performing diagnostic analytics is especially challenging in larger organizations because departments often function in silos where data sharing is not the norm. This is when good interview skills and techniques are very helpful. For example, imagine that a doctor examines a patient, merely observes that the patient is sick, and leaves the room. Good doctors, of course, use diagnostic analysis to determine the cause of a sickness. Data analytics works much the same way: the analyst makes an observation, completes the descriptive analysis, and then moves on to the diagnosis.13
Using diagnostic tools permits an organization to get the most out of its data by translating complex data into visualizations and insights that anyone in an organization can benefit from. Diagnostic analytics helps gain value from data by asking the right questions and then doing a deep analysis to obtain the answers.
As the name indicates, predictive analytics looks into the future and attempts to anticipate and understand what could happen. By analyzing patterns and trends in historical data and consumer insights, it can anticipate what may happen in the future and thereby inform many business areas, such as setting realistic objectives, planning effectively, managing performance expectations, and avoiding risks.
Predictive analytics is probability based: it seeks to predict future outcomes and the likelihood of those events by employing various techniques, including data mining, statistical modeling (mathematical relationships between variables to predict outcomes), and machine learning algorithms (classification, regression, and clustering techniques). Machine learning algorithms make predictions, for example, by attempting to fill in missing data with the best feasible estimates based on known data. A newer type of machine learning, deep learning, resembles the design of human brain networks. Applications for deep learning range from social and environmental analysis for credit scoring to automated processing of digital medical X-rays to anticipate medical diagnoses for clinicians.
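As a minimal sketch of the regression technique named above, assuming scikit-learn and hypothetical monthly demand figures, a linear model is fitted to history and used to anticipate the next period:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical monthly unit demand for one product (months 1-8).
months = np.arange(1, 9).reshape(-1, 1)
demand = np.array([310, 330, 355, 340, 370, 390, 400, 425])

# Fit a simple linear trend model on the historical data.
model = LinearRegression().fit(months, demand)

# Predict demand for the next month (month 9).
forecast = model.predict(np.array([[9]]))
print(f"Forecast for month 9: {forecast[0]:.0f} units")
```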
Using predictive analytics, organizations are empowered to act more proactively, with data-driven decisions and strategies. Businesses may use predictive analytics for forecasting and trend analysis for various purposes, including predicting consumer behavior and purchasing patterns and detecting sales trends. Such forecasts also help anticipate supply chain, operations, and inventory demands before they become issues.
Although predictive analysis cannot achieve 100% accuracy, it can serve as a valuable forecasting and business strategy tool. Many aspects of a business can benefit from predictive analytics, including:
Descriptive analytics explains what has happened, diagnostic analytics explains why it happened, and predictive analytics explains what may happen; prescriptive analytics explains what should be done in a given circumstance. With prescriptive analytics an organization has the information it needs to take action.
Prescriptive analytics uses what was learned in descriptive, diagnostic, and predictive analytics to recommend the best potential courses of action. A high level of specialized analytics expertise is required to be successful. Prescriptive analytics uses artificial intelligence (AI), and specifically machine learning, which incorporates models and algorithms that allow computers to make decisions based on statistical data relationships and patterns.14
Prescriptive analytics systems are powerful and complex, requiring close monitoring and maintenance. They are especially sensitive to data quality issues such as incorrect or missing data, which can lead to false predictions, or to inflexible predictions that handle data changes poorly.15 You should implement data quality standards and keep an eye on the models' predictions. Because of its complexity, prescriptive analytics is not used by many manufacturing organizations, especially SMEs. It definitely requires the help of data scientists.
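A minimal sketch of the prescriptive step, with hypothetical unit economics and a toy set of demand scenarios from a predictive model: rather than merely forecasting demand, the code recommends the order quantity with the highest expected profit:

```python
import numpy as np

# Hypothetical economics: unit cost, selling price, and salvage value for leftovers.
COST, PRICE, SALVAGE = 6.0, 10.0, 2.0

# Toy demand scenarios (units) from a predictive model, assumed equally likely.
demand_scenarios = np.array([80, 100, 120, 140])

def expected_profit(order_qty: int) -> float:
    # Profit under each scenario: revenue on units sold, salvage on leftovers.
    sold = np.minimum(demand_scenarios, order_qty)
    leftover = order_qty - sold
    profit = PRICE * sold + SALVAGE * leftover - COST * order_qty
    return profit.mean()

# Prescriptive step: evaluate candidate actions and recommend the best one.
candidates = range(80, 150, 10)
best = max(candidates, key=expected_profit)
print(f"Recommended order quantity: {best} units "
      f"(expected profit ${expected_profit(best):.2f})")
```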
Examples of popular uses for prescriptive analytics include the following:
Increasingly, businesses are turning to data to unearth insights that may help them develop business strategies, make choices, and provide customers with better goods, services, and personalized online experiences. Business analytics is a broad field, but the potential utility of the four techniques (descriptive, diagnostic, predictive, and prescriptive) is enormous. Used in conjunction with one another, these methods of analysis are highly complementary and vital to the success and survival of any business.
A new phase in the Industrial Revolution, referred to as Industry 4.0, is characterized by a strong emphasis on interconnection, automation, machine learning, and real-time data. In the context of Industry 4.0, which includes the Internet of Things (IoT) and Smart Manufacturing, physical production and operations are combined with smart digital technology, machine learning, and Big Data to create a more holistic and better-connected ecosystem for companies that focus on manufacturing and supply chain management, among other things. Every organization in the modern world faces a unique set of challenges. Still, they all share a fundamental requirement: the need for connectivity and access to real-time information across processes, partners, products, and people across all industries.
The following areas are some of the critical factors that will dictate the success of Big Data analytics in supporting Industry 4.0 and Smart Manufacturing.
According to Nick Piette, director of product marketing and API services at Talend, one of the upcoming trends in Big Data analytics is leveraging the data to improve customer experiences. He also believes that adopting a Cloud-first attitude will be beneficial. Piette says, “More and more brand interactions are happening through digital services, so it's paramount that companies find ways to improve updates and deliver new products and services faster than they ever have before.”18
Monfried explains, “In 20 years, big data analytics will likely be so pervasive throughout business that it will no longer be the domain of specialists. Every manager, and many nonmanagerial employees, will be assumed to be competent in working with big data, just as most knowledge workers today are assumed to know spreadsheets and PowerPoint. Analysis of large data sets will be a prerequisite to almost every business decision, much as a simple cost/benefit analysis is today.”19
He then ties that prediction to Big Data technologies that work in the Cloud. “This doesn't mean, however, that everyone will have to become a data scientist. Self-service tools will make big data analysis broadly accessible. Managers will use simplified, spreadsheet-like interfaces to tap into the computing power of the Cloud and run advanced analytics from any device.”20
Great cooks always have handy knives. Letting different tools handle different tasks is extremely important in the data science pipeline as well. The data pipeline procedures include data collection, data processing, and data analytics. This section walks you through the dominant data tools in the market and illustrates how each tool is used in the pipeline. It also contains sample code to facilitate your reading.
Data pipelines can be described as moving data from one source to another so it can be stored, used for analytics, or combined with other data. Specifically, it is the process of consuming, preparing, transforming, and, finally, enriching unstructured, structured, and semistructured data in a controlled manner. This section discusses what the pipeline consists of and some details one should watch for during the process.
Collection starts with using different methodologies to capture data and ends with storing it sensibly in a data warehouse. Most analysts don't need to spend much energy on the data collection process; in most cases, analysts are the users of the collected data. However, at the managerial level of analytics, data collection and warehousing are must-know procedures.
A proper data collection process requires a clear business goal. Is our future data for internal use or external use? Which parts of the data are we collecting in a business process? Are we observing the operational data or the production data? What are we trying to achieve with the data we will collect? Are there other data we should acquire to supplement the collected data? Many examples tell the stories of failure when these questions are not properly addressed:
One of my clients uses cameras to gather their business customers' data. Despite the innovative method, which brings significant monitoring value to their business customers' operations, the data itself cannot capture production results. In other words, when the analytics tell a story about improved operations, their business customers cannot easily see its effect on their production. Now my client needs to find additional means of collecting the production data.
On the other hand, there is always a tradeoff between precision and generality in the design of data warehouses, where collected data is stored. What are the collection sources of your data? What transformation do you need to do before putting those data into a datamart? Do you expect a large volume of data with different build-ups in the future? Without considering these questions, your data management will be chaotic. Data warehouse breakdowns are more likely to happen. Communication of data problems will be more confusing. Data insights will be more difficult to discover. Finally, when you want to renovate your data warehouse to improve efficiency, it is already too big, too complicated, and too costly to change.
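A minimal collect-transform-load sketch in Python, with hypothetical records and table names (an in-memory SQLite database stands in for the warehouse), illustrating the kind of transformation decisions discussed above:

```python
import sqlite3
import pandas as pd

# Collect: raw sensor records as they might arrive from an export (hypothetical).
raw = pd.DataFrame(
    {
        "machine_id": ["M1", "M2", "M1"],
        "reading": ["71.2", "69.8", "70.5"],  # arrives as text
        "ts": ["2023-01-05 08:00", "2023-01-05 08:00", "2023-01-05 09:00"],
    }
)

# Transform: enforce numeric types and a common timestamp format before loading.
raw["reading"] = raw["reading"].astype(float)
raw["ts"] = pd.to_datetime(raw["ts"])

# Load: write into a datamart table (in-memory SQLite stands in for the warehouse).
warehouse = sqlite3.connect(":memory:")
raw.to_sql("machine_readings", warehouse, index=False)
print(pd.read_sql("SELECT * FROM machine_readings", warehouse))
```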
The phrase "garbage in, garbage out" describes how flawed data produces flawed results. Unfortunately, even with a rigorous data collection process, the data is never clean. Thus, data cleaning is inevitable in order to convert flawed data into usable data. Common areas needing cleaning include dealing with null values, merging different data sets, and unifying timestamps.
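A minimal cleaning sketch in pandas covering the three areas just named (null values, merging data sets, unifying timestamps), on hypothetical order data:

```python
import pandas as pd

# Two hypothetical data sets with a missing value and inconsistent date formats.
orders = pd.DataFrame(
    {
        "order_id": [1, 2, 3],
        "qty": [10, None, 7],
        "placed": ["2023-01-05", "01/06/2023", "2023-01-07"],
    }
)
customers = pd.DataFrame({"order_id": [1, 2, 3], "customer": ["Acme", "Baldwin", "Acme"]})

# Deal with null values: here, fill the missing quantity with the column median.
orders["qty"] = orders["qty"].fillna(orders["qty"].median())

# Unify timestamps: parse each mixed-format string into one datetime type.
orders["placed"] = orders["placed"].apply(pd.to_datetime)

# Merge the different data sets on the shared key.
clean = orders.merge(customers, on="order_id")
print(clean)
```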
Analytical output is the final product presented to the users, who can be managers, stakeholders, customers, and others. In dashboard design, simplicity of content is crucial to maximizing the users' reading efficiency. Removing distractions allows viewers to focus on the essentials more clearly. Standardization means unifying title names, font styles, font sizes, and so on; the analytical products should look professional and clear. Cleaning and standardization not only play a significant role in the design phase but also help analysts across different departments communicate and collaborate. The color palette is a usually neglected area of standardization: assigning consistent colors to the elements in charts is one of the most effective methods of improving usability.
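A minimal sketch of that standardization with matplotlib, assuming hypothetical brand colors: fonts, sizes, and the palette are fixed once and reused, so every chart assigns the same color to the same element:

```python
import matplotlib.pyplot as plt

# One shared style for every dashboard chart: fonts, sizes, and a fixed palette.
plt.rcParams.update({"font.family": "sans-serif", "font.size": 11, "axes.titlesize": 13})
PALETTE = {"East": "#1f77b4", "West": "#ff7f0e"}  # hypothetical brand colors

months = ["Jan", "Feb", "Mar"]
revenue = {"East": [1200, 1350, 1500], "West": [900, 880, 860]}

fig, ax = plt.subplots()
for region, values in revenue.items():
    # Each element keeps the same color on every chart, aiding recognition.
    ax.plot(months, values, label=region, color=PALETTE[region])
ax.set_title("Monthly Revenue by Region")
ax.legend()
plt.show()
```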
There are significant benefits for SMEs to adopt Big Data analytics. Some of the good reasons to make the investment are:
Large manufacturing and distribution organizations are very adept at using enterprise-level analytics tools, which are essential for them to remain viable in today's global marketplace. While SMEs are not candidates for the highest-end platforms, there are good options available to them as well; they, too, will need affordable and easy-to-use Big Data tools to remain viable. A few examples of suitable tools include:
There are many data sources and data types available for SMEs. They come in a wide variety of conditions, from poor to high quality levels. For these reasons, data can be misused and misunderstood, which can lead to major problems if not rectified. Michael Keenan, writing in Cyfe, describes some major issues SMEs face in gaining value from Big Data analytics.26
A problem prevalent in small organizations is the tendency to focus on vanity metrics such as Instagram followers and Facebook Likes. The best advice is to focus on a very few key performance metrics, especially forward-looking metrics that help predict future events. For example, the size of the sales pipeline compared over time is more predictive than measuring bookings or shipments over time.29
There are some commonsense best practices in data analytics that have proved their value to SMEs over the years in exposing critical business trends. A trend is an upward or downward shift in a data set over time; trend analysis is valuable because it quantifies and explains those patterns. Armed with trend analysis, an organization can take action to support good trends and address bad ones.
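A minimal trend-analysis sketch with NumPy, fitting a straight line to a hypothetical quarterly pipeline metric; the sign of the slope quantifies whether the shift is upward or downward:

```python
import numpy as np

# Hypothetical sales-pipeline value by quarter (in $ thousands).
quarters = np.arange(1, 9)
pipeline = np.array([210, 225, 218, 240, 255, 250, 270, 285])

# Fit a straight line; the slope quantifies the trend over time.
slope, intercept = np.polyfit(quarters, pipeline, 1)
direction = "upward" if slope > 0 else "downward"
print(f"Trend: {direction}, about {slope:.1f}k per quarter")
```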
Michael Keenan provides a good list of proven best practices.30
Data analytics is a critical component of Smart Manufacturing and Industry 4.0 for a very simple reason: all the new technology creates massive amounts of structured and unstructured data from many different sources. The large volumes, inconsistent quality, and disparate nature of all this new data are far beyond the scope of traditional data analytics. Big Data analytics has proven its utility in analyzing data for large and small manufacturing organizations alike. While most of the powerful data analytics tools require the help of talented data scientists, SMEs can enjoy the benefits of Big Data by availing themselves of a variety of affordable and easy-to-use analytics tools to lower operating costs, increase market share, and reduce quality issues.