Chapter 1

The Business Demand for Data, Information, and Analytics

Abstract

In the business world, knowledge is not just power. It is the lifeblood of a thriving enterprise. Knowledge comes from information, and that, in turn, comes from data. Many enterprises are overwhelmed by the deluge of data, which they are receiving from all directions. They are wondering if they can handle Big Data—with its expanding volume, variety, and velocity. There is a big difference between raw data, which by itself is not useful, and actionable information, which business people can use with confidence to make decisions. Data must be transformed to make it clean, consistent, conformed, current, and comprehensive—the five Cs of data. It is up to a Business Intelligence (BI) team to gather and manage the data to empower the company’s business groups with the information they need to gain knowledge—knowledge that helps them make informed decisions about every step the company takes. While there are attempts to circumvent or replace BI with operational systems, there really is no good substitute for true BI. Operational systems may excel at data capture, but BI excels at information analysis.

Keywords

Big Data; Data; Data 5 Cs; Data capture; Data variety; Data velocity; Data volume; Information; Information analysis; Operational BI
Information in This Chapter
• The data and information deluge
• The analytics deluge
• Data versus actionable information
• Data capture versus information analysis
• The five Cs of data
• Common terminology

Just One Word: Data

“I just want to say one word to you. Just one word… Are you listening? … Plastics. There’s a great future in plastics.”

Mr. McGuire in the 1967 movie The Graduate.

The Mr. McGuires of the world are no longer advising newly minted graduates to get into plastics. But perhaps they should be recommending data. In today’s digital world, data is the key, the ticket, and the Holy Grail all rolled into one.
I do not just mean it’s growing in importance as a profession, although it is a great field to get into, and I’m thrilled that my sons Jake and Josh are pursuing careers in data and technology. Data is where the dollars are when it comes to company budgets. Every few years there is another report showing that business intelligence (BI) is at or near the top of the chief information officer’s (CIO) list of priorities.
Enterprises today are driven by data, or, to be more precise, information that is gleaned from data. It sheds light on what is unknown, it reduces uncertainty, and it turns decision-making from an art to a science.
But whether it’s Big Data or just plain old data, it requires a lot of work before it is actually something useful. You would not want to eat a cup of flour, but baked into a cake with butter, eggs, and sugar for the right amount of time at the right temperature it is transformed into something delicious. Likewise, raw data is unpalatable to the business person who needs it to make decisions. It is inconsistent, incomplete, outdated, unformatted, and riddled with errors. Raw data needs integration, design, modeling, architecting, and other work before it can be transformed into consumable information.
This is where you need data integration to unify and massage the data, data warehousing to store and stage it, and BI to present it to decision-makers in an understandable way. It can be a long and complicated process, but there is a path; there are guidelines and best practices. As with many things that are hard to do, there are promised shortcuts and “silver bullets” that you need to learn to recognize before they trip you up.
It will take a lot more than just reading this book to make your project a success, but my hope is that it will help set you on the right path.

Welcome to the Data Deluge

In the business world, knowledge is not just power. It is the lifeblood of a thriving enterprise. Knowledge comes from information, and that, in turn, comes from data. It is up to a BI team to gather and manage the data to empower the company’s business groups with the information they need to gain knowledge—knowledge that helps them make informed decisions about every step the company takes.
Enterprises need this information to understand their operations, customers, competitors, suppliers, partners, employees, and stockholders. They need to learn about what is happening in the business, analyze their operations, react to internal and external pressures, and make decisions that will help them manage costs, grow revenues, and increase sales and profits. Forrester Research sums it up perfectly: “Data is the raw material of everything firms do, but too many have been treating it like waste material—something to deal with, something to report on, something that grows like bacteria in a petri dish. No more! Some say that data is the new oil—but we think that comparing data to oil is too limiting. Data is the new sun: it’s limitless and touches everything firms do. Data must flow fast and rich for your organization to serve customers better than your competitors can. Firms must invest heavily in building a next-generation customer data management capability to grow revenue and profits in the age of the customer. Data is an asset that even CFOs will realize should have a line on the balance sheet right alongside property, plant, and equipment” [1].
It can be a problem, however, when there is more data than an enterprise can handle. Enterprises collect massive amounts of data every day, internally and externally, as they interact with customers, partners, and suppliers. They research and track information on their competitors and the marketplace. They put tracking codes on their websites so they can learn exactly how many visitors they get and where those visitors came from. They store and track information required by government regulations and industry initiatives. Now there is also the Internet of Things (IoT), with sensors embedded in physical objects such as pacemakers, thermostats, and dog collars that collect data. It is a deluge of data (Figure 1.1).

Data Volume, Variety, and Velocity

Enterprises are not only accumulating data in ever-increasing volumes; the variety and velocity of data are also increasing. Although the emerging “Big Data” databases can greatly expand how much data an enterprise is able to gather, volume, velocity, and variety are all expanding no matter how “big” or “small” the data is.
Volume—According to many experts, 90% of the data in the world today was created in the last two years alone. When you hear that statistic you might think that the data is coming from all the chatter on social media, but data is being generated by all manner of activities. For just one example, think about the emergence of radio frequency identification (RFID) to track products from manufacturing to purchase. It is a huge category of data that simply did not exist before. Even though not all of the data gathered is significant for an enterprise, what remains is still a massive amount of data to deal with.
Velocity—Much of the data now is time sensitive, and there is greater pressure to decrease the time between when it is captured and when it is used for reporting. We now depend on the speed of some of this data. It is extremely helpful to receive an immediate notification from your bank, for example, when a fraudulent transaction is detected, enabling you to cancel your credit card immediately. Businesses across industry sectors are using current data when interacting with their customers, prospects, suppliers, partners, employees, and other stakeholders.
Variety—The sources of data continue to expand. Receiving data from disparate sources further complicates things. Unstructured data, such as audio, video, and social media, and semistructured data like XML and RSS feeds must be handled differently from traditional structured data. The CIO of the past thought phones were just for talking, not something that collected data. He also thought Twitter was something that birds did. Now that an enterprise can collect data from tweets about its products, how does it handle that data and then what does it do with it? Also, what does it do with the invaluable data that business people create in spreadsheets and Microsoft Word documents and use in decision-making? Formerly, CIOs just had to worry about collecting and analyzing data from back office applications, but now their data can come from people, machines, processes, and applications spread across the world.
FIGURE 1.1 Too much information. www.CartoonStock.com.
Unfortunately, enterprises have not been as good at organizing and understanding the data as they have been at gathering it. Data has no value unless you can understand what you have, analyze it, and then act on the insights from the analysis.
See the book’s companion Website www.BIguidebook.com for links to industry research, templates, and other materials to help you learn more about business intelligence and make your next project a success.
To receive updates on newly posted material, subscribe to the email list on the Website or follow the RSS feed of my blog at www.datadoghouse.com.

Taming the Analytics Deluge

With this flood of data comes a flood of analytics. Sure, enterprises are adept at gathering all sorts of data about their customers, prospects, internal business processes, suppliers, partners, and competitors. Capturing data, however, is just the beginning. Many enterprises have become overwhelmed by the information deluge, and either cannot effectively analyze it or cannot get information that is current enough to act on.
This is a potentially massive headache for CIOs. According to the 2014 State of the CIO Survey [2], leveraging data and analytics is the most important technology initiative for 2014, with 72% of CIOs surveyed stating that it is a critical or high priority. Gartner concurs, with the prediction that BI and analytics will remain a top focus for CIOs through 2017, and that the benefits of fact-based decision-making are clear to business managers in a broad range of disciplines, including marketing, sales, supply chain management (SCM), manufacturing, engineering, risk management, finance, and HR [3].
Adding to the complexity, now many more people in an organization need the information that comes from all this data. Where in the past only a few managers received information to analyze, now business people at all levels are using analytics in their jobs. For example:
• Shipping data extends far beyond the shipping department to include outside shipping carriers and customers.
• Website analytics are no longer just the domain of webmasters looking at “hits”—marketing managers use them to measure the success of sales and social media campaigns.
• Medical information is shared not just with doctors, but also with hospital networks, patients, and insurance companies.
• TV streaming services compile data on everything we watch and use it to recommend other movies and shows they think we would like.
All of this information has had a tremendous impact on businesses’ ability to make informed decisions. According to a study conducted by The Economist [4], the flood of information and analytics has had these effects:
• The majority of companies (58%) claim they will make a bigger investment in Big Data over the next 3 years.
• Two-thirds of executives consider that their organizations are “data driven.”
• Over half (54%) say that management decisions based purely on intuition or experience are increasingly regarded as suspect.
• The majority of executives (58%) rely on unstructured data analysis including text, voice messages, images, and video content.
• Although 42% of executives say that data analysis has slowed down decision-making, the vast majority (85%) believe that the growing volume of data is not the main challenge, but rather being able to analyze and act on it in real time.
• As organizations increasingly look to the output from analytics to automate decision-making, data quality is seen as a major hurdle.

The Importance of Analytics

Businesses cannot afford to underestimate the importance of their analytics initiatives. While enterprises still need leaders and decision-makers with intuition, they depend on data to validate their intuitions. In this sense, data becomes a strategic guide that helps executives see patterns they might not otherwise notice. A study from Bain found that enterprises with the most advanced analytics capabilities outperformed competitors by wide margins, with the leaders showing these results [5]:
• Twice as likely to be in the top quartile of financial performance within their industries
• Five times as likely to make decisions much faster than market peers
• Three times as likely to execute decisions as intended
• Twice as likely to use data very frequently when making decisions.

Analytics Challenges

The use of analytics is not just growing in volume; it is also growing more complex. Advanced analytics is expanding to include predictive analytics, data visualization, and data discovery.
Analytics is not just about numbers; it is about brainpower. More companies are realizing they need to hire a new class of data-savvy people to develop complicated analytics models; these people are often referred to as data scientists. A McKinsey report stated that “By 2018, the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million managers and analysts with the know-how to use the analysis of Big Data to make effective decisions” [6]. Are companies worried about this? You bet! According to the 2014 Gartner CIO Agenda Report, “Fifty-one percent of CIOs are concerned that the digital torrent is coming faster than they can cope, and 42% don’t feel they have the right skills and capabilities in place to face this future” [7].
Our industry has faced a shortage of skilled people in business intelligence, analytics, and data integration that has kept businesses from effectively using the data they already have. With the onslaught of Big Data and the advanced skills it requires, it is important that more people learn how to work with these advanced analytics solutions.

Analytics Strategy

Business analytics is indispensable for analyzing this data and using it to make informed business decisions. A Forrester report [8] highlights some of the reasons why BI analytics is so critical:
• Many business decisions remain based on intuitive hunches, not facts
• Analytics lessens the discontinuity between intuition and factual decision-making
• Competitive differentiation can be achieved by more sophisticated data usage
• Big Data enables new use cases but will require analytics to take full advantage of its potential.
To make the most of the power of analytics, an enterprise needs a strategy based on how its business people interact with and use data. Chapter 15 will cover this topic in detail, but to sum up, an analytics strategy may include:
• Designing a data architecture that enables reporting, analytics, predictive modeling, and self-service BI
• Architecting a BI portfolio
• Architecting solutions with data discovery, data visualization, and in-memory BI
• Enabling operational and analytical BI
• Designing and implementing analytical sandboxes and hubs
• Creating data and analytical governance programs
• Creating shared BI metadata environments.

Too Much Data, Too Little Information

As the sailor in The Rime of the Ancient Mariner said, “Water, water, everywhere, nor any drop to drink.” How about “Data, data everywhere, but not any of the right information I need to do my job?” Businesses may be awash with data, but that does not necessarily mean they have useful information (Figure 1.2).
My son Jake’s first job out of college was at a firm that analyzed large volumes of data for resume parsing and job matching. His boss voiced a similar lament when he said “What’s the use of Big Data if it’s just a lot of data? We need information we can analyze, not just a big pile of data.”

The Difference between Data and Information

There is a big difference between data and information, although the terms are often used interchangeably. Data is raw, random, and unorganized. Information is data that has been organized, structured, and processed. Information is what you use to gain knowledge. We all eat, so let us look at how a foodie would approach these concepts:
Data: a collection of ingredients sitting on the counter. They include carrots, onions, leeks, garlic, and potatoes from the farmer’s market, and a package of chicken, a box of rice, and some cans of broth from the grocery store. In the data warehousing (DW)/BI world, this is like source data from different operational systems.
FIGURE 1.2 No water to drink; no information to consume.
Information: Then you get everything ready by washing, peeling, and cutting up the vegetables, cutting up the chicken, and opening the cans of broth. You put it all in the pot and turn on the heat, where it cooks and becomes soup. In the DW/BI world, the data has been moved into the ETL (extract, transform, and load) system and is transformed into information.
Knowledge: Now the soup is ready to be put into bowls and eaten. In the DW/BI world business people consume the information in reports to gain knowledge that helps them make informed business decisions.
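The following Python sketch makes the soup analogy a little more concrete. It is a minimal extract-transform-load pass, assuming two hypothetical "source systems" whose field names and sample rows are invented for illustration. A production ETL tool would also log rejected rows, apply many more rules, and write to warehouse tables rather than to a dictionary.

```python
"""A toy extract-transform-load (ETL) pass: raw rows in, business information out.

All source names, field names, and values are hypothetical.
"""
from collections import defaultdict
from decimal import Decimal

# Extract: raw rows as they might arrive from two operational systems.
# Note the inconsistent product codes, amounts stored as text, and a blank code.
POS_ROWS = [
    {"product": "SOUP-01", "amount": "4.50"},
    {"product": "soup-01 ", "amount": "4.50"},
    {"product": "BREAD-02", "amount": "3.25"},
]
WEB_ROWS = [
    {"sku": "Bread-02", "total": "6.50"},
    {"sku": "", "total": "9.99"},  # missing product: cannot be attributed
]


def extract():
    """Map each source's field names onto one common shape."""
    for row in POS_ROWS:
        yield {"product": row["product"], "amount": row["amount"], "source": "pos"}
    for row in WEB_ROWS:
        yield {"product": row["sku"], "amount": row["total"], "source": "web"}


def transform(rows):
    """Standardize codes and amounts; set aside rows that cannot be repaired."""
    clean, rejected = [], []
    for row in rows:
        code = row["product"].strip().upper()
        if not code:
            rejected.append(row)
            continue
        clean.append({"product": code, "amount": Decimal(row["amount"]), "source": row["source"]})
    return clean, rejected


def load(rows):
    """Aggregate into revenue by product (standing in for a warehouse table)."""
    revenue = defaultdict(Decimal)
    for row in rows:
        revenue[row["product"]] += row["amount"]
    return dict(revenue)


clean_rows, rejected_rows = transform(extract())
report = load(clean_rows)
print(report)
print(f"{len(rejected_rows)} row(s) rejected")
```

Without the standardization step, "SOUP-01" and "soup-01 " would be reported as two different products. That is a small example of raw data being unpalatable until it is prepared.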

The Role of BI in Creating Actionable Information

BI turns data into “actionable” information—information that is useful to the business and helps it gain knowledge. Business demand for actionable information is ever expanding. With business managers and workers seemingly connected 24/7 via their smartphones, tablets, and other devices, expectations are being raised even further for BI systems that can go beyond basic reporting and provide analytics capabilities at the speed of thought.
It is good that enterprises have recognized the significant business value of analyzing the surging amount of information and then acting on that analysis. It is a bad thing, however, when enterprises become so overwhelmed by the information deluge that they cannot effectively analyze it or receive current enough information on which to act.
Enterprises are in this position because many are still using the standard techniques and technologies that became mainstream in BI and DW before the current information deluge. Enterprises have been encountering significant increases in volume, variety, and velocity over the last several years as they have expanded from integrating data from internal operational systems to data interchange with customers, prospects, partners, suppliers, and other stakeholders.
Once the necessary data is located and evaluated, work often needs to be done to turn it into a clean, consistent, and comprehensive set of information that is ready to be analyzed.

The Information Backbone

No matter what the BI request, the right business information must be available. That does not mean that all you need to do is to grant a business person access to a database. Comprehensive, consistent, conformed, clean, and current data does not happen without a strategy to manage the information.
Once the information is in order, then business people can plug into the information backbone through many BI tools such as data discovery, data visualization, ad hoc query, dashboards, scorecards, OLAP analysis, predictive analytics, reports, and spreadsheets. An enterprise’s data demands are ever expanding and evolving—meaning that the information backbone is likewise expanding. This often requires data integration, data cleansing, data profiling, and, most importantly, data governance.

Data Capture versus Information Analysis

You know you need BI to arm your enterprise’s business people with knowledge, but there are some gray areas where BI might not seem like the optimal method. This is where many people become confused. Before explaining this, we will introduce the differences between BI and transaction processing systems, and data capture versus information analysis.

The Roles of BI and Operational Systems

To understand the role of a BI system versus a transaction processing system, start with data—there is a big difference between just capturing data and using it for analysis. Capturing data means converting or translating it to a digital form. For example, when you scan a printed bar code at the grocery store checkout, it captures data on the item’s price. When you use your smartphone to scan the QR code on a movie poster, it captures that data and sends you to a web video with a preview of the movie. When you use your phone to scan a check for online deposit, and then you key in the deposit amount, that information is captured and sent to the bank.
Captured data is input into operational systems. These are the systems that perform the day-to-day transactions of a business, such as deposits in a bank, sales in a store, and course registrations in a university. They are also called transaction processing systems, because they are where the enterprise processes its transactions.
Contrast this with business intelligence, which comprises the applications used for reporting, querying, and analytics. This category also includes data warehousing, which is the database backbone to support BI applications. A data warehouse is not the only data source used by BI, but it remains a key ingredient to an enterprise-wide solution providing clean, consistent, conformed, comprehensive, and current information, rather than yet another data silo.
Traditionally, operational systems had only limited reporting capabilities. This is understandable; they are built for transactional processing, after all. An enterprise’s data can be scattered across many different operational systems, making it very hard to gather and consolidate. In a big medical center, for instance, one system could process data related to patient accounts, another could be delegated to medical research data, and another used for human resources. The systems are built to process large amounts of data, and do it quickly.
The answer to the need for better reporting was BI—and it is still the answer. But there is also a middle ground, called operational BI, which causes a lot of confusion.

Operational BI Blurs the Lines

Operational BI (also sometimes called real-time BI) shifts queries and reporting to the operational systems themselves. On the positive side, this allows queries on real-time data and immediate results. On the negative side, it causes confusion. I’ve worked with clients where people did not understand the boundaries between data capture and analysis; they thought the same data could be used for both transaction processing and analysis. The reality is that data still must be structured for analysis—hence the conundrum of data capture versus information analysis.
It is dangerous when an enterprise considers operational BI as a panacea for its business information needs. Too often, the latest technology is seen as a solution that avoids all that tough, time-consuming, data integration stuff we have been doing for years. It is not a shortcut. IT and business still need to communicate and agree on the data definitions and data transformations for business information.

Where Data Warehousing Fits in

It is easy to see why the idea of simply accessing enterprise data where it is stored in the operational systems sounds so appealing. Operational BI does have many benefits, such as the simplicity of a single suite of BI tools for accessing, reporting, and analyzing data. But it is a big mistake to think it makes data warehousing obsolete. Not even close. Data warehousing is a necessary part of an enterprise’s information management strategy for many reasons.
• Operational data is different from analytical data in several ways:
  ◦ Operational data is structured for efficiently processing and managing business transactions and interactions, whereas data in data warehouses is structured for business people to understand and analyze.
  ◦ Operational systems live in the here and now, whereas data warehousing must support the past, present, and future. Operational systems record the business event as is, whereas data warehousing tracks changes in dimensions—products, customers, businesses, geopolitical, account structures, and organizational hierarchies—so that information can be examined as is, as was, and as will be.
  ◦ Operational data typically contains a relatively short time span, whereas analytical data is historical. A business needs to perform period-over-period analysis or examine trending using historical data.
  ◦ The data for many of the attributes that the business wants to analyze is neither needed nor available in an operational system.
• Operational data is spread out over many source systems, making it hard to bring together and analyze. The more sources you have, the more data integration you will need.
• Every enterprise, no matter how large or small, must perform data integration to ensure that its data is consistent, clean, and correct.
• There are many business algorithms used to transform data to information outside of operational systems. Finance, sales, marketing, and other business groups each must transform the data into the business context they need to perform their work.
• There are both enterprise-wide and business group-specific performance measures and key performance indicators (KPIs) that need to be derived outside of operational systems.
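Dimensional modelers commonly support examining information "as is" and "as was" with what they call a Type 2 slowly changing dimension. When an attribute changes, an operational system typically overwrites it. The warehouse instead closes the old version and adds a new one.

The Python sketch below is an illustration under stated assumptions, not a complete design. It assumes a hypothetical customer dimension with a single tracked attribute, region, and uses the function names `apply_change` and `region_as_of`, which are invented here.

```python
"""Tracking dimension changes so data can be viewed 'as is' and 'as was'.

A minimal sketch of a Type 2 slowly changing dimension; table and field
names are hypothetical.
"""
from dataclasses import dataclass
from datetime import date
from typing import Optional

OPEN_END = date.max  # marks the version that is currently in effect


@dataclass
class CustomerVersion:
    customer_id: str
    region: str
    valid_from: date
    valid_to: date = OPEN_END


def apply_change(history, customer_id, new_region, effective):
    """Close the current version and append a new one instead of overwriting."""
    for row in history:
        if row.customer_id == customer_id and row.valid_to == OPEN_END:
            if row.region == new_region:
                return  # nothing changed; keep the current version
            row.valid_to = effective
    history.append(CustomerVersion(customer_id, new_region, effective))


def region_as_of(history, customer_id, as_of) -> Optional[str]:
    """Return the region that was in effect on a given date ('as was')."""
    for row in history:
        if row.customer_id == customer_id and row.valid_from <= as_of < row.valid_to:
            return row.region
    return None


history = []
apply_change(history, "C42", "West", date(2020, 1, 1))
apply_change(history, "C42", "East", date(2023, 7, 1))

print(region_as_of(history, "C42", date(2021, 3, 15)))  # as was
print(region_as_of(history, "C42", date.today()))       # as is
```

An operational customer table would hold only "East". Sales booked in 2021 would then be reported under the wrong region.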
There have been numerous times when vendors have proclaimed that data warehousing is no longer needed. Over the years, we have heard them talking about middleware, virtual data warehouses, conformed data marts, enterprise information integration (EII), enterprise application integration (EAI), service oriented architectures (SOA), data virtualization, and real-time access from every generation of BI tools. In fact, it is a recurring theme. There is no “silver bullet” that helps an enterprise avoid the hard work of data integration. Information that is clean, comprehensive, consistent, conformed, and current is not a happenstance; it requires thought and work.
Whatever silver-bullet promises you may hear, the answer has nothing to do with connectivity, bandwidth, memory, or slick interfaces. The reality is that a lot of analysis is needed in order to make sense of the data scattered across silos both in and outside of enterprises. Operational systems lack many key attributes and do not support all the necessary business transformations to handle this analysis.
Enterprises need both operational and analytical BI. Chapter 5 will cover operational BI in more detail, and will include its benefits, risks, and guidelines for doing it right.

The Five Cs of Data

Before a BI/DW program can deliver actionable information to business people, it must whip the enterprise’s data into shape. Data that has been whipped into shape will be clean, consistent, conformed, current, and comprehensive—the five Cs of data.
Clean—dirty data can really muddy up a company’s attempt at real-time disclosure and put the CFO at high risk when signing off on financial reports and even press releases based on incorrect information. Dirty data has missing items, invalid entries, and other problems that wreak havoc with automated data integration and data analysis. Customer and prospect data, for example, is notorious for being dirty. Most source data is dirty to some degree, which is why data profiling and cleansing are critical steps in data warehousing.
Consistent—there should be no arguments about whose version of the data is the correct one. Management meetings should never have to break down into arguments about whose number is correct when they really need to focus on how to improve customer satisfaction, increase sales, or improve profits. Business people using different hierarchies or calculations for metrics will argue regardless of how clean the transactional data is.
Conformed—the business needs to analyze the data across common, shareable dimensions if business people across the enterprise are to use the same information for their decision-making.
Current—the business needs to base decisions on data that is as current as that type of decision requires. In some cases, such as detecting credit card fraud, the data needs to be up to the minute.
Comprehensive—business people should have all the data they need to do their jobs—regardless of where the data came from and its level of granularity.
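Data profiling is one practical way to measure how far source data is from the five Cs. The Python sketch below covers only two of them:

• Clean: missing and invalid values, counted per column.
• Current: whether a feed is fresh enough for a particular decision.

The validation rules, reference list, freshness thresholds, and sample rows are hypothetical placeholders. A real project would take them from its data governance agreements.

```python
"""Simple data-profiling checks for two of the five Cs: clean and current.

Rules, thresholds, and field names are hypothetical examples.
"""
from datetime import datetime, timedelta

rows = [
    {"customer_id": "C1", "email": "ann@example.com", "state": "MA"},
    {"customer_id": "C2", "email": "", "state": "ma"},  # 'ma' fails until cleansing standardizes case
    {"customer_id": None, "email": "bob(at)example", "state": "ZZ"},
]

VALID_STATES = {"MA", "NY", "CA"}  # stand-in for a real reference list

# One validity rule per column; missing values are counted separately.
RULES = {
    "customer_id": lambda v: v.startswith("C"),
    "email": lambda v: "@" in v,
    "state": lambda v: v in VALID_STATES,
}


def profile(rows):
    """Count missing and invalid values per column."""
    report = {col: {"missing": 0, "invalid": 0} for col in RULES}
    for row in rows:
        for col, is_valid in RULES.items():
            value = row.get(col)
            if value in (None, ""):
                report[col]["missing"] += 1
            elif not is_valid(value):
                report[col]["invalid"] += 1
    return report


def is_current(last_loaded, max_age, now):
    """'Current' depends on the decision: fraud checks need minutes, trend reports may allow a day."""
    return now - last_loaded <= max_age


print(profile(rows))
now = datetime(2024, 1, 1, 12, 0)
print(is_current(datetime(2024, 1, 1, 11, 58), timedelta(minutes=5), now))  # fraud feed
print(is_current(datetime(2023, 12, 31, 2, 0), timedelta(days=1), now))     # daily sales report
```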
So much is riding on your data. No one can afford to be sloppy or wasteful in their BI and data integration strategies. Mistakes are expensive and it is highly embarrassing when a customer finds errors before you do. Businesses, now more than ever, need to understand who their current and potential customers are as well as how much revenue and profit each product or service line generates. This demands clean, consistent, conformed, current, and comprehensive data.
Do not be swayed by vendor sales pitches for quick-hit solutions that promise shortcuts. Prepackaged analytics and corporate dashboards offer the allure of off-the-shelf solutions that appear to take almost no work and instantly provide the business with the answers to all their questions. If the solution looks too good to be true, step back and ask some questions. Read the fine print, such as the qualifiers that everything works fine only if all your data is clean, all the data is included in the supported sources, and your business analytics match the prebuilt models. In short, the prebuilt solution works if you already have all your data in place. These quick and dirty solutions often have the unintended consequence of creating more data silos.
It would be great if all it took was buying the right tool to provide your business with comprehensive, cleansed, consistent, conformed, and current business information from source data. You do need to buy tools, but it does not happen in minutes, hours, or days. It takes time and hard work.
The best situation for an enterprise is when, in addition to using clean, consistent, conformed, current, and comprehensive data, it routinely uses analytics-driven processes to manage and grow the business. Advanced analytics tools such as data visualization, predictive analytics, and data discovery often play a large role in this. The reality is that most enterprises are not in this state of “analytical nirvana,” but it should always be their goal.
Their path should include:
Getting serious about data governance. Data governance, both for data definitions and for business agreed-on metrics, is foundational because if data is not consistent, then slick visualizations do not matter. The old school term was GIGO (garbage in, garbage out). You do not have to make everything perfect for BI to be useful, but if business people spend their time debating the numbers or reconciling data (because they do not trust it), then foundational work must be done. It is not glamorous, but it is essential.
Getting serious about data integration. Most enterprises have a data integration backlog, but keep getting distracted by industry pundits and vendors that claim that this time, with the latest and greatest analytic tools, they do not need to integrate (gather, cleanse, standardize, conform, and transform from multiple business processes) data but can simply “point and click” to get their answers. It is not that easy.
Getting serious about spreadsheets. Spreadsheets are often used for reporting, but it is a problem when they are used as the data integration and transformation tool without any governance or architecture. If a spreadsheet is used to analyze the same consistent, comprehensive, clean, and current data as the dashboard, data discovery, and data visualization applications, then it is a viable tool. Many business people gravitate to it because it is truly pervasive.

Common Terminology from Our Perspective

The three core building blocks of a DW/BI program are data integration, DW, and BI. Data integration is the foundation of DW, which, in turn, is the foundation of BI. (See Chapter 18 to learn why an enterprise should set up a BI program, as opposed to tackling projects individually and tactically.)
See Figure 1.3 for a visual representation of how they interrelate.
Data integration—combining data from different sources and bringing it together to ultimately provide a unified view. Data integration and data shadow systems are often at opposite ends of the DW/BI spectrum. If an enterprise has inconsistent data, it is highly likely that it has a data integration problem. The components of data integration include the data sources; the processes to gather, consolidate, transform, cleanse, and aggregate data and metadata; standards; tools; and resources and skills.
Data warehousing—the process of storing and staging information, separate from an enterprise’s day-to-day transaction processing operations, and optimizing it for access and analysis in an enterprise. In this process, data flows from data producers to the data warehouse, where it is transformed into information for business consumers. It encompasses all the data transformations, cleansing, filtering, and aggregations necessary to provide an enterprise-wide view of the data.
Historically, the data warehouse was a centralized database that served reporting and decision-support systems. In the classic definition from Bill Inmon’s book Building the Data Warehouse, it is:
• Integrated—data gathered and made consistent from one or more source systems
• Subject oriented—organized by data subject rather than by application
• Time variant—historical data is stored (Note: in the beginning, enterprise applications often only stored a limited amount of current and historical data online.)
• Nonvolatile—data is not modified in the DW; it is read-only.
BI—presenting data to business people so they can use it to gain knowledge. BI enables access and delivery of information to business users. It is the visible portion of the corporate data systems, as opposed to data warehousing, which is in the “back room.” BI is what business people see via tools and dashboards. The data comes from relational data sources or enterprise applications such as enterprise resource planning (ERP), customer relationship management (CRM), and supply chain management (SCM). The source of the data can be a “black box” from the business person’s perspective; they mainly care about what it is, not where it came from.
FIGURE 1.3 How BI, DW and DI fit together.
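Inmon’s four properties can be illustrated with a minimal Python sketch of a warehouse table kept apart from the operational system. It assumes a hypothetical customer subject area loaded as periodic snapshots; the class, field, and method names are illustrative only.

- Nonvolatile: rows are frozen once loaded.
- Time variant: every row is stamped with a load date.
- Integrated: rows are keyed on a customer key assumed to have been conformed already by data integration.
- Subject oriented: the table is organized around the customer subject rather than an application.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)  # frozen: a loaded row cannot be modified (nonvolatile)
class CustomerSnapshot:
    load_date: date     # time variant: each row records when it was loaded
    customer_key: str   # integrated: key assumed to be conformed upstream by data integration
    segment: str        # subject oriented: attributes describe the customer subject


class CustomerWarehouseTable:
    """Append-only store: loads add rows; nothing is updated or deleted."""

    def __init__(self) -> None:
        self._rows: list = []

    def load(self, load_date: date, current_state: dict) -> None:
        """Append a snapshot of the operational system's current values."""
        for key, segment in current_state.items():
            self._rows.append(CustomerSnapshot(load_date, key, segment))

    def as_of(self, when: date) -> dict:
        """BI-style read: each customer's segment as of a given date."""
        view = {}
        for row in sorted(self._rows, key=lambda r: r.load_date):
            if row.load_date <= when:
                view[row.customer_key] = row.segment  # later snapshots supersede earlier ones
        return view


if __name__ == "__main__":
    dw = CustomerWarehouseTable()
    # The operational system overwrites values in place; the warehouse keeps history.
    dw.load(date(2014, 1, 31), {"C1": "Small Business", "C2": "Enterprise"})
    dw.load(date(2014, 2, 28), {"C1": "Mid-Market", "C2": "Enterprise"})
    print(dw.as_of(date(2014, 1, 31)))  # {'C1': 'Small Business', 'C2': 'Enterprise'}
    print(dw.as_of(date(2014, 2, 28)))  # {'C1': 'Mid-Market', 'C2': 'Enterprise'}
```

An operational system would simply overwrite C1’s segment. The warehouse keeps both versions, so a BI report can answer what the segment was at the end of January as well as what it is now.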
A few of the key terms are depicted in Figure 1.4. Some enterprises use data warehousing as the umbrella term for everything depicted in that diagram, while others use BI instead. As long as the solution is designed, developed, tested, and deployed, either term is fine. Table 1.1 presents some of the key terms and the discipline to which they apply.
FIGURE 1.4 Categorizing BI, DW, and DI terminology.
Each building block includes a host of different technologies, many listed in Table 1.1. Although there is overlap, we have tried to group them by BI, DW, and data integration. This is not meant to be an all-inclusive list or set of definitions, but rather our perspective on some of the bigger terms. The rest of this book shows how these technologies are put into action and where they fit with one another. Becoming familiar with these terms now will help make the concepts easier to understand.
Note that much of what is covered here is technology. As you’ll read in Part VII, technology is the easy part of a DW/BI program. It is the people, process, and governance issues that are the hard part.

Table 1.1

Common Terminology

Term | Applies to BI, DW, and/or DI
Ad hoc query—People use SQL to make ad hoc queries to a database when the need arises. This is the opposite of predefined queries, which are performed routinely and known ahead of time. Tools for ad hoc querying can help you manipulate data for analysis and report creation. Most business people, however, do not really need ad hoc querying; they do fine with interactive reporting and data discovery. | BI
Analytics—The examination of information to uncover insights that give a business person the knowledge to make informed decisions. Analytics tools enable people to query and analyze information and to use data visualization to communicate findings in an easy-to-understand way. There are different types of analytics: descriptive (what happened), diagnostic (why it happened), predictive (what is likely to happen), and prescriptive (what actions should be taken). Descriptive analytics is the most common and is considered foundational, or core; the others are labeled advanced analytics. | BI
BI appliance—Bundled hardware and software aimed at making it easier and more cost-effective for enterprises to purchase, use, and maintain their BI solutions. Scalability and flexibility are key benefits. Appliances use a wide variety of architectures, so a formal evaluation and proof of concept (POC) are highly recommended to ensure a match with your situation. | BI
BI application—Any BI project deliverable that the BI team develops for business people to use in their analysis. This can be a dashboard, scorecard, report, data visualization, ad hoc query, OLAP cube, predictive model, or data model. There can be many BI deliverables in an application. | BI
BI styles—The different types of BI applications that a business person may use in performing analysis, such as reporting, dashboards or scorecards, OLAP or pivot analysis, ad hoc query, statistical analysis, alerting or notifications, data discovery, data visualization, spreadsheets, and advanced analysis. | BI
BI tool—A vendor’s software tool used to develop the BI application and deliver one or more BI styles. | BI
EAI (enterprise application integration)/SOA (service-oriented architecture)—Tools and methods for consolidating and integrating the applications that exist in an enterprise. The goal is usually to protect the investment in legacy applications and databases while adding or migrating to a new set of applications that exploit the Internet, e-commerce, extranet, and other new technologies. | DI
Dashboards—This BI tool displays numeric and graphical information on a single screen, making it easy for a business person to get information from different sources and customize the appearance. A dashboard is often a mashup of other BI styles. | BI
Data cleansing—The process of finding and fixing errors, inconsistencies, and inaccuracies in data. The level of cleanliness required depends on each industry’s best practices. Data quality tools are used for the more complex processing, while data integration tools perform basic processing. | DI
Data franchising—Packages data into a BI data store so business people can understand and use it. Although this creates data that is redundant with what’s in the data warehouse, it is a controlled redundancy. The data stores may be dependent data marts or cubes. Data franchising takes place after data preparation. | DW
Data mart—A subset of a data warehouse that is usually oriented to a business group or process rather than to enterprise-wide views. Data marts have value as part of the overall enterprise data architecture, but they can cause problems when they sprout uncontrolled as data silos with their own data definitions, creating data shadow systems. | DW
Data mining—This process analyzes large quantities of data to find patterns such as groups of records, unusual records, and dependencies. Data mining helps businesses sift through data to find patterns and relationships they do not yet know, such as “what is the likelihood that a customer who buys our hammer will also buy our nails?” | BI
Data quality—Achieved when data embodies the “five Cs”: clean, consistent, conformed, current, and comprehensive. | DW
Data preparation—The core set of processes for data integration. These processes gather data from diverse source systems, transform it according to business and technical rules, and stage it for later steps in its life cycle when it becomes information used by information consumers. | DW
Data profiling—An essential part of the data quality process; this involves examining source system data for anomalies in values, ranges, frequency, relationships, and other characteristics that could hobble future efforts to analyze it. It enables early detection of problems. | BI
Data governance—A process that enforces consistent definitions, rules, business metrics, policies, and procedures for how an enterprise treats its data. It can encompass many areas, from data creation, movement, transformation, integration, and definitions all the way to consumption. A data governance program helps the organization treat its data as a corporate asset and maximize its value, but governance becomes more challenging with unstructured data, data from the cloud, and Big Data. | DI
Data visualization—Presenting data in a visual way, such as with graphs and charts, helps business people glean insights they might not otherwise discern from tabular data. Dashboards and self-service BI use data visualization, but it is only as effective as the quality of the data it draws upon. | BI
Data virtualization—Retrieving and manipulating data without requiring details of how it is formatted or where it is located. It enables enterprises to expand the data used in their analysis without requiring that it be physically integrated. Business people do not have to get IT involved (via business requirements, data modeling, and ETL and BI design) every time data needs to be added, which lets them focus more on data discovery. Also called data federation; formerly called enterprise information integration (EII). | BI
Dimensional modeling—A generally accepted practice in the data warehouse industry for structuring data intended for user access, analysis, and reporting as dimensional data models. | DI
ETL (extract, transform, and load)—The process in which data is extracted from the source system, transformed, and loaded into a data warehouse or database. ETL tools automate data integration tasks. | DW
In-memory analytics—Leveraging advances in memory to provide faster and deeper analytics by querying data held in a system’s random-access memory (RAM) instead of on disk. Architectural options for in-memory analytics include placing it in the BI tools, in the database, or on the BI appliance platform. | BI
MDM (master data management)—The set of processes used to create and maintain a consistent view, also referred to as a master list, of key enterprise reference data. This data includes such entities as customers, prospects, suppliers, employees, products, services, assets, and accounts. It also includes the groupings and hierarchies associated with these entities. | DI
Metadata management—The classic definition of metadata is “data about the data.” Metadata is a means to an end, an enabler of the desired goal of making decision-support data accessible to the business community throughout an enterprise. In managing metadata, an enterprise needs to understand what the data means, how it was transformed from creation to consumption, and its associated data quality. | DI
ODS (operational data store)—A type of database sometimes used in a BI data architecture. Unlike a data warehouse, an ODS may serve both analytical and operational functions. | DW
OLAP (online analytical processing)—This technique for analyzing business data uses dimensional models often deployed as cubes, which are like multidimensional pivot tables in spreadsheets. It often answers the question “Tell me what happened and why.” OLAP tools can perform trend analysis and enable drilling down into data. They enable multidimensional analysis such as analyzing by time, product, and geography. The two OLAP camps are MOLAP (multidimensional) and ROLAP (relational). HOLAP (hybrid) combines them. | BI
Operational BI—Queries and reporting are performed on operational systems themselves, as opposed to the data warehouse. Most enterprises need a mix of operational BI and analytical BI from the DW. | BI
Predictive analytics—An advanced form of analytics that uses business information to find patterns and predict future outcomes and trends. Determining credit scores by looking at a customer’s credit history and other data is a typical use for predictive analytics. | BI
Report (or analytical) governance—BI deliverables need solid report governance in order to provide consistent information with which the business can make decisions. Report governance includes managing not only reports but also dashboards, scorecards, self-service BI, ad hoc query, OLAP analysis, predictive analytics, data visualization, data mining, and spreadsheets, along with the data used. It is more accurate to refer to this as “analytical governance” rather than just report governance. | DI
Reporting—Collecting data from various sources and presenting it to business people in an understandable way so they can analyze it. This is the core BI style. Reports were initially static with predefined formats but have become interactive and customizable. | BI
Scorecards—Performance management tools that help managers track performance against strategic goals. These may be considered a type of dashboard. | BI
Self-service BI—Intuitive tools that allow BI consumers to obtain the information they need without the help of the IT group. People still need the IT group for the hard work of making the data clean, consistent, conformed, current, and comprehensive. | BI
Structured data—Data that can be organized in a predefined record or file and may be stored in a database or spreadsheet. Some examples of structured data are an enterprise’s sales, employee, and financial data. | DI
Text or textual analytics—The use of data mining for analysis of unstructured textual data such as emails. Text mining tools help find, for example, instances of fraud in thousands of emails or mentions of a company’s name in social media. | BI
Unstructured data—Data that is free-form or unorganized. Email messages, tweets, PowerPoint presentations, Word documents, and video images are examples of unstructured data. | DI
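The data profiling entry in Table 1.1 describes examining source data for anomalies in values, ranges, and frequency. Here is a minimal Python sketch of a single-column profile. The sample column and the choice of statistics are assumptions for illustration. Profiling tools typically go further, for example by checking relationships across columns and tables.

```python
from collections import Counter


def profile_column(values: list) -> dict:
    """Summarize one source column so anomalies surface before integration."""
    # Treat None and empty strings as missing values.
    present = [v for v in values if v is not None and v != ""]
    # Numeric values are used for the range check; anything else is counted as non-numeric.
    numeric = [v for v in present if isinstance(v, (int, float))]
    return {
        "row_count": len(values),
        "null_or_blank": len(values) - len(present),
        "distinct": len(set(present)),
        "min": min(numeric) if numeric else None,
        "max": max(numeric) if numeric else None,
        "non_numeric": len(present) - len(numeric),
        "top_values": Counter(present).most_common(3),  # frequency distribution
    }


if __name__ == "__main__":
    # Hypothetical "order_quantity" column extracted from a source system.
    order_quantity = [5, 3, None, 5, -2, "", 5, "N/A", 10000]
    for measure, value in profile_column(order_quantity).items():
        print(measure, value)
```

For the sample column, the profile surfaces several problems: a negative minimum, an implausibly large maximum, two missing values, and one text placeholder ("N/A"). These are the kinds of issues that profiling is meant to catch before the data reaches the warehouse.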

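The dimensional modeling and OLAP entries in Table 1.1 can be illustrated together. Here is a minimal Python sketch of a star schema: a sales fact table whose rows reference time, product, and geography dimensions. It also includes a roll-up that aggregates the fact by whichever dimension attributes are chosen, much like a pivot table. All table names, attributes, and figures are hypothetical sample data. OLAP tools (MOLAP, ROLAP, or HOLAP) typically precompute or optimize such aggregations instead of scanning rows one at a time.

```python
from collections import defaultdict

# Dimension tables: surrogate key -> descriptive attributes (hypothetical sample data).
DIM_DATE = {1: {"year": 2013, "quarter": "Q4"}, 2: {"year": 2014, "quarter": "Q1"}}
DIM_PRODUCT = {10: {"product": "Hammer", "category": "Tools"},
               11: {"product": "Nails", "category": "Hardware"}}
DIM_GEOGRAPHY = {100: {"region": "East"}, 101: {"region": "West"}}

# Fact table: foreign keys to each dimension plus an additive measure.
FACT_SALES = [
    {"date_key": 1, "product_key": 10, "geo_key": 100, "revenue": 250.0},
    {"date_key": 1, "product_key": 11, "geo_key": 101, "revenue": 40.0},
    {"date_key": 2, "product_key": 10, "geo_key": 101, "revenue": 300.0},
    {"date_key": 2, "product_key": 11, "geo_key": 100, "revenue": 60.0},
]


def denormalize(fact: dict) -> dict:
    """Join one fact row to its dimension attributes (a star join)."""
    row = {}
    row.update(DIM_DATE[fact["date_key"]])
    row.update(DIM_PRODUCT[fact["product_key"]])
    row.update(DIM_GEOGRAPHY[fact["geo_key"]])
    row["revenue"] = fact["revenue"]
    return row


def roll_up(*attributes: str) -> dict:
    """Sum revenue grouped by the chosen dimension attributes, like a pivot table."""
    totals = defaultdict(float)
    for fact in FACT_SALES:
        row = denormalize(fact)
        totals[tuple(row[a] for a in attributes)] += row["revenue"]
    return dict(totals)


if __name__ == "__main__":
    print(roll_up("year"))                # analyze by time
    print(roll_up("category", "region"))  # analyze by product and geography
```

Adding attributes to the call is the drill-down that the OLAP entry mentions, for example `roll_up("year", "quarter", "product")`. Removing attributes rolls the totals back up.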
References

[1] Gualtieri M, Yuhanna N, Kisker H, Curran R, Murphy D. Customer data should be the lifeblood of your enterprise. Forrester Research, Inc.; June 11, 2014 Web.

[2] 2014 state of the CIO survey. CIO magazine; January 2014 PDF.

[3] Gartner predicts business intelligence and analytics will remain top focus for CIOs through 2017. Press release; December 16, 2013 Web.

[4] The deciding factor: big data & decision making. Capgemini and the Economist; June 26, 2012 Web.

[5] Pearson T, Wegener R. Big data: the organizational challenge. Bain & Company; 2013 PDF.

[6] Manyika J, Chui M. Big data: the next frontier for innovation, competition, and productivity. McKinsey Global Institute; May 13, 2011 Web.

[7] Taming the digital dragon: the 2014 CIO agenda. Gartner; 2014 Web.

[8] Kisker H, Green C. TechRadar™: BI analytics, Q3 2013. Forrester Research, Inc.; July 11, 2013 PDF.
