Chapter 16

Data Shadow Systems

Abstract

Data shadow systems (sometimes called spreadmarts) are frequently-seen departmental systems (usually spreadsheets) that the more technically inclined members of business groups create for their co-workers to use to gather and analyze data on their own when they do not want to work with IT or cannot wait for them. They might just be one-off reports, or they can be as serious as full-fledged analytical applications used extensively for data integration. Data shadow systems create silos, resulting in inconsistent data across the enterprise. They often frustrate IT groups, who do not understand why departments are not using the “official” IT BI tools. Business people are frustrated because the data shadow system gives them what they want. Resolving the problem of data shadow systems requires a compromise between the business and the IT groups, and not losing the valuable parts of the data shadow systems. The BI team needs to identify them, and either replace them or incorporate them into the overall BI program.

Keywords

Data integration; Data shadow system; Silo; Spreadmart; Spreadsheet
Information in This Chapter:
• Identifying and triaging data shadow systems
• The evolution of data shadow systems
• The damages caused by data shadow systems
• The benefits of data shadow systems
• Moving beyond data shadow systems
• Renovating data shadow systems
• Recommendations

The Data Shadow Problem

Business groups are empowered by knowledge—and knowledge comes, in part, from having access to accurate and timely information. It is generally up to IT groups to make that happen. But it does not always work out that way. Departments create their own reporting systems—data shadow systems—and these solutions can put the business at risk.
What are data shadow systems and how do they originate? Quite often, business groups take matters into their own hands because they are not getting the data they need and business decisions are affected. Perhaps their requests are too low in the IT group’s project queue. Or maybe they simply have not been able to effectively collaborate with IT to create a solution that meets their needs. Worse yet, if they are not a Fortune 500 company, there might not even be any IT resources available to them. Whatever the reason, their solution is to create a departmental data shadow system.
Data shadow systems, also called spreadmarts, are a group of spreadsheets and customized, workgroup databases created by business groups to gather data to enable reporting and analysis. Historically, the workgroup databases used Microsoft Access, and often still do, but more often nowadays these are relational or statistical databases that a business group controls.
Data shadow systems support many business processes such as budgeting, forecasting, pipeline analysis, profitability analysis, or other analytical tasks. They include data from enterprise applications, data warehouses, data marts, external sources, BI applications, and even PDF or printed prebuilt reports. Although data shadow systems are really business-built reporting applications, the business groups probably would not think of them as applications because they generally start as spreadsheets and later grow into data shadow systems.
Collectively, data shadow systems support many business people in an enterprise. Often, they are also using the BI and enterprise data warehousing set up by their IT group, but they do their “real” reporting and analysis from the data shadow systems.
This approach is both good and bad. Initially, it is good for the business group because they are finally getting the information they need to contribute to the health and success of the business. Getting information from a data shadow system is fast—a business analyst has the ability to make urgent requests for information a priority. Also, while techies may scoff at the use of Microsoft Excel and workgroup databases for data analysis, using these familiar tools can make business people comfortable and more independent.
The negative aspects of data shadow systems depend on how they are used and whether they are integrated into the organization’s overall BI architecture—which they almost never are. One of the biggest weaknesses is if the data shadow system is a data silo, as shown in Figure 16.1. In that case it will promote the creation of data that is no longer consistent with other data being used in the enterprise. In order to achieve a high level of quality, data must be viewed from an enterprise and holistic perspective. Data may be correct within each data silo, but the information will not be consistent, relevant, or timely when viewed across the entire enterprise. To make matters worse, each report or analysis interprets the data differently, so even when the numbers start off the same in each silo, the end results will not be consistent.
WHAT IS BAD ABOUT THEM?
Here is a brief list of the problems that data shadow systems cause. All are explained later in this chapter.
• Inconsistent data across the enterprise
• Lost productivity
• Data errors during importing, calculating, and data source changes
• Stale data
• Limited scalability
• Increased risk
• Lack of discipline
• No audit trail
• No documentation
image
FIGURE 16.1 Data shadow systems are BI silos.
Too often, people using data shadow systems feel that as long as the data that their silo is using came from the same source as other silos then everything is OK. However, with the myriad data filters, business rules, data manipulation, and macros applied to each silo, it is almost guaranteed that data will not remain consistent.
Inconsistent, inaccurate data will certainly tarnish the image of a data shadow system for business people. But they will also feel the strain of creating and managing their own departmental systems. Ideally, they should focus on their “real” jobs and spend less time on the care and feeding of the data shadow system. When the people most experienced with using an undocumented shadow system move on to other jobs, those left behind have little knowledge and no documentation of the data shadow system’s inner workings.
Data shadow systems are a fact of life in companies of all sizes. Whereas they do not offer an optimal solution from a technology standpoint, they have many business-oriented benefits that cannot be ignored. Keeping the needs of business people in mind, it is possible to replace or rework these shadow systems with solutions that dovetail with a company’s overall DW/BI architecture.

Are There Data Shadow Systems in Your Organization?

Does your organization have data shadow systems? The answer is probably yes, despite what IT thinks. With that in mind, the first step is to identify them so that you can then scope out the extent of the problem and determine what to do.
WHAT IS GOOD ABOUT THEM?
Here is a brief list of the benefits of data shadow systems. All are explained later in this chapter.
• Contain business knowledge
• Responsive, fast, and flexible
• Fill in IT and tool gaps
• Accessible and inexpensive
• Familiar to business people
• Effective for certain tasks
Clues that business groups are using data shadow systems are that people are finding out that numbers are inconsistent, there is a lot of debate about the differences in these numbers during meetings, and business analysts are spending time gathering and reconciling data. Each group thinks it has the most accurate numbers, and that the other groups’ numbers are off the mark.
Maybe the finance group is using spreadsheets—even dozens or hundreds of them—to create reports to examine business performance. They are comparing actual revenue and expense numbers versus budget and forecast. They are figuring out what is selling, who is selling it, how much it costs to produce and deliver, and how much profit they are making. They could also be using dozens of spreadsheets or workgroup databases to extract data from enterprise resource planning (ERP), enterprise applications, and even the data warehouse to transform that data for use in their spreadsheets.
The marketing, sales, and other organizations could also be using their own workgroup databases and spreadsheets to gather data from across your organization, transform it, report on it, and analyze it. The problem could also be deeper if departments are using their own operational systems, such as a cloud application, without IT’s knowledge and/or support. Collectively, these systems may informally support many independent business units, end users, and business processes across the enterprise.
Despite all the powerful BI tools available, your company is resorting to workgroup databases and spreadsheets for data integration. The biggest issue here is that each group has its own version of the numbers, and each group thinks its numbers are the “right” ones because, from their perspective, they are right. How can these business groups make informed decisions about the business when, from an enterprise perspective, no one has the right numbers?

What Kind of Data Shadow Systems Do You Have?

Business groups create various types of data shadow systems depending on their level of knowledge and the data they need. Rather than taking the stance that you should eliminate or replace all of them, step back and examine each shadow system to determine whether it is appropriate for its task. There are three main types of data shadow systems:
One-off reports (mild)—Business people use spreadsheets to filter and transform data, create graphs, and format them into reports that they present to their management, customers, suppliers, or partners. With this type of data shadow system, people are using data they already have and the power of Excel to present it. There is no business justification—or even time—for IT to get involved.
Ad hoc analysis (moderate)—Business analysts create data shadow systems for performing exploratory, ad hoc analysis for which they do not have a standard report. For instance, they may want to explore how new business conditions might affect product sales or perform what-if scenarios for potential business changes. They use the data shadow system to probe around, not even sure what they are looking for, and they often bring in supplemental data that may not be available in the data warehouse. This exploration can also be time sensitive and urgent.
Full-fledged analytical application (serious or very serious)—Most data shadow systems start out as one-off reports or ad hoc analysis, then morph into full-fledged reporting or analytical applications. It is usually not the goal to create such a system, but after power users create the first one, they are asked by the business to keep producing the report, then asked to enhance it, add more data, and add more reports until, eventually, it becomes an application itself. Business processes, such as budgeting, planning, and forecasting, are often supported by data shadow systems on an ongoing basis. It is common to find these systems supporting business functions in marketing, finance, sales, HR, and operations.
The key characteristics differentiating one-off reports and ad hoc analysis from full-fledged analytical applications are scope and the ongoing nature of the analytical applications. These are listed in Table 16.1.

Table 16.1

Comparison between Types of Data Shadow Systems

One-Off Reports or Ad hoc AnalysisAnalytical Applications
Data sourcesFixed and relies on “authorized” data sourcesExpands as time goes on and uses a variety of data sources including departmental and external sources that may not be used by IT
Data integrationLimited—relies on data sources to have consistent data that can be easily combined in spreadsheetsMay be extensive—depends on the variety of data sources used and their data consistency
Data transformationLimited—typically limited to metrics calculated in reportsMay be extensive—depends on the business needs and complexity of the business processes supported
Reporting and analytical functionalityLeverages spreadsheet capabilities so reporting may be very robust but typically limited to stand-alone reports or analysisLeverages spreadsheet capabilities but typically must build in application functionality due to more complex analytics and/or potentially many interconnected reports

Data Shadow System Triage

Not all data shadow systems are created equally, so handle them differently based on scope, use, and risks. Just as paramedics or emergency room medical personnel have to triage their patients, both business and IT groups need to perform a data shadow system triage to determine where they should expend their limited time and resources, as well as how to maximize business values and ROI.

Mild and Moderate

First, one-off reports and ad hoc reports may actually be good candidates for data shadow systems. Maybe power users and IT could create these reports using BI tools, but if the business people are getting what they want, then it may be best to leave these shadow systems alone, especially if there is a limited number of reports or business people involved. If these reports perform limited or no data integration or data transformations, then the primary risk of data shadows systems, data inconsistency, is avoided altogether. The patient may have mild symptoms, but since none are life threating (no business risk), move onto more pressing matters.

Serious

If, on the other hand, the data shadow system is a full-fledged analytical application with many reports or users but still limited in its data integration and data transformation, then IT may offer to help by augmenting it rather than replacing it. These systems are similar to patient symptoms that are serious, but still not life threatening—you deal with them, but they are still less than the “very serious” category. Potential remedies may include:
• Upgrading to the latest version of the spreadsheet and enabling the self-service BI capabilities offered in that spreadsheet ecosystem.
• Offering more extensive prebuilt data connections to the business applications and data sources being used in the data shadow systems. Maybe offering data connections to your data warehouse or data marts that have already been built for the business groups involved.
• Enabling or identifying spreadsheet integration capabilities in the current BI environment that may be used by the business users instead of expanding the data shadow system even more.

Very Serious

The category of data shadow systems that should command the highest priority—those with very serious symptoms—is those used extensively for data integration and data transformations, creating silos of data inconsistency. These systems have likely become productivity traps, as the analysts supporting them spend ever-increasing amounts of time taking care of the systems, leaving less time for their “real” jobs. The commitment of resources and the time to keep these systems supported and growing indicate their business value. These are the systems that must be either renovated or replaced.
It is important to view each category of data shadow systems as business-built self-service BI applications that have likely been very useful to the business community. IT needs to embrace and support those used for one-off and ad hoc reports, but consider renovating or replacing those used for full-fledged business reporting. This latter category causes more harm than good when they have become data and analytical silos. The remainder of the chapter deals with this category of data shadow systems.
Table 16.2 provides some guidance on the triage process.

Table 16.2

Data Shadow Systems—Triage

System Value and UsageAction
The system has little business value.Eliminate it.
Only a few business users make use of it.Not worth the effort to renovate, just leave it in place.
It is used for one-off reporting or ad hoc, exploratory analysis.Leave it in place.
There is significant business value and/or many business users.Target the data shadow system for renovation or replacement.

The Evolution of Data Shadow Systems in an Organization

When determining the best way to evaluate and possibly replace data shadow systems, it is important to understand how and why they come into existence.

Filling in a Gap

Generally, data shadow systems are created and expand when business people find that the existing reporting applications do not supply them with all the data they need to do their jobs and operate the business. A business must be able to respond and adapt quickly. Business decisions cannot be placed on hold while IT addresses its backlog of requests from the rest of the enterprise for new capabilities, new data feeds, and custom programs. Likewise, business people may be in a rush to obtain data, and do not have the time for IT’s process of collecting requirements, performing impact analysis, designing applications, and creating the BI solutions.
So, when it cannot obtain the information it needs when it needs it, a business group takes matters into its own hands. Management asks the group’s business analysts and power users to gather the data and create custom reports to support the business analysis needed—however they can.
We have tried to keep this book product agnostic and avoid mentioning specific vendors and products. However, despite the growing number of cloud-based and open source spreadsheet alternatives to Microsoft Excel, the reality is that there are hundreds of millions of Microsoft Excel users, making it pervasive and the go-to spreadsheet used in data shadow systems.
Along these same lines, a second tool historically used in data shadow systems is Microsoft Access. Although Access is still prevalent, there are several alternatives:
• Databases—because of many innovations there are various database options that do not require a database analyst (DBA) and can be managed by a business group.
• Statistical tools—as more business people are becoming proficient in statistics and predictive analytics these tools are being used in conjunction with spreadsheets.
• Data discovery and other self-service BI tools—although IT is deploying these tools for BI, they are often being used as the query tool to feed spreadsheets and data shadow systems.
The analyst’s typical process looks something like this:
1. Pull data from a data warehouse, data mart, or enterprise application into a spreadsheet or a local database
2. Manipulate the data a few times with database queries or Excel macros (maybe a dozen times or more)
3. Pull in some other data the business unit uses
4. Put the results into a spreadsheet or another worksheet
5. Crunch some numbers to perform final analysis
6. Maybe create pivot tables or charts
7. Prepare reports in the final worksheets or spreadsheet.
What happens next is how a “simple” spreadsheet or report explodes into a data shadow system. The analyst presents the findings with supporting spreadsheet reports and management or peers ask for more: more data from more sources; more data manipulation and calculations; more business rules; more graphs; more reports. The spreadsheet expands to include dozens upon dozens of worksheets pulling in ever-increasing data, more worksheets integrating and transforming the data, and, finally, many more worksheets used for reports.
It is typical to see either a pool of spreadsheets or one massive spreadsheet with many worksheets. (Sometimes analysts use many separate spreadsheets, wheras others may have one spreadsheet with many worksheets depending on file size and habit—but it is the same thing no matter how they implement it.)
As shown in Figure 16.2, the data shadow system performs three primary functions:
1. Data gathering from many sources including local and enterprise spreadsheets and databases using either (a) queries from spreadsheets, database tools, statistical tools, or BI tools, or (b) data exports from enterprise applications, DW, data marts, external sources, or other spreadsheets.
image
FIGURE 16.2 Data workflow.
2. Data integration and transformation using dozens of worksheets to manipulate and transform the data—filter, merge, aggregate, apply business rules, etc.—to support whatever analysis needs to be performed.
3. Data reporting and analysis using many more worksheets to perform the business analysis and produce the reports.

As They Grow, It Is Harder to Turn Back

As we have chronicled, the initial request for a report grows from one spreadsheet into an ever-expanding data shadow system. Not only does the data shadow system expand in terms of data and reporting, but the number of business people relying on it likely expands dramatically. As soon as one data shadow system sprouts, many others will appear not only in the business group that created the first one but also across the enterprise. It is common for an enterprise to have many data shadow systems that people increasingly rely on for business analysis or examining reports. There is a tipping point, and many enterprises have long since passed this, where data shadow systems are the predominant BI solution used by the business. This may happen even if an enterprise has made significant investments in data warehousing and business intelligence.
With the rise and expansion of data shadow systems, more and more business analysts are caught up in building, expanding, and “feeding” them. This care and feeding is overwhelming and takes time away from the tasks they used to do. The new workload is initially justified because of the business value created for the analyst’s management and peers. In some cases, the data shadow system work is actually part of their initial job description. Then, over time, it becomes clear that it is a problem.
As the data shadow systems become more entrenched, it is harder to turn back and involve the IT group. The power users are reluctant to adopt a new corporate reporting “standard,” which they believe will limit their effectiveness. Change comes hard, especially when it means learning a new toolset and adapting to new definitions for key entities, calculations, or metrics. They are used to using Microsoft Excel, which does some things better than traditional BI tools. Executives perpetuate the problem because they do not want to pay hundreds of thousands of dollars or more to build a robust data infrastructure and deploy enterprise reporting and analysis tools to recreate the reports they already have. It is easier and faster to keep asking the power users to generate the reports.
Then, unfortunately, the business group finds itself basing its “real” decision-making on the information it receives from these localized, silo’d systems. It may seem like a fine, quick fix, but as you will see further in this chapter, it is a situation plagued with problems and risks.

IT Group Response to Data Shadow Systems

The emergence of data shadow systems can leave IT managers scratching their heads, wondering why the business group is creating shadow systems when it already has great DW and BI tools.
Their reactions can include:
Hostility—“Your system is poorly designed and was not needed in the first place. Do not expect us to support it,” and “Why are you clinging to Excel when you have the best-of-breed BI tool available for your use?”
Peaceful coexistence—“We know the shadow system gives you the information you need, so let us figure out how we can all work together so you continue getting your data in a way that is consistent with company IT standards.”
Blissful ignorance—“We have no idea what they are doing with all those spreadsheets and Access databases in Marketing, and we do not want to know,” and “They are using our BI tools, so we do not think it is an issue that they are also using spreadsheets.”
The most likely scenario is blissful ignorance and the least likely is peaceful coexistence.
Unfortunately, the hostility scenario can become adversarial, especially if IT tries to shut the data shadow systems down. The business reaction will be to tell IT they would not have created the shadow system if the DW and BI solution did what is was supposed to do. The reality is that data shadow systems usually have business value, otherwise no one would have created them in the first place or continue to support them. Because they have business value and the business groups are IT’s customers, it is highly unlikely that IT is going to “win” in a confrontation. Actually, everyone loses in the hostile scenario.
Later we will discuss ways to work out a compromise between the groups.

Damages Caused by Data Shadow Systems

Data shadow systems may give business groups what they want, especially in the short run, but it does not mean they are happy. Business people do not want to spend so much time creating, expanding, and then maintaining these systems. Nor should they. They should be spending their time gaining a better understanding of their business, not wrestling with technology.
Data shadow systems may start off for good reason and are quite manageable. But as they change and expand over time, they tend to turn into productivity traps. The problems associated with shadow systems include:
Inconsistent data across the enterprise—each business group uses its own data sources, filters, business rules, data conversions, naming conventions, and calculations to generate reports and analyses based on its view of the business. Each group may view customers, products, services, accounts, and geographic hierarchies differently. Different business units sell the same products with different names, packaging, pricing, and partner channels. The business may operate very differently in different countries or regions. Even when the same data sources are used across data shadow systems with all interpretations noted, it is unlikely that they are consistent. Although there may be valid reasons for these differences at the departmental level, they are a problem at the enterprise level. When each group manages its own data and processes, it is nearly impossible to deliver a consistent, enterprise view of customers, products, sales, and profits.
Lost productivity due to “analyst time sink”—business analysts might spend half their time each month creating, loading, and maintaining data shadow systems. Instead of analyzing data, these high-priced employees act like surrogate information systems professionals, gathering, massaging, and integrating data. Many executives have initiated BI projects simply to offload these time-consuming data management tasks from analysts.
Lost productivity due to reconciliation—another productivity trap is the time wasted arguing whose numbers are correct and performing the inevitable reconciliation analysis that will be needed to “validate” the numbers. This lost time is directly attributable to inconsistent data noted above. This is a more subtle time sink, but one that takes business people away from gaining insights from the data and running the business.
Data error #1 (import)—with analysts getting data into the data shadow systems on their own there is a risk that the wrong data will get into the system. There are three common methods for getting data into systems: (1) manually entering data (primarily for lookup and reference data), (2) extracting data from a database or application and then importing it into the system, and (3) performing a query to import data directly into a system. Although typos are the primary risk when doing (1), processes (2) and (3) are more likely to carry the risk of the wrong data being imported due to applying incorrect filters or business rules. Since most data shadow systems are not documented, even when (2) and (3) are initially correct, business rules or filters are not changed when they should be, resulting in incorrect data.
Data error #2 (calculations)—users embed logic in complex macros and hidden worksheets that are not documented and that few people other than their creator may understand. There are many cases of errors in macros that are not uncovered for long periods of time, resulting in incorrect data. Similar to the problem with filters and business rules changing in the example above, macros are not modified based on business or data changes, resulting in incorrect data. An even more pervasive problem is when one analyst copies another analyst’s spreadsheet without knowing or understanding the macros being used, potentially leading to unreliable data.
Data error #3 (data sources change)—data shadow systems may generate system and data errors when they are linked to data sources, such as enterprise applications or files, that change without notice. IT generally manages changes in enterprise applications by performing a change impact analysis to avoid this risk, but data shadow systems are typically not on their radar and therefore not part of the impact study.
Data error #4 (stale data)—shadow systems may be initially populated with “official” corporate data. However, once data is part of a shadow system, it may no longer reflect updates and changes from the corporate system where it originated.
Limited (or no) scalability—few data shadow systems scale well either because of the limitations of the tools used or because of the skills of the analysts. The tools may not be able to keep up with increasing data volumes, update frequency, or the number of business people using it. Sometimes the issue is simply that the analyst in charge just does not have the time to expand the scale of the data shadow system.
Increased risk—business people may make decisions based on faulty data, establish plans using assumptions based on incorrect analyses, and increase the possibility of fraud and theft of key corporate data assets.
Lack of discipline—because dealing with technology and architectural design are not what business users do best, they cobble data shadow systems together with tools they know but without an overarching design. Each addition or modification becomes more difficult to implement and more costly to maintain, and when data management principles and disciplines are not followed, data consistency and integrity suffer. This is not so much of a problem at the onset when the analyst creates the initial spreadsheet, but develops into an increasing problem as the system expands with an accidental architecture and does not follow best practices.
No audit trail—these systems are often a loose collection of spreadsheets, personal databases, extractions, filters, queries, business rules, and macros, so it is nearly impossible to trace or audit the processes that created them to ensure adequate control and compliance. It is often ironic that a finance group can pass an audit because the IT processes it uses are auditable, but the data shadow systems that they use to make decisions are not, and are ignored in an internal audit.
No documentation—when the analysts who created or are responsible for an undocumented shadow system move on to other jobs, those left behind to support it have limited or no knowledge of how this now critical system works. Many a data shadow system is treated as an authoritative analytical source even though no one knows what data is being used or how it is manipulated. The dangers from this increase as the data shadow system ages.

The Benefits of Data Shadow Systems

Despite these problems, data shadow systems do have their advantages; otherwise the business would not have them. Based on their good points, it is often not realistic to totally eliminate all of them. Some of their advantages include:
Business knowledge—because data shadow systems have been used to operate and manage business groups, they are teeming with the knowledge that drives a business. They already contain the algorithms that transform data from ERP, enterprise applications, and data warehouses, into relevant information people can use to understand their business. Combined with email, they also support the informal workflow of information and they are the informal business processes that drive decision-making for performance management.
Responsive—business people often need to make timely decisions, such as close a deal, develop a new plan, monitor a key process, manage a budget or fulfill a customer requirement, regardless of the state of their existing reporting applications. Data shadow systems can give business people a short-term fix by giving them at least some of the data that they need to make a more informed decision.
Fast and flexible—the senior VP must have a financial report ASAP, but IT has a weeks-long queue. Business cannot be put on hold, competitors cannot be slowed down, and shifting marketplaces cannot be frozen in time to accommodate the time it takes to source and integrate the data into a static data warehouse structure. Often, analysis is an iterative process with one report creating the desire for additional data and modified reports. A data shadow system enables that senior VP to keep asking the analyst to refine the latest report and make the changes rather quickly.
Fills in IT gaps—some organizations may not have an IT staff or a data management infrastructure, or IT is understaffed and cannot support them quickly or inexpensively. In either case, business groups need to build their own analytics solutions.
Fills in tool gaps—the organization’s BI tools may not support the types of complex analysis, forecasting, or modeling that business analysts need to perform, or they may not display data in the format that executives need.
Accessible and inexpensive—because data shadow systems are based on readily available desktop tools, they are cheap and quick to build. Within a day or two, a savvy business analyst can prototype, if not complete, an application that can handle the task at hand.
Familiar—business users want tools that they know and understand. Their jobs are to manage and improve their business, not spend a lot of time learning how to get the most out of new and different software tools. Also, they know what information is in their systems and how to use it for analysis.
Effective—data shadow systems are built for business people by business people. These systems get the job done. On the other hand, with IT-built systems, some of the business requirements may have been lost in translation or new business requirements may have missed the window to be incorporated at all. Sometimes, enterprise standards overwrite departmental standards that were actually of great value. Any data shadow system that did not get the job done would have been scrapped by now.
Despite these benefits, there is a high price to pay for data shadow systems in the long term. Many executives have recognized the dangers of data shadow systems and made significant investments to fix this problem. However, not all have succeeded. In fact, most struggle to deliver a robust data delivery environment that weans business groups off data shadow systems and delivers the consistent, timely information they need.

Moving beyond Data Shadow Systems

You have uncovered data shadow systems in your organization, and you see the problems they cause. Now what? You might wish you could just pull the plug. But moving beyond data shadow systems is a process that requires communication, accountability, and compromise.
ALIGN BUSINESS AND IT
• Communicate and convince the business that data shadow systems need to be fixed.
• Enlist business executive sponsorship (funding) and commitment (resources) so you have what you need to fix the problems and not recreate new data shadow systems every time there is a fire drill.
• Get the business involved, ask questions, and make sure they are part of the renovation team.
• Get IT involved and working in an agile manner, willing to compromise on technology for the sake of business usage and consistent data.

Stopping the Blame Game

Before anything else, the business and IT groups need to have a civilized dialogue. Rather than getting into the blame game, which will thwart any productive effort, both sides need to communicate the good, the bad, and the ugly. The business needs to communicate why it created the data shadow systems, what they see as the benefits, and the problems and risks of maintaining the status quo. IT needs to listen and not judge or be defensive. IT needs to discuss what it sees as the problems and risks associated with keeping the data shadow systems without “attacking” the business. These discussions are almost like couples therapy.
The next step in this process is for both the IT and the business groups to share the blame for the current situation and then move on to solving the problems. See Table 16.3.
If business and IT can share the blame, then it is time to determine how to move forward.

Table 16.3

How Each Group Can Share Responsibility

Business GroupIT Group
Takes responsibility for the fact that inconsistent data across the enterprise raises costs, lowers productivity, and may result in decisions based on erroneous data.Takes responsibility for slow response times and inflexible processes. Learns to develop agile systems that adapt quickly to rapidly changing business conditions and requirements.
Learns to recognize the importance of building sustainable, scalable solutions.Learns about the business group’s goals, how they operate, even the terminology they use.
Does not blame IT for failures that the business caused by underfunding projects and overriding IT decisions, making it impossible to serve the business’s needs.Does not blame the business for creating the shadow system and realizes why they created it. Understands the business value of the shadow system and the data it provides.
Treats IT as a partner.Treats business as a customer.

Business and IT—In It Together

The next hurdle in our couple’s therapy is to determine what both sides will agree on to get things done. This is a significant hurdle because it involves compromise by both sides. If both sides are not willing to compromise then any efforts regarding the data shadow systems are doomed to fail. These failed efforts are very costly in terms of time, expense, and likely a degrading relationship between business and IT.
I have seen too many IT groups propose an all-or-nothing solution—either the business users do it the “right” way or it will not get done. But no matter how great the architecture and BI are, if the solution does not offer the business value or ROI it either will not get approved or even worse it will get approved and built, but then ignored by the business. “My way or the highway” is a silly approach for IT to take because, ultimately, the business groups are its customers. Just as with any customer, you cannot force them to buy or use your product. IT needs to sell the business groups.
If new solutions are not built or used, then the business people return to the expanding data shadow system that neither side really wants. That scenario, repeated time and time again, costs companies millions of dollars and undermines business and IT productivity. In addition, the ever-increasing data silos put the business at risk by enabling decisions made from inconsistent and incomplete data.
As part of a compromise, the business and IT need to explore the following alternatives and honestly assess the outcomes. See Table 16.4.

Table 16.4

Assessing the Choices

ScenarioActionOutcome
Rip & replace (Green field)The IT group architects and develops the “right” solution.Too long and expensive. While the “right” solution is being built, the business needs to make new business decisions, so it builds more data shadow systems.
Stand stillThe business group continues to use and expand the data shadow system.All the same problems persist; nothing has changed. Both IT and business are frustrated, but the business continues to build new data shadow systems.
RenovateTogether, they agree on a more cost-effective and timely solution for data shadow systems that need to be replaced and leave others alone.New solutions better meet the business group’s needs with timely and consistent data. The business does not feel a need to build new data shadow systems.
In the last alternative shown in the table, the groups use a triage approach to determine what data shadow systems should be targeted for renovation or replacement. Once they target them, the renovation team needs to determine what portions of the data shadow systems need to change or be replaced. This is significantly different from the “rip & replace” approach, which assumes everything must go. This is the time to think outside of the box and build systems quickly while IT incrementally builds out a unified data backbone that business users can access when they quickly need to get to data to make decisions.
We will examine options later, but as an example, consider building a business group-specific database, e.g., a data mart, to replace the workgroup databases and replace cobbled-together queries with more robust data integration processes. Shift the data sources to the enterprise data warehouse, if available, or the systems of record when not in the EDW. Also continue to use additional lookup tables and external data that business users incorporated in the existing data shadow system as necessary to support analytics.

Changing the Approach

The problem with data shadow systems is not the technology used to create them. Spreadsheets and other desktop-oriented tools are an important part of any organization’s technology portfolio. Problems arise when individuals use these tools as data integration tools to collect, transform, and house corporate data for decision-making, planning and process integration, and monitoring. Adding insult to injury, the data integration performed is a hodgepodge of processes that are not designed nor documented. When this happens, data silos proliferate, undermining data consistency and heightening business risk. The problem is that spreadsheets are an inappropriate tool for the data integration processes, not necessarily the wrong tool for analysis. Too often the attack from IT on data shadow systems concentrates on the overall use of spreadsheets rather than narrowing the focus to data integration.
The technical remedy for data shadow systems is to manage and store data and logic centrally in a uniform, consistent fashion and then let individuals access this data using their tools of choice. In other words, the presentation layer should be separated from the logic and data. When this is done, business people can still access and manipulate data for reporting and analysis purposes, but they do not create new data or logic for enterprise consumption. The goal is to transform data shadow systems into managed spreadsheets.
This lets IT do what it does best—collect, integrate, and validate data, and implement business rules (data integration)—and lets business analysts do what they do best—analyze data, identify trends, create plans, and recommend decisions (analytics). BI vendors are starting to offer more robust integration between their platforms and Microsoft Office tools. Today, the best integration occurs between Excel and OLAP databases, where users get all the benefits of Excel without compromising data integrity or consistency, since data and logic are stored centrally. Microsoft Excel is most likely the final BI tool used in the analysis food chain. Another BI tool may be included in the environment to pull data from the data mart, but that tool must get the data into the spreadsheet for the business users to complete their analysis.

Changing the Culture

Applying the right mix of technology to address the data shadow system problem is the easy part. The hard part is changing habits, perceptions, behaviors, processes, and systems. People do not change on their own, especially when they have been successful with a certain set of tools and processes for analyzing data and making decisions.
Changing a data shadow system-dependent culture usually requires top executives both to communicate the importance of having unified, consistent, enterprise data and to apply incentives and penalties to drive the right behaviors. The reality is that executives are often the enablers for data shadow systems when they ask their staff to answer some questions or perform some analysis ASAP. The business people bringing back the answers quickly are likely to be rewarded regardless if they are using a data shadow system. Everyone needs to make changes from the top down. Ultimately, change takes time but the right organizational levers can speed up the process.
It helps if you can use some of the agile project methodologies described in Chapter 18. This includes keeping the project team small and making the deliverables quick, letting the business group drive the renovation project, and using a working proof of concept to present the initial deliverables.

Misguided Attempts to Replace Data Shadow Systems

Often, IT’s knee-jerk response to data shadow systems is to replace them. IT views the data shadow systems as technical abominations and therefore figures that the right technical solutions will solve all. IT assumes that the business users will see the technically superior system as better than their data shadow systems and naturally flock to that new solution. This is a harsh response, and is not recommended.
There are two technical directions that IT generally uses to replace data shadow systems, and neither is very successful:
• Migrate to a best-in-class BI tool
• Create a standard set of reports meeting business needs.

Migrating to BI Tool

BI tools have evolved over the years to include reporting, analysis, dashboards, visualizations, and many more capabilities, sometimes turning into power BI product suites. Each new generation of BI tool gets expanded functionality and is heralded as business friendly. In product evaluations these tools fulfill numerous checklist items and receive high praise from the technology oriented. Vendors assume that since these BI tools have so much functionality, business people will drop the use of spreadsheets for them.
But a funny thing happens at many organizations; business people view the comparisons between spreadsheets and best-in-class BI tools differently. Business people are quite familiar with spreadsheets and have likely been using them their entire careers. Just as BI tools have gotten more powerful, so too have spreadsheets. Business people are likely to stay with spreadsheets and their data shadow systems because they provide the information they need to do their jobs. Too often, IT misreads why data shadow systems were created, not because of the reporting interface but rather because the data is not readily available.
Selecting and standardizing on a BI tool is a sensible strategy for an organization, but it should leave room for spreadsheets in the organization’s BI portfolio. If IT wants to change the data shadow system landscape, then they will need to realize that a shiny new BI tool is not going to compel business people to abandon their data shadow systems.

Create a Standard Set of Reports

The second misguided option that IT tries to use to replace data shadow systems is to create or expand a standard set of reports for business people. It gathers the requirements, builds the reports, and then becomes frustrated as business people continue to use and expand their data shadow systems. This approach typically results in failure because of the very nature of the one-off reports and ad hoc analysis that kick off the creation of data shadow systems—business and data requirements are not all known and, in fact, are evolving as the business itself changes. The second limitation of creating a standard set of reports is that the business person likely has limited or no ability to expand or modify these reports as he/she can in a spreadsheet. The business people often see this approach as handcuffing their ability to do their jobs.

Renovating Data Shadow Systems

The key to renovating a data shadow system is to split it into its data integration and analytical processes, and then work to understand each. This is easier said than done, however, as data shadow systems are usually built with an “accidental” architecture. They are not planned; they evolve over time. Their creators use familiar, convenient tools rather than the tool that is appropriate for the task. The systems evolve haphazardly and are not typically documented. See Figure 16.3 for an example of a data shadow system accidental architecture.
image
FIGURE 16.3 Data shadow system example.
To understand the shadow system’s data integration and analytics capabilities requires a laborious reverse-engineering process of examining spreadsheet macros and queries, database queries, workgroup database code, and any other process used to integrate or analyze data.
The reverse-engineering process may be onerous, but it is important because these data shadow systems are where data is truly transformed from data into business information. This is because there is often a gap between what is in an organization’s data warehouse and what the business person needs to do analysis. Reverse-engineering not only is needed for renovation but also is an excellent way for IT to understand what the business really needs in regard to data. If IT fails to close that data gap in their data warehousing environment, then the business will fill that gap with data shadow systems.

Data Integration Considerations

The most significant liabilities of data shadow systems are data inconsistency and data quality. Too often IT thinks the “problem” of data shadow systems is that people are using Microsoft Excel as the front-end BI tool. When you are replacing or renovating a data shadow system your main concern should not be whether you should use Microsoft Excel as a BI front end. Rather, you should focus on using your data-integration tool to replace the ETL cobbled together in these systems. The BI initiative is only as good as the underlying data. That is what you should concentrate on.
The real problem causing inconsistent and error-prone data is that people are using Microsoft Excel, workgroup databases, and whatever tool is available to perform ETL. A typical data shadow system has:
• Between six to three dozen steps (and I have seen systems with hundreds!) of Microsoft Office (Microsoft Access and/or Microsoft Excel) queries or imports gathering data
• A series of steps using the workgroup database or spreadsheet to “integrate” the data
• A series of worksheets to create the tabular reports and charts.
The data used by data shadow systems is entered manually, via imported flat files or using database queries. Rarely are ETL tools used. Once you use reverse-engineering to understand what data-integration process are used, you can determine how best to renovate it.
The typical data-integration alternatives that could be used in data shadows system renovation include:
Data integration suite or extract, load, and transform (ELT) tool—use in complex data-integration cases.
Extract, load, and transform tool—use with less demanding data-integration processes. An ELT tool often comes bundled with database-management systems and typically costs less than data-integration suites tools (but may have less functionality).
Data virtualization—use when the data shadow system pulls small to moderate amounts of data from several sources with moderate transformation, or when there is a need for real-time integration.
BI queries or imports—use when you can leverage more advanced capabilities that have been built into self-service BI tools without having to use the tools above.
When designing data-integration processes, IT groups have a tendency to use the data-integration suite or ETL tool they use for loading their data warehouse. Although this can be a viable option, it sometimes contributes to the data shadow system problem because the cost of using these tools is too high or the backlog to get new data built is so long. If this is the case, then IT should consider alternatives to their standard data-integration suite, such as using ELT tools or other data-integration options.
The resistance that I generally see in these situations is: (1) IT wants to have one enterprise standard and/or (2) IT feels that their standard tool is superior to the alternative. This is a time when IT needs to compromise and consider the big picture. Even though their arguments makes sense intellectually, the reality is if they do not compromise then data shadow systems will continue to be built with a hodgepodge of undocumented integration processes. From the business group’s perspective, the choices are not the enterprise standard ETL tool versus the data-integration alternative. Rather it is a choice between building the data shadow systems with Excel or not getting the data they need. Remember, this might be critical data that is needed to run the business. Compromise may be better for the business and produce more business value. This all-or-nothing scenario is often the single breaking point for many projects. If the project is too costly, it ends up being a lose/lose situation.
Your data-integration suite should have a portfolio of options to replace the ETL processes of your data shadow systems. These options include your standard ETL batch process, pushdown or ELT processes, real-time enterprise information integration, or even data-integration services (implemented as part of your service-oriented architecture strategy).
No matter what data-integration choice you make, it surely will be a significant improvement over stringing together Microsoft Office products to perform ETL.

Analytical Process Considerations

The business is focused on getting its analysis done, not on the tool it is using. The IT group, however, will be frustrated if the business is ignoring the BI tool it selected for them and is using spreadsheets instead. Under these circumstances, IT needs to critically examine if their standard BI tool really meets business needs better than a spreadsheet. The key is to examine this from the business person’s perspective rather than the technologist’s. A business person may feel that the spreadsheet is good enough and it is not worth learning a new tool, even if it is technically superior. IT might not like that answer, but they need to hear it.
IT should consider a quick fix for renovating the data shadow system front end that continues to use spreadsheets, but also leverages the self-service BI capabilities that have been built into these spreadsheets. The key is that the spreadsheets are used for BI, not data integration. Another quick fix would be to enable a BI tool to supplement the spreadsheet or be an optional alternative. This does not force the business person to abandon the spreadsheet, but allows them to explore and move to the BI tool if they desire.
Often business people do not feel the need for all the functionality that is built into the best-of-breed BI tools IT selects for them. Although they look great on paper, the reality is that the BI tool only has a business value and ROI if it is used by the business. IT should evaluate alternatives to their standard BI tool if the business is not willing to use it. There are many user-friendly, cost-effective BI options in the marketplace today. Also consider the latest generation of self-service BI tools that are being adopted and used by business people. These tools might not always win the evaluation bake-off, but they win in business value because they are really being used.
THE RENOVATION APPROACH SHOULD PROVIDE
• Consistent, comprehensive, accurate, and timely data.
• Flexibility to augment or supplement data that is not in your data warehouse.
• Minimal need for business people to gather and integrate data on their own.
• An adaptable architecture that enables rapid application development.
• Easy-to-use, flexible tools for a business user to report on and analyze data (and remember, Microsoft Excel is going to be part of your BI portfolio).
• Ability to publish reports, allow subscriptions, provide alerts, and enable workflows.
• Increased responsiveness and understanding of business needs and processes.
• Easy to use reporting and analytics, balancing responsiveness to business needs and data integrity.

Balanced Priorities

Any BI tool that you build or buy has to balance business/IT priorities, processes, and standards. For many companies that have successfully renovated their data shadow systems, it has taken a couple of attempts to satisfy business needs and get business people to use the solution. Often these companies use a replacement strategy with an application or tool that wins their evaluation bake-off and that IT feels the business will readily embrace. This approach usually receives a lukewarm reception. Enlightened enterprises then take the next step of truly reengineering the solution, separating data-integration and business intelligence processes.
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.218.224.226