Introduction

I really hate data governance. I have been responsible for it as an employee and have supported many efforts as a consultant. I always gave it my all and, at the time, felt it was some of my best work - yet it was always a burdensome experience. There was too much work, it was ill-defined, and it seemed destined for failure.

Let’s consider a river as a metaphor for data and data governance; in its purest state, data is very much like a rushing river with jetties, rapids, and waterfalls. Step in the wrong spot and you will be sucked into whitewater that will take your breath away. Governance was an attempt to control the rushing water, so we created a series of locks and dams. Your data stewards are out there, oar in hand, attempting to navigate the deluge. Their goal is to help you safely interact with your data. But even with all that in place, the sheer volume of data and the high demand for it can make governance seem like a futile endeavor.

Said another way, if governance is a funnel, we have two ways to modify the flow:

  1. Limit the size of the top of the funnel
  2. Expand the size of the bottom of the funnel

In our analogy here, the top of the funnel is the data entering your data governance efforts. Limiting the size of the top of the funnel makes data governance owners feel like they have more control. The bottom of the funnel represents the data your end-users will ultimately have access to; naturally, the bottom is already limited. By design, current data governance practices limit the output of data by requiring seemingly all data to be “governed” or “managed” before end-users can use it. A few years ago, it became commonplace to narrow governance practices to focus on the key attributes in your organization - I implemented this modification in my most recent role as a BI Director. We realized it’s not feasible to expect a few people to fully vet and approve every piece of data in your organization’s data universe. So, rather than opening up the bottom of the funnel, we closed the top and limited the responsibility of the data governance program to the list of approved organizational metrics. This method makes the ratio of data feel more palatable, but it does nothing to improve your users’ access to data, and it often creates pent-up demand.

I have always defined data governance as critical to any analytics work. In my first book, “Healthcare Business Intelligence,” I named it the most important tenet of a successful BI program. Yet, in my experience and in my heart of hearts, I knew very well that what we were doing was wrong. I’d rarely seen traditional data governance practices work end to end - so much of it focused on controlling data rather than using data. Many data governance programs focused too much on prevention and, as a result, made it too difficult to get data to an end-user. The program would then be designated a failure, or the business users would find ways around it, fueling pockets of shadow BI everywhere you look.

If you talk to anyone in just about any role in data, they will tell you that data governance is really important. Yet, if you ask the businesspeople who are using the data, their responses are split. Everyone agrees that you must “govern” data, but to what degree varies dramatically. Governance is one of the keys to successful data management, yet there’s no shared definition - or worse, the definitions are so broad that governance becomes the proverbial “other” bucket of data management. It’s no wonder we can’t get traction.

Demand for data has never been higher. The angst of data people has never been higher. So, what gives? I know from firsthand experience that even well-informed and well-intentioned people can take data and do odd things with it (sum an average? use a pie chart for intricate analysis? use correlation and causation interchangeably?). But here’s the deal: how in the world will these same people make smarter choices, or even get more comfortable asking about the data and what it implies, if we don’t actually give them the data? How did you learn while growing up? It wasn’t by sitting quietly and waiting for a wave of knowledge to suddenly and magically hit you - this is no different. End-users must be allowed to get in there and roll up their sleeves. Right now we have a funnel that is small at both the top and the bottom, frustrated users who are (for right or wrong) taking matters into their own hands to answer their questions with data, and, last but not least, irritated executive sponsors - and that isn’t good!

War and trust

Many data professionals have war stories about the dumb things that people did with data. I have so many of them, I’ve lost count. So, the idea that we could govern “appropriate usage” seemed like a panacea, and I was totally on board - but I would eventually come to a different realization.

I’ve experienced the consequences of end-users making decisions with “bad data”; my prime directive was to ensure that our data was in great shape and only then release it to our users. On one occasion, we scrambled to fix some “bad data,” but it was too late; the trust had been broken. I was at the helm of the data ship and I felt responsible, but in hindsight, the “bad data” wasn’t really to blame. I had made a promise that was impossible to keep - a setup for failure, even then.

You see, there is really no such thing as “clean data,” not when we’re talking about the accuracy of petabytes of data – it’s just not feasible to guarantee its cleanliness. Maybe when we all had a few megabytes in our data warehouses, we could expect that level of confidence, but those days are gone forever. We are now faced with a tsunami of data, more than any human or teams of humans could control, but before we get to that, we need to take a look back at how we got here. It’s critically important, when you consider a change of this magnitude, to first understand the variables that brought us to this moment in time so we can avoid repeating past mistakes.

The history of data governance

I have this scene in my head about the origin of data governance. Imagine a big conference room, large executive leather chairs, and windows framing a metropolis - Mad Men-style. Inside, on an average Tuesday, all of the CXOs and VPs got together for their quarterly sales review. Each held “their” numbers dear, but the numbers were all different. Arguments ensued; papers were strewn about as each executive argued their own rationale. Finally, one of them exclaimed, “we have to govern this thing!”

I’ve always loved doing a little research. I went in search of books, articles, blogs, and people from as far back as possible to understand how we got to where we are today with data governance. There is an alarming amount of misinformation about data governance. A quick Google search will demonstrate some of the challenges of defining data governance and how it works. I sifted out the noise for you and interviewed a number of folks I really trust to give you a brief and accurate history of data governance.


When you Google “data governance” you get some fascinating stuff. First, a definition from SearchDataManagement.com:

“Data governance (DG) is the overall management of the availability, usability, integrity, and security of data used in an enterprise. A sound data governance program includes a governing body or council, a defined set of procedures and a plan to execute those procedures.”

That wasn’t quite satisfactory to me, so I kept looking. Wikipedia (as of May 2019) offers:

“Data governance is a data management concept concerning the capability that enables an organization to ensure that high data quality exists throughout the complete lifecycle of the data. The key focus areas of data governance include availability, usability, consistency, data integrity and data security and includes establishing processes to ensure effective data management throughout the enterprise such as accountability for the adverse effects of poor data quality and ensuring that the data which an enterprise has can be used by the entire organization.”


To begin with, I interviewed Claudia Imhoff. If you have spent any time in data, specifically data warehousing and data modeling, this name will be familiar to you. For those who are not familiar, Claudia is literally one of the founders of what we now term the “data warehouse.” Who better to help us understand the evolution of data governance?

My first question to Claudia was “how did this all start?” The answer: with stewardship. “Data stewardship was primarily a function to provide context to data, look for data quality issues, and be a bridge between those techie people and the non-techie people,” Claudia said. “The role was desperately needed, as it was born from the re-systemization of data as the volumes were growing, even in the late nineties.”

Today, most data governance programs still have data stewards. The role is meant to help bring order to chaos. Generally speaking, stewards are not full-time roles, and there is usually a limited number of people who can fill the role. But their job is a big one: to make sure that all data released is well defined and within appropriate limits (i.e., a range of minimum and maximum values) based on a broadly accepted and well-socialized definition.
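Those “appropriate limits” can be made concrete. Here is a minimal sketch, assuming a hypothetical steward-maintained catalog where each approved metric carries an agreed min/max range (the metric names and ranges below are illustrative, not from any real program):

```python
# Hypothetical steward-approved ranges for a few healthcare metrics.
RANGES = {
    "patient_age": (0, 120),         # years
    "length_of_stay": (0, 365),      # days
    "readmission_rate": (0.0, 1.0),  # proportion
}

def out_of_range(metric: str, values: list[float]) -> list[float]:
    """Return the values that fall outside the steward-approved range."""
    lo, hi = RANGES[metric]
    return [v for v in values if not (lo <= v <= hi)]

# Values that violate the agreed range get routed back to the steward
# for investigation rather than silently published.
flagged = out_of_range("patient_age", [34, 62, -1, 407])  # → [-1, 407]
```

The point of a sketch like this is that the range itself is the socialized definition: the steward’s real work is agreeing on the numbers, not running the check.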

Data stewards were meant to help solidify the squishy. There was an intuitive sense that there were issues, but no one really knew what “bad” meant. Did it mean bad data? Were they concerned about bad decisions? It was all up in the air. The hope was that data stewards could help bring clarity and objectivity to the data analytic work going on throughout the organization. But even back then, the trouble was that there were no clear definitions of what it meant to be successful with stewardship or governance. Despite attempts to tie data governance projects to specific business functions, many of those efforts were one-time improvements. Time and time again, governance and stewardship would start and then sputter out, incapable of proving the value they provided to the organization.

Value and return on investment (ROI) have always been a challenge in the data world. While there may be the proverbial pot of gold under the data rainbow, many times it’s just an illusion. Because data work can be tech-heavy, and the “tech” part of the work is easier to tangibly define, we tend to prematurely invest money in software. But without tightly tying that investment to real, long-term benefits related to our data, we lose out on the positive portion of that calculation. And it’s not just about associating governance with a project that has identified, tangible benefits; it’s about attaching it to improved understanding and better or faster decision-making. You can tie your technology investments to usage and that may help, but as history has shown time and again, it sputters out because the value is short-term.

Larger software projects such as metadata management and master data management round out the “tech-heavy” aspect of data governance. Virtually none of these efforts provide value to your end-users. Data people (myself included) will tell you that metadata, MDM, and policies and procedures are critical to a well-vetted data governance effort. Unfortunately, the people you are there to support, the people who need to use the data, couldn’t care less about a policy and procedure document. If you can’t tie what you are doing right now to a tangible business value, it is time to step back and ask yourself why you are doing it.

Most data governance efforts are still focused on control - literally attempting to make sure that all the data is defined, correct, and of “high quality.” Many programs attempt to ensure that the average business user can fully understand every aspect of the data and won’t accidentally misinterpret it or make erroneous decisions. Neither of which is possible, of course.

The idea that all data can be “correct” is not possible for many reasons. The data is too “messy.” There aren’t enough people in any organization to “clean” it. Average end-users don’t see enough data to know what types of questions to ask. Most data departments don’t have the business context, the time, or the tools to address the core issue: so-called “bad data” is just a symptom of a broken or misaligned method or process that created the data. Our broken data governance processes aren’t optimized to help the organization improve the processes that create the “bad data,” improve data quality, or use the data more effectively.

I’m not saying that we should just stop trying to do the right thing when it comes to data governance or data quality. What I am saying is we have to stop chasing our own tail with unattainable goals; we must accommodate for the possibility of error. We have to consider where it’s best to put our limited resources to ensure that we can squeeze every possible piece of value out of our data. It’s time to challenge some long-held beliefs about data governance, quality, and usage.

Organizational impact of governance

For as long as there have been data governance practices, we’ve had executive sponsors. Over the years, I’ve been lucky enough to work with many executives in this way and there’s one thing they all have in common: they’re executives. Other than that, it’s a crapshoot. Some are so detail-oriented they can’t let go of day-to-day operations. Others are so high-level you wonder what color the sky is in their world. I’ve had amazing executive sponsors, but honestly, I’ve had more than my fair share of terrible ones too. For far too long now, the role of an executive sponsor for governance, or really any data-related function, has been ill-defined. Yet we rely on them for support when those all-important doors close.

Whether your executive has supported a data governance function in the past or not, it’s time to clearly define the role and expectations of the executive sponsor. Data is too important to any organization to have your sponsor flying blind in that executive boardroom (no one likes that). If they are, in fact, going to support the governance function, the first thing that has to change is their level of participation. It’s a rare executive that understands data governance. Most know that governance is needed to get insights out of the data. But this “new” data governance we’re proposing here requires a bit of backbone, a whole lot of patience, and a thorough understanding of the why.

It’s easy to blame the too-busy executive for their failings in executive support. More often, the truth is that we have failed to clearly and consistently communicate the value and challenges associated with data governance in a way the executive can understand. Communication is a two-way street; your executive must be willing to put in a little effort, but you also have to consider your audience when presenting the challenges and opportunities.

Disrupting data governance

Data governance is broken. There is no way to make incremental changes to fix it. At the core of the issue, not just with governance but with all of analytics, is the urgent need to provide commensurate value. Executives have (begrudgingly?) come to terms with the fact that data is an important asset to their organizations, but many of them have been so burned by the current methods and processes associated with data governance that they are apprehensive, and rightly so. The data people in most organizations have tried their best, often under intense scrutiny, to build data assets (warehouses) and processes to get data out to the masses. But these same data people are often met with frustrated end-users, overly critical stakeholders, and, in many cases, peers who are completely checked out or who just don’t have the time or interest in becoming “data literate.”

For a long time, data professionals have used the mantra (and many still do) that “business users just don’t understand.” I know I’ve used it often, and even recently. Data professionals feel it’s their responsibility to protect end-users from the mess, but seeing the mess actually helps our business partners understand the challenge and empathize with why the work takes so long - and why it’s so important. Protecting our business partners from this reality means we have created our own little mess, and it has come back to bite us. Data professionals often unintentionally convince their business partners that they can’t understand our struggles, even though we need them to.

As a result, data teams create data literacy programs to help the average business user understand the data better. I literally had an executive look at me and say, “Isn’t it your job to understand the data and provide insights?” His point was that his business users weren’t hired as “data people” or “analysts,” so why should he expect them to understand and use the data the way my team could?

We have a chasm of our own making, one that, if we are not careful, will kill the data industry as we know it. We are already seeing signs. Many of us watch our hard work get diminished because a business unit went out and bought a new piece of software that provided the insights the team was looking for. Celebrations abound and then, in the proverbial awkward meeting, everyone turns to you with a look that says, “see, it’s not that hard.” It’s happening all the time because every product out there now has a dashboard tool embedded in it. The integration of the data is becoming less and less important because the data they see is good enough for what they need.

The “old” way of data governance is at the center of much of this lack of use. Our command-and-control approach to governance has - by design - shielded our users from seeing how much work it takes to get to a “clean” set of data. Rather than improving the process upstream, data teams are often put in the position of fixing data downstream. A position that any data quality person will tell you is untenable. We have too much data, too much demand, and nowhere near enough resources.

We find ourselves in a time when petabytes of data are created daily; when anyone can easily acquire a software tool with its own database; when answers are demanded in nanoseconds. The parochial idea of having everyone slow down long enough so we can define data and control its usage is nuts.

Between outputs and outcomes

The outputs from the old way of doing data governance were long lists of activities that led nowhere, and the outcome was often missed. The good intentions of more usable data, safe data, or well-defined data got lost in the shuffle of activities that didn’t clearly align with the way the average user thinks about or wants to use data.

Re-framing data governance around the concept of using data is a small but critically important semantic change. The intent of governance was always around “appropriate usage” but the world changed, and our processes didn’t.

After doing a lot of research for this book and talking to a long list of experts, one theme kept coming back, over and over again: trust. I heard it in almost every interview I held. I’d write it down on my whiteboard, then I’d erase it and go on to something else. I think I did that at least ten times before I could no longer ignore it: data governance is about trust. I don’t think that’s a surprise, but what it means is that we have been marching along with a list of outputs for data governance that have almost nothing to do with the one outcome that matters - trust.

I am advocating for what I term “radical democratization of the data.” It’s time to get the data out there. In order to do that, data governance teams will have to re-evaluate what they do and recalibrate toward helping the organization adjust to the concept that there is no such thing as 100% accurate data. Radically democratizing access to data means that we have to trust each other. The data professionals have to recognize that the average end-user is just trying to get their job done. And the average end-user has to acknowledge that the data team can’t conceivably address every data or business nuance, especially without the context.

But, WHY?

In Simon Sinek’s seminal book “Start with Why,” he laments that most people assume they know why, but often they really don’t - or, if they do know, they assume others know too. I think this is what has happened in data governance. We assumed that everyone knew why, and we lost our way. As a result, data governance could never really gain traction, because we defined it as a what, not a why.

Why are we doing data governance? What value does data governance provide the business? If we can’t prove beyond a reasonable doubt that what we are doing is providing value, we should stop doing it. There’s too much other work to do to justify working on things that don’t provide value to the business.

I break the “Why” of data governance into four functions:

Under our new allocation of value, the protection of the data is ascribed only ten percent of importance. Why? Because protection is part of the broader effort of governance more appropriately placed with your security team. It is an effort that requires capabilities well beyond a typical data steward. It also, unfortunately, carries little weight with someone when they want to see data. It’s like insurance; you only realize how much you really need after something bad has happened. We can’t completely dismiss protection, but it should not be the thing that leads. The more important function, without a doubt, is the need to increase the usage of the data assets.

Besides the need to get more people into the data, and the realignment of protection to the InfoSec team, correctness can’t be the goal either. The idea that the data is “correct” was something most traditional data governance efforts rallied around. When you put too much focus on correctness, you lose room for nuance in the data, and you may be introducing a bias that looks like a pattern. Just as a nurse manager and a finance manager define a patient differently, having only one way of defining and controlling your data may very well be impeding insights rather than supporting them. Here’s a great example. Let’s say a steward decides that all reports will only show patient counts during “business days,” and the definition of a business day is Monday through Friday. Unbeknownst to you or the steward, a clinic opens up for urgent-care hours on Saturdays. The reports aren’t modified, and as such, decisions are made from data that isn’t shown. Technically, the data is “correct” based on our own definitions. The goal of correctness implies a sense of all-knowing, which is not scalable in a modern, fast-changing organization.
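To make the business-day pitfall concrete, here is a hedged sketch (the dates and visit counts are hypothetical illustrations) of how a hard-coded Monday-through-Friday definition silently drops the new Saturday visits:

```python
from datetime import date

# Hypothetical visit records: (visit_date, patient_count).
visits = [
    (date(2019, 5, 6), 40),   # Monday
    (date(2019, 5, 10), 35),  # Friday
    (date(2019, 5, 11), 22),  # Saturday - the new urgent-care hours
]

# The steward's definition: business days are Monday (0) through Friday (4).
BUSINESS_DAYS = range(0, 5)

reported = sum(count for d, count in visits if d.weekday() in BUSINESS_DAYS)
actual = sum(count for _, count in visits)

# reported == 75 while actual == 97: the report is "correct" by its own
# definition, yet 22 Saturday visits never reach the decision-makers.
```

Nothing in the filter is a bug; the definition itself went stale when the business changed, which is exactly why “correct by definition” isn’t the same as accurate.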

Being wrong isn’t our problem; expecting to be right is. The truth is, particularly in the data world, correctness is an exercise in futility. When we seek first to be correct, we are saying we value consistency of the answer over accuracy of the answer. That can lead to feeling the need to hide data, change data, or outright ignore data that doesn’t fit our version of correct. I’ve seen each of these scenarios play out in organizations that were absolutely trying their best. It’s a slippery slope that leads to consistent bias, not accuracy. We are formulaically ignoring insights in pursuit of being right. The fact that the data wasn’t entered correctly from the beginning is an insight in and of itself. Instead, we chalk it up to a training problem or a difficult employee and move on. We are right - or are we?

If command and control can’t be the goal and correctness can’t be the goal, what the heck is the goal, you ask? Quite literally... messing up, “failing fast,” or whatever you want to call it. It’s time to rearrange your success metrics around usage, not the count of metrics released (or reports being used). Success should be end-users asking questions about data; success should be a ten-fold increase in your user base. Forget control - train your stewards on how to respond to questions and challenges and what to do when failures occur. Reorganize the concept of stewardship from preventing “mistakes” to responding when users have questions. Your stewards are like first responders for data, on the ground and ready to help.

How this book will help

You’ve gotten this far, so you are probably of a shared mindset - or at least what I’ve said has resonated with you. In the following chapters, we will break down the shift in data governance into four familiar pillars: People, Process, Technology, and Culture. Each pillar gets a chapter outlining the changes needed. In the “People” chapter, we will revisit the job descriptions of the data governance roles, allowing them to be assigned both to a specific function (i.e., finance) and to an “at-large” function. In the “Process” chapter, we will see how data governance can adapt to agile methods and DataOps procedures to not only protect sensitive data but also promote usage. In the “Technology” chapter, we will learn how technology solutions must be a means to an end. In chapter five, “Culture,” we will see how critical the concept of governance becomes when we reconsider the implications of the change. A data quality chapter will fill in details that are critical to implementing data governance in a modern data warehouse. Finally, we will put it all together and see the benefits of the disruption. This final chapter will be a framework for operationalizing the change in your organization, full of checklists and backlogs to get you started toward a radical democratization of your organization’s data.

Nothing is perfect and the more we try to make it perfect the faster we lose ground. Let’s embrace the vulnerabilities; only then will we be able to improve the state of our data.

Before we start

As I was preparing to write this book, I interviewed a lot of people (find the list of interviewees in the acknowledgments section). Some are quoted directly while others provided a deeper contribution to the content, with no specific quotes. I learned so much from these individuals and will be forever grateful to them. One particular discussion prompted a small drawing on my whiteboard; I think it’s important enough to acknowledge before we get too far. Joe Warbington was the Healthcare Analytics Director at a large data visualization company. I’ve known Joe for a few years now and he is a prolific content creator for both data visualization and the healthcare industry. He’s seen a lot of companies and he knows what works and what doesn’t. As he was talking, he referenced how impatient some companies were, and that the relative maturity of an organization in the data industry was an indicator of how much time data governance efforts really take.

A light went on in my head as I drew the axes for a line graph. I labeled the Y-axis “Time” and the X-axis “Maturity.”

Take a minute to consider your organization’s maturity with data. There are formal ways to measure this, of course, and we will cover them later. For now, as you sit there reading, just do a quick gut check: how mature is your organization with data? If you feel like you’re on the low end of the maturity axis, just know that it will take you longer to find your footing. The part that no one wants to talk about is the value of the experience as you build any data governance effort; the effort and focus it takes to become more data-mature is a benefit in and of itself. You learn, grow, hire differently, and talk differently about what data means to the organization. Don’t skip the work, because within the work is value and education. So even if you think you’re “immature,” embrace it and work to improve it. Just remember, it’s going to take some time. Are you ready? Let’s go!
