Chapter 7
Buried in Bad Data

“You will never get businesspeople to leave their spreadsheets behind and adopt your business intelligence tools. You are leading everyone here astray by suggesting that companies should stop using Excel.”

In fact, that's not what I said at all. I look to my left to see who is speaking up during my presentation.

I'm angry and frustrated as I stand in the middle of the U-shaped conference room, looking for reactions from the thirty IT leaders attending my session on becoming a data-driven organization. We're at a SINC conference for IT leaders in Scottsdale, Arizona. It's perfect blue-sky weather outside, and I want to make sure my audience finds value in the time they're spending indoors with me.

So, I repeat myself to him and everyone in the room. I have a lot of patience, and as a leader, you'll often have to repeat the message from different vantage points and hope the reframing helps people understand the "why" behind your perspectives.

“I don't tell anyone to give up spreadsheets,” I say calmly, and continue. “Used within boundaries, they are extremely effective tools, especially when reviewing new, smaller data sources. The problem is that no one tells people the guidelines, and one simple spreadsheet slowly becomes an operational forecasting machine that's updated weekly with formulas that are never validated. Then one day something goes wrong. An error is found. Or the spreadsheet has become such a monster that it's taking too long to process the data. Or some new data is dropped in and the spreadsheet can't easily accommodate it. Or the person running it leaves the company. Then what do you do?

“My point is that there are better tools, and part of our mission as IT leaders is not just buying and deploying them, it's illustrating where to use them and demonstrating best practices for how people should use them.”

I see some nodding heads and continue.

“And whenever there are new tools, there are legacy ways of working that need to be challenged. Why would I want the analyst to continue manually updating their spreadsheet if they can build a more useful dashboard that connects to live data? They'll only get there if they take a step away from what they're doing today. That requires finding examples where new ways of working drive business impact. It means nurturing people in the organization on how to use the new tools, providing them with data best practices and governance. You need to identify which leaders are willing to challenge the status quo and catalyze them. Leaders must confront what tools, practices, and behaviors should be left behind, like using spreadsheets as Swiss Army knives.”

I don't even have to tell the group about the security and privacy concerns of organizations that copy and share data willy-nilly in spreadsheets. IT leaders love soapboxing about how this technology or that poor process (never mind the complete lack of process) creates security issues. It's a license for some of them to say "no" to something and move on. For some IT leaders, those were the good old days, when IT fortressed the status quo by finding a security or compliance concern to block change and progress.

But I rarely go there because I want to move IT leaders away from blurting out a knee-jerk "no" to new technology requests. My goal is to get them to say, "Yes, but." As in, "Yes, you can have access to the data, but I have to mask these personally identifiable information (PII) fields that you probably don't need. Yes, I can give you access to the data source, but you need to secure the data owner's permission first."

I've been giving this same speech throughout my whole career to open the minds of IT leaders. I want them to see what's possible with self‐service data and analytics tools. That way, when they see analysts in their companies with overengineered spreadsheets, they have the courage to challenge sacred cow technologies. I am hoping they shift from the command‐and‐control mindset that steers them to “no,” and challenge them to enable more people in their business to access, implement, and leverage technology and data. I am helping them see the benefits of citizen data science coupled with a proactive data governance program.

Occasionally I'll get pushback from people like this. It's unusual, but this isn't the first or last time I've been challenged during one of my talks or workshops. It's good to be challenged, even when it's coming from another consultant who just wants to be contrarian to steal the spotlight. But in the back of my mind, what I'm really feeling right now is, "Who the fuck is this guy?" and how do I not let him distract the other leaders in the room from broadening their perspectives?

Over the years, I've learned some hard lessons and fought some challenging battles around data practices and tools. Admittedly, I haven't worked with the largest data sets nor the most experienced data scientists. I've only worked with a subset of the latest and greatest big data platforms. I fight the small data fights because that's where businesspeople have the most exposure to data. Win these small battles, and you open the organization to tap into their bigger data opportunities.

The biggest challenges often come from the finance department. Their tools have been antiquated for decades, and they have many years of practice pulling data from the ERP, wrangling it with other data sources, and then making it all presentable. Financial analysts see the world in rows and columns, and they usually don't believe the results unless they've had a hand in programming the formulas themselves.

Much worse is that many financial analysts aren't motivated to change. They see the writing on the wall and are thinking, “If IT automates my financial spreadsheets, then why will they need me around?”

But here's the thing—my primary mission isn't just to automate. Automation is a byproduct of democratizing the data so a larger group of people can access it and self‐serve answers to their own questions. Why should I have to go to a financial analyst to see how the IT department is spending against our budget? It should be a dashboard that I can look at any time I want. I should be able to give my lieutenants access to that data so they make fiscally responsible decisions.

I also want to enable real-time data for decision-making and not settle for stale week- or month-old data or reports. And the data needs to be accurate, with procedures in place to measure and improve data quality. You can't easily move to reliable, real-time data processing when there's a human in the middle muddling data from multiple sources via spreadsheets.

Automation is a means to this end, but it's not the primary mission. If I lead with automation, it'll scare off the very team I'm partnering with, when instead it should liberate them. In most cases, the analysts are freed to spend more time exploring new areas of data once I help automate their data integrations, transformations, deduping, normalizing, blending, joining, cleansing, and mastering. But they won't see that opportunity upfront. At first, all they see is the writing on the wall that Isaac plans to automate their job. I once heard one financial analyst say to his colleague, "He's the executive who goes to India, so watch out."

The truth is that financial models are full of exceptions and workarounds. There's so much bad data buried in corporate systems that analysts are forced to program their spreadsheets to accommodate data quality shortcomings. This in turn is hard to model even with all the right data integration, data quality, and master data management technologies. The financial analysts will bury your agile data team with all this complexity, and in some cases, they'll do it with a smile. When they get a working prototype, they'll easily find the use cases where the model or automations don't work or the data visualizations aren't yielding the correct results.

If you find yourself here, you're in the death spiral of subject matter expert challenges. You might have three swings of the bat to address issues and make improvements. But that's it. Remember, the finance department reports to the CFO, and they're not that interested in seeing agile improvements. To them, iterations are wasteful—at least until you get them to adopt an agile mindset. They're thinking to themselves, “Why didn't they capture the requirements and implement the technology correctly the first time?”

As one of my best mentors told me repetitively: “You can't push a rope uphill.”

But sometimes, I must learn from my own mistakes. Let me explain.

Avoid Bulldozing Institutional Hills as Your First Transformation

It's the end of the company's fiscal year when I attend its advisory meeting. It's the first time I meet the group, and I sense the enthusiasm in the room around my arrival. The executive team wants better technology and needs to launch new products, and they hire me to make recommendations and lead aspects of the implementation.

The meeting isn't anything special, and the group is happy with its financial results. Great news, I think to myself, because many of my most successful roles involve both digital transformations and turnarounds. They're transformations in that the organization needs to rebuild its customer experiences, products, services, and operations to compete in a digital world. But they're also turnarounds because those organizations are already losing market share and revenues are declining. When a transformation and turnaround are happening at the same time, it's a double whammy—sort of like running a marathon with a painful, injured leg, hoping to complete the race without risking further damage. It becomes a triple whammy when the culture is slow, full of detractors, and seeks perfection over experimentation. When I assess companies and their leadership teams, I'm sniffing out whether they need a digital transformation, a turnaround, and what type of cultural transformation.

They often need all three, but for now, I'm just happy to see that I may have some financial runway for the transformation here.

Several weeks later, when I return to the office, the executives' enthusiasm has turned to somberness. It turns out their forecasts were wrong, and it wasn't a good year after all.

“Wait, what?” is all I can think to myself. It's not unusual to have inaccurate forecasts in the first, second, and sometimes even the third quarter. But by the end of the fiscal year, you expect most businesses to know where things stand within a couple of points depending on how last‐minute, end‐of‐year sales come in. But this forecast misses by a lot more, and this team needs to dive into the root causes.

I sense an opportunity to fix a data and a business process problem. I step up to help and figure that, at minimum, I'll get a better view of how the organization operates.

It doesn't take long to get a high-level view of the problem. Forecasting is always a messy cocktail that requires moving data between customer relationship management (CRM) systems, ERPs, and other operational systems, and that's what I find here. The sales team isn't using the CRM with any discipline—aka governance—so leadership can't make accurate forecasts of the current state of the pipeline. The financial quote-to-cash processes aren't uniform, and since this is a services company, there's an added element of when and how much to invoice compared to what salespeople record in the CRM. None of the data are centralized or integrated, unless you call a financial analyst's spreadsheet a reasonable data warehousing solution. Most leaders wouldn't, but they rarely dive into how data goes from workflow to system to forecast. You can end up with bad data, misguided insights, and inaccurate forecasts from just one bad formula, one copy-and-paste error, or some other undocumented assumption.

The problem here, and with many companies that I advise, is that this is far more than a forecasting and data issue. It's an opportunity to insert more rigor in their business processes, which requires executing a high‐level formula many CIOs and chief data officers (CDOs) know. The components of this formula include creating sales process disciplines that are implemented in the CRM. It requires streamlining the order‐to‐cash processes and automating invoicing. It means using ETLs to move data between systems, centralizing all this data in a warehouse, and then adding a self‐service business intelligence tool so that the sales, finance, marketing, and other teams can generate reports useful for their decision‐making needs.
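To make that formula a bit more concrete, here's a minimal sketch in Python of the extract-blend-load step: weight the CRM pipeline consistently, blend it with ERP invoicing, and land one table a self-service BI tool can read. The account data, column names, stage weights, and the SQLite file standing in for a warehouse are all illustrative assumptions, not any client's actual stack.

```python
import sqlite3
import pandas as pd

# Stand-ins for nightly CRM and ERP exports; the columns and values are hypothetical.
pipeline = pd.DataFrame({
    "account_id": [101, 102, 103],
    "stage": ["proposal", "verbal", "prospect"],
    "amount": [120_000, 80_000, 45_000],
})
invoices = pd.DataFrame({
    "account_id": [101, 104],
    "invoiced_amount": [60_000, 25_000],
})

# Transform: apply one agreed stage weighting instead of each analyst's private formula.
stage_weights = {"prospect": 0.1, "proposal": 0.5, "verbal": 0.8, "closed_won": 1.0}
pipeline["weighted_amount"] = pipeline["amount"] * pipeline["stage"].map(stage_weights).fillna(0)

# Blend: what sales says is coming versus what finance has actually invoiced, by account.
forecast = (
    pipeline.groupby("account_id", as_index=False)["weighted_amount"].sum()
    .merge(invoices, on="account_id", how="outer")
    .fillna(0)
)

# Load: land the blended table where the BI tool points (SQLite as a stand-in warehouse).
with sqlite3.connect("warehouse.db") as conn:
    forecast.to_sql("sales_forecast", conn, if_exists="replace", index=False)
```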

Piece of cake, right? It is until you add in all the human elements of changing the sales process, the financial team's tools, and executive decision-making methods all in one shot. These problems are data issues wrapped around several business process issues. Solving them demands that leaders agree on a way of working and bring structure to business processes. It requires people across multiple departments, roles, and levels of seniority to shed outdated technologies, redefine manual processes, and replace them with citizen data science and data governance best practices.

But the hardest element of data-driven transformations is realigning executives who don't recognize that letting anyone do anything they want, using whatever tools they know, creates risks, inefficiencies, frustration, and sometimes chaos. In many situations, I am stepping into a sales-driven process connected to hearty leadership bonuses. These processes often encourage selling anything to anyone and then hand off to ad hoc fulfillment processes.

This is a big‐boulder problem and one that requires top‐down buy‐in before instituting bottom‐up changes. If the CEO, head of sales, and the CFO aren't on board, then you're pushing a rope uphill, and your data‐driven transformation needs to start elsewhere.

I've made many mistakes trying to move this boulder. When leaders aren't truly on board, instead of guiding a transformation, you're actually pushing a rope called “change” uphill. You don't want to challenge the status quo without leadership support when the big sharks are in the water hunting for next year's bonus.

So, let me give you a better place to start a data‐driven transformation. It's called the marketing department.

How to Find Early Data‐Driven Partners in the Marketing Department

I love the marketing department because its objectives are full of opportunities. Marketing teams are already versed in experimentation practices, and they most often have a test-new-things mindset. Their job is to try many experiments, sunset the ones that aren't working, and scale the ones that generate the desired results. There's usually some acceptance that not all their experiments will yield fruitful results.

Guess how they make these decisions? Yes, marketing is a creative process and requires experience and intuition into customer buying behaviors, but guess what grounds marketers in making faster decisions or delivering personalized experiences? Bingo, give that leader a Starbucks card. It's data and analytics. Only it's not that simple.

Marketers use data and analytics in many facets of their work, including when deciding what market segments to target, which campaigns to run, which messages to use, and which keywords to buy. They're often eager to collaborate with a technology partner willing to say “yes and” to their ideas and requests. They also need support because there's a lot of noise out there.

As of this writing, there are over 8,000 different tools that a marketing team can use to automate campaigns and analyze results. Scott Brinker, known as the godfather of marketing technology, publishes a landscape supergraphic1 of all the various tools yearly, and the count has grown from 150 back in 2011.

With all of these tools at the marketer's disposal, your CMO can become mired in choices. Her teams are trying to figure out what tools to select and how to get the most from them. Your marketing colleagues must decide what experiments to run, how to interpret the results, and how to join the data from multiple technologies to create a 360‐degree view of customers and prospects. Along the way, they must also dedupe data, cleanse addresses, scrub emails, and figure out company hierarchies.

Newsflash. The marketing team is not doing all that in Salesforce or any other CRM, no matter how well the salesperson claims their CRM is a one‐stop‐shop for all things marketing and sales.

Whenever I deliver my data and analytics keynote at IT conferences, I ask attendees to raise their hands if they work with their marketing departments on data, analytics, and machine learning. It's only in the last year that the number of raised hands has reached approximately 10 percent of the people in the room. It used to be only a few slowly raising hands.

Marketing is ripe for collaboration with the technology, digital, and data functions. Marketing needs technology help. They need a ton of data help. They have a budget to spend, and, most importantly, their objective is to grow business, not just find ways to cut costs.

When you're looking for a strategic business partner to experiment and collaborate with, try identifying a team in the marketing department that could benefit from your technology expertise. Just make sure that you've done your business homework and are approaching the conversation already having a basic understanding of marketing's goals and priorities. Ultimately, you can use these collaborations to build champions for the transformative work your team is taking on.

I'm about a year into my job as CIO when Alice comes to me with her small data problem. She's had roles in corporate reporting, and our CMO Eleanor hires her to make sense of all the different marketing campaigns they are running. Alice comes to my office, and my son's roadmap is still on the whiteboard.

“We're buying keywords across three platforms and placing ads in several magazines,” she explains. “We run surveys and award the top businesses, which gives us a powerful lens over who may be prospects for our products. But a lot of our work includes buying marketing lists and providing leads to the inside sales team, only they push back when they miss their quotas and claim we've given them unqualified opportunities.”

I try to picture what this well‐oiled machine looks like and ask Alice if she'll share with me how this works today.

“Can I bring in Donna to show you?” she asks.

Five minutes later, Alice returns with Donna, an early-career marketing analyst, who opens her laptop and starts showing me how she's doing her work today. "I get lists from three providers and load them into three sheets in this Excel workbook. I then have to merge them into one, but that's not easy because all we have are names and email addresses with spelling errors and other idiosyncrasies."

Donna shows me a gargantuan formula that helps merge the list and walks me through several of its components. I am impressed because she has an organized problem‐solving approach like a coder, but she's a marketer, and I wonder if she's had any computer science training. I ask her to tell me about her background, and I learn she studied anthropology in college and interned with our company the previous summer. This is her first job out of college.

Donna goes on to show me the websites where she downloads data as well as the data she pulls in from our CRM, web analytics, user registrations, and her other data wrangling efforts. It's then that I notice her spreadsheet has about thirty worksheet tabs at the bottom.

Holy cow, this anthropology major really has a developer's mindset. At that moment, I recognize the paradigm shift that happens when IT offers governed self-service business intelligence tools and practices throughout the organization. It can spur citizen data science within our walls. Donna may not be a trained data scientist or software developer. However, she's a digital native who grew up immersed in technology and is not intimidated to learn new tools independently. While she has the skills to work in IT, seeing her work in marketing is actually more powerful because she can apply her data skills to the department's daily challenges and transform how it operates from the inside. She's on her way to becoming a Digital Trailblazer.

I ask Donna and Alice to partner with me on a data-driven journey. Their eyes light up when I show them the Tableau dashboards we are working on for customers. They see the utility and the ease of use, but more importantly, they buy into a different way of working. Instead of Donna becoming a one-person data hub for the marketing department, they will empower all their colleagues by creating a center of excellence focused on becoming data driven. Instead of creating manual work in spreadsheets, they can prioritize questions, develop dashboards, and roll out tools to their colleagues. I agree to mentor them and partner them with a data warehousing consultant who will help join and cleanse the data and develop a data warehouse.
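For illustration only, here's a rough sketch of the merge-and-dedupe logic that Donna's gargantuan formula performed, written as a small Python script with made-up provider lists and column names; a real pipeline would layer on fuzzy matching, address cleansing, and governance rules.

```python
import pandas as pd

# Three made-up provider lists standing in for the purchased files Donna merges by hand.
provider_a = pd.DataFrame({"name": ["Jane Doe ", "Raj Patel"], "email": ["Jane.Doe@Example.com", "raj@acme.io"]})
provider_b = pd.DataFrame({"name": ["jane doe", "Li Wei"], "email": ["jane.doe@example.com", "li.wei@corp.com"]})
provider_c = pd.DataFrame({"name": ["Raj  Patel", "Ana Souza"], "email": ["raj@acme.io", "ana@startup.co"]})

leads = pd.concat([provider_a, provider_b, provider_c], ignore_index=True)

# Normalize the matching fields so "Jane Doe " and "jane doe" collapse into one record.
leads["email"] = leads["email"].str.strip().str.lower()
leads["name"] = leads["name"].str.strip().str.title()

# Dedupe on email; keep the first record and route the duplicates to a data steward for review.
duplicates = leads[leads.duplicated(subset="email", keep="first")]
clean = leads.drop_duplicates(subset="email", keep="first")

print(clean)       # the merged list handed to inside sales
print(duplicates)  # what a spreadsheet formula might silently collapse or miss
```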

Alice thanks me profusely as she leaves my office. She wasn't expecting this much support from the CIO!

At that point, I'm ready to approach our CMO. I now better understand marketing's goals, have seen their challenges firsthand, and I know how I will help their cause. I must talk to Eleanor now because I need her to buy into the vision and approach.

Democratizing Data Exposes Data Quality Issues and Backbones Transformation

Identifying citizen data scientists, deploying self‐service data visualization tools, and enabling data access is only part of the equation in transforming to a data‐driven organization. You hope that empowering people with data and analytics leads to operational and culture changes, but this often requires leaders to spark the transformation. Then, democratizing data exposes many underlying data quality issues, and transformation leaders must leverage this feedback to drive proactive data governance programs.

So let me share with you three stories that touch on the challenges around transforming behaviors and addressing data quality.

The year is 2008, and I'm working at Businessweek magazine. Businessweek.com represents a growing percentage of the overall revenue, but the majority comes from the magazine's print ads. We have no idea how long we can continue selling expensive print ads, which can cost tens to hundreds of thousands of dollars. From my experience working with newspapers, the answer is not long enough. We need to grow website and mobile traffic quickly and then figure out how to grow digital revenue from these digital channels.

The problem is that to get more traffic, we also need to create more content. People only go to a handful of websites regularly, and the rest of the traffic comes from people searching Google, finding interesting articles in their Facebook news stream, or clicking links from other articles. Having more content optimized for search is like having more fishing lures in different areas of the lake. Hopefully, those lures catch the attention of people with diverse interests, and we're able to reel them into the website.

Once they land on the article, we have the second challenge of helping them click on a second article.

To make all this happen, we must change how the editorial team sees their jobs. Editors have almost full autonomy in the magazine over which topics to pursue, whether an article is newsworthy, and how best to illustrate it.

Surprising as it may sound, content for the website works in similar ways even though the audience and medium are completely different. As CIO, I support the content management systems (CMS) where writers submit articles and a list of related article links into a publishing workflow. Once approved for publishing, the section editor decides where and how to represent the piece on their channel page. We have channel pages for technology, small business, markets, and other topics.

If we succeed in getting editorial to produce more content, we will have to get them to change how they're spending their time. Jerry, the head of digital strategy who joined me in the executive presentations about our architectures, understands the problem right away. He asks me how we can use technology to replace some of the manual steps editors perform today when updating website content.

Brian heads the digital editorial team. He's open to new ways of doing things if he believes in the change, but to get others on board, he needs a solid story to sell to his staff. Telling editorial to do something different from their own beliefs is incredibly challenging. They're wickedly smart, dedicated, opinionated, and sometimes stubborn people. At Businessweek, I learn to admire how they come up with new angles to pursue stories, their process for finding sources, and their integrity to tell the story accurately.

Jerry agrees with the editorial team's charter, but that doesn't always mean the editors know best what readers want. Yes, they've mastered representing the reader's interests in the magazine, but that publishing cycle is relatively slow, even for a weekly magazine. People either subscribed or they didn't. They bought the magazine from the newsstand or they didn't. It's difficult to capture whether a print article received due attention because there's no real way to measure what people read in print.

But that's totally different on our website, where we can use web analytics to see what readers are engaging with. By generating interest in one article, we can drive ongoing readership. To do this, we need to publish more articles online that meet our standards for high integrity while luring a wide swath of readers into a subject area.

Jerry realizes he'll need data to sway editorial minds. There's no way he can go to Brian, who leads digital content, with just ideas on adjusting long‐standing editorial policies and beliefs.

So Jerry organizes an experiment with Brian's permission. It's a simple A/B test: sometimes readers see related content selected by the article's author or the editor. Other times, we use an algorithm to choose the related article links based on keywords, reader interest, and analytics on what people are clicking.
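To give a flavor of the algorithmic side of that test, here's a simplified sketch that picks related links by keyword similarity. The sample articles are invented, and the production system blended click analytics and other signals that aren't shown here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Made-up articles standing in for the site's content.
articles = [
    {"id": 1, "text": "Small business lending tightens as rates rise"},
    {"id": 2, "text": "Markets rally on tech earnings and chip demand"},
    {"id": 3, "text": "How small businesses navigate higher borrowing costs"},
    {"id": 4, "text": "Chipmakers expand capacity amid surging demand"},
]

# Score every article against every other article by keyword overlap.
vectorizer = TfidfVectorizer(stop_words="english")
matrix = vectorizer.fit_transform(a["text"] for a in articles)
similarity = cosine_similarity(matrix)

def related_links(article_index: int, top_n: int = 2) -> list[int]:
    """Return the ids of the most similar articles, excluding the article itself."""
    ranked = similarity[article_index].argsort()[::-1]
    picks = [i for i in ranked if i != article_index][:top_n]
    return [articles[i]["id"] for i in picks]

print(related_links(0))  # the other small-business piece should rank first
```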

Sometime later, we gather to review the results of this experiment. To this day, the image of Brian's reaction remains burned in my memory. He's looking at the results on paper that show the machine outperforms editorial. He stares at it for a while, and I can tell he's thinking through all the implications of machines challenging their hard-earned institutional knowledge. It's a scene right out of the movie Moneyball, and he is in dismay, but not in denial.

But Jerry's not done sharing the findings. The second insight comes from analyzing the channel pages that the editorial team sweats over daily. Who gets the lead spot? What articles, and in what order? They have the web analytics metrics from the previous days, weeks, and months to see what readers click on. They know which stories demonstrate editorial excellence better than others. They also know they need to share the spotlight between different writers.

But there's one key metric they aren't looking at regularly. It turns out that a highly significant percentage of the traffic to these pages isn't coming from readers. It's coming from Google and other search bots indexing the site. The data show us that most people arrive from search, land on an article, and, hopefully, find another article to read. That's why related articles are critically important, and even a small percentage improvement can drive increased readership. Other readers come to the homepage, hopefully see an article of interest, and go directly to the page to read it.

The number of people affiliating themselves with a channel and going to these pages is quite small. Meaninglessly small. It's important to have these pages for the bots, but not for people. The implications are that the time the editorial staff spends in the content management system ordering and presenting content on these web pages is, well, not the best use of their time.

It's a pivotal meeting in how we change Businessweek.com. We bring in user experience designers, heavily leverage web analytics, interview subscribers, and use agile methods to drive significant changes to the site. By the time we finish the more important parts of the upgrades, there's almost no manual work to manage content on the site. Articles are entered and approved. A small news team decides which articles are newsworthy and slots them in for the home page. The content areas of Businessweek.com are automated with rules and machine learning that we develop with the editorial and sales leaders.

Several years later, I am in a situation where a group absolutely needs new data technologies to scale their product. This story is about the products we developed with Charles, the product manager whose dashboards I challenged in Chapter 4, which ultimately led to the launch of several analytics products. The story doesn't end there because the products' successes required us to reengineer their workflows and data platforms. Their version of duct tape architectures includes an early version of Oracle keyword searching, a Microsoft Access database, and a bunch of manual steps to connect the two. The database stores these long, multi‐line queries, and Oracle executes the search as a nightly batch job.

Charles knows this jalopy is well beyond its time and usefulness. Oracle Text was developed generations before the days of Google, and the query dialect is archaic. What's worse is that these queries are searching multi-page documents, and there's significant, complex logic in them to find word proximities.

We are loading 70,000 document folders yearly, with each folder storing up to dozens of documents. Our goal is to search for 30,000 entities, about three times more than what we are doing in the legacy system. To do this, we need higher-grade tools and capabilities so that our subject matter experts can hire others to grow and maintain the dictionaries and taxonomies.

Here's the thing that you probably won't realize until you take on one of these projects and believe you see the light at the end of the tunnel. I find there are two types of stakeholders when working on upgrading data technologies.

The first group just assumes the upgrade works better than its predecessor. They don't get involved in reviewing the output of the new system or comparing it against the old one. They don't want to test the new system before it goes live to make sure it works as expected. When you ask for their help, they walk away from the responsibility.

Some will call it a QA responsibility, and there's truth to that assertion. QA should be validating data pipelines and performing A/B testing wherever it's possible. But there are some areas that QA can't be expected to reasonably test, especially with natural language processing tools applied to industry-specific documentation. For example, let's say we compare the results of multiple tools that scan an architect's building specification and they produce different results. How is QA supposed to know which one is the correct specification for a building exterior without knowing how design architects write specifications? They would have to build up as much subject matter expertise as our business sponsors to get the answer right. Often, that's not feasible given time constraints.

So a lack of assistance is a problem because it's hard to know when the new system is sufficiently accurate.

But trying to compare accuracy also has its pitfalls, and this second group demands proof that the new system outperforms its predecessor. That's not easy to prove when the results are subjective or when each algorithm has its areas of strengths and weaknesses.

In this case, our product owner is keenly aware of accuracy because the product and business model depend on it. If we say the building exterior is red, it better be red, because the accuracy of this information is the business value of the product offering. He defines a sample set of tagged documents and puts them through the legacy system as a control. We then put the sample through the new system, in this case a machine learning algorithm, and perform a comparison.

When working with machine learning algorithms that parse and interpret documents, measuring accuracy this way is not a trivial undertaking. When the algorithm is wrong, is it because of the data quality, the choice of algorithm, the algorithm's configuration, the features selected, or a human error? When you tweak the algorithm and get a new output, now you have two comparisons to do: one against the original baseline and possibly a second against the previous run. Is it better? Where is it worse? How do you explain why it's working better, and under what conditions is it worse?
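Here's a stripped-down sketch of that comparison workflow, with made-up document ids and tags: score each run against the expert-tagged control set, then diff two runs to see which documents improved and which regressed. The real comparisons involved far richer outputs than a single tag per document.

```python
def accuracy(predictions: dict, control: dict) -> float:
    """Fraction of control documents where a run's tag matches the expert tag."""
    matches = sum(1 for doc_id, tag in control.items() if predictions.get(doc_id) == tag)
    return matches / len(control)

def diff_runs(run_a: dict, run_b: dict, control: dict) -> dict:
    """Documents run_b newly gets right (improved) or newly gets wrong (regressed) versus run_a."""
    improved = [d for d in control if run_a.get(d) != control[d] and run_b.get(d) == control[d]]
    regressed = [d for d in control if run_a.get(d) == control[d] and run_b.get(d) != control[d]]
    return {"improved": improved, "regressed": regressed}

# Hypothetical expert-tagged sample, legacy output, and a tweaked algorithm's output.
control = {"doc1": "red", "doc2": "brick", "doc3": "glass"}
baseline = {"doc1": "red", "doc2": "stone", "doc3": "glass"}
new_run = {"doc1": "red", "doc2": "brick", "doc3": "stone"}

print(accuracy(baseline, control), accuracy(new_run, control))
print(diff_runs(baseline, new_run, control))  # same accuracy, different failure modes
```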

You might think this issue only comes up with machine learning algorithms, but finding and agreeing on algorithm accuracy is a much broader issue.

Years later, I'm working with the nonprofit Charity Navigator (charitynavigator.org) on their rating systems. They rate charities based on their financials, accountability, transparency, and impact. Charity Navigator started as an industry watchdog, but over the last several years, they've become a go-to source for finding reputable charities in causes that might interest you. Are you concerned about the environment, people's health, the arts, or animals? Are you looking to assist people affected by COVID-19 or a natural disaster? Charity Navigator helps you identify trustworthy charities that match your goals. President Obama2 and Secretary Clinton3 have tweeted Charity Navigator's blog posts during times of need. The Gates Foundation is a donor.4

When I join, their challenge is expanding the scope of their ratings: more charities, more data sources, and a wider breadth of ratings, including rating a charity's impact on its constituents.

The problem is they are using a system that can't scale to do any of those things. Moreover, just replacing and upgrading the system with better technology isn't good enough. To scale and handle the rating of more charities means developing a process that requires less manual review and input from the rating analysts. And to enable new and evolving ratings, it needs data lineage and data quality capabilities. So when the analysts tweak or add to the rating methodology, which charity ratings are impacted? And how do you define whether the change is a good one?

Loads of other data science questions come up during this journey. For example, Charity Navigator has criteria to determine a charity's eligibility for rating through its methodology. Some criteria evaluate whether the charity is large enough and has been in business long enough to be rated fairly. There are hard and weak dependencies between the rating calculations and eligibility rules, and the data science on their methodologies requires ongoing improvements. So, how and where should Charity Navigator expand its eligibility criteria and rate more charities?

And how do you drive transformation at a $4 million charity organization5 with a small team? You can't just throw advanced big data tools like Apache Spark at them and expect them to have the data science expertise to select and operationalize optimal algorithms.

As intriguing as all these questions are, the basic ones hold us back for months. We build a new system that processes the current ratings. New data comes in, it goes through an extract, transform, load (ETL) platform, and out comes a rating. In this new process, very few places require a person to review the rating. One place that does require review is the cause and type of charity, and we use a basic natural language processing algorithm to help with this assignment. It works well for some cause types, like animal charities, but is less accurate in others, like human services, so we ask an analyst to finalize the assignments.
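As an illustration of the kind of basic classification involved, here's a toy sketch that assigns a cause type from a mission statement and surfaces a confidence score so low-confidence cases route to an analyst. The training examples are invented and this is not Charity Navigator's actual methodology or model.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Made-up mission statements with hand-labeled cause types.
mission_statements = [
    "We rescue and shelter abandoned dogs and cats",
    "Protecting wildlife habitats and endangered species",
    "Providing meals and job training to families in need",
    "Emergency housing and counseling for people in crisis",
]
cause_types = ["animals", "animals", "human services", "human services"]

# A simple bag-of-words classifier stands in for the "basic NLP algorithm."
model = make_pipeline(TfidfVectorizer(stop_words="english"), MultinomialNB())
model.fit(mission_statements, cause_types)

# Predict a cause type and surface the probability; low confidence goes to an analyst.
statement = ["Spay and neuter clinics for stray animals"]
predicted = model.predict(statement)[0]
confidence = model.predict_proba(statement).max()
print(predicted, round(confidence, 2))
```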

But our progress is slowed by even more basic quality issues. The calculations are documented on their website,6 and they are relatively straightforward to implement. That is, until you get into boundary conditions, rounding differences, and the sequencing of the business logic.

We run some comparisons between our new process and the legacy one. For example, the legacy system might calculate that a charity has a 2.31 percent growth, and the new system comes out with 2.37 percent. In aggregate, 85 percent of charities come out with near‐identical scores, but 15 percent are off by as much as 10 percent. Are these defects worth pursuing or insignificant rounding errors?

That's not easy to answer. What looks like a rounding error can influence a donor's decision on whether and how much to contribute.
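Here's a simplified sketch of the comparison we keep running, with hypothetical charities and a tolerance the team would have to agree on; anything outside the tolerance gets flagged for an analyst instead of being waved off as rounding noise.

```python
# Hypothetical growth calculations from the legacy system and the rebuilt pipeline (percent).
legacy = {"charity_a": 2.31, "charity_b": 14.80, "charity_c": -1.05}
rebuilt = {"charity_a": 2.37, "charity_b": 14.81, "charity_c": -1.16}

TOLERANCE = 0.05  # agreed threshold for calling two results "near-identical"

# Flag every charity whose new value drifts beyond the tolerance for analyst review.
flagged = []
for charity, old_value in legacy.items():
    new_value = rebuilt[charity]
    if abs(new_value - old_value) > TOLERANCE:
        flagged.append((charity, old_value, new_value))

match_rate = 1 - len(flagged) / len(legacy)
print(f"{match_rate:.0%} within tolerance; review: {flagged}")
```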

But the more important challenge is gaining analyst confidence in the new system and way of working. This isn't just replacing an old system with a new one. It's a transformation in their entire operating model, and that takes time for everyone to learn new tools, challenge results, adjust to new processes, and grow to new responsibilities.

Today, Charity Navigator's new ways of working, systems, and advanced methodologies help them rate over 195,000 charities with their new Encompass Ratings7 that include beacons on impact and leadership. Now that's significant progress and a worthy journey.

Sometimes it is worth moving boulders, but it requires teamwork, and it's never easy.

Here are some lessons learned from this chapter:

  1. Understand how company culture and norms impact data quality initiatives. You can learn a lot about a company's culture by understanding how it defines quality in its products, operations, forecasting, and decision-making. And underneath these business definitions of excellence is an underlying requirement—or at least an expectation—around data quality. Sometimes this gets expressed in service levels, for example, expecting that data for financial reporting is available by 8 a.m. You may hear leaders state that they don't trust the data, are frustrated with duplicate CRM records, or are confused when different data sources show conflicting information. Until a proactive data governance program8 gets instituted, with data owners assigned and equipped with tools, processes, and data quality metrics, many of these data quality issues are escalated to IT to fix.

    Measured or not, expectations around data quality become ingrained in business processes and then culture, especially when people perceive that it's their job to review and manipulate data manually. That's why upgrading data processing systems or automating them becomes a significant transformation management battleground. What was once a back‐office IT process now comes front and center, and you're not just improving technology, you're changing the company's business process and possibly challenging its mission and culture. It's one reason to target data processing changes as a transformation because you're not just putting in a new set of data pipes and infrastructure.

  2. Focus on meaningful problems because data journeys can be long and complicated. The first question you must ask about a new data challenge is whether it's worth the investment to solve it. If the sponsor asks a question and builds a dashboard to seek answers, then how does she make it actionable? And what is that worth? Most organizations have their analytics, machine learning, data visualization, and data integration wish lists. And then there's the second list that must be factored into any discussion around data strategy and priorities, and it's a list that no one wants to talk about: DataOps, data governance, data debt, data policies, data security, and data privacy. Just like in technology, organizations have more data-related problems and opportunities than they have people available to solve them. So before you embark on a data journey, make sure the business impact is articulated long before you take too many implementation steps.
  3. Seek easy onramps and avoid selling automation. You probably read about advanced companies that have departments of Ph.D. data scientists, the latest big data technologies, and public relations firms sharing their latest and greatest machine learning successes. Those organizations are the outliers. Most companies are somewhere in their journeys to become smarter and faster organizations. While it's critical to prioritize meaningful problems, I also seek out the ones with easy onramps. That doesn't mean the problem is easy to solve! It more often implies that I have a collaborative business partner and an open-minded agile data team ready to work iteratively through the challenge.

    The stated business priorities might include scaling a process, making it more efficient, or adding new capabilities. But I've learned my lesson and avoid labeling the solutions as automation. Some expect automation to deliver near‐perfect quality, and they'll use this to roadblock progress. They'll reason it out with their colleagues and neglect to mention all the expensive‐to‐implement exception use cases. Sure, you can handle these as data quality exceptions and pass them to a data steward to manage, but then it's not automated in their minds. Remember, data processing is part of the culture. State expectations upfront and avoid setting them too high when naming initiatives or labeling solutions.

  4. Pick appropriate data technologies aligned to a future way of working. I only mention a handful of technologies in this chapter, but I don't want you to believe that the underlying technology isn't important. In fact, it's critically important to select technologies that not only solve the business challenges but also fit the structure, governance, and skills of the organization. I didn't mention many technologies because I've implemented different solutions and stacks in every example I shared in this chapter. There are many data technologies to develop dashboards, move data from point A to point B, improve data quality, create data catalogs, enable master data management, or simplify machine learning. Platforms may offer similar capabilities but target different people, experiences, productivity, and quality. The best approaches to selecting data technologies involve partnering with experts, running several proofs of concept, and validating that platforms meet performance and compliance requirements.
  5. Invite experts and colleagues to contribute to your data‐driven journey. Data and analytics projects rarely materialize on their own. If you want to find the opportunities and business challenges, you need to step into the weeds and find them.

    It's critically important to get out of the office and learn from peers, but learning is a two‐way street. Even if you're highly introverted, you should seek out ways to share what you know. The feedback will make you a stronger leader.

    You don't become a leader by getting a promotion. You must take the first steps in the journey on your own, make some mistakes, and figure out a leadership style that works for you. And you must bring people along in the journey, so leadership is not just what works for you. You have to adjust to your audience, mission, culture, and goals. And what worked for you last time around may not work this time. You'll always be stepping out of your comfort zone. Get used to it.

▪  ▪  ▪

If you would like more specifics on these lessons learned and best practices, please visit https://www.starcio.com/digital-trailblazer/chapter-7.

Notes

  1. "Marketing Technology Landscape Supergraphic (2020): Martech 5000 — Really 8,000, but Who's Counting?" Chief Marketing Technologist, https://chiefmartec.com/2020/04/marketing-technology-landscape-2020-martech-5000/.
  2. Barack Obama (@BarackObama), "The best part of my job was meeting people like this—ready to make a selfless act in a time of need. Many Americans are already making deep sacrifices to keep our communities healthy, but if you're able to, consider helping those hit the hardest." Twitter, March 19, 2020, https://twitter.com/BarackObama/status/1240660590600892417.
  3. Hillary Clinton (@HillaryClinton), "I'm thinking today of our fellow Americans affected by the earthquakes in Puerto Rico. Many families have been left without power and water. If you can, support an organization providing disaster relief here: charitynavigator.org." Twitter, Jan. 7, 2020, https://twitter.com/HillaryClinton/status/1214607714627796992.
  4. "Committed Grants | Bill & Melinda Gates Foundation," 2021, Bill & Melinda Gates Foundation, https://www.gatesfoundation.org/about/committed-grants?q=charity%20navigator.
  5. "Charity Navigator—Rating for Charity Navigator," 2021, charitynavigator.org, https://www.charitynavigator.org/ein/134148824.
  6. "Charity Navigator's Methodology: Charity Navigator," 2021, Charity Navigator, https://www.charitynavigator.org/index.cfm?bay=content.view&cpid=5593.
  7. "Charity Navigator Encompass Rating System: Charity Navigator," 2021, Charity Navigator, https://www.charitynavigator.org/index.cfm?bay=content.view&cpid=8077.
  8. Isaac Sacolick, "What Is Proactive Data Governance," Social, Agile, and Transformation, https://blogs.starcio.com/2020/03/proactive-data-governance.html.