CHAPTER 13

Welcome to the Data Jungle

Every Question Ever Asked

Right now, as I write this, I’m thinking about some of the big and small decisions I have to make today. Should I add sugar to my coffee? Should I have a haircut before my meeting? Should I remind the firm I’m talking to about the job offer they are supposed to send me? Should I spend my time today finishing the proposal to prospective publishers about this book? At a granular level, every action we take personally or professionally involves data. In the examples I used, this could be data about calorie counts, who I’m meeting, when I last spoke to the company in question, and so on. For almost all of these, the data we process is implicit, often not even the result of conscious thought. But we do decide to dry our hair when it’s wet or turn into our gates when we reach our houses; each of these is an active choice we make. Under the hood, we’re constantly processing data, and often a lot of complex data as well. Think of how much data you’re actually processing while driving or cycling or even crossing the road on foot. Professionally, a lot of our decisions are more explicit, but all too often, the underlying reasoning and data are not. We rely on our experience, which is nothing but an implicit and accumulated mental data store, when deciding how long to set a meeting for, how often to write a blog, or what should go into our calendar. The same goes for aspects of our hiring plan, business strategy, or sales presentation. We may not even be aware that we’re processing years of accumulated data, but we are. Think of it this way: every question ever asked in the history of humankind is a request for data. And the only true purpose of data is to improve decision making. The fundamental challenge of data is therefore to connect the right data, at the right time, to decision points.

Has data always been around? Shawn DuBravac points out in Digital Destiny¹ that data was always present, even before humans. We just didn’t have a way of capturing and harnessing it. This is a moot point, and I think we should distinguish between fact and data. That dinosaurs roamed the earth is a fact. But it only becomes data when it is recorded. Data therefore requires the human articulation of facts. Language was therefore critical to the creation and sharing of data. Writing allowed persistence, printing enabled mass distribution, and digital technologies gave it superpowers.

Clive Humby was known for cocreating the data engine behind Tesco’s famously successful customer analysis in Britain. But history will probably remember him more for coining the phrase “data is the new oil” in 2006. It has been quoted by strategists, architects, and CEOs across the world as the value of data became apparent over the past decade. Companies such as IBM and even Google have built their entire business strategy around the value of data. But when it comes to your digital transformation, a good way to look at data is through the lens of quantification.

The last two years have been incredibly disruptive due to the global COVID-19 pandemic. But they have also been an education in the importance and centrality of data in our lives. Every day, for the past 20 months, we’ve been buffeted by pandemic-related news and information, and we’ve had to figure out how to read the insights behind the hysterical headlines and agenda-driven messages. A very good example came during the first deadly wave in Europe. While several countries were underestimating the actual number of fatalities through the way they classified deaths during the pandemic, The Economist carried data on excess fatalities: the gap between the number of deaths a country typically records in a given period and the number recorded in the same period during the pandemic. Depressing though this is, it provided a much clearer picture of what was really going on.
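To make the excess-fatalities idea concrete, here is a minimal sketch of the arithmetic. The weekly counts are invented purely for illustration, and the function name and the choice of a five-year average as the baseline are my assumptions, not a description of how any publication actually computes its figures.

```python
def excess_deaths(observed, baseline_years):
    """Observed deaths minus the average for the same period in earlier years."""
    expected = sum(baseline_years) / len(baseline_years)
    return observed - expected


# Hypothetical counts for one country in one week (not real data).
baseline = [10_000, 10_200, 9_800, 10_100, 9_900]  # same week, previous five years
observed = 14_500                                  # same week during the pandemic
officially_attributed = 3_000                      # deaths classified as pandemic-related

excess = excess_deaths(observed, baseline)
unexplained = excess - officially_attributed

print(f"Excess deaths: {excess:,.0f}")
print(f"Excess deaths not captured by the official classification: {unexplained:,.0f}")
```

The point of the measure is that it does not depend on how any individual death was classified, which is why it cuts through differences in national reporting.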

The Purpose of Data

Most ecommerce websites live and die by their data. If you’re in the business of selling travel, or insurance, or secondhand cars via your website, and if you’re reasonably successful, then it’s likely that you have thousands, or even millions, of people visiting your website every month. And for ecommerce, everything that happens on the website is a highly measurable funnel: the number of people who came to the site, the number who searched, the number who looked at a product and how many products they looked at, how many added a product to the basket, and finally, how many checked out and paid. Most large ecommerce businesses track these numbers by the hour, or even by the minute. The impact of any change made to the site, even something as small as changing the color of a button, is instantly visible as people react positively or negatively (or sometimes not at all), and you know very quickly which it is.
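The funnel described above can be sketched in a few lines of code. The stage names follow the text, but the counts are hypothetical and the helper functions are my own illustration rather than any particular analytics product’s API.

```python
# Hypothetical monthly counts for each funnel stage (illustrative, not real data).
FUNNEL = [
    ("visited site", 100_000),
    ("searched", 60_000),
    ("viewed a product", 30_000),
    ("added to basket", 6_000),
    ("checked out and paid", 1_500),
]


def step_conversions(stages):
    """Return (from_stage, to_stage, rate) for each consecutive pair of stages."""
    return [
        (prev_name, name, count / prev_count)
        for (prev_name, prev_count), (name, count) in zip(stages, stages[1:])
    ]


def overall_conversion(stages):
    """Share of people at the first stage who reach the last stage."""
    return stages[-1][1] / stages[0][1]


conversions = step_conversions(FUNNEL)
for src, dst, rate in conversions:
    print(f"{src} -> {dst}: {rate:.1%}")
print(f"overall: {overall_conversion(FUNNEL):.1%}")
```

Tracking the step-by-step rates, rather than only the overall figure, is what lets you see which stage of the journey is leaking customers.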

The question is, why would you decide to change the color of any button, or change the way your shopping basket looks and behaves? Or even how your search works? Either you’re shooting in the dark, or you have some idea of where the problem is. For example, if the average rate of shopping basket abandonment (the percentage of people who add items to a shopping cart but never check out) in your industry is 1 percent, but you’re getting between 3 and 5 percent, this should tell you that something needs to be fixed. But you only know this if you’ve been tracking your data. Which is why we come back to this double-edged value of data. It is both a trigger for your decision making and, ideally, a way to measure the impact of your decisions. In the digital world, you only get to this point if you have the initial Connect done well. Unless enough people are connecting with you via your site, your app, or your service, you won’t even have the data to make these calls. In the traditional world, as we’ll see soon, we were only getting this data as an epilogue, well after the point when we could do something about it.
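Here is a rough sketch of data acting as a trigger in the way just described. It uses the 1 percent benchmark from the example; the basket counts are made up, and I am assuming the rate is measured against the number of baskets created, which is only one of several possible definitions.

```python
INDUSTRY_BENCHMARK = 0.01  # the 1 percent figure used in the example above


def abandonment_rate(baskets_created, checkouts):
    """Share of baskets that were created but never checked out."""
    if baskets_created <= 0:
        raise ValueError("baskets_created must be positive")
    return (baskets_created - checkouts) / baskets_created


def needs_attention(rate, benchmark=INDUSTRY_BENCHMARK):
    """The trigger: flag the funnel step when we are worse than the benchmark."""
    return rate > benchmark


# Hypothetical counts before and after a change to the basket page.
before = abandonment_rate(baskets_created=2_000, checkouts=1_920)
after = abandonment_rate(baskets_created=2_000, checkouts=1_970)

print(f"before: {before:.1%}, flagged: {needs_attention(before)}")
print(f"after:  {after:.1%}, flagged: {needs_attention(after)}")
```

The same number does double duty: comparing it with the benchmark prompts the decision, and comparing it before and after the change measures the impact.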

Operating in a VUCA World

There’s also the matter of speed. You need the data when you’re making the decision. And increasingly, you’re having to make decisions in near-real time because the world is changing fast. The term VUCA captures this well. It stands for volatile, uncertain, complex, and ambiguous. Volatile means that the change is sudden. Uncertain means that it’s not clear what direction it’s taking. Complex means that the factors driving the change are unclear. And ambiguous means that we don’t quite know what the status is at present. I’m sure that over the past two years, through the course of the pandemic, you’ve experienced each of these scenarios. Think of this: in 2019, the world leader in videoconferencing was Cisco, with its Webex brand. In the consumer and professional space, it would have been Skype. Over the next two years, videoconferencing went through the roof, with usage going up exponentially. Yet neither Cisco nor Skype is leading the market today. It’s largely Zoom and Teams that have captured the market. Skype and Teams are at least both part of Microsoft. But the way in which Zoom has left everybody behind in the personal calling space is telling, isn’t it?

In this VUCA world, you’re constantly on the frontline and having to make tactical or strategic calls. This is the reason why your data is so critical. And why mature ecommerce firms are so focused on the data from every step of their customers’ journey.

The pandemic is an appropriate example of the VUCA environment, where companies, especially in travel and hospitality, have had to create dashboards that track the impact of COVID-19 on their daily operations: locations that are open or closed, areas where employees and customers need to take special precautions, and so on. As I write this, the Omicron variant is causing more consternation and additional constraints on travel. From day to day, we need to know which countries, cities, or regions are impacted, where governments are likely to impose lockdowns or restrictions, and where the risks to our employees and customers are beyond the acceptable threshold, and act accordingly, in near-real time.

The irony is that even as we look for the right data for our decision making, we’re in the middle of a data flood. The amount of data we have access to is many orders of magnitude more than before. Let’s take a closer look at this tsunami of data.

Industry 4.0—The Era of Abundance

Around the end of 2021, atop the mountains in the Elqui province of Chile, a new observatory is being completed, with the world’s largest ever digital camera. Actually, it’s more than a camera; it’s a telescope. To get an idea of its size, consider that the mirror is 27 feet in diameter, and the lens was created out of 22 tons of molten glass. It has 3.2 gigapixels of resolution (that’s 3,200 megapixels). Armed with over 200 sensors, the telescope will point out at the sky, retrieve images every 15 seconds, and generate 6 million gigabytes of data every year. That is roughly the same amount of data you would generate if you took half a million pictures with your new iPhone 13 and its 12-megapixel camera every night for a year.
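If you want to sanity-check that comparison, the arithmetic is short. This sketch assumes decimal gigabytes (one billion bytes) and, for the phone, an uncompressed image at 3 bytes per pixel; both are my assumptions, and real phone photos are compressed and considerably smaller.

```python
BYTES_PER_GB = 10**9  # assumption: decimal gigabytes

telescope_bytes_per_year = 6_000_000 * BYTES_PER_GB

photos_per_night = 500_000
nights_per_year = 365
photos_per_year = photos_per_night * nights_per_year

# How large would each photo need to be to match the telescope's output?
implied_bytes_per_photo = telescope_bytes_per_year / photos_per_year

# Assumption: an uncompressed 12-megapixel image at 3 bytes per pixel (RGB).
uncompressed_photo_bytes = 12_000_000 * 3

print(f"Photos per year: {photos_per_year:,}")
print(f"Implied size per photo: {implied_bytes_per_photo / 1e6:.1f} MB")
print(f"Uncompressed 12 MP photo: {uncompressed_photo_bytes / 1e6:.1f} MB")
```

Under these assumptions the two figures land in the same ballpark, which is all that “roughly the same” needs to mean here.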

And yet, probably the most salient aspect of this mass of data is that it doesn’t feel particularly newsworthy. There is a data explosion all around us that is the result of the explosion in connections. A lot of what used to be analog and human-driven is now increasingly driven through devices and systems. This includes, for example, taking pictures, paying money, booking taxis, connecting with friends and colleagues, reading books and magazines, or even counting the number of steps we take, all of which we now do through our smartphones. At work, this includes e-mails, accessing files and customer information, filling in time sheets and expense reports, and, thanks to the pandemic, even attending meetings. Even at an environmental level, the weather, traffic conditions, and pollen count are now monitored via smart and connected sensors. As we’ve seen in earlier chapters, everything, including our own bodies and brains, is becoming a source of new data streams. As all these digital connections create giant flows of data, it’s up to us whether to tap into them. We are in the middle of a transition from implicit to explicit data in our decision-making models. One of the immediate benefits of this should be that we are able to make better and more informed decisions, but there’s an interim step required: we need to build the skills and knowledge required to manage all this data.

Arguably, this flood of data has changed the rules for how businesses compete. Along with storage and computing power, data is plentiful and accessible at low costs. We now operate in an era of abundance. This is decidedly different from the economic basis of the past centuries, where we were largely competing for scarce resources. Abundance requires a different mindset—where the focus has to be on identifying the most valuable parts of a nearly infinite resource. Some call this the Industry 4.0 model. Whatever the framework you use, and whichever industry you’re in, there’s little doubt that we find ourselves today sailing in an ocean of data.

Managing the Data Deluge

You see articles every other day talking about how data science is the hottest job of the 21st century. By now you’ve probably gotten sick of hearing about big data, little data, fat data, thin data, and all manner of data. But this is the core, the engine. The successful firm of the future will have a data-driven engine room. It’s important that we get our heads around terabytes, exabytes, and zettabytes, and the changing economics of data. In 1986, the world’s total data storage capacity was 2.6 exabytes. An exabyte is a billion gigabytes. By 2005, just under 20 years later, that was 295 exabytes. In 2013, we created 4.4 zettabytes of data (a zettabyte is 1,000 exabytes), and by 2020, the volume of data rose to 64.2 zettabytes. Here’s a sample of what happened on the Internet in an average minute in 2020, according to Domo, which publishes these figures annually: over 41 million WhatsApp messages were shared, 200,000 Zoom meetings were held, 400,000 hours of Netflix shows were streamed, almost 70,000 users applied for jobs on LinkedIn, and Amazon shipped over 6,500 packages. In every minute.
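To get a feel for these units, it helps to put the figures quoted above on a single scale and work out the implied yearly growth. The sketch below is only a back-of-the-envelope calculation: the helper names are mine, and the two pairs of figures measure different things (storage capacity versus data created), so I compare within each pair rather than across them.

```python
GB_PER_EB = 10**9   # an exabyte is a billion gigabytes
EB_PER_ZB = 1_000   # a zettabyte is 1,000 exabytes


def zb_to_eb(zettabytes):
    """Convert zettabytes to exabytes."""
    return zettabytes * EB_PER_ZB


def annual_growth(start, end, years):
    """Compound annual growth rate implied by two values `years` apart."""
    return (end / start) ** (1 / years) - 1


# Storage capacity, in exabytes, as quoted in the text.
capacity_growth = annual_growth(2.6, 295, 2005 - 1986)

# Data created, converted from zettabytes to exabytes, as quoted in the text.
created_2013_eb = zb_to_eb(4.4)
created_2020_eb = zb_to_eb(64.2)
created_growth = annual_growth(created_2013_eb, created_2020_eb, 2020 - 2013)

print(f"Data created in 2020: {created_2020_eb * GB_PER_EB:.3e} gigabytes")
print(f"Storage capacity, 1986 to 2005: about {capacity_growth:.0%} a year")
print(f"Data created, 2013 to 2020: about {created_growth:.0%} a year")
```

Seen as a yearly rate rather than a pair of very large numbers, the growth becomes something you can plan capacity and budgets against.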

This tsunami of data has created its own challenge of data management—creating products from Hadoop and MongoDB to data lakes and data puddles, which we’ll talk about later. The scary part of all of this is that we’re still in the relatively early days of the data deluge. We are hurtling into a quantified universe fed by smart cities, homes, and cars; platform-driven models and clickstream-driven relationships. Ultimately, data is the end-game of digital transformation.

This data deluge is complex not only because of its sheer size but also, as we’ve highlighted already, because of the speed at which it is being collected and the sophistication of the interrelationships between the data. All of these require the creation of a whole new data management toolkit. Four distinct developments have changed the way we manage and consume this big data: the cost of storage, the design of databases, the cost of bandwidth, and the improvement in retrieval technologies.

Tip: Think about the total volume of data in your business. Now multiply that by 10, and assume that all that data is coming into your business in real time. That is what you’ll probably have to deal with within the next five years.
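One way to work with this tip is to turn the tenfold figure into a year-by-year projection. The sketch below assumes the growth compounds evenly over the five years, which is my simplification, and the 50-terabyte starting point is a placeholder for your own number.

```python
def implied_annual_growth(multiple=10, years=5):
    """Yearly growth rate that compounds to `multiple` over `years`."""
    return multiple ** (1 / years) - 1


def project_volume(current_tb, multiple=10, years=5):
    """Projected data volume for year 0 through `years`, assuming even compounding."""
    rate = implied_annual_growth(multiple, years)
    return [current_tb * (1 + rate) ** year for year in range(years + 1)]


current_volume_tb = 50  # hypothetical starting point, in terabytes

rate = implied_annual_growth()
projection = project_volume(current_volume_tb)

print(f"Implied growth: about {rate:.0%} a year")
for year, volume in enumerate(projection):
    print(f"Year {year}: {volume:,.0f} TB")
```

A tenfold increase over five years works out to a little under 60 percent growth every year, which is a useful figure to set against your current storage and bandwidth plans.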
