Chapter 1 The Ever-Changing Data Ecosystem

We humans constantly learn and evolve over generations to build our modern society. At times, however, sudden changes in the state of affairs break the regular patterns and create revolutions. In simple terms, when we exchange one way of doing things for something altogether different, we hope for a better society at scale. Past industrial revolutions emerged from a quest to reach the next, more progressive stage. Industrial Revolution 1.0, for example, involved coal-powered production. Industrial Revolution 2.0 entailed gas and electricity (mass production), and Industry 3.0 was electronics (automation). The boom of the Internet and advances in technology led to the current revolution: we live in Industry 4.0, the digital age and the Internet of Things (IoT). This revolution is creating a new raw material, data, and like all other raw materials, data needs to be used effectively to create something. Data is no longer seen as just something that benefits corporations by providing competitive advantage. Data is an economic driver: it accelerates the economic development of a country and creates more data in the process.

In the past decade, both data collection and data usage have gone through the roof. Everyday activities such as borrowing books from a library, banking, exercising, using smart household appliances like washing machines and microwaves, driving cars, and even dating are all digital, and many are connected to the Internet. As a result, they create a lot of usage and preference data. This data helps industries understand user patterns and behaviors and thus create better user experiences, which in turn generate more data and value. A data ecosystem operates on a continuous cycle in which we use data to create more data and value. In this ever-changing technological landscape, something can become obsolete quickly, while something that was never a possibility can become feasible. Remember, there was a time when data was mainly used by technology companies, and other industries considered big data and analytics to be tech-centered buzzwords that had nothing to do with them. Today, all companies, regardless of industry, size, or geography, need to invest in understanding their data to get ahead of the competition. Governments also collect data for wider social and economic development. A variety of services, like emergency and postal services, depend on accurate address information. Denmark, for example, released its standardized and unified address data to the public free of charge. This single Denmark Address Register (Open Data) returns an annual economic benefit that is 70 times its maintenance cost.³

Let’s talk about another example. What do you think of when you hear the word farming? It might be acres of land, soil, seeds, crops, pesticides, and so on. But the word data does not immediately come to mind with conventional farming. Conventional farming practices have used pesticides and fertilizers, along with legacy knowledge and gut feeling, to increase yield. Modern farming, in contrast, augments decision making with data. It’s a continuous cycle: collect data using field sensors -> derive insights that lead to value-driven farming -> create more data -> repeat. Climate Corp is transforming the agricultural industry by using detailed crop yield data, weather observations from one million locations in the United States, and 14 terabytes of soil-quality data, all free from the U.S. government, to help farmers make informed decisions. A company like Climate Corp is feasible because it draws extensively on open data.⁴

Traits of Data

Although data has many traits, aspects such as quality, relevance, and completeness stand out. I like to remember the critical traits as the 3 Ds: Discover, Digest, Doable. Data is discoverable when you know it exists and have a way to access it. Data is digestible when you can process and understand it as an organization. Data is doable when you can act on it in meaningful ways that create business value. If you cannot access the needed data or do not even know it exists, cannot understand or interpret it, or are unable to apply it to decision making, then collecting high-quality data is of no use. Hence the 3 Ds are critical traits in your data journey.

Data as Goods or Resource

Unlike other revolutionary industrial goods, like coal, data is a nonrivalrous good. That means that even when data is used for one purpose, its quantity and efficacy are not depleted for other future uses.⁵

Variety of Data

The term “data” is loosely bandied about in a wide variety of contexts. In the early years, data was mostly structured: it consisted of numbers and other values stored in relational databases like SQL Server. In recent years, the definition of data has expanded to encompass unstructured data like audio, video, images, and data from connected devices and sensors. In simple terms, structured data is organized in tables, while unstructured data is distributed across files.

Types of Data

There are all kinds of data, including but not limited to the following:

• Self-produced or quantified-self data, such as that gathered by fitness devices like Fitbit.⁶

• Open access data.

• Personal data or Protected Health Information (PHI).

• Automatic data from sensors and the IoT.

• Internal company data.

• Transactional data, like purchases made from a website.

Decisions related to data privacy, storage, archiving, and acceptable usage are all influenced by the type and variety of data being collected, organized, or processed.

Big Data or Small Data?

Characteristics of Big and Small Data

In the past, all data was small data because there was not an enormous variety or volume of data available. Today, there is so much hype around “big data” (the possession of huge volumes of data) that it can be tough to see beyond it. There is a perception that having more data is better than having less and that value can be derived only when large volumes of data exist. There is very little discussion about the benefits of small data or scenarios in which small data outperforms big data. Having tons of data does not mean all of it is equally valuable. Sometimes having small data can be advantageous.

What does big or small mean in terms of data, anyway? Size is not the only measure of big data. Data has many facets, including the three Vs (volume, velocity, variety) and several others:⁷,⁸

• Volume: the size of the data;

• Velocity: the speed at which data is generated and processed, even approaching real time;

• Variety: structured, unstructured, or semistructured;

• Value: the data’s potential to create value;

• Exhaustiveness: the scope of the data, for example, from limited to certain groups to covering an entire population;

• Resolution: from coarse to as detailed as possible;

• Relationality: the ability of data to conjoin different data sets, from weak to strong;

• Flexibility: the ability of data to accommodate the addition of new fields and to scale quickly;

• Experimental: from data collected in human-manageable ways as part of research to data that is machine generated and analyzed.

Different facets increase in strength as we move from small data to big data:

Small data focuses on specific attributes or parts of data sets. It is useful for analyzing current situations, determining causation, and enhancing understanding rather than prediction.⁹ Big data, in contrast, focuses on crucial, enduring decisions, which can be predictive in nature.

[Figure: Data Facets]

Benefits of Big and Small Data

Big data does not hold the answers to all data-related problems. While picking your approach related to data, think in terms of acquisition, the resources you need to process the data, and storage and privacy costs. The benefits of using the data should outweigh these associated costs. It is not about choosing between big and small data, since use cases differ for both of them. Both big and small data can coexist, and companies can benefit from choosing the right volume of data for the problem at hand. The goal is to strike a strategic balance, gaining insights while using the fewest required resources to get there.

Since small data exists in human-manageable volumes, it translates into something both experts and laypeople can readily understand and into actions that are easy to deploy. Small data is well suited to research and experimentation use cases. It helps augment decision making with minimal resources. Because the volume is low, it can be processed in-house, which reduces negative externalities (external entanglements relating to privacy and consent). Start with small data instead of big data, as it helps develop skills and ideas with more focus.¹⁰ Starting with big data often emphasizes learning technical skills rather than the understanding that small data fosters.

Big data allows organizations to engage with their users in real time. It also helps organizations modify and enhance their products and services in line with their users’ sentiments, responses, and comments. Big data helps a company personalize its products, which can contribute to higher traffic and revenue streams. A common use case of big data is fraud detection, which requires analyzing millions of transactions to identify patterns and determine the areas with the most fraud cases. American Express analyzed large volumes of data and found a pattern: people who ran up large bills on their American Express card and then registered a new forwarding address in Florida were more likely to declare bankruptcy.¹¹ Florida has one of the most liberal bankruptcy laws in the United States, and people with large debts were taking advantage of it. Identifying such correlations in the data (customers with a high credit card balance who relocate to Florida) can help credit card companies proactively trigger an inquiry and/or limit future credit increases. Data correlation, which means pulling data from various sources to understand the relationship between them and determine a valuable path forward, is an important benefit derived from big data. It is also important to look out for unrelated or misleading correlations in big data, where two things appear to be related to each other but are not. If done correctly, however, correlations in big data can powerfully predict how acting on one factor will modify or influence another.
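
The kind of rule described above (a high card balance combined with a new forwarding address in Florida) can be sketched in a few lines. This is only an illustration of turning an observed correlation into a review flag: the `Account` fields, the `BALANCE_THRESHOLD` cutoff, and the sample accounts are all hypothetical and are not taken from American Express or any real system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Account:
    account_id: str
    balance: float                        # current card balance in dollars
    new_forwarding_state: Optional[str]   # state of a newly registered forwarding address, if any


# Hypothetical cutoff for a "large" balance; a real value would come from analysis.
BALANCE_THRESHOLD = 10_000.0


def needs_review(account: Account, threshold: float = BALANCE_THRESHOLD) -> bool:
    """Flag accounts that match both halves of the observed pattern."""
    return account.balance >= threshold and account.new_forwarding_state == "FL"


# Made-up sample accounts for illustration only.
accounts = [
    Account("A1", 12_500.0, None),   # large balance, no address change
    Account("A2", 18_000.0, "FL"),   # large balance and new Florida address
    Account("A3", 450.0, "FL"),      # new Florida address, small balance
]

flagged = [a.account_id for a in accounts if needs_review(a)]
print(flagged)
```

A flag like this only prompts an inquiry. As the paragraph above notes, a correlation can be misleading, so a match is a reason to look closer rather than a conclusion.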

Bias is all around us in our daily lives. The human brain as a whole can process 11 million bits of information every second, although our conscious minds handle only 40 to 50 bits per second.¹² Our brain takes shortcuts, grouping people and things into known types based, for example, on attributes like gender, economic background, or sexuality, and this leads to both unconscious and conscious bias. It is important to look back and analyze how our actions reflect our biases and how to break free of them. Data is no different when it comes to bias. Like small data, big data is prone to bias, but because of its huge volume, bias in big data is less obvious. This does not mean that it doesn’t exist. So it is important to prioritize high-quality, accurate, authentic, and bias-free data, with a clear understanding of data lineage, in big data.

Tools and Techniques for Small and Big Data

Consider implementing tools and techniques in phases. In other words, consider a data infrastructure that handles the current volume, and continue to build on it as organizational data needs increase. If your organization is small and its data volume is small, do not invest in setting up Hadoop yet. This avoids operational headaches and complexity that a small data organization does not need. At such low volumes, data processing can be managed in-house in a relational database like SQL Server or PostgreSQL. In addition, pick a business intelligence tool like Power BI to open up data to everyone in the company. Setting up data infrastructure for big data organizations is complex, and every company needs to assess which setup will be most advantageous for its unique situation; it may be tough to handle this volume in-house. At a high level, big data organizations can consider a data framework like Apache Hadoop, along with a NoSQL (Not only SQL) database like MongoDB, a real-time streaming tool like Kafka, a business intelligence tool, and so on. These are just examples to help you get started. Big data is vast, and there is a long list of tools tailored to the specific needs of an organization.
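
To give a sense of how little infrastructure small data needs, here is a minimal sketch that loads a handful of rows into a relational database and summarizes them with one SQL query. It uses Python’s built-in `sqlite3` module with an in-memory database purely as a stand-in for SQL Server or PostgreSQL; the `orders` table and its rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database: a lightweight stand-in for SQL Server or PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")

# Hypothetical transactional data, such as purchases made from a website.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ana", 20.0), ("ben", 35.5), ("ana", 14.5)],
)

# One query answers a typical business question: how much has each customer spent?
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)

conn.close()
```

The same query would run largely unchanged on a full relational database, which is one reason starting small does not lock an organization out of growing later.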

Challenges Maneuvering the Data Landscape

There was a time when data was created and collected mainly on a small scale, primarily to operate companies. We as a society have come a long way since then. Over the past decade, the data landscape has been transformed not just by an increase in data volume but also by changing customer expectations. Additions to the data ecosystem include unstructured social media data, enhanced privacy needs, technological improvements, and expectations of close-to-real-time insights. With the evolution of peer-to-peer engagement channels, more people are going online to seek connections with friends and strangers alike. Peer-to-peer engagement channels like Facebook, Twitter, and TikTok generate large amounts of data daily. No consumer-based company can ignore this social data and base its strategy on internal data alone. As wearables and smart devices continue to dominate the market, self-created data is growing further. The pandemic has also forced the world to go online for almost everything, even goods and services previously considered impervious to online influence. One major shift was people buying groceries online instead of visiting stores.¹³ This has ignited another big data surge.

Companies are required to compete in this rapidly changing data landscape, where social media users write about many topics and the perception of a product exercises a strong social influence. This forces the development of new tools that recognize this fluid landscape and somehow make sense of the gargantuan daily data dump. For companies to succeed in this ever-changing data landscape, they must maximize their ability to effectively collect, analyze, store, and secure data, and to innovate and improve efficiencies. These are a few foundational questions to ask:

• Infrastructure—the what/where/how of data: What types of data should I collect? Where will I store it? How much data will I have? How long will I retain it?

• Privacy: How will I secure and protect data?

• Analysis: How will I use this data for decision making? What kinds of data lineage will benefit me? How frequently will I need insights—for example, in real time, or once a month? What is the main purpose of my analysis—for example, to improve operational efficiencies, innovate, or understand customer experience or something else?

Coping With the Data Boom

We live in a world that is being transformed by datafication. Datafication is a technological trend that turns many aspects of our lives into data, which is subsequently transformed into information realized as a new form of value.¹⁴ The primary barrier to taking advantage of the data boom is a lack of understanding about how to apply analytics to improve business or create value. Datafication can help with value creation and can be broken into three concepts: dematerialization, liquification, and density.¹⁵ “Dematerialization” separates information from the physical world, which increases its “liquidity” (its ability to move freely) and thereby increases “density,” or the value created.

In simple terms, we increasingly live in a data world, with more data than ever before and new ways of using it that create completely new value streams. Let us take Netflix as an example of datafication. Netflix is a subscription-based streaming service available in over 190 countries.¹⁶ It can be easy to forget that in the beginning, Netflix was a mail-order DVD delivery business in which subscribers could add and maintain a list of movies they wanted to rent. The list was ordered, and when a DVD was returned, Netflix would mail out the next DVD on the list. Although there was a limit to how many DVDs one customer could borrow at a time, the list could be as long as they wanted. It was a customer-driven process: the customer initiated the creation of the list and also managed it by adding and deleting movies they wished to watch.

This model has changed—it’s become smarter. Its proactive recommendations algorithm removes Netflix’s dependence on the customer to add movies they want to watch. Of course, Netflix has entered new markets and countries since its mail-in DVD days, but it has also used its big data analytics capabilities to better understand content abandonment (the point at which the customer turned off a show), preferred devices for viewing, and many other metrics.

Netflix created value through datafication:

• Dematerialization = Move away from physical DVDs.

• Liquidity = Streaming allowed for free movement.

• Density = Create and increase value by evolving from streaming only to content production.

Big data means a lot of data, but big does not always mean better or improved. Big data presents an ocean filled equally with opportunities and challenges, and it is up to the organization to sink or swim. Big data requires advanced tools and techniques, which makes it difficult to understand, organize, and process. In addition, some organizations lack the funding or infrastructure to embark on such a resource-intensive journey, not to mention that they may have limited or nonexistent visualization tools or technical experts. The recent explosion of data volume amplifies this issue, but it should not discourage organizations from starting their data endeavor. One way of coping with this data boom is to start with small data, processing a manageable volume of information with simple and widely available open-source tools, and gradually progress toward large-scale big data initiatives.

Modern Data Stack

When organizations had only small volumes of data, database storage solutions were sufficient. Organizations now struggle to store the surge of data, and they are also aware that in-house data storage is vulnerable to data breaches and hacking. Organizations are turning to cloud-based technology to cope with growing storage needs and to avoid the risk of falling behind the competition. They are adopting cloud solutions that do not require them to install software on their own premises or servers. This also leads to cost savings, since they don’t have to purchase or maintain physical hardware to use the solution. When the customer base grows rapidly, cloud solutions are also helpful because they scale quickly. In the past, organizations needed to buy more servers as their data grew, or else they found themselves stuck with unused servers as a result of poor forecasting. With cloud storage, organizations can expand or reduce their resources as demand changes. Cloud solutions also offer better accessibility and latency. Organizations need not worry about enduring downtime or having to head back on-site to deal with a data issue, because cloud technologies allow them to do everything remotely.

Cloud migration has changed the way businesses operate by making it possible to reap the benefits of big data even for organizations that lack the physical space, expertise, or large upfront budget. Big data is no longer the preserve of large organizations, but it still poses adoption and implementation challenges for smaller businesses.

To survive this data boom, organizations need both cloud-based solutions to store this enormous amount of data and a strong data strategy. In other words, they need to develop their advanced analytics capability by defining how they plan to use their data and what kinds of insights will benefit them in the short and long term. Without a strong data strategy, there is no definitive way of using the data, and collecting it is useless. In my experience, organizations struggle to manage and analyze data effectively in the cloud, simply because it is tough to adapt to new tools and methods.

Cognizance of Data Privacy

Privacy is tough to define. It means different things to different people and in different countries. In addition, the definition of privacy changes with every advancement in technology. Recently, privacy violations have been in the news, appearing in headlines like “Austrian Website’s Use of Google Analytics Found to Breach GDPR” and “France says Google Analytics Breaches GDPR When It Sends Data to U.S.”¹⁷

A decade ago, privacy was defined simply in terms of what personal information you disclosed on a website or form. Although there are various definitions, privacy can be defined in relation to boundaries between the self and others, between private and shared spaces, or even in wholly public forums.¹⁸,¹⁹

Now with advanced analytics, businesses track users’ digital behavior and preferences, often with limited or no knowledge on the part of the user. They perform data tracking in several ways, for example by setting opt-out policies (in which the user is opted in by default and must explicitly choose to opt out), tracking cookies, and using online advertising and third-party apps.

On one side, organizations are increasingly applying advanced analytics concepts to better understand user actions. On the other side, users are increasingly aware of just how much their personal information is driving the next generation of products. They are becoming more cautious about what they share and are more likely to question how it may be used. High-profile data breaches and privacy scandals are increasing our awareness of data privacy issues, and customers are looking for ways to protect themselves. Among users, there is a general decline in trust and increase in anxiety over data privacy. Most Americans feel they have lost control of how much personal information is collected, and feel the government should regulate data collection.²⁰

This increased demand for better protection of personal data is pushing governments across the globe to roll out new privacy laws and update existing ones. Privacy laws are slowly catching up with rapid technological growth. There are several laws, standards, and initiatives to control and protect privacy, like the Health Insurance Portability and Accountability Act (HIPAA), the General Data Protection Regulation (GDPR), rules enforced by the Federal Trade Commission (FTC), the Payment Card Industry Data Security Standard (PCI DSS), the Children’s Online Privacy Protection Rule (COPPA), and the California Consumer Privacy Act (CCPA), to name a few. Privacy laws differ geographically, and organizations must comply with the laws that apply to their data handling. Also, some privacy rules are industry specific: HIPAA relates to health care, and PCI DSS covers payment card data. Navigating different privacy laws is not easy, and ongoing changes to various laws make it trickier. There are also sometimes fundamental differences between these laws. For example, GDPR requires entities to gain user consent via opt-in, while the CCPA requires entities to provide only an opt-out option. Organizations with global locations, handling transborder flows of data, need to understand and comply with even more of these regulations.
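
The practical difference between opt-in and opt-out comes down to what happens when the user has not yet made a choice. The sketch below is deliberately simplified and models only that default, following the contrast drawn above between GDPR and the CCPA. The `ConsentModel` enum and `may_process` function are hypothetical names, and the real regulations contain many more conditions than a single flag can capture.

```python
from enum import Enum
from typing import Optional


class ConsentModel(Enum):
    OPT_IN = "opt-in"    # the model the text associates with GDPR
    OPT_OUT = "opt-out"  # the model the text associates with the CCPA


def may_process(model: ConsentModel, user_choice: Optional[bool]) -> bool:
    """Decide whether data may be processed under a consent model.

    user_choice is True if the user agreed, False if the user declined,
    and None if the user has not expressed a choice yet.
    """
    if user_choice is not None:
        # An explicit choice is honored under either model.
        return user_choice
    # With no choice yet, opt-out defaults to allowed and opt-in to not allowed.
    return model is ConsentModel.OPT_OUT


print(may_process(ConsentModel.OPT_IN, None))   # no consent given yet
print(may_process(ConsentModel.OPT_OUT, None))  # user has not opted out
```

An organization operating under both models would need to track which model applies to each user, which is part of why handling transborder data flows is harder.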

Data solutioning efforts should place the utmost importance on privacy. And since privacy is an evolving area undergoing constant change, companies should have mechanisms in place to run data audits and identify problems proactively. De-identification requirements vary with the type(s) of data the company holds and need to be considered accordingly. For example, health care organizations must protect identifiable patient health data (also known as Protected Health Information, or PHI), so they need to implement a de-identification process to protect patients’ personal information like name, date of birth, and so on. Additional measures should also be in place to scan for stray PHI, train staff, and encrypt databases.

Details matter when protecting data systems against breaches of privacy. For example, age by itself is not considered PHI. A report on someone aged 25 cannot by itself be used to identify a person. But in a less-populated zip code, it would be much easier to identify a patient cited as a 98-year-old. As a result, HIPAA de-identification requires any age over 89 to be aggregated into a single 90-plus age group instead of being reported in exact years. These are the kinds of things to consider when protecting the privacy of users.
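
The age rule above is simple enough to express directly. The following is a minimal sketch of that one step only, not a complete de-identification process: the `generalize_age` helper, the record layout, and the sample values are hypothetical.

```python
def generalize_age(age: int) -> str:
    """Return the exact age as text, or '90+' for any age over 89."""
    if age < 0:
        raise ValueError("age must be non-negative")
    return "90+" if age > 89 else str(age)


# Hypothetical patient records; only the age field is handled here.
records = [
    {"patient": "P-001", "age": 25},
    {"patient": "P-002", "age": 98},
]

# Build new records so the original exact ages are not modified in place.
deidentified = [{**r, "age": generalize_age(r["age"])} for r in records]
print(deidentified)
```

After this step the 98-year-old is indistinguishable from anyone else aged 90 or older, while the 25-year-old’s age is left as is. Names, dates of birth, and other identifiers would still need their own handling.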

• There are other new trends and interesting areas of discussion related to privacy. For example, there are now efforts to allow people to donate their health care data for research. CMS Blue Button is a government initiative that allows beneficiaries to share their Medicare data with third-party applications, doctors, research programs, and more. It also gives beneficiaries and their caregivers more options and control over their claims data. Donating data to science in this way is meant to improve health care for the betterment of society.²¹

• A much-debated topic is the data dividend model, in which you can “sell” your user data. But the question of whether privacy should be treated as a commodity, and who decides the price, is a contentious one. There are already ad-supported streaming services that cost less to join. Their customers are exchanging behavioral data for a slightly lower service fee. And companies like Datacoup purchase data directly from the individual in exchange for cash, discounts, or cryptocurrency.²² Placing control of personal data management in the hands of the individual is known as the personal data economy.²³

• The alternative to the personal data economy, of course, is making users pay for their privacy. This model raises the question: How important is privacy to the individual, and how much are they willing to pay to protect and secure it?

The data ecosystem is changing rapidly, not just in terms of volume but also in privacy regulations, people’s awareness of data, and technological advancements. As the saying goes, “You can lead a horse to water, but you can’t make it drink.” Similarly, companies can acquire all the data-related knowledge and tools, but if they fail to apply their business expertise to supplement data technologies, there will be no outcome. Organizations that anticipate the changes, apply their proprietary business expertise along with data, and plan ahead are the only ones that can survive this data boom.
