Chapter 2. Challenges of the Smart Data Era for Enterprises

In the smart data era, enterprises should transform themselves from traditional product- and technology-driven enterprises into data-driven ones. Different from traditional enterprises, data-driven enterprises are characterized by the following aspects:

  • Data is regarded as an important asset for management.

  • Specific data applications are used to solve business problems. (These applications are linked to the enterprise’s current data systems and draw on enterprise data, both self-owned and other business-related data.)

  • Specialized and structured data teams are set up inside the enterprises (problems are not solved by outsourcing).

  • A data-driven culture is built.

During the transition to becoming a data-driven entity, traditional enterprises are severely challenged by business digitalization and data capitalization. A huge amount of data is not acquired effectively because business digitalization is lacking. For example, users’ click event data on websites, the interaction data of app users, user subscription and browsing data on WeChat public platforms, customer visit data of offline stores, and other business-related data may not be acquired or used. Today’s mainstream mobile phones (e.g., iPhone and Samsung Galaxy) are generally equipped with 15 or more sensors, covering ambient light, acceleration, geomagnetism, rotation (gyroscope), proximity, pressure, RGB light, temperature, humidity, the Hall effect, heart rate, fingerprints, and more. If all sensors are activated, each mobile phone could acquire up to 1 GB of data per day. Although this data can faithfully reflect the context of mobile users, most of it is discarded.

With both the scale and dimensions of data rapidly increasing, enterprises are unable to effectively prepare data and gain insight from it, making it hard for them to support business decision making. According to a 2015 report by BCG (Boston Consulting Group), only 34% of the data generated by financial institutions (which have a relatively high degree of IT support) was actually used. And according to a survey report by Experian Data Quality, in 2016 nearly 60% of American enterprises could not proactively detect or deal with data quality issues and did not have fixed departments or roles responsible for managing data quality. There is clearly still a long way to go in terms of managing complicated data. If it is not effectively utilized, a large amount of data never becomes an asset and produces no value, while still imposing huge costs on enterprises.

Enterprises struggle with these challenges for a variety of reasons: some have no advanced technical platform, some are deficient in data management, some have not built standard data engineering systems, and some others simply lag behind in terms of their understanding of the value of data science. All these have hampered the transformation of traditional enterprises toward intelligent, data-driven ones. Let’s look at each of these challenges more closely.

Challenges in Data Management

First, enterprises are faced with a series of challenges that need to be solved by proper data management. These challenges include:

  • Numerous internal systems and inconsistent data might cause confusion. Take gender, for example. It may differ in a CRM system (actual gender in the fundamental demographics), a marketing system (e.g., a husband may sometimes purchase female-oriented goods in order to send a gift to his wife), and a social networking system (e.g., unique sexual orientation). If gender is purely regarded as a consistent attribute across systems, errors may occur.

  • The descriptive information of data (metadata) is controlled by different people in different departments of an enterprise, and fails to be shared across channels. Even for the same data, the understanding of it may differ because departments apply varying standards. For example, the HR Department of an enterprise would maintain a list of employees and their addresses (home addresses), but the Administration Department may update an address so that holiday employee benefits can be properly delivered. In such cases, “home addresses” are changed to “mailing addresses.” However, both parties believe that the correct addresses have been given. Another example is ecommerce. When counting the number of ecommerce apps activated, the Marketing Department may believe that an app is activated once it is started for the first time, but the Product Department may think that an app is activated only once it is used to make a purchase for the first time.

  • It is difficult to effectively integrate the data that is distributed on the enterprise’s external platforms. For example, the data acquired by a WiFi probe installed in the store of an enterprise and the data accumulated on each third-party media platform (such as the WeChat public platform) may supplement client data dimensions. However, the IDs used to track clients on each platform cannot be linked to one another. As a result, the data of all platforms cannot be synchronized, which greatly reduces the value of the data.
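
The ecommerce example above can be made concrete with a small Python sketch. It is only an illustration: the `AppUser` record, its field names, and the sample users are hypothetical and do not come from any real system. It shows how the same user list yields two different "activated apps" figures depending on which department's definition is applied.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AppUser:
    """Hypothetical record; the field names are assumptions for illustration."""
    user_id: str
    first_launch_day: Optional[int]    # day of first app start, None if never started
    first_purchase_day: Optional[int]  # day of first purchase, None if none yet


def activated_by_launch(users: List[AppUser]) -> int:
    """Marketing-style definition: activated once the app has been started."""
    return sum(1 for u in users if u.first_launch_day is not None)


def activated_by_purchase(users: List[AppUser]) -> int:
    """Product-style definition: activated once a first purchase has been made."""
    return sum(1 for u in users if u.first_purchase_day is not None)


# Made-up sample data.
users = [
    AppUser("u1", first_launch_day=1, first_purchase_day=3),
    AppUser("u2", first_launch_day=2, first_purchase_day=None),
    AppUser("u3", first_launch_day=None, first_purchase_day=None),
    AppUser("u4", first_launch_day=5, first_purchase_day=5),
]

print("Activated (launch definition):  ", activated_by_launch(users))
print("Activated (purchase definition):", activated_by_purchase(users))
```

With this sample, the launch-based definition counts three activated apps and the purchase-based definition counts two, even though both departments read the same records. Recording the definition alongside the metric, as shared metadata, is one way to make such differences visible.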

Challenges in Data Engineering

Second, enterprises encounter challenges when data and the current business flow don’t form a complete value chain. In such a case, data engineering is required to solve the issue. These challenges include:

  • Lack of explicit data standards and specifications. Each department or system gives different definitions or descriptions of the same data and acquires data of varying quality, or even misses some data in acquisition, which burdens the data processing later.

  • Lack of explicit definitions about job functions and engineering of data. Data management work is assigned to people at random, typically IT personnel, data architects, data analysts, or data scientists. Also, there are instances when no specific rights and responsibilities are designated to those working with data. As a result, it becomes difficult to conduct continuous data management operation and form a closed loop.

  • The growing number of data application contexts, and the growing volume of data processed by various data applications, leads to redundant and ineffective data preparation and analysis, thus reducing the efficiency of delivering data applications.

Challenges in Data Science

Third, shifting practical issues to automatic decisions that can be supported by data also introduces challenges, which need to be solved by data science. These challenges include:

  • Shortage of data science professionals. It is quite difficult to apply the most cutting-edge data science technologies because there are few skilled practitioners in the field. McKinsey estimated that 190,000 additional data scientists would be needed in the United States by 2018, and that figure would be even bigger in China.

  • If the quality of data is unstable, it is difficult to see its value, even if the algorithms used on that data are in working order. According to an Experian Data Quality (EDQ) report, the biggest factors that affect data quality include incomplete or lost data, obsolete information, repeated data, inconsistent data, and flawed data (e.g., containing spelling errors). Solving these problems requires systematic consideration; stopgap measures alone are unlikely to be enough.

  • Enterprises are too eager for quick success and instant benefits to make long-term investments in the data field. Data science is never a cure-all and it is difficult for it to solve all problems in one stroke. In most cases, continuous investment is required. Gradual improvements should be made with algorithm optimization and iterative models that cover each link of data engineering, including data acquisition, organization, analysis, and action. Take the marketing and launching of applications for example. The audience for one round of the launch should be adjusted according to the results of the previous round. The launch process can be improved only after several rounds of iteration.
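
Several of the data quality factors listed above (incomplete data, obsolete information, and repeated data) can be checked mechanically. The following Python sketch is a minimal illustration under stated assumptions: the record layout, the field names (`customer_id`, `email`, `country`, `age_days`), and the 365-day staleness threshold are all hypothetical, and a production system would need far more systematic rules.

```python
from collections import Counter

# Hypothetical required fields; a real schema would define its own.
REQUIRED_FIELDS = ("customer_id", "email", "country")


def quality_report(records, max_age_days=365):
    """Count incomplete, obsolete, and duplicated records in a list of dicts."""
    # Incomplete: any required field is missing or empty.
    incomplete = sum(
        1 for r in records
        if any(r.get(field) in (None, "") for field in REQUIRED_FIELDS)
    )
    # Obsolete: last updated longer ago than the assumed threshold.
    obsolete = sum(1 for r in records if r.get("age_days", 0) > max_age_days)
    # Duplicates: extra records that share the same customer_id.
    id_counts = Counter(r["customer_id"] for r in records if r.get("customer_id"))
    duplicates = sum(count - 1 for count in id_counts.values() if count > 1)
    return {"incomplete": incomplete, "obsolete": obsolete, "duplicates": duplicates}


# Made-up sample records.
records = [
    {"customer_id": "c1", "email": "a@example.com", "country": "US", "age_days": 10},
    {"customer_id": "c1", "email": "a@example.com", "country": "US", "age_days": 10},
    {"customer_id": "c2", "email": "", "country": "CN", "age_days": 30},
    {"customer_id": "c3", "email": "c@example.com", "country": "DE", "age_days": 900},
]

report = quality_report(records)
print(report)
```

Running such a report on every load, rather than once, fits the iterative approach described above: each round of acquisition and analysis gives a new measurement that shows whether data quality is improving.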

Challenges in Technical Platform

Finally, the data management, data engineering, and data science teams also present a challenge to the technical platform. The challenges to the platform include:

  • Increasing scale and dimensions of data. In the past, the data acquired by enterprises was mainly derived from emails, web pages, call centers, and so on. Currently, data sources also include mobile phone applications, sensors (such as iBeacon), social media, VR/AR devices, automobiles, and smart home appliances. The data being obtained by enterprises is becoming more and more varied, and helps these organizations capture a huge amount of data of various dimensions.

  • Increasing data sources and types. In addition to traditional structured data, semi-structured data (such as JSON), unstructured data (such as videos, images, and texts), and streaming data (such as click logs on websites) should also be processed. In addition to the enterprise’s own data stored in internal CRM systems and public platforms such as WeChat, third-party data purchased by enterprises from the data trading market may also need to be processed.

  • Continuously changing data formats. This is the most common challenge in the current data ecology. For example, an upstream data provider may fail to notify all downstream data consumers when it adjusts a data format. Additionally, a change in data dimensions upon acquisition often causes problems. For instance, a particular sensor might be added to a newly released smartphone, which may require the addition of new fields to the collected data format.

  • As enterprises gradually shift their demands for data analytics from simple presentation to backend business support, there is an increasingly higher demand for real-time performance of the data platform. For example, many results of real-time data statistics now show changes in the real-time customer flow of apps or offline stores and tell us when there are the most visits or which public platform or store is the most active. Also, such results can be used to analyze the flow or number of clients at individual hours of a day. This is of great significance for the time management and resource allocation of websites.
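
The hourly customer flow statistic mentioned in the last point can be sketched in a few lines of Python. This is a batch illustration only, assuming visit events arrive as ISO 8601 timestamp strings; the function names and sample timestamps are hypothetical, and a real-time platform would compute the same counts incrementally over a stream.

```python
from collections import Counter
from datetime import datetime


def visits_per_hour(timestamps):
    """Count visits for each hour of the day (0 to 23) from ISO 8601 strings."""
    return Counter(datetime.fromisoformat(ts).hour for ts in timestamps)


def busiest_hour(timestamps):
    """Return the hour with the most visits; raises ValueError on empty input."""
    counts = visits_per_hour(timestamps)
    return max(counts, key=counts.get)


# Made-up visit events for one store or app on a single day.
visits = [
    "2024-01-01T09:15:00",
    "2024-01-01T09:45:00",
    "2024-01-01T13:05:00",
    "2024-01-01T18:30:00",
    "2024-01-01T18:40:00",
    "2024-01-01T18:55:00",
]

hourly = visits_per_hour(visits)
print(dict(hourly))
print("Busiest hour:", busiest_hour(visits))
```

In this sample the 18:00 hour has the most visits, which is the kind of result that can guide staffing or resource allocation for that time slot.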
