278 | Big Data Simplied
11.1 INTRODUCTION
In this book, along the way, we have seen a number of Big Data applications. There are applications
from a functional perspective in areas such as manufacturing, retail and finance. At the same time, we
can also look at applications from a technological perspective. We have seen how Big Data provides
an essential foundation for a number of emerging technologies, such as Analytics and Data Science, the
use of Recommendation Engines and also the Internet of Things (IoT). But why does a particular
organization need a Big Data program in the first place? It is extremely important to have a strong
business and technology reason to adopt and sustain a Big Data program.
Once the reason is in place, one needs to come up with the data lifecycle for the enterprise.
This is where one decides where data originates, who the producers and owners of the data are,
what the touchpoints of the data are as it flows through the enterprise (that is, which enterprise
applications use and modify that data) and, finally, who the consumers of the data are. This
provides the basis for defining the data and integration architectures for an enterprise, as well
as the processes around information management.
Finally, there is the question of which platform to choose. As we have seen, there are a number of
Big Data platforms. We need a clear understanding of the objective of the Big Data program, and of
the nature of the data and integration architecture of the enterprise, in order to make the tooling
choices that are best suited to a particular enterprise.
This chapter explores all of these concepts. It also examines why Big Data projects fail and the
common pitfalls to avoid.
11.2 TWO TYPICAL BIG DATA USE CASES
There are two broad use cases for Big Data adoption:
a. Big Data adoption for cost optimization
b. Big Data adoption for enhanced value
11.2.1 Big Data Primarily for Cost Reduction
As discussed in the initial chapters of this book, organizations have been generating, storing
and processing huge volumes of data for many years. Typically, this data is processed and stored
information about their customers, the products they manufacture or the services they offer, their
employees, their suppliers and vendors, the locations where they operate, and the transactions and
business operations involving all of these entities; the list is endless. Even before the emergence
of the Hadoop Distributed File System (HDFS), organizations traditionally maintained large
repositories of data.
This data is mostly structured data stored in relational databases, as explained in the early
chapters of the book while discussing the different types of data. There are also huge volumes of
structured data in data warehouses. We shall look into what a data warehouse is and how it differs
from the more modern concept of a data lake; for now, it is sufficient to understand that a data
warehouse is a storehouse of integrated data pouring in from data sources across different parts
of an enterprise. The data stored in a data warehouse is used primarily for reporting, analysis
and to support various processes related to making business decisions. Typically, the data is
gathered from transactional and operational sources.
Now, as the volume of data gradually increases in data warehouses or in offline archives, the
storage costs in traditional infrastructure also tend to increase. Also, as explained earlier,
adding more scale to the infrastructure in the form of storage, memory or processing capacity,
which we call vertical scaling, does not necessarily improve performance in the same proportion.
So, there is a limit to how much one can scale vertically. Therefore, one looks at horizontal
scaling, where the data is spread across the nodes of a cluster, and each node is nothing but
low-cost commodity hardware.
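To make the idea of horizontal scaling concrete, the following is a minimal, purely conceptual Python sketch (not how HDFS itself places blocks) in which a batch of records is hash-partitioned across a handful of hypothetical commodity nodes. The node names and record keys are assumptions made only for illustration.

# Conceptual sketch: records are hash-partitioned across commodity nodes,
# so capacity grows by adding nodes rather than by upgrading one machine.
import hashlib
from collections import defaultdict

nodes = ["node-01", "node-02", "node-03", "node-04"]  # hypothetical low-cost machines

def assign_node(record_key, cluster):
    """Map a record to a node by hashing its key (simplified, no replication)."""
    digest = hashlib.md5(record_key.encode("utf-8")).hexdigest()
    return cluster[int(digest, 16) % len(cluster)]

# Spread a batch of records horizontally across the cluster.
placement = defaultdict(list)
for record_id in (f"txn-{i:06d}" for i in range(1000)):
    placement[assign_node(record_id, nodes)].append(record_id)

for node, records in sorted(placement.items()):
    print(f"{node}: {len(records)} records")

The point of the sketch is only that capacity grows by adding machines to the list and redistributing the same data over them, rather than by making any single machine larger.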
A large percentage of Big Data applications actually start from the above motivation of cost
reduction and performance optimization for already existing and constantly growing volumes of
data in an enterprise.
Let us look at some of the applications of this use case (Figure 11.1):
a. Avoid Purging of Historical Data: Huge volumes of data can be moved to offline archives in
a Hadoop infrastructure. Such data is usually rarely used, but it provides context and history.
Had there been no option of archiving huge volumes of historical data offline in a cost-optimized
Hadoop infrastructure, a lot of this data, beyond a certain limit, might have had to be purged.
(A sketch of such an archival step follows this list.)
b. More Effective Use of High-value, High-performance Production Systems: A number of processes
around the maintenance and archival of huge volumes of data, often historical data, can now be
off-loaded to the Hadoop infrastructure. The organization can then reserve its high-performance
production systems for high-value functions such as Analytics and Data Science.
c. Extend the Value of the Enterprise Data Warehouse: By incorporating Hadoop in the enterprise
landscape, an organization can now harness the value in new and additional sources of data, such
as data procured from websites, social media or machines (IoT data). This increases the value
proposition of the traditional data warehouse by adding more context from external sources, and
it increases the effectiveness of business decisions made on the basis of such reporting and
analysis.
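As a concrete illustration of point (a) above, the following hedged sketch copies a cold, yearly export file from local storage into an HDFS archive directory using the pyarrow library. The namenode host, user name, file names and paths are assumptions made for illustration, and the environment is assumed to have the Hadoop client libraries (libhdfs) configured.

# Hedged sketch: offload a rarely used historical export to an HDFS archive.
from pyarrow import fs

local = fs.LocalFileSystem()
hdfs = fs.HadoopFileSystem(host="namenode-host", port=8020, user="archive")  # assumed cluster

# Create a year-partitioned archive directory and copy the cold data into it.
hdfs.create_dir("/archive/orders/year=2015", recursive=True)
fs.copy_files(
    "exports/orders_2015.parquet",                       # hypothetical local export
    "/archive/orders/year=2015/orders_2015.parquet",     # hypothetical archive path
    source_filesystem=local,
    destination_filesystem=hdfs,
)

Once archived in this way, the data remains available when it is needed for context or history, for example through Hive or Spark tables defined over the archive directory, while no longer occupying expensive warehouse storage.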
FIGURE 11.1 Big Data implementation use case 1: cost optimization
[Figure: Scenario 1, Big Data for cost optimization, branching into three applications: maintain huge volumes of historical data; use high-value, high-performing production systems more effectively; extend the value of the traditional enterprise data warehouse.]