CHAPTER 10

Big Data Case Studies

This chapter examines how the patterns discussed in previous chapters can be applied to business problems in different industries. To solve a given business problem, architects combine patterns across the layers of the application architecture, choosing them according to the specific requirements and priorities of that problem. The following case studies show how architects combine patterns to solve particular business problems.

Case Study: Mainframe to Hadoop-Based NoSQL Database

Problem

A financial organization’s current data warehouse solution is based on a legacy mainframe platform. This solution is becoming very expensive as more and more data is generated every day. Moreover, because the data is stored in legacy database formats (such as IMS and IDMS), it is not easy to transform it and merge it with other data sources in the enterprise for joint analytical processing. The CIO is looking for a less expensive, more modern platform.

Solution

The CIO concluded that migrating the legacy data to a NoSQL-based platform (such as HP Vertica) would provide the following benefits:

A higher level of data compression, providing lower storage costs and improved performance
A native data load option, avoiding the need to use a third-party ELT tool
Easier integration
Better co-analysis of data from multiple data sources in the organization
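Much of this compression benefit typically comes from column-oriented storage: a sorted column tends to contain long runs of repeated values, which encodings such as run-length encoding (RLE) collapse into (value, count) pairs. The minimal Python sketch below illustrates the idea on a hypothetical `branch` column. It is not Vertica's on-disk format, only a demonstration of why sorted columnar data compresses well.

```python
from itertools import groupby

def run_length_encode(values):
    """Collapse consecutive repeated values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]

def run_length_decode(pairs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [value for value, count in pairs for _ in range(count)]

# Hypothetical sorted 'branch' column: nine stored values.
branch_column = ["LON"] * 3 + ["NYC"] * 4 + ["SGP"] * 2

encoded = run_length_encode(branch_column)
print(encoded)  # [('LON', 3), ('NYC', 4), ('SGP', 2)]
assert run_length_decode(encoded) == branch_column  # lossless round trip
```

Nine stored values become three pairs. In a row-oriented layout, the same values would be interleaved with the other columns of each record, so runs like these would not form.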

Figure 10-1 shows the patterns implemented in migrating to a NoSQL platform.

Figure 10-1. NoSQL migration architecture

Examples of technologies used include the following:

HP Vertica
vsql (for native ELT: extract, load, transform)
AutoSys (for scheduling)
Unix Shell/Perl scripting

Table 10-1. Patterns implemented in the Mainframe to Hadoop case study

Pattern Type	Pattern Name
Big data storage pattern	NoSQL Pattern
Ingestion and streaming pattern	Just-In-Time Transformation Pattern
Analysis and visualization pattern	Compression Pattern
Big data access pattern	Stage Transform Pattern
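The Stage Transform approach behind the native ELT option can be sketched in two steps. First, land the raw extract unchanged in a staging table. Second, transform it with SQL inside the database engine instead of in an external ETL tool. The sketch below uses Python's built-in `sqlite3` module as a stand-in for Vertica and vsql. The fixed-width record layout, the table names, and the `load_and_transform` function are hypothetical.

```python
import sqlite3

# Hypothetical fixed-width extract: 4-char account id, 3-char branch, 10-digit amount in cents.
RAW_RECORDS = [
    "0001NYC0000012550",
    "0002LON0000003000",
    "0003NYC0000000999",
]

def load_and_transform(records):
    """ELT sketch: load raw lines into a staging table, then transform with SQL in the engine."""
    conn = sqlite3.connect(":memory:")
    try:
        # Load: land the records unchanged, with no parsing outside the database.
        conn.execute("CREATE TABLE stage_raw (line TEXT)")
        conn.executemany("INSERT INTO stage_raw (line) VALUES (?)", [(r,) for r in records])
        # Transform: parse the fixed-width fields and aggregate per branch in SQL.
        conn.execute(
            """
            CREATE TABLE branch_totals AS
            SELECT substr(line, 5, 3) AS branch,
                   SUM(CAST(substr(line, 8, 10) AS INTEGER)) / 100.0 AS total_amount
            FROM stage_raw
            GROUP BY substr(line, 5, 3)
            """
        )
        return conn.execute(
            "SELECT branch, total_amount FROM branch_totals ORDER BY branch"
        ).fetchall()
    finally:
        conn.close()

print(load_and_transform(RAW_RECORDS))
```

Keeping the raw staging table also makes it possible to re-run or change the transformation later without going back to the mainframe extract.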

Case Study: Geo-Redundancy and Near-Real-Time Data Ingestion

Problem

A high-tech organization has multiple applications spread geographically across multiple data centers. All application usage logs have to be synchronized with every data center for near-real-time analysis. The current RDBMS implementation can replicate across data centers, but it is very expensive, and the cost grows as more data accumulates every day. What cost-efficient solution would enable active-active, geo-redundant ingestion across data centers, support failover, and provide near-real-time access to the data?

Solution

The big data architects chose an open-source (hence low-cost) NoSQL-based platform (such as Cassandra) that can be configured for fast data synchronization and replication across data centers, high availability, and a high level of data compression for lower storage costs and improved performance. This solution supports very high, terabyte-scale ingestion rates across data centers.
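In Cassandra, cross-data-center replication is configured per keyspace with `NetworkTopologyStrategy`, which sets a replica count for each data center. Clients writing at the `LOCAL_QUORUM` consistency level wait only for a majority of replicas in their own data center. Each site can therefore accept writes (active-active), and the write is still sent to the other data centers' replicas without waiting for them. The Python sketch below builds such a CQL statement and computes the local quorum size. The keyspace name, data-center names, and replica counts are assumptions for illustration, and the data-center names would have to match those reported by the cluster's snitch.

```python
def create_keyspace_cql(keyspace, replicas_per_dc):
    """Build a CQL statement replicating a keyspace to each listed data center."""
    options = ", ".join(f"'{dc}': {rf}" for dc, rf in sorted(replicas_per_dc.items()))
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {keyspace} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', {options}}};"
    )

def local_quorum(replication_factor):
    """Number of local-DC replicas that must acknowledge a LOCAL_QUORUM request."""
    return replication_factor // 2 + 1

# Hypothetical keyspace and data-center names; three replicas in each data center.
replicas = {"dc_east": 3, "dc_west": 3}
print(create_keyspace_cql("app_usage_logs", replicas))
for dc, rf in sorted(replicas.items()):
    quorum = local_quorum(rf)
    print(f"{dc}: LOCAL_QUORUM needs {quorum} of {rf} replicas, tolerating {rf - quorum} down")
```

With three replicas per site, each data center can lose one replica and still serve `LOCAL_QUORUM` requests. If an entire site is lost, clients in the surviving site can continue at `LOCAL_QUORUM`.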

Figure 10-2 shows the patterns implemented in changing to a geo-redundant NoSQL-based platform.

Figure 10-2. Geo-redundancy architecture

Table 10-2. Patterns implemented in the Geo-Redundancy case study

Pattern Type	Pattern Name
Big data storage pattern	NoSQL Pattern
Ingestion and streaming pattern	Real-Time Streaming Pattern

Case Study: Recommendation Engine

Problem

An organization has an existing recommendation engine, but it is looking for a higher-performing recommendation engine and reporting tool that can handle its increasing data volumes. The existing implementation works on only a subset of the total data and hence fails to generate optimal recommendations. What high-performing recommendation engine could process the full current volume of data and scale to accommodate future load increases?

Solution

The organization’s combined solution was to move to a Hadoop-based storage mechanism (providing increased capacity), a NoSQL Cassandra database for real-time log processing (providing higher-speed data access), and an R-based solution for machine learning.
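The benefit of scoring against the complete history rather than a subset shows up even in a very simple recommender. The sketch below uses item co-occurrence counting, a basic form of item-based collaborative filtering. It is written in Python rather than R and is a hypothetical stand-in, not the organization's model. The user histories are made up.

```python
from collections import defaultdict
from itertools import combinations

def build_cooccurrence(histories):
    """Count how often each pair of items appears in the same user's history."""
    counts = defaultdict(lambda: defaultdict(int))
    for items in histories.values():
        for a, b in combinations(sorted(set(items)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts

def recommend(user, histories, counts, top_n=3):
    """Rank unseen items by their total co-occurrence with the user's items."""
    seen = set(histories[user])
    scores = defaultdict(int)
    for item in seen:
        for other, count in counts[item].items():
            if other not in seen:
                scores[other] += count
    # Highest score first; ties broken alphabetically for a stable result.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]

# Hypothetical purchase histories covering all users, not a sample.
histories = {
    "u1": ["tv", "soundbar", "hdmi_cable"],
    "u2": ["tv", "soundbar"],
    "u3": ["tv", "wall_mount"],
    "u4": ["laptop", "mouse"],
}
counts = build_cooccurrence(histories)
print(recommend("u2", histories, counts))  # ['hdmi_cable', 'wall_mount']
```

If u3's history were left out of the input, `wall_mount` would never be recommended to u2. This is the subset problem from the case study in miniature.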

Figure 10-3 shows the patterns implemented to enable real-time streaming for machine learning.

Figure 10-3. Real-time streaming for machine-learning architecture

Table 10-3. Pattern implemented for the Recommendation Engine case study

Pattern Type	Pattern Name
Ingestion and streaming pattern	Real-Time Streaming Pattern

Examples of technologies used include the following:

Cassandra
HDFS, Hive, HBase, Pig
MapR

Case Study: Video-Streaming Analytics

Problem

A telecommunication organization needs a solution for analyzing customer behavior and viewing patterns in advance of a rollout of video-over-IP (VOIP) offerings. The logs have to be compared with existing region-specific and feature-specific system data spread across multiple applications. Because the volume of data is already huge and the VOIP log data will add many terabytes, the organization is looking for a robust solution that applies across all devices and systems.

Solution

The CTO chose a Hadoop-based big data implementation capable of storing and analyzing the huge volume of raw system data and scaling up to accommodate the VOIP metadata. The result was a consolidated log-access, log-parsing, and analysis platform that transforms data using Pig, stores data in HDFS and the NoSQL database MongoDB, and incorporates machine-learning tools for analytics.
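The log-parse and transform step amounts to filtering out malformed records and then grouping and summing. In Pig, this would roughly be a `LOAD`, `FILTER`, `GROUP`, and `FOREACH ... GENERATE SUM(...)` pipeline. The plain-Python sketch below shows the same logic on a few in-memory lines. The tab-separated log layout (time, region, feature, minutes watched) is an assumption, not the organization's actual format.

```python
from collections import Counter

# Hypothetical log layout: time, region, feature, minutes watched (tab-separated).
LOG_LINES = [
    "20:15\twest\tlive_tv\t42",
    "20:20\teast\tvod\t95",
    "20:31\twest\tlive_tv\t18",
    "corrupted line",
    "21:02\twest\tvod\t60",
]

def parse(line):
    """Return (region, feature, minutes), or None for a malformed record."""
    parts = line.split("\t")
    if len(parts) != 4 or not parts[3].isdigit():
        return None
    _, region, feature, minutes = parts
    return region, feature, int(minutes)

def minutes_by_region_feature(lines):
    """Group parsed records by (region, feature) and sum the minutes watched."""
    totals = Counter()
    for line in lines:
        record = parse(line)
        if record is not None:  # the FILTER step: drop malformed lines
            region, feature, minutes = record
            totals[(region, feature)] += minutes
    return dict(totals)

print(minutes_by_region_feature(LOG_LINES))
```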

Figure 10-4 shows the patterns implemented to enable video-streaming analytics.

Figure 10-4. Video analytics architecture

Table 10-4. Pattern implemented for the Video Analytics case study

Pattern Type	Pattern Name
Ingestion and streaming pattern	Real-Time Streaming Pattern

Examples of technologies used include the following:

Hadoop
Python
Memcache
Jetty, Apache
Web/Mobile Dashboards/Analytics
Amazon EMR

Case Study: Sentiment Analysis and Log Processing

Problem

An ecommerce organization experienced system failures and data inconsistencies during the holiday season. Major issues included penalties tied to performance-based service-level agreements (SLAs). The organization is looking for a new platform that can handle the holiday-season load, help it avoid penalties, and ensure customer satisfaction.

Solution

The company decided to set up a big data platform with Hadoop and Hive to enable analysis of both historical and real-time web and application server logs. The platform combined a NoSQL-based solution (such as MongoDB) for analyzing the application logs, an R-based machine-learning engine, and a visualization tool (such as Tableau). Together they provided better visibility into requests, faster resolution of defects, reduced downtime, and better customer satisfaction.
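Log analysis for performance-based SLAs typically reduces to per-interval metrics such as error rate and high-percentile latency. The sketch below computes both metrics per minute from already-parsed access-log records and reports the minutes that exceed the limits. The record layout, the 5% error-rate limit, and the 1,000 ms p95 limit are hypothetical, since the case study does not state the actual SLA terms.

```python
import math
from collections import defaultdict

# Hypothetical parsed access-log records: (minute, HTTP status, response time in ms).
RECORDS = [
    ("12:00", 200, 120), ("12:00", 200, 180), ("12:00", 500, 2500), ("12:00", 200, 90),
    ("12:01", 200, 110), ("12:01", 200, 130), ("12:01", 200, 140), ("12:01", 200, 95),
]

def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

def sla_breaches(records, max_error_rate=0.05, max_p95_ms=1000):
    """Return the minutes whose 5xx error rate or p95 latency exceeds the assumed limits."""
    by_minute = defaultdict(list)
    for minute, status, latency in records:
        by_minute[minute].append((status, latency))
    breaches = {}
    for minute, rows in sorted(by_minute.items()):
        errors = sum(1 for status, _ in rows if status >= 500)
        error_rate = errors / len(rows)
        p95 = percentile([latency for _, latency in rows], 95)
        if error_rate > max_error_rate or p95 > max_p95_ms:
            breaches[minute] = {"error_rate": error_rate, "p95_ms": p95}
    return breaches

print(sla_breaches(RECORDS))  # {'12:00': {'error_rate': 0.25, 'p95_ms': 2500}}
```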

Figure 10-5 shows the patterns implemented to enable scalable sentiment analysis and log processing.

Figure 10-5. Sentiment-analysis and log-processing architecture

Table 10-5. Patterns implemented for the Sentiment Analysis case study

Pattern Type	Pattern Name
Ingestion and streaming pattern	Real-Time Streaming Pattern
Big data analysis and visualization pattern	Zoning Pattern, Compression Pattern
Big data access pattern	Stage Transform Pattern
Big data storage pattern	NoSQL Pattern

Examples of technologies used include the following:

HDFS, Hive, HBase
NoSQL: MongoDB
R
Log Data Processing
MapReduce
Compuware DynaTrace
Data Analytics – Tableau

Case Study: Real-Time Traffic Monitoring

Problem

An organization wants to create a real-time traffic analysis and prediction application that can be used to control traffic congestion and streamline traffic flow. The application must help commuters reduce commuting costs and waiting times, and it must help lower pollution levels.

Data has to be captured from existing government-provided datasets drawn from sources such as traffic cameras, traffic sensors, GPS, and weather-prediction systems. The government data needs to be combined with social media data to help predict traffic speed and volume on roads.

The analysis scenarios include the following:

Analysis of historical data to gain insights and understand patterns of behavior of traffic and road incidents

Prediction of traffic speed and volume well ahead of time, based on analysis of real-time and historical traffic data

Prediction of alternate cost-effective commute paths by analyzing situational traffic conditions across the entire transportation network
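Finding an alternate commute path is a shortest-path problem once each road segment carries a predicted travel time. The sketch below uses Dijkstra's algorithm, a standard choice for non-negative edge weights. This is not necessarily the method the organization used. The four-node road network and its travel times are hypothetical.

```python
import heapq

def fastest_route(graph, source, target):
    """Dijkstra's algorithm; edge weights are predicted travel times in minutes."""
    dist = {source: 0.0}
    previous = {}
    queue = [(0.0, source)]
    visited = set()
    while queue:
        d, node = heapq.heappop(queue)
        if node in visited:
            continue  # stale queue entry
        visited.add(node)
        if node == target:
            break
        for neighbor, minutes in graph.get(node, {}).items():
            candidate = d + minutes
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                previous[neighbor] = node
                heapq.heappush(queue, (candidate, neighbor))
    if target not in dist:
        return None, float("inf")  # target unreachable
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return list(reversed(path)), dist[target]

# Hypothetical road network; weights are predicted minutes for the next interval.
free_flow = {"A": {"B": 5, "C": 9}, "B": {"D": 6}, "C": {"D": 4}, "D": {}}
congested = {"A": {"B": 5, "C": 9}, "B": {"D": 20}, "C": {"D": 4}, "D": {}}
print(fastest_route(free_flow, "A", "D"))   # (['A', 'B', 'D'], 11.0)
print(fastest_route(congested, "A", "D"))   # (['A', 'C', 'D'], 13.0)
```

When predicted congestion raises the B-to-D travel time, the same query returns the alternate path through C.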

The application needs to provide a catalog of services based on social media, governmental data, and different dataset options.

Solution

The organization decided to set up a big data platform using Hadoop, HP Vertica as an abstraction layer over the data in HDFS, and a visualization tool. It opted to use cloud-based Amazon Web Services (AWS) for storage and analytics.

Multiple patterns are applied at various layers of the architecture to enable real-time traffic monitoring, as depicted in Figure 10-6.

Figure 10-6. Traffic-monitoring architecture

Table 10-6. Patterns implemented for Traffic Monitoring

Pattern Type	Pattern Name
Ingestion and streaming pattern	Real-Time Streaming Pattern
Big data analysis and visualization pattern	Zoning Pattern, Compression Pattern
Big data access pattern	Service Locator Pattern
Big data storage pattern	NoSQL Pattern
Non-functional requirement (NFR) patterns	Distributed Search Optimization Access Pattern
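In the Service Locator pattern, consumers ask a central registry for a service by logical name instead of hard-coding endpoints. This suits the catalog of social media, governmental, and dataset services that the application must expose. The minimal Python sketch below caches lookups and invalidates a cached entry when a service is re-registered. The `ServiceLocator` class, the service names, and the example.com URLs are hypothetical.

```python
class ServiceLocator:
    """Resolve logical service names to endpoints from a registry, caching each lookup."""

    def __init__(self, registry):
        self._registry = registry  # maps service name -> endpoint URL
        self._cache = {}

    def locate(self, service_name):
        """Return the endpoint for service_name, raising LookupError if unregistered."""
        if service_name not in self._cache:
            if service_name not in self._registry:
                raise LookupError(f"no service registered as {service_name!r}")
            self._cache[service_name] = self._registry[service_name]
        return self._cache[service_name]

    def register(self, service_name, endpoint):
        """Add or replace a service and drop any stale cached endpoint."""
        self._registry[service_name] = endpoint
        self._cache.pop(service_name, None)

# Hypothetical catalog entries for the traffic application's services.
catalog = {
    "traffic.speed.realtime": "https://api.example.com/traffic/speed",
    "weather.forecast": "https://api.example.com/weather/forecast",
    "social.mentions": "https://api.example.com/social/mentions",
}
locator = ServiceLocator(dict(catalog))
print(locator.locate("traffic.speed.realtime"))
```

Here the registry is a plain dictionary. In a real deployment, the lookup would typically be a call to a catalog or discovery service, which is what makes the cache worthwhile.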

Examples of technologies used include the following:

Hadoop
HP Vertica
Web/Mobile Dashboards/Analytics
Amazon Web Services

Case Study: Data Exploration for Suspicious Behavior on a Stock Exchange

Problem

A financial organization processes millions of order entries per day. Whenever online statistical surveillance models identify suspicious behavior, the organization wants to have enhanced capability to gather data pertinent to the suspicious behavior as quickly and cheaply as possible.
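The case study does not say which statistical surveillance models are used. As a hypothetical stand-in, the sketch below flags an account when today's order count is more than a chosen number of standard deviations above that account's own history (a z-score test). The account IDs, counts, and the threshold of 3.0 are illustrative assumptions.

```python
from statistics import mean, pstdev

def flag_suspicious(history, today, threshold=3.0):
    """Return {account: z_score} for accounts whose order count today is unusually high."""
    flagged = {}
    for account, counts in history.items():
        if account not in today or len(counts) < 2:
            continue  # need today's value and some history to compare against
        mu, sigma = mean(counts), pstdev(counts)
        if sigma == 0:
            continue  # perfectly constant history: z-score undefined
        z = (today[account] - mu) / sigma
        if z > threshold:
            flagged[account] = round(z, 2)
    return flagged

# Hypothetical daily order counts per account over the previous five days.
history = {
    "ACC1": [100, 110, 90, 105, 95],
    "ACC2": [50, 55, 45, 52, 48],
}
today = {"ACC1": 104, "ACC2": 120}
print(flag_suspicious(history, today))
```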

The solution needs to be able to do the following:

Integrate social media data with historical orders and trades
Gather information from other sources within the organization
Present this information in an integrated fashion
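Once an alert fires, gathering and integrating the relevant data can be as simple as selecting orders and social media posts for the same symbol within a time window around the alert, then merging them into one chronological timeline. The sketch below shows that step with in-memory records. The record fields, the 30-minute window, and the `gather_context` function are hypothetical.

```python
from datetime import datetime, timedelta

def gather_context(alert_time, symbol, orders, posts, window=timedelta(minutes=30)):
    """Merge orders and social posts for `symbol` within +/- window of the alert, oldest first."""
    start, end = alert_time - window, alert_time + window
    timeline = [
        (o["time"], "order", o) for o in orders
        if o["symbol"] == symbol and start <= o["time"] <= end
    ]
    timeline += [
        (p["time"], "post", p) for p in posts
        if symbol in p["tickers"] and start <= p["time"] <= end
    ]
    timeline.sort(key=lambda entry: entry[0])  # chronological order across both sources
    return timeline

# Hypothetical records; "XYZ" and "ABC" are placeholder symbols.
alert = datetime(2024, 1, 15, 14, 0)
orders = [
    {"time": datetime(2024, 1, 15, 13, 50), "symbol": "XYZ", "side": "BUY", "qty": 50000},
    {"time": datetime(2024, 1, 15, 11, 0), "symbol": "XYZ", "side": "BUY", "qty": 100},
    {"time": datetime(2024, 1, 15, 13, 55), "symbol": "ABC", "side": "SELL", "qty": 200},
]
posts = [
    {"time": datetime(2024, 1, 15, 13, 45), "tickers": {"XYZ"}, "text": "Big news coming for $XYZ"},
]
for when, kind, record in gather_context(alert, "XYZ", orders, posts):
    print(when.isoformat(), kind, record)
```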

Solution

The lead architect applied the patterns shown in Figure 10-7. The solution is based on Hadoop, Storm, Flume, and IBM Netezza, with DataStax Cassandra as the NoSQL database that enables real-time analysis.

Figure 10-7 shows the patterns implemented to enable data forensics on a stock exchange.

Figure 10-7. Data forensics on a stock exchange

Table 10-7. Patterns implemented for the Data Forensics case study

Pattern Type	Pattern Name
Ingestion and streaming pattern	Real-Time Streaming Pattern
Big data analysis and visualization pattern	Zoning Pattern, Compression Pattern
Big data access pattern	Service Locator Pattern
Big data storage pattern	NoSQL Pattern

Examples of technologies used include the following:

Hadoop
IBM Netezza
DataStax Cassandra
Tableau
R

Case Study: Environment Change Detection

Problem

An institute wants to build an application that detects environmental changes to water resources in real time. The application has to source data from multiple data sources (such as sensor and meteorological sources) hosted in various environmental institutes and government departments. The data has to be presented to scientists and energy analysts for real-time monitoring of the water resources and environmental data.
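Detecting a real-time change in a sensor stream, such as a shift in water level, is commonly done with a sequential test like the cumulative sum (CUSUM) method. CUSUM accumulates deviations from an expected level and signals when they exceed a threshold. The Python sketch below is a stand-in for logic that would run inside a stream-processing engine such as IBM InfoSphere Streams. The target level, slack, threshold, and readings are hypothetical.

```python
def cusum_changes(readings, target, slack, threshold):
    """Two-sided CUSUM; returns (index, 'rise' | 'drop') wherever a sustained shift is detected."""
    high = low = 0.0
    changes = []
    for i, x in enumerate(readings):
        # Accumulate only deviations larger than the slack, never going below zero.
        high = max(0.0, high + (x - target - slack))
        low = max(0.0, low + (target - x - slack))
        if high > threshold:
            changes.append((i, "rise"))
            high = low = 0.0  # reset after an alarm
        elif low > threshold:
            changes.append((i, "drop"))
            high = low = 0.0
    return changes

# Hypothetical water-level readings in meters, nominally around 2.0 m.
levels = [2.01, 1.98, 2.02, 2.00, 2.25, 2.30, 2.28, 2.31]
print(cusum_changes(levels, target=2.0, slack=0.05, threshold=0.5))  # [(6, 'rise')]
```

The slack keeps ordinary noise from accumulating, so the small fluctuations in the first four readings never trigger an alarm. The sustained rise is flagged after three elevated readings.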

Solution

The CTO chose an all-IBM big data platform, using IBM BigInsights, IBM InfoSphere Streams, and IBM Vivisimo to implement the patterns shown in Figure 10-8.

Figure 10-8 shows the patterns implemented to enable environment change detection.

Figure 10-8. Environment change prediction

Table 10-8. Patterns implemented in Environment Change Prediction

Pattern Type	Pattern Name
Ingestion and streaming pattern	Real-Time Streaming Pattern
Ingestion and streaming pattern	Just-In-Time Transformation Pattern
Analysis and visualization pattern	Compression Pattern
Big data access pattern	Stage Transform Pattern

Examples of technologies used include the following:

IBM Vivisimo
IBM BigInsights
IBM Cognos

Summary

A multitude of practical business, academic, financial, and scientific problems can be solved using big data architectures. The patterns described in this book can be applied at every layer of your big data architecture. The rapid pace of advances in tools and products ensures the continual emergence of new patterns, new variants of existing patterns, and new combinations of patterns in increasingly industrialized, out-of-the-box solutions.
