Chapter 7. Integrating with Real-Time Response

GemFire-Greenplum Connector

We designed Greenplum to provide analytic insights into large amounts of data. We did not design it for real-time response. Yet, many real-world problems require a system that does both. At Pivotal, we use GemFire for real-time requirements and the GemFire-Greenplum Connector to integrate the two.

Problem Scenario: Fraud Detection

As more businesses interact with their customers digitally, ensuring trustworthiness takes on a critical role. More than 17 million Americans were victims of identity theft in 2014, the latest year for which statistics are available. Fraudulent transactions stemming from identity theft—fraudulent credit card purchases, insurance claims, tax refunds, telecom services, and so on—cost businesses and consumers more than $15 billion that year, according to the Department of Justice’s Bureau of Justice Statistics.

Detecting and stopping fraudulent transactions related to identity theft is a top priority for many banks, credit card companies, insurers, and tax authorities, as well as digital businesses across a variety of industries. Building these systems typically relies on a multistep process that includes the difficult work of moving data in multiple formats between analytical systems, which are used to build and run predictive models, and transactional systems, where incoming transactions are scored for the likelihood of fraud. Analytical and transactional systems serve different purposes and, not surprisingly, often store data in different formats, each fit for its purpose. This makes sharing data between systems a challenge for data architects and engineers. The challenge is an unavoidable trade-off, given that using a single system to perform two very different tasks at scale is often a poor design choice.

Supporting the Fraud Detection Process

Greenplum’s horizontal scalability and rich analytics libraries (MADlib, PL/R, and so on) help teams quickly iterate on anomaly detection models against massive datasets. Using those models to catch fraud in real time, however, requires embedding them in an application. Depending on the velocity of data ingested through that application, a “fast data” solution might be required to classify each transaction as fraudulent or not in a timely manner. This activity involves a small dataset and a real-time response that must be informed by the deep analytics performed in Greenplum. This is where Pivotal GemFire, a Java-based transactional in-memory data grid, supports fraud detection efforts, as well as other use cases such as risk management.

Problem Scenario: Internet of Things Monitoring and Failure Prevention

Increasingly, automobiles, manufacturing processes, and heavy-duty machinery are instrumented with a profusion of sensors. At Pivotal, the data science team has worked with customers to use historical sensor data to build failure prediction and avoidance models in the Greenplum database. As well tuned as these models are, Greenplum is not built to ingest new data quickly and respond to it in subsecond time. Suppose, for example, that a certain combination of pressure, temperature, and observed faults indicates that a manufacturing process is going awry. An operator or an automated system must then intervene quickly to prevent serious loss of material, property, or even life.

For situations like these, disk-centric technologies are simply too slow; in-memory techniques are the only option that can deliver the required performance. Pivotal solves this problem with GemFire, an in-memory data grid.

What Is GemFire?

GemFire is an in-memory data grid based on the open source Apache Geode project. Java objects are stored in memory spread across a cluster of servers so that data can be ingested and retrieved at in-memory speeds, several orders of magnitude faster than disk-based storage and retrieval. GemFire is designed for very-high-speed transactional workloads. Greenplum is not designed for that kind of workload, and thus the two in tandem solve business problems that combine the need for both deep analytics and low-latency response times.

The GemFire-Greenplum Connector

GemFire-Greenplum Connector (GGC) is an extension package built on top of GemFire that maps rows in Greenplum tables to plain old Java objects (POJOs) in GemFire regions. With the GGC, the contents of Greenplum tables now can be easily loaded into GemFire, and entire GemFire regions likewise can be easily consumed by Greenplum. The upshot is that data architects no longer need to spend time hacking together and maintaining custom code to connect the two systems.
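The row-to-object mapping described above can be illustrated with a small, self-contained sketch. It does not use the GGC or GemFire APIs: the `AccountRisk` class, the column names, and the plain dictionary standing in for a region are all hypothetical, chosen only to show the idea of keying one object per table row and flattening the objects back into rows.

```python
from dataclasses import dataclass, fields


@dataclass
class AccountRisk:
    """Hypothetical value object; stands in for a POJO stored in a region."""
    account_id: int
    risk_score: float
    segment: str


def rows_to_region(columns, rows, key_column, cls):
    """Map each table row to an object and key it by one column.

    The returned dict is only a stand-in for a region (a key/value store).
    """
    wanted = [f.name for f in fields(cls)]
    region = {}
    for row in rows:
        record = dict(zip(columns, row))
        region[record[key_column]] = cls(**{name: record[name] for name in wanted})
    return region


def region_to_rows(region, columns):
    """Reverse direction: flatten the stored objects back into table rows."""
    return [tuple(getattr(obj, c) for c in columns) for obj in region.values()]


# Hypothetical table contents (column names are assumptions).
columns = ("account_id", "risk_score", "segment")
rows = [(101, 0.92, "retail"), (102, 0.07, "retail")]

region = rows_to_region(columns, rows, "account_id", AccountRisk)
```

In the real connector this mapping is configured rather than hand-written; the sketch only shows why a table with a natural key lines up so easily with a key/value region.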

GGC functions as a bridge for bidirectionally loading data between Greenplum and GemFire, allowing architects to take advantage of the power of two independently scalable MPP data platforms while greatly simplifying their integration. GGC uses Greenplum’s external table mechanisms (described in Chapter 3) to transfer data between all segments in the Greenplum cluster and all of the GemFire servers in parallel, preventing any single-point bottleneck in the process.
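To see why a segment-to-server transfer avoids a single-point bottleneck, consider the following toy model. It is not how GGC is implemented: the `Server` class, the modulo-based `owner` function, and the thread pool standing in for Greenplum segments are assumptions made purely for illustration. Each segment batches its own rows by destination and sends them directly, so no coordinator ever handles the full dataset.

```python
from concurrent.futures import ThreadPoolExecutor
from threading import Lock


class Server:
    """Hypothetical in-memory server holding one partition of a region."""

    def __init__(self):
        self.data = {}
        self._lock = Lock()

    def put_all(self, entries):
        # Several segments may write to the same server concurrently.
        with self._lock:
            self.data.update(entries)


def owner(key, n_servers):
    """Assumed placement rule: integer keys are spread by modulo."""
    return key % n_servers


def export_segment(segment_rows, servers):
    """Run by each segment: batch rows per server, then send directly."""
    batches = [{} for _ in servers]
    for key, value in segment_rows:
        batches[owner(key, len(servers))][key] = value
    for server, batch in zip(servers, batches):
        if batch:
            server.put_all(batch)
    return len(segment_rows)


# Three segments, each holding its own slice of a 12-row table.
segments = [[(k, f"row-{k}") for k in range(start, 12, 3)] for start in range(3)]
servers = [Server(), Server()]

# All segments export at the same time; none waits on a central node.
with ThreadPoolExecutor(max_workers=len(segments)) as pool:
    sent = list(pool.map(lambda seg: export_segment(seg, servers), segments))
```

Because both sides are partitioned, adding segments or servers adds transfer paths instead of lengthening a single queue.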

In fraud-detection scenarios, this means that it is now seamless to move the results of predictive models from Greenplum to GemFire via an API. After the scores are applied to incoming transactions, those transactions deemed most likely to be fraudulent can be presented to investigators for further review. When cases are resolved, the results—whether the transaction or claim was fraudulent—can be easily moved back to Greenplum from GemFire to continuously improve the accuracy of the predictive models.
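The feedback loop in this scenario can be sketched in a few lines. Everything here is a simplifying assumption: the per-account score lookup stands in for a real predictive model, and the threshold, the default score, and the `Transaction` fields are hypothetical. The point is only the shape of the flow: scores arrive from the analytical side, incoming transactions are triaged in memory, and resolved cases become rows that flow back for retraining.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    txn_id: str
    account_id: int
    amount: float


REVIEW_THRESHOLD = 0.8  # assumed cutoff for sending a case to investigators


def score(txn, account_scores, default=0.5):
    """Look up the model output for the account (simplified scoring)."""
    return account_scores.get(txn.account_id, default)


def triage(transactions, account_scores):
    """Return (score, transaction) pairs needing review, riskiest first."""
    scored = [(score(t, account_scores), t) for t in transactions]
    flagged = [(s, t) for s, t in scored if s >= REVIEW_THRESHOLD]
    return sorted(flagged, key=lambda pair: pair[0], reverse=True)


def resolved_rows(queue, verdicts):
    """Build rows for resolved cases, ready to load back for retraining."""
    return [
        (t.txn_id, t.account_id, t.amount, s, verdicts[t.txn_id])
        for s, t in queue
        if t.txn_id in verdicts
    ]


# Scores as they might arrive from the analytical side (made-up values).
account_scores = {101: 0.92, 102: 0.07, 103: 0.85}
incoming = [
    Transaction("t1", 102, 40.0),
    Transaction("t2", 103, 900.0),
    Transaction("t3", 101, 1500.0),
]

queue = triage(incoming, account_scores)
# Investigator outcomes: True means the transaction was fraudulent.
feedback = resolved_rows(queue, {"t3": True, "t2": False})
```

The same shape applies to sensor monitoring: replace the account scores with model parameters and the investigator verdicts with observed failures.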

In the Internet of Things use case, sensor data flows into GemFire where it is scored according to the model produced by the data science team in Greenplum. In addition, GemFire pushes the newer data back to Greenplum where it can be used to further refine the analytic processes. There is much similarity between fraud analytics and failure prevention. In both cases, data is quickly absorbed into GemFire, decisions are made in subsecond time, and the data is then integrated back into Greenplum to refine the analysis.

Learning More

The product has evolved since this introductory talk.

There is more detailed information about GGC in the Pivotal documentation.

GemFire is very different from Greenplum. A brief tutorial is a good place to begin learning about it.
