Executive overview
This overview describes how IBM DB2 Analytics Accelerator for z/OS opens a new dimension of analytical processing when integrated with IBM SPSS Modeler. It also highlights the power of the IBM DB2 Analytics Accelerator Loader for z/OS v2.1 tool for integrating data from sources other than z/OS without increasing your MIPS consumption on z Systems, and discusses the business and IT benefits of enabling predictive analytics on your most reliable mainframe platform without compromising security.
This chapter contains the following topics:
Real-time analytics on z Systems operational data
In-database analytics using SPSS and DB2 Analytics Accelerator
1.1 Introduction
Businesses that rely on data as part of their daily operations are deploying solutions that capture data, process it, and calculate predictions for intelligent decision making, so they can provide customized service to their customers. An essential component of accomplishing this service is understanding the opportunities and risks at the granular level of each individual customer interaction while the interaction is happening. Accordingly, key considerations for these solutions are the richness of analysis and the ability to maintain service level agreements (SLAs) without risking a data breach.
The IBM z Systems™ platform addresses these considerations and provides solutions that produce actionable results in real time. Figure 1-1 shows the technologies and tools presented in this book. The top shows a highly scalable online transaction processing (OLTP) system; the middle shows IBM SPSS Modeler with Scoring Adapter for DB2 z/OS; the bottom shows IBM DB2 for z/OS database engine with DB2 Analytics Accelerator for z/OS appliance to demonstrate a real-time analytics use case that can be applied to business scenarios across industries.
Figure 1-1 Predictive modeling for OLTP applications with DB2 Analytics Accelerator for z/OS
In this case, the platform centers around DB2 for z/OS, with best-in-class data lifecycle management support for fighting fraud, preventing financial crimes, and generating customer insights. Although most of the valuable data already resides on z Systems mainframe, another possibility is to consume additional data that does not originate on the mainframe, such as social data (using the IBM DB2 Analytics Accelerator Loader for z/OS tool, which is described in Chapter 3, “Data integration using IBM DB2 Analytics Accelerator Loader for z/OS” on page 41).
For critical data, businesses want to retain control of the data so it cannot be compromised, another area where the z Systems platform clearly leads. To get valuable insights from that data, data scientists analyze vast amounts of data in an iterative process, experimenting and looking for an optimal solution. After a predictive model is created and validated by the data scientist, it is deployed to the operational system, which can use the model for in-database or in-application real-time scoring. Many installations evaluate the scored results against their business rules, although in some simple cases, reacting to the score values directly in the transactional application code might be appropriate.
The z Systems hardware and software stack is designed for mission-critical systems that are highly available and can meet stringent SLAs. It is highly optimized for efficiency and extreme scalability. These traditional z Systems strengths greatly benefit the OLTP workload, where real-time scoring of predictive models is tightly integrated. Building new predictive models, and the predictive or prescriptive analytics work of data scientists who mine the data for new insights, does not have the same scalability requirements and is not guarded by SLAs, although it might soon become mission-critical. However, because this analytical work usually consumes vast amounts of data, the requirement is usually that the data be kept under z Systems governance, with the high level of security that the platform provides.
One of the key innovations in IBM DB2 Analytics Accelerator version 4.6 (with in-database transformation) and version 5.1 (with in-database modeling) is that it allows for pushing the processing to where the (accelerated) data resides, that is, into the Accelerator. For reporting, a more appropriate approach is to report on a consistent snapshot of the data. If the workload also requires near real-time data for model creation and reporting, in order to rapidly discover new trends, then the change data capture (CDC) replication technology that is part of the DB2 Analytics Accelerator solution can be used.
A customer study, using an early prototype of the technology now productized in DB2 Analytics Accelerator 5.1, revealed that with the SPSS streams this retail customer had in place, performance improvements by a factor of 200 were possible for the data transformation alone, pushing stream execution times down from hours to seconds (see Figure 1-2 on page 4). Together with modeling and batch scoring, this customer realized a 20x overall performance improvement, allowing the data scientists to experiment with the data more interactively: the analytical model, which previously took hours to compute, was made available for validation in minutes. With in-database processing, a slight change in data preparation can show immediate results.
Figure 1-2 Performance improvements by pushing the processing to the accelerator
1.2 Real-time analytics
Innovations in our modern information technology world allow data to be collected, processed, and used at a faster rate than ever before. This speed is fostering the development of more sophisticated analytics that incorporate more data for better decision-making. Such solutions provide faster identification, correlation, and responses to new customer behaviors.
To meet SLAs, avoid penalties, and satisfy customer expectations for real-time service, businesses traditionally deploy sub-optimized solutions based on perceived infrastructure constraints. For example, they rely on simpler approaches, such as analyzing only a subset of transactions, or analyzing after the transaction completes, and use business rules to make decisions based on aged data. However, these solutions do not link transactional systems closely enough with database systems for decisions to be automated, during the transaction, in real time. Consequently, businesses miss revenue-generating opportunities, overpay claims, or fail to uncover criminal activity in a timely fashion.
The IBM z Systems platform can score transactions in real time without breaking SLAs and can build richer, more accurate models orders of magnitude faster.
1.2.1 Business advantages
The following list highlights several business advantages of real-time analytics:
Integrate advanced analytics as part of each transaction with negligible impact to transaction SLAs
Access to most current data for best analytic outcome, and reduced false positives
Actionable insight on every transaction, real-time or batch
Analytics in the flow of business to stop fraud, increase customer loyalty, increase revenue and reduce risk
1.2.2 IT advantages
The following list highlights several technical advantages of real-time analytics:
Avoiding costly extract, transform, and load (ETL) processes for analytics, with fewer copies of data to manage, secure, and make highly available
Reduced network costs through avoiding off-platform calls during transaction interaction
z Systems governance for integrating new analytic models
Leveraging investments in z Systems data infrastructure, particularly data sharing and DB2 Analytics Accelerator
Extremely efficient scoring in OLTP applications within DB2 for z/OS (using SPSS Scoring Adapter for DB2) or Java (using Zementis), minimizing IT resource consumption
1.3 In-database analytics
In-database analytics refers to processing of analytics algorithms and associated data transformation in the database engine itself. If you perform predictive analytics processing outside your operational database environment, you incur significant data movement and processing overhead, which can limit your modeling options.
Data preparation, data transformation, and modeling algorithms in any predictive modeling tool, including IBM SPSS Modeler, typically involve copying input data from the DB2 for z/OS database to the SPSS Modeler server, which in turn performs all data preparation and modeling. However, the SPSS Modeler server supports integration with the data mining and modeling stored procedures (also called IBM Netezza Analytics stored procedures) available from the DB2 Analytics Accelerator appliance, which allows you to build, score, and store predictive models within the control of your existing DB2 for z/OS database.
The database engine in the IBM DB2 Analytics Accelerator for z/OS appliance can perform in-database transformation and in-database modeling by keeping the data in its secure place, and at the same time enabling your data scientists to build richer, more accurate models that are orders of magnitude faster using its massively parallel processing (MPP) architecture.
 
Note: In-database transformation in combination with in-database modeling can result in streams that can be run from start to finish in the database appliance (IBM DB2 Analytics Accelerator for z/OS), resulting in significant performance gains over streams that are run in the SPSS Modeler server.
1.3.1 Accelerated in-database transformation
In-database transformation allows you to combine the analytical capabilities and ease of use of IBM SPSS Modeler with the reliability, availability, security, and stability of the DB2 for z/OS environment, while taking advantage of the IBM DB2 Analytics Accelerator for z/OS appliance's capability to accelerate SQL execution. The data preparation and data transformation operations generate SQL statements using an SPSS Modeler feature called SQL pushback. The generated SQL is executed and accelerated in the appliance database. The DB2 Analytics Accelerator appliance can perform data sampling, cleansing, filtering, aggregating, joining, and so on by executing the generated SQL statements. DB2 Analytics Accelerator can also store the transformed data temporarily in an accelerated table, which in turn can be used by the modeling algorithms.
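The effect of SQL pushback can be illustrated outside the product. The following minimal Python sketch uses SQLite as a stand-in for the accelerator database (the table, column names, and data are hypothetical); it contrasts pulling every row to the client for aggregation with pushing a single generated SQL statement down to the database, which returns only the small aggregate result:

```python
import sqlite3

# Hypothetical transaction data; SQLite stands in for the accelerator database.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE transactions (cust_id INTEGER, amount REAL)")
con.executemany("INSERT INTO transactions VALUES (?, ?)",
                [(1, 10.0), (1, 30.0), (2, 5.0), (2, 15.0), (2, 40.0)])

# Client-side preparation: every row crosses the wire before aggregation.
rows = con.execute("SELECT cust_id, amount FROM transactions").fetchall()
client_totals = {}
for cust_id, amount in rows:
    client_totals[cust_id] = client_totals.get(cust_id, 0.0) + amount

# Pushed-down preparation: the tool emits one SQL statement and only the
# (much smaller) aggregate result is returned to the client.
pushed_totals = dict(con.execute(
    "SELECT cust_id, SUM(amount) FROM transactions GROUP BY cust_id"))

# Same answer either way; the difference is where the work and data live.
assert client_totals == pushed_totals
```

On a five-row table the difference is invisible, but against billions of accelerated rows the pushed-down form moves only the aggregate across the network and lets the MPP engine do the scan.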
1.3.2 Accelerated in-database predictive modeling
In-database predictive modeling takes advantage of the IBM Netezza Analytics stored procedure algorithms. The predictive models are built entirely inside the database. The IBM Netezza Analytics stored procedures efficiently scan the input data, which is already available in the database, to build the model. The power and performance of the IBM Netezza Analytics stored procedures come from fully using the MPP architecture of the DB2 Analytics Accelerator appliance. In this way, you can build your predictive models orders of magnitude faster than is possible by other means, and you can refresh your models frequently as the operational data changes.
The predictive models built inside the appliance database can then be published to the DB2 for z/OS database, where they can be browsed and scored through the SPSS Scoring Adapter interface.
With DB2 Analytics Accelerator, large amounts of data can be ingested and analyzed quickly. This can help with building certain models such as naive Bayes, which does well with large amounts of data.
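To see why naive Bayes pairs well with large data volumes, consider that the model is built purely from counts, exactly the kind of aggregation an MPP engine computes efficiently. The following toy Python sketch (illustrative only; not the Netezza Analytics implementation, and the feature values and labels are made up) builds such a model from count tables:

```python
import math
from collections import Counter

# Toy training data: (feature value, class label). A naive Bayes model is
# nothing more than counts, so building it reduces to GROUP BY-style
# aggregation that parallelizes well over very large tables.
data = [("high", "fraud"), ("high", "fraud"), ("low", "ok"),
        ("low", "ok"), ("high", "ok"), ("low", "fraud")]

class_counts = Counter(label for _, label in data)
joint_counts = Counter(data)

def score(feature, label):
    # log P(label) + log P(feature | label), with add-one smoothing
    prior = class_counts[label] / len(data)
    likelihood = (joint_counts[(feature, label)] + 1) / (class_counts[label] + 2)
    return math.log(prior) + math.log(likelihood)

def predict(feature):
    # Pick the class with the highest posterior score
    return max(class_counts, key=lambda label: score(feature, label))
```

Because only the count tables are needed, the expensive scan of the raw data happens once, in the database, no matter how many rows it holds.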
1.4 Enabling applications with machine learning capability
IBM SPSS Modeler is a powerful analytic tool that supports all phases of the data analytics process, including data preparation, model building, deployment, and model maintenance. The solution discussed in this book uses SPSS Modeler to build analytical models, which can be used in statistical analysis, data mining, and machine learning. Data scientists can work with the SPSS Modeler client interface to access mainframe data with the same ease as on any other platform they are accustomed to.
Machine learning includes supervised learning, which uses complex mathematical algorithms. Until recently, the z Systems platform did not offer an efficient solution for complex mathematical processing. So, in the past, you might have resorted to offloading operational data (a snapshot from a prior point in time) from the z Systems platform to a distributed platform in order to implement machine learning. That approach often produced obsolete and unreliable results, in addition to unwanted security exposures.
The DB2 Analytics Accelerator for z/OS enables implementing your analytical processing on the z Systems platform by accelerating the execution of data transformation and analytical modeling algorithms with the power and performance of IBM Netezza technology. You can use the SPSS Scoring Adapter for DB2 for z/OS to perform real-time scoring on your predictive models, which can quickly reveal what is interesting in your data. You can combine your business rules with the predicted information from the analytics models to make real-time decisions on your DB2 for z/OS data from within your mainframe applications. This approach essentially gives your OLTP and batch applications that access mainframe data an early machine learning capability: they can learn hidden patterns in your operational data by using the mathematical models that are readily available with SPSS Modeler. With this approach, you no longer need to offload data from the z Systems platform to distributed environments to implement machine learning, thereby eliminating a potential avenue for a data breach.
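The combination of business rules with a predicted score can be sketched in a few lines. The following Python fragment is a hypothetical example of the application-side decision logic; the function name, thresholds, and rule set are invented for illustration, not taken from any product:

```python
# Hypothetical decision logic combining a model's predicted fraud score
# (for example, returned by in-database scoring) with business rules.
APPROVE, REVIEW, DECLINE = "approve", "review", "decline"

def decide(fraud_score, amount, blocked_merchant):
    """Combine a predicted fraud score (0.0-1.0) with simple business rules."""
    if blocked_merchant:                  # hard business rule wins over the model
        return DECLINE
    if fraud_score >= 0.9:                # model is highly confident of fraud
        return DECLINE
    if fraud_score >= 0.6 or amount > 10000:
        return REVIEW                     # route to an analyst for a second look
    return APPROVE
```

The point of the sketch is the ordering: deterministic rules that encode policy are applied first, and the model's score refines the decision inside the transaction rather than after it.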
1.5 Value propositions
The integrated solution on the z Systems platform that is shown in Figure 1-1 on page 2 offers the following value propositions:
Improves productivity of your data scientists. Accelerates turnaround time of predictive analytics lifecycle.
Unveils hidden information in your warehouse data that cannot be derived with a single pass of your data using SQL queries or application reports.
Enables your applications that access mainframe data with machine learning capability.
Reduces the risk of a security breach by keeping hackers away from the sensitive data used in analytics. Reduces IT sprawl for analytics by eliminating data marts on other platforms.
Allows you to keep and reuse the transformed data that is frequently used in extract, load, and transform (ELT) or extract, transform, and load (ETL) or data preparation completely within the Analytics Accelerator, significantly reducing the cost of loading and unloading on DB2 for z/OS. Reduces complexity, latency, and cost of data transformations for analytics.
Allows you to perform analytics on historical data while lowering your storage costs on z/OS by using the High Performance Storage Saver feature of DB2 Analytics Accelerator.
Allows improved access to historical data combined with z/OS transactional data, social media data, and other external data to gain further business insight with the help of IBM DB2 Analytics Accelerator Loader for z/OS tool.
1.6 Related products
The following related product offerings support real-time predictive analytics on the z Systems platform:
IBM SPSS Modeler
This data-mining workbench can push data transformation down to the database (with the accelerator) and create predictive models within the accelerator. Learn more by reading Chapter 5, “Data modeling using SPSS and DB2 Analytics Accelerator” on page 107.
SAS Enterprise Miner
This vendor data-mining workbench can also create predictive models and more.
IBM Netezza Analytics
This set of stored procedures on the accelerator, bundled with DB2 Analytics Accelerator 5.1, can create predictive models or perform statistical or transformational operations on the loaded data. Learn more by reading 4.4.4, “Netezza based DB2 Analytics Accelerator for z/OS” on page 96.
1.7 Use cases
The real-time analytics solution discussed in this book merges the predictive power of SPSS Modeler with the performance, security, and scalability of the z Systems platform. This combination enables organizations to extract insights from their high-volume transactional applications running on DB2 for z/OS and to use those insights to make smart, proactive decisions. With SPSS Modeler for Linux on the z Systems platform, businesses can create models and score new transactional data in real time, reducing the cost and complexity of operational decision-making.
SPSS Modeler can provide broad and deep descriptive and predictive analytics, data preparation and automation, and analytics of structured and unstructured data from virtually any source. With this single solution, you can apply statistical analysis, data mining, real-time scoring, and decision management to human capital management, evidence-based medicine, crime prediction and prevention, supply-chain analysis, and much more.
Exploring the following scenarios might stimulate your imagination as you arrive at your own solution to meet your business requirements by using the technologies discussed in this book.
1.7.1 Countering payment fraud and financial crimes
In this scenario, you find a set of possible sequences of interactions that are typically applied in the banking industry for countering payment fraud, waste, abuse, and financial crimes.
The business goals of this use case are to reduce loss due to card fraud, reduce card deactivation, grow revenue associated with card purchases, reduce call center costs, and improve service yielding preferred card usage. The approach is to incorporate aggregate data from geographic location, merchant, issuer, and card history into the existing card authorization business flow to reduce fraud while preserving transactional SLAs. One feature associated with this use case is that integrated high-performance query optimizations can enable data aggregation several times a day, and this complex data can be used as part of a real-time fraud detection process. A second feature is that enhancements with predictive scoring can be integrated with the fraud detection transaction for even more preventive capabilities.
Figure 1-3 shows a business-view diagram for this use case.
Figure 1-3 Countering payment fraud, waste, abuse, and financial crimes
1.7.2 Insurance claims in-process payment analytics
In this scenario, you find a set of possible sequences of interactions that are typically applied in insurance claims processing systems to perform additional in-process payment analytics.
The business goals for this use case are to quickly and efficiently tag each claim with extra business insight. By extending in-process claims scoring across domains, predictive analytics can be transformed from a specialized function to one that best uses transactional systems at a repeatable enterprise scale.
One challenge this scenario addresses is that complex reports for overpaid claims are not completing on time, resulting in monetary losses. The solution is to integrate the optimized analytics of the DB2 Analytics Accelerator with overpayment reporting of transactions. Benefits of this solution include up to 2000x faster overpayment reports, line-of-business users who can respond with more agility to overpayment trends, and informed decisions made at the right time.
Another challenge this scenario addresses is to stop improper payments before payment, avoid “pay and chase,” and meet SLAs. The solution is to integrate predictive analytics into claims adjudication. Benefits of this solution include analytics at efficient scale (scale requirements of this kind are achievable only with analytics as part of the transaction flow), and the expected results of efficient in-transaction analytics can be multimillion US dollars per year. Figure 1-4 shows a business-view diagram for this use case.
Figure 1-4 Insurance claims processing: in-process payment analytics
1.7.3 Predictive customer intelligence
In this scenario, you find a set of possible sequences of interactions that can be applied to predict customer intelligence in a banking industry for up-selling purposes.
In this use case, banks want to integrate information across all product lines in order to make real-time, targeted decisions. Real-time decisions are necessary because banks otherwise risk losing customer business or loyalty for other products. Therefore, there is a need to incorporate high-value, predictive advanced analytics as part of transactional systems. For example, all of the data is aggregated to determine a score that can be used to reduce a bank’s risk in approving a loan application. Real-time predictive customer intelligence leads to smart business decisions at the point of impact. Benefits of this solution include building long-term customer relationships, driving decisions one interaction at a time, and maximizing customer lifetime value. Figure 1-5 shows a business-view diagram for this use case.
Figure 1-5 Predictive customer intelligence
 