This IBM® Redpaper publication focuses on data orchestration in enterprise data pipelines. It provides details about data orchestration and how to address typical challenges that customers face when dealing with large and ever-growing amounts of data for data analytics. While the amount of data increases steadily, artificial intelligence (AI) workloads must speed up to deliver insights and business value in a timely manner.

This paper provides a solution that addresses these needs: Data Accelerator for AI and Analytics (DAAA). A proof of concept (PoC) is described in detail.

This paper focuses on the functions that are provided by the Data Accelerator for AI and Analytics solution, which simplifies the daily work of data scientists and system administrators. This solution helps increase the efficiency of storage systems and data processing to obtain results faster while eliminating unnecessary data copies and associated data management.

Table of Contents

  1. Front cover
  2. Notices
    1. Trademarks
  3. Preface
    1. Authors
    2. Now you can become a published author, too!
    3. Comments welcome
    4. Stay connected to IBM Redbooks
  4. Chapter 1. Data orchestration in enterprise data pipelines
    1. 1.1 Introduction
    2. 1.2 Overview
    3. 1.3 Sample use case: Building the correct training and validation data set
  5. Chapter 2. Data Accelerator for AI and Analytics supporting data orchestration
    1. 2.1 Generic components
    2. 2.1.1 Data layer
    3. 2.1.2 High-performance storage with a smart data cache layer
    4. 2.1.3 Compute cluster layer
    5. 2.1.4 Data catalog layer
    6. 2.1.5 Interfaces between the layers
    7. 2.2 Proof of concept environment
    8. 2.2.1 Red Hat OpenShift V4.5.9 cluster
    9. 2.2.2 IBM Spectrum Scale V5.1.0 storage cluster
    10. 2.2.3 IBM ESS storage cluster
    11. 2.2.4 Capacity tier storage
    12. 2.2.5 IBM Spectrum Discover V2.0.2+ metadata catalog
    13. 2.2.6 IBM Spectrum LSF Workload Manager
    14. 2.2.7 Description of the Audi Autonomous Driving Dataset
  6. Chapter 3. Data Accelerator for AI and Analytics use cases
    1. 3.1 Generic workflow
    2. 3.1.1 Provisioning phase
    3. 3.1.2 Analytic usage phase
    4. 3.2 Triggering an analytic job by using an integrated development environment
    5. 3.3 Workload manager starts an analytics job
    6. 3.4 New data ingest triggers an analytics job
    7. 3.5 The layer on top of workload triggers
  7. Chapter 4. Planning for Data Accelerator for AI and Analytics
    1. 4.1 Security and data access rights considerations
    2. 4.2 Data layer
    3. 4.2.1 Network-attached storage (NAS) Filer
    4. 4.2.2 Cloud object storage
    5. 4.2.3 IBM Spectrum Archive Enterprise Edition Tape
    6. 4.3 High-performance storage with smart data cache layer
    7. 4.3.1 IBM ESS 3000 and IBM Spectrum Scale
    8. 4.4 Compute cluster layer
    9. 4.4.1 IBM Spectrum LSF
    10. 4.4.2 Compute cluster
    11. 4.5 Data catalog layer
    12. 4.5.1 IBM Spectrum Discover
  8. Chapter 5. Deployment considerations for Data Accelerator for AI and Analytics
    1. 5.1 Data layer
    2. 5.1.1 Network-attached storage (NAS) Filer
    3. 5.1.2 IBM Cloud Object Storage
    4. 5.1.3 IBM Spectrum Archive Enterprise Edition Tape
    5. 5.2 High-performance storage with smart data cache layer
    6. 5.2.1 IBM ESS 3000 and IBM Spectrum Scale
    7. 5.3 Compute cluster layer
    8. 5.3.1 IBM Spectrum LSF
    9. 5.3.2 IBM Spectrum Scale storage cluster
    10. 5.3.3 Compute cluster
    11. 5.4 Data catalog layer
    12. 5.4.1 IBM Spectrum Discover
    13. 5.5 The Data Accelerator for AI and Analytics interface glue code
  9. Appendix A. Code samples
  10. Related publications
    1. IBM Redbooks
    2. Online resources
    3. Help from IBM
  11. Back cover