
This IBM® Redpaper publication explains how IBM Spectrum® Discover integrates with the IBM Watson® Knowledge Catalog (WKC) component of IBM Cloud® Pak for Data (IBM CP4D) so that the enriched catalog content in IBM Spectrum Discover, along with the associated data, becomes available in WKC and IBM CP4D. From an end-to-end IBM solution point of view, IBM CP4D and WKC provide state-of-the-art data governance, collaboration, and artificial intelligence (AI) and analytics tools. IBM Spectrum Discover complements these features by adding support for unstructured data on large-scale file and object storage systems, both on premises and in the cloud.

Many organizations struggle to manage unstructured data. Common challenges include:

  • Pinpointing and activating relevant data for large-scale analytics, machine learning (ML) and deep learning (DL) workloads.
  • Lacking the fine-grained visibility that is needed to map data to business priorities.
  • Removing redundant, obsolete, and trivial (ROT) data and identifying data that can be moved to a lower-cost storage tier.
  • Identifying and classifying sensitive data as it relates to various compliance mandates, such as the General Data Protection Regulation (GDPR), the Payment Card Industry Data Security Standard (PCI DSS), and the Health Insurance Portability and Accountability Act (HIPAA).

This paper describes how IBM Spectrum Discover provides seamless integration of data in IBM Storage with IBM Watson Knowledge Catalog (WKC). Features include:

  • Event-based cataloging and tagging of unstructured data across the enterprise.
  • Automatically inspecting and classifying over 1000 unstructured data types, including genomics and imaging specific file formats.
  • Automatically registering assets with WKC based on IBM Spectrum Discover search and filter criteria, and then using those assets in IBM CP4D.
  • Enforcing data governance policies in WKC in IBM CP4D based on insights from IBM Spectrum Discover, and applying those policies to assets in IBM CP4D.

Several in-depth use cases illustrate these capabilities with examples from healthcare, life sciences, and financial services.

IBM Spectrum Discover integration with WKC enables storage administrators, data stewards, and data scientists to efficiently manage, classify, and gain insights from massive amounts of data. The integration improves storage economics, helps mitigate risk, and accelerates large-scale analytics to create competitive advantage and speed critical research.

Table of Contents

  1. Front cover
  2. Notices
    1. Trademarks
  3. Preface
    1. Authors
    2. Now you can become a published author, too!
    3. Comments welcome
    4. Stay connected to IBM Redbooks
  4. Chapter 1. IBM Spectrum Discover overview
    1. 1.1 Introduction
    2. 1.2 High-level overview
    3. 1.3 Major ways to use IBM Spectrum Discover
    4. 1.3.1 Large-scale analytics / artificial intelligence / machine learning (ML)
    5. 1.3.2 Data / storage optimization use case
    6. 1.3.3 Data governance
    7. 1.3.4 Data management
    8. 1.4 Architecture
    9. 1.4.1 Role-based access control
    10. 1.4.2 Data source connections
    11. 1.4.3 GUI
    12. 1.4.4 Reports
    13. 1.5 A deeper look at metadata
    14. 1.5.1 Cataloging metadata
    15. 1.5.2 Enriching metadata
    16. 1.5.3 Policies and user-defined metadata
    17. 1.5.4 IBM Spectrum Discover Application Catalog and Software Development Kit
    18. 1.5.5 Data movement with IBM Spectrum Discover
    19. 1.6 Deployment patterns
  5. Chapter 2. IBM Watson Knowledge Catalog and IBM Cloud Pak for Data overview
    1. 2.1 Overview of Watson Knowledge Catalog
    2. 2.2 Overview of IBM CP4D
    3. 2.2.1 IBM CP4D and WKC
    4. 2.3 IBM CP4D
  6. Chapter 3. IBM Spectrum Discover integration with IBM Watson Knowledge Catalog architecture and benefits
    1. 3.1 Solution architecture
    2. 3.1.1 Asset registration process
    3. 3.2 Connecting IBM Spectrum Discover to Watson Knowledge Catalog
    4. 3.3 Exporting assets from IBM Spectrum Discover to Watson Knowledge Catalog
    5. 3.3.1 IBM Spectrum Discover tag to WKC tag mapping
    6. 3.4 Using assets in Watson Knowledge Catalog
  7. Chapter 4. Curating unstructured data for IBM Watson Knowledge Catalog with IBM Spectrum Discover
    1. 4.1 Data curation workflow
    2. 4.1.1 Creating tags in IBM Spectrum Discover
    3. 4.1.2 Creating regular expressions
    4. 4.1.3 Creating a content inspection policy
    5. 4.1.4 Searching by title and author
    6. 4.2 Using assets in IBM CP4D and Watson Knowledge Catalog
    7. 4.2.1 Browsing and managing assets in a catalog
    8. 4.2.2 Creating projects from assets in Watson Knowledge Catalog
    9. 4.2.3 Creating data governance policies
  8. Chapter 5. Healthcare and life sciences use cases
    1. 5.1 Generic healthcare use case
    2. 5.1.1 IBM Spectrum Discover large-scale AI and data governance with Watson Knowledge Catalog
    3. 5.1.2 Data governance: Medical file classification example
    4. 5.1.3 Large-scale analytics, AI, and ML for healthcare and life sciences
    5. 5.2 COVID-19 use case
    6. 5.2.1 Classifying images with IBM Visual Insights
    7. 5.2.2 Registering assets and tags / labels into Watson Knowledge Catalog
    8. 5.2.3 Viewing images in Watson Knowledge Catalog
    9. 5.2.4 Uploading an IBM Spectrum Discover custom report into Watson Knowledge Catalog
    10. 5.3 Breast cancer use case
    11. 5.3.1 Using Data Refinery, Jupyter Notebook, or Cognos to analyze report data
  9. Chapter 6. Financial services use case: Personally Identifiable Information detection and data governance
    1. 6.1 Current challenges in financial industries
    2. 6.1.1 Customer expectations
    3. 6.1.2 Increasing pressure from competition
    4. 6.1.3 Investor expectations
    5. 6.1.4 Keeping up with compliance and regulations
    6. 6.1.5 Business agility with the latest technology
    7. 6.2 Protecting cardholder data with PCI DSS use case
    8. 6.2.1 Overview of PCI
    9. 6.2.2 Overview of PCI requirements
    10. 6.2.3 Implementing PCI DSS into business
    11. 6.3 Creating a data governance policy in WKC
    12. 6.3.1 Creating a policy
    13. 6.3.2 Creating rules for data protection
  10. Chapter 7. Conclusion
  11. Related publications
    1. IBM Redbooks
    2. Other publications
    3. Online resources
    4. Help from IBM
  12. Back cover