Table of Contents

Cover

Title Page

Copyright

Dedication

Preface

Acknowledgments

About the Authors

Chapter 1: A brief history of data warehousing and first-generation data warehouses

Data Base Management Systems

Online Applications

Personal Computers and 4GL Technology

The Spider Web Environment

Evolution from the Business Perspective

The Data Warehouse Environment

What Is a Data Warehouse?

Integrating Data—a Painful Experience

Volumes of Data

A Different Development Approach

Evolution to the DW 2.0 Environment

The Business Impact of the Data Warehouse

Various Components of the Data Warehouse Environment

The Evolution of Data Warehousing from the Business Perspective

Other Notions About a Data Warehouse

The Active Data Warehouse

The Federated Data Warehouse Approach

The Star Schema Approach

The Data Mart Data Warehouse

Building a “Real” Data Warehouse

Summary

Chapter 2: An introduction to DW 2.0

DW 2.0—a New Paradigm

DW 2.0—from the Business Perspective

The Life Cycle of Data

Reasons for the Different Sectors

Metadata

Access of Data

Structured Data/Unstructured Data

Textual Analytics

Blather

The Issue of Terminology

Specific Text/General Text

Metadata—a Major Component

Local Metadata

A Foundation of Technology

Changing Business Requirements

The Flow of Data Within DW 2.0

Volumes of Data

Useful Applications

DW 2.0 and Referential Integrity

Reporting In DW 2.0

Summary

Chapter 3: DW 2.0 components—about the different sectors

The Interactive Sector

The Integrated Sector

The Near Line Sector

The Archival Sector

Unstructured Processing

From the Business Perspective

Summary

Chapter 4: Metadata in DW 2.0

Reusability of Data and Analysis

Metadata In DW 2.0

Active Repository/Passive Repository

The Active Repository

Enterprise Metadata

Metadata and the System of Record

Taxonomy

Internal Taxonomies/External Taxonomies

Metadata In the Archival Sector

Maintaining Metadata

Using Metadata—an Example

From the End-User Perspective

Summary

Chapter 5: Fluidity of the DW 2.0 technology infrastructure

The Technology Infrastructure

Rapid Business Changes

The Treadmill of Change

Getting Off the Treadmill

Reducing the Length of Time for IT to Respond

Semantically Temporal, Semantically Static Data

Semantically Temporal Data

Semantically Stable Data

Mixing Semantically Stable and Unstable Data

Separating Semantically Stable and Unstable Data

Mitigating Business Change

Creating Snapshots of Data

A Historical Record

Dividing Data

From the End-User Perspective

Summary

Chapter 6: Methodology and approach for DW 2.0

Spiral Methodology—a Summary of Key Features

The Seven Streams Approach—an Overview

Enterprise Reference Model Stream

Enterprise Knowledge Coordination Stream

Information Factory Development Stream

Data Profiling and Mapping Stream

Data Correction Stream (Previously Called the Data Cleansing Stream)

Infrastructure Stream

Total Information Quality Management Stream

Summary

Chapter 7: Statistical processing and DW 2.0

Two Types of Transactions

Using Statistical Analysis

The Integrity of the Comparison

Heuristic Analysis

Freezing Data

Exploration Processing

The Frequency of Analysis

The Exploration Facility

The Sources for Exploration Processing

Refreshing Exploration Data

Project-Based Data

Data Marts and the Exploration Facility

A Backflow of Data

Using Exploration Data Internally

From the Perspective of the Business Analyst

Summary

Chapter 8: Data models and DW 2.0

An Intellectual Road Map

The Data Model and Business

The Scope of Integration

Making the Distinction Between Granular and Summarized Data

Levels of the Data Model

Data Models and the Interactive Sector

The Corporate Data Model

A Transformation of Models

Data Models and Unstructured Data

From the Perspective of the Business User

Summary

Chapter 9: Monitoring the DW 2.0 environment

Monitoring the DW 2.0 Environment

The Transaction Monitor

Monitoring Data Quality

A Data Warehouse Monitor

The Transaction Monitor—Response Time

Peak-Period Processing

The ETL Data Quality Monitor

The Data Warehouse Monitor

Dormant Data

From the Perspective of the Business User

Summary

Chapter 10: DW 2.0 and security

Protecting Access to Data

Encryption

Drawbacks

The Firewall

Moving Data Offline

Limiting Encryption

A Direct Dump

The Data Warehouse Monitor

Sensing an Attack

Security For Near Line Data

From the Perspective of the Business User

Summary

Chapter 11: Time-variant data

All Data In DW 2.0—Relative To Time

Time Relativity In the Interactive Sector

Data Relativity Elsewhere In DW 2.0

Transactions In the Integrated Sector

Discrete Data

Continuous Time Span Data

A Sequence of Records

Nonoverlapping Records

Beginning and Ending a Sequence of Records

Continuity of Data

Time-Collapsed Data

Time Variance In the Archival Sector

From the Perspective of the End User

Summary

Chapter 12: The flow of data in DW 2.0

The Flow of Data Throughout the Architecture

Entering the Interactive Sector

The Role of ETL

Data Flow Into the Integrated Sector

Data Flow Into the Near Line Sector

Data Flow Into the Archival Sector

The Falling Probability of Data Access

Exception-Based Flow of Data

From the Perspective of the Business User

Summary

Chapter 13: ETL processing and DW 2.0

Changing States of Data

Where ETL Fits

From Application Data to Corporate Data

ETL In Online Mode

ETL In Batch Mode

Source and Target

An ETL Mapping

Changing States—an Example

More Complex Transformations

ETL and Throughput

ETL and Metadata

ETL and An Audit Trail

ETL and Data Quality

Creating ETL

Code Creation or Parametrically Driven ETL

ETL and Rejects

Changed Data Capture

ELT

From the Perspective of the Business User

Summary

Chapter 14: DW 2.0 and the granularity manager

The Granularity Manager

Raising the Level of Granularity

Filtering Data

The Functions of the Granularity Manager

Home-Grown Versus Third-Party Granularity Managers

Parallelizing the Granularity Manager

Metadata as a By-Product

From the Perspective of the Business User

Summary

Chapter 15: DW 2.0 and performance

Good Performance—a Cornerstone For DW 2.0

Online Response Time

Analytical Response Time

The Flow of Data

Queues

Heuristic Processing

Analytical Productivity and Response Time

Many Facets to Performance

Indexing

Removing Dormant Data

End-User Education

Monitoring the Environment

Capacity Planning

Metadata

Batch Parallelization

Parallelization for Transaction Processing

Workload Management

Data Marts

Exploration Facilities

Separation of Transactions Into Classes

Service Level Agreements

Protecting the Interactive Sector

Partitioning Data

Choosing the Proper Hardware

Separating Farmers and Explorers

Physically Group Data Together

Check Automatically Generated Code

From the Perspective of the Business User

Summary

Chapter 16: Migration

Houses and Cities

Migration In a Perfect World

The Perfect World Almost Never Happens

Adding Components Incrementally

Adding the Archival Sector

Creating Enterprise Metadata

Building the Metadata Infrastructure

“Swallowing” Source Systems

ETL as a Shock Absorber

Migration to the Unstructured Environment

From the Perspective of the Business User

Summary

Chapter 17: Cost justification and DW 2.0

Is DW 2.0 Worth It?

Macro-Level Justification

A Micro-Level Cost Justification

Company B Has DW 2.0

Creating New Analysis

Executing the Steps

So How Much Does all of This Cost?

Consider Company B

Factoring the Cost of DW 2.0

Reality of Information

The Real Economics of DW 2.0

The Time Value of Information

The Value of Integration

Historical Information

First-Generation DW and DW 2.0—The Economics

From the Perspective of the Business User

Summary

Chapter 18: Data quality in DW 2.0

The DW 2.0 Data Quality Tool Set

Data Profiling Tools and the Reverse-Engineered Data Model

Data Model Types

Data Profiling Inconsistencies Challenge Top-Down Modeling

Summary

Chapter 19: DW 2.0 and unstructured data

DW 2.0 and Unstructured Data

Reading Text

Where to Do Textual Analytical Processing

Integrating Text

Simple Editing

Stop Words

Synonym Replacement

Synonym Concatenation

Homographic Resolution

Creating Themes

External Glossaries/Taxonomies

Stemming

Alternate Spellings

Text Across Languages

Direct Searches

Indirect Searches

Terminology

Semistructured Data/Value = Name Data

The Technology Needed to Prepare the Data

The Relational Data Base

Structured/Unstructured Linkage

From the Perspective of the Business User

Summary

Chapter 20: DW 2.0 and the system of record

Other Systems of Record

From the Perspective of the Business User

Summary

Chapter 21: Miscellaneous topics

Data Marts

The Convenience of a Data Mart

Transforming Data Mart Data

Monitoring DW 2.0

Moving Data from One Data Mart to Another

Bad Data

A Balancing Entry

Resetting a Value

Making Corrections

The Speed of Movement of Data

Data Warehouse Utilities

Summary

Chapter 22: Processing in the DW 2.0 environment

Summary

Chapter 23: Administering the DW 2.0 environment

The Data Model

Architectural Administration

Defining the Moment When an Archival Sector Will Be Needed

Determining Whether the Near Line Sector Is Needed

Metadata Administration

Data Base Administration

Stewardship

Systems and Technology Administration

Management Administration of the DW 2.0 Environment

Prioritization and Prioritization Conflicts

Budget

Scheduling and Determination of Milestones

Allocation of Resources

Managing Consultants

Summary

Index
