Praise Quotes for Viral Data in SOA: An Enterprise Pandemic

“Data quality is a critical success factor for enterprise data management. Assuring a high level of data quality in a service-oriented architecture proves to be difficult for most organizations. Neal Fishman’s book explores the origin of data quality challenges and sheds light on how to tackle complex situations. I found this book packed with practical experience. While reading the book, I learned several effective architectural patterns and implementation tactics. I highly recommend this book for IT architects.”

Xuegang “Harry” Huang, Solution Architect–Business Intelligence
Competency Center, Danske Bank Group, Denmark

“Viral Data in SOA provides insight into topics that can get easily overlooked in fast-paced businesses. The book should help provide practitioners with the necessary knowledge to minimize or avoid embarrassing mistakes due to data errors. The book definitely drives home the point that data can be misused like any other resource and if improperly handled can do serious damage to the enterprise.”

Joseph Thomas Farrell, Senior Consultant,
Booz Allen Hamilton, United States

“Neal sprinkles real-world business stories in the midst of more technical examples to show the importance of trustworthy data in a service-orientated architecture. In his terminology, data can quickly become viral, but in this positive and fun-to-read book, Neal encourages individuals and organizations to take an aggressive stand in getting the most out of their service-oriented solutions.”

Lisa Seacat, Master Inventor and Software Engineer,
IBM, United States

“I thoroughly enjoyed reading Viral Data in SOA. The book is written in such a way that it can appeal to everyone from the novice to guru.”

Miranda Needham, Computer Systems Analyst,
Montana Department of Transportation, United States

“When I first laid eyes on the script, I was worried that this might be another of those books extolling the Boy Scout virtues of SOA without contributing anything new. But on closer examination, I found the book packed full of interesting insights. Seldom do we find a book that can deal with the all-important philosophical basis of a subject matter and link it to very detailed low-level examples. It is the sort of book that I will pick up and read more than once. You can benefit from light reading, gleaning useful insights supported by interesting stories/anecdotes, and you can delve deeper and explore the details of the examples for more in-depth understanding of the issues covered.”

Simon Seow, President, InfoSpec, Malaysia

“To travel down the road of regulatory compliance can be challenging for enterprise compliance teams and software architects. Neal Fishman has made a powerful business case for data provenance as the cornerstone of data governance. His book, illustrated by compelling and timely business-world examples, is a most valuable resource since it will help tackle this problem in a principled manner.”

Professor Luc Moreau, School of Electronics and Computer Science,
University of Southampton, England

“This book is a must read for any organization using data-integration or data-interchange technologies, or simply any organization that must trust data. Neal takes the reader through an entertaining and vital journey of SOA information management issues, risks, discovery, and solutions. He provides a fresh perspective that no corporation should overlook; in fact, corporations might head blindly into SOA implementations without this awareness.”

Kevin Downey, Senior Partner, Xteoma Inc., Canada

“Viral Data in SOA: An Enterprise Pandemic is a must read for any business or information technology professional who is charged with improving the creation, management, and distribution of information in a large enterprise. This book makes the most important current contribution to understanding the complexities, issues, and practices for improving large enterprise data management organization, architecture, processes, and technology. The book brings a theoretical and intellectual clarity coupled with a practical perspective to enterprise data management.”

Bruce Gallager, President and CEO, Syscore International, United States

“An intriguing discourse. The text is relevant and pithy. If you help manage or create IT projects, this book is worth reading from cover to cover. The book explores the relevance of misrepresented data and its potential undetected wrath on business execution. Many companies have realized that data should be a recognized asset. Viral Data in SOA sets out to help you nurture and protect that asset.”

David Kmetz, Partner: Global Business Services,
IBM, United States

“Neal Fishman’s book on the viral effects of data is a much-needed warning signal for a time when critical consideration of risk and impact takes a backseat in the drive to integrate and apply new technologies. This book alerts its readers to potential risks and indicates how these can be minimized in real-world implementations. A must read for all decision makers for today’s increasingly complex information architectures.”

Thomas Buehlmann, Ph.D., Senior Manager, Accenture, United States

“Viral Data in SOA has an amazing amount of information that spans different disciplines and areas of interest. The author analyzes the subject of viral data at various levels of depth, starting from the business level and working down to the database modeling, administration, and development levels. The author uses his experience and knowledge to present a vast area of possible problems with data and provides an in-depth analysis of the causes of viral data, suggesting the right path toward solutions. I believe this is a very important book because it addresses the subject from a realistic perspective coming from real-life experiences.”

Mika Nikolopoulou, Academic Affairs University Ambassador,
IBM, United States

“In this world of exploding data volumes, every CIO is exploring new and exciting ways to leverage information. Few, however, pause to consider the associated risks that come with poorly controlled data quality. In this book, Mr. Fishman provides a detailed understanding of just how pervasive data quality problems are across all levels of an organization. His eclectic use of examples leaves the reader with no doubts about the unexpected dangers of viral data in a modern business. It provides a wake-up call to all IT professionals looking to use business data to create competitive advantage.”

David McCarty, Information Management Consultant, BWH, France

“Did you ever stop to think that the information you count on to make business decisions has the potential to cause a viral data pandemic that can disable the company? Neal Fishman’s latest book will definitely make you rethink your approach to business data. The book is chock full of entertaining illustrations that promptly make the reader realize how easily data can become poisonous bits of misinformation, especially when propagated throughout today’s service-oriented enterprises. Viral Data in SOA: An Enterprise Pandemic drives home the need for all businesses to establish an environment of trustworthy data. As Neal points out, ‘At the end of the day, there is a line between trustworthy information and viral data, and that line is very fine.’ This book provides excellent guidance on a variety of governance models, techniques, and treatments to combat viral data and to derive trusted information. The book is a thought-provoking information technology guide that I will reread frequently.”

Beth Brownhill, Distinguished Engineer,
Information Agenda Executive Architect, IBM, United States

“The most entertaining and informative book on information technology I have read to date. A fascinating look at the role of information in the workplace.”

Mark Stewart, Managing Director, AV Pure & Simple, England

“Neal Fishman has a deep understanding of data and how to keep it from becoming viral. He does an excellent job of explaining how a small piece of data can create a significant business problem. I would highly recommend this book to anyone who wants to keep viral type data out of their SOA project.”

Stan Green, Sr. Project Manager,
Modern Computer Solutions, LLC, United States

Related Books of Interest

WebSphere Business Integration Primer

Process Server, BPEL, SCA, and SOA
by Ashok Iyengar, Vinod Jessani, and Michele Chilanti
ISBN-13: 978-0-13-224831-0

Using WebSphere® Business Integration (WBI) technology, you can build an enterprise-wide Business Integration (BI) infrastructure that makes it easier to connect any business resources and functions, so you can adapt more quickly to the demands of customers and partners. Now there’s an introductory guide to creating standards-based process and data integration solutions with WBI.

WebSphere Business Integration Primer thoroughly explains Service Component Architecture (SCA), basic business processes, and complex long-running business flows, and guides you to choose the right process integration architecture for your requirements. Next, it introduces the key components of a WBI solution and shows how to make them work together rapidly and efficiently. This book will help developers, technical professionals, or managers understand today’s key BI issues and technologies, and streamline business processes by combining BI with Service Oriented Architecture (SOA).

SOA Governance

Achieving and Sustaining Business and IT Agility
by William A. Brown, Robert G. Laird, Clive Gee, and Tilak Mitra
ISBN-13: 978-0-13-714746-5

In SOA Governance, a team of IBM’s leading SOA governance experts share hard-won best practices for governing IT in any service-oriented environment.

The authors begin by introducing a comprehensive SOA governance model that has worked in the field. They define what must be governed, identify key stakeholders, and review the relationship of SOA governance to existing governance bodies as well as governance frameworks like COBIT. Next, they walk you through SOA governance assessment and planning, identifying and fixing gaps, setting goals and objectives, and establishing workable roadmaps and governance deliverables. Finally, the authors detail the build-out of the SOA governance model with a case study.

The New Language of Business

SOA & Web 2.0
by Sandy Carter
ISBN-13: 978-0-13-195654-4

In The New Language of Business, senior IBM executive Sandy Carter demonstrates how to leverage SOA, Web 2.0, and related technologies to drive new levels of operational excellence and business innovation.

Writing for executives and business leaders inside and outside IT, Carter explains why flexibility and responsiveness are now even more crucial to success — and why services-based strategies offer the greatest promise for achieving them.

You’ll learn how to organize your business into reusable process components — and support them with cost-effective IT services that adapt quickly and easily to change. Then, using extensive examples — including a detailed case study describing IBM’s own experience — Carter identifies best practices, pitfalls, and practical starting points for success.

Executing SOA

A Practical Guide for the Service-Oriented Architect
by Norbert Bieberstein, Robert G. Laird, Dr. Keith Jones, and Tilak Mitra
ISBN-13: 978-0-13-235374-8

In Executing SOA, four experienced SOA implementers share realistic, proven, “from-the-trenches” guidance for successfully delivering the largest and most complex SOA initiative. This book follows up where the authors’ best-selling Service-Oriented Architecture Compass left off, showing how to overcome key obstacles to successful SOA implementation and identifying best practices for all facets of execution: technical, organizational, and human. The issues it addresses include introducing a services discipline that supports collaboration and information process sharing; integrating services with preexisting technology assets and strategies; choosing the right roles for new tools; shifting culture, governance, and architecture; and bringing greater agility to the entire organizational lifecycle, not just isolated projects.

Enterprise Master Data Management

by Allen Dreibelbis, Eberhard Hechler, Ivan Milman, Martin Oberhofer, Paul van Run, and Dan Wolfson
ISBN-13: 978-0-13-236625-0

Enterprise Master Data Management provides an authoritative, vendor-independent MDM technical reference for practitioners: architects, technical analysts, consultants, solution designers, and senior IT decision makers. Written by the IBM® data management innovators who are pioneering MDM, this book systematically introduces MDM’s key concepts and technical themes, explains its business case, and illuminates how it interrelates with and enables SOA.

Drawing on their experience with cutting-edge projects, the authors introduce MDM patterns, blueprints, solutions, and best practices published nowhere else—everything you need to establish a consistent, manageable set of master data, and use it for competitive advantage.

Enterprise Messaging Using JMS and IBM WebSphere

Yusuf
ISBN-13: 978-0-13-146863-4

IBM WebSphere System Administration

Williamson, Chan, Cundiff, Lauzon, Mitchell
ISBN-13: 978-0-13-144604-5

WebSphere Engineering

A Practical Guide for WebSphere Support Managers and Senior Consultants
Ding
ISBN-13: 978-0-13-714225-5

Rapid Portlet Development with WebSphere Portlet Factory

Step-by-Step Guide for Building Your Own Portlets
Bowley
ISBN-13: 978-0-13-713446-5

Enterprise Java Programming with IBM WebSphere, Second Edition

Brown, Craig, Hester, Pitt, Stinehour, Weitzel, Amsden, Jakab, Berg
ISBN-13: 978-0-321-18579-2

Service-Oriented Architecture (SOA) Compass

Bieberstein, Bose, Fiammante, Jones, Shah
ISBN-13: 978-0-13-187002-4

IBM WebSphere

Barcia, Hines, Alcott, Botzum
ISBN-13: 978-0-13-146862-7

Viral Data in SOA: An Enterprise Pandemic

Neal A. Fishman

IBM Press
Pearson plc

Upper Saddle River, NJ • Boston • Indianapolis • San Francisco
New York • Toronto • Montreal • London • Munich • Paris • Madrid
Cape Town • Sydney • Tokyo • Singapore • Mexico City
ibmpressbooks.com

The author and publisher have taken care in the preparation of this book, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein.

© Copyright 2010 by International Business Machines Corporation. All rights reserved.

Note to U.S. Government Users: Documentation related to restricted right. Use, duplication, or disclosure is subject to restrictions set forth in GSA ADP Schedule Contract with IBM Corporation.

      IBM Press Program Managers: Steven M. Stansel, Ellice Uffer
      Cover design: IBM Corporation
      Associate Publisher: Greg Wiegand
      Marketing Manager: Kourtnaye Sturgeon
      Publicist: Heather Fox
      Acquisitions Editor: Bernard Goodwin
      Managing Editor: Kristy Hart
      Designer: Alan Clements
      Project Editor: Andy Beaster
      Copy Editor: Keith Cline
      Senior Indexer: Cheryl Lenser
      Compositor: Nonie Ratcliff
      Proofreader: Jennifer Gallant
      Manufacturing Buyer: Dan Uhrig

Published by Pearson plc

Publishing as IBM Press

IBM Press offers excellent discounts on this book when ordered in quantity for bulk purchases or special sales, which may include electronic versions and/or custom covers and content particular to your business, training goals, marketing focus, and branding interests. For more information, please contact

          U. S. Corporate and Government Sales
          1-800-382-3419
          [email protected].

For sales outside the U. S., please contact

          International Sales
          [email protected].

The following terms are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both: IBM, the IBM logo, IBM Press, DB2, and WebSphere. Microsoft, Windows, Microsoft Excel, Microsoft PowerPoint, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both. Googlebot is a registered trademark of Google Incorporated in the United States, other countries, or both. Apple, iPod, iTunes, and iPhone are trademarks of Apple Computer Incorporated in the United States, other countries, or both. Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both. Other company, product, or service names may be trademarks or service marks of others.

Library of Congress Cataloging-in-Publication Data

Fishman, Neal.
  Viral data in SOA : an enterprise pandemic / Neal A. Fishman.
       p. cm.
  Includes bibliographical references and index.
  ISBN 978-0-13-700180-4 (pbk. : alk. paper)  1.  Database management. 2.    Business—
Databases—Management. 3.  Information storage and retrieval systems—Reliability.
4.  Disinformation—Prevention. 5.  Service-oriented architecture (Computer science)
I. Title.
  QA76.9.D3F5835 2010
  004.6’54—dc22
                                                          2009020221

All rights reserved. This publication is protected by copyright, and permission must be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise. For information regarding permissions, write to:

      Pearson Education, Inc.
      Rights and Contracts Department
      501 Boylston Street, Suite 900
      Boston, MA 02116
      Fax (617) 671 3447

      ISBN-13: 978-0-13-700180-4

      ISBN-10: 0-13-700180-0

Text printed in the United States on recycled paper at R.R. Donnelley in Crawfordsville, Indiana.

First printing July 2009

Contents

Foreword: Timothy Davis

Foreword: Kamal Bherwani

Preface

Acknowledgments

About the Author

Definition: Viral Data

Prelude

Introduction

Chapter 1 Viral Data

Chapter 2 Data Governance

Chapter 3 Reference Model

Chapter 4 Assessing the Damage

Chapter 5 Data Conditioning

Chapter 6 Putting in Place

Epilogue

Bibliography

Index

Foreword

When Neal came to me to discuss his potential authoring of Viral Data in SOA, I was both very enthusiastic and concerned. I quietly wondered whether the broad audience that the book could potentially reach was really ready to come to terms with the true state of affairs with regard to business information. As IT professionals, do we have the means to stop the progression of viral data?

Look around you. A storm is brewing—a perfect storm of viral data, disinformation, and misinformation. The collapse of the world’s financial markets resulted in part from toxic assets that appeared from nowhere and humbled even the most mature and steadfast of institutions. During our careers, many of us have tried to invest a portion of our incomes using trusted research published by ratings agencies and from auditors’ reports. It’s all trusted, unbiased information you can believe in. Not!

As I reflect on the numerous projects that Neal and I have completed together at many of the top Global 2000 companies for both Ascential® and IBM®, it becomes clear to me that the trend and advance of viral data has now become pandemic. With a preponderance of evidence, I am still amazed that people don’t realize the effects that viral data has in their daily lives.

I am reminded of a recent data governance workshop Neal and I delivered to a major federal financial agency that had been caught up in the financial meltdown. Neal argued with the agency’s top architects about the imperative to establish a data quality and governance practice following the recent banking collapse to restore trust in their information. When one of the senior architects retorted that they didn’t need these things because they were already a “very conservative financial organization,” Neal’s response of “apparently not” said it all and brought a deadly silence to the meeting that lasted a full minute.

Within the agency, there had been no awareness of how their IT practices contributed to viral data by delivering the wrong answers. When combined with other systems used by auditors, accountants, and rating agencies, the systemic problem becomes pan-enterprise and certainly pandemic. Of greater concern is that a preponderance of corporate and government cultures remain satisfied with their shared illusion of regulatory compliance and transparency. Although many IT individuals are charged with fixing data issues, many inadvertently end up arguing against doing so!

Neal shows in this book: “What is believed to be true can be true, or false, or both true and false at the same time.” One day your financial statement from Bernie Madoff says you are a millionaire with a 20 percent return, the next day you are the victim of a $50 billion Ponzi scheme. This book strikes at the heart of the issues in SOA and the damaging approaches now being pursued by many vendors more interested in selling software than the long-term success of their enterprise customers.

Without a doubt, viral data in a service-oriented architecture has the capacity to become an enterprise pandemic and disable a company. SOA canonical data models present a fundamentally flawed approach by mapping malaligned data to a common form and promoting consumption of viral data without a data quality, standardization, alignment, and harmonization process. Service-oriented solutions through interoperability, reusability, layering of abstractions, and loose coupling can serve as perfect hosts to propagate misinformation: That is the knife’s edge of SOA, the perfect storm. When the use of SOA achieves a high degree of interoperability within the corporate value chain, viral data can easily become a pandemic situation, becoming a perfect opportunity to miscommunicate with ubiquity and simultaneity.

One of my favorite parts in Neal’s treatise is the discussion Neal shares about how PowerPoint® has potentially become responsible for destroying our ability to accurately communicate. However, Neal goes deeper, too, into other important areas such as governance, provenance, metadata, data modeling, data integration, master data management, trusted information, services-oriented architecture, and data chains.

So, are all of Neal’s readers really ready to confront the true state of affairs in the enterprise today? Can we offer solutions for how to stop the progression of viral data? After reading this great book, I’m now thinking Neal should run for president and we should start our own political party, news channel, and religion. We can call it, “Trusted Unbiased Information You Can Believe In!”

Timothy G. Davis,
Boston, Massachusetts
April 2, 2009

Timothy G. Davis

Tim Davis is an Executive Director within the IBM Software Group. Tim’s mission at IBM is to drive the Trusted Information Agenda through the advancement of industry-leading architectures for IBM customers worldwide.

Tim is also one of the leading content contributors for IBM curriculums, best practices, and methodologies. Tim is the founder of the IBM Center of Excellence for Data Integration, and recently led the development and launch of IBM’s Information Grid, MDM Server Rapid Deployment, and IBM’s SAP deployment accelerators.

Tim has more than 25 years of professional experience in large-scale systems integration, high-performance computing, SOA, MDM, ERP/SAP, CRM, data warehousing, analytics, banking risk management, and compliance. Tim has published numerous papers and holds a Master of Science degree in Electrical Engineering/Computer Science from USC and a Bachelor’s degree in the same discipline from Clarkson University.

Foreword

The evolution of technology has allowed for a great many capabilities. It has also pointed at each phase to the next big problem. In many ways, the world has reached a point where integrated data between silos is the key to prevention, reaction, and compliance. The lack of integrated data leads to failures in both public and private environments. The data must be central to the problem at hand.

Public safety, environment monitoring, economic development, financial risk management, education, and other challenges that many nations are facing are quite literally manageable by knowing better. But no one holds all the pieces of the puzzle. Looking at the pieces individually and acting may actually make the problem worse.

Many of these capabilities are locked within vertical silos. A vertical silo can be an application, it can be a business unit, and it can be a company. We have written and rewritten systems to meet the needs of these vertical silos.

But no matter how wonderfully effective and efficient the silos have become, the power of understanding information across the silos has become the key to knowledge. As we struggle to free data from the walls of its silo, we realize that the way technology has evolved within an organization and between organizations has been without standards for processes, controls, and data. Getting good and useful information in real time across silos is a real cross-boundary challenge. It takes a great deal of governance to orchestrate the connecting of the dots.

In January 2008, the Mayor of New York City, Michael R. Bloomberg, gave his annual State of the City address to the citizens of New York City, launching a bold new initiative called HHS-Connect. The vision behind HHS-Connect is to connect the dots between the various agencies in New York that provide health and human services. Shortly thereafter, I had the pleasure of being announced by the mayor as his chief information officer for health and human services and as the executive director for HHS-Connect.

HHS-Connect was established to link data from more than a dozen city agencies to enable caseworkers to share client information without compromising confidentiality. Mayor Bloomberg stated that “with HHS-Connect, caseworkers will now spend less time managing paperwork, and will be able to spend more time face to face with their clients. Building upon our online prescreening and eligibility tool, ACCESS NYC, beginning this year, in a first for any municipal government, we will link the computer systems of health and human service agencies, so that they can share client information without compromising confidentiality.”

The Deputy Mayor for Health and Human Services, Linda Gibbs, similarly commented that with “HHS-Connect, the City will fundamentally change how we provide services by connecting clients, agencies, and providers to ensure holistic and integrated services that wrap around a family. Data sharing will provide a more complete understanding of the clients’ needs that will ultimately lead to better client care.”

My team oversees technology strategy and architecture to ensure a coordinated approach to facilitate data integration and exchange between nine health and human services agencies under the direction of Deputy Mayor Linda Gibbs. The goal is also to exchange data with other New York City agencies, the state, federal agencies, and even third-party providers. In addition, giving clients new channels of digital access to their data and capabilities that allow for transactions online is critical to improving client experience.

New York City is one of the world’s largest and most vibrant cities, and being tasked with providing all the relevant stakeholders with an accurate and comprehensive single view of the New York resident is an exciting and rewarding challenge. My challenges range from leading the policy and legal effort of understanding what data can be shared with which parties under which circumstances to making sure each linked agency has access to the right cross-agency data in a timely manner to help ensure each citizen receives access to the services to which he or she is entitled.

HHS-Connect will not only increase the quality of life of the clients we serve in the city, but it will also help the city make better and quicker decisions, resulting in better outcomes for families. I am part of a broad team working collaboratively to make real progress toward the goal of integrated health and human services.

Helping to bridge all of these agencies exemplifies the importance and power of data—from seeing positive results attributable to accurate data, to the potential headaches associated with misinformation. The need to protect against viral data was a concept that resonated. We are building software services, we are using master data solutions, and we are using state-of-the-art techniques, hardware, and software. Ultimately, success will mean paying attention to the big picture, and building it piece by piece. We are building the tracks while the train is in motion.

In Viral Data in SOA, Neal positions the path for trustworthy data in the big picture, taking into account the organization, data needs, and software needs. Neal explores a number of topics with depth and insight, including data governance and data provenance.

While my job is fun, exciting, and full of daily challenges, my focus has to include making sure that I can reliably provision data to each agency that HHS-Connect supports. Achieving trustworthiness in data is a goal. Avoiding viral data is a mandate. Without good and consistent underlying data, the usefulness of connecting the dots will be lost.

Kamal Bherwani
Chief Information Officer Health and Human Services
Executive Director HHS-Connect
New York City
March 24, 2009

Preface

The overall quality of information in our organizations continues to be suspect and poor. Spurred by needs to eliminate duplication, cut costs, and provide greener solutions, organizations have renewed efforts to logically centralize IT systems through virtualization, federation, and shared services.

A shared data resource exposed through a layer of services can behave like a virus—unilaterally affecting all those who touch the data. This book addresses the treatment and prevention of harmful data in a service-oriented architecture.

Chapter 1, “Viral Data.” Chapter 1 explores the potential viral consequences of data in a service-oriented architecture. The chapter provides examples of how different anomalies in data can have everything from mild to serious consequences across the organization.

Topics that Chapter 1 covers include the following: the silo structure of organizations; problems with trying to align IT with the business; eminent domain and management prerogative; different ways to view data quality; the interpretation of metrics; effects of positive and negative feedback mechanisms; SOA characteristics and how they equate in data design; and how data designs often force tight coupling.

Chapter 2, “Data Governance.” Many organizations have tried to manage their information quality, but as organizations centralize their services and data layers, a need has arisen to add a layer of governance to the data management framework. This chapter explains what data governance adds to the organization and provides several implementation models.

Topics that Chapter 2 covers include the following: the ubiquity of governance; the benefit of permitting deviations; oversight; basic tenets of business; rules of compliance; rules of behavior; self-governance; master data becoming aggregation hubs; intragovernance versus intergovernance; communication styles; a metamodel for data governance; data governance bodies; a framework for data governance; enforcement; dialing as a means of success; and assessing risk.

Chapter 3, “Reference Model.” This chapter reviews a reference architecture that provides a context for how to assess, condition, and recondition shared data. The reference architecture becomes a backbone for implementing an enterprise-wide data quality methodology.

The methodology is described through the use of a flight-path metaphor and supplements publish and subscribe scenarios that might be used in association with a service bus. Opportunities to address (assess or condition) data quality are explained in the reference architecture using the flight-path stages of preflight, in-flight, and post-flight.

Topics that Chapter 3 covers include the following: transient and persistent data; metadata for structured, semistructured, and unstructured data; the life span of data; the Shannon-Weaver communication model; understanding the processes in which data moves through an organization; data flight paths; customizing data quality assessments to the data flight path; the differences between data alignment and data harmonization; types of work efforts on data quality; benefits of following a scenario-based analysis; using probabilistic and deterministic matching; and merging and surviving moved data.

Chapter 4, “Assessing the Damage.” Guided by the reference architecture, this chapter explains how to approach assessing the quality of information in an organization. The chapter reviews what aspects of data can be addressed: from modeling techniques, to the choices in hardware, to all the different types of data persisted in an enterprise.

Topics that Chapter 4 covers include the following: the impact of revisionism; hidden meanings embedded in business data and overloading concepts; keeping the persistence design separate from the presentation; working with abstract concepts; constructing definitions; targeting definitions to a defined audience; isa and hasa relationships; different techniques for rationalizing and interpreting data; normal forms; the visualization of data models; how to assess ownership of data; a taxonomy for classes of data; and data-value patterns.

Chapter 5, “Data Conditioning.” This chapter explains how to condition and recondition data. Conditioning is the augmenting of data to improve its level of quality either at the time of creation or soon thereafter. Reconditioning largely covers the subject of data decay and the augmenting of data previously conditioned. The chapter discusses data provenance as a means to help manage and evaluate the condition of data. Examples are shown as to how metadata can be used to create reactive systems.

Topics that Chapter 5 covers include the following: reductionism; systems thinking; how to use data provenance; applying value chains and data chains; implications of early- and late-binding techniques; data lineage and conditional data lineage; metadata as a late-binding agent; a technique to evaluate and measure the stability of a business concept; data decay; and reclassifying business products.

Chapter 6, “Putting in Place.” This final chapter describes additional steps that enable an organization to implement a data quality initiative for managing and governing shared data used in an SOA.

Topics that Chapter 6 covers include the following: manipulating information for personal gain; creating software that has a continual ability to react to business needs; how to create abstract designs; metamodels; practical distinctions between conceptual, logical, and physical data models; data architecture; the use of stories to explain an abstraction; the importance of context; and master data management.

Who Should Read This Book

Viral Data in SOA: An Enterprise Pandemic has content that will appeal to a diverse business and technical audience. On the business side, subject matter experts, business liaisons, and managers who rely heavily on data from information technology departments to perform their functions are likely to benefit from the first half of the book.

Technical specialists of any species will find value in the holistic approach to gaining trusted information in a service-oriented environment. Therefore, this book will appeal to IT managers, analysts, programmers, data administrators, database administrators, enterprise architects, technical architects, system architects, SOA architects, and data architects.

This book covers a wide range of topics, listed alphabetically below, and so will appeal to a diverse audience with an interest in any or all of them:

[Image: alphabetical list of topics covered]

What You Will Learn

This book is a comprehensive guide to understanding:

• The importance of addressing data quality as part of a multidisciplinary effort

• The need to establish or reestablish data design principles

• That achieving trusted information requires a lifetime commitment

Acknowledgments

I want to sincerely thank the following individuals for their contributions during the writing process: Lisa Deluca, Norbert Bieberstein, Don Dejewski, Linda Nadeau, Dr. Jerry Rosenbaum, Dr. Jeffrey Herzog, Bob Leo, Barbara Alarie, Catherine Argento, Demetrios Sapounas, Ph.D., Glen Birrell, Dr. Ramasamy Uthurusamy, and Warren Selkow, Ph.D.

Thanks also to Susan Visser, Tara Woodman, and Michael Curry from IBM.

Pearson Publishing provided guidance and motivation during the entire process, and I want to especially thank Bernard Goodwin, Michelle Housley, Noreen Regina, Andy Beaster, Keith Cline, Cheryl Lenser, Nonie Ratcliff, and Jennifer Gallant.

Finally, I am grateful to Kamal Bherwani for writing a Foreword, and to Timothy Davis for writing a Foreword and for his mentorship over the past five years.

About the Author

Neal Fishman is the program director for information and integration forensics within IBM’s Information Management Technical Architecture Group. He has been involved in many aspects of information technology and has developed many unique perspectives throughout his career.

Neal is also a co-author of the textbook Enterprise Architecture Using the Zachman Framework, and has been a distance-learning instructor for the University of Washington. In addition, Neal has served on several committees for international technology standards and as a board member of the Data Management Association (DAMA) Atlanta chapter.

Definition

viral data, n.

[< classical Latin indicium vīral poisonous bits of bytes, elements of something going awry (as affecting behavior, knowledge, or outcomes) < the same Indo-European base as Sanskrit dt’m sand storm etched words (lit. sand script), Avestan vešh vaêdha fear of typos and vocalos, ancient Greek iika thethomena (lit. nasty icky data). Compare Middle French, French données virales (1503 in sense ‘a host program or service that carries an infectious agent’; 1805 in fig. use), Catalan les dades viral (16th cent.), Spanish datos virales (18th cent.).]1

1. Intended levity—modeled after the definition for the word virus in the 2008 edition of the Oxford English Dictionary.

Chiefly information technology. A situation or series of undesirable effects associated with engaging (activating or hosting) information through a service. Often the result of interoperability, where inadequate or substandard data cause an unfortunate set of circumstances to ripple from one process to the next with minimal periods of rest, or of a shared service pushing inadequate or substandard data from a single view to all constituents. Left untreated, uncontained, or unchecked, viral data in a service-oriented architecture can reach epidemic proportions in an enterprise.

Antonym: trusted information.

Prelude

Violet. Assess.

Maybe it is instantly recognizable as the masculine form of the French word violette. Or maybe you will notice it is a misspelling—notoriously, the missing n having gone awry.1 Or just maybe, you will end up with a whole bunch of more maybes.

1. Adding an n to violet spells the word violent.

Maybe violet is...

• A pigmentation, or

• A qualifying name for other colors such as violet black, or

• The name of a girl, or

• A type of flower, or

• A reference to petals, or

• An inference to harmony in Chinese art, or

• A suggestion of royalty, or

• A spherical starship, 75 feet in diameter,2 or

2. See Smith (1930).

• A forward-looking visionary (possessing a violet aura), or

• A reference to Advent and Lent, or

• Part of a system of meditation, or

• A small town of approximately 10,000 people near New Orleans, or

• The Greek city of Athens, the city of the Violet Crown, or

• A doll who travels the world posting pictures of her trips on the Internet, or

• The name of a painting, or

• A reference to the hue of the short-wave end of the visible spectrum, or

• A scent (violet-sweet3), or

3. See Malone (1904).

• An attribute to other senses, a violet-virtue, or

• An inference of breath, or

• A reference to onions, or

• A form of color blindness (violet-blindness), or

• A nice cup of tea, or

• Possibly, a musical instrument, diminutive of the viol, or

• The fly that occurs between April 6 and April 10,4 or

4. See Walton (1915).

• An act of gathering (violating5), or

5. See Russell Mitford (1906).

• A co-owner of the High Cactus Ranch Bed & Breakfast, located eight miles south of Calgary, Canada

Or, could it be that violet...

• Is the thing itself, possessing properties such as hue and tone, or

• Is the property of something else altogether (like that dress), or

• Is something that possesses behavior and possibly expands or contracts depending on the temperature

Or, maybe there should be a little more to go on. Some additional context, perhaps!

To assess something in terms of data quality requires something. That je ne sais quoi, that little something extra, to make heads or tails of what is to be assessed. It could be some supporting metadata, it could be something about the outcome, or it could be about the provenance.

Why assess? Maybe to observe, to judge, to communicate, to plan, or possibly to carry out an action. If these things are, metaphorically, attempted in the dark, the venture is much more likely to fail than it is to succeed. Context can provide the necessary background information, a reason, a purpose, a heads-up, a frame of reference, or anything else that can provide and contribute to an orientation. Context becomes the provisioning of an anchor point from which to venture, a place from which to succeed.
