PREFACE

Organizations have a choice! They can choose how they perceive the business environment and how they operate in that environment. They can choose how to build and maintain a data resource that supports their operation in the business environment. The data resource is not destined! An organization has the free will to choose whether their data resource is formally developed, or just allowed to develop.

Never Cease To Be Amazed

I never cease to be amazed at the ingenious ways that organizations can screw up their data resource. Just when I think I’ve seen it all, I go into another organization and look at their disparate data resource. My first thought is “What the (you fill in the phrase) is this?” Usually someone will start into a lengthy explanation about how things evolved over the years. About five minutes into the explanation I get a real splitting headache.

What organizations do to their data resource is totally unnecessary and unreasonable. Organizations seem to lack the basic concepts, principles, and techniques for building and maintaining a high quality data resource, or even a high quality database. They allow brute-force-physical approaches that mush the data around to meet current needs and deadlines.

I’ve been in the data management business for nearly 50 years, yet I haven’t seen it all. I still encounter disasters of monumental proportions. I often wonder, “What were people thinking when they built these databases?” It reminds me of the popular television show What Were You Thinking?

I frequently ask people where they got their knowledge and skills for building a data resource, or even a database. I get answers like “I took a vendor class once,” “I found a book in the library,” or “It just seemed to be the right thing to do.” The best I’ve heard was “We were in a hurry so I just did what needed to be done.”

Experiences

Before I retired from the State of Washington, I used to be concerned that all of the data resource problems would be solved before I had a chance to work on data resource integration. To my surprise, and horror, they hadn’t been resolved by the time I retired, and they still haven’t been resolved today, some 15 years later. Actually, the situation has gotten worse, much worse. It seems that the hype is driving the data resource further into disparity.

I heard an interesting comment about turning data into useful answers. I thought that would be a good theme for resolving disparate data. The theme is turning disparate data into comparate data and comparate data into information that is useful to the business. After all, the basic objective of data resource management is to manage a set of raw material (data) that can be used to prepare useful answers to the business (information).

I was on a panel at an international conference several years ago with three other prominent industry leaders. A question was asked if any of the panel members had found, seen, or even heard of any organization, public or private, that had an organization-wide integrated data resource. The normally outspoken panel members looked at each other in silence and shook their heads. None of them, including me, had been involved with, seen, or heard of any organization with an organization-wide integrated data resource, or even a complete model of their existing data resource.

The question was turned back to the audience: did any audience member know of any organization with an organization-wide integrated data resource or a complete data resource model? Did any audience member know of a major segment of an organization’s data resource being integrated or modeled? Silence. That’s discouraging news indeed.

I was approached by an organization that wanted help with understanding and integrating their disparate data resource. During an initial interview they emphasized that their data were a corporate asset and very critical to the continued success of the organization. After expounding on the critical nature of a valuable resource, which caused me to wonder if they were trying to convince me or themselves, they finally got to the project at hand. The first task they wanted me to perform was to convince executive management that resolving the data disparity and developing an integrated data resource was necessary, and worth the effort and expense they had outlined in their proposal. Needless to say, the project was flawed from the beginning and I chose not to continue.

I recently gave a presentation to a professional group based on Data Resource Simplexity. One person asked me what they should do if only a one-percent chance existed that the organization would follow what I had just explained. First, I pointed out that the organization had a choice about improving the quality of their data resource. Then I asked if his organization was satisfied with the quality of their data resource. Could they meet the business information demand? Were there any concerns over the quality of the information received? Did business professionals fully understand the data? He received the message.

After another presentation I gave on Data Resource Simplexity, a prominent person said that he agreed with everything I said, but that no organization would do all those things. Did I really expect an organization to perform all those tasks and follow all those principles?

I gave him two reasons why they should. First, those concepts, principles, and techniques are the detail of a data management profession. If a person wants to be a data management professional, they need to follow those concepts, principles, and techniques. Second, would he want to ride on an airplane, boat, or train, where the builder didn’t have the time for all of the detail and simply cut corners because they didn’t want to do something? Probably not. Further, a data resource supports many disciplines and professions. If those disciplines and professions are to do things right, then the data resource must provide the right data. He received the message, but had no idea how he would implement the concepts, principles, and techniques in his organization.

The excuse that I often get is that it’s just too expensive to resolve the data disparity, and that resolving it is not worth the expense. My response is along the lines of: What’s the cost of wrong business decisions, inappropriate business actions, missed opportunities, and so on? What’s the impact on public sector citizens and private sector customers?

A Knotted Jump Rope

I was working in my garage one summer day with the door open for fresh air. One of the girls in the neighborhood came by with a long jump rope that was all in knots. She came up to me and asked if I could help her. I said I could and asked her what the problem was. She said, “My jump rope is all tangled up and I can’t get it untangled.”

I sat down with her and said, “It’s very simple. You have to find one end, find where it comes from, and pull it back through. Then you do the same thing over and over until it’s all untangled.” I showed her how to do it and she said, “Oh, I can do that.” She took the bundle and kept going one knot at a time. It wasn’t long until she had the jump rope all untangled, thanked me, and went running down the street to her friends.

I got to thinking that her jump rope is an excellent analogy for what we do to untangle disparate data. We take the disparate data, one piece at a time, and figure out what it is within the context of a common data architecture. When we understand the data, we can begin building a new data resource. The process itself is very simple; it’s just very detailed and takes some time to get through all of the knots.

I’ve done considerable reading outside data management about how people face and resolve problems. I learn all I can about how they face the unknown and seemingly insurmountable problems, find a breakthrough, and come to an equitable resolution. I then apply those approaches to data resource management and untangling a disparate data resource.

Critical Resources

The quality of the data resource in most public and private sector organizations today is really bad. The sad news is that the quality is not getting any better: it is getting worse over time. In spite of many new techniques and the continued hyperbole about data resource quality improvement, the quality of the data resource in most organizations is deteriorating.

Every organization has four critical resources that it must properly manage to become and remain a high-quality, fully-successful organization. Those four resources are the human resource, the financial resource, real property, and the data resource. Generally, the first three of those resources are properly managed. However, the data resource is seldom properly managed, which leads to low quality data and to a less than fully successful organization.

One can ask the question what would happen if the first three resources were managed the way that the data resource is currently managed. The answer should be quite clear. Civil and criminal actions are taken for not properly managing the human resource. Civil and criminal actions are also taken for not properly managing the financial resource. The same is true for not properly managing real property, such as the violation of environmental codes, building codes, and so on.

However, the same is not true when the data resource is not properly managed. The data resource can be mismanaged, often to the detriment of the organization, yet few civil or criminal actions are taken. In many situations the reverse is true. Organizations often require extensive justification for proper management of the data resource, which is implicit approval for the ongoing mismanagement.

I am repeatedly asked why such a situation might be true. The best answer I’ve found over the years is that the data resource is intangible and inexhaustible. The data resource is not tangible like people, money, or real property. Data cannot be held in your hand the same as people, money, and real property. Further, the data resource is inexhaustible because the data can be used over and over again without being depleted. The same is not true for people, money, or real property. They can be exhausted.

The intangible and inexhaustible nature of the data resource seems to be the underlying reason why the data resource is not properly managed as a critical resource of an organization. The lack of proper data resource management has, over the years, led to the existence of large quantities of disparate data. These disparate data are not contributing to a high-quality, fully-successful organization. Further, they will not contribute to such an organization until the current disparity is resolved and future disparity is prevented.

The Answer Is Known

We know why the data resource goes disparate. We know why people don’t share disparate data. We know how to understand and resolve existing disparate data. We know how to prevent disparate data from happening. We know how to build a high-quality, sharable data resource within a single enterprise-wide data architecture. We know how to develop a data resource that supports a high-quality, fully-successful organization. So, why don’t we use that knowledge?

The answer lies partly in the intangible and inexhaustible nature of the data resource. It lies partly in the hyperbole of quick fixes, silver bullets, and magic wands. It lies partly in the attitude of accepting the routine creation of disparate data while requiring justification to create a high-quality, sharable data resource within a single enterprise-wide data architecture.

The real answer is in the recognition and formal management of data as a critical resource for the organization. Given the current state of the data resource in most organizations, a two-pronged approach is needed. First, the further creation of disparate data must be stopped. Second, the existing disparate data must be resolved. The creation of disparate data must be prevented with good data resource management techniques, and the existing disparate data must be resolved so the data are fully useful to the organization.

Formal data resource management is hard, but it is not impossible. It has never really achieved its promises, and investment in data resource management has been cyclic. It has swung from hyperbole and promise to failure and discouragement. It has not delivered consistently in a reasonable time frame. Data resource management has been a collection of disciplines rather than a formal profession. The only hope is to create a formal, certified, recognized, and respected data management profession that delivers a high quality data resource that supports current and future business information needs.

Bruno Walter, an orchestral conductor, says that in order to achieve precision, we must concentrate on precision. The same approach is true for data resource quality. In order to achieve quality, you must concentrate on quality. In order to improve understanding, you must concentrate on understanding. In order to achieve data resource integration, you must concentrate on data resource integration.

A Demon-Haunted World

I read Carl Sagan’s book The Demon-Haunted World: Science as a Candle in the Dark. The book describes much of the hyperbole and mysticism prominent in the world, both historically and today. Science is emphasized as the discipline for approaching and either substantiating or disproving things, sorting out the facts, and separating the truth from all of the fiction.

Information technology, including data resource management, is characterized by considerable hype and mysticism, much the same as Carl Sagan explains. Architecture is the discipline for approaching the disparity of information technology. The common data architecture that has been developed over the last 20 years is the discipline for approaching the disparate data problem and developing a high quality, sharable data resource. It is the science for data resource management: the candle in the dark for untangling disparate data.

The concept of a common data architecture evolved through the late 1980s and early 1990s and was first presented in Data Sharing Using a Common Data Architecture (Brackett, 1994). The concept was developed to resolve the huge quantities of disparate data that exist in the public sector, and to meet the urgent need to identify, understand, and share those data. The book presented the vision, concepts, and techniques for developing a common data architecture.

The concept was excellent and was used on many projects, resulting in considerable input and enhancements. These enhancements, along with the evolution of data warehouse concepts and techniques, led to The Data Warehouse Challenge: Taming Data Chaos (Brackett, 1996). The term data warehouse was used in the sense of a fully integrated data resource within a common data architecture. The book presented enhancements to the common data architecture and techniques to apply the common data architecture concept to many of the data problems that exist in both public and private sector organizations.

Data Resource Quality: Turning Bad Habits Into Good Practices (Brackett, 2000) put the initial concepts, principles, and techniques in place for properly managing a data resource. After ten years of applying those concepts, principles, and techniques, and getting considerable feedback, revisions were made in Data Resource Simplexity: How Organizations Choose Data Resource Success Or Failure (Brackett, 2011). The primary emphasis was on how to stop any further disparity in the data resource.

The Current Book

The current book deals with how to resolve the rampant disparity that currently exists in the data resource of most public and private sector organizations. It builds on Data Resource Simplexity, and presents the concepts, principles, and techniques necessary for fully understanding and resolving disparate data and creating a comparate data resource. It describes a sound approach for resolving disparate data that does not rely on hype, silver bullets, or magic wands. The approach has been used in many organizations and has proven to be successful.

After many years of working with disparate data, writing about disparate data, and giving presentations about disparate data, I’ve amassed considerable material on how to understand and resolve disparate data. That material, along with all of the nitty-gritty detail, is presented in the current book. I may not have covered every single situation that may exist in every organization, but I covered most of the detail. In addition, I’ve provided the process for understanding and resolving disparate data so that people can use that process to handle any specific situation they may encounter.

At one conference, I gave a tutorial on the Common Data Architecture and data resource integration. After the presentation, one attendee stated that the presentation was very good, but I didn’t cover all of the detail that one might encounter when trying to understand and resolve disparate data. His statement was correct, but limited time was available in a tutorial.

The current book describes all of the concepts, principles, and techniques for understanding and resolving disparate data. It provides the detail that couldn’t be presented in a conference presentation or even a tutorial. It provides a phased approach that produces results and can be carried out on the fly as an organization continues its business activities. It’s based on over 25 years of experience gained as I worked with many public and private sector organizations to build a common data architecture and resolve disparate data.

The current book is not about data integration to temporarily bring data together from different sources and platforms for operational processing. It’s about formally and permanently integrating a disparate data resource within a common data architecture, and developing a comparate data resource that meets the business information demand. It’s about formally integrating not only the disparate data resource itself, but the disparate data culture managing that data resource. Solving one without the other will not resolve the disparate data problem.

The current book is a paradigm shift that not only changes the future, but also changes the past. How does it change the past? After understanding and resolving disparate data, people will never be able to look at an old data resource in the same way. They will see how the data resource happened and the impacts it caused. They will see a new way to manage data as a critical resource that provides quality information to support the business. People who have been through the process tell me they will never, ever manage data the way they did in the past.

The current book shatters past and present hype about how data should be managed. It shatters terms creating the lexical challenge in data resource management, and presents terms with comprehensive and denotative meanings. As Bruno Walter said about orchestras—to achieve quality, you must concentrate on quality.

Another common saying is that by concentrating on precision one arrives at techniques, but by concentrating on techniques, one does not arrive at precision. The key concept for formal data resource integration is to concentrate on precision, and use techniques that achieve that precision.

Comparison To Data Resource Simplexity

I’ve been asked how the material in Data Resource Integration relates to the material in Data Resource Simplexity, and whether a person should buy one or the other. Generally, Data Resource Simplexity describes how to stop rampant data disparity and Data Resource Integration describes how to resolve the existing data disparity. The diagram below shows a more detailed relationship between the two books.

Chapters 1 and 2 of Data Resource Simplexity are briefly summarized in Chapter 1 of Data Resource Integration, and some new material has been added. Chapter 2 of Data Resource Integration provides an overview of data resource integration. Together, Chapters 1 and 2 of Data Resource Integration provide the problems leading to disparate data and the concept for resolving that disparity.

Chapters 3 through 8 of Data Resource Simplexity have been summarized in Chapter 3 of Data Resource Integration, and some new material has been added. Chapters 4 through 12 in Data Resource Integration are new material describing the concepts and processes for integrating a disparate data resource. Chapter 4 describes the extent of data variability. Chapters 5 and 6 describe the data inventory concept and process. Chapters 7 and 8 describe the data cross-reference concept and process. Chapters 9 and 10 describe the concept and process for designating a preferred data architecture. Chapters 11 and 12 describe the concept and process for transforming data. Together, Chapters 4 through 12 of Data Resource Integration describe the phased approach to integrating a disparate data resource.

Chapters 9 through 12 of Data Resource Simplexity have been summarized in Chapter 13 of Data Resource Integration, and considerable new material has been added about integrating the data culture. Chapter 14 of Data Resource Integration summarizes the effort to manage data as a critical resource of the organization. Together, Chapters 13 and 14 of Data Resource Integration provide a cultural approach to managing data as a critical resource.

The Glossary in Data Resource Integration includes all of the items from the Glossary in Data Resource Simplexity to provide a complete Glossary that helps resolve the lexical challenge in data resource management.

Today, the quality of the data resource is an issue of ever-increasing importance. The quality of the entire data resource, from operational data stores, to true data warehouses, to true data mining, is mandatory for a business to be fully successful. That’s what the current book is about: improving the quality of data as a critical resource of the organization. It’s about understanding the problems and attacking those problems with proven techniques and knowledgeable people to stop the continued creation of disparate data and resolve the existing disparate data.

You won’t find little quips and puns in the current book, nor will you find specific problems and disaster stories. You won’t find much humor, because the disparate data problem is not humorous—it’s deadly serious. You won’t find much about sampling, statistics, project management, or justifying the existence of disparate data. You won’t find much about charters and job descriptions.

You will find a meat and potatoes approach to improving the quality of the data resource with sound data resource management techniques. You will find techniques to improve data resource quality, techniques to build an integrated data resource within a common data architecture, and techniques to provide sharable data that support business strategies and goals. You will find a definitive description of data architecture and data culture integration.

Audience

Data Resource Integration is a reference book intended for two broad audiences. The first audience is the experienced data management and business professionals who will use the material for resolving the existing data resource disparity rather than living with the disparity and performing repetitive data integration. The second audience is the data management instructors or trainers who will teach the material to those interested in resolving data resource disparity. The book is not intended for general audiences interested in data resource management, nor is it intended for casual reading from cover to cover.

I have been dealing with data in many different public and private sector organizations, large and small, new and old, for nearly 50 years. I’ve learned how disparate data are created and how a disparate data resource evolves. I have learned how to prevent the creation of disparate data and how to resolve existing disparate data. I have learned how to create a high quality, sharable data resource within an organization-wide common data architecture. The primary purpose of the current book is to pass some of the knowledge and skills I’ve acquired on to the reader.

If you or your organization have no disparate data and do not ever foresee having disparate data, Data Resource Integration is not for you. However, if you do have disparate data in your organization and the situation is getting worse, or you foresee having disparate data in the future, you really need to read Data Resource Integration and apply the concepts, principles, and techniques to create a high quality, sharable data resource. I suspect the latter is true.

Michael Brackett

Olympic Mountains, Washington

January, 2012
