CHAPTER 7
Turning the tide

What can you do now to begin to get out of the quagmire?

Assessment

The first thing to do is take stock and determine whether you are in the quagmire. To take meaningful action, you will need an inventory and some metrics.

This section is about getting a handle on how bad the problem is in your organization.

Inventory of apps

The first step is to get a simple count of the number of applications you are currently supporting. For each application, gather two statistics:

  • Schema size. Number of tables plus total number of columns for relational systems, number of classes plus total number of attributes for object-oriented systems, number of terminal elements plus total number of attributes for XML-based systems, etc.
  • Number of lines of code. Count only the code you have access to and are maintaining; don’t count code you can’t modify.

This will give you the factors you need to assess the total complexity of your information infrastructure.
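
To make this concrete, here is a minimal sketch in Python of how the inventory might be recorded and rolled up into a crude complexity figure. The application names and numbers are hypothetical placeholders, and multiplying schema size by lines of code is just one simple proxy you can refine later.

```python
# Minimal sketch: record the per-application inventory and a rough
# complexity factor. All names and numbers are hypothetical.
from dataclasses import dataclass

@dataclass
class AppInventory:
    name: str
    schema_size: int      # tables + columns (or classes + attributes)
    lines_of_code: int    # only code you maintain and can modify

    @property
    def complexity_factor(self) -> int:
        # One crude proxy; refine as your data gathering improves.
        return self.schema_size * self.lines_of_code

apps = [
    AppInventory("order_entry", schema_size=1_450, lines_of_code=820_000),
    AppInventory("ar_north_america", schema_size=600, lines_of_code=310_000),
]

total = sum(a.complexity_factor for a in apps)
for a in sorted(apps, key=lambda a: a.complexity_factor, reverse=True):
    print(f"{a.name:20} {a.complexity_factor / total:6.1%} of total complexity")
```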

Functionality map

Construct a functional decomposition of the key business functions your organization performs. Keep this at a high level: for example, record resource consumption, establish or record obligations, record asset transfers (including cash), record physical movement, and record production. Create a companion list of facets that qualify these basic functions, such as the type of product being produced, the type of materials whose movements are being tracked, or the geographical or organizational subdivisions that might form boundaries for application systems.

Then categorize each application by the functions and the qualifying facets. This should begin to suggest opportunities for rationalization. However, wait until you complete the dependency exercise to determine the sequence of execution.
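
A minimal sketch of what such a categorization might look like in Python; the applications, functions, and facets below are hypothetical placeholders.

```python
# Minimal sketch: categorize each application by business function and the
# facets that qualify its scope. All names are hypothetical.
from collections import defaultdict

functionality_map = {
    "ar_north_america": {
        "functions": {"record obligations", "record asset transfers"},
        "facets": {"geography": "North America", "product": "elevator permits"},
    },
    "ar_claims": {
        "functions": {"record obligations"},
        "facets": {"geography": "global", "product": "claims"},
    },
}

# Any function implemented by several narrowly scoped applications is an
# obvious rationalization candidate.
by_function = defaultdict(list)
for app, entry in functionality_map.items():
    for fn in entry["functions"]:
        by_function[fn].append(app)

for fn, apps in by_function.items():
    if len(apps) > 1:
        print(f"'{fn}' is implemented by {len(apps)} applications: {apps}")
```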

Interfaces

Very few companies have a good inventory of the interfaces they have in place, even though those interfaces cost money to support and hamper attempts to migrate systems. There are several approaches to this step, depending on the number and complexity of your interfaces.

If you have fewer than a couple of hundred applications, you may find it easier to visually map the known interfaces and invite maintenance programmers to opine on the missing ones. It is best to get many of them in the room at the same time, as they tend to riff on one another and “remember” interfaces that they might otherwise have forgotten.

At more than a few hundred apps, you will need some automated support. Several software firms can help with this, including Eccenca, GlobalIDs, and Io-Tahoe. After you have profiled your data and determined that, say, six different systems have identical histograms for certain datasets such as customer or product lists, the next step is to instrument the systems in real time and figure out which one is populated first and, by implication, which ones have interfaces and are being fed.

You also need to map all the dependencies between your applications and operating systems, databases, network protocols, and computer hardware.

You want to map the dependency at the level it exists. If an application is dependent on a particular version of a database or an operating system, you need to document that.

If possible, document these dependencies graphically. If not, document them in a graph database and draw various subsets. What you are most interested in are deep chains of dependencies, and areas where only a few applications depend on a technology, as that may be an easier technology to “sunset.”
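
As a rough illustration, here is a minimal sketch using the Python networkx library (an assumption on our part; a graph database or any visualization tool works just as well) to surface deep chains and technologies that only a few applications depend on. All node names are hypothetical.

```python
# Minimal sketch: edges point from a dependent application to the thing it
# depends on. All node names are hypothetical placeholders.
import networkx as nx

g = nx.DiGraph()
g.add_edge("ar_north_america", "oracle_11g")
g.add_edge("ar_north_america", "order_entry")   # fed by order_entry
g.add_edge("order_entry", "mq_series")
g.add_edge("order_entry", "cobol_runtime")
g.add_edge("claims_portal", "oracle_11g")

# Deep chains of dependencies are the riskiest places to make changes.
print("Longest dependency chain:", nx.dag_longest_path(g))

# Technologies that only one or two applications depend on may be easier
# to "sunset".
for node in g.nodes:
    dependents = list(g.predecessors(node))
    if g.out_degree(node) == 0 and len(dependents) <= 2:
        print(f"{node} has only {len(dependents)} dependent(s): {dependents}")
```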

Map your dependencies

Three things make getting rid of legacy systems harder than it should be:

  • The functional boundaries are arbitrary
  • There are many deep dependencies that have accreted over time
  • They are mostly a black box

In this section, we’ll cover some tips for the first two, and the following section is about the last bullet.

To a first order of approximation, applications line up with business functions. Never mind that there is no good list of business functions anywhere; every time we have looked at a client, we have seen combinations of extremely generic functions mixed with business processes and categories.

Reflecting on this for just a moment or two, we realize this alignment just doesn’t exist. Take the function “Accounts Receivable.” Most companies have this functionality in dozens of their systems, and this functionality doesn’t have a crisp boundary. In some cases, it involves creation of the invoice. In other areas, it includes collection. Some have customer master file maintenance, many have cash application, and some assess fees and penalties. Most include aging and reporting.

Even more difficult for planning the replacement is figuring out the scope of each application. Sometimes the scope is geographic (this is AR for North America), for some the scope will be customer type (customer segments), and for others the scope is product or service type (separate AR functionality for claims over payment, elevator permits, or audit result fees and penalties).

Generally speaking, you are going to need to replace narrow point solutions with broader flexible solutions. For example, you can replace the AR for North American elevator permits with a general AR solution more easily than you will replace broad AR with a North American elevator permit system. At a minimum, this still requires two things: understanding where the functionality is and what functions the general solution must support.

The first thing to do is catalog all your applications and cross-reference them to their scope. Unfortunately, this can be a fair bit of work, but it is essential. Create an inventory of applications as a matrix against the various facets that define their scope (business function, geography, entity type, and internal organization). This will be the key resource for planning.

The next step is to create from this a dependency map. Each row on the matrix will become a node in a graph. If you have fewer than 150 applications, you may be able to draw this directly, otherwise you’ll have to rely on some graph visualization tool. You need to document all the dependencies. The most common are:

  • Database (vendor and version)
  • Middleware technology
  • Programming languages
  • Inter-application dependencies (an application will be dependent on other applications that feed it data)
  • Third party data feeds
  • APIs

These dependencies are mostly one-way (thankfully). An application will depend on a database, but I don’t know of any cases where a database depends on an application.

This resource is going to tell you how to proceed. You may want to weaken dependencies, for instance. If you find that an application is dependent on APIs in a particular version of an operating system, you may want to scan the code (see the next section) and profile the API calls. Often, a small number of APIs lock you into a particular version dependency. You can generally make changes to the code to replace those calls with those that depend on more stable parts of the API. While there is still a dependency, it has moved from a particular version of the operating system to a particular brand of operating system. This may not sound like a lot, but it is surprising how much flexibility small shifts like this can provide.
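
A minimal sketch of the kind of scan we mean, assuming a C codebase under a hypothetical src/ directory and a made-up naming convention for version-specific API calls. A real legacy scan would use a proper parser or a commercial code-understanding tool rather than a regular expression.

```python
# Minimal sketch: count call sites that lock the code to a particular
# OS/API version. The pattern and directory are hypothetical.
import re
from pathlib import Path
from collections import Counter

VERSION_SPECIFIC = re.compile(r"\b(LegacyOsV7_\w+)\s*\(")

hits = Counter()
for path in Path("src").rglob("*.c"):
    text = path.read_text(errors="ignore")
    for match in VERSION_SPECIFIC.finditer(text):
        hits[match.group(1)] += 1

for api, count in hits.most_common():
    print(f"{api}: {count} call sites")
```

A small, concentrated list of call sites is usually cheap to replace; a long, scattered one tells you where the version dependency really lives.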

We have contemplated, but never implemented, the idea of metrics of dependency, which would be helpful in a long-term legacy improvement project.

It will help to get a total picture of the internal and external costs to support the applications and their supporting infrastructure. If you can break it down by application or group, all the better, but even if you can’t, this will provide the metrics to justify what needs to be done.

Starting to extricate yourself

Recognizing where we are doubling down on bad investments is the beginning of turning this around. We believe that the end game, a world where application implementation will cost a fraction of what it does now, and where system conversions are a thing of the past, will require embracing the Data Centric Revolution, as covered in the companion book. In the meantime, there are many things you can do to free up wasted resources, and prepare for your future:

  • Stop the bleeding
  • Set up metrics for cost of change and complexity
  • Adopt leader/follower practice
  • Bake off
  • True contingency
  • Reverse engineer your legacy systems
  • Launch pilots where you need skills
  • Build an enterprise ontology

Stop the bleeding

We work with state agencies a fair bit. Recently one agency retained some consultants who convinced them of two things that we happen to agree with, but that we were amazed they got agreement on. The two things were:

  • The several projects underway were unlikely to end well and were not going to move the agency toward its goal of getting off its legacy systems, and
  • The money freed up by cancelling these projects would immediately fund some more broadly scoped initiatives that were much more likely to move the agency toward that goal.

This is very impressive. It is easy to get caught in the year-to-year application replacement cycle, but as we’ve discussed throughout this book, most of these projects, while they have some tactical advantages, often result in strategic backsliding.

This is a largely transferrable strategy, but a bit gutsy to execute. There are usually only two times when this can be executed: when new leadership takes over, or when consultants are brought in at a very strategic level.

In most organizations, it is very hard to get substantial funding for initiatives that address the systemic issues because they don’t have immediate payoff. At the same time, there is usually more than enough budget in tactical projects that are approved or underway. This also has the advantage of not having to wait through another budget cycle.

This particular agency is off to a great start. Of course that is no assurance that they will succeed in this endeavor, but they are certainly on the right track, and continuing would have only deepened their technical and integration debt.

Set up metrics for cost of change and complexity

It borders on being a cliché, but it is so true and powerful, I’m going to repeat it: “you can’t manage what you don’t measure.”

What a good metrics program does is focus everyone’s attention on the same (and hopefully correct) measures. Many metrics programs suffer from measuring that which is easy to measure. We already mentioned some of the worst things you can measure, such as “lines of code / developer day.” Even metrics that most would agree seem reasonable, like schedule and budget attainment, can backfire if they are the only or primary measures. They tend to encourage overly conservative estimating and have project managers emphasize cost over delivering benefits.

Good metrics programs have a few, very important metrics, even if they are not the easiest metrics to gather.

What we want to encourage are metrics that measure the overall health of the information infrastructure, such that a project that makes things worse overall would have a high hurdle to overcome.

A good metrics program would give an overall measure of “goodness” (Are we getting better over time?) but would also be able to single out existing systems and help understand which ones are contributing the most to the dis-economy.

Two metrics that have the most promise (but these are not easy to measure) are:

  • Overall complexity
  • Cost of change

Overall complexity

This is a measure that you can start simply, and then crank it up as your data gathering improves. The first order approximation of complexity is the total number of concepts under management. Concepts are classes (e.g., tables, entities) and properties (e.g., columns, attributes, elements). You may want to count enumerated values (e.g., drop-down lists), but only if they are called out in programs or queries. In a traditional system, almost all the classes and properties contribute to the overall complexity of the code that manages them. Whether the enumerated values contribute is a system-by-system call: some systems or styles of development encourage people to code against the controlled vocabulary, which means the enumerations contribute to the complexity.

This measure can be gathered by system. You will quickly find out which applications are contributing the most to the overall complexity of your info-scape. You might think that your larger or more complex applications are entitled to add the most to the overall complexity, but this isn’t always the case. Some systems are just arbitrarily complex.
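
As one way to gather the measure, here is a minimal sketch that assumes each application owns its own schema in a PostgreSQL-style catalog and that the psycopg2 driver is available; the connection string is a placeholder. Adjust it to whatever catalogs you actually have, and add enumerated values only where code genuinely branches on them.

```python
# Minimal sketch, assuming a PostgreSQL-style information_schema where each
# application lives in its own schema. Connection details are hypothetical.
import psycopg2

SQL = """
SELECT table_schema,
       COUNT(DISTINCT table_name) AS classes,
       COUNT(*)                   AS properties
FROM information_schema.columns
WHERE table_schema NOT IN ('pg_catalog', 'information_schema')
GROUP BY table_schema
ORDER BY COUNT(DISTINCT table_name) + COUNT(*) DESC;
"""

with psycopg2.connect("dbname=legacy_inventory") as conn:
    with conn.cursor() as cur:
        cur.execute(SQL)
        for schema, classes, properties in cur.fetchall():
            print(f"{schema:30} {classes + properties:8} concepts under management")
```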

If you decommission a complex system, you should take credit for it. By the way, many systems projects start off with the intention of decommissioning a system, but end up co-existing with it. This is pretty much the worst of all worlds, as the old complexity persists and the new adds to it.

Cost of change

This one is hard to measure in a way that is comparable across systems, but it is the most important. In a well-designed agile environment, the cost of making a single incremental change is low. In most legacy environments the cost of making a similar change is very high.

What we need is visibility into the gradient between these two extremes, which is where most applications live. What many firms do have is a log of change requests and changes implemented. The famous “legacy backlog” is the long list of change requests that haven’t yet been implemented because they are hard. Most firms do not have access to the “shadow backlog”: the changes that users would like but don’t even bother to submit because they would take so long to implement.

By the way, the presence of a rogue system, such as one developed in Microsoft Access or Excel, might be a good proxy for change metrics. No one wants to build a rogue system. People build rogue systems because they have a need that would be too hard to implement in the existing systems. About half of the rogue systems we have examined exist because someone wanted to view existing data using a category that doesn’t exist in the base data and then do some simple calculations. Keep in mind that if the cost of change were low, there would be no need for rogue systems.

At the first level of approximation, it would be good to get an idea of the overall cost of change by application. How much are we spending on maintenance for this system?

At the second level, we must figure out how deep the change backlog is and if possible, estimate the shadow backlog or rogue systems that have been spawned. A powerful metric is the comparable change cost of one system over another.
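
A minimal sketch of how a change-log export could be turned into comparable per-application figures; the file name and column names are hypothetical, and your own change-tracking system will dictate the real shape of the data.

```python
# Minimal sketch: compare applications on average cost per change and
# open-backlog depth. File and column names are hypothetical.
import pandas as pd

changes = pd.read_csv("change_log.csv")   # columns: app, status, cost

summary = changes.groupby("app").agg(
    avg_cost_per_change=("cost", "mean"),
    open_backlog=("status", lambda s: (s == "open").sum()),
    changes_delivered=("status", lambda s: (s == "done").sum()),
)
print(summary.sort_values("avg_cost_per_change", ascending=False))
```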

A good cost-of-change system would help an analyst predict the cost of the change by understanding the complexity of the environment. The very statistics that help an analyst predict the cost of change will rapidly shine a light on the issues.

Adopt leader/follower practice

The defense industry has a practice which they occasionally follow, called the “leader/follower.” The goal is to avoid becoming overly dependent on a single defense contractor.

The essence of the idea is to award a major weapons program primarily to the winner of the RFP, but to award a portion of it to one of the rivals. You may decide that Contractor A will get 80% of the orders for a new fighter jet or intercontinental ballistic missile. Contractor B will get the other 20%. Contractor B is at an economy of scale disadvantage. However, what may make up for their disadvantage are the incentives. After two years, the customer reevaluates performance of A and B. If B is outperforming A, B gets the 80% for the next two years and A must take the 20%. This is the strategy NASA is using for the Space Station resupply contract. Boeing has been awarded the leader contract at $4.2 billion and SpaceX the follower contract at $2.6 billion.48

In the absence of the leader/follower arrangement, “A” (who has 100% of the contract) is motivated to issue change orders and claim that the work is “way harder than it seemed.” Moreover, the customer is more or less stuck with accepting this.

In the leader/follower model, this is a prescription to have your book of business cut by 75%.

Conversely, the follower is incented to invest in anticipation of a fourfold increase in revenue. This is a brilliant way to flip becoming captive to your vendor to reinstating market-based competition, even if it is only a market of two.

To make this work in software, there is a need to make sure the assignable work is interchangeable. This entails that there are no proprietary dependencies, and that the specifications are transparent and easily ported. While this takes a bit of discipline, it’s hard to imagine any effort with a higher payoff.

Bake off

Instead of launching a $100 million implementation project (and therefore being limited to the small pool of companies who have “successfully” implemented $100 million projects in your particular subdomain), consider a bake off or a tournament.

Most companies would be far better off, instead of launching a $100 million project, awarding 10 companies $1 million each to build an MVP (Minimum Viable Product) version of the system.

You will have invested $10 million and will have at least three viable solutions. Compare that to being 10% of the way into a $100 million project, with perhaps a 50% chance of being 10% of the way toward a solution.

Double down with three of the best from the bake off, give them each $2 million more to enhance or elaborate their solution. Add in some requirements you had not previously shared, to judge how well each solution accommodates unanticipated change. Now you are out $16 million. Not only will you have at least one solution that will work, you will know the cost of change. Any additional requirements that haven’t been articulated can easily be estimated. Compare this to the traditional approach: all unanticipated changes are change orders, and very expensive ones at that.

At this point, it should be easy to pick a winner and invest whatever it takes, perhaps $5 million or $10 million, to finish the requirements, conversion, and rollout. The risk is virtually eliminated. The cost is 10-20% of what it would have been.
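
The bake-off arithmetic from the paragraphs above, as a small worked example:

```python
# Worked example of the bake-off figures described in the text.
mvp_round = 10 * 1_000_000        # ten vendors each build an MVP
finalist_round = 3 * 2_000_000    # three finalists elaborate their MVPs
finish_low, finish_high = 5_000_000, 10_000_000   # finish and roll out the winner

spent_after_finalists = mvp_round + finalist_round
print(f"After the finalist round: ${spent_after_finalists:,}")          # $16,000,000
print(f"Likely total: ${spent_after_finalists + finish_low:,} to "
      f"${spent_after_finalists + finish_high:,}")
```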

True contingency

Almost all large software projects have a contingency budget. Typically, this is 10-20% of the total project budget. The estimator will tell you there is a 90% likelihood that the actual project cost will fall within this range.

If you study the history of large software projects, you’ll agree that this is ridiculous. Large software projects do not distribute around a mean with one or two standard deviations being +/- 10 or 20%. (By the way, no software project ever comes in under budget. Ever.)

Steve McConnell invented a test to help people understand how well they can estimate confidence intervals. Go ahead and take the test before you read on:

https://blog.codinghorror.com/how-good-an-estimator-are-you/

We don’t want to give away the result, but most estimators are overly confident in the accuracy of their estimates.

Suffice it to say, the contingency for a software implementation project is far too low. In addition, the implementer knows it is too low. They will charge up to the contingency before they get into the change order business.

However, you have another option: spend your contingency budget on a contingency.

Here is how this would work:

  • Let’s say you have a $100 million project, and therefore a $20 million contingency.
  • You take some percentage of that contingency (say 20% of the 20%, or 4% of the original budget) and launch a “contingency project.”
  • Best to have a bit of stealth around this; we suggest calling the contingency project a “fully functioning prototype.”
  • The purported aim of the “fully functioning prototype” is to allow you to work out a lot of UX (user experience) and data quality issues. You can try out different user interfaces and see what will work long before committing to a particular strategy. Likewise, you can get a lot of lead time on your data quality issues.
  • But the real point of it being “fully functional” is that you will have a version of the system working with full data volumes, ready to step in should the main contractor get in trouble, and thereby need the contingency.
  • You also inform the prime contractor that you have spent the contingency, and that it is not available to them. You needn’t tell them what you are doing. They either wouldn’t believe you or would try to sabotage it.

In the worst case, you will have spent 4% of your project budget on a “second opinion” and on a working laboratory for the user experience. In the best case, you implement the fully functioning prototype.
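
The true-contingency arithmetic, as a small worked example:

```python
# Worked example of the contingency figures described in the text.
project_budget = 100_000_000
contingency = 0.20 * project_budget            # nominal $20M contingency
contingency_project = 0.20 * contingency       # 20% of the 20%, i.e. 4% of budget
print(f"Contingency project ('fully functioning prototype'): ${contingency_project:,.0f}")
```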

Reverse engineer your legacy systems

One of the things that keeps legacy systems in place is the fear of the unknown. Employees know that the system works; they know there is a great deal of cumulative wisdom and response to previous requirements captured in the legacy system. But rarely does anyone feel confident enough to replace it, due to uncertainty about the adverse effects of abandoning the legacy system.

The truth is most legacy systems have found very complex ways to do very simple tasks. People believe there are complex business rules and algorithms hidden in the vastness of their codebases. There is a bit of truth to this, and in most of the cases where we have looked, the real functionality is marbled throughout long stretches of code (one bit of code will set an indicator, another will pick up on the indicator and do some calculation, another will pick up the calculation and use it to categorize a customer or a vendor). All this is just classification. Most classifications can be done in a single step. When you put the logic in one place, it is easy to change it or eliminate it when it no longer serves its purpose. When you spread it around, no one feels confident enough to eliminate a step, because nobody knows what the downstream effect might be.

To counter this paralyzing conservatism, you need comprehensive data. The two best sources are the application code and the persisted data.

Code understanding systems

Systems that parse source code and reverse engineer its meaning have gotten better and better. At a first level of approximation, they can whittle down huge amounts of code to manageable subsets. The ground truth is that most of the code in a legacy application is doing little more than moving data back and forth. The data is moved from the database to some intermediate representation such as copybooks in the COBOL days, object structures in the Java and C++ days, and dictionaries and arrays in JavaScript, Python, and their cousins. It is moved again to APIs, transactions, or representations on the screen. There are tiny bits of logic amid all this moving, such as if the amount field is negative, move it to the credit attribute, otherwise move it to the debit attribute.

Anyone familiar with mining may have heard the term “overburden.” This refers to the dirt that must be scraped away before you can reach the valuable ore. Most of the code in enterprise systems is the equivalent of overburden. Scraping it away makes looking for the nuggets easier.
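
To illustrate the idea only (real COBOL or Java estates need commercial code-understanding tools), here is a minimal sketch that crudely sorts the statements of a Python codebase into data movement versus logic. The directory is hypothetical, and the classification is deliberately rough: it simply shows how much of the code is overburden.

```python
# Minimal sketch: classify statements as "data movement" (plain assignments)
# versus "logic" (branches and loops) to estimate overburden.
import ast
from pathlib import Path
from collections import Counter

counts = Counter()
for path in Path("src").rglob("*.py"):        # directory is hypothetical
    tree = ast.parse(path.read_text())
    for node in ast.walk(tree):
        if isinstance(node, (ast.Assign, ast.AugAssign, ast.AnnAssign)):
            counts["data movement"] += 1
        elif isinstance(node, (ast.If, ast.While, ast.For)):
            counts["logic"] += 1

total = sum(counts.values()) or 1
for kind, n in counts.items():
    print(f"{kind}: {n} statements ({n / total:.0%})")
```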

The second level of low-grade functionality is validation, constraint, and integrity management. This is often the second largest category of code in a legacy system, and is often sprinkled throughout. Valid values for enumerations are often in code, as are routines to check for the presence of keys in other tables. When you recognize these patterns, you have another major category of code that you can easily describe and not fear.

Understanding legacy code will also help you find dead code. There is a lot of code in your existing system that is unreachable. Maybe it was code in packaged software that can’t be reached because of the way you configured it. Maybe code was prepared to handle conditions that can no longer be set.

Some legacy-understanding software can monitor systems and can report on code that cannot be reached, as well as code that wasn’t reached over a long period. This isn’t as convincing as a static analysis that can prove that code isn’t reachable, but it is a strong indicator that if over time a chunk of code hasn’t been accessed, conditions in the business may have changed such that this code isn’t needed.
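
For Python systems, the open source coverage.py tool can play this monitoring role; the sketch below is a minimal illustration, with a hypothetical legacy_app module standing in for the system under observation. Other platforms need their own profilers or commercial monitors, but the principle is the same.

```python
# Minimal sketch: record which lines run over a long observation window.
# Lines never reported as executed are candidates for the "probably dead"
# pile, pending a closer look. legacy_app is a hypothetical module.
import coverage

cov = coverage.Coverage(data_file=".legacy_observation")
cov.start()

import legacy_app            # hypothetical system under observation
legacy_app.run()             # let it process real workloads for a long period

cov.stop()
cov.save()
cov.report(show_missing=True)   # "missing" lines were never reached
```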

The goal of this exercise is to whittle down a large system to the small amount of code that actually contains important business logic. It is often surprising how little truly algorithmic code exists, and how easily it can be replicated in modern environments. When you strip away most of the repetitive and redundant code, there is often little left. A complex inventory system might have a few algorithms buried in it to set reorder points, calculate efficient order sizes, and flag unusual demand patterns. This is augmented by some very simple logic around re-determining average item cost based on recent receipts and choice of costing method.

I once managed the building of a very complex custom ERP system for a continuous processing manufacturing company making lot-based materials with highly overlapping specifications. The system managed four forms of outbound logistics: a rail car fleet, trucks, containerized shipping, and air shipments. It also contained complex contracts, complex sales incentive compensations, and an ISO-certified laboratory management system. We built this system using a model-driven approach that generated code based on simple descriptions of base functionality. In the end, less than 2% of the 3 million lines of code in the system were custom written. This was a much more complex system than most of the systems we see in place in most companies. Once decomposed, it is not uncommon for less than 1% of an application system’s code to be responsible for all the custom functionality.

One other thing you want your legacy code-understanding project to do is find and document the dependencies that are baked into your code. This will update the dependency analysis we mentioned earlier and give you a view that will tell you what things will break when you change out various components.

One bit of good news here. By tying the dependencies to licensing costs, it is often possible to pay for the legacy modernization project in its early phases by some strategic rationalization of licensed components in the infrastructure.

Data understanding

A benefit of understanding the legacy system is understanding the data in this system. Again, many tools will help with this endeavor.

First, create an automated profile of the data in your existing systems. This will tell you where the metadata you have defined isn’t doing anything and where the data you have provides no information.

If you have a table with 20 columns, and only half of them are populated, there is a reasonable chance that you have code in your legacy systems that is “moving” data from the cells these empty columns represent to attributes which will also be empty. If the code is not moving the empty data, it is often testing for nulls, which is still superfluous work.

The next case is a data value that never varies. We often see tables with values that are the same for every row. This usually isn’t a coincidence: a configuration setting has fixed the value as a constant, and it is repeated in every row. This happens when implementing a package. Application software vendors often build flexibility into their packages and then constrain it at implementation time. For instance, there may be the potential to have multiple currencies in a purchasing system. If the configuration says that the currency will always be US dollars, this field gives no new information. The data would be just as rich without repeating that value in every row.
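
A minimal sketch of this first profiling pass with pandas, using a hypothetical table extract and thresholds. Dedicated profiling products do the same thing at scale across whole estates.

```python
# Minimal sketch: flag columns that are mostly empty or that never vary.
# File name and thresholds are hypothetical.
import pandas as pd

df = pd.read_csv("purchasing_extract.csv")

for col in df.columns:
    null_rate = df[col].isna().mean()
    distinct = df[col].nunique(dropna=True)
    if null_rate > 0.95:
        print(f"{col}: {null_rate:.0%} empty - metadata without information")
    elif distinct == 1:
        print(f"{col}: constant value {df[col].dropna().iloc[0]!r} - configuration, not data")
```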

This is the low-hanging fruit, but just harvesting the low-hanging fruit will simplify a model considerably. Many products take the analysis to the next level, where they generate histograms of all the values encountered in a given column. The pattern and value set tell a lot. First, they will spot the foreign key / primary key relationships, even if they aren’t being managed by the database management system. This is even more powerful across applications.

A data understanding product will detect a set of values in a system. We know them as customer ids, but the data understanding system only knows them as a histogram of values. When it detects the exact same histogram in another system, it rightfully concludes that these two columns represent the same thing.

Some of the more sophisticated data understanding systems look into the metadata, or into patterns in related data, to find some initial semantic distinctions. When such a system finds a column that is exclusively filled with numbers matching the United States Social Security number pattern, it makes two initial assessments: that the column represents Social Security numbers, and that each row represents a person. If it has deduced a histogram for a person ID and finds that same histogram in another table or even another system, it concludes that the other column also refers to the same set of people.
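
Here is a minimal sketch of the value-fingerprint idea, with hypothetical extracts and column names. Real products also use patterns (such as the Social Security format) and metadata, as described above; this just shows the core comparison.

```python
# Minimal sketch: if two columns in different systems have (nearly) the same
# set of values, they probably refer to the same thing. Names are hypothetical.
import pandas as pd

crm_ids = set(pd.read_csv("crm_extract.csv")["cust_no"].dropna())
billing_ids = set(pd.read_csv("billing_extract.csv")["payer_id"].dropna())

overlap = len(crm_ids & billing_ids) / max(len(crm_ids | billing_ids), 1)
if overlap > 0.9:
    print(f"cust_no and payer_id overlap {overlap:.0%}: likely the same set of customers")
```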

This type of analysis can often reduce the complexity of the datascape by a factor of 10 or more.

Legacy understanding

Understanding your legacy system is a lot of work, but the alternative is far more dire. We have watched ill-conceived legacy modernization programs invest hundreds of millions of dollars only to flounder when the implementers could not assure sponsors that the replacement system would not overlook some important use cases that the current system handles well.

Once you understand the dependencies, have uncovered the small part of your data model that is in play, and documented the rules in the application code that will need to be brought forward, you will have taken most of the risk out of your legacy modernization program.

There is still a lot of work to do, but when you shine a light on all the risks before you start, you may proceed with confidence.

Launch pilots where you need skills

You will be replacing your legacy systems with something. Your main choices are a “neo-legacy” system (that is, a system with all the economic characteristics of a legacy system, but built with modern languages) or a genuinely modern system.

Your developers will want to implement a neo-legacy system. They won’t put it in those terms, but that is what they will want. They don’t want to change the way they think; they just want to learn some new, more marketable skills.

Being aware of this, you can layer some trendy technology in alongside the long-term change you’re trying to establish. You will also need to establish multi-disciplinary teams. You will want to define projects that build individual skills as well as corporate competence.

Corporate competence in this area comes from being open and sharing. For instance, you might stand up a RESTful endpoint to allow internal users to access your reference data (a minimal sketch follows the list below). Each group that uses this learns several things simultaneously:

  • They learn that effort can be saved by sharing
  • They learn how to consume a RESTful endpoint
  • If you make the code available internally they can learn how it was built, and could do something similar for their subdomain
  • It may cause other reference data sets to come out of the woodwork
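
A minimal sketch of such an endpoint, using Flask as an assumed framework and a hypothetical country list standing in for real reference data; in practice you would serve the data from your reference-data store.

```python
# Minimal sketch: a shared reference-data endpoint. The framework choice
# (Flask) and the data are assumptions for illustration.
from flask import Flask, jsonify

app = Flask(__name__)

COUNTRIES = [
    {"code": "US", "label": "United States"},
    {"code": "CA", "label": "Canada"},
]

@app.route("/reference/countries")
def countries():
    return jsonify(COUNTRIES)

if __name__ == "__main__":
    app.run(port=8080)
```

Publishing the source of even a small service like this internally is what delivers the third learning point in the list above.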

The multi-disciplinary teams that you sponsor should contain roles for:

  • Modern development languages
  • Semantic modeling, which is an important skill in economizing and rationalizing data models
  • Agile scrum masters to promote and implement agile principles
  • Specific technology areas, such as:
    • Natural Language Processing
    • Social Media Data Processing
    • Predictive Analytics
    • Machine Learning
    • Data Science
    • Big Data and Spark
    • Statistical languages and libraries such as R or NumPy
    • Web Scraping
    • Graph Based Visualization
    • Model Driven Development
    • Data Profiling
    • Containerization
    • Semantic Technology

Build an enterprise ontology

An enterprise ontology is like an enterprise data model, but is typically 100 times simpler, and is far more flexible.

Your enterprise ontology will form a simple stable core for all your additional endeavors.

A core model is an elegant, high fidelity, computable, conceptual and physical data model for your enterprise.

Let’s break that down a bit.

Elegant

By elegant we mean appropriately simple, but not so simple as to impair usefulness. All enterprise applications have data models. Many of them are documented and up to date. Data models come with packaged software, and often these models are either intentionally or unintentionally hidden from the data consumer. Even hidden though, their presence is felt through the myriad screens and reports they create. These models are the antithesis of elegant. We routinely see data models with thousands of tables and tens of thousands of columns, to solve simple problems. Most large enterprises have hundreds to thousands of these data models.

Our experience tells us that at the heart of most large enterprises lies a core model that consists of fewer than 500 concepts, qualified by a few thousand taxonomic modifiers. When we use the term “concept” here, we mean a class (set, entity, or table) or property (attribute, column, or element). An elegant core model is typically 10 times simpler than the application it is modeling, 100 times simpler than a sub domain of an enterprise, and at least 1000 times simpler than the “datascape” of a large firm.

High fidelity

An overly simple model is not terribly useful. Sure, we could build a model that says that customers place orders for products. This is literally true, but not sufficiently detailed to build systems or drive analytics. This is the main reason that application data models have gotten complex: an attempt to represent requisite detail.

But virtually every application we’ve looked at has way overshot the mark, and done it poorly to boot. When they encounter a new requirement, application developers tend to do one of two things: write some code to address it, or amend the data model to address it (and then write some more code). It rarely occurs to them to consider a way to represent the distinction that would be reusable. Furthermore, very often the additions being made to a model are “distinctions without a difference.” That is, they add something that was “required” but never used in a way that affected any outcome.

Our lens for fidelity is this: if the distinction is needed to support “data structuration,” business rules, or classification for retrieval or analytics, then you need that distinction in the model. I have grown very fond of the phrase “data structuration,” which is one of the terms that our European customers use. It essentially means decisions around how you want your data structured. So if you decide that you need to store different information on exempt employees versus non-exempt employees, then you need to be able to represent that distinction in the model.

For business rules, if you charge more to insure convertibles than hard top cars, then the model has to have a place to keep the distinction between the two. If your users want to sort their customer lists between VIPs and riff raff, then the model needs that distinction. If your analytics need to aggregate and do regressions on systolic versus diastolic blood pressure readings, then you must keep that distinction.

You’d be forgiven for believing this justifies the amount of complexity found in most data models. It doesn’t. For the most part these necessary distinctions are redundantly (but differently) stored in different systems, and distinctions that could easily be derived are modeled as if they could not be.

The real trick we have found is determining which distinctions warrant being modeled as concepts (classes or properties) and which can be adequately modeled as taxonomic distinctions. The former have complex relationships between them. Changing them can disrupt any systems depending on them. The latter are little more than tags in a controlled vocabulary, which are easier to govern and evolve in place.

The other trick for incorporating the needed distinctions is the judicious use of faceting. Taxonomists today feel the urge to create a single-rooted, giant taxonomic tree to represent their domain. There are generally many smaller, orthogonal facets trapped in those big trees. Extricating them will not only reduce the overall complexity; it also has the added benefit of making the pieces far more reusable than the whole.

We have found the secret to high fidelity coupled with elegance is in moving as many distinctions as possible to small, faceted taxonomies. Facets are small, independent ways to categorize things.
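
A minimal sketch of what faceting looks like in practice, with hypothetical facets and values: each facet is a small, independent controlled vocabulary, and an item is classified by combining tags from several facets rather than by its position in one giant tree.

```python
# Minimal sketch: several small, independent facets combine to classify an
# item. Facet names and values are hypothetical.
FACETS = {
    "geography": {"North America", "EMEA", "APAC"},
    "customer_segment": {"retail", "commercial", "government"},
    "product_line": {"elevator permits", "claims", "audit fees"},
}

def classify(item_tags: dict) -> dict:
    """Validate an item's tags against each facet independently."""
    return {
        facet: tag
        for facet, tag in item_tags.items()
        if tag in FACETS.get(facet, set())
    }

invoice = {"geography": "North America", "product_line": "elevator permits"}
print(classify(invoice))   # each facet evolves on its own; no giant tree to rework
```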

Computable

A computable model is one that a program can do something useful with directly.

By analogy, it is the difference between a paper Rand McNally road map and Google Maps. Both model the same territory. Either one might be more or less useful for the purpose at hand. Either could be more detailed. However, the Google Map is computable in a way the paper map isn’t. You can ask Google Maps for a route between two points. You can ask Rand McNally all day long, but nothing will happen. You can ask Google what coffee shops are nearby.

A data model on a whiteboard is not computable, nor is one in Visio. Sophisticated data modeling tools give some computability, but this is often not available in the final product. Rand McNally probably uses Geospatial Information System software to build their maps, but it is no longer present in the delivered environment.

The core model that we advocate continues to be present, in its original design form, in the delivered application. It can be interrogated in ways previously only available in the design environment.

Conceptual and physical

Received wisdom these days is that a data model is a conceptual model, a logical model, or a physical model. This is mostly driven from the construction analogy where a conceptual model is the architect’s drawings, the logical models are the blueprints, and the physical model is the item actually built.

In the data world, these models are often derived from each other. More specifically, the logical is derived from the conceptual and the physical from the logical. Sometimes these derivations are partially automated. To the extent the transformation is automated, there is more likely to be some cross-reference between the models, and more possibility that a change will be made in the conceptual model and propagated down. However, in practice this is rarely done.

The need for three models is more closely tied to the state of tooling and technology decades ago, rather than what is possible now. Applications can now be built directly on top of graph databases. The graph database makes it possible to have your cake and eat it too with regard to structure. The graph database, when combined with the new standard SHACL, allows application builders to define minimum structure that will be enforced. At the same time the inherent flexibility of the graph database, coupled with the open world assumptions of OWL, allows us to build models that have structure, but are not limited by that structure.

By using URIs as identifiers in the data model, once a concept has been defined (say in the equivalent of a conceptual model), the exact same URI is used in the equivalent of the logical and physical models. The logical conclusion is that the conceptual, logical, and physical are the same.

The real shift that needs to happen is a mental one. We’ve been separating conceptual, logical, and physical models for decades. We have a tendency to do conceptual modeling at a more abstract level, but this isn’t necessary. If you start your conceptual core modeling project with “concrete abstractions,” they can be used just as well in implementation as in design. Concrete abstractions are concepts that while they are at a more general level, can be implemented directly. The classes Person, Organization, Event, and Document fit this, as do properties such as hasPart, hasJurisdiction, governs, startDate or name.
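
To make this concrete, here is a minimal sketch using the Python rdflib library (an assumption on our part; any RDF toolkit will do) in which the very same URI serves as the conceptual definition, carries a SHACL constraint, and identifies instance data.

```python
# Minimal sketch: one URI plays the conceptual, logical, and physical roles.
# The namespace and instance data are hypothetical.
from rdflib import Graph, Namespace, Literal, RDF, RDFS

EX = Namespace("https://example.com/ontology/")
SH = Namespace("http://www.w3.org/ns/shacl#")

g = Graph()
g.bind("ex", EX)
g.bind("sh", SH)

# "Conceptual" model: a concrete abstraction, identified by a URI.
g.add((EX.Person, RDF.type, RDFS.Class))
g.add((EX.name, RDF.type, RDF.Property))

# "Physical" constraint on the very same URI, via a SHACL shape.
g.add((EX.PersonShape, RDF.type, SH.NodeShape))
g.add((EX.PersonShape, SH.targetClass, EX.Person))
g.add((EX.PersonShape, SH.property, EX.PersonNameProperty))
g.add((EX.PersonNameProperty, SH.path, EX.name))
g.add((EX.PersonNameProperty, SH.minCount, Literal(1)))

# Instance data uses the same identifiers again - no translation layers.
g.add((EX.alice, RDF.type, EX.Person))
g.add((EX.alice, EX.name, Literal("Alice")))

print(g.serialize(format="turtle"))
```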

Summary

I hope that this book has suggested that the so-called “best practices” in implementing enterprise applications are anything but. Moreover, that you (as the sponsor of these systems) are being held hostage. Perhaps you (as the hostage-takers, if you have persevered this far) realize that the jig is up.

Hopefully, it is time for a new normal to emerge.

Frankly, the normal we have now is so bizarre that anything will be an improvement.

This change will have to be led by the buyers of systems, because the providers of systems have billions to gain by preserving the status quo and nothing to gain by improving things.

I hope that this book gives you enough to break the cycle of dependency.

To close the loop back to the first chapter: Lean manufacturing has made a religion of attacking waste in all its manifestations.

The Enterprise IT / Application Implementation Industries have waste at a level that would embarrass even the least lean of current manufacturers.

We hope this book is a call to arms. The cushy, business-as-usual application implementation business is done. Million dollar projects with billion dollar price tags are doomed.

It is time to begin curtailing projects that take us away from the naturally-integrated agile future. We must move toward an infrastructure that encourages reuse at every level and incremental improvement along the way, rather than “moon shots.”

In the companion book, we outline a possible future. The Data Centric Revolution is not the only possible future, it is just an exemplar. We are putting it out there as an open standard so hopefully there will be many implementations. Many of you will be able to save yourselves hundreds of millions of dollars without even postulating what your end state will look like.

This book is about helping you out of the quagmire. The next book is an exemplar of what a post quagmire world might look like.
