CHAPTER 11
Application Software is the Problem

The previous section showcased firms that are reaping great gains from their Data-Centric initiatives. One aspect of their success that we only touched on in the case studies, and which we will elaborate on now, is the massive reduction in their code base. In this chapter, we will examine more closely the four ways that Data-Centric methods lead to a massive reduction in code bloat. But before we get to the “how,” we must address the “why.” When you start thinking about reducing your code, you may find yourself asking the following questions:

  • Isn’t software a good thing? Why would we want to reduce the amount of application software that we have?
  • How much application software do we have? How much do we want?
  • If most of our application code is unnecessary, where does it all come from?

Pointing out that most application software is unnecessary will make a lot of people uncomfortable. There are millions of application developers and application maintainers who have built their careers around coding software. There are hundreds of thousands of executives who have been investing in application implementation projects. There is an established $400 billion System Integration industry and another $400 billion Application Software industry.

The players in these arenas will probably not be thrilled at the idea that their efforts have been largely inefficient and misdirected. Unfortunately, though, we must face and accept this hard truth.

Isn’t software a good thing?

Of course software is a good thing. We couldn’t have traveled to the Moon without it, and it brings us the wonders of the internet (and its kitten videos). Even enterprises have benefited from their investments in inventory control systems and customer relationship management systems.

However, there are many ways in which software is also a liability. As we examine several of these drawbacks, you’ll see how they can quickly stack upon each other, ultimately tipping the scales deep into liability for most companies. We will explore techniques that will allow us to keep the value of our applications while shedding most of the liability.

Is application code an asset?

The current application-centric view of technology naturally considers application code to be an asset. After all, it took millions of lines of complex application code to automate and streamline our existing processes. Because this code has played such a vital role in the past, it is difficult for many people to change their view and see it as a liability.

The sheer volume and complexity of code eats up most of the support costs in IT. We have seen large projects justified by the large number of complex applications that must be integrated to solve a problem. Until we can see clearly that this isn’t necessary, and indeed is a detriment, it is easy to be taken in by this line of reasoning.

When you implement an application system, you incur a one-time capital expense (the often very expensive cost of a new application implementation project) and three ongoing liabilities:

  1. You must maintain the integration of this application with, potentially, all other applications in your firm.
  2. You must maintain the code base, correcting defects as they arise and modifying it to meet changes in the environment. You may incur this cost yourself or you may pay a vendor to do this, but the cost is incurred.
  3. Each new system increases the cognitive load of the people who are to use it. The more systems, and the more complex they are, the higher this cognitive load.

You also incur the operating costs of running the system (the hardware, services, and networking costs). While reducing code bloat will generally also reduce operating costs, the effect is not nearly as linear.

The first two liabilities make up most of the IT budget of enterprises. The developers are either maintaining existing code, writing new code (making the problem worse, as we’ll see), or maintaining the integration between systems.

The third liability is mostly felt in the lines of business, where staff spends additional time learning more systems and using multiple systems to get a task accomplished. The third liability also drives a great deal of the cost of internal help desks.

These liabilities are the gifts that keep on taking. Once you have installed an application system, you are obligated to keep maintaining it until it is decommissioned.

How much code do we have?

There is a micro answer to this (at the firm level) and a macro answer (the total economy).

The micro answer varies a great deal from firm to firm. But if you are a $5 billion a year firm or agency, you will have thousands of application systems. Each has its own data model, which has thousands of concepts. You easily have 1 million concepts, and the upper limit (as discussed in the second section) is 1 billion concepts.

The industry average is 1,300 lines of code for every concept in the schema of your databases. With 1 million concepts, it is therefore highly likely that you have 1.3 billion lines of code under management.
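The arithmetic behind that estimate can be sketched in a few lines of Python. The inputs are the rough figures quoted above, not measurements, and the function and constant names are illustrative only.

```python
# Back-of-the-envelope estimate of the lines of code a firm has under management.
# Inputs are the chapter's rough figures; the names are hypothetical.

LINES_PER_CONCEPT = 1_300  # industry average quoted in the text


def estimated_lines_of_code(concepts: int,
                            lines_per_concept: int = LINES_PER_CONCEPT) -> int:
    """Estimate code base size from the number of concepts in all schemas."""
    return concepts * lines_per_concept


if __name__ == "__main__":
    concepts = 1_000_000  # "You easily have 1 million concepts"
    print(f"{estimated_lines_of_code(concepts):,} lines of code")
```

Substituting your own count of tables and columns for `concepts` gives a first approximation for your firm.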

What is surprising is how much of this code is functionally doing the same thing, but it can’t be reused either because it was written to a different data model, or it was written in a language that makes it hard to reuse. In the next section, we will attempt to estimate how much code you would need if you built for reuse.

The difference is staggering, and it is in line with what we have seen in these case studies.

The other way of looking at this is from the macro point of view: how much unique software code is there in the world? It’s a bit hard to nail this down because most estimates overcount, thanks to the many nearly cost-free copies of code that exist.

Perhaps the best way to back into an upper limit for the amount of code that exists in the world is to reflect on the cost of producing it. Code is mostly written by hand. Unfortunately, there is still very little automated code generation going on.

Hand-written code costs between $10 and $100 per line of tested code in production. If we take the high end of estimates for software development professional services expenditures ($1 trillion a year) and the most optimistic estimate for productivity ($10 per line), we get an upper limit of 100 billion lines of code produced per year.
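As a rough check on that bound, here is a minimal sketch that divides annual spending by cost per line. It assumes the $1 trillion figure is an annual total and that every dollar goes to writing code, which is why the result is an upper limit; the names are illustrative.

```python
# Upper and lower bounds on lines of code produced per year,
# using the spending and cost-per-line figures quoted in the text.

ANNUAL_SPEND_USD = 1_000_000_000_000  # high-end estimate: $1 trillion per year
CHEAPEST_LINE_USD = 10                # most optimistic productivity
COSTLIEST_LINE_USD = 100              # least optimistic productivity


def lines_per_year(annual_spend_usd: int, cost_per_line_usd: int) -> int:
    """Lines of tested production code that a given budget could pay for."""
    return annual_spend_usd // cost_per_line_usd


if __name__ == "__main__":
    upper = lines_per_year(ANNUAL_SPEND_USD, CHEAPEST_LINE_USD)
    lower = lines_per_year(ANNUAL_SPEND_USD, COSTLIEST_LINE_USD)
    print(f"upper bound: {upper:,} lines per year")
    print(f"at $100 per line: {lower:,} lines per year")
```

At the pessimistic end of the cost range, the same budget buys only a tenth as much code.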

Because many IT professionals do not write code at all, 100 billion lines per year seems a bit high. Looking a bit deeper, I found these estimates:

  • 111 billion lines per year: a curiously precise number, but attempts to drill down into the supporting detail came up empty-handed.46
  • 250 billion lines of code for the installed base of COBOL47: given that it has been around for 50 years, this would suggest it has been growing at the rate of 5 billion lines per year. I’m pretty sure these numbers include many copies of programs, including packages and libraries.
  • 100 million “pull requests”48: GitHub, the most popular code repository, reported that it had hit its 100 millionth posting of code. Most posts contain more than a handful of lines of code (although there is a lot of copying going on). If the average net addition for a pull request is 100 lines, then we are at 10 billion lines (and this is likely less than half of all active code).

If we’ve been adding a billion lines of code to the installed base every year for the last several decades, which seems reasonable, then we have a large legacy to deal with.

As mentioned above, the COBOL industry is believed to have 250 billion lines of installed code.

Whether tens of billions or hundreds of billions, we collectively have a lot of application software, and even at hundreds of millions to billions, each firm has a great deal of that under management.

Software is eating the world

Marc Andreessen, co-founder of Netscape and partner in the VC firm Andreessen Horowitz, wrote a memorable article called “Why Software is Eating the World,”49 in which he correctly points out that the value add in category after category of companies is shifting from bricks to bits. Amazon, the world’s largest retailer, is essentially a software company.

But Marc is overlooking the dark side of software eating the world. For the typical enterprise, the creation and maintenance of software are eating their resources. Most firms (except software companies, interestingly) are incurring more costs maintaining their software systems than they are generating in profits.

Maintaining application software that is marginally useful, and at the same time is consuming resources at a prodigious rate, is a major drag on most enterprises.

How much do we need?

Visual Studio is one of the single largest pieces of software in the world. It’s over 55 million lines of code. And one of the things that I found out in this study is more than 98 percent of it is completely irrelevant.50

Chris Granger

Windows 10 has 50 million lines of code. The Linux operating system is 15 million lines of code. An incredibly high percentage of both are consumed with backward compatibility. The Chromebook OS is about 5 million lines of code. It is not hard to conceive of operating systems using far, far less. But let’s accept 10 million as a reasonable number for an operating system.

Let’s accept tens of millions of lines for each database and piece of middleware we need (do we need dozens or hundreds?).

Finally, the real question is: how much application software code does the world need? QuickBooks is 10 million lines of code. Estimates of the size of SAP (the dominant enterprise software vendor) range from 40 million to 400 million lines. There are thousands and thousands of enterprise application systems.

If an operating system didn’t need to backwardly support our vast legacy systems, then it would need far less than 1 million lines of code. The same goes for any piece of middleware. And far, far less for any given application.

As I was putting this chapter together, I thought I would check whether these statistics mirrored recent developments. In the chapter on enabling technology, I mentioned a new type of data store called a ‘graph database’ or a ‘triple store.’ These databases play a role analogous to that of relational DBMSs such as Oracle and Microsoft SQL Server. I asked the CEO of one of the triple store vendors how many lines of code were in their offering. After conferring with the technology team, he reported that their server, which supports a surprisingly high number of features and interfaces, was implemented in 250,000 lines of code. Another 150,000 lines of code are for regression and systems testing. This is 1% of the size of most commercial relational databases and, I think, is indicative of the order of magnitude of improvement that is possible.

I have personal experience leading teams that built complex ERP systems from scratch. In each case, the functionality was more complex than that of leading ERP vendor offerings. But unlike the tens of millions of lines of code in a packaged ERP system, our systems were built with precursors to the approaches we will discuss in the next chapter. In each case, most of the application code was generated (the total amount of code in each system was on the order of 5 million lines), but the number of lines that were custom written was under 100,000. As a result, the cost to build and implement the system in each case was under $3 million, which would be a fraction of the cost of a packaged implementation for the same functionality.

The world has many billions of lines of software code currently. We need several million. We probably have 1000 times more than we need. Someone paid to have this all built. That is a sunk cost. Even as I write, more software of marginal net value is being introduced.

But the real tax is the liability of using all this software. As we said earlier, the cost of maintaining, integrating, and learning this code is the real ongoing cost.

It’s sort of like cleaning out your attic. You may realize that, in principle, you need almost none of what you have in your attic, but you are unlikely to do anything until you move. Even then there is a pretty good chance you will pack up most of that crap and put it in your new attic.

I’m not sure that programming has to exist at all. Or at least software developers.51

Bret Victor

Application software is just the same. Unlike the junk in your attic, though, this application software bloat is costing you a great deal of money on an ongoing basis.

Where does it all come from?

If we have a lot of unnecessary software, it behooves us to ask: where did it come from, and why does it persist?

This is a deep question, with many facets, some of which I touched on in Software Wasteland. Let me summarize from there and elaborate a bit more.

I believe there are three major contributors:

  1. The relationship of code to schema
  2. Perverse incentives
  3. How software developers think.

The relationship of code to schema

As I mentioned earlier, in a traditional application system, the code and the database schema are bound in a very unhelpful way. The code is written to the schema. If there is a Customer Table, Orders Table, and a Products Table, the code will access the customer table, pick up a few attributes (perhaps create an order header), and write out the date, customer ship to, and bill to address. Order lines will be written by accessing the product tables and getting on hand availability, prices, and descriptions.

Ah, if it were only that simple. And if it were only done once and done consistently. But there are dozens to hundreds of systems with customer data. Each is structured differently. Each table and field has a different name. The level of abstraction might be different. The tables might be in different brands of databases. This means that the code written for one application is not usable for the other. There is virtually no reuse at the business concept level across applications, despite the huge potential benefit of doing so.
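A minimal sketch of the problem, using Python's built-in sqlite3 module: two hypothetical applications hold the same customer under different table and column names, so the lookup has to be written twice. The table names, column names, and data are invented for illustration.

```python
import sqlite3

# Two hypothetical applications sharing one in-memory database for brevity.
# Each stores the same business concept (a customer) under its own schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE crm_customer (cust_id INTEGER, cust_name TEXT);
    CREATE TABLE billing_account (acct_no INTEGER, account_holder TEXT);
    INSERT INTO crm_customer VALUES (1, 'Acme Corp');
    INSERT INTO billing_account VALUES (1, 'Acme Corp');
""")


def crm_customer_name(db: sqlite3.Connection, cust_id: int):
    """Lookup written to the CRM schema; useless against the billing tables."""
    row = db.execute(
        "SELECT cust_name FROM crm_customer WHERE cust_id = ?", (cust_id,)
    ).fetchone()
    return row[0] if row else None


def billing_customer_name(db: sqlite3.Connection, acct_no: int):
    """The same logic again, rewritten only because the names differ."""
    row = db.execute(
        "SELECT account_holder FROM billing_account WHERE acct_no = ?", (acct_no,)
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    print(crm_customer_name(conn, 1))
    print(billing_customer_name(conn, 1))
```

The two functions do the same job, yet neither can be reused for the other schema. Multiply this by every concept and every application, and the duplication described above follows.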

Why didn’t everyone write to a single shared schema? That was what ERP was meant to be, but the technology at the time meant that a single shared schema would have to be massively complex to handle even a portion of the needs of a large enterprise.

The technology now exists to implement a single, simple, shared, extensible model, but habits die hard.

Perverse incentives

A great deal of the IT industry benefits from writing and maintaining large bodies of code.

One of the strategies that many systems implementers and systems integrators employ at the start of a project is to bring in as much software as they can. There is a belief that the problems of the implementation can be solved at a categorical level. That is, just list all the aspects of the problem that need to be solved. Let’s say you need some content management functionality, and some search engine optimization functionality, and some multilingual functionality, and some messaging capability, and some task management functionality. You acquire one of each of these and start your project. This has become far easier in the age of open source as there is always an open source project you might use.

This creates the illusion of progress, while at the same time making the project bigger, and therefore putting you behind schedule. The reason these acquisitions make the project bigger is each has its own level of complexity. Each piece of the puzzle must be mastered. Several people will be assigned to each technology and will be consumed with mastering it. Each piece of the puzzle has its own shortcomings and its own latent defects. These take time to uncover, but they are there.

The real complexity, and therefore, the added cost, lies in getting all these disparate pieces to interact. They were designed independently and making them interact takes a great deal of effort.

By the time it dawns on the sponsors of the project that things are ballooning out of control, it’s usually too late to go back. When you realize that it is spiraling out of control, statistically you would be better off admitting defeat and canceling the project. However, we know from a host of studies that people are reluctant to abandon their “investments” and fail to treat them as the sunk costs that they are.

The real question, which I don’t have an answer for, is: “Is this an intentional strategy that system implementers employ, or do they sincerely believe that loading up on software at the beginning of a project will be beneficial?” It almost doesn’t matter, as the result is the same, but I’d be curious. Most of the small number of systems integrators that I know seem to sincerely think that this is an inherently complex thing to do, and that adding more complex software to their project is helping.

How programmers think, and why this is a business problem

What follows are some generalizations. Some programmers are thoughtful designers. But most aren’t.

Programmers are problem solvers. At least, the good ones are. However, they tend to solve the problem at hand and care very little about the overall impact on the firm.

Additionally, programmers care far more about the structure of the data they are dealing with and much less about its meaning. Programmers like to solve problems with code. When a business analyst tells them about an exception, they tend to write an “if statement” to handle the exception. Their assessment is that solving a problem this way is more expedient. It is. But it misses all sorts of opportunities to solve problems in ways that non-programmers could maintain (through table-driven or parametric approaches), and it tends to create point solutions rather than studying a family of problems and solving them as one.
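To make the contrast concrete, here is a small sketch of the two styles. The discount rules, field names, and values are hypothetical, invented only to illustrate the difference between an “if statement” per exception and a table-driven approach.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    customer_type: str
    region: str


# Style 1: every exception the business reports becomes another "if".
def discount_hard_coded(order: Order) -> float:
    if order.customer_type == "wholesale" and order.region == "EU":
        return 0.15
    if order.customer_type == "wholesale":
        return 0.10
    if order.region == "EU":
        return 0.05
    return 0.0


# Style 2: the same rules held as data. None means "matches anything",
# and rules are checked top to bottom, so the most specific rule comes first.
DISCOUNT_RULES = [
    # (customer_type, region, discount)
    ("wholesale", "EU", 0.15),
    ("wholesale", None, 0.10),
    (None, "EU", 0.05),
]


def discount_table_driven(order: Order, rules=DISCOUNT_RULES) -> float:
    for customer_type, region, discount in rules:
        type_matches = customer_type in (None, order.customer_type)
        region_matches = region in (None, order.region)
        if type_matches and region_matches:
            return discount
    return 0.0


if __name__ == "__main__":
    order = Order(customer_type="wholesale", region="US")
    print(discount_hard_coded(order), discount_table_driven(order))
```

In the second style, a new exception is a new row rather than a new branch. The rule table could just as well live in a database or spreadsheet that a business analyst maintains, which is the point of the table-driven approach.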

We worked with a multilevel-marketing client (who shall remain nameless) who retained a firm to build them a custom online system. There are some complexities to multilevel marketing, especially in the area of what they call “genealogy” or the management of the relationships that lead to the building of their referrals tree. There is some very complex logic in calculating the upstream and downstream commission sharing. The rest of the site is just generic product sales.

We weren’t quite prepared for what we saw. The system had 2,500 tables. What made for a very curious coincidence is that they offered 2,500 products for sale. This really was just a coincidence (each product did not have its own table), but I’ve been fascinated ever since as to how anyone could possibly design a system like that. Earlier, we discussed a client that worked with 1,000,000 complex electrical parts. Their existing system had 700 tables, but the new system we designed had only 46. Clearly, there is no relation between the number of tables in a schema and the number of parts in a warehouse.

We would expect this multilevel marketing system to require about 100 concepts (total of classes plus properties) to handle much more complex product catalogs and online point of sale. We know, because we prepared a design for their genealogy system, that the genealogy portion of the system would also add far less than 100 new concepts to the model. We didn’t count the number of columns in the existing systems, but our observation is that most relational systems average more than 10 columns per table, so it seems reasonable to guess that this system has over 27,500 concepts that must be programmed to (2,500 tables plus 25,000 columns). This system is approximately 100 times more complex than it needs to be. This happens all the time. There was clearly very little thought put into the design of this system, just lots of coding, reacting, and adding more code to solve the next request.
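The estimate above can be reproduced with a few lines of arithmetic. This sketch assumes exactly 10 columns per table (the text says "more than 10") and takes 200 as the number of concepts needed (about 100 for the catalog and point of sale, plus fewer than 100 for genealogy), so the ratio it prints is a floor. The names are illustrative.

```python
# Rough complexity comparison for the multilevel-marketing system.
# All figures are the chapter's estimates; 200 needed concepts is an
# assumed upper bound, so the real ratio would be somewhat higher.

TABLES = 2_500
COLUMNS_PER_TABLE = 10   # lower bound from the "more than 10 columns" observation
NEEDED_CONCEPTS = 200    # ~100 for catalog and sales + under 100 for genealogy


def existing_concepts(tables: int, columns_per_table: int) -> int:
    """Each table and each column is a concept that code must be written to."""
    return tables + tables * columns_per_table


def complexity_ratio(existing: int, needed: int) -> float:
    return existing / needed


if __name__ == "__main__":
    existing = existing_concepts(TABLES, COLUMNS_PER_TABLE)
    print(f"existing concepts: {existing:,}")
    print(f"ratio to needed:   {complexity_ratio(existing, NEEDED_CONCEPTS):.1f}x")
```

Under these assumptions the ratio comes out above 100, which is consistent with the "approximately 100 times" figure.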

A system that is 100 times as complex as it needs to be is 100 times more expensive to build and 100 times more expensive to implement, has 100 times as many latent defects, and will cost 100 times as much to maintain.

Chapter Summary

It seems like application software is a good thing. Certainly, the first application systems built were great boons. In the 1950s, a company that automated its payroll system had an advantage over one that didn’t. This led us to believe that we needed application code to build all the user interfaces and codify all the rules that we need to automate our business processes.

But application software (really all software) is a liability as much as it is an asset. The complexity of the software is what makes the liability. As systems become more complex, they become more of a liability. They are a liability because of the latent defects that the code harbors, which come out to cause problems at very inopportune times. That complexity is also a liability when we try to change the software: it is what makes change difficult.

The good news is that it is now possible to build most of most application systems without application code. After decades of staring at application systems, people have finally begun treating applications as if they were a business domain. The result, as we’ll see in the next chapter, is what is often called “model-driven development,” or the “low-code / no-code” movement.
