Chapter 7. Technical Debt and Production

In this chapter, we explore technical debt that arises in the process of putting software in the production environment and into the hands of its end users. This process includes the build and integration, testing, deployment, and release aspects of software development. These release activities involve essential software artifacts that can cause technical debt or that can be subject to technical debt themselves.

We explain how to recognize technical debt in the infrastructure of the release activities. We again illustrate our lightweight analysis technique to assess technical debt in such artifacts and to ensure traceability so that misalignments between these artifacts do not introduce technical debt. We focus on automated testing, continuous integration, and deployment aspects.

Beyond the Architecture, the Design, and the Code

In Chapters 5, “Technical Debt and the Source Code,” and 6, “Technical Debt and Architecture,” we looked at how technical debt appears in the traditional activities we usually associate with software development: code, design, and architecture. But technical debt can also appear in the steps that deliver the software to its end users, because these steps, too, involve code and structural considerations.

How does software get into the hands of users? Industry practices vary widely. Software can be embedded in another physical product, such as your TV monitor; it can be delivered to individual computers or devices, such as your laptop or cell phone; or it can run in large operations centers using the SaaS paradigm (Software as a Service).

SaaS has been undergoing a big transformation lately, evolving from software development teams throwing candidate software releases over the wall to operations teams, to more integrated approaches, nicknamed DevOps, for development and operations.

Just as processes used by the software industry vary, so does the terminology they use to describe this tail-end process. We will begin by defining a few terms.

We use the term release for the part of the process that brings completed code to a running, operational system in the hands of its end users. So, release is the process that brings the software into production, as shown in Figure 7.1.

Figure 7.1 Release pipeline

The release part of the process encompasses the following four activities:

  1. Build: Creating the executable software

  2. System test: Validating that the software is ready for use

  3. Deployment: Bringing the software (and data) to the place of use

  4. Turn it on: Making the software operational

Release occurs at various time increments—from years, to months, to weeks, to more or less continuously. Continuous integration and deployment enable developers to push a code change through the release activities immediately into production.

Continuous integration, practiced throughout the industry, involves rebuilding the software whenever a significant change occurs: integrating artifacts on every change, notifying the team immediately of success or failure, and requiring issues to be fixed before moving forward. Continuous deployment involves pushing changes into production as soon as possible, to make the software operational.
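
The gatekeeping logic at the heart of such a loop can be sketched in a few lines. This is a minimal illustration, not any particular CI product; the step commands are placeholders for a project’s real build and test invocations.

```python
import subprocess
import sys

# Placeholder steps; a real pipeline would invoke the project's actual
# build and test commands (a compiler, a test runner, a packaging step).
STEPS = [
    ("build", [sys.executable, "-c", "print('compiling')"]),
    ("unit tests", [sys.executable, "-c", "print('testing')"]),
]

def run_pipeline(steps):
    """Run each step in order; stop at the first failure so the team is
    notified immediately and must fix the issue before moving forward."""
    for name, cmd in steps:
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            print(f"step failed: {name}")
            return False
    print("all steps passed")
    return True
```

On every change, a CI server would run something like `run_pipeline(STEPS)` and report the result back to the team.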

These activities are supported by tools, and there are many good ones to choose from today. These tools are usually driven by programs called scripts that are written in various languages, including operating system shell scripts.

Because of all this script-driven automation, technical debt in production is not very different conceptually from technical debt in code or software architecture. You can think about your infrastructure as a complicated codebase. Infrastructure as code refers to the process of managing the IT infrastructure through automated processes. All assets are versioned, scripted, and shared, where possible.

All three project examples we’ve been examining (the three moons of Saturn) have a significant production element: They have an operations team. Atlas uses a DevOps approach, Phoebe is an agile shop, and Tethys uses a more traditional method. Here is an example from the Phoebe project about its build automation tool, called Make, which automatically builds executable programs from source code:

Make’s dependency calculation is taking 20% of the time for an incremental build, and we need to speed things up. We had been able to make some small performance improvements in the past but are no longer able to continue with such workarounds.

So, the Phoebe project has both the software that is the product, which is “shipped,” and the software that helps build that product. Previous chapters discuss Phoebe’s software product; here we consider the software that builds the product. For shrink-wrapped software (what is in the box or the installer you download) or embedded software, the distinction between the software that is the product and the software that helps build it is pretty obvious. For SaaS, it is a little trickier. But this build software still affects what the end user experiences.

There are several important differences between software products and software used in production:

  • Different tools: The production phase often employs a chain of several tools, using plugins to refine and specialize them; this is an extension of the traditional build tool chain of compilation/linking and not a fundamentally different animal.

  • Different languages: The languages used in production software are often not known for their legibility and maintainability.

  • Different people doing operations or different maturities of the personnel involved: These differences can lead to cultural issues; some organizations do not treat the infrastructure code as first-class software.

  • Different degrees of automation: Often some manual steps need to be performed.

And, above all, there is a greater degree of difficulty in testing software before putting it into production. This was easy in the shrink-wrapped context but is much harder in a SaaS environment.

In developing the codebase, the language often provides some conceptual integrity, especially when using well-known frameworks. For example, you may have all your application code written in JavaScript, using the MEAN stack (MongoDB, Express.js, Angular.js, Node.js), and manage it in Git repositories. In contrast, the tools in the release process may be more scattered and may have evolved organically (as opposed to being well designed), sometimes in the hands of people with a lesser degree of software engineering sophistication. Version control may have a 1990s feel, or it may not be done at all.

The field of infrastructure and its code is not as mature as the software development field, despite the availability of many tools to assist in the process, so it is more difficult to have a top-down design. There is little in the way of standard practices, guidelines, or education available. In large systems, the tool chain will also contain elements to monitor the behavior or health of the running system, collect metrics to allow reflection on the system, react automatically to specific misbehavior, and guide future evolution.

Build and Integration Debt

Technical debt in build and integration appears in two ways:

  • Imperfect or suboptimal design and coding of the build scripts themselves: Build scripts are, in effect, code, sometimes supported by special code embedded in the application under development.

  • Misalignment between the build dependencies and the actual code: As the software rapidly evolves, new components may not be backward compatible.

A figure shows a loop that includes all code generated during the development, testing, and deployment processes, illustrating the concept “All code matters.”

Build automation keeps builds consistent. Build scripts build the product but are often used for other tasks, such as running unit tests, packaging binaries, and generating project documentation, test coverage reports, and internal release notes. The absence of build infrastructure is a source of technical debt because it increases the setup time when new developers join the team or a new machine is installed.
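
Those extra tasks can live behind one versioned, scripted entry point. The sketch below assumes hypothetical task names and stand-in task bodies; the point is the structure, not the specific tasks.

```python
# Sketch of a scripted build driver with named tasks. The task bodies
# are stand-ins; in a real project each would shell out to a compiler,
# a test runner, a packager, or a documentation generator.
TASKS = {}

def task(fn):
    """Register a function as a named build task."""
    TASKS[fn.__name__] = fn
    return fn

@task
def build():
    return "built executable"

@task
def unit_tests():
    return "ran unit tests"

@task
def package():
    return "packaged binaries"

def run(*names):
    """Run the requested tasks in order, like targets of a build tool."""
    return [TASKS[name]() for name in names]
```

A new developer, or a freshly installed machine, then needs only this one script, for example `run("build", "unit_tests", "package")`, instead of a list of manual setup steps.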

Automation and continuous integration require an investment in infrastructure and the ramp-up time to design, develop, and use the continuous integration server. Building such infrastructure involves architecting and implementation and hence can introduce technical debt—much as described in Chapters 5 and 6.

Testing Debt

Technical debt in testing appears in three ways:

  • Imperfect or suboptimal design and coding of tests: Test suites are, in effect, code, and they are sometimes supported by special code embedded in the application under development. Large sets of automated tests may not have a clear purpose; when they fail, something is probably wrong, but it is unclear what artifacts contributed to the failure and why.

  • Misalignment between the tests and the actual code: As software evolves rapidly, new tests may be missing or may test an older interpretation of the requirements. Very fine-grained tests introduced early in development, especially with mockup software, become a nightmare to maintain as they create complex webs of code around the production code; one small change might, for example, cause 60 tests to fail.

  • Challenges of SaaS contexts: Development, testing, and production environments can become misaligned. If your developers use version X, your continuous integration system version Y, and your production servers version Z, then your tests aren’t testing the right thing, and your developers might not know about it. Or code that worked perfectly during development might fail when deployed to the test infrastructure.
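
One lightweight defense against this drift is to pin the expected tool versions in a manifest that is checked at startup in all three environments. The manifest below is hypothetical and checks only the interpreter version, as a minimal illustration.

```python
import sys

# Hypothetical pinned manifest; in practice this would be a versioned
# file read by developers' machines, the CI system, and production
# alike, so that all three environments check the same expectation.
EXPECTED_PYTHON = (3, 0)  # minimum (major, minor)

def environment_matches(minimum=EXPECTED_PYTHON):
    """Return True if the running interpreter meets the pinned minimum,
    so a misaligned environment fails loudly instead of silently."""
    return sys.version_info[:2] >= minimum
```

Run as the first step of the test suite and of the deployment script, a check like this turns a silent version X/Y/Z mismatch into an immediate, visible failure.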

Here is an example of a technical debt item from the Tethys project, whose developers have grown frustrated because multiple tests have a similar purpose, and other tests override each other:

Page_test_runner and benchmark_runner_test are duplicates. The duplication is a consequence of trying to expedite a request by the controls team. When the actual test code got written, the team did not realize that the test got duplicated. These tests should be merged and refactored; the code also includes a page setup test that can be overwritten.

This example demonstrates that an organization needs a deliberate strategy for managing technical debt not only for development but also for testing and production. Tests need to be designed and aligned to their purpose, implemented following sound coding practices, and executed in alignment with the functionality and attributes they are meant to test.

Infrastructure Debt

Technical debt in deployment appears in two ways:

  • In the structure of the operational system: This may include the lack of “observability” of the system, which may be referred to as monitoring debt.

  • In scripts: This may include scripts that enact the deployment of the code, the data, and the updates on the operational system.

This is infrastructure debt hiding in infrastructure code. A task that must be performed manually, again and again, by the staff on the operational system is an example of such infrastructure debt. The operations team must continuously pay the recurring interest, while dealing with significant risks.

The lack of verification of deployment scripts is a source of technical debt. It is essential to check that the scripts are compatible with the architecture to avoid inconsistencies between development, testing, and production environments and to minimize risk.
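A minimal form of such a check compares the targets a deployment script touches against the nodes the architecture declares. The node names below are invented for illustration; a real list would come from the system’s documented deployment view.

```python
# Invented architecture description for illustration; a real one would
# be derived from the system's documented deployment view.
DECLARED_NODES = {"web-frontend", "api-server", "database"}

def undeclared_targets(script_targets, declared=DECLARED_NODES):
    """Return the targets a deployment script references that the
    architecture does not declare; an empty set means they align."""
    return set(script_targets) - declared
```

Run as a preflight step, this blocks a deployment whose script has drifted from the documented architecture instead of letting the inconsistency reach production.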

The Case of Technical Debt in the Production of Phoebe

Previous chapters describe how Team Phoebe identified technical debt items in the code and the architecture. Let’s continue with the Phoebe example to see what additional information the team can uncover by analyzing the infrastructure. Treating infrastructure as code, team members again follow the steps of technical debt analysis (described in Chapter 4, “Recognizing Technical Debt”). In the first iteration, they define key business goals. The development team has already looked at pain points related to two of the business goals—“Create an easy-to-evolve product” and “Increase market share”—from code and architecture perspectives. Another related business goal is “Reduce time to market.” There is growing concern that velocity keeps dropping. It takes forever to implement even a simple change and test it. The developers turn their attention to improving the build time and test infrastructure.

Improve the Build Time

As Team Phoebe evaluates possible solutions for improving performance for Make’s dependency calculation, team members consider the consequences of technical debt. Should the team continue to incur more debt, pay it off at the expense of some performance, or make a partial payment on the debt while still meeting their performance goal?

The sample technical debt item in Table 7.1 shows the team’s analysis of the build infrastructure to get insight into the maintainability of the build and integration scripts.

Table 7.1 Technical debt item on build infrastructure

Name: Phoebe #500: Improve build time

Summary: Make’s dependency calculation is taking 20% of the time for an incremental build. The team is considering three alternative solutions and the trade-offs involved in incurring technical debt to optimize performance.

Consequences: Slowing build time and turnaround time for feedback.

Remediation approach: I tried three approaches:

  1. extra_cflags on the cc compiler command, separate precompile header command

  2. override cflags per rule to add -include for source files and -x for precompiled header files

  3. base_cflags with normal flags, set cflags to $base_cflags -include, override it with $base_cflags -x for precompiled header files

1 is messy but fast, 2 is cleaner but a lot slower (due to cflags per object file), and 3 is cleanish and fast.

Reporter/assignee: Build team
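
Option 3 might look roughly like the following, sketched here in Make syntax; the flags, variables, and file names are illustrative, since the debt item does not show the project’s actual build files.

```make
# Illustrative sketch of option 3: common flags live in base_cflags,
# the default cflags adds -include, and only the precompiled-header
# rule overrides cflags with the -x variant.
base_cflags := -O2 -Wall
cflags      := $(base_cflags) -include prefix.h

%.o: %.c prefix.h.gch
	$(CC) $(cflags) -c $< -o $@

prefix.h.gch: prefix.h
	$(CC) $(base_cflags) -x c-header $< -o $@
```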

Improve the Test Infrastructure

Team Phoebe would also like to reuse new test helper modules for a legacy test framework. While the development team has been migrating its integration tests to the new test framework, there have been two parallel test helpers to maintain. This code duplication is a source of technical debt and requires team members to make changes in two places. They often forget, which leads to unintended drift between the two frameworks.

The remediation approach the team is taking allows the legacy test framework to reuse the new test framework’s helper modules, which are essentially a cleaned-up port (better documentation, linted, obvious errors fixed). The sample technical debt item in Table 7.2 shows analysis of the test infrastructure to get insight into the maintainability of the test framework.

Table 7.2 Technical debt item on test infrastructure

Name: Phoebe #501: Improve test infrastructure

Summary: While DevTeam has been migrating its integration tests to the new test framework, there have been two parallel test helpers to maintain.

Consequences: This code duplication is a source of technical debt and requires team members to make changes in two places. They often forget, which leads to unintended drift between the two frameworks.

Remediation approach: Reuse the new test framework’s helper modules. The goal isn’t 100% code reuse between the old and new test frameworks but 80–90%.

The test methods that remain are here for three reasons:

  • When ported to the new test framework, they were refactored into different modules, and legacy tests need to be updated to load new modules.

  • Navigating the page in the old test framework is hacky and has been cleaned up in the new test framework, so the tests won’t ever share implementations.

  • Subtle refactoring changes make the new implementation fail certain tests. This test failure should be followed up by using the old implementation and then refactoring once all tests have been migrated.

Reporter/assignee: DevTeam developers

Service the Production Debt

After inspecting the infrastructure, team members have added a few more technical debt items to the registry. These items pertain to the build and test infrastructure. They will need to consider trade-offs with other system properties and understand the consequences of partial payment of the debt. They also need to examine the legacy test framework and assess how the debt will change over time as the developers migrate tests to the new framework. We will say more on these topics in Chapter 9, “Servicing the Technical Debt.”

What Can You Do Today?

At this point, it is important to identify the software that helps you build the software that is the product and start treating it as first-class code. These activities may be useful at this stage:

  • Put it under configuration management.

  • Document it (see Chapter 12, “Avoiding Unintentional Debt”).

  • Integrate its operation into your overall development process.

  • Architect for ease of deployment, observability, and automated processes.

  • Analyze the code and the design of the infrastructure for the presence of technical debt as you would for the product.

You need to identify steps that require manual intervention, that are error prone, and that could be automated. You also need to integrate elements and tools to observe software in development and operation (static analysis, monitoring, logging) to obtain information about its architecture health and run-time behavior that can inform priorities and guide future decisions.

For Further Reading

Andrew Clay Shafer (2010) came up with the concept of infrastructure debt hiding in infrastructure code, and Infrastructure as Code is actually the title of a book by Kief Morris (2016).

In their novel The Phoenix Project, Gene Kim and coauthors (2013) give a great illustration of the impact of technical debt on infrastructure and the notion of DevOps. In Site Reliability Engineering, Beyer and colleagues (2016) emphasize that thoughtless automation in the production and testing infrastructure will create more problems than it solves.

To learn more about DevOps, you can find many resources that provide practical guidance. The DevOps Adoption Playbook, by Sanjeev Sharma (2017), provides guidance on implementing DevOps in large organizations. The DevOps Handbook, by Gene Kim and colleagues (2016), is another industrial reference on good DevOps practice. For a software architect’s perspective on the DevOps movement, see the book DevOps by Len Bass and colleagues (2016).

On documentation, especially documenting the allocation views of the architecture, see Simon Brown (2018) and Clements and colleagues (2011). The deployment and install views describe the mapping of architecture elements to the computing platform and production environment.
