Chapter 8. Costing the Technical Debt

Despite the adjective technical, technical debt is ultimately an economic issue. Your strategy for managing it revolves around how much to spend and when to pay back the debt. In this chapter, we shine an economic spotlight on technical debt items to reveal the information you need to make decisions about how to service your debt. We explain how to estimate the remediation cost and the resulting cost savings when you reduce the recurring interest.

Shining an Economic Spotlight on Technical Debt

In general, the key driver for making decisions about a software project is maximizing value while minimizing costs. This is also the case with technical debt and the decisions you make about whether to do something about it, as well as how much and when. At some point in the life of a software product, you must be able to calculate costs of doing whatever you need to do with technical debt items. This involves computing or estimating the cost to carry and to eliminate the debt.

Here is how Team Atlas weighed the value of reducing recurring interest against the cost of paying the debt:

Running a static checker, the Atlas team found 34 clones of a certain piece of code. They noticed the issue because an inconsistent modification to only 32 of the clones had triggered a bug that was hard to find. The proposed refactoring to service the debt consists of encapsulating the logic of these 12 lines of code in a single method and then replacing all the 34 clones by an invocation of this method. The cost? About one hour. Oh, wait, they probably need to do some regression testing to validate that they have not affected the logic of the whole system. Oh, wait, they do not have unit and regression tests for several of the affected locations. Adding the tests, running the tests in the “before” version, and then running the regression tests will add another two hours.

The bottom line is that eliminating this technical debt item requires one day of work. The team determines that the benefit of no longer having to track down bugs like this one is worth the cost of the fix.

If you take a technical debt item from your registry, you can estimate the total effort involved in eliminating the associated technical debt. The associated debt is what we have called the current principal, and it includes the cost of changing the code or design option and all the accruing interest—that is, undoing the modifications and workarounds that piled up on the not-quite-right code, design, or production infrastructure.

Let us assume that you have to break apart a class into two distinct classes. If you’ve waited very long to repay this technical debt item, a lot of other code has been written that depends on the class. You will need to revisit and modify all these places in the code. And these modifications may have further consequences on other dependent code. An original naive estimate of requiring one day to reorganize the class rapidly grows to three days of work to deal with all the ramifications of accrued interest.

A simple return on investment (ROI) calculation for debt reduction compares the benefit of reducing the recurring interest with the cost of paying the current principal and accruing interest (remediation cost).
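
The comparison can be sketched as a small calculation. The chapter does not prescribe a formula, so the function below assumes the conventional definition of ROI (net benefit divided by cost); the function name, parameter names, and sample figures are hypothetical.

```python
def remediation_roi(interest_saved_days: float, remediation_cost_days: float) -> float:
    """Return ROI as a ratio: (benefit - cost) / cost.

    interest_saved_days: recurring interest you expect to stop paying
        over the planning horizon you care about.
    remediation_cost_days: current principal plus accruing interest.
    """
    if remediation_cost_days <= 0:
        raise ValueError("remediation_cost_days must be positive")
    return (interest_saved_days - remediation_cost_days) / remediation_cost_days


# Hypothetical figures: 6 person-days of interest avoided, 3 person-days to remediate.
roi = remediation_roi(interest_saved_days=6.0, remediation_cost_days=3.0)
print(f"ROI of remediation: {roi:.0%}")
```

A ratio above zero means the remediation pays for itself within the chosen horizon; the choice of horizon is itself a judgment call.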

In the technical debt timeline we introduced in Chapter 2, “What Is Technical Debt?” you need to know the cost of the technical debt you have in your system and understand when you will reach the tipping point (see Figure 8.1). Refining technical debt items will enable you to estimate the cost and prioritize actions to take.

A figure shows the reaching of tipping point in the timeline of technical debt. — **Figure 8.1** *Reaching the tipping point*

The technical debt timeline depicts how time plays a key role. A rightward arrow labeled "Time" is marked T1 (occurrence), T2 (awareness), T3 (tipping point, highlighted), and T4 (remediation). The technical debt at T1, T2, and T3 is a "technical debt net asset," whereas the debt from T3 to T4 is a "technical debt net liability." The period between T1 and T2 is labeled "Blissful ignorance," the period between T1 and T3 is labeled "Getting value out of debt," the period between T3 and T4 is labeled "Suffering from debt," and the period from T4 to the end of the timeline is labeled "Debt-free."

Shining an economic spotlight on the technical debt items involves doing the following:

Refining the technical debt description to identify the impacted and related software artifacts (code, tests, build scripts, and so on)
Using the artifacts to calculate the cost of remediation
Using the artifacts and consequences to calculate the recurring interest

Let’s look more closely at the technical factors of principal and interest.

Refine the Technical Debt Description

When you or your manager, client, or CTO asks, “How much technical debt do we have?” the real questions are “How much would it cost to fix the issues now?” “What benefit would it have?” and “How much impact would it have if we didn’t fix it now?” These questions about the future do not consider only the code, the architecture, or the production infrastructure; they assume that when the issue is fixed, all the associated tasks will be completed as well. Any calculation of technical debt should assess it from such a holistic perspective.

Holistically automating the entire decision-making and resource allocation process is not possible, and automated static analysis tools cannot make these calculations for you. You can identify issues and make design trade-offs for fixing them, but assessing issues as technical debt and managing them as such requires building an end-to-end economic argument. Sometimes the fix is a trivial code change, even if you find the issue during an architecture analysis; other times remediation requires a re-architecting effort, even though the technical debt item was discovered through static code analysis.

Looking back at the Phoebe agile shop that we studied in Chapter 6, “Technical Debt and Architecture,” the large negative-letter spacing issue was initially addressed with a patch, completed with two hours of a developer’s time. That is when the debt started accumulating because the team initially failed to assess the architecture, in addition to the code, until one of the developers sensed that the system required a more involved analysis and fix.

So, one of the developers entered a technical debt description, an excerpt of which is shown here (see Chapter 6 for the full description):

Name	Phoebe #421: Screen spacing creates unexpected crashes due to API incompatibility.
Summary	The source code uses a very large negative letter spacing in an attempt to move the text offscreen. The system handles up to -186 em fine but crashes on anything larger.

This is a critical issue that impacts multiple fronts: The software crashes leave the users frustrated, and the negative spacing causes integer overflow, which creates a security vulnerability and leaves the software brittle. The developers have patched the code, but they have not yet identified the root cause, leading them to believe the fix may be more complicated.

Table 8.1 shows the refinement of the technical debt description and identifies the concrete software artifacts related to it. Although Team Phoebe recorded the technical debt item during an architecture analysis, team members now know that the code, architecture, and production infrastructure are related to each other, and it is not always easy to tease them apart. One or the other may be the starting point of the analysis and may trigger reflection on other related aspects. When team members plan remediation, they need to consider how changes to one artifact could impact the others.

Table 8.1 What and where is the debt?

Name	Phoebe #421: Screen spacing creates unexpected crashes due to API incompatibility
Affected components	UIsetuplayer, transparency layer, UILogic
Affected code	Isolated to the frame renderers the text is fed into
Dependent components	LayoutTests, external web component
Other analysis data	40 reports from 7 clients in 10 days

The driving analysis questions guide the developers in tracing symptoms such as crashes to the codebase. (Recall the questions for the “Increase market share” business goal in Chapter 5, “Technical Debt and the Source Code.”) For example, in the context of this particular issue, the team sees that the negative out-of-bounds problem creates a crash in three components. The team identifies the cause in the frame renderers and an external dependent component. Team members recognize through architectural thinking that this error is being injected externally into several different areas of the code; hence, they need to understand the influence of the external component on the code to develop an appropriate remediation approach.

This refinement exercise guides developers in assembling the analysis of code, architecture, and production that we discussed in Chapters 5, 6, and 7, “Technical Debt and Production.” As teams become more sophisticated, they can link their development environments and autofill some of these fields with the relevant information. The goal is not to trigger analysis paralysis but to be aware of the added costs related to accruing interest and to make the changes so the system is production ready. We strongly underscore the benefits of a robust integrated configuration management and version control environment. You can use these tools to refine your technical debt items and manage them throughout the software development lifecycle.

Calculate the Cost of Remediation

Table 8.2 lists the activities for remediating the debt from source code through unit tests for Phoebe’s unexpected crashes. The cost of fixing the quality problems comprises the current principal and the accrued interest. The team adjusts these costs by an uncertainty factor and the cost to test the fix. Accounting for uncertainty provides team members a mechanism to express their confidence in their ability to localize changes, so they can determine how much they need to account for unexpected ripple effects.

Table 8.2 Cost of remediation

	Remove Technical Debt	Retrofit Other Areas of Software
Architecture (design and analysis)	The real cost was finding the dependency to the external web component and the existing patches.	In a later release, we can just remove the patches. Trivial.
Code	Write a wrapper around the external web component. We estimate one-half day.	A bunch of debug code needs to be cleaned, though, like GetLastError() following the UIFrame calls. These should now return null, too. Maybe spend another half day to ensure cleanup.
Infrastructure (test)	Write new test for the wrapper. One-half day.	Run the previous tests to ensure that the fix and removed patches resolve the problem. One-half day.
Uncertainty multiplier for propagating issues	Hopefully none as we were able to localize the fix.
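
The estimates in Table 8.2 can be tallied with a short sketch. The half-day figures come from the table; treating the "trivial" architecture work as zero and reading "hopefully none" as a multiplier of 1.0 are our assumptions, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Activity:
    artifact: str
    remove_debt_days: float  # "Remove Technical Debt" column
    retrofit_days: float     # "Retrofit Other Areas of Software" column


def remediation_cost(activities, uncertainty_multiplier=1.0):
    """Sum both columns for every artifact, then scale by the uncertainty multiplier."""
    base = sum(a.remove_debt_days + a.retrofit_days for a in activities)
    return base * uncertainty_multiplier


phoebe_421 = [
    # Analysis is already done and patch removal is "trivial": counted as 0 (assumption).
    Activity("architecture", 0.0, 0.0),
    Activity("code", 0.5, 0.5),
    Activity("infrastructure (test)", 0.5, 0.5),
]

total_days = remediation_cost(phoebe_421, uncertainty_multiplier=1.0)
print(f"Estimated remediation cost: {total_days} person-days")
```

If the team were less confident about localizing the fix, raising the multiplier (say, to 1.5) makes that lack of confidence visible in the estimate.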

Team Phoebe analyzed the issue and decided to write a wrapper to remediate the problem. The developers refined the technical debt description to reflect this decision:

Remediation approach

We could just fend off negative numbers near the crash site, or we can dig deeper and find out how this -10000 is happening. Code changes are trivial but distributed in the classes. That was the mistake made with the patches. With Brant, we decided to write a wrapper around the external web component.

The artifacts that constitute the debt (the architecture, code, and infrastructure) identified in Table 8.1 provide input into the cost of remediation.

With this information, the cost of the remediation becomes clearer, but Team Phoebe needs a little more information to weigh this decision against the benefit of removing the recurring interest. Remember that team members already patched the software several times in the local sites and then figured out that this was not a routine bug but rather technical debt. So now they have to consider the trade-off between the quick solution of patches (recurring interest) and fixing the software properly (paying off the principal).

Calculate the Recurring Interest

The next step is to calculate the resulting benefit of reducing the recurring interest. This requires understanding the nature of future changes and putting some approximate values around them. Table 8.3 shows the factors involved. You need to know the consequences of continuing to carry the existing debt so you can weigh them against the consequences of your strategy to remediate the debt (which may or may not pay off the entire principal). The symptom measures and the artifacts identified in Tables 8.1 and 8.2 provide the information to assess the consequences of continuing to create patches compared to the proposed remediation.

Table 8.3 Trade-offs of change

	Carrying Debt	Remediating Debt
Cost of future change	Medium: Each patch costs one-half day.	Low
Frequency (adjust for accumulating interest)	High: Many sites use this renderer, so they will also experience the issue requiring the patch.	High: Many sites use this renderer; they expect a smooth and secure experience.
Uncertainty (adjust for potential propagating issues)	High: Without rework, each new function is messier and messier.	Low

To make a simple calculation of the benefit, you look at only the cost saved from no longer carrying the debt. This assumes that you completely pay off the principal and eliminate the debt, so there will be no recurring interest. You know the cost of living with the debt up to this point. You might base predictions about future costs on an extrapolation of the past debt, the rework cost of anticipated changes to the system, or the growing gap between the state of the software and good software engineering practices.

To make a more nuanced calculation of the benefit, subtract the recurring interest of your remediation strategy from the cost of carrying the debt. This difference becomes more important when you are contemplating a partial fix—reducing but not eliminating the recurring interest.
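
The simple and the more nuanced calculations differ only in whether any recurring interest remains after remediation. The sketch below illustrates this under stated assumptions: the half-day cost per patch comes from Table 8.3, while the expected number of future patches, the residual interest, and all names are hypothetical.

```python
def carrying_cost(cost_per_change_days, expected_changes, uncertainty=1.0):
    """Recurring interest paid if the debt stays: cost per change x frequency x uncertainty."""
    return cost_per_change_days * expected_changes * uncertainty


def benefit(carrying_cost_days, residual_interest_days=0.0):
    """Benefit of remediation: interest no longer paid.

    With residual_interest_days == 0 this is the simple calculation (debt fully
    paid off); a positive value models a partial fix that still carries interest.
    """
    return carrying_cost_days - residual_interest_days


# Each patch costs one-half day (Table 8.3); six future patches is a hypothetical forecast.
carrying = carrying_cost(cost_per_change_days=0.5, expected_changes=6)

full_fix_benefit = benefit(carrying)                                 # simple calculation
partial_fix_benefit = benefit(carrying, residual_interest_days=1.0)  # nuanced calculation

print(f"Carrying cost: {carrying} days")
print(f"Benefit of full fix: {full_fix_benefit} days")
print(f"Benefit of partial fix: {partial_fix_benefit} days")
```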

Compare Cost and Benefit

Determining the ROI of the proposed remediation involves comparing the cost of remediation with the benefit of the reduced interest. Team Phoebe refined the description of the techdebt in their backlog to include the ROI of the remediation approach:

Remediation approach

ROI of remediation: High. The remediation cost is paid back in reduced developer effort to patch and rework the software almost immediately. There is less time spent considering the already implemented multiple local patches at crash sites. Even if we get only three or four more of these issues and continue with the patch-locally approach, which we will, the architectural fix pays off.

Comparing strategies for managing technical debt depends on understanding both the probability and impact of future change.
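
One way to make "the fix pays off" concrete is a break-even count: how many future patches it takes for the avoided interest to cover the remediation cost. The sketch below assumes the half-day estimates in Table 8.2 add up to two days of remediation and uses the half-day patch cost from Table 8.3; the function and parameter names are hypothetical.

```python
import math


def break_even_changes(remediation_cost_days, cost_per_patch_days,
                       cost_per_change_after_fix_days=0.0):
    """Number of future changes after which remediation has paid for itself.

    Returns None when the fix does not make each change cheaper.
    """
    saving_per_change = cost_per_patch_days - cost_per_change_after_fix_days
    if saving_per_change <= 0:
        return None
    return math.ceil(remediation_cost_days / saving_per_change)


# Two days of remediation (sum of the half-day estimates), half a day per local patch.
patches_to_break_even = break_even_changes(2.0, 0.5)
print(f"Remediation pays off after {patches_to_break_even} avoided patches")
```

Under these assumptions the break-even point is four patches, which is in line with Team Phoebe's remark that three or four more occurrences are enough to justify the architectural fix.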

In this example, we have explained how to refine the technical debt description to include economic information by using consecutive analysis steps. In reality, this is an iterative process throughout development. Filling in the details of where the debt is found (refer to Table 8.1) can and should happen as developers discover or take on the debt. They can supplement their efforts with tool-supported analysis as well as architecture reviews. This supplemental analysis (for which we discuss several techniques in Chapters 5, 6, and 7) should happen for issues that require substantial changes. This analysis can be another task on the backlog with the goal of providing further details.

Remediation requires a team to generate possible solutions and evaluate the alternatives and cost. Some items are simple fixes with known costs and can easily happen through local refactorings. Other items involve substantial changes and require a design exercise and understanding of trade-offs—and maybe even several dedicated iterations. These changes will likely resolve multiple technical debt items and other issues that make them worth the time and effort. Finally, capturing the information about the cost savings of the change requires knowledge of the business context as well as team skill sets.

An illustration represents the concept that there is no absolute measure of technical debt. The technical debt description in the illustration reads: principal, large; interest, medium; code, architecture, and production.

In the case of Phoebe, the backlog prioritization approach gave the team tunnel vision, even after fixing the same issue a number of times. The situation, with the customer reports and the potential impact of the vulnerability, became so disruptive that the team had no choice but to take an approach based on design analysis rather than continuing the one-off patches. The information about the artifacts that we present here in an organized way actually emerged from organic, opportunistic discussions and team members’ comments on the open issue in their project issue tracker. An explicit focus on a technical debt item signals that at some point, the team may need to go through a trade-off analysis to remediate the debt. Not all debt has equal impact. Some debt can be serviced locally during routine refactoring exercises. A team will have to do more analysis when paying back debt requires architecture-level changes.

With an analysis approach that costs all the impacted software development artifacts and considers associated uncertainty with and without remediation, you should be able to identify the technical debt items that have high cost consequences today or that have low risk but high return in fixes and then allocate them to your releases. However, software development is rarely so simple.

Six months after releasing our software to the world, the WIRE team was in trouble. Customer support requests were increasing. Four AM pages were firing far too frequently. Our velocity slowed to a crawl. As if this weren’t bad enough, it was also quickly becoming apparent that pieces of our architecture were not going to be able to handle the next major batch of features. On the road to our first release, we purposefully, and occasionally accidentally, accepted technical debt so we could ship our software sooner. Now we were feeling the consequences of that debt. The question the team now faced was “What are you going to do about it?”

Our first actions were purely tactical. We needed to create breathing room to relieve pain and buy time to hatch a more strategically focused repayment plan. We started by focusing on the greatest pain points in our system. We fixed our monitoring dashboards, logging, and debugging tools so we could diagnose problems faster. We reevaluated our alerting strategies to remove superfluous pages. We fixed the most disruptive bugs. After a few months of hard work, the pain lessened, people started getting a full night’s sleep again, and morale began a slow ascent from its all-time low.

We needed a strategic plan for not only repaying our technical debt but also managing it better in the future if we were to continue delivering software. To create this plan, we hosted a simple workshop. The software engineers kicked off the workshop by showing where potential technical debt might live in our architecture. One afternoon we measured potential debt in our system by examining various code quality metrics, such as churn, conceptual design integrity, and defect data. Most of the metrics came from readily available sources such as git logs. Next, our product manager shared the roadmap for the next three to six months. Starting with the highest-priority roadmap items, we worked together to determine which parts of the architecture would need to be touched and how much effort might be required so we could deliver each roadmap item.

By the end of the workshop, we had a technical debt repayment plan. Surprisingly, some of the worst-quality code would not be scheduled for cleanup for another six months or more. As it turned out, though the potential technical debt in these components was high, they required few changes over the next three to six months. Through our analysis we also learned that it would be impossible to deliver some potentially important features beyond the six-month time horizon if we didn’t start repaying some technical debt right away.

Perhaps the greatest outcome of the workshop was that engineering and product management had a shared strategic vision for paying down technical debt. The conversation about debt shifted. Instead of complaining about bad code or making excuses for slow velocity, the team now talked about positioning the architecture so it could successfully carry us into the future. In addition, discussions about technical debt had elevated from pain to prevention. Our analysis made the metaphor of “debt” concrete in a way everyone could understand. We added new stories to our backlog to prevent us from taking on more technical debt accidentally and adjusted the process to have more meaningful discussions about design decisions that introduced potential technical debt.

Reflecting on this experience, I think the WIRE team was successful for a few important reasons. First, we relied on data instead of gut feelings to find pockets of potential debt, and we found simple, reliable ways to measure code quality. Second, we collaborated with product management to understand how our software system might need to change instead of simply fixing the worst code. Finally, the team’s mindset shifted away from thinking of technical debt as something always to avoid toward using technical debt responsibly to help us move faster.

Manage Technical Debt Items Collectively

In the larger system of Tethys, team members waited two years to thoroughly analyze their technical debt. Even though they followed the technical debt identification process to filter nonessential issues, they still came up with a list of about 200 techdebt items. This rapidly became overwhelming. The amount of debt they estimated far exceeds the resources available for several iterations. It may even exceed the amount of effort expended so far to develop the system!

Bringing in an army of contractors or student summer interns to knock down your technical debt is not going to resolve it. Making a large number of scattered changes can introduce new defects and new items of technical debt. And the debt at the architectural level is hard to parcel out into small bursts of activities. Refactoring at this structural level may halt development for several weeks.

Development teams clearly need additional criteria to decide what to do about a long list of technical debt items. A naive strategy of repaying them all one by one does not scale up. More often than not, the team will have to treat the technical debt items in reference to each other as they think about possible ways to restructure the system to service the debt and the implications over time.

The problem is even more complicated. You cannot treat technical debt in isolation from satisfying new requirements, adding new features, and other evolutions of the system, and you cannot separate treating technical debt from correcting defects and flaws in the system because they compete for the same resources: developers. Remember the four categories of items you have on your backlog: features, defects, architecture and infrastructure, and technical debt items (see the sidebar “What Color Is Your Backlog?” in Chapter 4, “Recognizing Technical Debt”).

Figure 8.2 shows a backlog of product issues consisting of desired features, architectural elements, defect fixes, and technical debt items. As team members groom the backlog, they identify and refine the top-priority issues, which become candidates for tasks in the next release.

An overview of grooming the product backlog is illustrated. — **Figure 8.2** *Grooming the product backlog*

In backlog grooming, the priority of issues from bottom to top reads: To be refined; Delete obsolete items; Break down items such as features, architectural infrastructure, and technical debt; Insert items such as architectural infrastructure and technical debt; and Next release with the items such as, Features, architectural infrastructure, defects, and technical debt.

The decisions in prioritizing the backlog are challenging because of all the hidden dependencies. Some features depend on elements of technical debt. Similarly, features may depend on some architectural element. And the same is true for defects: Their resolution may depend on some missing structural element, or they may be linked to some technical debt items.

To determine whether to include a technical debt item or postpone it for subsequent iterations while grooming backlog items, consider the answers to these questions:

In what ways are technical debt items that are related to development of features visible to the customer?
What architectural decisions have an impact on technical debt?
What defects can be traced back to the consequences of a technical debt item?
Are any technical debt items blocking progress?
Do any technical debt items need further refinement?

If the answers reveal that a technical debt item has dependencies with other issues on the backlog, then it becomes a higher priority to consider remediating it when working in this code for other reasons. How backlog issues concentrate in areas of the code can be another factor in setting priorities. For example, code with high defect rates or code that has been modified a lot in the past (assuming that the same will be true in the future) could be symptomatic of technical debt and thus worth prioritizing. If a technical debt item has no dependencies with other issues on the backlog, it has potential to incur cost, though not for the moment or the foreseeable future.
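
The dependency-based reasoning above can be sketched as a simple ordering rule. This is only one possible heuristic, not a method the chapter prescribes: blocking items come first, then items with more linked backlog issues. The data model, field names, and sample items are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TechDebtItem:
    name: str
    # IDs of features, defects, or architectural items that depend on this debt item
    linked_issues: List[str] = field(default_factory=list)
    blocking: bool = False


def grooming_order(items):
    """Blocking items first, then by number of linked backlog issues (most first).

    sorted() is stable, so items that tie keep their original backlog order.
    """
    return sorted(items, key=lambda item: (not item.blocking, -len(item.linked_issues)))


backlog = [
    TechDebtItem("cloned validation logic", linked_issues=["DEF-12", "FEAT-7"]),
    TechDebtItem("outdated build script"),
    TechDebtItem("unwrapped external component", linked_issues=["DEF-30"], blocking=True),
]

for item in grooming_order(backlog):
    print(item.name, len(item.linked_issues), "blocking" if item.blocking else "")
```

Items with no linked issues sink to the bottom, which matches the observation that they may incur cost eventually but not in the foreseeable future.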

There is a clear distinction between approaches that help you identify technical debt and those that help you manage technical debt. We have already discussed tools that help you assess your code. These approaches, such as SQALE or OMG’s Automated Technical Debt Measures, create assessments of technical debt reduction based on fixing all these issues and assigning an effort estimate to each line of code to fix. These techniques can help you detect technical debt. However, they cannot help you manage your technical debt throughout the software development lifecycle. They are only part of the toolbox.

We will take up the challenge of servicing the debt in Chapter 9, “Servicing the Technical Debt,” where we explain how to use information about costing debt to resolve your technical debt during release planning and the delivery cycle.

What Can You Do Today?

At this point, it is important to calculate the technical factors of principal and interest for your technical debt items and the artifacts that they trace to. These activities may be useful at this stage:

Refine technical debt descriptions to identify the software artifacts at the root of the debt and any other components affected by the debt. This will help you calculate costs.
For identified technical debt items, estimate not only the cost to pay them (in effort: person-days or person-weeks) but also the cost to not pay them (how much will it slow current progress?). In making your estimates, include the overall uncertainty associated with the cost of future change.
If you are not able to provide an actual cost, use a “T-shirt sizing” strategy: XS, S, M, L, XL.

At the very least, you need to describe qualitatively the impact of any technical debt item on productivity or quality.
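
Even T-shirt sizes are enough to compare items with each other. The sketch below treats the sizes as an ordered scale and ranks items by highest interest first, then cheapest principal; that ranking rule, the names, and the sample items are our assumptions, not a rule from this chapter.

```python
from enum import IntEnum


class Size(IntEnum):
    """Ordinal T-shirt sizes; the numbers only encode order, not person-days."""
    XS = 1
    S = 2
    M = 3
    L = 4
    XL = 5


def rank_by_size(items):
    """Order (name, principal, interest) tuples: highest interest first, then lowest principal."""
    return sorted(items, key=lambda item: (-item[2], item[1]))


sized_items = [
    ("split oversized class", Size.L, Size.M),
    ("remove cloned code", Size.S, Size.M),
    ("wrap external component", Size.S, Size.L),
]

for name, principal, interest in rank_by_size(sized_items):
    print(f"{name}: principal {principal.name}, interest {interest.name}")
```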

For Further Reading

Cost can be measured very accurately post facto: Just ask your accounting division to tally all the development costs, direct and indirect. For cost estimation, software developers have moved away from using a direct monetary value. They use various proxies—that is, point-based systems. Over the years we have seen function points in the 1970s (Albrecht & Gaffney 1983; ISO 20926:2009), object points in the 1980s (Boehm et al. 2000), use-case points in the 1990s (Alan et al. 2012), story points in the 2000s (Cohn 2006), and associated methods and tools to assist in making estimates (Grenning 2002). These approaches come with specific ways to calibrate what a “point” actually represents, so you can be consistent inside a development project or—better—across multiple development projects in a given organization. When the actual costs are known, it is also possible to use a cost-per-point or dollar-per-point factor to help with planning.

Automated tools that have rules for finding code quality issues often have a default value and a remediation strategy with an associated cost that you can tailor. Value is often qualitative, such as high, medium, and low or the top-ten rules in a given category. Costs for these more localized fixes are on the order of minutes or hours, computed as a constant function per fix, an increasing function based on complexity, or a base function for common infrastructure plus a cost per fix.
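
The three cost shapes mentioned above can be written down directly. The sketch below is illustrative only: the minute values are hypothetical, a linear function stands in for "an increasing function based on complexity," and none of the names correspond to a particular tool's configuration.

```python
def constant_cost(num_fixes, minutes_per_fix):
    """Constant function per fix: every violation costs the same."""
    return num_fixes * minutes_per_fix


def complexity_cost(complexities, minutes_per_unit):
    """Increasing function: cost grows with each violation's complexity (linear here)."""
    return sum(c * minutes_per_unit for c in complexities)


def base_plus_per_fix_cost(num_fixes, base_minutes, minutes_per_fix):
    """One-time cost for common infrastructure plus a cost per fix."""
    if num_fixes == 0:
        return 0
    return base_minutes + num_fixes * minutes_per_fix


# Hypothetical rule settings
print(constant_cost(num_fixes=10, minutes_per_fix=5))
print(complexity_cost(complexities=[1, 3, 8], minutes_per_unit=4))
print(base_plus_per_fix_cost(num_fixes=10, base_minutes=60, minutes_per_fix=2))
```

The third shape fits cases where a shared helper or test harness has to be built once before the individual fixes become cheap.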

The Agile Alliance Technical Debt Initiative has developed guidelines for executives, managers, and developers. In particular, it proposes the Agile Alliance Debt Analysis Model (A2DAM), which gives directions on how to estimate remediation costs for known code quality violations (Fayolle et al. 2018).
