Metrics for quality

Metrics are a means of capturing a measurement as a number. In software development, metrics are often used to represent a quality aspect that is hard to quantify in itself. For example, the quality of a piece of software is very hard to describe directly, and even harder to track as it changes over time. For this reason, we often capture numbers that, taken together, say something about the quality of software.

It is important to realize that metrics are a great tool, but they should always be used with caution. For one thing, there may be more factors influencing the (perceived) quality of software than the metrics being measured. Also, once people know that a specific metric is recorded, they can optimize their work to move that metric up or down. While this may show the desired numbers in reports, it does not necessarily mean that software quality is really improving. To counter this, more than one metric is often recorded.

A well-known example is that of story point velocity in agile work environments. Recording the sprint velocity of a team to see whether it is becoming more efficient over time sounds effective; however, if the team size varies from sprint to sprint, the metric can be useless, since attendance influences velocity as well. The metric can also easily be gamed: a team could agree to inflate all of its estimates a little more every sprint. While this would increase the numbers every sprint, it would no longer relate to an increase in team throughput.
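To illustrate why raw velocity misleads when attendance varies, a short sketch that normalizes velocity by available person-days could look like this (the sprint numbers and field names are hypothetical):

```python
# Hypothetical sprint records: story points completed and person-days available.
sprints = [
    {"points": 40, "person_days": 50},  # full team present
    {"points": 24, "person_days": 30},  # two members on leave
]

for i, sprint in enumerate(sprints, start=1):
    raw = sprint["points"]
    normalized = sprint["points"] / sprint["person_days"]
    print(f"Sprint {i}: raw velocity = {raw}, points per person-day = {normalized:.2f}")
```

In this example, the raw velocity drops from 40 to 24, yet the output per person-day is identical in both sprints, showing that the team did not actually become less efficient.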

Moving on to metrics for measuring the quality of software, it can be difficult to objectively measure the quality of written code. Developers often have strong opinions as to what constitutes good code, and the more the topic is discussed, the harder it can be to reach consensus in a team. However, when shifting attention to the results that come from using that code, it becomes easier to identify metrics that provide insights into the quality of the code.

Some examples of this are as follows:

  • The percentage of integration builds that fail: If code does not compile or does not pass automated tests, this is an indication that it is of insufficient quality. Since tests can be executed automatically by build pipelines whenever a new change is pushed, they are an excellent tool for determining the quality of code. Also, since they run and their results are gathered before a change is deployed to production, the results can be used to cancel a change before it moves to the next stage of a release pipeline. This way, only changes of sufficient quality propagate to the next stage.
  • The percentage of code covered by automated tests: Higher coverage means a larger part of the code is exercised by unit tests, which increases confidence in the quality of the software. Note that coverage alone does not guarantee quality; the tests must also make meaningful assertions about the code's behavior.
  • The change failure rate: This is the percentage of deployments of new versions of the code that lead to issues. An example of this is a situation where the web server runs out of memory after the deployment of a new version of the application.
  • The amount of unplanned work: The amount of unplanned work that has to be performed in any period of time can be a great metric of quality. If the team is creating a SaaS offering that it is also operating, there will be time spent on operational duties. This is often referred to as unplanned work. The amount of unplanned work can be an indicator of the quality of the planned work. If the amount of unplanned work increases, then this may be because the quality has gone down. Examples of unplanned work can be live site incidents, following up on alerts, hotfixes, and patches.
  • The number of defects that are being reported by users: If the number of bugs reported by users increases, this can be a sign that quality has been declining. Often, this is a lagging indicator, so once this number starts increasing, quality might have been going down for a while already. Of course, there can be many other reasons for this number increasing: new operating systems, an increase in the number of users, or changing expectations from users.
  • The number of known issues: Even if there are very few new defects being found or reported, if defects are never fixed and the number of known issues keeps increasing slowly, then the quality of the software will slowly decline over time.
  • The amount of technical debt: Technical debt is a term used to describe the consequences of sacrificing code quality for short-term gains, such as the quick delivery of code. Technical debt is discussed in detail in the next section.
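Several of the metrics above reduce to simple ratios over recorded events. A minimal sketch, using hypothetical build and deployment data, of computing the build failure percentage and the change failure rate:

```python
# Hypothetical outcomes of recent integration builds and deployments.
builds = ["success", "failed", "success", "success", "failed"]
deployments = [
    {"version": "1.4.0", "caused_incident": False},
    {"version": "1.4.1", "caused_incident": True},
    {"version": "1.5.0", "caused_incident": False},
    {"version": "1.5.1", "caused_incident": False},
]

# Percentage of integration builds that fail.
build_failure_pct = 100 * builds.count("failed") / len(builds)

# Change failure rate: deployments that led to an incident.
change_failure_rate = 100 * sum(d["caused_incident"] for d in deployments) / len(deployments)

print(f"Build failure percentage: {build_failure_pct:.0f}%")  # 40%
print(f"Change failure rate: {change_failure_rate:.0f}%")     # 25%
```

In practice, these numbers would be gathered from the build and release tooling rather than hard-coded, but the calculations remain the same.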

Testing is an activity that is performed to find and report on the quality of software. Test results (insights into quality) can be used to allow or cancel a change progressing to the next release stage.
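A gate of this kind can be as simple as a check that cancels the promotion of a change when the pass rate of its tests falls below a threshold. The function and threshold below are illustrative, not the API of any specific pipeline tool:

```python
def may_promote(test_results, min_pass_rate=1.0):
    """Return True when the pass rate meets the threshold, False to cancel."""
    passed = sum(1 for result in test_results if result == "passed")
    return passed / len(test_results) >= min_pass_rate

print(may_promote(["passed", "passed", "passed"]))           # True
print(may_promote(["passed", "failed", "passed"]))           # False
print(may_promote(["passed", "failed"], min_pass_rate=0.5))  # True
```

The default threshold of 1.0 requires all tests to pass; a lower threshold could be used for stages where some known flakiness is tolerated.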

In the next section, another dimension of quality is explored: the amount of technical debt in a code base.
