Bad metrics

Seemingly no industry in the world can escape metrics. Our obsession with measuring things is as much a cult-like fixation as it is a genuine need that drives necessary introspection and change. In the world of software engineering, we are no strangers to this need. As programmers, we are very interested in metrics that give us insight into our code:

How many bugs are there?
How long does this code take to run?
How much test coverage do I have?

Managers and other stakeholders, however, will usually have their own interests and metrics. The most infamous of these are metrics that attempt to measure a developer's output or productivity:

How many lines of code or commits are there?
How many features did we ship?
How many lines of documentation did we write?

These are good questions if they're asked for the right reasons. For example, lines of code can be a useful metric if we're using them as a proxy for complexity when discussing whether to refactor specific classes/utilities. But many metrics are entirely divorced from the thing they are attempting to measure.

A non-technical manager or stakeholder may assume that writing a certain amount of code should always take the same amount of time. They may be confused when a developer who once wrote 200 lines of code in a single day has recently taken 10 days to commit only 10 lines of code. Their confusion, of course, demonstrates a gross misunderstanding of the programming process and its chaotic complexity. But these misunderstandings are rife, so we need to be wary of them.

The clear solution to bad metrics is to push for and create better metrics. To create good metrics, we first need to know what underlying question we're trying to answer, and then brainstorm ways of answering it. Let's look at some examples:

| The question | The bad metric | Example of why it's bad | A better metric or approach |
|---|---|---|---|
| Are we being productive? | Lines of code/commits | A programmer could reasonably take many days to solve a crucial bug that requires only a one-line change. | Ask developers what is dragging their productivity down; hold team retrospectives to discover areas for improvement. |
| Are we delivering value to users? | Number of features shipped | Users may benefit more from fewer features of higher quality. | Build metrics or run A/B experiments to judge which features are being used and enjoyed. Focus on the quality of each feature. |
| Are we writing useful documentation? | Lines of documentation | Developers may end up documenting only the things they know well, not the areas of the code base most in need of documentation. | Create a metric that tracks how documentation is used. Ask developers which areas of the code are under-documented. |
| Do we have a well-tested code base? | Test coverage | If it measures only whether certain lines of code are executed, it can be fooled by a handful of very broad integration tests. | Use traditional test coverage in combination with other metrics. Keep track of regression-prone areas where bugs often occur. |
| Do we have a buggy code base? | Number of bugs | A code base may have many bugs in an area of the app that is virtually unused, while bugs in other areas may go unreported. | Don't count bugs; instead, focus on and measure user happiness and developer happiness. Prioritize bugs based on how they affect your users. |

Fixation on bad metrics within an organization or team can lead to the wrong things being optimized. Developers who are judged on how many lines of code they write will care less about the underlying quality of that code. Developers who are pushed to release more features will compromise on best practices and clean code, optimizing for speed of delivery instead.

It's important to ensure that any metrics we track are tempered by reality and that we do not judge success purely on those metrics. Be especially wary when you see metrics running in opposition to our principles of clean code. Over time, if a metric is chased too ambitiously, it may end up corrupting the very thing it was meant to measure. This effect is known as Goodhart's law:

"When a measure becomes a target, it ceases to be a good measure."
– Marilyn Strathern
