3
Working with Code

There’s an ancient Roman amphitheater in Arles, France. It used to provide entertainment—chariot races and gladiatorial combat—for up to 20,000 people. After the fall of Rome, a small town was built right in the arena. This made sense; it had walls and a drainage system. Later inhabitants probably found the setup odd and inconvenient. They might have judged the architects of the amphitheater for choices that made it difficult to turn it into a town.

Codebases are like the amphitheater in Arles. Layers are written in one generation and modified later. Many people have touched the code. Tests are missing or enforce assumptions of a bygone era. Changing requirements have twisted the code’s usage. Working with code is hard. It’s also one of the first things you’ll have to do.

This chapter will show you how to work with existing code. We’ll introduce concepts that cause the mess—software entropy and technical debt—to give you some perspective. We’ll then give practical guidance on how to safely change code, and we’ll conclude with tips to avoid accidentally contributing to code clutter.

Software Entropy

As you explore code, you’ll notice its shortcomings. Messy code is a natural side effect of change; don’t blame developers for the untidiness. This drift toward disarray is known as software entropy.

Many things cause software entropy. Developers misunderstand each other’s code or differ in style. Evolving technical stacks and product requirements cause chaos (see Chapter 11). Bug fixes and performance optimizations introduce complexity.

Luckily, software entropy can be managed. Code style and bug detection tools help keep code clean (Chapter 6). Code reviews help spread knowledge and reduce inconsistency (Chapter 7). Continuous refactoring reduces entropy (see “Changing Code” later in this chapter).

Technical Debt

Technical debt is a major cause of software entropy. Technical debt is future work that’s owed to fix shortcomings in existing code. Like financial debt, technical debt has principal and interest. The principal is the original shortcoming that needs to be fixed. Interest is paid as code evolves without addressing the underlying shortcoming—increasingly complex workarounds are implemented. Interest compounds as the workarounds are replicated and entrenched. Complexity spreads, causing defects. Unpaid technical debt is common, and legacy code has a lot of it.

Technical decisions that you disagree with are not technical debt. Neither is code that you don’t like. To be debt, the problem must require the team to “pay interest,” or code must risk triggering a critical problem—one that requires urgent payment. Don’t abuse the phrase. Saying “technical debt” too often will weaken the statement, making it harder to address important debt.

We know debt is frustrating, but it’s not all bad. Martin Fowler divides technical debt into a two-by-two matrix (Table 3-1).

Table 3-1: Technical Debt Matrix

              Reckless                              Prudent
Deliberate    “We don’t have time for design.”      “Let’s ship now and deal with consequences.”
Inadvertent   “What’s layering?”                    “Now we know how we should’ve done it.”

Source: https://martinfowler.com/bliki/TechnicalDebtQuadrant.html

Prudent, deliberate debt is the classic form of tech debt: a pragmatic trade-off between a known shortcoming in the code and speed of delivery. This is good debt as long as the team is disciplined about addressing it later.

Reckless, deliberate debt is created when teams are under pressure to deliver. “Just” is a hint that reckless debt is being discussed: “We can just add structured logging later,” or, “Just increase the timeout.”

Reckless, inadvertent debt comes from unknown unknowns. You can mitigate the danger of reckless, inadvertent debt by writing down implementation plans in advance, getting feedback on them, and doing code reviews. Continuous learning also minimizes inadvertent recklessness.

Prudent, inadvertent debt is a natural outcome of growing experience. Some lessons are only learned in hindsight: “We should have created user accounts even for people who didn’t complete the sign-up flow. Marketing needs to capture failed sign-ups, and now we have to add extra code that could’ve been avoided if it was part of the core data model.” Unlike prudent and deliberate debt, the team will not know it’s taking on debt. Unlike inadvertent, reckless debt, this type of debt is more of a natural outcome of learning about the problem domain or growing as a software architect—not the result of simply not doing one’s homework. Healthy teams use practices such as project retrospectives to discover inadvertent debt and discuss when and whether to pay it down.

An important takeaway from this matrix is that some debt is unavoidable, as you can’t prevent inadvertent mistakes. Debt might even be a mark of success: the project survived long enough to become messy.

Addressing Technical Debt

Don’t wait until the world stops to fix problems for a month. Instead, clean things up and do minor refactoring as you go. Make changes in small, independent commits and pull requests.

You might find that incremental refactoring is insufficient—larger changes are needed. Large refactors are a serious commitment. In the short term, paying down debt slows feature delivery, while taking on more debt accelerates delivery. Long term, the opposite is true: paying down debt speeds up delivery, and taking on more slows delivery. Product managers are incentivized to push for more features (and thus, more debt). The right balance is highly context-dependent. If you have suggestions for large refactoring or rewriting, make the case to your team first. The following is a good framework for discussing technical debt:

  1. State the situation factually.
  2. Describe the risk and cost of the debt.
  3. Propose a solution.
  4. Discuss alternatives (including not taking action).
  5. Weigh the trade-offs.

Make your proposal in writing. Do not base your appeal on a value judgment (“this code is old and ugly”). Focus on the cost of the debt and the benefit of fixing it. Be specific, and don’t be surprised if you are asked to demonstrate the benefits after the work is done.

Hey all,

I think it’s time we split the login service into two services: one for authentication and the other for authorization.

Login service instability accounts for more than 30 percent of our on-call issues. The instability seems to come mostly from the intermingling of authentication and authorization logic. The current design makes it really difficult to test all of the security features we need to provide. We guarantee the safety of our customers’ data, and the login service as is makes that an increasingly hard promise to keep. I haven’t spoken with compliance, but I’m concerned that they’ll raise an issue when we go through our next audit.

I think the access control logic was put in the service mostly out of expedience, given the various time and resource constraints at the time. There isn’t an overarching architectural principle that led to this decision. Addressing it now, though, will mean refactoring the login service and moving the authorization code out—a big project. Still, I think it’s worth it to fix the stability and correctness challenges.

One way to reduce the amount of work is to piggyback off of the backend team’s authorization service instead of creating our own. I don’t think this is the right approach because they’re solving for a different set of use cases. We’re dealing with user-facing authorization, while they’re solving for system-to-system authorization. But maybe there’s a nice way to handle both cleanly.

What do you think?

Thanks!

Johanna

Changing Code

Changing code is not like writing code in a fresh repository. You have to make changes without breaking existing behavior. You must understand what other developers were thinking and stick to existing styles and patterns. And you must gently improve the codebase as you go.

Code change techniques are largely the same, whether the change is adding new features, refactoring, deleting code, or fixing a bug. In fact, different types of changes are often combined. Refactoring—improving internal code structure without changing functionality—happens while adding a feature because it makes the feature easier to add. Code is deleted during a bug fix.

Changing large existing codebases is a skill refined over years—decades, even. The following tips will get you started.

Use the Legacy Code Change Algorithm

In his book Working Effectively with Legacy Code (Pearson, 2004), Michael C. Feathers proposes the following steps to safely modify existing codebases:

  1. Identify change points.
  2. Find test points.
  3. Break dependencies.
  4. Write tests.
  5. Make changes and refactor.

Think of the first four steps as clearing space and building a fence around a field before planting seeds in step 5. Until the fence is up, wild animals can wander in and dig up your plants. Find the code you need to change and figure out how to test it. Refactor the code to make testing possible if needed. Add tests that validate existing behavior. Once the fence is up and the area around your change points is well protected, you can make changes on the inside.

First, locate the code that needs to be changed (the change points) using the strategies in Chapter 2: read the code, experiment, and ask questions. In our gardening metaphor, the change points are where you will plant your seeds.

Once you’ve located the code, find its test points. Test points are entry points into the code that you want to modify—the areas that tests invoke and inject into. Test points show code behavior before you change anything, and you’ll need to use these points to test your own changes.

If you’re lucky, the test points are easily accessible; if not, you’ll need to break dependencies to get to them. In this context, dependencies aren’t library or service dependencies; they are objects or methods that are required to test your code. Breaking dependencies means changing code structure so that it’s easier to test. You will need to change the code to hook your tests up and supply synthetic inputs. These changes must not change behavior.

Refactoring to break dependencies is the riskiest part of the work. It may even involve changing preexisting tests, which makes it harder to detect if behavior changed. Take small steps, and don’t introduce any new functionality while in this phase. Make sure you can run tests quickly so you can run tests frequently.

A wide variety of techniques exist to break dependencies, including the following:

  • Pulling apart a large, complex method into multiple smaller methods so separate pieces of functionality can be tested independently
  • Introducing an interface (or other indirection) to give tests a way to supply a simple implementation of a complex object—incomplete, but sufficient for testing
  • Injecting explicit control points that permit you to simulate aspects of execution that are hard to control, such as passage of time
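As a rough sketch of the last technique, consider a hypothetical SessionChecker (the class, its fields, and the 30-minute timeout are all invented for illustration). We assume the original version read the system time directly, so expiry could not be tested without waiting. Passing in a java.time.Clock gives tests a control point for time, while production callers can pass Clock.systemUTC() and see the same behavior as before.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneOffset;

// Hypothetical class, invented for illustration. The original version is
// assumed to have read the system time directly inside isExpired().
class SessionChecker {
    private final Clock clock;      // explicit control point for time
    private final Duration timeout;

    SessionChecker(Clock clock, Duration timeout) {
        this.clock = clock;
        this.timeout = timeout;
    }

    // Production callers pass Clock.systemUTC(), so behavior is unchanged.
    boolean isExpired(Instant lastSeen) {
        Duration idle = Duration.between(lastSeen, clock.instant());
        return idle.compareTo(timeout) > 0;
    }
}

class SessionCheckerDemo {
    public static void main(String[] args) {
        Instant lastSeen = Instant.parse("2024-01-01T00:00:00Z");

        // A fixed clock lets a test "move" time forward without sleeping.
        Clock thirtyOneMinutesLater =
            Clock.fixed(lastSeen.plus(Duration.ofMinutes(31)), ZoneOffset.UTC);

        SessionChecker checker =
            new SessionChecker(thirtyOneMinutesLater, Duration.ofMinutes(30));
        System.out.println(checker.isExpired(lastSeen)); // prints "true"
    }
}
```

Clock is itself an abstraction with several implementations, so the same change also illustrates the second technique in a small way: the test supplies a simple stand-in for something that is hard to control.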

Don’t change access modifiers to make tests easier. Making private methods and variables public lets tests access code, but it also breaks encapsulation—a poor workaround. Breaking encapsulation increases the surface area of behavior you have to guarantee across the lifetime of the project. We discuss this more in Chapter 11.

As you refactor and break dependencies, add new tests to verify old behavior. Run the test suite frequently as you iterate, including both new and old tests. Consider using automated test tooling to generate tests that capture existing behaviors. See Chapter 6 for more on test writing.
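Tests that verify old behavior record what the code does today, quirks included, rather than what a specification says it should do. The following is a minimal sketch with a hypothetical LegacyPricing class invented for illustration. The expected values are assumed to have been obtained by running the existing code. It uses plain checks rather than a test framework so that it stands alone; in practice you would use whatever framework your team already has.

```java
// Hypothetical legacy code, invented for illustration.
class LegacyPricing {
    // Quirk: integer division truncates toward zero instead of rounding.
    static int discountCents(int priceCents, int percent) {
        return priceCents * percent / 100;
    }
}

class LegacyPricingCharacterization {
    static void check(int expected, int actual) {
        if (expected != actual) {
            throw new AssertionError("expected " + expected + " but got " + actual);
        }
    }

    public static void main(String[] args) {
        // Expected values come from observing the current code, not a spec.
        check(149, LegacyPricing.discountCents(999, 15));  // 149.85 truncates to 149
        check(0, LegacyPricing.discountCents(5, 10));      // 0.5 truncates to 0
        check(-1, LegacyPricing.discountCents(-199, 1));   // -1.99 truncates to -1
        System.out.println("existing behavior pinned");
    }
}
```

If a later refactor changes any of these results, the test fails, and you can decide whether the change was intended.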

Once dependencies are broken and good tests exist, it’s time to make the “real” changes. Add tests that validate the changes, and then refactor code to further improve its design. You can make bold changes knowing you’ve secured the perimeter of the code.

Leave Code Cleaner Than You Found It

Coding lore on the internet often quotes the Boy Scout principle: “Always leave the campground cleaner than you found it.” Like a campground, a codebase is shared, and it’s nice to inherit a clean one. Applying the same philosophy to code—leave code cleaner than you found it—will help your code get better over time. No stop-the-world refactoring project is needed. The cost of refactoring will be amortized across many changes.

As you fix bugs or add features, clean adjacent code. Don’t go out of your way to find dirty code. Be opportunistic. Try to keep the code-cleanup commits separate from your behavior-changing commits. Separating commits makes it easier to revert code changes without losing code-cleanup commits. Smaller commits also make changes easier to review.

Refactoring isn’t the only way to clean code. Some code just stinks. Target smelly code as you go. Code smell is a term for code that isn’t necessarily buggy but uses patterns known to cause problems; it “smells funny.” Consider the following Java code snippet:

if (a < b)
  a += 1;

The snippet is perfectly correct. In Java, a single statement can follow a conditional without needing braces around it. However, the code is “smelly” because it makes it easy to make the following mistake down the line:

if (a < b)
  a += 1;
  a = a * 2;

Unlike Python, Java ignores indentation and relies on braces to group statements. So a will be doubled regardless of the if condition. This mistake would be much harder to make if the optional braces surrounding a += 1; had been used when the original code was written. The lack of braces is a code smell.
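To make the difference concrete, here is a small sketch that wraps both forms in methods. The class and method names are invented for illustration, and we assume the author of the second statement meant it to be conditional.

```java
class BraceDemo {
    // With braces, both statements are visibly and actually conditional.
    static int bumpAndDouble(int a, int b) {
        if (a < b) {
            a += 1;
            a = a * 2;
        }
        return a;
    }

    // The smelly form: only the increment is guarded by the condition.
    // The indentation of the doubling is misleading; Java ignores it.
    static int bumpThenAlwaysDouble(int a, int b) {
        if (a < b)
            a += 1;
            a = a * 2;
        return a;
    }

    public static void main(String[] args) {
        System.out.println(bumpAndDouble(5, 3));        // prints 5
        System.out.println(bumpThenAlwaysDouble(5, 3)); // prints 10
    }
}
```

When a < b holds, the two methods agree, which is part of why the mistake can go unnoticed.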

Many linters and code quality tools will detect this problem, as well as other code smells like really long methods or classes, duplicate code, excessive branching or looping, or having too many parameters. More subtle anti-patterns are harder to identify and correct without tooling and experience.

Make Incremental Changes

Refactoring often takes one of two forms. The first is a giant change-the-world code review that changes dozens of files at once. The second is a muddled pull request that has both refactoring and new features. Both types of changes are hard to review. Combined commits make it difficult to roll back functional changes without affecting refactoring you want to keep. Instead, keep your refactoring changes small. Make separate pull requests for each of the steps in the code change algorithm (see “Use the Legacy Code Change Algorithm” earlier). Use smaller commits if the changes are hard to follow. Finally, get buy-in from your team before you go on a refactoring spree. You’re changing your team’s code; they get to weigh in, too.

Be Pragmatic About Refactoring

It is not always wise to refactor. There are deadlines and competing priorities. Refactoring takes time. Your team might decide to ignore refactoring opportunities to ship new features. Such decisions add to the team’s technical debt, which might be the right call. The cost of the refactor might also exceed its value. Old, deprecated code that’s being replaced doesn’t need to be refactored, nor does code that’s low risk or rarely touched. Be pragmatic about when you refactor.

Use an IDE

Integrated development environments (IDEs) carry a stigma among l33t coders; they see getting “help” from an editor as a weakness and fetishize Vim or Emacs—“an elegant weapon for a more civilized age.” This is nonsense. Take advantage of the tools that are available to you. If your language has a good IDE, use it.

IDEs are particularly helpful when refactoring. They have tools for renaming and moving code, extracting methods and fields, updating method signatures, and doing other common operations. In large codebases, simple code operations are both tedious and error prone. IDEs will automatically go through the code and update it to reflect the new changes. (To forestall the hate mail: we are aware of ways to get Vim and Emacs to do this, too.)

Just don’t get carried away. IDEs make refactoring so easy that a few simple tweaks can create huge code reviews. A human still has to review your automated IDE changes. Automatic refactoring has its limits, too. A reference to a renamed method might not get adjusted if it is invoked through reflection or metaprogramming.
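The reflection case can be sketched as follows. ReportService and its methods are hypothetical names invented for illustration. The method name is stored in a string, so a rename refactoring that follows only compiled references may leave it untouched, and the failure shows up at runtime rather than at compile time.

```java
import java.lang.reflect.Method;

// Hypothetical class, invented for illustration.
class ReportService {
    // Suppose an automated rename later changes this to buildReport().
    public String generate() {
        return "report";
    }
}

class ReflectionRenameDemo {
    // The compiler cannot check methodName against the target class.
    static String invokeByName(Object target, String methodName) throws Exception {
        Method method = target.getClass().getMethod(methodName);
        return (String) method.invoke(target);
    }

    public static void main(String[] args) throws Exception {
        // Works while the string and the method name still match.
        System.out.println(invokeByName(new ReportService(), "generate"));

        // A name that does not exist on the class fails only when it runs.
        try {
            invokeByName(new ReportService(), "buildReport");
        } catch (NoSuchMethodException e) {
            System.out.println("lookup failed at runtime");
        }
    }
}
```

Some IDEs offer an option to search strings and comments during a rename, but the matches are textual and still need human review.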

Use Version Control System Best Practices

Changes should be committed to a version control system (VCS), such as Git. A VCS tracks the history of a codebase: who made each change (commit) and when it was made. A commit message is also attached to each commit.

Commit your changes early and often during development. Frequent commits show how code changes over time, let you undo changes, and act as a remote backup. However, committing frequently tends to produce meaningless messages like “oops” or “fix broken test.” There’s nothing wrong with shorthand commit messages when you’re cranking out code, but they’re worthless to everyone else. Rebase your branch, squash your commits, and write a clear commit message before submitting a change for review.

Your squashed commit messages should follow your team’s conventions. Prefixing commit messages with an issue ID is common: “[MYPROJ-123] Make the backend work with Postgres.” Tying a commit to an issue lets developers find more context for the change and allows for scripting and tooling. Follow Chris Beams’s advice (https://chris.beams.io/posts/git-commit) if there are no established rules:

  • Separate subject from body with a blank line.
  • Limit the subject line to 50 characters.
  • Capitalize the subject line.
  • Do not end the subject line with a period.
  • Use the imperative mood in the subject line.
  • Wrap the body at 72 characters.
  • Use the body to explain what and why versus how.

Chris’s post is worth a read; it describes good hygiene.
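The mechanical rules in the list above are simple enough to check automatically. The following is a minimal sketch of such a check, not an existing tool: the class name, the wording of the problem messages, and the sample commit messages are all invented for illustration. It does not attempt the rules that need human judgment (imperative mood, and explaining what and why), and it would need adjusting for teams that prefix subjects with an issue ID.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical checker, invented for illustration.
class CommitMessageCheck {
    static List<String> problems(String message) {
        List<String> found = new ArrayList<>();
        String[] lines = message.split("\n", -1);
        String subject = lines[0];

        if (subject.length() > 50) {
            found.add("subject is longer than 50 characters");
        }
        if (subject.isEmpty() || !Character.isUpperCase(subject.charAt(0))) {
            found.add("subject is not capitalized");
        }
        if (subject.endsWith(".")) {
            found.add("subject ends with a period");
        }
        if (lines.length > 1 && !lines[1].isEmpty()) {
            found.add("no blank line between subject and body");
        }
        // Body lines start at index 2, after the subject and the blank line.
        for (int i = 2; i < lines.length; i++) {
            if (lines[i].length() > 72) {
                found.add("line " + (i + 1) + " is longer than 72 characters");
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String good =
            "Split authorization out of login service\n"
            + "\n"
            + "Mixing authentication and authorization logic makes the security\n"
            + "features hard to test. Move the access control code into its own\n"
            + "module so each part can be tested in isolation.";

        System.out.println(problems(good).isEmpty());             // prints "true"
        System.out.println(problems("fix broken test.").size());  // prints 2
    }
}
```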

Avoiding Pitfalls

Existing code comes with baggage. Libraries, frameworks, and patterns are already in place. Some standards will bother you. It’s natural to want to work with clean code and a modern tech stack, but the temptation to rewrite code or ignore standards is dangerous. Rewriting code can destabilize a codebase if not done properly, and rewrites come at the expense of new features. Coding standards keep code legible; diverging will make it hard on developers.

In his book The Hard Thing About Hard Things (Harper Business, 2014), Ben Horowitz says:

The primary thing that any technology startup must do is build a product that’s at least ten times better at doing something than the current prevailing way of doing that thing. Two or three times better will not be good enough to get people to switch to the new thing fast enough or in large enough volume to matter.

Ben is talking about startup products, but the same idea applies to existing code. If you want to rewrite code or diverge from standards, your improvement must be an order of magnitude better. Small gains aren’t enough—the cost is too high. Most engineers underestimate the value of convention and overestimate the value of ignoring it.

Be careful about rewrites, breaking with convention, or adding new technology to the stack. Save rewrites for high-value situations. Use boring technology when possible. Don’t ignore convention, even if you disagree with it, and avoid forking code.

Use Boring Technology

Software is a fast-moving field. New tools, languages, and frameworks come out constantly. Compared to what’s online, existing code looks dated. However, successful companies have durable code with older libraries and older patterns for a reason: success takes time, and churning through technologies is a distraction.

The problem with new technology is that it’s less mature. In his presentation “Choose Boring Technology,” Dan McKinley points out, “Failure modes of boring technology are well understood” (http://boringtechnology.club/). All technology is going to break, but old stuff breaks in predictable ways. New things break in surprising ways. Lack of maturity means smaller communities, less stability, less documentation, and less compatibility. New technologies have fewer Stack Overflow answers.

Sometimes new technology will solve your company’s problems, and sometimes it won’t. It takes discipline and experience to discern when to use new technology. The benefit has to exceed the cost. Each decision to use new technology costs an “innovation token,” a concept Dan uses to show that effort spent on new technologies could also be spent on innovative new features. Companies have a limited number of such tokens.

To balance the cost and benefit, spend your tokens on technologies that serve high-value areas (core competencies) of your company, solve a wide range of use cases, and can be adopted by multiple teams. If your company specializes in predictive analytics for mortgages and has a team of PhD data scientists, adopting bleeding-edge machine learning algorithms makes sense; if your company has 10 engineers and is building iOS games, use something off the shelf. New technology has a greater benefit if it makes your company more competitive. If it can be adopted widely, more teams will benefit, and your company will have less software to maintain overall.

Choosing a new programming language for a project has particularly far-reaching consequences. Using a new language pulls an entire technology stack into your company’s ecosystem. New build systems, test frameworks, IDEs, and libraries must all be supported. A language might have major advantages: a particular programming paradigm, ease of experimentation, or eliminating some kinds of errors. A language’s advantages have to be balanced against its trade-offs. If using a new framework or database costs one innovation token, a new language costs three.

The maturity of the ecosystem around a new language is particularly crucial. Is the build and packaging system well thought out? How is IDE support? Are important libraries maintained by experienced developers? Are test frameworks available? Can you pay for support if you need it? Can you hire engineers with relevant skills? How easy is the language to pick up? How does the language perform? Does the language ecosystem integrate with existing tools at the company? Answers to these questions are as important as the features of the language itself. Billion-dollar companies have been built on boring languages. Great software has been written in C, Java, PHP, Ruby, and .NET. Unless the language is dying, its age and lack of buzz are hardly arguments against it.

Don’t Go Rogue

Don’t ignore your company’s (or industry’s) standards because you don’t like them. Writing nonstandard code means that it won’t fit in with the company’s environment. Continuous integration checks, IDE plugins, unit tests, code linting, log aggregation tools, metrics dashboards, and data pipelines are all integrated already. Your custom approach will be costly.

Your preferences might truly be better. Going rogue still isn’t a good idea. In the short term, do what everyone else is doing. Try to understand the reasoning for the standard approach. It’s possible that it is solving a nonobvious problem. If you can’t figure out a good reason, ask around. If you still can’t find an answer, start a conversation with your manager and the team that owns the technology.

There are many dimensions to consider when changing standards: priority, ownership, cost, and implementation details. Convincing a team to kill something that they own is not easy. There will be many opinions. You need to be pragmatic.

As with rewrites, changing something that’s widely adopted is slow. This doesn’t mean it’s not worth doing. Good things will happen to you if you go through the proper channels. You’ll be exposed to other parts of the organization, which is great for networking and promotions. You’ll also get to be an early adopter on the new solution—you’ll get to use the new thing first. By providing input, you’ll get what you want. But don’t get distracted from your daily work, and make sure your manager is aware you’re spending time on these projects.

Don’t Fork Without Committing Upstream

A fork is a complete, independent copy of another source code repository. It has its own trunk, branches, and tags. On a code-sharing platform like GitHub, forks are used before submitting a pull request to the upstream repository. Forking lets people who don’t have write access to the main repository contribute to the project—a normal and healthy practice.

It is less healthy to fork with no intention of contributing changes back. This happens when there are disagreements about the direction of a project, the original project is abandoned, or it’s hard to get changes merged into the main codebase.

Maintaining an internal company fork is particularly pernicious. Developers will tell each other that they’ll contribute the changes back “later.” This rarely happens. Minor tweaks that are not contributed upstream compound over time. Eventually, you’re running an entirely different piece of software. Features and bug fixes become increasingly difficult to merge upstream. The team discovers that it has implicitly signed up to maintain an entire project. Some companies even fork their own open source projects because they don’t contribute internal changes!

Resist the Temptation to Rewrite

Refactoring efforts often escalate into full-blown rewrites. Refactoring existing code is daunting; why not throw away the old system and rewrite everything from scratch? Consider rewrites a last resort. This is hard-won advice from years of experience.

Some rewrites are worth doing, but many are not. Be honest about your desire for a rewrite. Code written in a language or framework that you don’t like is not a good reason. Rewrites should only be undertaken if the benefit exceeds the cost; they are risky, and their cost is high. Engineers always underestimate how long a rewrite will take. Migrations, in particular, are awful. Data needs to be moved. Upstream and downstream systems need to be updated. This can take years—or even decades.

Rewrites aren’t always better, either. In his famous book The Mythical Man-Month (Addison-Wesley Professional, 1995), Fred Brooks coined the phrase “Second System Syndrome,” which describes how simple systems get replaced by complex ones. The first system is limited in scope, since its creators don’t understand the problem domain. The system does its job, but it is awkward and limited. The developers, who now have experience, see clearly where they went wrong. They set out to develop a second system with all the clever ideas they have. The new system is designed for flexibility—everything is configurable and injectable. Sadly, second systems are usually a bloated mess. If you set out to rewrite a system, be very cautious about overextending.

Do’s and Don’ts

Do’s                                                          Don’ts
DO refactor incrementally.                                    DON’T overuse the phrase “technical debt.”
DO keep refactoring commits separate from feature commits.    DON’T make methods or variables public for testing purposes.
DO keep changes small.                                        DON’T be a language snob.
DO leave code cleaner than you found it.                      DON’T ignore your company’s standards and tools.
DO use boring technology.                                     DON’T fork codebases without committing upstream.

Level Up

We make extensive use of Michael C. Feathers’s book Working Effectively with Legacy Code (Pearson, 2004). The book goes into far more detail than we can in a few pages. If you find yourself dealing with large and messy codebases, we recommend Michael’s book. You might also find Jonathan Boccara’s book helpful: The Legacy Code Programmer’s Toolbox (LeanPub, 2019).

Martin Fowler has written a lot about refactoring. For shorter reads, his blog is a great place to find content. If you’re interested in the canonical book on refactoring, he’s written Refactoring: Improving the Design of Existing Code (Addison-Wesley Professional, 1999).

Finally, we must mention The Mythical Man-Month by Fred Brooks (Addison-Wesley Professional, 1995). This is a classic that every software engineer should read. It talks about how software projects run in practice. You’ll be surprised at how much this book applies to your daily experiences on the job.
