© Daniel Heller 2020
D. HellerBuilding a Career in Softwarehttps://doi.org/10.1007/978-1-4842-6147-7_16

16. Professional-Grade Code

Daniel Heller1 
(1)
Denver, CO, USA
 

This chapter evangelizes my personal favorite principles of code quality. It’s neither an introduction to coding nor a comprehensive guide to writing good code but instead a small set of selected subjects I consider relevant to engineers adapting their academic skills to professional software. I’ll note that a couple of these points (like “Don’t Check Conditions that Your Codebase Precludes”) may be controversial; if your team fervently rejects them, you can do your best to persuade, but remember that “You Can’t Change Everything on Your First Day” (Chapter 9) and you should, when in doubt, Match The Codebase.

Master Language Conventions

The first step to writing good code is knowing your language backward and forward. I never start coding in a new language without reading a whole book about it (really) and reading whatever online style guides I can find. Small slipups are inevitable when you’re getting started, but poor adherence to language conventions is easy to avoid, so it comes off as amateurish to experienced engineers. You can ask a colleague what sources they’d recommend for mastering your team’s language(s).

Naming

Most of our job turns out to be reading code as we try to find the perfect way to make changes to existing codebases; that includes reading our own code, which often feels like it was written by someone different and dumber than yourself.

The single best gift you can give future editors of your code, including yourself, is good function, type, and variable names. Comments and READMEs can become stale, and you usually have to jump around to consume them, but a crisp, explicit name often speaks for itself in-line; even if you still need richer documentation, a good name reduces the cognitive load of reading. Good naming has three objectives, in this order (according to me):
  1. 1.

    Explicitness: The name captures the role of the entity.

     
  2. 2.

    Brevity: The name is short and easy to read.

     
  3. 3.

    Consistency: The name follows the conventions of other similar entities in the codebase.

     

These objectives are in perpetual tension—the most explicit names often sacrifice some brevity, and the perfect name in isolation may stand out from other names in a codebase. You’ll have to weigh all three, but my advice is to favor explicitness above all; try to save your readers the trouble of reading every line of a function to understand its meaning in a codebase. Once the name is explicit, make it as short as you can without loss of information. Here are a couple of examples of good and bad names I’ve seen in real life (or perpetrated):

Bad
// From the xnu (Mac OS X) kernel: "open1" is clearly a convenience for
// a programmer who can't think of a name that means something.
// How is it different from open()? Why does it exist?
int
open1(vfs_context_t ctx, struct nameidata *ndp, int uflags,
    struct vnode_attr *vap, fp_allocfn_t fp_zalloc, void *cra,
    int32_t *retval)
Not so bad
// Also from xnu; this function's name clearly indicates its relationship to other functions—it is
// called with the lock already held.
vnode_t cache_lookup_locked(vnode_t dvp, struct componentname *cnp)

Match the Codebase

When writing code in an existing codebase, you should almost always match the coding conventions of that codebase even if you disagree with them; going off on your own will likely leave a confusing mess for future readers even if it does get past code reviewers, which it shouldn’t. Some conventions—adding more comments, writing a better commit message, or naming more carefully—may be easy to integrate in a codebase incrementally but should always pass the smell test of “fitting in.” Braces, spaces (as opposed to tabs), and compound_namingPatterns should never be changed unilaterally. If you think you should change the convention, remember that “You Can’t Change Everything on Your First Day” (Chapter 9); be patient and stay humble when making your case.

Commenting

I’m sure you’ve heard the excellent advice to (a) comment liberally and (b) avoid comments that add no new information (like /* Merge merges */). Let me add a new principle: comments exist to help future readers (including yourself) gain a holistic sense of your code as quickly as possible. Therefore, like any good writer, you have the hard job of imagining the mindset of future readers and what they may not understand when they open YourStrangeDecisions.java. Here are four guidelines:

Motivation: The code is its own description of what it does and can be, if necessary, read in every detail to figure out its behavior. The motivation for your decision lives only in your own mind unless you write it down. and you can be certain that future readers will be curious.
// PageCache is a simple in-memory cache that can populate the
// homepage in the rare event that the recommender engine is down.
// It provides a slightly stale, incomplete, but extremely simple and
// highly available view of the homepage to ensure that we can
// always serve *something*. We have intentionally accepted a larger
// window of potential staleness to reduce the load on the DB.
Design: The reader’s eyes can only see one piece of code at a time, and bootstrapping a grasp of the overall design through that peephole is really tough. Give the reader a hand.
// A refresher job repopulates the cache once per hour; see
// github.com/somecompany/cacherefresher. A "new-user" view of the page is synced
// into each frontend container and periodically reloaded from disk.

Assumptions or invariants : If a function or component has preconditions for correct behavior, even if asserted at runtime, they’re good candidates for commenting; your colleague’s shouldn’t have to run their code and fail to learn your requirements.

Examples
// Until Status() returns success, all other APIs will return errors.
// _lookup must be called with the lock already held
Non-example (too obvious to need commenting):
// Save() only succeeds if the Database is available
Anything else nonobvious a reader might wonder about: “Nonobviousness” is slippery and very much in the eye of the beholder, but to the best of your ability, any nonobvious behaviors of a piece of code should be commented. Examples:
// Close() must be called when done or the connection will be leaked
// fetch() may fall back to a local cache if the DB is unavailable

Commit Messages Are Underrated

I’ve recently seen a trend to extremely brief, unhelpful commit messages like Fix bug or Changes for my feature you've never heard of. That’s a problem; codebases evolve constantly, and in the limit, a codebase is its own documentation of what it does, but a year down the road, it can be hard or impossible to answer the perennial question: Why? Or, to put it differently, what the hey-hoo were those engineers thinking? A commit message is our one chance to capture our goals, design, and trade-offs at the time of our work for eternal posterity, even if the code later changes. Therefore, let’s do better for our future selves and teammates. I suggest the following template:
<One-sentence summary of purpose>
<Blank line>
<Detailed explanation of purpose> <Design> <Tradeoffs, quirks, caveats>
Here’s an example I dug up from the history of Git itself. Note that it includes the initial motivation, caveats, and thoughts for the future—you instantly get a sense of the point in time this commit messages represents in the evolution of the Git version control system.
commit bfe19f876cb20bea606e1a698030c017f31965c1
Author: Linus Torvalds <[email protected]>
Date:   Sat Aug 6 18:01:03 2005 -0700
    [PATCH] Extend "git reset" to take a reset point
    This was triggered by a query by Sam Ravnborg, and extends "git reset" to
    reset the index and the .git/HEAD pointer to an arbitrarily named point.
    For example
            git reset HEAD^
    will just reset the current HEAD to its own parent - leaving the working
    directory untouched, but effectively un-doing the top-most commit. You
    might want to do this if you realize after you committed that you made a
    mistake that you want to fix up: reset your HEAD back to its previous
    state, fix up the working directory and re-do the commit.
    If you want to totally un-do the commit (and reset your working directory
    to that point too), you'd first use "git reset HEAD^" to reset to the
    parent, and then do a "git checkout -f" to reset the working directory
    state to that point in time too.
    Signed-off-by: Linus Torvalds <[email protected]>
    Signed-off-by: Junio C Hamano <[email protected]>

Quick aside: Shouldn’t we hear more discussion of how astonishing it is that Linus started Linux and Git?

Testing

I won’t cover the subject of good testing in detail here; it’s a profound and highly technical subject. I will offer a few simple principles:
  • Code isn’t high quality without tests, whatever the form your team may favor (unit, integration, end to end).

  • If in doubt, start with true unit tests (i.e., tests that examine a single subcomponent in isolation, like a function, package, or class)—there’s a special joy in refactoring a component with confidence because your unit tests cover its interface perfectly.

  • You should measure code coverage, that is, what lines and branches of your program are exercised by your tests.

  • There’s no magic number for code coverage, but I prefer 100% when at all possible—the exceptions are usually libraries that make it hard to inject failure. If every line tested automatically, you can sleep much easier.

Cleverness

I’m opposed to it. Favor explicitness and simplicity over-cleverness whenever possible. If you can’t avoid being clever, document it prodigiously. So, if you see some neat bitwise math that might get your calculation done in one line, or a neat pattern in a Haskell blog post you think you can emulate in Java, think twice or more; if you absolutely can’t contain yourself, you had better add a comment explaining exactly where your life went off the rails.

Duplication vs. Reuse

Code reuse, and pushing ourselves to design carefully for reuse, is perhaps the single most-quoted principle of software design, and rightly so. I want to call out just one caveat to that principle: it can sometimes exist in tension with risk in production systems. Judiciously repeating yourself is underrated; as far as you are in this book, you can probably already tell that I think that. Generalizing a shared codepath can sometimes be higher risk than creating a new, similar function. Good tests reduce that risk. Your first impulse should always be toward reuse, but if it’s difficult, if your confidence in tests is poor, and if the codepath is critical, you might consider duplicating.

Don’t Check Nonsensical Conditions

If a function in your codebase requires a non-null argument and receives all its input from trusted functions, don’t check for null. How to treat NULL, nil, or None depends on context—some types of code, like logging libraries, can make their users’ lives easier by smoothly handling null-like arguments, and code that takes external input (e.g., over the wire) always needs to be ready for anything. However, internal routines that protect against nonsensical input encourage sloppy coding and confuse the reader—we should hold our own codebase to a higher standard. For example, the following cache should not check for nil; the callers should never let a nil get that far.
func (c *EntityCache) SaveUser(u *entities.User) error {
      ...
}

Open Source Conservatism

In 2018, a widely used open source Node.js package was accidentally handed off to a malicious new maintainer who surreptitiously injected code to steal cryptocurrency. The takeaway: taking on an open source dependency is not much different from installing and frequently running a program written by some rando on the Internet. That program may be very useful and worth the risk of malice, but try to think twice before you casually add new dependencies; if it only has one GitHub star, you should pass.

Tests Should Be Fast (Never Sleep)

A slow test is an infuriating obstacle to development velocity. Therefore, your tests should be fast. It’s often tempting to sleep waiting for a condition like a timeout to occur (I’ve been guilty of this in my impetuous youth, but those sleeps can quickly add up to a painfully slow test suite). You can do better by injecting a clock interface everywhere, so you can mock that clock and accelerate time for tests. If you find your test suite taking more than a minute or two of every iteration on a diff, you should consider putting time into optimizing it; can you parallelize? Can you speed up setup?

Unit vs. Integration Testing

Integration tests can be slow; unit tests can be incomplete. A healthy test suite should have both; unit tests let us iterate quickly on changes in a component, shaping our work with frequent testing. Integration tests run longer but give us a higher degree of confidence that a component works with adjacent systems (which are no longer just mocked) before we send it to customers. My experience is that without unit tests, iteration becomes intolerably slow (see “Tests Should Be Fast”), but without integration tests, our confidence is unacceptably low before a system goes to production. Use both! I’d say that if you can absolutely only afford one, I’d prefer the confidence of an integration test, but it’s a judgment call.

Inject Dependencies

Dependency injection is a coding convention of providing components their dependencies at the time they’re created, rather than having a component instantiate its own dependencies. For example:
class ClassWithoutInjection {
    private IDBConnection dbConn;
    public ClassWithoutInjection() {
          this.dbConn = new DBConnection();
    }
}
class ClassWithInjection {
    private IDBConnection dbConn;
    public ClassWithInjection(IDBConnection dbConn) {
          this.dbConn = dbConn;
    }
}

Dependency injection’s aim is to decouple components, making it easy to change the implementation of a dependency. Most importantly, if we inject an interface, we can swap in a mocked implementation, which is a lifesaver for testing, but we can equally swap in a wrapped object or, for example, a different database backend. You’ll find countless DI frameworks on GitHub, and I’m not going to recommend one here. Suffice to say that I always favor the simpler, less magical, lighter-on-reflection ones.

Performance Last or a List of Priorities for Your Code

When I worked on the iOS kernel, I once reviewed a framework change where an engineer had researched my syscall interface and inserted the special trap instructions in our own code. Why?!, I asked, blind with rage—this handwritten assembly, inscrutable to the reader, would become an immensely hard-to-track-down bug if we ever changed our interface. His answer was performance, based on a superstitious theory of branch prediction. He was a smart and motivated engineer, but I felt then and feel now that this was profoundly misguided. I think our default priorities for a codebase should be
  1. 1.

    Correctness: The code does what it’s supposed to do and is easy to verify (i.e., test).

     
  2. 2.

    Maintainability: The code is simple, easy for ourselves and others to change, and is likely to work under reasonable changes in operating conditions.

     
  3. 3.

    Cost of development: The code is fast to write, that is, minimizes engineering costs.

     
  4. 4.

    Performance: The code runs fast.

     

This is a simplification, because different systems have different priorities and sufficiently poor performance will break your system (no result is correct if the user gives up before they see it); you’ll always have to weigh the Big Four priorities in context. However, it’s a decent guideline to start your analysis.

In most systems, subtle, brilliant micro-optimizations buy a tiny grain of performance at a steep cost in maintainability and robustness, while network roundtrips and database calls limit the user-observed latency; we should only pay readability for performance when (a) performance really matters and (b) we have data to suggest that the specific optimization in question will help. Some codebases really do require ultrafast performance, like the inner loops in graphics and machine learning systems. In those rare cases, once you’ve measured and confirmed where the time is going, use every unreadable trick you have to make the code lightning fast.

Code Review

Code review is a standard industry practice where all production code changes are read by one or more other engineers to verify their correctness and quality. The practice is intended to prevent breakages and sustain the quality of codebases over time, and it’s usually considered an important part of an engineer’s duties. It’s a partnership between two engineers; the author and the reviewer iterate together to produce the best change they can, meaning not just the best code quality but the best balance of code quality, testing, simplicity, and speed specific to the exact situation. It usually proceeds in cycles, with the reviewer asking questions and making suggestions, then the author answering those questions, accepting some suggestions, and pushing back on others. You’re going to give and get tons of code reviews; this section will offer tips for how to do both effectively.

Receiving Code Reviews

I don’t know of any science about the effectiveness of code review; you’ll hear plenty about how it’s cheaper to catch bugs in development than in production (which is true), but it also costs a lot of time; I don’t think anyone truly knows whether it has positive returns. Nevertheless, as a neurotic, reliability-focused engineer, I love it. Every code review I receive is a gift from a colleague, and every bug they catch turns the grim embarrassment of an outage into the tiny embarrassment of an update to a diff.

You should strive for the same attitude; remember that when a reviewer finds a bug, it might slow you down today but not half as much as writing a postmortem would, and when they help you fix your style, they save you the contempt of future readers.

Therefore, you should seek out the toughest code reviewers you can, take their feedback seriously, and thank them for their help. When you receive feedback, every single comment should be addressed, and your default position should be to accept every comment unless you have a good reason not to. You absolutely can push back on feedback with good reason; you might feel that a suggestion is the wrong balance of simplicity vs. comprehensiveness, that it should best be handled in a separate change, or that it’s simply wrong, and you can say those things. If you do, it should be gently and gratefully, with consideration for the reviewer’s generosity in helping you. You should never ignore any comment; even if you choose not to act on it, you should acknowledge it. Because many questions may be subtle or debatable, you’re welcome and encouraged to ask follow-up questions on feedback—does the reviewer feel strongly about that suggestion? Do they think it’s okay to defer some changes for later? Why do they think that using this pattern is better than the other? You should strive to deeply understand their feedback and learn, not just rotely do what they say.

Reviewing Code

Giving a code review is partnering with a colleague to help them produce the best code they can for the situation. They send you the diff; you exhaustively scrutinize the code as you would your own; you ask questions and give suggestions; your partner answers your questions, tweaks their code, pushes back on some suggestions, and sends it back to you; you rinse and repeat until you agree the code is ready for prime time.

A great code review flows from domain knowledge and strong opinions about code quality; nothing I can tell you can substitute for mastery of a system and programming language. However, masters or not, review we must, and below are the questions I ask myself about any diff:
  1. 1.

    Is it correct? This is obviously the primary question. To evaluate correctness, you must first understand exactly what a change is intended to do; if you’re not sure, you’re not ready to review (and the commit message probably leaves something to be desired!). Second, you must understand the system—do the functions return what this code expects them to? Is that really the right port? Those are things you should check, not assume. Finally, it requires you to read in detail, checking edge cases, error handling, threading, language usage, etc., etc., etc.

     
  2. 2.

    Is it clear? For me, this is foremost about naming, commenting, and simplicity, in that order, but can also bear on function length, commit messages, file layout, and so on. I take a heavy hand when commenting about clarity, and you should too.

     
  3. 3.

    Does it match the code around it? This is related but not identical to clarity; you should enforce consistency. Code style is the first consideration here, but quality trade-offs can also be matched of flouted. For example, it might (or might not) come off as silly to demand polish in an ad hoc test script; looking at the existing conventions is the only way to fine-tune your feedback.

     
  4. 4.

    Does it reinvent anything? Duplication isn’t always bad, but you should prima facie expect reuse, and duplication should be explained. Is there already a function they could use for some of the new code? Could they reuse an existing tool?

     
  5. 5.

    Is it well-tested? This more or less speaks for itself. Again, you need to weigh this in light of the existing conventions—demanding 100% coverage may be out of place in some teams/systems. However, you should err toward asking for full coverage.

     
  6. 6.

    Is it idiomatic? This question can only be answered by mastering your team’s languages yourself, which you must do.

     
  7. 7.

    Is the diff reasonably sized? Large diffs are hard to review and hard to get right; though not always avoidable, you should discourage them and look for smaller atoms that could be delivered separately.

     

The biggest question you’ll likely have to ask yourself as you start a review is: How deep should I go? If you’re a master of the code they’re editing, you’re in luck, but usually, you’ll need to make some decisions about what to trust and what to verify (like, “what exactly are the semantics of that value of the state enum?”). My advice is to go deep; answering questions for yourself is a great way to learn codebase, and the extra scrutiny will (often) be appreciated. As always, judge in context, and ask for advice to find your way to the right balance.

Beyond the Code

There are a few key considerations beyond how you read the actual code.

First is the tone of feedback. Code review feedback should always be supportive; we err toward trusting our colleagues’ diligence. You should never try to score points or let contempt creep into your feedback; you’re helping a friend. Softeners like “I think” and “what do you think?” may seem superfluous, but they can make all the difference; “This is bad style; it should be snake_case” will put people on edge compared to “I don’t think this is idiomatic Python; snake_case is preferred according to the style guide.” If that seems contrived, trust me that you’ll hear both tones all the time.

Second is latency. You’ll discover quickly that long waits for code reviews are incredibly frustrating, especially when collaborating across time zones. Save your colleague that frustration by treating code reviews as top-priority tasks, preempting any nonurgent work (unless you’re so deep in something that switching is very costly).

Third, I’ll advise against a common antipattern: the ping-pong review pair. Two people shouldn’t constantly review each other’s code without anyone else getting a look, because a pair system like that can easily devolve into rubber-stamping at low quality. You should try to get some diversity of perspective in the code review stream for every area.1

Finally, there’s the question of when to hold the line on quality and when to bend. Remember that this is contextual, not absolute; some code really doesn’t need to be great. If your team doesn’t philosophically value code quality the way you do, remember that you can’t change everything on the first day; you should try to level the team up gradually and respectfully (never throwing a fit). When it comes to incorrectness, I’ll usually draw a hard line. For a codebase I own or wrote from scratch, I also feel empowered to demand that code meets my standards. For common code, especially when you’re junior to your colleagues, you can aim for epsilon above the median—inch in the right direction, but accept that conventions take time to change.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.224.73.125