Chapter 1. Introduction to Robust Python

This book is all about making your Python better. To help you manage your codebase, no matter how large it is. To provide a toolbox of tips, tricks, and strategies for building maintainable code. This book will guide you toward fewer bugs and happier developers. You’ll take a hard look at how you write code and learn the implications of your decisions. When discussing how code is written, I am reminded of these wise words from C.A.R. Hoare:

There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult.1

This book is about developing systems the first way. It will be more difficult, yes, but have no fear. I will be your guide on your journey to leveling up your Python game such that, as C.A.R. Hoare says above, there are obviously no deficiencies in your code. Ultimately, this is a book all about writing robust Python.

In this chapter I’m going to cover what robustness means and why you should care about it. I’ll go through how your communication method implies certain benefits and drawbacks, and how best to represent your intentions. The Zen of Python2 states, “There should be one-- and preferably only one --obvious way to do it.” You’ll learn how to evaluate whether your code is written the obvious way, and what you can do to fix it when it isn’t. First, I need to talk basics. What is robustness in the first place?

Robustness

What Does “Robust” Mean?

Every book needs at least one dictionary definition, so I’ll get this out of the way nice and early. Merriam-Webster offers many definitions for robustness:3

  1. having or exhibiting strength or vigorous health

  2. having or showing vigor, strength, or firmness

  3. strongly formed or constructed

  4. capable of performing without failure under a wide range of conditions

These are fantastic descriptions of what we are aiming for. We want a healthy system, one that stays bug-free for years. We want our software to exhibit strength; it should be obvious that this code will stand the test of time. We want a strongly constructed system, one that is built upon solid foundations. Crucially, we want a system that is capable of performing without failure; we don’t want the system to be fragile or brittle as conditions change.

It is common to think of software like a skyscraper, some grand structure that stands as a bulwark against all change and a paragon of immortality. The truth is, unfortunately, messier. Software systems constantly evolve. Bugs are fixed, user interfaces get tweaked, features are added, removed, and then re-added. Frameworks shift, components go out of date, security bugs arise. Software changes. Maintaining it is more akin to city planning and managing sprawl than to constructing a static building. With ever-changing codebases, how can you make your code robust? How can you build a strong foundation that is resilient to bugs?

The truth is, you have to accept the change. Your code will be split apart, stitched together and reworked. New use cases will alter huge swaths of code. And that’s okay. Embrace it. Understand that it’s not enough that your code can easily be changed; it might be best for it to be deleted and rewritten as it goes out of date. That doesn’t diminish its value; it will still have a long life in primetime. Your job is to make it easy to rewrite parts of the system. Once you start to accept the ephemeral nature of your code, you start to realize that it’s not enough to write bug-free code for the present; you need to enable the codebase’s future owners to be able to change your code with confidence. That is what this book is about.

You are going to learn to build strong systems. This strength doesn’t come from rigidity, like a bar of iron. It instead comes from flexibility. Your code needs to be strong like a tall willow tree, swaying in the wind, flexing, but not breaking. Your software will need to handle situations you would never dream of. Your codebase needs to be able to adapt to new circumstances, and it won’t always be you maintaining it. Those future maintainers need to know they are working in a healthy codebase. Your codebase needs to communicate its strength. You must write Python code in a way that reduces failure, even as future maintainers tear it apart and reconstruct it.

Writing robust code means deliberately thinking about the future. You want future maintainers to look at your code and understand your intentions easily, not curse your name during late-night debugging sessions. You must convey your thoughts, your reasoning, and your cautions. Future developers will need to bend your code into new shapes, and to do so without worrying that each change might knock over a teetering house of cards.

Put simply, you don’t want your systems to fail, especially when the unexpected happens. Testing and quality assurance are huge parts of this, but neither of those bakes quality in completely. They are better suited to illuminating gaps in expectations and offering a safety net. Instead, you must make your software stand the test of time. In order to do that, you must write clean and maintainable code.

Clean code expresses its intent clearly and concisely, in that order. When you look at a line of code and say to yourself “ah, that makes complete sense”, that’s an indicator of clean code. The more you have to step through a debugger, the more you have to look at a lot of other code to figure out what’s happening, the more you have to stop and stare at the code, the less clean it will be. Clean code does not favor clever tricks if it makes the code unreadable to other developers. Just like C.A.R. Hoare said earlier, you do not want to make your code so obtuse that it will be difficult to visually inspect it to understand it.

Maintainable code is code that, well, can be easily maintained. Maintenance starts immediately after the first commit and continues until there is not a single developer looking at the project anymore. Developers will be fixing bugs, adding features, reading code, extracting code for use in other libraries, etc. Maintainable code makes these tasks frictionless. Software lives for years, if not decades, so you need to focus on maintainability today.

You don’t want to be the reason systems fail, whether you are actively working on them or not. You need to be proactive in making your system stand the test of time. You need a testing strategy to be your safety net, but you also need to be able to avoid falling in the first place. So with all that in mind, I offer my definition of robustness in terms of software:

Robust software is resilient and error-free, in spite of constant change

Why Does Robustness Matter?

A lot of energy goes into making software do what it is supposed to. Development milestones are not easily predicted. It doesn’t help that you build something brand-new just about every time. Human factors such as UX, accessibility and documentation only increase the complexity. Now add in testing to ensure that you’ve covered a slice of known and unknown behaviors, and you are looking at lengthy cycles.

The purpose of software is to provide value. It is in stakeholders’ interests to deliver that full value as early as possible. Given the uncertainty around some development schedules, there is often extra pressure to meet expectations. We’ve all been on the wrong end of an unrealistic schedule or deadline. Unfortunately, many of the tools that make software incredibly robust only add to our development cycle.

This does not mean that robust code is unimportant or “not worth it.” It’s true that there is an inherent tension between immediate delivery of value and making code robust. If your software is “good enough,” why add even more complexity? To answer that, consider how often that piece of software will be iterated upon. Delivering software value is typically not a static exercise; it’s rare that a system provides value and is never modified again. Software is ever-evolving by its very nature. The codebase needs to be prepared to deliver value frequently, and for long periods of time. This is where robust software engineering practices come into play. If you can’t deliver features quickly, painlessly, and without compromising quality, you need to re-evaluate techniques to make your code more maintainable.

If you deliver your system late, or broken, there are real costs incurred. Think through your codebase. Ask yourself what happens if your code breaks a year from now because someone wasn’t able to understand it. How much value do you lose? Your value might be measured in money, time, or even lives. Ask yourself what happens if the value isn’t delivered on time. What are the repercussions? If the answers to these questions are scary, take heart: the work you’re doing is valuable. But it also underscores why it’s so important to eliminate future errors.

You need to consider future developers. Multiple developers work on the same codebase simultaneously. Many software projects will outlast most of those developers. You need to find a way to communicate to present and future developers, without having the benefit of being there in person to explain. Future developers will be building off of your decisions. Every false trail, every rabbit hole, and every yak-shaving4 adventure will slow them down, which impedes value. You need empathy for those who come after you. You need to step into their shoes. This book is your gateway to thinking about your collaborators and maintainers. You need to write code that lasts. The first step to making code that lasts is being able to communicate through your code. You need to make sure future developers understand your intent.

What’s Your Intent?

Why should you strive to write clean code? Why should you care so much about robustness? The heart of these answers lies in communication. You’re not delivering static systems; software evolves and grows over time. Maintainers change over time. Your goal, when writing code, is to deliver value, but it’s also to write your code in such a way that other developers can deliver value just as quickly. In order to do that, you need to be able to communicate reasoning and intent without ever meeting your future maintainers.

Let’s take a look at a code block found in a hypothetical legacy system. I want you to estimate how long it takes for you to understand what this code is doing. It’s okay if you’re not familiar with all the concepts here, or if you feel like this code is convoluted (it intentionally is!).

# Take a meal recipe and change the number of servings
# by adjusting each ingredient
# A recipe's first element is the number of servings, and the remainder
# of elements is (name, amount, unit), such as ("flour", 1.5, "cup")
def adjust_recipe(recipe, servings):
    new_recipe = [servings]
    old_servings = recipe[0]
    factor = servings / old_servings
    recipe.pop(0)
    while recipe:
        ingredient, amount, unit = recipe.pop(0)
        # please only use numbers that will be easily measurable
        new_recipe.append((ingredient, amount * factor, unit))
    return new_recipe

This function takes a recipe and adjusts every ingredient to handle a new number of servings. However, this code raises many questions.

  • What is the pop for?

  • What does recipe[0] signify? Why is that the old servings?

  • Why do I need a comment for numbers that will be easily measurable?

This is a bit of questionable Python, for sure. I won’t blame you if you feel the need to rewrite it. It would look much nicer written as something like this:

def adjust_recipe(recipe, servings):
    old_servings = recipe.pop(0)
    factor = servings / old_servings
    return {"servings": servings} | {
        ingredient: (amount * factor, unit)
        for ingredient, amount, unit in recipe
    }

Those who favor clean code probably prefer the second version (I certainly do). No raw loops. Variables do not mutate. I’m returning a dictionary instead of a list of tuples. All these changes can be seen as positive, depending on the circumstances. But I may have just introduced three subtle bugs.

  1. In the first example, I was clearing the original recipe out. Even if it’s just one area of calling code that is relying on this behavior, I broke assumptions.

  2. By returning a dictionary, I have removed the ability to have duplicate ingredients in a list. This might have an effect on recipes that have multiple parts (such as a main dish and a sauce) that both use the same ingredient.

  3. If any of the ingredients are named “servings”, I’ve just introduced a naming collision.

Whether these are bugs or not depends on two interrelated things: the author’s intent and the calling code. The author intended to solve a problem, but I am unsure of why they wrote the code the way they did. Why are they popping elements? Why is servings the first element of the list? Why is a list used? Presumably, the author knew why, and communicated it locally to their peers. Their peers wrote calling code based on those assumptions, but as time wore on, that intent became lost. Without communication to the future, I am left with two options for maintaining this code:

  1. Look at all calling code and confirm that this behavior is not relied upon before implementing. Good luck if this is a public API for a library with external callers. I spend a lot of time doing this, which frustrates me.

  2. Make the change and wait to see what the fallout is (customer complaints, broken tests, etc.). If I’m lucky, nothing bad will happen. If I’m not, I spend a lot of time, which frustrates me.

Neither option feels good in a maintenance setting (especially if I have to modify this code). I don’t want to waste time; I want to deal with my current task quickly and move on to the next one. It gets worse if I consider how to call this code. Think about how you interact with previously unseen code. You might see other examples of calling code, copy them to fit your use case, and never realize that you needed to pass the number of servings as the first element of your list.
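To make that concrete, here is a hypothetical call site for the original adjust_recipe, pieced together from copied examples. The list layout follows the comments in the first code block; nothing in the function signature enforces it.

# A hypothetical caller assembled by copying other call sites.
# The rule that the first element is the serving count lives only in a
# comment, so getting it wrong fails at runtime or silently misbehaves.
recipe = [
    2,                        # number of servings, by unstated convention
    ("flour", 1.5, "cup"),
    ("sugar", 0.75, "cup"),
]
adjusted = adjust_recipe(recipe, 4)  # note: this also empties the original list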

These are the sort of decisions that will make you scratch your head. We’ve all seen them in larger codebases. They aren’t written maliciously; they grow organically over time with the best of intentions. Functions start simple, but as use cases grow and multiple developers contribute, that code tends to morph and obscure the original intent. This is a sure sign that maintainability is suffering. You need to express intent in your code up front.

So what if the original author made use of better naming patterns and better type usage? What would that code look like?

from fractions import Fraction

# Take a meal recipe and change the number of servings
# recipe should be a Recipe class
def adjust_recipe(recipe, servings):
    new_ingredients = list(recipe.ingredients)
    recipe.clear_ingredients()

    for ingredient in new_ingredients:
        ingredient.adjust_proportion(Fraction(servings, recipe.servings))
    return Recipe(servings, new_ingredients)

This looks much better, and expresses original intent clearly. The original developer encoded their ideas directly into the code. From this snippet, you know the following is true:

  • I am using a Recipe class. This allows me to abstract away certain operations. Presumably, inside the class itself, there is an invariant that allows for duplicate ingredients. (I’ll talk more about classes and invariants in Chapter 5.) This provides a common vocabulary that makes the function’s behavior more explicit.

  • Servings are now an explicit part of the Recipe class, rather than needing to be the first element of the list, which was handled as a special case. This greatly simplifies calling code and prevents inadvertent collisions.

  • It is very apparent that I want to clear out ingredients on the old recipe. No ambiguous reason for why I needed to do a .pop(0).

  • Ingredients are a separate class and handle fractions rather than an explicit float. It’s clearer to all involved that I am dealing with fractional units, and I can easily do things such as call limit_denominator() when people want to restrict measuring units (instead of relying on a comment).

I’ve replaced fields with types, such as a recipe type and an ingredient type. I’ve also defined operations (clear_ingredients, adjust_proportion) to communicate my intent. By making these changes, I’ve made the code’s behavior crystal clear to future readers. They no longer have to come talk to me to understand the code. Instead, they comprehend what I’m doing without ever talking to me. This is asynchronous communication at its finest.
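For illustration only, here is a minimal sketch of what such classes might look like, assuming dataclasses. The field names and method bodies here are my assumptions for the sketch, not a definitive implementation; classes and their invariants are covered properly in Chapter 5.

from dataclasses import dataclass, field
from fractions import Fraction
from typing import List

@dataclass
class Ingredient:
    name: str
    amount: Fraction
    units: str

    def adjust_proportion(self, factor: Fraction) -> None:
        # Scale the amount, keeping it an exact fraction.
        self.amount *= factor

@dataclass
class Recipe:
    servings: int
    ingredients: List[Ingredient] = field(default_factory=list)

    def clear_ingredients(self) -> None:
        self.ingredients.clear()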

Asynchronous Communication

It’s weird talking about asynchronous communication in a Python book without mentioning async and await. But I’m afraid I have to talk about asynchronous communication in a much more complex place: the real world.

Asynchronous communication means that producing information and consuming that information are independent of each other. There is a time gap between the production and consumption. It might be a few hours, as is the case of collaborators in different time zones. Or it might be years, as future maintainers try to do a deep dive into the inner workings of code. You can’t predict when somebody will need to understand your logic. You might not even be working on that codebase (or for that company) by the time they consume the information you produced.

Contrast that with synchronous communication. Synchronous communication is when people talk face-to-face (in-person or otherwise) and share knowledge. This form of direct communication is one of the best ways to express your thoughts, but unfortunately, it doesn’t scale, and you won’t always be around to answer questions.

In order to evaluate how appropriate each method of communication is when trying to understand intentions, I’ll look at two axes: proximity and cost.

Proximity is how close in time the communicators need to be in order for that communication to be fruitful. Some methods of communication excel with real-time transfer of information. Other methods of communication excel at communicating years later.

Cost is the measure of effort to communicate. You must weigh the time and money expended to communicate against the value provided. Your future consumers then have to weigh the cost of consuming the information against the value they are trying to deliver. Writing code and not providing any other communication channels is your baseline; you have to do this to produce value. To evaluate an additional communication channel’s cost, here is what I factor in:

  • Discoverability: How easy was it to find this information outside of a normal workflow? How ephemeral is the knowledge? Is it easy to search for information?

  • Maintenance Cost: How accurate is the information? How often does it need to be updated? What goes wrong if this information is out of date?

  • Production Cost: How much time and money went into producing the communication?

In Figure 1-1, I plotted some common communication methods’ cost and proximity required.

Figure 1-1. Plotting Cost and Proximity of Communication Methods

There are four quadrants that make up the cost/proximity graph.

Low Cost, High Proximity Required

These are cheap to produce and consume, but are not scalable across time. Direct communication and instant messaging are great examples of these methods. Treat these as snapshots of information in time; they are only valuable when the user is actively listening. Don’t rely on these methods to communicate to the future.

High Cost, High Proximity Required

These are costly events, and they often happen only once (such as meetings or conferences). A lot of value should be delivered through these events at the time of communication, because they do not provide much value to the future. How many times have you been to a meeting that felt like a waste of time? That wasted time is a direct loss of value. Talks incur a multiplied cost for each attendee (time spent, hosting space, logistics, etc.). Code reviews are rarely looked at once they are done.

High Cost, Low Proximity Required

These are costly, but that cost can be paid back over time in value delivered, due to the low proximity needed. Emails and agile boards contain a wealth of information, but are not discoverable by others. These are great for bigger concepts that don’t need frequent updates. It becomes a nightmare to try and sift through all the noise just to find the nugget of information you are looking for. Video recordings and design documentation are great for understanding snapshots in time, but are costly to keep updated. Don’t rely on these communication methods to understand day-to-day decisions.

Low Cost, Low Proximity Required

These are cheap to create, and are easily consumable. Code comments, version control history, and project READMEs all fall into this category, since they are adjacent to the source code we write. Users can view this communication years after it was produced. Anything that is in a developer’s day-to-day workflow will be easily discoverable. These communication methods are a natural fit for the first place someone will look after the source code. However, your code is one of your best documentation tools, as it is the living record and single source of truth for your system.

Discussion Topic

This plot was created based on generalized use cases. Think about the communication paths you and your organization use. Where would you plot them on the graph? How easy is it to consume accurate information? How costly is it to produce information? Your answers to these questions may result in a slightly different graph, but the single source of truth will be in the executable software you deliver.

Low cost, low proximity communication methods are the best tool for communicating to the future. You should strive to minimize the cost of production and of consumption of communication. You have to write software to deliver value anyway, so the lowest cost option is making your code your primary communication tool. Your codebase becomes the best possible option for expressing your decisions, opinions, and workarounds clearly.

However, for this assertion to hold true, the code has to be cheap to consume as well. Your intent has to come across clearly in your code. Your goal is to minimize the time needed for a reader of your code to understand it. Ideally, a reader does not need to read your implementation, but just your function signature. Through the use of good types, comments and variable names, it should be crystal clear what your code does.

Examples of Intent In Python

Now that I’ve talked through what intent is and why it matters, let’s look at examples through a Python lens. How can you make sure that you are correctly expressing your intentions? Let’s consider some common mistakes you might come across when reading Python code.

Collections

When you pick a collection, you are communicating specific information. You must pick the right collection for the task at hand. Otherwise, maintainers will infer the wrong intention from your code.

Consider this code that takes a list of cookbooks and provides a count of how many times an author shows up:

def create_author_count(cookbooks: List[Cookbook]):
    counter = {}
    for cookbook in cookbooks:
        if cookbook.author not in counter:
            counter[cookbook.author] = 0
        counter[cookbook.author] += 1
    return counter

What does my use of collections tell you? Why am I not passing a dictionary or a set? Why am I not returning a list? Based on my current usage of collections, here’s what you can assert:

  • I pass in a list of cookbooks. There can be duplicate cookbooks in this list (I might be counting a shelf of cookbooks in a store with multiple copies).

  • I am returning a dictionary. Users can look up a specific author, or iterate over the entire dictionary. I do not have to worry about duplicate authors in the returned collection.

What if I wanted no duplicates in the list? A list communicates the wrong intention. Instead, I should have chosen a set to communicate that this code absolutely will not handle duplicates.
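As a small, hedged sketch of that change, moving the “no duplicates” rule into the signature might look like the following (a stub class stands in for the chapter’s hypothetical Cookbook):

from typing import Set

class Cookbook:  # stub standing in for the chapter's hypothetical class
    def __init__(self, author: str):
        self.author = author

# Accepting a set instead of a list states, in the signature itself,
# that this code will not handle duplicate cookbooks.
def create_author_count(cookbooks: Set[Cookbook]):
    counter = {}
    for cookbook in cookbooks:
        counter[cookbook.author] = counter.get(cookbook.author, 0) + 1
    return counter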

Choosing a collection tells readers about your specific intentions. Here’s a list of common collection types, and the intentions they convey:

List

This is a collection to be iterated over. It is mutable: able to be changed at any time. Very rarely do you expect to be retrieving specific elements from the middle of the list (using a static list index). There may be duplicate elements. The cookbooks on a shelf might be stored in a list.

String

An immutable collection of characters. The name of a cookbook would be a string.

Generators

A collection to be iterated over, and never indexed into. Each element access is performed lazily, so it may take time and/or resources through each loop iteration. Generators are great for computationally expensive or infinite collections. An online database of recipes might be returned as a generator; you don’t want to fetch all the recipes in the world when the user is only going to look at the first ten results of a search (see the sketch after this list).

Tuple

Tuples are immutable collections. You do not expect them to change, so you are more likely to extract specific elements from the middle of the tuple (either through indices or unpacking). A tuple is very rarely iterated over. The information about a specific cookbook might be represented as a tuple, such as (cookbook_name, author, pagecount).

Set

An iterable collection that contains no duplicates. You cannot rely on ordering of elements. The ingredients in a cookbook might be stored as a set.

Dictionary

A mapping from keys to values. Keys are unique across the dictionary. Dictionaries are typically iterated over, or indexed into using dynamic keys. A cookbook’s index is a great example of a key-to-value mapping (from topic to page number).
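To make the generator entry above concrete, here is a minimal, self-contained sketch. All of the names are hypothetical, and the recipe “database” is simulated in memory; imagine the results streaming from a remote service instead.

import itertools

# Stand-in for a remote recipe database; each item is produced lazily.
def all_recipes():
    for number in range(1_000_000):
        yield f"recipe #{number}"

def search_recipes(query: str):
    # Lazily filter; no recipe is fetched until a caller asks for it.
    for recipe in all_recipes():
        if query in recipe:
            yield recipe

# The caller pays only for the first ten results, not the whole database.
first_ten = list(itertools.islice(search_recipes("recipe"), 10))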

Do not use the wrong collection for your purposes. Too many times have I come across a list that should not have had duplicates, or a dictionary that wasn’t actually being used to map keys to values. Every time there is a disconnect between what you intend and what is in the code, you create a maintenance burden. Maintainers must pause, work out what you really meant, and then work around their faulty assumptions.

These are basic collections, but there are more ways to express intent. Here are some special collection types that are even more expressive in communicating to the future:

frozenset

A set that is immutable.

OrderedDict

A dictionary that preserves the order of elements based on insertion time.

defaultdict

A dictionary that provides a default value if the key is missing. For example, I could rewrite my earlier example as follows:

from collections import defaultdict
def create_author_count(cookbooks: List[Cookbook]):
    counter = defaultdict(lambda: 0)
    for cookbook in cookbooks:
        counter[cookbook.author] += 1
    return counter

This introduces a new behavior for end users: if they query the dictionary for a value that doesn’t exist, they will receive a 0. This might be beneficial in some use cases, but if it is not, you can just return dict(counter) instead.
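To make that behavior difference concrete, here is a small standalone demonstration (using a bare defaultdict rather than the Cookbook-based function above):

from collections import defaultdict

counter = defaultdict(lambda: 0)
counter["julia child"] += 2

print(counter["unknown author"])  # 0 - no KeyError is raised...
print(dict(counter))              # ...but the lookup also inserted the key:
                                  # {'julia child': 2, 'unknown author': 0}

plain = dict(counter)             # a plain dict restores the usual KeyError
                                  # behavior for missing keys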
Counter

A special type of dictionary used for counting how many times an element appears. This greatly simplifies the code above to the following:

from collections import Counter
def create_author_count(cookbooks: List[Cookbook]):
    return Counter(book.author for book in cookbooks)
Note

Built-in dictionaries also preserve insertion order: as an implementation detail in CPython 3.6, and as a language guarantee from Python 3.7 onward.

Take a minute to reflect on that last example. Notice how using a Counter gives us much more concise code without sacrificing readability. If your readers are familiar with Counter, the meaning of this function (and how the implementation works) is immediately apparent. This is a great example of communicating intent to the future through better selection of collection types. I’ll continue to explore collections further in Chapter 5.
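As a quick, hypothetical usage note, a Counter also answers common follow-up questions with no extra code:

from collections import Counter

author_counts = Counter(["Julia Child", "Julia Child", "James Beard"])

print(author_counts.most_common(1))     # [('Julia Child', 2)]
print(author_counts["Unknown Author"])  # 0 - missing keys count as zero,
                                        # and (unlike defaultdict) the lookup
                                        # does not insert the key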

There are plenty more types to explore, such as array, bytes, ranges and more. Whenever you come across a new collection type, built-in or otherwise, ask yourself how it differs from other collections and what it conveys to future readers.

Iteration

Iteration is another example where the abstraction you choose dictates the intent you convey.

How many times have you seen code like this?

text = "This is some generic text"
index = 0
while index < len(text):
    print(text[index])
    index += 1

This simple code prints each character on a separate line. This is perfectly fine for a first pass at Python for this problem, but the solution quickly evolves into the more Pythonic:

for character in text:
    print(character)

or even more simply:

print("\n".join(text))

Take a moment and reflect on why these last two options are preferable. In the join() case, it is because I am using a named abstraction of a loop (which again, communicates intent clearly). But even the for loop is clearer than the while loop. It’s because the for loop is the more appropriate choice for my use case. Just like collection types, the looping construct you select explicitly communicates different concepts. Here’s a list of looping constructs and what they convey:

For-loops

For loops are used for iterating over each element in a collection or range and performing an action or side effect.

While-loops

While loops are used for iterating until a certain condition occurs.

Comprehensions

Comprehensions are used for transforming one collection into another (normally without side effects, especially if the comprehension is lazy).

Recursion

Recursion is used when the sub-structure of a collection is identical to the structure of a collection (for example, each child of a tree is also a tree).
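Here is a small sketch of the first three constructs, using hypothetical data, to show how each choice reads:

pagecounts = [250, 480, 320]

# Comprehension: transform one collection into another, with no side effects.
doubled = [count * 2 for count in pagecounts]

# For loop: perform a side effect (printing) for each element.
for count in pagecounts:
    print(count)

# While loop: iterate until a condition occurs, tracking state manually.
total, index = 0, 0
while total < 700:
    total += pagecounts[index]
    index += 1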

You want each line of your codebase to deliver value. Furthermore, you want each line to clearly communicate what that value is to future developers. This drives a need to minimize any amount of boilerplate, scaffolding, and superfluous code. In the example above, I am iterating over each element and performing a side effect (printing an element), which makes the for loop an ideal looping construct. I am not wasting code. In contrast, the while loop requires me to explicitly track looping until a certain condition occurs. In other words, I need to track a specific condition and mutate a variable on every iteration. This distracts from the value the loop provides and adds unwanted cognitive burden.

Law of Least Surprise

Distractions from intent are bad, but there’s a class of communication that is even worse: when code actively surprises your future collaborators. You want to adhere to the Law of Least Surprise. When someone reads through the codebase, they should almost never be surprised at behavior or implementation (and when they are surprised, there should be a great comment near the code to explain why it is that way). This is why communicating intent is paramount. Clear and clean code lowers chances for miscommunication.

Note

The Law of Least Surprise, also known as the Law of Least Astonishment, states that a program should always respond to the user in the way that astonishes them the least.5 Surprising behavior leads to confusion. Confusion leads to misplaced assumptions. Misplaced assumptions lead to bugs. And that is how you get unreliable software.

Bear in mind, you can write completely correct code and still surprise someone in the future. Early in my career, I was chasing a nasty bug that crashed a program due to corrupted memory. Putting the code under a debugger or adding too many print statements affected the timing such that the bug would not manifest (a true “heisenbug”6). There were literally thousands of lines of code related to this bug.

So I had to do a manual bisect: splitting the code in half, seeing which half actually had the crash by removing the other half, and then doing it all over again on the offending half. After two weeks of tearing my hair out, I finally decided to inspect an innocuous-sounding function called getEvent. It turns out that this function was actually setting an event with invalid data. Needless to say, I was very surprised. The function was completely correct in what it was doing, but because I missed the intent of the code, I overlooked the bug for at least three days. Surprising your readers costs them time.

A lot of this surprise ends up coming from complexity. There are two types of complexity: necessary complexity and accidental complexity. Necessary complexity is the complexity inherent in your domain. Deep learning models are necessarily complex; they are not something you can browse through and understand in a few minutes. Optimizing object-relational mapping is necessarily complex; there is a large variety of possible user inputs that must be accounted for. You won’t be able to remove necessary complexity, so your best bet is to make sure it doesn’t sprawl across your codebase.

In contrast, accidental complexity is the complexity that produces superfluous, wasteful or confusing statements in code. It’s what happens when a system evolves over time and developers are jamming features in without re-evaluating old code to see if their original assertions hold true. I once worked on a project where adding a single command line option (and associated means of programmatically setting it) touched no fewer than 10 files. Why would adding one simple value ever need to require changes all over the codebase?

You know you have accidental complexity if you’ve ever experienced the following:

  • Things that sound simple (adding users, changing a UI control, etc.) are nontrivial to implement.

  • Difficulty onboarding new developers into understanding your codebase. New developers on a project are your best indicators of how maintainable your code is right now; there is no need to wait years.

  • Estimates for adding functionality are always high, and you slip the schedule anyway.

Remove accidental complexity and isolate your necessary complexity wherever possible. Those will be the stumbling blocks for your future collaborators. These sources of complexity compound miscommunication, as they obscure and diffuse intent throughout the codebase.

Discussion Topic

What accidental complexities do you have in your codebase? How challenging would it be to understand simple concepts if you were dropped into the codebase with no communication to other developers? What can you do to simplify those complexities (especially if they are in often-changing code)?

Throughout the rest of the book, I will look at different techniques in Python to make our systems more robust. This book is broken up into four parts:

Part 1

I’ll start with types in Python. Types are fundamental to the language, but they are not often delved into. The types you choose matter, as they convey a very specific intent. I’ll talk about type annotations and what specific annotations communicate to the developer. I’ll also go over typecheckers and how those help catch bugs early.

Part 2

After talking about how to think about Python’s types, I’ll focus on how to create your own types. I’ll talk about enumerations, dataclasses, and classes in depth. I’ll explore how certain choices made when designing a type can increase or decrease the robustness of your code.

Part 3

Now that you’ve learned how to communicate your intentions, I’ll focus on how to enable users to change your code effortlessly. You will learn how to take that strong foundation, and let others build with confidence. I’ll cover extensibility, dependencies, and architectural patterns that allow you to modify your system with minimal impact.

Part 4

Lastly, I’ll explore how to build a safety net, so that you can gently catch your future collaborators when they do fall. Their confidence will increase, knowing that they have a strong, robust system that they can fearlessly adapt to their use case. I’ll cover a variety of static analysis and testing tools that will help you catch rogue behavior.

Wrap-up

Robust code matters. Clean code matters. Your code needs to be maintainable for the entire lifetime of the codebase, and in order to do that, you need to put active foresight into what you are communicating and how. You need to clearly embody your knowledge as close to the code as possible. It will feel like a burden to continuously look forward, but with practice it becomes natural, and you start reaping the gains as you work in your own codebase.

Every abstraction, every line, and every choice in a codebase communicates something, whether intentional or not. I encourage you to think about each line of code you are writing and ask yourself, “What will a future developer learn from this?” You owe it to future maintainers to be able to deliver value at the same speed that you can today. Otherwise, your codebase will get bloated, schedules will slip, and complexity will grow. It is your job as a developer to mitigate that risk.

Look for potential hotspots, such as incorrect abstractions (such as collections or iteration) or accidental complexity. These are prime areas where communication can break down over time. If these are areas that change often, these are a priority to address now.

In the next chapter, you’re going to take what you learned from this chapter, and apply it to a fundamental Python concept: types. The types you choose express your intent to future developers, and picking the correct type is just as important as picking the correct abstraction.

1 1980 Turing Award Lecture “The Emperor’s Old Clothes”

2 https://www.python.org/dev/peps/pep-0020/

3 https://www.merriam-webster.com/dictionary/robust

4 https://seths.blog/2005/03/dont_shave_that/

5 Geoffrey James, The Tao of Programming

6 A bug that displays different behavior when being observed. SIGSOFT ’83: Proceedings of the ACM SIGSOFT/SIGPLAN software engineering symposium on High-level debugging
