Chapter 3. Type Annotations

Python is a dynamically-typed language; types can be changed at runtime. This is an obstacle when trying to write robust code. Since types are embedded in the value itself, developers have a very tough time knowing what type they are working with. Sure, that name looks like a string today, but what happens if someone makes it bytes? Assumptions about types are built on shaky ground with dynamically typed languages. Hope is not lost, though. In Python 3.5, a brand-new feature was introduced: type annotations.

Type annotations bring your ability to write robust code to a whole new level. Guido van Rossum, creator of Python, says it best:

I’ve learned a painful lesson that for small programs dynamic typing is great. For large programs you have to have a more disciplined approach and it helps if the language actually gives you that discipline, rather than telling you “Well, you can do whatever you want”1

Type annotations are the more disciplined approach, the extra bit of care you need to wrangle larger codebases. In this chapter, you’ll learn how to use type annotations, why they are so important, and how to utilize a tool called a typechecker to enforce your intentions throughout your codebase.

Type Annotations

In Chapter 2, you got your first glance at a type annotation:

def close_kitchen_if_past_close(point_in_time: datetime.datetime):
    if point_in_time >= closing_time():
        close_kitchen()
        log_time_closed(point_in_time)

The type annotation here is : datetime.datetime

Type annotations are an additional syntax notifying the user of an expected type of your variables. These annotations serve as type hints; they provide hints to the reader, but they are not actually used by the Python language. In fact, you are completely free to ignore the hints:

# CustomDateTime offers all the same functionality as
# datetime.datetime. I'm using it here for its better
# logging facilities
close_kitchen_if_past_close(CustomDateTime("now")) # no error

Warning

It should be a rare case where you go against a type hint. The author very clearly intended a specific use case. If you go against that use case and the code changes, nothing protects you: you have no guarantee that your code still works with the changed function.

Python will not throw any error at runtime in this scenario. As a matter of fact, it won’t check the type annotations at all during runtime. No type checking happens when Python executes, and the annotations add no meaningful cost. These type annotations still serve a crucial purpose: informing your readers of the expected type. Maintainers of code will know what types they are allowed to use when changing your implementation. Calling code will also benefit, as developers will know exactly what type to pass in. By implementing type annotations, you reduce friction.
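To see for yourself that the interpreter leaves annotations unenforced, here is a minimal sketch. The function name double is hypothetical and not part of the restaurant example; it is annotated to take an int, yet Python runs it with a string without complaint.

```python
def double(value: int) -> int:
    # The annotations claim int in, int out, but nothing enforces that.
    return value * 2


# Passing an int behaves as the annotations suggest.
print(double(21))  # prints 42

# Passing a str contradicts the annotation, yet no error is raised:
# str supports the * operator too, so the string is simply repeated.
print(double("ab"))  # prints abab

# The annotations are merely stored on the function for tools to read.
print(double.__annotations__)
```

The second call is exactly the kind of silent misuse that a reader, or a tool, can only spot by comparing the call against the annotation.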

Put yourself in your future maintainer’s shoes. Wouldn’t it be nice to come across code that is intuitive to use? You wouldn’t have to dig through function after function to determine usage. You wouldn’t assume a wrong type and then need to deal with the fallout of exceptions and wrong behavior.

Consider a piece of code that takes in employees’ availability and a restaurant’s opening time, and then schedules available workers for that day. You want to use this piece of code, and you see the following:

def schedule_restaurant_open(open_time, workers_needed):

Let’s ignore the implementation for a minute, because I want to focus on first impressions. What do you think can get passed into this? Stop, close your eyes, and ask yourself what are reasonable types that can be passed in before reading on. Is open_time a datetime, the number of seconds since epoch, or maybe a string containing an hour? Is workers_needed a list of names, a list of Worker objects, or something else? If you guess wrong, or aren’t sure, you need to go look at either the implementation or calling code, which I’ve established takes time and is frustrating.

Let me provide an implementation and you can see how close you were.

import datetime
import random

def schedule_restaurant_open(open_time: datetime.datetime,
                             workers_needed: int):
    workers = find_workers_available_for_time(open_time)
    # use random.sample to pick X available workers
    # where X is the number of workers needed.
    for worker in random.sample(workers, workers_needed):
        worker.schedule(open_time)

You probably guessed that open_time is a datetime, but did you consider that workers_needed could have been an int? As soon as you see the type annotations, you get a much better picture of what’s happening. This reduces cognitive overhead and reduces friction for maintainers.

Warning

This is certainly a step in the right direction, but don’t stop here. If you see code like this, consider renaming the variable to number_of_workers_needed to reflect just what the integer means. In the next chapter, I’ll also explore type aliases, which provide an alternate way of expressing yourself.

So far, all the examples I’ve shown have focused on parameters, but you’re also allowed to annotate return types.

Consider the schedule_restaurant_open function. In the middle of that snippet, I called find_workers_available_for_time and assigned its return value to a variable named workers. Suppose you want to change the code to pick workers who have gone the longest without working, rather than sampling at random. Do you have any indication what type workers is?

If you were to just look at the function signature, you would see the following:

def find_workers_available_for_time(open_time: datetime.datetime):

Nothing in here helps you do your job quicker. You could guess and the tests would tell you, right? Maybe it’s a list of names? Instead of letting the tests fail, maybe you should go look through the implementation.

def find_workers_available_for_time(open_time: datetime.datetime):
    workers = worker_database.get_all_workers()
    available_workers = [worker for worker in workers
                           if is_available(worker)]
    if available_workers:
        return available_workers

    # fall back to workers who listed that they are available
    # in an emergency
    emergency_workers = [worker for worker in get_emergency_workers()
                           if is_available(worker)]


    if emergency_workers:
        return emergency_workers

    # Schedule the owner to open, they will find someone else
    return [OWNER]

Oh no, there’s nothing in here that tells you what type you should be expecting. There are three different return statements throughout this code, and you hope that they all return the same type (surely every if statement is tested through unit tests to make sure they are consistent, right? Right?) You need to dig deeper. You need to look at worker_database. You need to look at is_available and get_emergency_workers. You need to look at the OWNER variable. Every one of these needs to be consistent, or else you’ll need to handle special cases in your original code.

And what if these functions also don’t tell you exactly what you need? What if you have to go deeper through multiple function calls? Every layer you have to go through is another layer of abstraction you need to keep in your brain. Every piece of information contributes to cognitive overload. The more cognitive overload you are burdened with, the more likely a mistake will happen.

All of this is avoided by annotating a return type. Return types are annotated by putting -> <type> at the end of the function declaration, just before the colon. If you came across this function signature:

def find_workers_available_for_time(open_time: datetime.datetime) -> list[str]:

You now know that you should indeed treat workers as a list of strings. No digging through databases, function calls or modules needed.
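As a rough sketch of how the annotated signature pays off, the snippet below implements the change suggested earlier: picking the workers who have gone the longest without working. Everything beyond the signature is an assumption made for illustration: the in-memory LAST_WORKED dictionary, the worker names, and the pick_longest_idle helper are hypothetical stand-ins for the real database.

```python
import datetime

# Hypothetical stand-in for the worker database: name -> last shift worked.
LAST_WORKED: dict[str, datetime.date] = {
    "Ana": datetime.date(2024, 1, 10),
    "Ben": datetime.date(2024, 1, 3),
    "Cho": datetime.date(2024, 1, 7),
}


def find_workers_available_for_time(open_time: datetime.datetime) -> list[str]:
    # Simplified assumption: everyone is available.
    return list(LAST_WORKED)


def pick_longest_idle(workers: list[str], workers_needed: int) -> list[str]:
    # Because workers is a list[str], each name can be used directly as a
    # dictionary key; the earliest last-worked dates sort first.
    return sorted(workers, key=lambda name: LAST_WORKED[name])[:workers_needed]


open_time = datetime.datetime(2024, 1, 12, 9, 0)
workers = find_workers_available_for_time(open_time)
print(pick_longest_idle(workers, 2))  # prints ['Ben', 'Cho']
```

Had the function returned Worker objects instead of names, the lookup inside the key function would need to change, and the return annotation is what tells you so without reading the implementation.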

Finally, you can annotate variables when needed.

workers: list[str] = find_workers_available_for_time(open_time)
numbers: list[int] = []
ratio: float = get_ratio(5,3)

Most of the time, I won’t bother annotating variables, unless there is something specific I want to convey in my code (such as a type that is different from expected). I don’t want to get too into the realm of putting type annotations on literally everything - the lack of verbosity is what drew many developers to Python in the first place. The types can clutter your code, especially when it is blindingly obvious what the type is.

number: int = 0
text: str = "useless"
values: list[float] = [1.2, 3.4, 6.0]
worker: Worker = Worker()

None of these type annotations provide more value than what is already provided by Python itself. Readers of this code know that "useless" is a string. Remember, type annotations are used for type hinting; you are providing notes for the future to improve communication. You don’t need to state the obvious everywhere.

Benefits

As with every decision you make, you need to weigh the costs and benefits. Thinking about types up front helps your deliberate design process, but are there other benefits type annotations provide? I’ll show you how type annotations really pull their weight through tooling.

Autocomplete

I’ve mainly talked about communication to other developers, but your Python environment benefits from type annotations as well. Since Python is dynamically-typed, it is difficult to know what operations are available. With type annotations, many Python-aware code editors will autocomplete your variable’s operations.

In Figure 3-1, you’ll see a screenshot that illustrates VSCode detecting a datetime and offering to autocomplete my variables.

Figure 3-1. An IDE showing autocompletion

Typecheckers

Throughout this book, I’ve been talking about how types communicate intent, but have been leaving out one key detail: No programmer has to honor these type annotations if they don’t want to. If your code contradicts a type annotation, it is probably an error, and you’re still relying on humans to catch bugs. I want to do better. I want a computer to find these sorts of bugs for me.

I showed this snippet when talking about dynamic typing back in Chapter 2:

>>> a: int = 5
>>> a = "string"
>>> a
'string'

Herein lies the challenge: How do type annotations make your codebase robust, when you can’t trust that developers will follow their guidance? In order to be robust, you want your code to stand the test of time. To do that, you need some sort of tool that can check all your type annotations and flag anything that is amiss. That tool is called a typechecker.

Typecheckers are what allow type annotations to go from a communication method to a safety net. Typechecking is a form of static analysis. Static analysis tools run on your source code and don’t impact your runtime at all. You’ll learn more about static analysis tools in a later chapter, but for now, I will just explain typecheckers.

First, I need to install one. I’ll use mypy, a very popular typechecker.

pip install mypy

Now I’ll take the file with incorrect behavior:

a: int = 5
a = "string"

and run mypy on it:

mypy chapter3/invalid_type.py

chapter3/invalid_type.py:2: error: Incompatible types in assignment (expression has type "str", variable has type "int")
Found 1 error in 1 file (checked 1 source file)

And just like that, my type annotations become a first line of defense against errors. Anytime you make a mistake and go against the author’s intent, a typechecker will find out and alert you. In fact, in most development environments, it’s possible to get this analysis in real time, notifying you of errors as you type. (Without reading your mind, this is about as early as a tool can catch errors, which is pretty great.)

Exercise: Spot the Bug

Here are some more examples of mypy catching errors in my code. I want you to look for the error in each code snippet and time how long it takes you to find the bug or give up, and then check the output listed below the snippet to see if you got it right.

def read_file_and_reverse_it(filename: str) -> str:
    with open(filename) as f:
        # Convert bytes back into str
        return f.read().encode("utf-8")[::-1]

mypy chapter3/invalid_example1.py
chapter3/invalid_example1.py:3: error: Incompatible return value type (got "bytes", expected "str")
Found 1 error in 1 file (checked 1 source file)

Whoops, I’m returning bytes, not a string. I made a call to encode instead of decode, and got my return type all mixed up. I can’t even tell you how many times I made this mistake moving Python 2.7 code to Python 3. Thank goodness for typecheckers.
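One possible fix, sketched here rather than taken from the original codebase, is to drop the call to encode altogether: a file opened in text mode already yields a str, so reversing it keeps the return value consistent with the annotation. The temporary file in the usage lines exists only to make the example self-contained.

```python
import os
import tempfile


def read_file_and_reverse_it(filename: str) -> str:
    with open(filename) as f:
        # Text mode already yields str, so no encode/decode is needed.
        return f.read()[::-1]


# Usage: write a small temporary file, then read it back reversed.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("abc")

reversed_text = read_file_and_reverse_it(tmp.name)
os.remove(tmp.name)  # clean up the temporary file
print(reversed_text)  # prints cba
```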

Here’s another example:

from typing import List
# takes a list and adds the doubled values
# to the end
# [1,2,3] => [1,2,3,2,4,6]
def add_doubled_values(my_list: List[int]):
    my_list.update([x*2 for x in my_list])

add_doubled_values([1,2,3])

mypy chapter3/invalid_example2.py
chapter3/invalid_example2.py:6: error: "List[int]" has no attribute "update"
Found 1 error in 1 file (checked 1 source file)

Another innocent mistake: I called update on a list instead of extend. These sorts of mistakes can happen quite easily when moving between collection types (in this case from a set, which does offer an update method, to a list, which doesn’t).
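A corrected version might look like the following sketch, which swaps update for extend. The list comprehension builds the doubled values as a separate list before extend runs, so the list is not being iterated while it grows.

```python
from typing import List


def add_doubled_values(my_list: List[int]) -> None:
    # extend appends each element of the given iterable to the end of the list.
    my_list.extend([x * 2 for x in my_list])


values = [1, 2, 3]
add_doubled_values(values)
print(values)  # prints [1, 2, 3, 2, 4, 6]
```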

One more example to wrap it up:

# The restaurant is named differently
# in different parts of the world
def get_restaurant_name(city: str) -> str:
    if city in ITALY_CITIES:
        return "Trattoria Viafore"
    if city in GERMANY_CITIES:
        return "Pat's Kantine"
    if city in US_CITIES:
        return "Pat's Place"
    return None


if get_restaurant_name('Boston'):
    print("Location Found")

mypy chapter3/invalid_example3.py
chapter3/invalid_example3.py:14: error: Incompatible return value type (got "None", expected "str")
Found 1 error in 1 file (checked 1 source file)

This one is subtle. I’m returning None when a string value is expected. If all the code is just checking conditionally for the restaurant name to make decisions, like I do above, tests will pass, and nothing will be amiss. This is true even for the negative case, because None is absolutely fine to check for in if statements (it is false-y). This is an example of Python’s dynamic typing coming back to bite us.

However, a few months from now, some developer will start trying to use this return value as a string, and as soon as a new city needs to be added, the code starts trying to operate on None values, which causes exceptions to be raised. This is not very robust; there is a latent code bug just waiting to happen. But with typecheckers, you can stop worrying about this, and catch these mistakes early.
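To make the latent bug concrete, here is a minimal reproduction. The city sets are hypothetical placeholders, since the original snippet does not show how ITALY_CITIES and the others are defined, and "Tokyo" stands in for a city that has not been added yet.

```python
# Hypothetical city sets; the real definitions are not shown in the text.
ITALY_CITIES = {"Rome"}
GERMANY_CITIES = {"Berlin"}
US_CITIES = {"Boston"}


def get_restaurant_name(city: str) -> str:
    if city in ITALY_CITIES:
        return "Trattoria Viafore"
    if city in GERMANY_CITIES:
        return "Pat's Kantine"
    if city in US_CITIES:
        return "Pat's Place"
    return None  # contradicts the annotation; a typechecker flags this line


# A truthiness check hides the problem: None is simply treated as false.
if not get_restaurant_name("Tokyo"):
    print("Location Not Found")

# Using the result as a string is where the bug finally surfaces.
try:
    get_restaurant_name("Tokyo").upper()
except AttributeError as error:
    print(f"Latent bug surfaced: {error}")
```

The first check behaves as if nothing were wrong, which is why tests built around it keep passing; only the string operation exposes the None.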

Warning

With a typechecker available, do you even need tests? You certainly do. Typecheckers catch a specific class of errors - those of incompatible types. There are plenty of other classes of errors that you still need to test for. Treat typecheckers as just one tool in your arsenal for finding bugs.

In all of these examples, typecheckers found a bug just waiting to happen. It doesn’t matter if the bug would have been caught by tests, or by code review, or by customers; typecheckers catch it earlier, which saves time and money. Typecheckers start giving us the benefit of a statically-typed language, while still allowing the Python runtime to remain dynamically-typed. This truly is the best of both worlds.

At the beginning of the chapter, you’ll find a quote from Guido van Rossum, the creator of the Python programming language. While working at Dropbox, he found that large codebases struggled without a safety net, and he became a huge proponent of driving type hinting into the language. If you want your code to communicate intent and catch errors, start adopting type annotations and typechecking today.

Discussion Topic

Has your codebase had an error slip through that could have been caught by typecheckers? How much do those errors cost you? How many times has it been a code review or an integration test that caught the bug? How about bugs that made it to production?

When To Use

Now, before you go adding types to everything, I need to talk about the cost. Adding types is simple, but can be overdone. As users try to test and play around with code, they may start fighting the typechecker because they feel bogged down when writing all the type annotations. There is an adoption cost for users who are just getting started with type hinting. I also mentioned that I don’t type annotate everything. I won’t annotate all my variables, especially if the type is obvious. I also won’t typically type annotate parameters for every small private method in a class.

When should you use type annotations?

  • Functions that you expect other modules or users to call (e.g. public API, library entry points, etc.)

  • Code where you want to highlight that a type is complicated (e.g. a dictionary of strings mapped to lists of objects) or unintuitive.

  • Areas where mypy complains that you need a type (typically when assigning to an empty collection). It’s easier to go along with the tool than against it.

Typecheckers will infer types for any value that they can, so even if you don’t fill in all types, you still reap the benefits.
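The guidelines above can be sketched as follows. The Worker class, the function name, and the sample data are assumptions made for illustration. The public function and its complicated dictionary type are annotated, the empty list gets an annotation because a typechecker has nothing to infer from, and the obvious local is left for the typechecker to work out.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    # Hypothetical minimal worker record.
    name: str
    city: str


def group_workers_by_city(workers: list[Worker]) -> dict[str, list[Worker]]:
    # Public entry point with a complicated return type: worth annotating.
    grouped: dict[str, list[Worker]] = {}
    for worker in workers:
        grouped.setdefault(worker.city, []).append(worker)
    return grouped


# An empty collection gives the typechecker nothing to infer from.
staff: list[Worker] = []
staff.append(Worker("Ana", "Boston"))
staff.append(Worker("Ben", "Rome"))

# No annotation needed: the type follows from the function's return type.
by_city = group_workers_by_city(staff)
print(sorted(by_city))  # prints ['Boston', 'Rome']
```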

Wrap-up

There was consternation in the Python community when type hinting was introduced. Developers were afraid that Python was becoming a statically typed language like Java or C++. Developers felt that adding types everywhere would slow them down and destroy the benefits of the dynamically typed language they fell in love with.

However, type hints are just that: hints. They are completely optional. I don’t recommend them for small scripts, or any piece of code that isn’t going to live a very long time. But if your code needs to be maintainable for the long term, type hints are invaluable. They serve as a communication method, make your environment smarter, and detect errors when combined with typecheckers. They protect the original author’s intent. When annotating types, you decrease the burden a reader has in understanding your code. You reduce the need to read the implementation of a function to know what it’s doing. Code is complicated, and you should be minimizing how much code a developer needs to read. By using well-thought-out types, you reduce surprise and increase reading comprehension.

The typechecker is also a confidence builder. Remember, in order for your code to be robust, it has to be easy to change, rewrite, and delete if needed. The typechecker allows developers to do that with less trepidation. If something was relying on a type or field that got changed or deleted, the typechecker will flag the offending code as incompatible. Automated tooling makes your job and your future collaborators’ jobs simpler; fewer bugs will make it to production and features will get delivered quicker.

In the next chapter, you’re going to go beyond basic type annotations and learn how to build a vocabulary of all new types. These types will help you constrain behavior in your codebase, limiting the ways things can go wrong. I’ve only scratched the surface of how useful type annotations can be.

1 A Language Creators’ Conversation, PuPPy Annual Benefit 2019, https://www.youtube.com/watch?v=csL8DLXGNlU
