Chapter 4. Constraining Types

Many developers learn the basic type annotations and call it a day. But I’m far from done. There is a wealth of advanced type annotations that are invaluable. These advanced type annotations allow you to constrain types, further restricting what they can represent. Your goal is to make illegal states unrepresentable. Developers should physically not be able to create types that are contradictory or otherwise invalid in your system. You can’t have errors in your code if it’s impossible to create the error in the first place. You can use type annotations to achieve this very goal, saving time and money. In this chapter I’ll teach you six different techniques:

Optional

Use to replace null references in your codebase

Union

Use to present a selection of types

Literal

Use to restrict developers to very specific values

Annotated

Use to provide additional description of your types

NewType

Use to restrict a type to a specific context

Final

Use to prevent variables from being rebound to a new value.

Let’s start with handling null references with Optional types.

Optional Type

Null references are often referred to as the “billion dollar mistake”, a phrase coined by C.A.R. Hoare:

I call it my billion-dollar mistake. It was the invention of the null reference in 1965. At that time, I was designing the first comprehensive type system for references in an object oriented language. My goal was to ensure that all use of references should be absolutely safe, with checking performed automatically by the compiler. But I couldn’t resist the temptation to put in a null reference, simply because it was so easy to implement. This has led to innumerable errors, vulnerabilities, and system crashes, which have probably caused a billion dollars of pain and damage in the last forty years. 1

While null references started in Algol, they would pervade countless other languages. C and C++ are often derided for null pointer dereference (which produces a segmentation fault or other program-halting crash). Java was well-known for having to catch NullPointerException throughout your code. It’s not a stretch to say that these sorts of bugs have a price tag measured in the billions - think of the developer time, customer loss, and system failures due to accidental null pointers or references.

So, why does this matter in Python? C.A.R. Hoare’s quote is about object oriented compiled languages back in the 60s; Python must be better by now, right? I regret to inform you that this billion-dollar mistake is in Python as well. It appears to us under a different name: None. I will show you a way to avoid the costly None mistake, but first, let’s talk about why None is so bad.

Note

It is especially illuminating that C.A.R. Hoare admitted that null references were born out of convenience. It goes to show you how taking the quicker path can lead to all sorts of pain later in your development lifecycle. Think how your short-term decisions today will adversely affect your maintenance tomorrow.

Let’s consider some code that runs an automated hot dog stand. I want my system to take a bun, put the hotdog in the bun, and then squirt ketchup and mustard through automated dispensers, as described in Figure 4-1. What could go wrong?

Figure 4-1. Workflow for the automated hot dog stand
def create_hot_dog():
    bun = dispense_bun()
    hotdog = dispense_hotdog()
    hotdog.place_in_bun(bun)
    ketchup = dispense_ketchup()
    mustard = dispense_mustard()
    hotdog.add_condiments(ketchup, mustard)
    return hotdog

Pretty straightforward, no? Unfortunately, there’s no way to really tell. It’s easy to think through the happy path, or the control flow of the program when everything goes right, but when talking about robust code, you need to consider error conditions. If this were an automated stand with no manual intervention, what errors can you think of?

Here’s a non-comprehensive list of errors I can think of:

  • Out of ingredients (buns, hotdog, or ketchup/mustard)

  • Order cancelled mid-process.

  • Condiments get jammed

  • Power gets interrupted

  • Customer doesn’t want ketchup or mustard, and tries to move the bun mid-process

  • Rival vendor switches the ketchup out with catsup. Chaos ensues.

Now, your system is state-of-the-art and will detect all of these conditions, but it does so by returning None when any one ingredient fails. What does this mean for this code? You start seeing errors like the following:

Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'place_in_bun'

Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'add_condiments'

It would be catastrophic if these errors bubbled up to your customers; you pride yourself on a clean UI and don’t want ugly tracebacks defiling your interface. To address this, you start to code defensively, that is, to code in such a way that you try to foresee every possible error case and account for it. Defensive programming is a good thing, but it leads to code like this:

def create_hot_dog():
    bun = dispense_bun()
    if bun is None:
        print_error_code("Bun unavailable. Check for bun")
        return None
    hotdog = dispense_hotdog()
    if hotdog is None:
        print_error_code("Hotdog unavailable. Check for hotdog")
        return None
    hotdog.place_in_bun(bun)


    ketchup = dispense_ketchup()
    mustard = dispense_mustard()
    if ketchup is None or mustard is None:
        print_error_code("Check for invalid catsup")
        return None
    hotdog.add_condiments(ketchup, mustard)
    return hotdog

This feels, well, tedious. Because any value can be None in Python, it seems like you need to engage in defensive programming and do an is None check before every dereference. This is overkill; most developers will trace through the call stack and ensure that no None values are returned to the caller. That leaves calls to external systems and maybe a scant few calls in your codebase that you always have to wrap with None checking. This is error-prone; you cannot expect every developer who ever touches your codebase to know instinctively where to check for None. Furthermore, the original assumptions you made when writing the code (e.g., this function will never return None) can be broken in the future, and now your code has a bug. And herein lies your problem: counting on manual intervention to catch error cases is unreliable.

The reason this is so tricky (and so costly) is that None is treated as a special case. It exists outside the normal type hierarchy. Every variable can be assigned to None. In order to combat this, you need to find a way of representing None inside your type hierarchy. You need Optional types.

Optional types offer you two choices: either you have a value or you don’t. In other words, it is optional to set the variable to a value.

from typing import Optional
maybe_a_string: Optional[str] = "abcdef" # This has a value
maybe_a_string = None                    # This is the absence of a value

This code indicates that the variable maybe_a_string may optionally contain a string. That code typechecks just fine, whether there is a string value bound to maybe_a_string or a None value.

At first glance, it’s not apparent what this buys you. You still need to use None to represent the absence of a value. I have good news for you, though. There are three benefits I associate with Optional types.

First, you communicate your intent more clearly. If a developer sees an Optional type in a type signature, they view that as a big red flag that they should expect None as a possibility.

def dispense_bun() -> Optional[Bun]:
    ...  # implementation elided

If you notice a function returning an Optional value, take heed and check for None values.

Secondly, you are able to further distinguish the absence of value from an empty value. Consider the innocuous list. What happens if you make a function call and receive an empty list? Was it just that no results were provided back to you? Or was it that an error occurred and you need to take explicit action? If you are receiving a raw list, you don’t know without trawling through source code. However, if you use an Optional, you are conveying one of three possibilities:

  • A list with elements - valid data to be operated on

  • A list with no elements - no error occurred, but no data was available (provided that no data is not an error condition)

  • None - An error occurred that you need to handle
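Calling code can tell these three cases apart. What follows is only a minimal sketch: the find_toppings function and the _MENU data are hypothetical names invented for illustration and are not part of the hot dog stand code.

```python
from typing import Dict, List, Optional

# Hypothetical menu data, used only for this illustration.
_MENU: Dict[str, List[str]] = {"hotdog": ["Mustard", "Ketchup"], "pretzel": []}

def find_toppings(snack_name: str) -> Optional[List[str]]:
    """Return the toppings for a snack, or None if the lookup fails."""
    if snack_name not in _MENU:
        return None           # an error: the snack is unknown
    return _MENU[snack_name]  # may be empty, which is not an error

def describe(snack_name: str) -> str:
    toppings = find_toppings(snack_name)
    if toppings is None:      # error case that needs handling
        return "error"
    if not toppings:          # valid result, just no data
        return "no toppings"
    return ", ".join(toppings)
```

The explicit is None check comes first so that an empty list is never mistaken for an error.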

Finally, typecheckers can detect Optional types and make sure that you aren’t letting None values slip through.

Consider:

def dispense_bun() -> Bun:
    return Bun('Wheat')

Let’s add some error cases to this code:

def dispense_bun() -> Bun:
    if not are_buns_available():
        return None
    return Bun('Wheat')

When run with a typechecker, you get the following error:

code_examples/chapter4/invalid/dispense_bun.py:12: error: Incompatible return value type (got "None", expected "Bun")

Excellent! The typechecker will not allow you to return a None value by default. By changing the return type from Bun to Optional[Bun], the code will typecheck successfully. This will give developers hints that they should not return None without encoding information in the return type. You can catch a common mistake and make your code more robust. But what about the calling code?

It turns out that the calling code benefits from this as well. Consider:

def create_hot_dog():
    bun = dispense_bun()
    hotdog = dispense_hotdog()
    hotdog.place_in_bun(bun)
    ketchup = dispense_ketchup()
    mustard = dispense_mustard()
    hotdog.add_condiments(ketchup, mustard)
    return hotdog

If dispense_bun returns an Optional, this code will not typecheck. It will complain with the following error:

code_examples/chapter4/invalid/hotdog_invalid.py:27: error: Item "None" of "Optional[Bun]" has no attribute "place_hotdog"

Warning

Depending on your typechecker, you may specifically need to enable an option to catch these sorts of errors. Always look through your typechecker’s documentation to learn what options are available. If there is an error you absolutely want to catch, you should test that your typechecker does indeed catch the error (I highly recommend testing out Optionals specifically). For the version of mypy I am running, I have to use --strict-optional as a command line flag to catch this error.

If you are interested in silencing the typechecker, you need to check for None explicitly and handle the None value, or assert that the value cannot be None. The following code typechecks successfully.

def create_hot_dog():
    bun = dispense_bun()
    if bun is None:
        print("Bun could not be dispensed")
        return
    hotdog = dispense_hotdog()
    hotdog.place_in_bun(bun)
    ketchup = dispense_ketchup()
    mustard = dispense_mustard()
    hotdog.add_condiments(ketchup, mustard)
    return hotdog
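The listing above takes the explicit-check route. The assert route mentioned earlier might look like the following sketch. Bun here is a stand-in class, and the available parameter on dispense_bun is a hypothetical addition so the block can run on its own.

```python
from typing import Optional

class Bun:
    """Stand-in for the chapter's Bun class (assumed for this sketch)."""
    def __init__(self, kind: str) -> None:
        self.kind = kind

def dispense_bun(available: bool = True) -> Optional[Bun]:
    # Hypothetical parameter so the sketch can simulate an empty dispenser.
    return Bun("Wheat") if available else None

def bun_kind_after_restock() -> str:
    bun = dispense_bun()
    # The assert narrows Optional[Bun] to Bun for the typechecker.
    # If the assumption is ever wrong, it fails loudly at runtime.
    assert bun is not None, "dispenser was just restocked"
    return bun.kind
```

Keep in mind that Python removes assert statements when run with the -O flag, so an assert documents an assumption rather than handling the error.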

None values truly are a billion dollar mistake. If they slip through, programs can crash, users are frustrated, and money is lost. Use Optional types to tell other developers to beware of None, and benefit from the automated checking of your tools.

Discussion Topic

How often do developers deal with None in your codebase? How confident are you that every possible None value is handled correctly? Look through bugs and failing tests to see how many times you’ve been bitten by incorrect None handling. Discuss how Optional types will help your codebase.

Union Types

Optional types are great, but what if you want to communicate error messages with specific information attached?

def dispense_hotdog() -> HotDog:
    if not are_ingredients_available():
        raise RuntimeError("Not all ingredients available")
    if order_interrupted():
        raise RuntimeError("Order interrupted")
    return create_hot_dog()

If I convert this code to use Optional, I lose information: the error messages are no longer returned. In these situations, I can use a Union instead of an Optional. Unions are versatile. If Optional lets you choose between a type and None, Union allows you to choose between any two or more types.

In the following example, I return a HotDog or a string instead of raising an exception.

from typing import Union
def dispense_hotdog() -> Union[HotDog, str]:
    if not are_ingredients_available():
        return "Not all ingredients available"
    if order_interrupted():
        return "Order interrupted"
    return create_hot_dog()

Note

Optional is just a specialized version of a Union. Optional[int] is the same exact thing as Union[int, None].
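You can confirm this equivalence at runtime with a quick check, since the typing module treats both spellings as the same type:

```python
from typing import Optional, Union

# Both spellings describe the same type, so they compare equal.
same = Optional[int] == Union[int, None]
print(same)  # True
```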

Unions can be used for more than error handling. If you can return more than one type, you can indicate that with a Union as well. Suppose you want your hot dog stand to get into the lucrative pretzel business too. Instead of trying to deal with weird class inheritance (we’ll cover more about inheritance in Part 2) that doesn’t belong between hot dogs and pretzels, you can simply return a Union of the two (plus a string for catching errors).

from typing import Union
def dispense_snack(user_input: str) -> Union[HotDog, Pretzel, str]:
    if user_input == "Hotdog":
        return dispense_hotdog()
    elif user_input == "Pretzel":
        return dispense_pretzel()
    return "ERROR: Invalid User Input"

You will find Unions very useful in a variety of situations:

  • Handling disparate types returned based on user input (as above)

  • Handling error return types a la Optionals, but with more information, such as a string or error code.

  • Handling different user input (such as if a user is able to supply a list or a string.)

  • Returning different types, say for backwards compatibility (returning an old version of an object or a new version of an object depending on requested operation)

  • And any other case where you may legitimately have more than one value represented.

Using a Union offers much of the same benefit as an Optional. First, you reap the same communication advantages. A developer encountering a Union knows that they must be able to handle more than one type in their calling code. Furthermore, a typechecker is just as aware of Union as it is Optional.

Suppose you had code that called the dispense_snack function but was only expecting a HotDog or a string to be returned:

from typing import Optional, Union
def place_order() -> Optional[HotDog]:
    order = get_order()
    result = dispense_snack(order.name)
    if isinstance(result, str):
        print("An error occurred: " + result)
        return None
    # Return our HotDog
    return result

As soon as dispense_snack starts returning Pretzels, this code fails to typecheck.

code_examples/chapter4/invalid/union_hotdog.py:22: error: Incompatible return value type (got "Union[HotDog, Pretzel]", expected "Optional[HotDog]")

The fact that your typechecker errors out in this case is fantastic. If any function you depend on changes to return a new type, its return signature must be updated to include the new type in a Union, which forces you to update your code to handle the new type. This means that your code will be flagged when your dependencies change in a way that contradicts your assumptions. With the decisions you make today, you can catch errors in the future. This is the mark of robust code; you are making it increasingly harder for developers to make mistakes, which reduces their error rates, which reduces the number of bugs users will experience.
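One way to bring the caller back in line is to handle the new type explicitly. This is a sketch under assumptions: HotDog and Pretzel are empty stand-in classes, and place_order is simplified to take the snack name directly instead of calling get_order.

```python
from typing import Optional, Union

class HotDog:
    """Stand-in class for this sketch."""

class Pretzel:
    """Stand-in class for this sketch."""

def dispense_snack(user_input: str) -> Union[HotDog, Pretzel, str]:
    if user_input == "Hotdog":
        return HotDog()
    elif user_input == "Pretzel":
        return Pretzel()
    return "ERROR: Invalid User Input"

def place_order(name: str) -> Optional[HotDog]:
    result = dispense_snack(name)
    if isinstance(result, str):
        print("An error occurred: " + result)
        return None
    if isinstance(result, Pretzel):
        print("Expected a hot dog, got a pretzel")
        return None
    # Only HotDog remains after the two isinstance checks above,
    # so the typechecker accepts this return.
    return result
```

Each isinstance check narrows the Union, which is why the final return satisfies Optional[HotDog].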

There is one more fundamental benefit of using a Union, but to explain it, I need to teach you a smidge of type theory, which is a branch of mathematics around type systems.

Product and Sum Types

Unions are beneficial because they help constrain representable state space. Representable state space is the set of all possible combinations an object can take.

Take this dataclass:

from dataclasses import dataclass
from typing import Set
# If you aren't familiar with dataclasses, you'll learn more in chapter 10
# but for now, treat this as four fields grouped together and what types they are
@dataclass
class Snack:
    name: str
    condiments: Set[str]
    error_code: int
    disposed_of: bool


Snack("Hotdog", {"Mustard", "Ketchup"}, 5, False)

I have a name, the condiments that can go on top, an error code in case something goes wrong, and if something does go wrong, a boolean to track if I have disposed of the item correctly or not. How many different combinations of values can be put into this dataclass? Potentially infinite, right? The name alone could be anything from valid values (“hotdog”, “pretzel”) to invalid values (“samosa”, “kimchi”, “poutine”) to absurd (“12345”, “”, “(╯°□°)╯︵ ┻━┻”). Condiments has a similar problem. As it stands, there is no way to compute the possible options.

For the sake of simplicity, I will artificially constrain this type:

  • The name can be one of three values: hotdog, pretzel or veggie burger

  • The condiments can be empty, mustard, ketchup or both

  • There are 6 error codes (0-5) (0 indicates success)

  • disposed_of is only true or false

Now how many different values can be represented in this combination of fields? The answer is 144, which is a grossly large number. I arrive at it as follows:

3 possible values for name * 4 possible values for condiments * 6 error codes * 2 boolean values for whether the entry has been disposed of = 3*4*6*2 = 144. If you were to accept that any of these values could be None, each field gains one more possible value and the total balloons to 4*5*7*3 = 420. While you should always think about None while coding (see earlier in this chapter about Optional), for this thought exercise, I’m going to ignore None values.
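If you would rather see the count than trust the multiplication, you can enumerate every combination. This sketch assumes the six error codes are 0 through 5 and uses frozensets to stand in for the four condiment choices.

```python
from itertools import product

names = ["hotdog", "pretzel", "veggie burger"]
condiments = [
    frozenset(),
    frozenset({"mustard"}),
    frozenset({"ketchup"}),
    frozenset({"mustard", "ketchup"}),
]
error_codes = range(6)        # assumed to be 0 through 5; 0 means success
disposed_of = [False, True]

# Every combination of one value per field is one representable state.
states = list(product(names, condiments, error_codes, disposed_of))
print(len(states))  # 144
```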

This sort of operation is known as a product type; the number of representable states is determined by the product of possible values. The problem is, not all of these states are valid. The variable disposed_of should only be set to True if an error code is set to non-zero. Developers will make this assumption, and trust that the illegal state never shows up. However, one innocent mistake can bring your whole system crashing to a halt. Consider the following code:

def serve(snack):
    # if something went wrong, return early
    if snack.disposed_of:
        return
    # ...

In this case, a developer is checking disposed_of without checking for the non-zero error code first. This is a logic bomb waiting to happen. This code will work completely fine as long as disposed_of is only ever True when the error code is non-zero. If a valid snack ever sets the disposed_of flag to True erroneously, this code will start producing invalid results. This can be hard to find, as there’s no reason for a developer who is creating the snack to check this code. As it stands, you have no way of catching this sort of error other than manually inspecting every use case, which is intractable for large code bases. By allowing an illegal state to be representable, you open the door to fragile code.

To remedy this, I need to make this illegal state unrepresentable. To do that, I’ll rework my example and use a Union:

from dataclasses import dataclass
from typing import Union, Set
@dataclass
class Error:
    error_code: int
    disposed_of: bool

@dataclass
class Snack:
    name: str
    condiments: Set[str]

snack: Union[Snack, Error] = Snack("Hotdog", {"Mustard", "Ketchup"})

snack = Error(5, True)

In this case, snack can be either a Snack (which is just a name and condiments) or an Error (which is just a number and a boolean). With the use of a Union, how many representable states are there now?

For Snack, there are 3 names and 4 possible condiment sets, which is a total of 12 representable states. For Error, I can remove the 0 error code (since that was only for success), which gives me 5 values for the error code and 2 values for the boolean for a total of 10 representable states. Since the Union is an either/or construct, I can either have 12 representable states in one case or 10 in the other, for a total of 22. This is an example of a sum type, since I’m adding the number of representable states together rather than multiplying.

22 total representable states. Compare that with the 144 states when all the fields were lumped in a single entity. I’ve reduced my representable state space by almost 85%. I’ve made it impossible to mix and match fields that are incompatible with each other. It becomes much harder to make a mistake, and there are far fewer combinations to test. Anytime you use a sum type, such as a Union, you are dramatically decreasing the number of possible representable states.
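The arithmetic behind that comparison is short enough to check directly. The variable names below are mine; the counts come from the constraints listed earlier.

```python
names = 3            # hotdog, pretzel, veggie burger
condiment_sets = 4   # empty, mustard, ketchup, both
error_codes = 6      # including 0 for success
booleans = 2         # disposed_of is True or False

# Product type: every field is combined with every other field.
product_states = names * condiment_sets * error_codes * booleans

# Sum type: a Snack OR an Error, never a mix of both.
snack_states = names * condiment_sets
error_states = (error_codes - 1) * booleans   # the success code is gone
sum_states = snack_states + error_states

reduction = 1 - sum_states / product_states
print(product_states, sum_states, f"{reduction:.1%}")  # 144 22 84.7%
```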

Literal Types

When calculating the number of representable states, I made some assumptions in the last section. I limited the number of values that were possible, but that’s a bit of a cheat, isn’t it? As I said before, there are almost an infinite number of values possible. Fortunately, there is a way to limit the values through Python: Literals. Literal types allow you to restrict the variable to a very specific set of values.

I’ll change my earlier Snack class to employ Literal values:

from dataclasses import dataclass
from typing import Literal, Set
@dataclass
class Error:
    error_code: Literal[1,2,3,4,5]
    disposed_of: bool

@dataclass
class Snack:
    name: Literal["Pretzel", "Hotdog"]
    condiments: Set[Literal["Mustard", "Ketchup"]]

Now, if I try to instantiate these dataclasses with wrong values:

Error(0, False)
Snack("Not Valid", set())
Snack("Pretzel", {"Mustard", "Relish"})

I receive the following typechecker errors:

code_examples/chapter4/invalid/literals.py:14: error: Argument 1 to "Error" has incompatible type "Literal[0]"; expected "Union[Literal[1], Literal[2], Literal[3], Literal[4], Literal[5]]"

code_examples/chapter4/invalid/literals.py:15: error: Argument 1 to "Snack" has incompatible type "Literal['Not Valid']"; expected "Union[Literal['Pretzel'], Literal['Hotdog']]"

code_examples/chapter4/invalid/literals.py:16: error: Argument 2 to <set> has incompatible type "Literal['Relish']"; expected "Union[Literal['Mustard'], Literal['Ketchup']]"

Literals were introduced in Python 3.8, and they are an invaluable way of restricting possible values of a variable.
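Literal constraints are enforced by the typechecker, not at runtime, so values arriving from outside your program (user input, files) still need a check. One possible approach, sketched here with a hypothetical parse_snack_name helper, is to reuse the Literal’s own values through typing.get_args:

```python
from typing import Literal, get_args

SnackName = Literal["Pretzel", "Hotdog"]

def parse_snack_name(raw: str) -> SnackName:
    """Validate untrusted input against the Literal's allowed values."""
    # get_args returns the values listed inside the Literal.
    for allowed in get_args(SnackName):
        if raw == allowed:
            return allowed
    raise ValueError(f"Unknown snack: {raw!r}")
```

This keeps the list of valid names in one place: the Literal definition feeds both the typechecker and the runtime check.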

Annotated Types

What if I wanted to get even deeper and specify more complex constraints? It would be tedious to write hundreds of literals, and some constraints aren’t able to be modelled by Literal types. There’s no way with a literal to constrain a string to a certain size or to match a specific regex. This is where Annotated comes in. With Annotated, you can specify arbitrary metadata alongside your type annotation.

x: Annotated[int, ValueRange(3,5)]
y: Annotated[str, MatchesRegex('[0-9]{4}')]

Unfortunately, the above code will not run as written, for two reasons. First, ValueRange and MatchesRegex are not built-in types; they are arbitrary expressions, so you will need to write your own metadata as part of an Annotated variable. Second, there are no tools that will typecheck this for you. The best you can do until such a tool exists is write dummy annotations or use strings to describe your constraints. At this point, Annotated is best served as a communication method.
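To make the idea concrete, here is one way you might write such metadata yourself and read it back at runtime. This is a sketch, not an established tool: ValueRange, set_condiment_count, and check are hypothetical names, and the block assumes Python 3.9 or later, where Annotated lives in the typing module.

```python
from dataclasses import dataclass
from typing import Annotated, get_type_hints

@dataclass(frozen=True)
class ValueRange:
    """Hypothetical metadata class: an inclusive integer range."""
    low: int
    high: int

def set_condiment_count(count: Annotated[int, ValueRange(0, 2)]) -> int:
    return count

# include_extras=True keeps the Annotated metadata instead of stripping it.
hints = get_type_hints(set_condiment_count, include_extras=True)
value_range = hints["count"].__metadata__[0]

def check(value: int) -> bool:
    """Runtime check written by hand; no typechecker does this for you."""
    return value_range.low <= value <= value_range.high
```

The metadata is only carried along by the annotation; whether it is enforced is entirely up to code you write.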

NewType

While waiting for tooling to support Annotated, there is another way to represent more complicated constraints: NewType. NewType allows you to, well, create a new type.

Suppose I want to separate my hot dog stand code to handle two separate cases: a hot dog in its unprepared form, and a hot dog that is ready to be served (a prepared hot dog). However, some functions should only be operating on the hot dog in one case or the other.

For example:

  • An unprepared hot dog needs to be put into a bun and can have condiments dispensed on top of it.

  • A prepared hot dog needs to be put on a plate, given napkins, and served to a customer.

Our plating function might look something like this:

class HotDog:
    # ... snip hot dog class implementation ...
    ...

def place_on_plate(hotdog: HotDog):
    # note, this should only accept prepared hot dogs.
    ...

However, nothing in the language prevents us from passing in an unprepared hot dog. If a developer makes a mistake and passes an unprepared hot dog to this function, customers will be quite surprised to just see their order with no bun or condiments come out of the machine.

Rather than relying on developers to catch these errors whenever they happen, you need a way for your typechecker to catch this. To do that, you can use NewType:

from typing import NewType

class HotDog:
    ''' Used to represent an unprepared hot dog'''
    # ... snip hot dog class implementation ...

PreparedHotDog = NewType("PreparedHotDog", HotDog)

def place_on_plate(hotdog: PreparedHotDog):
    ...

A NewType takes an existing type and creates a brand new type that has all the same fields and methods as the existing type. In this case, I am creating a type PreparedHotDog that is distinct from HotDog; they are not interchangeable. What’s beautiful about this is that this type restricts implicit type conversions. You cannot use a HotDog anywhere you are expecting a PreparedHotDog (you can use a PreparedHotDog in place of HotDog, though). In the above example, I am restricting place_on_plate to only take PreparedHotDog values as an argument. This prevents developers from invalidating assumptions. If a developer were to pass a HotDog to this method, the typechecker will yell at them:

code_examples/chapter4/invalid/newtype.py:10: error: Argument 1 to "place_on_plate" has incompatible type "HotDog"; expected "PreparedHotDog"

It is important to stress the one-way nature of this type conversion. As a developer, you can control when your old type becomes your new type.
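It helps to know what that conversion does at runtime: nothing. Calling a NewType returns its argument unchanged, so the distinction exists only for the typechecker. A small sketch with a stand-in HotDog class:

```python
from typing import NewType

class HotDog:
    """Stand-in for the chapter's HotDog class."""

PreparedHotDog = NewType("PreparedHotDog", HotDog)

raw = HotDog()
prepared = PreparedHotDog(raw)

# At runtime the "conversion" hands back the very same object.
print(prepared is raw)               # True
print(isinstance(prepared, HotDog))  # True
```

Because there is no runtime wrapper, NewType costs almost nothing, but it also cannot stop anyone who ignores the typechecker.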

For example, I’ll modify a function from earlier in the chapter:

def create_hot_dog() -> Optional[PreparedHotDog]:
    bun = dispense_bun()
    if bun is None:
        print("Bun could not be dispensed")
        return None
    hotdog = dispense_hotdog()
    hotdog.place_in_bun(bun)
    ketchup = dispense_ketchup()
    mustard = dispense_mustard()
    hotdog.add_condiments(ketchup, mustard)
    return PreparedHotDog(hotdog)

Notice how I’m explicitly returning a PreparedHotDog instead of a normal hotdog. This acts as a “blessed” function; it is the only sanctioned way that I want developers to create a PreparedHotDog. Any user trying to use a method that takes a PreparedHotDog needs to create a hot dog using create_hot_dog first.

It is important to notify users that the only way to create your new type is through a set of “blessed” functions. You don’t want users creating your new type in any circumstance other than a predetermined method, as that defeats the purpose.

def make_snack():
    # Don't do this: it bypasses create_hot_dog, so nothing was actually prepared
    place_on_plate(PreparedHotDog(HotDog()))

Unfortunately, Python has no great way of telling users this, other than a comment.

from typing import NewType
# NOTE: Only create PreparedHotDog using create_hot_dog method.
PreparedHotDog = NewType("PreparedHotDog", HotDog)

Still, NewType is applicable to many real-world scenarios. For example, these are all scenarios that I’ve run into that a NewType would solve.

  • Separating a str from a SanitizedString, to catch bugs like SQL injection vulnerabilities. By making SanitizedString a NewType, I made sure that only properly sanitized strings were operated upon, eliminating the chance of SQL injection.

  • Tracking a User object and LoggedInUser separately. By using NewType to distinguish LoggedInUser from User, I wrote functions that were only applicable to users that were logged in.

  • Tracking an integer that should represent a valid User ID. By restricting the User ID to a NewType, I could make sure that some functions were only operating on IDs that were valid, without having to scatter if checks throughout the code.
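The first scenario might be sketched as follows. The sanitize and build_query functions are hypothetical, and the character filter is a deliberately crude placeholder; real code should rely on parameterized queries from its database driver.

```python
from typing import NewType

SanitizedString = NewType("SanitizedString", str)

def sanitize(raw: str) -> SanitizedString:
    """The one "blessed" way to create a SanitizedString in this sketch.

    Keeping only letters, digits, and spaces is a placeholder,
    not a real defense against SQL injection.
    """
    cleaned = "".join(ch for ch in raw if ch.isalnum() or ch == " ")
    return SanitizedString(cleaned)

def build_query(name: SanitizedString) -> str:
    # A typechecker rejects build_query("some raw str") because a
    # plain str is not a SanitizedString.
    return f"SELECT * FROM snacks WHERE name = '{name}'"
```

A caller has to go through sanitize to obtain a value that build_query accepts, which is exactly the one-way conversion described earlier.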

In another chapter, you’ll see how you can use classes and invariants to do something very similar, with a much stronger guarantee of avoiding illegal states. However, NewType is still a useful pattern to be aware of, and is much more lightweight than a full-blown class.

Final Types

Finally (pun intended), you may want to restrict a variable from changing its value. That’s where Final comes in. Final, introduced in Python 3.8, indicates to a typechecker that a variable cannot be bound to another value. For instance, I want to start franchising out my hot dog stand, but I don’t want the name to be changed by accident.

from typing import Final
VENDOR_NAME: Final = "Viafore's Auto-Dog"

If a developer accidentally changed the name later on, they would see an error.

def display_vendor_information():
    vendor_info = "Auto-Dog v1.0"
    # whoops, copy-paste error, this code should be vendor_info += VENDOR_NAME
    VENDOR_NAME += VENDOR_NAME
    print(vendor_info)

code_examples/chapter4/invalid/final.py:3: error: Cannot assign to final name "VENDOR_NAME"
Found 1 error in 1 file (checked 1 source file)

In general, Final is best used when the variable’s scope spans a large amount of code, such as a module. It is difficult for developers to keep track of all the uses of a variable in such large scopes; letting the typechecker catch immutability guarantees is a boon in these cases.

Warning

Final will not error out when mutating an object through a function. It only prevents the variable from being rebound (set to a new value).
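A short sketch of the difference, using a hypothetical CONDIMENTS list:

```python
from typing import Final, List

CONDIMENTS: Final[List[str]] = ["Mustard", "Ketchup"]

# Mutating the list is not rebinding, so a typechecker accepts this.
CONDIMENTS.append("Relish")

# Rebinding is what Final forbids; a typechecker would flag this line:
# CONDIMENTS = ["Mayo"]

print(CONDIMENTS)  # ['Mustard', 'Ketchup', 'Relish']
```

If you also need the contents to stay fixed, an immutable container such as a tuple is one option to consider alongside Final.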

Wrap-up

You’ve learned about many different ways to constrain your types in this chapter. All of them serve a specific purpose, from handling None with Optional to restricting to specific values with Literal to preventing a variable from being rebound with Final. By using these techniques, you’ll be able to encode assumptions and restrictions directly into your codebase, preventing future readers from needing to guess about your logic. Typecheckers will use these advanced type annotations to provide you with stricter guarantees about your code, which will give maintainers confidence when working in your codebase. With this confidence, they will make fewer mistakes, and your codebase will become more robust because of it.

In the next chapter, you’ll move on from type annotating single values, and learn how to properly annotate collection types. Collection types pervade most of Python; you must take care to express your intentions for them as well. You need to be well versed in all the ways you can represent a collection, including in cases where you must create your own.

1 Null References: The Billion Dollar Mistake (QCon London 2009)
