Chapter 2. Introduction to Python Types

Welcome to Part 1, where I will focus on types in Python. Types model the behavior of your program. Beginner programmers understand that there are different types in Python, such as float or string. But what is a type? How does mastering types make your codebase stronger? Types are a fundamental underpinning of any programming language, but, unfortunately, most introductory texts gloss over just how types benefit your codebase (or, if misused, how those same types increase complexity).

Tell me if you’ve seen this before:

>>> type(3.14)
<class 'float'>

>>> type("This is another boring example")
<class 'str'>

>>> type(["Even", "more", "boring", "examples"])
<class 'list'>

This could be pulled from almost any beginner’s guide to Python. You learn about ints, strings, floats, bools, and all sorts of things the language offers. And then, boom, you move on, because let’s face it, this part of Python is not flashy. You want to dive into the cool stuff, like functions and loops and dictionaries, and I don’t blame you. But it’s a shame that many tutorials never revisit types and give them their proper due. As users dig deeper, they may discover type annotations (which I cover in the next chapter) or start writing classes, but often miss out on the fundamental discussion about when to use types appropriately.

That’s where I’ll start. To write maintainable Python, you must be aware of the nature of types and be deliberate about using them. I’ll start by talking about what a type actually is and why that matters. I’ll then move on to how the Python language’s decisions about its type system affect the robustness of your codebase.

What’s In a Type?

I want you to pause and answer a question: Without mentioning numbers, strings, text, or booleans, how would you explain what a type is?

It’s not a simple answer for everyone. It’s even harder to explain what the benefits are, especially in a language like Python where you do not have to explicitly declare types of variables.

I consider a type to have a very simple definition: a communication method. Types convey information. They provide a representation that users and computers can reason about. I break the representation down into two different facets:

Mechanical Representation

Types communicate behaviors and constraints to the Python language itself

Semantic Representation

Types communicate behaviors and constraints to other developers

Let’s go learn a little more about each representation:

Mechanical Representation

At their core, computers are all about binary code. Your processor doesn’t speak Python; all it sees is the presence or absence of electrical current on the circuits going through it. The same goes for what’s in your computer memory.

Suppose your memory looked like the following:

0011001010001001000101001001000100100010000010101001010101010100000011111111001001010011111010010010100100010010100*010100000100000101010100*101001001001000101010001010010010101010010010010010000111101010110101101001010111

Looks like a bunch of gibberish. Let’s zoom in on the part that I’ve bolded:

*01010000 01000001 01010100*

There is no way to tell exactly what this number means by itself. Depending on computer architecture it is plausible that this could represent the number 5259604 or 5521744. It could also be the string “PAT”. Without any sort of context, you can’t know for certain. This is why computers need types. Type information gives Python what it needs to know to make sense of all the ones and zeroes. Let’s see it in action:

from ctypes import string_at
from sys import getsizeof
from binascii import hexlify

a = 0b01010000_01000001_01010100
print(a)
>>> 5259604

# prints out the memory of the variable
print(hexlify(string_at(id(a), getsizeof(a))))
>>> b'0100000000000000607c054995550000010000000000000054415000'

text = "PAT"
print(hexlify(string_at(id(text), getsizeof(text))))
>>>b'0100000000000000a00f0649955500000300000000000000375c9f1f02acdbe4e5379218b77f0000000000000000000050415400'

Note

I am running CPython 3.9.0 on a little-endian machine, so if you see different results, don’t worry, there are subtle things that can change your answers. (This code is not guaranteed to run on other Python implementations such as Jython or PyPy).

These hex strings display the actual memory of a Python object. Reading from the start, you’ll find a reference count, a pointer to the object’s type, a length, and eventually the actual data itself (the string also stores a cached hash and some state flags in between). You can see the bytes at the end of each returned value to see the number or string (look for the bytes 0x544150 or 0x504154). The important part of this is that there is a type encoded into that memory. When Python looks at a variable, it knows exactly what type everything is at runtime (such as when you use the type() function).
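The same ambiguity can be reproduced without reading raw memory. Here is a minimal sketch that takes the three bolded bytes and interprets them three different ways. It relies only on the built-in int.from_bytes and bytes.decode; the variable names are my own, chosen for illustration.

```python
# The three bytes from the bolded region: 01010000 01000001 01010100
raw = bytes([0b01010000, 0b01000001, 0b01010100])

# Interpreted as an integer, the answer depends on byte order.
as_big_endian = int.from_bytes(raw, byteorder="big")
as_little_endian = int.from_bytes(raw, byteorder="little")

# Interpreted as ASCII text, the same bytes spell a word.
as_text = raw.decode("ascii")

print(as_big_endian)     # 5259604
print(as_little_endian)  # 5521744
print(as_text)           # PAT
```

Nothing about the bytes changed between the three lines; only the type I asked Python to apply to them did.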

It’s easy to think that this is the only reason for types: the computer needs to know how to interpret various blobs of memory. It is important to be aware of how Python uses types, as it has some implications for writing robust code, but even more important is the second representation: semantic representation.

Semantic Representation

While the first definition of types is great for lower-level programming, it’s the second definition that applies to every developer. Types, in addition to having a mechanical representation, also manifest a semantic representation. A semantic representation is a communication tool; the types you choose communicate information across time and space to a future developer.

Types tell a user what behaviors they can expect from that entity. These behaviors are the operations that you associate with that type (plus any pre-conditions or post-conditions). They are the boundaries, constraints, and freedoms that a user interacts with whenever they use that type. Types used correctly have low barriers to understanding; they become natural to use. Conversely, types used poorly are a hindrance.

Consider the lowly int. Take a minute to think about what behaviors an integer has in Python. Here’s a quick (non-comprehensive) list I came up with:

  • Constructible from integers, floats, or strings

  • Mathematical operations such as addition, subtraction, division, multiplication, exponentiation and negation

  • Relational comparison such as <, >, ==, and !=

  • Bitwise operations (manipulating individual bits of a number) such as &, |, ^, ~, and shifting

  • Convertible to a string using str or repr functions

  • Able to be rounded through ceil, floor, round methods (even though these return the integer itself, these are supported methods).

An int has many behaviors. You can view the full list if you type help(int) into your REPL.
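To make the list above concrete, here is a short sketch that exercises one behavior from each bullet. The variable names are illustrative only; everything else is built into Python or the standard math module.

```python
import math

# Constructible from integers, floats, or strings
from_float = int(3.9)      # truncates toward zero
from_string = int("42")

# Mathematical operations
quotient = 7 / 2           # true division returns a float
floor_quotient = 7 // 2    # floor division stays an int
power = 2 ** 10

# Relational comparison
is_smaller = 3 < 5

# Bitwise operations
masked = 0b1100 & 0b1010
inverted = ~5

# Conversion to a string
as_text = str(255)

# Rounding an int returns the int itself
rounded = (math.ceil(7), math.floor(7), round(7))

print(from_float, from_string)          # 3 42
print(quotient, floor_quotient, power)  # 3.5 3 1024
print(masked, inverted)                 # 8 -6
print(rounded)                          # (7, 7, 7)
```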

Now consider a datetime:

>>> import datetime
>>> datetime.datetime.now()
datetime.datetime(2020, 9, 8, 22, 19, 28, 838667)

A datetime is not that different from an int. Typically it’s represented as a number of seconds or milliseconds from some epoch of time (such as January 1st, 1970). But think about the behaviors a datetime has:

  • Constructible from a string, or a set of integers representing day/month/year/etc

  • Mathematical operations such as addition and subtraction of time deltas

  • Relational comparison

  • No bitwise operations available

  • Convertible to a string using str or repr functions

  • Is not able to be rounded through ceil, floor, round methods

Datetimes support addition and subtraction, but you cannot add two datetimes together. You add or subtract time deltas (such as adding a day or subtracting a week), and subtracting one datetime from another gives you a time delta. Multiplying and dividing really don’t make sense for a datetime. Similarly, rounding dates is not a supported operation in the standard library. However, datetimes do offer comparison and string formatting operations with similar semantics to an integer. So even though a datetime is at heart an integer, it contains a constrained subset of operations.

Note

Semantics refers to the meaning of an operation. While str(int) and str(datetime.datetime.now()) will return different formatted strings, the meaning is the same: I am creating a string from a value.
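You can probe these constraints yourself. The sketch below uses a small helper of my own, is_supported (not part of the standard library), that reports whether an operation raises a TypeError. Treat it as an illustration of the bullets above rather than an exhaustive check.

```python
import datetime

moment = datetime.datetime(2020, 9, 8, 22, 19, 28)

# Adding a time delta is supported and yields another datetime.
next_day = moment + datetime.timedelta(days=1)

# Comparison and string conversion behave much like they do for an int.
is_later = next_day > moment
as_text = str(moment)


def is_supported(operation):
    """Hypothetical helper: True if the operation runs, False on TypeError."""
    try:
        operation()
    except TypeError:
        return False
    return True


can_add_datetimes = is_supported(lambda: moment + moment)
can_multiply = is_supported(lambda: moment * 2)
can_shift_bits = is_supported(lambda: moment << 1)
can_round = is_supported(lambda: round(moment))

print(next_day)  # 2020-09-09 22:19:28
print(can_add_datetimes, can_multiply, can_shift_bits, can_round)
# False False False False
```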

Datetimes also support their own behaviors, to further distinguish them from integers. These include:

  • Changing values based on time zones

  • Being able to control the format of strings

  • Finding what weekday it is

Again, if you’d like a full list of behaviors, type import datetime; help(datetime.datetime) into your REPL.
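Here is a brief sketch of those three datetime-specific behaviors. The five-hour offset is an arbitrary choice for illustration, and the variable names are mine.

```python
import datetime

utc = datetime.timezone.utc
# A fixed offset five hours behind UTC, chosen only for illustration.
five_behind = datetime.timezone(datetime.timedelta(hours=-5))

moment = datetime.datetime(2020, 9, 8, 22, 19, 28, tzinfo=utc)

# Changing values based on time zones
local = moment.astimezone(five_behind)

# Controlling the format of strings
formatted = moment.strftime("%Y-%m-%d %H:%M")

# Finding what weekday it is (Monday is 0)
weekday = moment.weekday()

print(local.hour)  # 17
print(formatted)   # 2020-09-08 22:19
print(weekday)     # 1, a Tuesday
```

None of these operations would make sense on a plain int, which is exactly the information the more specific type communicates.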

Datetimes are more specific than an integer. They convey a more specific use case than just a plain old number. When you choose to use a more specific type, you are telling future contributors that there are operations that are possible and constraints to be aware of that aren’t present in the less specific type.

Let’s dive into how this ties into robust code. Say you inherit a codebase that handles the opening and closing of a completely automated kitchen. You need to add in functionality to extend a kitchen’s hours on holidays.

def close_kitchen_if_past_cutoff_time(point_in_time):
    if point_in_time >= closing_time():
        close_kitchen()
        log_time_closed(point_in_time)

You know you need to be operating on point_in_time, but how do you get started? What type are you even dealing with? Is it a string, integer, datetime, or some custom class? What operations are you allowed to perform on point_in_time? You didn’t write this code, and you have no history with it. The same problems exist if you want to call the code as well. You have no idea what is legal to pass into this function.

If you make a wrong assumption one way or the other, and that code makes it to production, you’ve made the code less robust. Maybe that code doesn’t lie on a codepath that is executed often. Maybe some other bug is hiding this code from being run. Maybe there aren’t a whole lot of tests around this piece of code, and it becomes a runtime error later on. No matter what, there is a bug lurking in the code, and you’ve decreased maintainability.

Responsible developers do their best to not have bugs hit production. They will search for tests, documentation (with a grain of salt, of course — documentation can go out of date quickly), or calling code. They will look at closing_time() and log_time_closed() to see what types they expect or provide, and plan accordingly. This is a correct path in this case, but I still consider it a suboptimal path. While an error won’t reach production, they are still expending time in looking through the code, which prevents value from being delivered as quickly. With such a small example, you would be forgiven for thinking that this isn’t that big a problem if it happens once. But beware of “death by a thousand cuts”: any one slice isn’t too detrimental on its own, but thousands piled up and strewn across a codebase will leave you limping along, trying to deliver code.

The root cause is that the semantic representation was not clear for the parameter. So as you write code, do what you can to express your intent through types. You can do it as a comment where needed, but I recommend using type annotations (supported in Python 3.5+) to explain parts of your code.

def close_kitchen_if_past_cutoff_time(point_in_time: datetime.datetime):
    if point_in_time >= closing_time():
        close_kitchen()
        log_time_closed(point_in_time)

All I need to do is put in a : <type> after my variables in the function signature. Most of my code examples in this book will annotate the types to make it clear what type I’m expecting.
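If you want to run the annotated function, the following sketch fills in the surrounding pieces. The bodies of closing_time, close_kitchen, and log_time_closed, along with the closed_log list and the 22:00 cutoff, are hypothetical stand-ins of mine; the real codebase would supply its own.

```python
import datetime

# Hypothetical stand-ins so the function can run.
closed_log: list = []


def closing_time() -> datetime.datetime:
    # Assumed cutoff of 22:00, purely for illustration.
    return datetime.datetime(2020, 9, 8, 22, 0)


def close_kitchen() -> None:
    pass


def log_time_closed(point_in_time: datetime.datetime) -> None:
    closed_log.append(point_in_time)


def close_kitchen_if_past_cutoff_time(point_in_time: datetime.datetime) -> None:
    if point_in_time >= closing_time():
        close_kitchen()
        log_time_closed(point_in_time)


# Before the cutoff: nothing happens.
close_kitchen_if_past_cutoff_time(datetime.datetime(2020, 9, 8, 21, 30))
# After the cutoff: the kitchen closes and the time is logged.
close_kitchen_if_past_cutoff_time(datetime.datetime(2020, 9, 8, 22, 19))

print(len(closed_log))  # 1
```

With the annotations in place, a reader can tell from the signatures alone that every helper speaks in datetimes.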

Now, as developers come across this code, they will know what’s expected of point_in_time. They don’t have to look through other methods, tests or documentation to know how to manipulate the variable. They have a crystal-clear clue on what to do, and they can get right to work performing the modifications they need to do. You are conveying semantic representation to future developers, without ever directly talking to them.

Furthermore, as developers use a type more and more, they become familiar with it. They won’t need to look up documentation or help() to use that type when they come across it. You begin to create a vocabulary of well-known types across your codebase. This lessens the burden of maintenance. When a developer is modifying existing code, they want to focus on the changes they need to make, without getting bogged down.

Semantic representation of a type is extremely important, and the rest of Part 1 of this book will be dedicated to covering how you can use types to your advantage. Before I move on though, I need to talk about some fundamental choices Python has made as a language, and how they impact codebase robustness.

Discussion Topic

Think about types used in your codebase. Pick a few and ask yourself what their semantic representations are. Enumerate their constraints, use cases and behaviors. Could you be using these types in more places? Are there places where you are misusing types?

Typing Systems

As discussed earlier in the chapter, a type system aims to give a user some way to model the behaviors and constraints in the language. Programming languages set expectations about how their specific type system works, both during code construction and runtime.

Strong vs. Weak

Typing systems are classified on a spectrum from weak to strong.

Languages towards the stronger side of the spectrum tend to restrict the use of operations to the types that support them. In other words, if you break the semantic representation of the type, you are told (sometimes quite loudly) through a compiler error or a runtime error. Languages such as Haskell, TypeScript, and Rust are all considered strongly typed. Proponents advocate strongly typed languages because errors are more apparent when building or running code.

In contrast, languages towards the weaker side of the spectrum will not restrict the use of operations to the types that support them. Types are often coerced into a different type to make sense of an operation. Languages such as JavaScript, Perl, and older versions of C are weakly typed. Proponents tout the speed with which they can iterate on code, without fighting the language along the way.

Python falls towards the stronger side of the spectrum. There are very few implicit conversions that happen between types. It is noticeable when you perform illegal operations:

>>> [] + {}
TypeError: can only concatenate list (not "dict") to list

>>> {} + []
TypeError: unsupported operand type(s) for +: 'dict' and 'list'

Contrast that with a weakly typed language, such as JavaScript:

>>> [] + {}
"[object Object]"

>>> {} + []
0

In terms of robustness, a strongly typed language such as Python certainly helps us out. Errors still show up at runtime rather than at development time, but they show up as a very obvious TypeError exception. This significantly reduces the time taken to debug issues, again allowing you to deliver incremental value more quickly.
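Mixing a string and a number makes the point just as well. In the sketch below, try_add is a helper of my own invention that reports the exception name instead of crashing; the rest is plain Python. JavaScript would quietly produce "11" for the first case, whereas Python makes you state which conversion you meant.

```python
def try_add(left, right):
    """Hypothetical helper: return the sum, or the name of the exception raised."""
    try:
        return left + right
    except TypeError as error:
        return type(error).__name__


# Python refuses to guess how a str and an int should combine.
mixed = try_add("1", 1)

# Stating the conversion explicitly removes the ambiguity.
explicit_number = int("1") + 1
explicit_text = "1" + str(1)

# One of the few implicit conversions: an int is widened to a float.
numeric = 1 + 2.5

print(mixed)            # TypeError
print(explicit_number)  # 2
print(explicit_text)    # 11
print(numeric)          # 3.5
```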

Dynamic vs. Static

There is another typing spectrum I need to discuss: static vs dynamic typing. This is fundamentally a difference in handling mechanical representation of types.

Languages that offer static typing embed their typing information in variables during build time. Developers may explicitly add type information to variables, or some tool such as a compiler may infer types for the developer. Variables do not change their type at runtime (hence the name static). Proponents of static typing tout the ability to write safe code out of the gate and to benefit from a strong safety net.

Dynamic typing, on the other hand, embeds type information with the value itself. Variables can change types at runtime quite easily, because there is no type information tied to the variable. Proponents of dynamic typing tout its flexibility and speed of development; there’s nowhere near as much fighting with compilers.

Python is a dynamically typed language. As you saw during the discussion about mechanical representation, type information is embedded inside the value that a variable refers to. Python has no qualms about changing the type of a variable at runtime:

>>> a = 5
>>> a = "string"
>>> a
'string'

>>> a = tuple()
>>> a
()

Unfortunately, the ability to change types at runtime is a hindrance to robust code in many cases. You cannot make strong assumptions about a variable throughout its lifetime. As assumptions are broken, it’s easy to write unstable assumptions on top of them, leading to a ticking logic bomb in your code.
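Here is a small sketch of how such a broken assumption surfaces. The function add_one and the variable names are hypothetical; the point is only that the failure appears far from the line that changed the type.

```python
def add_one(total):
    # Assumes total is a number; nothing enforces that assumption.
    return total + 1


total = 5
first = add_one(total)

# Somewhere else in the codebase, the same name is rebound to a str.
total = "five"

try:
    second = add_one(total)
except TypeError as error:
    second = f"failed with {type(error).__name__}"

print(first)   # 6
print(second)  # failed with TypeError
```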

To make things worse, the type annotations I showed earlier have no effect on this behavior at runtime:

>>> a: int = 5
>>> a = "string"
>>> a
'string'

No errors, no warnings, no anything. But hope is not lost, and you have plenty of strategies to make code more robust. (Otherwise, this would be quite the short book.) We will discuss one last thing as a contributor to robust code, and then start diving into the meat of improving our codebase.
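Annotated function parameters behave the same way at runtime. In this sketch (the function double is my own example), Python records the annotations on the function object but never consults them when the function is called. A static checker such as mypy is one kind of tool that could flag the call before the code runs.

```python
def double(value: int) -> int:
    return value * 2


# The annotations are recorded on the function object...
hints = double.__annotations__

# ...but nothing checks them when the function is called.
result = double("ab")

print(hints)   # {'value': <class 'int'>, 'return': <class 'int'>}
print(result)  # abab
```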

Duck Typing

I feel like it is some unwritten law that whenever someone mentions duck typing, someone must reply with:

_If it walks like a duck and it quacks like a duck, then it must be a duck._

My problem with this saying is that I find it completely unhelpful for explaining what duck typing actually is. It’s catchy, concise, and crucially, only comprehensible to those who already understand duck typing. When I was younger, I just nodded politely, afraid that I was missing something profound in this simple phrase. It wasn’t until later on that I truly understood the power of duck typing.

Duck typing is the ability to use objects and entities in a programming language as long as they adhere to some interface. It is a wonderful thing in Python, and most people use it without even knowing it. Let’s look at a simple example to illustrate what I’m talking about.

def print_items(items):
    for item in items:
        print(item)

print_items([1,2,3])
print_items({4, 5, 6})
print_items({"A": 1, "B": 2, "C": 3})

In all three invocations of print_items, we loop through the collection and print each item. Think about how this works. print_items has absolutely no knowledge of what type it will receive. It just receives a type at runtime and operates upon it. It’s not introspecting each argument and deciding to do different things based on the type. The truth is much simpler: all print_items is doing is checking that whatever is passed in can be iterated upon (by calling an __iter__ method). If the attribute __iter__ exists, it’s called and the returned iterator is looped over.

We can verify this with a simple code example:

>>> for x in 5:
...     print(x)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'int' object is not iterable

Duck typing is what makes this possible. As long as a type supports the variables and methods expected by a function (based on what’s actually used), you can use that type in that function freely.
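The flip side is that any class of your own can join in, simply by providing __iter__. The Countdown class below is a hypothetical example of mine; it is not a list, set, or dict, yet print_items accepts it without any changes.

```python
def print_items(items):
    for item in items:
        print(item)


class Countdown:
    """Hypothetical class used only for illustration."""

    def __init__(self, start: int):
        self.start = start

    def __iter__(self):
        # Any iterator will do; here it comes from a range.
        return iter(range(self.start, 0, -1))


print_items(Countdown(3))  # prints 3, then 2, then 1

collected = list(Countdown(3))
```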

Here’s another example:

>>> def double_value(value):
...     return value + value

>>> double_value(5)
10

>>> double_value("abc")
'abcabc'

It doesn’t matter that we’re passing an integer in one place or a string in another; both support the + operator, so either will work just fine. Any object that supports the + operator can be passed in. We can even do it with a list:

>>> double_value([1, 2, 3])
[1, 2, 3, 1, 2, 3]
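The same holds for types you write yourself, and the boundary shows up as soon as a type lacks the operator. In this sketch, Money is a hypothetical class of mine that defines __add__, while a dict does not support + at all.

```python
from dataclasses import dataclass


def double_value(value):
    return value + value


@dataclass
class Money:
    """Hypothetical type used only for this illustration."""

    cents: int

    def __add__(self, other: "Money") -> "Money":
        return Money(self.cents + other.cents)


doubled = double_value(Money(250))

# A dict has no + operator, so the same function rejects it.
try:
    double_value({"A": 1})
except TypeError:
    dict_supported = False
else:
    dict_supported = True

print(doubled)         # Money(cents=500)
print(dict_supported)  # False
```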

So how does this play into robustness? It turns out that duck typing is a double-edged sword. It can increase robustness because it increases composability (we’ll learn more about composability in a later chapter). Building up a library of solid abstractions able to handle a multitude of types lessens the need for complex special cases. However, if duck typing is overused, you start to break down assumptions that a developer can rely upon. When updating code, it’s not simple enough to just make the changes; you must look at all calling code and make sure that the types passed into your function satisfy your new changes as well.
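Here is one way that second edge can cut, as a sketch under my own assumptions. Both count_items functions are hypothetical; the revised version looks like a harmless simplification, but it quietly adds a new requirement (support for len()) that not every existing caller satisfies.

```python
def count_items(items):
    # Original version: only requires that items can be iterated.
    return sum(1 for _ in items)


def count_items_revised(items):
    # Revised version: quietly adds a new requirement, support for len().
    return len(items)


def make_lazy_numbers():
    # A generator can be iterated but has no length.
    return (number for number in [1, 2, 3])


original_with_list = count_items([1, 2, 3])
original_with_generator = count_items(make_lazy_numbers())
revised_with_list = count_items_revised([1, 2, 3])

try:
    count_items_revised(make_lazy_numbers())
except TypeError:
    revised_with_generator = "TypeError"
else:
    revised_with_generator = "ok"

print(original_with_list, original_with_generator)  # 3 3
print(revised_with_list, revised_with_generator)    # 3 TypeError
```

Nothing in the function signature warned the caller passing a generator, which is why you have to go and inspect the calling code.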

With all this in mind, it might be best to reword the idiom at the beginning of this section as such:

If it walks like a duck, and quacks like a duck, and you are looking for things that walk and quack like ducks, then you can treat it as if it were a duck

Doesn’t quite roll off the tongue as well, does it?

Discussion Topic

Do you use duck typing in your codebase? Are there places where you can pass in types that don’t match what the code is looking for, but things still work? Do you think these increase or decrease robustness for your use cases?

Wrap-up

Types are a pillar of clean, maintainable code and serve as a communication tool to other developers. If you take care with types, you communicate a great deal, creating less burden for future maintainers. The rest of Part 1 will show you how to use types to enhance a codebase’s robustness.

Remember, Python is dynamically and strongly typed. The strongly typed nature will be a boon for us; Python will notify us about errors when we use incompatible types. But its dynamically typed nature is something we will have to overcome in order to write better code. These language choices shape how Python code is written, and we’ll be referring back to them often throughout the book.

In the next chapter, we’re going to talk about type annotations, which is how we can be explicit about the type we use. Type annotations serve a crucial role: they are our primary method of communicating behaviors to future developers. They help overcome the limitations of a dynamically typed language and allow you to enforce intentions throughout a codebase.
