I learned a painful lesson that for small programs, dynamic typing is great. For large programs you need a more disciplined approach. And it helps if the language gives you that discipline rather than telling you “Well, you can do whatever you want”.1
Guido van Rossum
In 2006, PEP 3107 introduced the function annotation syntax for Python 3.0. For nine years, that syntactic feature had no standard meaning, to allow for experimentation. The Python community tried different ways of using annotations and converged on PEP 484 (Type Hints), which gives the annotations a very specific meaning, supported by the typing module since Python 3.5.
PEP 484 introduced a gradual type system to Python. Besides Microsoft’s TypeScript, other languages with gradual type systems are Dart (the language of the Flutter SDK, created by Google), and Hack (a dialect of PHP supported by Facebook’s HHVM virtual machine). The Mypy type checker itself started as a language: a gradually typed dialect of Python with its own interpreter. Guido van Rossum convinced the creator of Mypy, Jukka Lehtosalo, to make it a tool for checking annotated Python code.
The best usability feature of gradual typing is that annotations are always optional. With static type systems, most type constraints are easy to express, many are cumbersome, some are hard, and a few are impossible.2 You may very well write an excellent piece of Python code, with good test coverage and passing tests, but still be unable to add type hints that satisfy a type checker. That’s ok, just leave out the problematic type hints and ship it!
This chapter focuses on Python’s type hints in function signatures. [Link to Come] explores type hints in the context of classes and other typing module features.
The major topics in this chapter are:
A hands-on introduction to gradual typing with Mypy.
The complementary perspectives of duck typing and nominal typing.
Overview of the main categories of types that can appear in annotations—this is about 60% of the chapter.
Function signature overloading.
Type hinting variadic parameters (*args, **kwargs).
Runtime access to annotations.
Most of the content of this chapter is new. Type hints appeared in Python 3.5 after I wrapped up the first edition of Fluent Python.
In addition to the new content, I moved the section “Reading annotations at runtime” from Chapter 7, because holding type hints is now the only recommended use for __annotations__.
Now let’s review the essence of gradual typing, then see it in practice through an example.
A gradual type system:
Is optional: by default, the type checker should not emit warnings for code that has no type hints. Instead, if a type cannot be inferred, the type checker assumes the Any type, which is consistent with all other types.
Does not catch type errors at runtime: type hints are used by static type checkers, linters, and IDEs to raise warnings; they do not prevent inconsistent values from being passed to functions or assigned to variables at runtime.
Does not enhance performance: type annotations provide data that could help generate optimized byte code or machine code, but such optimizations are not implemented in any Python runtime that I am aware of in early 2020.3
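The second point is easy to observe. In this minimal sketch (the function double is only an illustration), the parameter is annotated as int, yet Python runs a call with a str without complaint, because nothing checks the hint at runtime:

```python
def double(n: int) -> int:
    # The int hints are only metadata for tools such as Mypy;
    # the interpreter does not check them when the function is called.
    return n * 2


print(double(21))    # 42
print(double('ab'))  # abab (a str is accepted without any error)
```

A type checker would flag the second call; the interpreter simply evaluates 'ab' * 2.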
Let’s see how gradual typing works in practice, starting with a simple function and gradually adding type hints to it, guided by Mypy.
There are several Python type checkers compatible with PEP 484, including Google’s Pytype, Microsoft’s Pyright, Facebook’s Pyre—in addition to type checkers embedded in IDEs such as PyCharm. I picked Mypy for the examples because it’s the best known, by far. However, one of the others may be a better fit for some projects or teams. Pytype, for example, is designed to handle code bases with no type hints and still provide useful advice. It is more lenient than Mypy, and can also generate annotations for your code.
We will annotate a show_count function that returns a string with a count and a singular or plural word, depending on the count:
```pycon
>>> show_count(99, 'bird')
'99 birds'
>>> show_count(1, 'bird')
'1 bird'
>>> show_count(0, 'bird')
'no bird'
```
Example 8-1 shows the source code of show_count, without annotations.
show_count from messages.py without type hints.
```python
def show_count(count, word):
    if count == 0:
        return f'no {word}'
    elif count == 1:
        return f'{count} {word}'
    return f'{count} {word}s'
```
To begin type checking, I run the mypy command on the messages.py module:
```
…/no_hints/ $ pip install mypy
[lots of messages omitted...]
…/no_hints/ $ mypy messages.py
Success: no issues found in 1 source file
```
Mypy finds no problem with the code in Example 8-1 when run with its default settings.4
If a function signature has no annotations, Mypy ignores it by default. That’s the spirit of gradual typing.
For this example, I also have pytest unit tests. This is the code in messages_test.py.
messages_test.py without type hints.
```python
from pytest import mark

from messages import show_count


@mark.parametrize('qty, expected', [
    (1, '1 part'),
    (2, '2 parts'),
])
def test_show_count(qty, expected):
    got = show_count(qty, 'part')
    assert got == expected


def test_show_count_zero():
    got = show_count(0, 'part')
    assert got == 'no part'
```
Let’s check messages_test.py:
```
$ mypy messages_test.py
messages_test.py:3: error: Skipping analyzing 'pytest': found module but no type hints or library stubs
messages_test.py:3: note: See https://mypy.readthedocs.io/en/latest/running_mypy.html#missing-imports
Found 1 error in 1 file (checked 1 source file)
```
The problem is that messages_test imports pytest, which doesn’t have type hints as I write this.5
I add this comment to the import line to make Mypy ignore pytest:
```python
from pytest import mark  # type: ignore
```
Now Mypy doesn’t report any issues with messages_test.py.
The command-line option --disallow-untyped-defs makes Mypy flag any function definition that does not have type hints for all its parameters and for its return value. Using --disallow-untyped-defs on the test file produces three errors and a note:
```
…/no_hints/ $ mypy --disallow-untyped-defs messages_test.py
messages.py:14: error: Function is missing a type annotation
messages_test.py:10: error: Function is missing a type annotation
messages_test.py:15: error: Function is missing a return type annotation
messages_test.py:15: note: Use "-> None" if function does not return a value
Found 3 errors in 2 files (checked 1 source file)
```
For the first steps with gradual typing, I prefer to use another option: --disallow-incomplete-defs. Initially, it tells me nothing:
```
…/no_hints/ $ mypy --disallow-incomplete-defs messages_test.py
Success: no issues found in 1 source file
```
Now I can add just the return type to show_count in messages.py:
```python
def show_count(count, word) -> str:
```
This is enough to make Mypy look at it. Using the same command line as before to check messages_test.py leads Mypy to look at messages.py again:
```
…/no_hints/ $ mypy --disallow-incomplete-defs messages_test.py
messages.py:14: error: Function is missing a type annotation for one or more arguments
Found 1 error in 1 file (checked 1 source file)
```
Now I can gradually add type hints function by function, without getting warnings about functions that I haven’t annotated. This is a fully annotated signature that satisfies Mypy:
```python
def show_count(count: int, word: str) -> str:
```
Instead of providing options like --disallow-incomplete-defs on the command line, it’s better to create a configuration file as described in the Mypy configuration file documentation. You can have global settings and per-module settings. Here is a good mypy.ini to get started, which also ignores pytest:
```ini
[mypy]
python_version = 3.8
warn_unused_configs = True
disallow_incomplete_defs = True

[mypy-pytest]
ignore_missing_imports = True
```
The show_count function, first shown in Example 8-1, has an obvious limitation: it only works with regular nouns. If the plural can’t be spelled by appending an 's', we should let the user provide the plural form, like this:
```pycon
>>> show_count(3, 'mouse', 'mice')
'3 mice'
```
Let’s do a little “type driven development”. First we add a test that uses that third argument. Don’t forget to add the return type hint -> None to the test function; otherwise Mypy will not check it.
```python
def test_irregular() -> None:
    got = show_count(2, 'child', 'children')
    assert got == '2 children'
```
Mypy detects the error:
```
…/hints_2/ $ mypy messages_test.py
messages_test.py:22: error: Too many arguments for "show_count"
Found 1 error in 1 file (checked 1 source file)
```
Now I edit show_count, adding the optional plural parameter:
show_count from hints_2/messages.py with an optional parameter.
```python
def show_count(count: int, singular: str, plural: str = '') -> str:
    if count == 0:
        return f'no {singular}'
    elif count == 1:
        return f'1 {singular}'
    else:
        if plural:
            return f'{count} {plural}'
        else:
            return f'{count} {singular}s'
```
The following details are not mandatory, but are considered good style for type hints:
There should be no space between the parameter name and the :, and one space after the :.
There should be one space on each side of the = that precedes the default parameter value.
The PEP 8 style guide has different recommendations for default parameter values, depending on the use of type hints. Here is an actual example from section Other Recommendations in PEP 8:
```python
def munge(input: AnyStr, sep: AnyStr = None, limit=1000):
```
If there is a type hint, there must be spaces around the = preceding the default value. If there is no type hint, there should be no spaces.
Using None as a default
In Example 8-3 the parameter plural is annotated as str, and the default value is '', so there is no type conflict. I like that solution, but in other contexts None is a better default. If the optional parameter expects a mutable type, then None is the only sensible default, as we saw in “Mutable Types as Parameter Defaults: Bad Idea”.
To have None as the default for the plural parameter, here is what the signature would look like:
```python
from typing import Optional

def show_count(count: int, singular: str, plural: Optional[str] = None) -> str:
```
Let’s unpack that:
Optional[str] means plural may be a str or None.
You must explicitly provide the default value = None. If you don’t assign a default value to plural, the Python runtime will treat it as a required parameter. Remember: at runtime, type hints are ignored.
Note that we need to import Optional from the typing module. When importing types, it’s good practice to use the syntax from typing import X, to reduce the length of the function signatures.
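The signature above does not show a body. Here is one way the body could handle a None default; this is my sketch, adapting the hints_2 version, with the truthiness test on plural replaced by an explicit is None check. That check also lets a type checker narrow Optional[str] to str:

```python
from typing import Optional


def show_count(count: int, singular: str, plural: Optional[str] = None) -> str:
    if count == 0:
        return f'no {singular}'
    elif count == 1:
        return f'1 {singular}'
    # No plural given: fall back to the regular plural form.
    if plural is None:
        plural = singular + 's'
    # From here on, the type checker knows plural is a str.
    return f'{count} {plural}'


print(show_count(3, 'mouse', 'mice'))  # 3 mice
print(show_count(2, 'bird'))           # 2 birds
```

With this design only None triggers the regular plural; an explicitly passed empty string is used as is.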
Optional is not a great name, because that annotation does not make the parameter optional. What makes it optional is assigning a default value to the parameter. Optional[str] just means: the type of this parameter may be str or NoneType. In the Haskell and Elm languages, a similar type is named Maybe.
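You can quickly confirm that Optional does not make a parameter optional. In this sketch, greet is a hypothetical function whose name parameter accepts None. Because that parameter has no default value, calling greet() without an argument still raises TypeError:

```python
from typing import Optional


def greet(name: Optional[str]) -> str:
    # Optional[str]: name may be a str or None, but there is no default,
    # so callers must still pass an argument.
    return f'Hello, {name or "stranger"}!'


print(greet(None))   # Hello, stranger!
print(greet('Ana'))  # Hello, Ana!
try:
    greet()  # no default value, so this is a runtime error
except TypeError as exc:
    print('TypeError:', exc)
```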
Companies with large Python 2 codebases have learned that type hints are very helpful when migrating to Python 3. It’s possible to annotate code that will run on Python 2.7 and Python 3.x using special comments described in PEP 484.
This is how the final version of the show_count signature looks using syntax that works in Python 2.7 and 3.x, which is also supported by Mypy and other type checkers:
```python
from typing import Optional

def show_count(count, singular, plural=None):
    # type: (int, str, Optional[str]) -> str
```
Note that only the parameter types appear in the comment.
If the parameter list is too long, the signature may be annotated like this:
```python
from typing import Optional

def show_count(count,       # type: int
               singular,    # type: str
               plural=None  # type: Optional[str]
               ):
    # type: (...) -> str
```
The last type comment would be exactly as shown: the ... replaces the parameter types already given, and the -> str defines the return type.
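The interpreter treats type comments as ordinary comments, so a function annotated this way runs unchanged and stores nothing in __annotations__. The sketch below gives the signature above a body of my own. It uses str.format instead of f-strings so the body could also run on Python 2.7, assuming the typing backport from PyPI is installed there. The final check on __annotations__ is Python 3 only:

```python
from typing import Optional  # on Python 2.7 this assumes the typing backport


def show_count(count, singular, plural=None):
    # type: (int, str, Optional[str]) -> str
    # str.format instead of f-strings keeps the body valid on Python 2.7.
    if count == 0:
        return 'no {}'.format(singular)
    elif count == 1:
        return '1 {}'.format(singular)
    if plural is None:
        plural = singular + 's'
    return '{} {}'.format(count, plural)


# Python 3: type comments are not stored as annotations.
print(show_count.__annotations__)           # {}
print(show_count(2, 'child', 'children'))   # 2 children
```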
For more details, see Suggested syntax for Python 2.7 and straddling code in PEP 484.
Another way of making type hints compatible with Python 2.7 and 3.x is to use stub files, which contain just annotated function and class declarations, much like header files in C and C++. Mypy, PyCharm, and other type checkers know how to read stub files, and they share the typeshed project, a collection of stub files for the Python standard library and popular external packages like Flask, attr, requests, etc.
I will not cover how to create and manage stub files. If you are interested, see the Stub Files section of PEP 484 and PEP 561 (Distributing and Packaging Type Information).
Now that we’ve had a first practical view of gradual typing, let’s consider what the concept of type means in practice.
There are many definitions of the concept of type in the literature. Here we assume that type is a set of values and a set of functions that one can apply to these values.
PEP 483: The Theory of Type Hints
In practice, it’s more useful to consider the set of supported operations as the defining characteristic of a type.6
For example, from the point of view of applicable operations, what are the valid types for n in the following function?
```python
def double(n):
    return n * 2
```
The n parameter type may be numeric (int, complex, Fraction, numpy.uint32, etc.), but it may also be a sequence (str, tuple, list, array), an N-dimensional numpy.array, or any other type that implements or inherits a __mul__ method that accepts an int argument.
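To make the point concrete, this sketch calls the same unannotated double with several of those types. Label is a hypothetical class added only to show that any object with a suitable __mul__ works. The numpy types are left out to keep the example free of third-party dependencies:

```python
from array import array
from fractions import Fraction


def double(n):
    return n * 2


class Label:
    """Hypothetical user-defined class that supports label * int."""

    def __init__(self, text):
        self.text = text

    def __mul__(self, times):
        return Label(self.text * times)


print(double(3))                 # 6
print(double(Fraction(1, 3)))    # 2/3
print(double('ab'))              # abab
print(double([1, 2]))            # [1, 2, 1, 2]
print(double(array('i', [7])))   # array('i', [7, 7])
print(double(Label('hi')).text)  # hihi
```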
However, consider this annotated double. Please ignore the missing return type for now; let’s focus on the parameter type:
```python
from collections import abc

def double(n: abc.Sequence):
    return n * 2
```
A type checker will reject that code. If you tell Mypy that n is of type abc.Sequence, it will flag n * 2 as an error because the Sequence ABC does not implement or inherit the __mul__ method. At runtime, that code will work with concrete sequences such as str, tuple, list, array, etc., as well as numbers, because at runtime the type hints are ignored. But the type checker only cares about what is explicitly declared, and abc.Sequence has no __mul__.
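You can inspect the mismatch between the checker’s view and the runtime view directly. This sketch repeats the annotated double, then checks whether the declared type and a concrete type define __mul__:

```python
from collections import abc


def double(n: abc.Sequence):
    return n * 2


# The checker's view: the declared type has no __mul__.
print(hasattr(abc.Sequence, '__mul__'))  # False
# The runtime view: the actual argument types do support *.
print(hasattr(list, '__mul__'))          # True
print(double([1, 2]))                    # [1, 2, 1, 2]
print(double(10))                        # 20 (the hint is not enforced)
```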
That’s why the title of this section is “Types are defined by operations”. The Python runtime accepts any object as the n argument for both versions of the double function. The computation n * 2 may work, or it may raise TypeError if the operation is not supported by n. In contrast, Mypy will declare n * 2 as wrong while analyzing the annotated double source code, because it’s an unsupported operation for the declared type: n: abc.Sequence.
In a gradual type system, we have the interplay of two different views of types:
Duck typing: the view adopted by Smalltalk (the pioneering OO language), as well as Python and Ruby. Objects have types, but variables (including parameters) are untyped. In practice, it doesn’t matter what the declared type of the object is, only what operations it actually supports. If I can invoke birdie.quack(), then birdie is a duck in this context. By definition, duck typing is only enforced at runtime, when operations on objects are attempted. This is more flexible than nominal typing, at the cost of allowing more errors at runtime.7
Nominal typing: the view adopted by C++, Java, and C#, supported by annotated Python. Objects and variables have types. But objects only exist at runtime, and the type checker only cares about the source code where variables (including parameters) are annotated with type hints. If Duck is a subclass of Bird, you can assign a Duck instance to a parameter annotated as birdie: Bird. But in the body of the function, the type checker considers the call birdie.quack() illegal, because birdie is nominally a Bird, even if at runtime it’s actually a Duck. Nominal typing is enforced statically, before the program is run. This is more rigid than duck typing, with the advantage of catching some bugs earlier in a build pipeline, or even as the code is typed in an IDE.
Here is a silly example that contrasts duck typing and nominal typing, as well as static type checking and runtime behavior8:
birds.py
```python
class Bird:
    pass

class Duck(Bird):
    def quack(self):
        print('Quack!')


def alert(birdie):
    birdie.quack()

def alert_duck(birdie: Duck) -> None:
    birdie.quack()

def alert_bird(birdie: Bird) -> None:
    birdie.quack()
```
Duck is a subclass of Bird.
alert has no type hints, so the type checker ignores it.
alert_duck takes one argument of type Duck.
alert_bird takes one argument of type Bird.
Type checking birds.py with Mypy, we see a problem:
```
…/birds/ $ mypy birds.py
birds.py:16: error: "Bird" has no attribute "quack"
Found 1 error in 1 file (checked 1 source file)
```
Just by analyzing the source code, Mypy sees that alert_bird is problematic: the type hint declares the birdie parameter with type Bird, but the body of the function calls birdie.quack(), and the Bird class has no such method.
Now let’s try to use the birds module in daffy.py:
daffy.py
```python
from birds import *

daffy = Duck()
alert(daffy)
alert_duck(daffy)
alert_bird(daffy)
```
Valid call, because alert has no type hints.
Valid call, because alert_duck takes a Duck argument, and daffy is a Duck.
Valid call, because alert_bird takes a Bird argument, and daffy is also a Bird, since Bird is the superclass of Duck.
Running Mypy on daffy.py raises the same error about the quack call in the alert_bird function defined in birds.py:
```
…/birds/ $ mypy daffy.py
birds.py:16: error: "Bird" has no attribute "quack"
Found 1 error in 1 file (checked 1 source file)
```
But Mypy sees no problem with daffy.py itself: the three function calls are OK.
Now, if you run daffy.py, this is what you get:
```
…/birds/ $ python3 daffy.py
Quack!
Quack!
Quack!
```
Everything works! Duck typing FTW!
At runtime, Python doesn’t care about declared types. It uses duck typing only.
Mypy flagged an error in alert_bird, but calling it with daffy works fine at runtime.
This may surprise many Pythonistas at first: a static type checker will sometimes find errors in programs that we know will execute.
However, if months from now you are tasked with extending the silly bird example, you may be grateful for Mypy. Consider this woody.py module, which also uses birds:
woody.py
```python
from birds import *

woody = Bird()
alert(woody)
alert_duck(woody)
alert_bird(woody)
```
Mypy finds two errors while checking woody.py:
```
…/birds/ $ mypy woody.py
birds.py:16: error: "Bird" has no attribute "quack"
woody.py:5: error: Argument 1 to "alert_duck" has incompatible type "Bird"; expected "Duck"
Found 2 errors in 2 files (checked 1 source file)
```
The first error is in birds.py: the birdie.quack() call in alert_bird, which we’ve seen before. The second error is in woody.py: woody is an instance of Bird, so the call alert_duck(woody) is invalid because that function requires a Duck. Every Duck is a Bird, but not every Bird is a Duck.
At runtime, none of the calls in woody.py succeed. Given that woody is a Bird:
alert(woody) fails, and Mypy could not help us because there are no type hints in alert.
alert_duck(woody) fails, and Mypy saw the problem: Argument 1 to "alert_duck" has incompatible type "Bird"; expected "Duck".
alert_bird(woody) fails, and Mypy has been telling us since Example 8-4 that the body of the alert_bird function is wrong: "Bird" has no attribute "quack".
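You can reproduce these runtime failures in a single script. This sketch inlines birds.py and adds outcome, a hypothetical helper that catches the AttributeError raised when quack is missing:

```python
class Bird:
    pass

class Duck(Bird):
    def quack(self):
        print('Quack!')


def alert(birdie):
    birdie.quack()

def alert_duck(birdie: Duck) -> None:
    birdie.quack()

def alert_bird(birdie: Bird) -> None:
    birdie.quack()


def outcome(func, birdie):
    """Hypothetical helper: call func(birdie) and report what happened."""
    try:
        func(birdie)
    except AttributeError as exc:
        return f'AttributeError: {exc}'
    return 'OK'


woody = Bird()
for func in (alert, alert_duck, alert_bird):
    print(func.__name__, '->', outcome(func, woody))
```

Passing a Duck to any of the three functions works. Passing a Bird fails in all three, whatever the annotations say, because the runtime only cares whether quack exists.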
This little experiment shows that duck typing is easier to get started with and is more flexible, but allows unsupported operations to cause errors at runtime. Nominal typing detects errors before runtime, but sometimes can reject code that actually runs, such as the call alert_bird(daffy) in Example 8-5. Even if it sometimes works, the alert_bird function is misnamed: its body does require an object that supports the .quack() method, which Bird doesn’t have.
In this silly example, the functions are one-liners. But in real code they could be longer, they could pass the birdie argument to more functions, and the origin of the birdie argument could be many frames away, making it hard to pinpoint the cause of a runtime error. The type checker prevents many such errors from ever happening at runtime.
The value of type hints is questionable in the tiny examples that fit in a book. The benefits grow with the size of the codebase. That’s why companies with millions of lines of Python code—like Dropbox, Google, and Facebook—invested in teams and tools to support the company-wide adoption of type hints, and have significant and increasing portions of their Python codebases type checked in their CI pipelines.
In this section we explored the relationship of types and operations in duck typing and nominal typing, starting with the simple double() function, which we left without proper type hints. Now we will tour the most important types used for annotating functions. We’ll see a good way to add type hints to double() when we reach “Protocols”. But before we get to that, there are more fundamental types to know.
Pretty much any Python type can be used in type hints, but there are restrictions and recommendations. In addition, the typing module introduced special constructs with semantics that are sometimes surprising.
This section covers all the major types you can use with annotations:
typing.Any;
Simple types and classes;
typing.Optional and typing.Union;
Generic collections, including tuples and mappings;
typing.TypedDict, for type hinting dicts used as records;
Abstract Base Classes, and a few you should not use;
Generic iterables;
Parameterized generics and the TypeVar class;
typing.Protocol, the key to static duck typing;
typing.Callable;
typing.NoReturn, a good way to end this list.
We’ll cover each of these in turn, starting with a type that is strange, apparently useless, but crucially important.
The Any type
The keystone of any gradual type system is the Any type, also known as the dynamic type.
When a type checker sees an untyped function like this:
```python
def double(n):
    return n * 2
```
It assumes this:
```python
from typing import Any

def double(n: Any) -> Any:
    return n * 2
```
That means the n argument and the return value can be of any type, including different types. Any is assumed to support every possible operation.
Contrast Any with object. Consider this signature:
```python
def double(n: object) -> object:
```
This function also accepts arguments of every type, because every type is a subtype of object. However, a type checker will reject this function:
```python
def double(n: object) -> object:
    return n * 2
```
The problem is that object does not support the __mul__ operation. This is what Mypy reports:
```
…/birds/ $ mypy double_object.py
double_object.py:2: error: Unsupported operand types for * ("object" and "int")
Found 1 error in 1 file (checked 1 source file)
```
More general types have narrower interfaces, i.e., they support fewer operations. The object class implements fewer operations than abc.Sequence, which implements fewer operations than abc.MutableSequence, which implements fewer operations than list.
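One rough way to see this chain is to compare the public method names that each type exposes. This sketch uses public_names, a hypothetical helper. Because it ignores special methods such as __mul__, it only approximates the full set of supported operations:

```python
from collections import abc


def public_names(cls):
    """Names in dir(cls) that do not start with an underscore."""
    return {name for name in dir(cls) if not name.startswith('_')}


chain = [object, abc.Sequence, abc.MutableSequence, list]
for cls in chain:
    print(f'{cls.__name__}: {sorted(public_names(cls))}')

# Each type in the chain exposes a strict subset of the next one's methods.
for general, specific in zip(chain, chain[1:]):
    assert public_names(general) < public_names(specific)
```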
But Any is a magic type that sits at the top and the bottom of the type hierarchy. It’s simultaneously the most general type, so that an argument n: Any accepts values of every type, and the most specialized type, supporting every possible operation. At least, that’s how the type checker understands Any.
Of course, no type can support every possible operation, so using Any prevents the type checker from fulfilling its core mission: detecting potentially illegal operations before your program crashes with a runtime exception.
Traditional object-oriented nominal type systems rely on the is-subtype-of relationship. Given a class C1 and a subclass C2, then C2 is-subtype-of C1.
Consider this code:
```python
class C1:
    ...

class C2(C1):
    ...

def f1(p: C1) -> None:
    ...

o2 = C2()

f1(o2)  # OK
```
The call f1(o2) is an application of the Liskov Substitution Principle (LSP). Barbara Liskov9 actually defined is-subtype-of in terms of supported operations: if an object of type C2 substitutes an object of type C1 and the program still behaves correctly, then C2 is-subtype-of C1.
Continuing from the previous code, this shows a violation of the LSP:
```python
def f2(p: C2) -> None:
    ...

o1 = C1()

f2(o1)  # type error
```
From the point of view of supported operations, this makes perfect sense: as a subclass, C2 inherits and must support all operations that C1 does. So an instance of C2 can be used anywhere an instance of C1 is expected. But the reverse is not necessarily true: C2 may implement additional methods, so an instance of C1 may not be used everywhere an instance of C2 is expected.
This focus on supported operations is reflected in the name behavioral subtyping, also used to refer to the LSP.
In a gradual type system, there is another relationship: is-consistent-with, which applies wherever is-subtype-of applies, except when the type Any is involved.
The rules for is-consistent-with are:
1. Given T1 and a subtype T2, then T2 is-consistent-with T1 (Liskov substitution).
2. Every type is-consistent-with Any: you can pass objects of every type to an argument declared of type Any.
3. Any is-consistent-with every type: you can always pass an object of type Any where an argument of another type is expected.
Considering the previous definitions of the objects o1 and o2, here are examples of valid code, illustrating rules #2 and #3:
```python
from typing import Any

def f3(p: Any) -> None:
    ...

o0 = object()
o1 = C1()
o2 = C2()

f3(o0)  # OK: rule #2
f3(o1)  # OK: rule #2
f3(o2)  # OK: rule #2

def f4():  # implicit return type: `Any`
    ...

o4 = f4()  # inferred type: `Any`

f1(o4)  # OK: rule #3
f2(o4)  # OK: rule #3
f3(o4)  # OK: rule #3
```
Every gradual type system needs a wildcard type like Any.
Now we can explore the rest of the types used in annotations.
Simple types like int, float, str, and bytes may be used directly in type hints. Concrete classes from the standard library, external packages, or user defined (FrenchDeck, Vector2d, and Duck) may also be used in type hints.
Abstract Base Classes are also useful in type hints. We’ll get back to them as we study collection types, and in “Abstract Base Classes”.
Among classes, is-consistent-with is defined like is-subtype-of: a subclass is consistent with all its superclasses. However, “practicality beats purity”, so there is an important exception:
int is consistent with complex
There is no nominal subtype relationship between the built-in types int, float, and complex: they are direct subclasses of object. But PEP 484 declares that int is-consistent-with float, and float is-consistent-with complex. It makes sense in practice: int implements all operations that float does, and int implements additional ones as well, such as bitwise operations like &, |, <<, etc. The end result is: int is-consistent-with complex. For i = 3, i.real is 3, and i.imag is 0.
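You can check these facts at runtime. In this sketch, half is a hypothetical function annotated with float. The interpreter accepts the int argument, just as PEP 484 lets a type checker accept it:

```python
def half(x: float) -> float:
    return x / 2


i = 3
print(issubclass(int, float), issubclass(float, complex))  # False False
print(i.real, i.imag)                                       # 3 0
print(half(i))                                              # 1.5
print(hasattr(int, '__and__'), hasattr(float, '__and__'))  # True False
```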
Optional and Union types
We saw the Optional special type in “Using None as a default”. It solves the problem of having None as a default, as in this example from that section:
```python
from typing import Optional

def show_count(count: int, singular: str, plural: Optional[str] = None) -> str:
```
The construct Optional[str] is actually a shortcut for Union[str, None], which means the type of plural may be str or None.
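The equivalence is also visible at runtime, where the typing module builds equal objects for both spellings. This sketch uses typing.get_args, which is available since Python 3.8:

```python
from typing import Optional, Union, get_args

print(Optional[str] == Union[str, None])  # True
print(get_args(Optional[str]))            # (<class 'str'>, <class 'NoneType'>)
```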
The ord built-in function’s signature is a simple example of Union: it accepts str or bytes, and returns an int:10
```python
def ord(c: Union[str, bytes]) -> int: ...
```
There are functions that accept a str or bytes argument but return str if the argument was str, or bytes if the argument was bytes. In those cases, the return type is determined by the input type, so Union is not an accurate solution. To properly annotate such functions, we need a type variable (presented in “Parameterized generics and TypeVar”) or overloading, which we’ll see in “Overloaded signatures”.
Here is an example of a function that takes a str, but may return a str or a float:
```python
from typing import Union

def parse_token(token: str) -> Union[str, float]:
    try:
        return float(token)
    except ValueError:
        return token
```
If possible, avoid creating functions that return Union types, as they put an extra burden on the user, forcing them to check the type of the returned value at runtime to know what to do with it. But parse_token above is a reasonable use case in the context of a simple expression evaluator.
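To show the extra burden on the caller, here is parse_token again, along with sum_numbers, a hypothetical function that consumes its result. The isinstance check is needed both at runtime and to let the type checker narrow Union[str, float] to float:

```python
from typing import Union


def parse_token(token: str) -> Union[str, float]:
    try:
        return float(token)
    except ValueError:
        return token


def sum_numbers(text: str) -> float:
    """Hypothetical caller: add up the numeric tokens in text."""
    result = 0.0
    for token in text.split():
        value = parse_token(token)
        # The caller must find out which member of the Union it got.
        if isinstance(value, float):
            result += value  # here the checker knows value is a float
    return result


print(parse_token('3.5'))      # 3.5
print(parse_token('+'))        # +
print(sum_numbers('1.5 + 2'))  # 3.5
```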
Union[] requires at least two types. Nested Union types have the same effect as a flattened Union. So this type hint:
```python
Union[A, B, Union[C, D, E]]
```
is the same as:
```python
Union[A, B, C, D, E]
```
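The typing module applies the same flattening at runtime, so the two spellings compare equal. Member order does not affect equality either. This sketch uses typing.get_args (Python 3.8 or later):

```python
from typing import Union, get_args

nested = Union[int, str, Union[float, bytes]]
flat = Union[int, str, float, bytes]

print(nested == flat)    # True
print(get_args(nested))  # (<class 'int'>, <class 'str'>, <class 'float'>, <class 'bytes'>)
print(Union[int, str] == Union[str, int])  # True
```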
Union is more useful with types that are not consistent with each other. For example, Union[int, float] is rarely useful as a type hint, because int is-consistent-with float. Usually it’s better to use float to annotate the parameter; then it will accept int values as well.
A new syntax for Union is under consideration: instead of Union[str, float], we could write str | float. See PEP 604 (Complementary syntax for Union[]). The status of PEP 604 is “Draft” as I write this. It was intended for Python 3.9, but missed the feature freeze for beta 1 and now is slated for 3.10.
Most Python collections are heterogeneous: for example, you can put any mixture of different types in a list
. However, in practice that is not very useful: if you put objects in a list, you are likely to want to operate on them later, and usually this means they share at least one common method.11
With Python’s type hints, a collection can be annotated with a generic type to constrain the type of the elements in the collection. For example:
```python
def tokenize(text: str) -> list[str]:
    return text.upper().split()
```
In Python 3.9, that means tokenize returns a list where all items are str. However, in Python 3.7 and 3.8, list and the other built-in collections support that notation only with a __future__ import:
```python
from __future__ import annotations

def tokenize(text: str) -> list[str]:
    return text.upper().split()
```
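The __future__ import works on 3.7 and 3.8 because, under PEP 563, annotations are stored as strings and are not evaluated when the function is defined, so list[str] is never computed. This sketch shows what ends up in __annotations__:

```python
from __future__ import annotations


def tokenize(text: str) -> list[str]:
    return text.upper().split()


# With postponed evaluation, annotations are kept as plain strings.
print(tokenize.__annotations__)  # {'text': 'str', 'return': 'list[str]'}
print(tokenize('to be or not'))  # ['TO', 'BE', 'OR', 'NOT']
```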
Sadly, that __future__ import does not work with Python 3.5 or 3.6, nor is it supported by Mypy 0.770, which I am using as I write this. So this is how to annotate tokenize in a way that works with Python ≥ 3.5 and Mypy in May 2020:
```python
from typing import List

def tokenize(text: str) -> List[str]:
    return text.upper().split()
```
Note that you need to import the List type from the typing module. To annotate a list that can hold any type of object, the type hint would be List[Any]. That’s the same as writing just List, or even list.
Besides List, the typing module defines dozens of types that are derived from existing standard library classes with the added feature of supporting generic type notation with [].
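You can inspect the link between a typing generic and the concrete class it stands for with typing.get_origin and typing.get_args, both added in Python 3.8. A short sketch:

```python
from typing import List, Set, get_args, get_origin

print(get_origin(List[str]))  # <class 'list'>
print(get_args(List[str]))    # (<class 'str'>,)
print(get_origin(Set[int]))   # <class 'set'>
```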
Table 8-1 lists the collections that are not mappings.
I’d rather use the built-in collections as generics, but I chose to import the generic collections from the typing module because when this book is released most readers will probably be using Python 3.8 or earlier.
collection | type hint equivalent |
---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Python 3.9 implements PEP 585 (Type Hinting Generics In Standard Collections), which means the collection types in the left column of Table 8-1 support the generic [] notation, so you can write set[str] and don’t need to import typing.Set. With the release of 3.9, the type hint equivalents in the right column of Table 8-1 become redundant and are deprecated. Type checkers are expected to warn about this deprecation. To minimize runtime impact, Python itself will issue no warnings, and the type hint equivalents will only be removed from the typing module five years after Python 3.9 is released.
As of May 2020, there is no good way to annotate array.array taking into account the typecode constructor argument, which determines whether integers or floats are stored in the array.12
Tuple
There are three ways to annotate tuple types. The first is using typing.Tuple and specifying the type of each field in the tuple. For example, to accept a tuple with city name, population, and country, such as ('Shanghai', 24.28, 'China'), the type hint would be Tuple[str, float, str].
Consider a function that takes a pair of geographic coordinates and returns a Geohash, used like this:
>>> shanghai = 31.2304, 121.4737
>>> geohash(shanghai)
'wtw3sjq6q'
This is how geohash is defined, using the geolib package from PyPI:
coordinates.py with the geohash function.
from typing import Tuple

from geolib import geohash as gh  # type: ignore

PRECISION = 9

def geohash(lat_lon: Tuple[float, float]) -> str:
    return gh.encode(*lat_lon, PRECISION)
The second way is using typing.NamedTuple, as seen in Chapter 5. Here is a variation of Example 8-7 with NamedTuple:
coordinates_named.py with the NamedTuple Coordinate and the geohash function.
from typing import Tuple, NamedTuple

from geolib import geohash as gh  # type: ignore

PRECISION = 9

class Coordinate(NamedTuple):
    lat: float
    lon: float

def geohash(lat_lon: Coordinate) -> str:
    return gh.encode(*lat_lon, PRECISION)

def display(lat_lon: Tuple[float, float]) -> str:
    lat, lon = lat_lon
    ns = 'N' if lat >= 0 else 'S'
    ew = 'E' if lon >= 0 else 'W'
    return f'{abs(lat):0.1f}°{ns}, {abs(lon):0.1f}°{ew}'
As explained in “Overview of data class builders”, typing.NamedTuple is a factory for tuple subclasses, so Coordinate is-consistent-with Tuple[float, float], but the reverse is not true: after all, Coordinate has extra methods added by NamedTuple, like ._asdict(), and could also have user-defined methods.
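To see that relationship in running code, here is a small sketch that reuses the Coordinate class and the display function shown above. It leaves out geohash so it runs without installing geolib; the coordinates are just sample values.

```python
from typing import NamedTuple, Tuple


class Coordinate(NamedTuple):
    lat: float
    lon: float


def display(lat_lon: Tuple[float, float]) -> str:
    lat, lon = lat_lon
    ns = 'N' if lat >= 0 else 'S'
    ew = 'E' if lon >= 0 else 'W'
    return f'{abs(lat):0.1f}°{ns}, {abs(lon):0.1f}°{ew}'


shanghai = Coordinate(31.2304, 121.4737)

# Coordinate is a tuple subclass, so it is accepted where Tuple[float, float] is expected
print(display(shanghai))
# A plain tuple is accepted by display as well...
print(display((-23.5, -46.6)))
# ...but only the NamedTuple offers extras such as ._asdict()
print(shanghai._asdict())
```

The opposite direction is where a type checker such as Mypy should object: a plain (float, float) tuple passed to a parameter annotated as Coordinate would be reported as an incompatible argument.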
The third way is for annotating tuples of unspecified length that are used as immutable lists: you must specify a single type, followed by a comma and ... (that’s Python’s ellipsis token, made of three periods, not the Unicode character U+2026 HORIZONTAL ELLIPSIS). For example, the type for a tuple with int elements is Tuple[int, ...]. The ellipsis indicates that any number of elements is acceptable, including zero: the empty tuple () is also a valid Tuple[int, ...]. There is no way to specify fields of different types for tuples of unspecified length.
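A minimal sketch contrasting the first and third forms; the names city and total are hypothetical. The fixed-length hint pins down the type of each position, while Tuple[int, ...] accepts int tuples of any length.

```python
from typing import Tuple

# First form: one type per position, so exactly three fields
city: Tuple[str, float, str] = ('Shanghai', 24.28, 'China')


def total(values: Tuple[int, ...]) -> int:
    """Add up a homogeneous tuple of ints of any length."""
    return sum(values)


print(total((1, 2, 3)))
print(total((42,)))
print(total(()))  # the empty tuple is also consistent with Tuple[int, ...]
```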
Here is a columnize function that transforms a sequence into a table of rows and cells in the form of a list of tuples with unspecified lengths. This is useful to display items in columns, like this:
>>> animals = 'drake fawn heron ibex koala lynx tahr xerus yak zapus'.split()
>>> table = columnize(animals)
>>> table
[('drake', 'koala', 'yak'), ('fawn', 'lynx', 'zapus'), ('heron', 'tahr'),
 ('ibex', 'xerus')]
>>> for row in table:
...     print(''.join(f'{word:10}' for word in row))
...
drake     koala     yak
fawn      lynx      zapus
heron     tahr
ibex      xerus
Example 8-9 shows the implementation of columnize. Note the return type, List[Tuple[str, ...]].
columnize.py returns a list of tuples of strings.
from typing import Sequence, List, Tuple

def columnize(sequence: Sequence[str], num_columns: int = 0) -> List[Tuple[str, ...]]:
    if num_columns == 0:
        # default to a roughly square table
        num_columns = round(len(sequence) ** .5)
    num_rows, remainder = divmod(len(sequence), num_columns)
    num_rows += bool(remainder)  # one more row for the leftover items
    # row i takes every num_rows-th item, starting at index i
    return [tuple(sequence[i::num_rows]) for i in range(num_rows)]
The annotations Tuple[Any, ...], Tuple, and tuple mean the same thing.
In codebases supporting only Python 3.9 or later, the recommended signature for Example 8-9 is:
from collections.abc import Sequence

def columnize(sequence: Sequence[str], num_columns: int = 0) -> list[tuple[str, ...]]:
Note there is no typing import. The list and tuple built-ins are used, as well as collections.abc.Sequence instead of typing.Sequence.
PEP 585 – Type Hinting Generics In Standard Collections also affects the use of typing.Tuple, which is deprecated in Python 3.9 and will be removed five years after that release.
Generic mapping types are annotated as MappingType[KeyType, ValueType]. For example, a JSON object must have string keys, but the values can be anything, so this would be written as Dict[str, Any].
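As a small illustration of the MappingType[KeyType, ValueType] form, here is a hypothetical word_count function whose return type is Dict[str, int]: every key is a str and every value is an int.

```python
from typing import Dict


def word_count(text: str) -> Dict[str, int]:
    """Map each whitespace-separated word to the number of times it appears."""
    counts: Dict[str, int] = {}
    for word in text.split():
        counts[word] = counts.get(word, 0) + 1
    return counts


print(word_count('to be or not to be'))
```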
Example 8-10 shows a practical use of a function returning an inverted index to search Unicode characters by name: a variation of Example 4-21 more suitable for server-side code that we’ll study in [Link to Come]. Given starting and ending Unicode character codes, name_index returns a Dict[str, Set[str]], which is an inverted index mapping each word to a set of characters that have that word in their names. For example, after indexing ASCII characters from 32 to 64, here are the sets of characters mapped to the words 'SIGN' and 'DIGIT', and how to find the character named 'DIGIT EIGHT':
>>> index = name_index(32, 65)
>>> index['SIGN']
{'$', '>', '=', '+', '<', '%', '#'}
>>> index['DIGIT']
{'8', '5', '6', '2', '3', '0', '1', '4', '7', '9'}
>>> index['DIGIT'] & index['EIGHT']
{'8'}
Below is the source code for charindex.py with the name_index function. Besides a Dict[] type hint, this example has three features appearing for the first time in the book.
charindex.py
import sys
import re
import unicodedata
from typing import Dict, Set, Iterator

RE_WORD = re.compile(r'\w+')
STOP_CODE = sys.maxunicode + 1

def tokenize(text: str) -> Iterator[str]:
    """return iterable of uppercased words"""
    for match in RE_WORD.finditer(text):
        yield match.group().upper()

def name_index(start: int = 32, end: int = STOP_CODE) -> Dict[str, Set[str]]:
    index: Dict[str, Set[str]] = {}
    for char in (chr(i) for i in range(start, end)):
        if name := unicodedata.name(char, ''):
            for word in tokenize(name):
                index.setdefault(word, set()).add(char)
    return index
tokenize is a generator function. [Link to Come] is about generators.
The local variable index is annotated. Without the hint, Mypy says: error: Need type annotation for 'index' (hint: "index: Dict[<type>, <type>] = ..."). We’ll come back to variable type hints in [Link to Come].
I used the walrus operator := in the if condition. It assigns the result of the unicodedata.name() call to name, and the whole expression evaluates to that result. When the result is '', that’s falsy and the index is not updated.13
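To make the walrus step concrete, this sketch builds the same kind of index twice, once with := and once with a separate assignment. The two function names are hypothetical, and they split names on whitespace instead of calling tokenize, just to stay short.

```python
import unicodedata
from typing import Dict, Set


def index_with_walrus(start: int, end: int) -> Dict[str, Set[str]]:
    index: Dict[str, Set[str]] = {}
    for char in (chr(i) for i in range(start, end)):
        if name := unicodedata.name(char, ''):  # assign and test in one step
            for word in name.split():
                index.setdefault(word, set()).add(char)
    return index


def index_without_walrus(start: int, end: int) -> Dict[str, Set[str]]:
    index: Dict[str, Set[str]] = {}
    for char in (chr(i) for i in range(start, end)):
        name = unicodedata.name(char, '')  # separate assignment...
        if name:  # ...followed by the truthiness test
            for word in name.split():
                index.setdefault(word, set()).add(char)
    return index


print(index_with_walrus(32, 65)['SIGN'])
```

Control characters such as chr(0) have no name, so unicodedata.name returns the '' default and neither version indexes them.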
The typing module also defines several mapping types and view types, listed in Table 8-2.
Table 8-2. Mapping and view types (left column) and their type hint equivalents from the typing module (right column).
With the release of Python 3.9, the collection types in the left column of Table 8-2 support the generic [] notation, and the type hint equivalents on the right are deprecated and will be removed five years later.
To instantiate an annotated defaultdict, a variable type hint is required for Python 3.8 and earlier, because typing.DefaultDict cannot be called to construct an object, and collections.defaultdict does not accept the generic syntax. Here is an example:
from collections import defaultdict
from typing import DefaultDict, List

my_dict: DefaultDict[str, List[int]] = defaultdict(list)
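Here is a short sketch of such an annotated defaultdict in use; group_by_length is a hypothetical helper chosen only for the demo.

```python
from collections import defaultdict
from typing import DefaultDict, List


def group_by_length(words: List[str]) -> DefaultDict[int, List[str]]:
    # The variable annotation states the key and value types of the new defaultdict
    groups: DefaultDict[int, List[str]] = defaultdict(list)
    for word in words:
        groups[len(word)].append(word)  # a missing key starts as an empty list
    return groups


print(dict(group_by_length(['fawn', 'yak', 'lynx', 'tahr', 'ibex'])))
```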
This is a rather long section for one specific type. However, exploring TypedDict with Mypy illustrates important points about gradual typing in Python. In particular, it’s tempting to use TypedDict to protect against errors while handling dynamic data structures like JSON API responses. But the examples here make clear that correct handling of JSON is essentially a matter of runtime validation, not static type checking.
The mapping types we just saw require all values to have the same type.
However, Python dictionaries are often used as records, with field names as keys. For example, consider a record describing a book in JSON or Python:
{"isbn": "0134757599",
 "title": "Refactoring, 2e",
 "authors": ["Martin Fowler", "Kent Beck"],
 "pagecount": 478
}
Before Python 3.8, there was no good way to annotate a record like that. Here are two possibilities, both unsatisfactory:
Dict[str, Any]
The values may be of any type.
Dict[str, Union[str, int, List[str]]]
Hard to read, and doesn’t preserve the relationship between field names and field types: title is supposed to be a str; it can’t be an int or a List[str].
Python 3.8 solved that problem by implementing PEP 589 – TypedDict: Type Hints for Dictionaries with a Fixed Set of Keys. Here is a simple TypedDict:
books.py: the BookDict definition.
from typing import TypedDict, List
import json

class BookDict(TypedDict):
    isbn: str
    title: str
    authors: List[str]
    pagecount: int
At first glance, typing.TypedDict may seem like a data class builder, similar to the @dataclass decorator or typing.NamedTuple, both covered in Chapter 5. The syntactic similarity is misleading. TypedDict is very different. It exists only for the benefit of type checkers, and has no runtime effect.
TypedDict provides two things:
Class-like syntax to annotate a dict with type hints for the value of each “field”.
A constructor that tells the type checker to expect a dict with the keys and values as specified.
At runtime, a TypedDict constructor such as BookDict is a placebo: it has the same effect as calling the dict constructor with the same arguments.
The fact that BookDict creates a plain dict also means that:
The “fields” in the pseudo-class definition don’t create instance attributes.
You can’t write initializers with default values for the “fields”.
Method definitions are not allowed.
Let’s explore the behavior of a BookDict at runtime.
Using a BookDict, but not quite as intended.
>>> from books import BookDict
>>> pp = BookDict(title='Programming Pearls',
...               authors='Jon Bentley',
...               isbn='0201657880',
...               pagecount=256)
>>> pp
{'title': 'Programming Pearls', 'authors': 'Jon Bentley', 'isbn': '0201657880',
 'pagecount': 256}
>>> type(pp)
<class 'dict'>
>>> pp.title
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'dict' object has no attribute 'title'
>>> pp['title']
'Programming Pearls'
>>> BookDict.__annotations__
{'isbn': <class 'str'>, 'title': <class 'str'>, 'authors': typing.List[str],
 'pagecount': <class 'int'>}
You can call BookDict like a dict constructor with keyword arguments, or pass a dict argument, including a dict literal.
Oops… I forgot that authors takes a list. But gradual typing means no type checking at runtime.
The result of calling BookDict is a plain dict…
… therefore you can’t read the data using object.field notation.
To get the type hints at runtime, read BookDict.__annotations__.
Without a type checker, TypedDict is as useful as comments: it may help people read the code, but that’s it. In contrast, the class builders from Chapter 5 are useful even if you don’t use a type checker, because at runtime they generate or enhance a custom class that you can instantiate. They also provide several useful methods or functions listed in Table 5-1.
Example 8-13 builds a valid BookDict and tries some operations on it. This shows how TypedDict enables Mypy to catch errors, shown in Example 8-14.
demo_books.py: legal and illegal operations on a BookDict.
from books import BookDict
from typing import TYPE_CHECKING

def demo() -> None:
    book = BookDict(
        isbn='0134757599',
        title='Refactoring, 2e',
        authors=['Martin Fowler', 'Kent Beck'],
        pagecount=478
    )
    authors = book['authors']
    if TYPE_CHECKING:
        reveal_type(authors)
    authors = 'Bob'
    book['weight'] = 4.2
    del book['title']


if __name__ == '__main__':
    demo()
Remember to add a return type, so that Mypy doesn’t ignore the function.
This is a valid BookDict: all the keys are present, with values of the correct types.
Mypy will infer the type of authors from the annotation for the 'authors' key in BookDict.
typing.TYPE_CHECKING is only True when the program is being type checked. At runtime, it’s always False.
The previous if statement prevents reveal_type(authors) from being called at runtime. reveal_type is not a runtime Python function, but a debugging facility provided by Mypy. That’s why there is no import for it. See its output in Example 8-14.
The last three lines of the demo function are illegal. They will cause error messages in Example 8-14.
When we type check demo_books.py from Example 8-13, this is what we get:
demo_books.py.
…/typedict/ $ mypy demo_books.py
demo_books.py:13: note: Revealed type is 'builtins.list[builtins.str]'
demo_books.py:14: error: Incompatible types in assignment (expression has type "str", variable has type "List[str]")
demo_books.py:15: error: TypedDict "BookDict" has no key 'weight'
demo_books.py:16: error: Key 'title' of TypedDict "BookDict" cannot be deleted
Found 3 errors in 1 file (checked 1 source file)
This note is the result of reveal_type(authors).
The type of the authors variable was inferred from the type of the book['authors'] expression that initialized it. You can’t assign a str to a variable of type List[str]. Type checkers usually don’t allow the type of a variable to change.14
Cannot assign to a key that is not part of the BookDict definition.
Cannot delete a key that is part of the BookDict definition.
Now let’s see BookDict used in function signatures, to type check function calls. Imagine you need to generate XML from book records, similar to this:
<BOOK>
    <ISBN>0134757599</ISBN>
    <TITLE>Refactoring, 2e</TITLE>
    <AUTHOR>Martin Fowler</AUTHOR>
    <AUTHOR>Kent Beck</AUTHOR>
    <PAGECOUNT>478</PAGECOUNT>
</BOOK>
If you were writing MicroPython code to embed in a tiny microcontroller, you might write a function like this:15
books.py: to_xml function.
AUTHOR_EL = '<AUTHOR>{}</AUTHOR>'

def to_xml(book: BookDict) -> str:
    elements: List[str] = []
    for key, value in book.items():
        if isinstance(value, list):
            elements.extend(
                AUTHOR_EL.format(n) for n in value)
        else:
            tag = key.upper()
            elements.append(f'<{tag}>{value}</{tag}>')
    xml = '\n\t'.join(elements)
    return f'<BOOK>\n\t{xml}\n</BOOK>'
The whole point of the example: using BookDict in the function signature.
It’s often necessary to annotate collections that start empty, otherwise Mypy can’t infer the type of the elements.16
Mypy understands isinstance checks, and treats value as a list in this block.
When I used key == 'authors' as the condition for the if guarding this block, Mypy found an error in this line: "object" has no attribute "__iter__", because it inferred the type of value returned from book.items() as object, which doesn’t support the __iter__ method required by the generator expression. With the isinstance check, this works because Mypy knows that value is a list in this block.
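The same narrowing can be seen in a smaller, hypothetical function. Inside the if block, a type checker such as Mypy treats value as a list, so iterating over it is accepted; outside that block, value is only known to be an object.

```python
from typing import List


def flatten_field(value: object) -> List[str]:
    if isinstance(value, list):
        # Narrowed to list here, so iterating over value type checks
        return [str(item) for item in value]
    # Here value is only an object, but str() works on any object
    return [str(value)]


print(flatten_field(['Martin Fowler', 'Kent Beck']))
print(flatten_field(478))
```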
Here is a function that parses a JSON str and returns a BookDict:
books_any.py: from_json function.
def from_json(data: str) -> BookDict:
    whatever = json.loads(data)
    return whatever
The return type of json.loads() is Any.17
I can return whatever (of type Any) because Any is consistent with every type, including the declared return type, BookDict.
The second point of Example 8-16 is very important to keep in mind: Mypy will not flag any problem in this code, but at runtime the value in whatever may not conform to the BookDict structure; in fact, it may not be a dict at all!
If you run Mypy with --disallow-any-expr, it will complain about the two lines in the body of from_json:
…/typedict/ $ mypy books_any.py --disallow-any-expr
books.py:30: error: Expression has type "Any"
books.py:31: error: Expression has type "Any"
Found 2 errors in 1 file (checked 1 source file)
In this case, the type error can be silenced by adding a type hint to the initialization of the whatever variable, as in Example 8-17:
books.py: from_json function with variable annotation.
def from_json(data: str) -> BookDict:
    whatever: BookDict = json.loads(data)
    return whatever
--disallow-any-expr does not cause errors when an expression of type Any is immediately assigned to a variable with a type hint.
Now whatever is of type BookDict, the declared return type.
Don’t be lulled into a false sense of type safety by Example 8-17! Looking at the code at rest, the type checker cannot predict that json.loads() will return anything that resembles a BookDict. Only runtime validation can guarantee that.
Static type checking is unable to prevent errors with code that is inherently dynamic, such as json.loads(), which builds Python objects of different types at runtime. Example 8-18, Example 8-19, and Example 8-20 demonstrate this.
demo_not_book.py: from_json returns an invalid BookDict, and to_xml accepts it.
from books import to_xml, from_json
from typing import TYPE_CHECKING

def demo() -> None:
    NOT_BOOK_JSON = """
        {"title": "Andromeda Strain",
         "flavor": "pistachio",
         "authors": true}
    """
    not_book = from_json(NOT_BOOK_JSON)
    if TYPE_CHECKING:
        reveal_type(not_book)
        reveal_type(not_book['authors'])

    print(not_book)
    print(not_book['flavor'])

    xml = to_xml(not_book)
    print(xml)


if __name__ == '__main__':
    demo()
This line does not produce a valid BookDict: see the content of NOT_BOOK_JSON.
Let’s have Mypy reveal a couple of types.
This should not be a problem, as print can handle object and every subtype.
BookDict has no 'flavor' key, but the JSON source does… what will happen?
Remember the signature: def to_xml(book: BookDict) -> str:
What will the XML output look like?
Checking demo_not_book.py with Mypy:
demo_not_book.py
, reformatted for clarity.…
/
typedict
/
$
mypy
demo_not_book
.
py
demo_not_book
.
py
:
12
:
note
:
Revealed
type
is
'
TypedDict(
'
books
.
BookDict
'
, {
'
isbn
'
: built-ins.str,
'
title
'
:
built
-
ins
.
str
,
'
authors
'
:
built
-
ins
.
list
[
built
-
ins
.
str
]
,
'
pagecount
'
:
built
-
ins
.
int
}
)
'
demo_not_book
.
py
:
13
:
note
:
Revealed
type
is
'
built-ins.list[built-ins.str]
'
demo_not_book
.
py
:
16
:
error
:
TypedDict
"
BookDict
"
has
no
key
'
flavor
'
Found
1
error
in
1
file
(
checked
1
source
file
)
The revealed type is the nominal type, not the runtime content of not_book.
Again, this is the nominal type of not_book['authors'], as defined in BookDict, not the runtime type.
This error is for line print(not_book['flavor']): that key does not exist in the nominal type.
Now let’s run demo_not_book.py.
demo_not_book.py.
…/typedict/ $ python3 demo_not_book.py
{'title': 'Andromeda Strain', 'flavor': 'pistachio', 'authors': True}
pistachio
<BOOK>
	<TITLE>Andromeda Strain</TITLE>
	<FLAVOR>pistachio</FLAVOR>
	<AUTHORS>True</AUTHORS>
</BOOK>
This is not really a BookDict.
The value of not_book['flavor'].
to_xml takes a BookDict argument, but at runtime it’s more flexible: garbage in, garbage out.
Example 8-20 shows that demo_not_book.py outputs nonsense, but has no runtime errors. Using a TypedDict while handling JSON data did not provide much type safety.
If you look at the code for to_xml in Example 8-15 through the lens of duck typing, the argument book must provide an .items() method that returns an iterable of tuples like (key, value) where:
key must have an .upper() method;
value can be anything.
The point of this demonstration: when handling data with a dynamic structure, such as JSON or XML, TypedDict is absolutely not a replacement for data validation at runtime. If you are interested in runtime validation of JSON data against schemas declared with type annotations, check out the pydantic package on PyPI.
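To make "runtime validation" concrete without a third-party dependency, here is a hand-written sketch. It is not how pydantic works; validate_book and from_json_checked are hypothetical names, and the checks mirror the BookDict definition from Example 8-11. The final cast only informs the type checker: the isinstance checks before it are what actually justify it.

```python
import json
from typing import Any, List, TypedDict, cast


class BookDict(TypedDict):
    isbn: str
    title: str
    authors: List[str]
    pagecount: int


BOOK_KEYS = {'isbn', 'title', 'authors', 'pagecount'}


def validate_book(obj: Any) -> BookDict:
    """Return obj typed as BookDict, or raise ValueError if its shape is wrong."""
    if not isinstance(obj, dict):
        raise ValueError(f'expected a JSON object, got {type(obj).__name__}')
    if set(obj) != BOOK_KEYS:
        raise ValueError(f'expected keys {sorted(BOOK_KEYS)}, got {sorted(obj)}')
    if not (isinstance(obj['isbn'], str) and isinstance(obj['title'], str)):
        raise ValueError('isbn and title must be strings')
    authors = obj['authors']
    if not (isinstance(authors, list) and all(isinstance(a, str) for a in authors)):
        raise ValueError('authors must be a list of strings')
    pagecount = obj['pagecount']
    # bool is a subclass of int, so JSON true/false must be rejected explicitly
    if not isinstance(pagecount, int) or isinstance(pagecount, bool):
        raise ValueError('pagecount must be an integer')
    return cast(BookDict, obj)  # no runtime effect; the checks above did the work


def from_json_checked(data: str) -> BookDict:
    return validate_book(json.loads(data))
```

With this version, the NOT_BOOK_JSON input from Example 8-18 raises ValueError at the boundary instead of flowing into to_xml.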
TypedDict has more features, including support for optional keys, a limited form of inheritance, and an alternative declaration syntax for Python versions before 3.6. These will be covered in [Link to Come], section [Link to Come].
Be conservative in what you send, be liberal in what you accept.
Postel’s law, a.k.a. the Robustness Principle
Table 8-1 and Table 8-2 list several abstract classes from collections.abc. Ideally, a function should accept arguments of those abstract types (or their typing equivalents before Python 3.9) and not concrete types. This gives more flexibility to the caller.
Consider this function signature:
def name2hex(name: str, color_map: Mapping[str, int]) -> str:
Using typing.Mapping allows the caller to provide an instance of dict, defaultdict, ChainMap, a UserDict subclass, or any other type that is a subtype of Mapping.
In contrast, consider this signature:
def name2hex(name: str, color_map: Dict[str, int]) -> str:
Now color_map must be a dict or one of its subtypes, such as defaultdict or OrderedDict. In particular, a subclass of collections.UserDict would not pass the type check for color_map, despite being the recommended way to create user-defined mappings, as we saw in “Subclassing UserDict”. Mypy would reject a UserDict or an instance of a class derived from it, because UserDict is not a subclass of dict; they are siblings, since both are subclasses of abc.MutableMapping.18
Therefore, in general it’s better to use typing.Mapping or typing.MutableMapping instead of dict or typing.Dict as a parameter type. If the name2hex function doesn’t need to mutate the given color_map, the most accurate type hint for color_map is typing.Mapping. That way, the caller doesn’t need to provide an object that implements methods like setdefault, pop, and update, which are part of the MutableMapping interface, but not of Mapping. This has to do with the second part of Postel’s law: “be liberal in what you accept”.
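The text gives only the signature of name2hex, so the body in this sketch is an assumption: it treats each color value as a 24-bit RGB integer. ColorMap is a hypothetical UserDict subclass, showing that a mapping which is not a dict still satisfies the Mapping[str, int] parameter.

```python
from collections import UserDict
from typing import Mapping


def name2hex(name: str, color_map: Mapping[str, int]) -> str:
    """Format the color value for name as a hex string such as '#FF0000'."""
    return f'#{color_map[name]:06X}'


class ColorMap(UserDict):
    """A user-defined mapping: a MutableMapping, but not a dict subclass."""


colors = {'red': 0xFF0000, 'teal': 0x008080}
print(name2hex('red', colors))             # a dict is a Mapping
print(name2hex('teal', ColorMap(colors)))  # so is a UserDict subclass
```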
Postel’s law also tells us to be conservative in what we send. The return value of a function is always a concrete object, so the return type hint should be a concrete type, as in the example from “Generic collections”, which uses list[str]. That assumes the code will run on Python 3.9 or later; otherwise, the typing equivalent List[str] should be used.
def tokenize(text: str) -> list[str]:
    return text.upper().split()
Under the entry of typing.List, the Python documentation says:
Generic version of list. Useful for annotating return types. To annotate arguments it is preferred to use an abstract collection type such as Sequence or Iterable.
A similar comment appears in the entries for typing.Dict and typing.Set.
Remember that most ABCs from collections.abc and other concrete classes from collections, as well as built-in collections, support generic type hint notation like collections.deque[str] starting with Python 3.9. The corresponding typing collections will only be needed to support code written in Python 3.8 or earlier. The full list of classes that became generic appears in the “Implementation” section of PEP 585 – Type Hinting Generics In Standard Collections.
To wrap up our discussion of ABCs in type hints, we need to talk about the numbers ABCs.
The numbers module has been a little-known corner of the standard library since it appeared in Python 2.6. It defines a hierarchy of ABCs with Number at the top, then Complex, Real, Rational, and Integral. Those ABCs allow isinstance checks independent of specific implementations. For example, isinstance(x, numbers.Real) is True for x of type float, but also for NumPy types like float32, longdouble, etc.
The section “The numeric tower” of PEP 484 rejects the numbers ABCs and handles the built-in types complex, float, and int as special cases, as explained in “int is consistent with complex”. Mypy does not support the use of the numbers ABCs in type hints.19
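The isinstance side of that story is easy to demonstrate. numeric_kind is a hypothetical helper that annotates its parameter as object rather than with a numbers ABC, in line with the Mypy limitation just mentioned. Note that Decimal is registered only as a Number, not as a Real.

```python
import numbers
from decimal import Decimal
from fractions import Fraction

# Most specific ABC first: every Integral is also Rational, Real, Complex, and Number
NUMBER_ABCS = (numbers.Integral, numbers.Rational, numbers.Real,
               numbers.Complex, numbers.Number)


def numeric_kind(x: object) -> str:
    """Name the most specific numbers ABC that x is an instance of (runtime check)."""
    for cls in NUMBER_ABCS:
        if isinstance(x, cls):
            return cls.__name__
    return 'not a number'


for value in (42, Fraction(1, 3), 1.5, 2 + 3j, Decimal('1.1'), '7'):
    print(repr(value), numeric_kind(value))
```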
The typing.List documentation I just quoted recommends Sequence and Iterable for function parameter type hints. One example of an Iterable argument appears in the math.fsum function from the standard library:
def fsum(__seq: Iterable[float]) -> float:
As of Python 3.8, the standard library has very few annotations, but the Typeshed project has stub files for it. The signature for math.fsum is in /stdlib/2and3/math.pyi.
The leading underscores in __seq are a PEP 484 convention explained in “Annotating positional-only and variadic parameters”.
Example 8-21 is another example using an Iterable parameter that produces items that are Tuple[str, str]. Here is how the function is used:
>>> l33t = [('a', '4'), ('e', '3'), ('i', '1'), ('o', '0')]
>>> text = 'mad skilled noob powned leet'
>>> from replacer import zip_replace
>>> zip_replace(text, l33t)
'm4d sk1ll3d n00b p0wn3d l33t'
And here is how it’s implemented:
replacer.py
from typing import Iterable, Tuple

FromTo = Tuple[str, str]

def zip_replace(text: str, changes: Iterable[FromTo]) -> str:
    for from_, to in changes:
        text = text.replace(from_, to)
    return text
FromTo is a type alias: I assigned Tuple[str, str] to FromTo, to make the signature of zip_replace more readable.
changes needs to be an Iterable[FromTo]; that’s the same as Iterable[Tuple[str, str]], but shorter and easier to read.
PEP 613 – Explicit Type Aliases introduces a special type, TypeAlias, to make the assignments that create type aliases more visible and easier to type check. PEP 613 is approved but was not implemented in Python 3.9 before the feature freeze in May 2020. When TypeAlias lands in the typing module, we’ll use it like this:
from typing import TypeAlias, Tuple

FromTo: TypeAlias = Tuple[str, str]
Iterable versus Sequence
Both math.fsum and replacer.zip_replace must iterate over the entire Iterable arguments to return a result. Given an endless iterable as input, such as the iterator returned by itertools.cycle, these functions would never return: they would keep consuming items until the process is interrupted or runs out of memory.
Despite this potential danger, it is fairly common in modern Python to offer functions that accept an Iterable input even if they must process it completely to return a result. That gives the caller the option of providing input data as a generator instead of a pre-built sequence, potentially saving a lot of memory if the number of input items is large.
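Because zip_replace from Example 8-21 only iterates over changes, it can be fed lazy sources of pairs. This sketch copies the function and passes it two different iterators; the sample strings are arbitrary.

```python
from typing import Iterable, Tuple

FromTo = Tuple[str, str]


def zip_replace(text: str, changes: Iterable[FromTo]) -> str:
    for from_, to in changes:
        text = text.replace(from_, to)
    return text


# zip() returns a lazy iterator of pairs, not a list, and that satisfies Iterable[FromTo]
print(zip_replace('mad skilled noob powned leet', zip('aeio', '4310')))
# A generator expression works too
print(zip_replace('leet', ((c, c.upper()) for c in 'et')))
```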
On the other hand, the columnize function from Example 8-9 needs a Sequence parameter, and not an Iterable, because it must get the len() of the input to decide the number of rows.
Like Sequence, Iterable is best used as a parameter type. It’s too vague as a return type. A function should be more precise about the concrete type it returns.
Closely related to Iterable is the Iterator type, used as a return type in Example 8-10. We’ll get back to it in [Link to Come], which is about generators and classic iterators.
TypeVar
A parameterized generic is a generic type, written as List[T], where T is a type variable that will be bound to a specific type with each usage. This allows a parameter type to be reflected in the result type.
Example 8-22 defines sample, a function that takes two arguments: a Sequence of elements of type T, and an int. It returns a List of elements of the same type T, picked at random from the first argument.
Here are two examples to illustrate the behavior of sample:
If called with a tuple of type Tuple[int, ...] (which is-consistent-with Sequence[int]), then the type parameter is int, so the return type is List[int];
If called with a str (which is-consistent-with Sequence[str]), then the type parameter is str, so the return type is List[str].
This is the implementation:
sample.py
from random import shuffle
from typing import Sequence, List, TypeVar

T = TypeVar('T')

def sample(population: Sequence[T], size: int) -> List[T]:
    if size < 1:
        raise ValueError('size must be >= 1')
    result = list(population)  # copy, so the caller's sequence is not shuffled
    shuffle(result)
    return result[:size]
Why is TypeVar needed?
Python’s parser doesn’t recognize parameterized generic type notation as a special case, so the name T in the example must be introduced in the local namespace by calling the typing.TypeVar constructor.
You may have studied other languages, such as Java, C#, or TypeScript, that support parameterized generics. These languages don’t require the symbol for a type variable to be declared beforehand, so they have no equivalent of Python’s TypeVar class.
Another example is the statistics.mode function from the standard library, which returns the most common data point from a series. Here is one usage example from the documentation:
>>> mode([1, 1, 2, 3, 3, 3, 3, 4])
3
Without using a TypeVar, mode could have this signature:
mode_float.py: mode that operates on float and subtypes.20
from collections import Counter
from typing import Iterable

def mode(data: Iterable[float]) -> float:
    # most_common(1) returns a list with at most one (item, count) pair
    pairs = Counter(data).most_common(1)
    if len(pairs) == 0:
        raise ValueError('no mode for empty data')
    return pairs[0][0]
Many uses of mode involve int or float values, but Python has other numerical types, and it is desirable that the return type follows the element type of the given Iterable. We can improve that signature using TypeVar. Let’s start with a simple but wrong parameterized signature:
from typing import Iterable, TypeVar

T = TypeVar('T')

def mode(data: Iterable[T]) -> T:
When it first appears in the signature, the type parameter T can be any type. The second time it appears, it will mean the same type as the first. Therefore, every iterable is-consistent-with Iterable[T], including iterables of unhashable types that collections.Counter cannot handle. We need to restrict the possible types assigned to T. We’ll see two ways of doing that in the next two sections.
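The problem is easy to reproduce. This sketch pairs the unconstrained signature with the body of Example 8-23: a type checker is expected to accept a list of lists as an Iterable[T], yet Counter fails at runtime because lists are unhashable.

```python
from collections import Counter
from typing import Iterable, TypeVar

T = TypeVar('T')


def mode(data: Iterable[T]) -> T:
    # The signature accepts any iterable, but Counter needs hashable elements
    pairs = Counter(data).most_common(1)
    if len(pairs) == 0:
        raise ValueError('no mode for empty data')
    return pairs[0][0]


print(mode([1, 1, 2]))
try:
    mode([[1, 2], [1, 2]])  # passes the static check with an unconstrained T...
except TypeError as exc:
    print('runtime failure:', exc)  # ...but lists are unhashable
```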
TypeVar with constraints
TypeVar accepts extra positional arguments to constrain the type parameter. So the signature can be improved like this, to accept more number types:
from typing import Iterable, TypeVar
from decimal import Decimal
from fractions import Fraction

NumberT = TypeVar('NumberT', float, Decimal, Fraction)

def mode(data: Iterable[NumberT]) -> NumberT:
That’s better than before, and it was the signature for mode in the statistics.pyi stub file on typeshed on May 25, 2020. However, the statistics.mode documentation includes this example:
>>> mode(["red", "blue", "blue", "red", "green", "red", "red"])
'red'
In a hurry, we could just add str to the NumberT definition:
NumberT = TypeVar('NumberT', float, Decimal, Fraction, str)
That certainly works, but NumberT is badly misnamed if it accepts str. More importantly, we can’t keep listing types forever as we realize mode can deal with them. We can do better with another feature of TypeVar, introduced next.
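For reference, here is a runnable version of the constrained signature, with the body copied from Example 8-23. Given Fraction inputs, a type checker solves NumberT as Fraction, so the result is typed as a Fraction. The constraints only matter to the type checker; at runtime this function would still accept strings.

```python
from collections import Counter
from decimal import Decimal
from fractions import Fraction
from typing import Iterable, TypeVar

NumberT = TypeVar('NumberT', float, Decimal, Fraction)


def mode(data: Iterable[NumberT]) -> NumberT:
    pairs = Counter(data).most_common(1)
    if len(pairs) == 0:
        raise ValueError('no mode for empty data')
    return pairs[0][0]


half = mode([Fraction(1, 2), Fraction(1, 3), Fraction(1, 2)])
print(half, type(half).__name__)
print(mode([Decimal('1.5'), Decimal('2'), Decimal('1.5')]))
```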
TypeVar
Looking at the body of mode in Example 8-23, we see that the Counter class is used for ranking. Counter is based on dict, so the element type of the data iterable must be hashable. At first, this signature may seem to work:
from typing import Iterable, Hashable

def mode(data: Iterable[Hashable]) -> Hashable:
Now the problem is that the type of the returned item is Hashable: an ABC whose only method is __hash__. So the type checker will not let us do anything with the return value except call hash() on it. Not very useful.
The solution is another optional parameter of TypeVar: the bound keyword parameter. It sets an upper bound for the acceptable types. In Example 8-24, we have bound=Hashable, which means the type parameter may be Hashable or any subtype of it.21
mode_hashable.py: same as Example 8-23, with a more flexible signature.
from collections import Counter
from typing import Iterable, Hashable, TypeVar

HashableT = TypeVar('HashableT', bound=Hashable)

def mode(data: Iterable[HashableT]) -> HashableT:
    pairs = Counter(data).most_common(1)
    if len(pairs) == 0:
        raise ValueError('no mode for empty data')
    return pairs[0][0]
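The payoff of bound=Hashable shows up at the call site. This sketch repeats Example 8-24 and then uses the result: because HashableT is solved as str for a list of str, a type checker accepts calling .upper() on the return value, which it would reject if the return type were plain Hashable. The sample data is arbitrary.

```python
from collections import Counter
from typing import Hashable, Iterable, TypeVar

HashableT = TypeVar('HashableT', bound=Hashable)


def mode(data: Iterable[HashableT]) -> HashableT:
    pairs = Counter(data).most_common(1)
    if len(pairs) == 0:
        raise ValueError('no mode for empty data')
    return pairs[0][0]


colors = ["red", "blue", "blue", "red", "green", "red", "red"]
# HashableT is solved as str here, so str methods are allowed on the result
print(mode(colors).upper())
# Tuples are hashable too; for this call the result type is the tuple type
print(mode([(1, 'a'), (2, 'b'), (1, 'a')]))
```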
The typing.TypeVar constructor has other optional parameters, covariant and contravariant, that we’ll cover in [Link to Come], [Link to Come].
Let’s conclude this introduction to TypeVar with AnyStr.
AnyStr predefined type variable
The typing module includes a predefined TypeVar named AnyStr. It’s defined like this:
AnyStr = TypeVar('AnyStr', bytes, str)
AnyStr is used in many functions that accept either bytes or str, and return values of the given type.
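A minimal sketch with a hypothetical concat function: both arguments must be str or both must be bytes, and the return type follows whichever one the caller used.

```python
from typing import AnyStr


def concat(a: AnyStr, b: AnyStr) -> AnyStr:
    # AnyStr is solved once per call, so a and b must share the same type
    return a + b


print(concat('type', 'hints'))
print(concat(b'type', b'hints'))
# concat('type', b'hints') would be rejected by a type checker,
# and at runtime str + bytes raises TypeError anyway
```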
Now, on to typing.Protocol, a new feature of Python 3.8 that can support more Pythonic use of type hints.
In object-oriented programming, the concept of a “protocol” as an informal interface is as old as Smalltalk, and has been an essential part of Python from the beginning. However, in the context of type hints, a protocol is a typing.Protocol subclass defining an interface that a type checker can verify. Both kinds of protocols are covered in [Link to Come]. This is just a brief introduction in the context of function annotations.
The Protocol type as presented in PEP 544—Protocols: Structural subtyping (static duck typing) is similar to interfaces in Go: a protocol type is defined by specifying one or more methods, and the type checker verifies that those methods are implemented where that protocol type is required.
In Python, a protocol definition is written as a typing.Protocol subclass. However, classes that implement a protocol don’t need to inherit, register, or declare any relationship with the class that defines the protocol. It’s up to the type checker to find the available protocol types and enforce their usage.
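Here is a small sketch of that idea. All the names are hypothetical: SupportsClose is the protocol, Resource is an unrelated class, and close_all only needs objects that have a close method. Resource never mentions SupportsClose, yet a type checker such as Mypy should accept it, because its close method matches the one declared in the protocol.

```python
from typing import Iterable, Protocol

class SupportsClose(Protocol):
    def close(self) -> None: ...

class Resource:  # no inheritance from, or registration with, SupportsClose
    def __init__(self, name: str) -> None:
        self.name = name
        self.closed = False

    def close(self) -> None:
        self.closed = True

def close_all(items: Iterable[SupportsClose]) -> None:
    for item in items:
        item.close()

resources = [Resource('db'), Resource('log')]
close_all(resources)  # OK for a type checker: Resource has a matching close()
print([r.closed for r in resources])  # [True, True]
```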
Here is a problem that can be solved with the help of Protocol and TypeVar.
Suppose you want to create a function top(it, n) that returns the largest n elements of the iterable it:
>>> top([4, 1, 5, 2, 6, 7, 3], 3)
[7, 6, 5]
>>> l = 'mango pear apple kiwi banana'.split()
>>> top(l, 3)
['pear', 'mango', 'kiwi']
>>>
>>> l2 = [(len(s), s) for s in l]
>>> l2
[(5, 'mango'), (4, 'pear'), (5, 'apple'), (4, 'kiwi'), (6, 'banana')]
>>> top(l2, 3)
[(6, 'banana'), (5, 'mango'), (5, 'apple')]
A parameterized generic top would look like this:
top function with an undefined T type parameter.
def top(series: Iterable[T], length: int) -> List[T]:
    ordered = sorted(series, reverse=True)
    return ordered[:length]
The problem is how to constrain T. It cannot be Any or object, because the series must work with sorted.
The sorted built-in actually accepts Iterable[Any], but that’s because the optional argument key takes a function that computes an arbitrary sort key from each element. What happens if you don’t provide key and give a list of plain objects to sorted? Let’s try that:
>>> l = [object() for _ in range(4)]
>>> l
[<object object at 0x10fc2fca0>, <object object at 0x10fc2fbb0>,
<object object at 0x10fc2fbc0>, <object object at 0x10fc2fbd0>]
>>> sorted(l)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: '<' not supported between instances of 'object' and 'object'
That’s interesting: sorted needs the < operator on the elements of the iterable. Is this all it takes? Let’s do another quick experiment:
>>> class Spam:
...     def __init__(self, n): self.n = n
...     def __lt__(self, other): return self.n < other.n
...     def __repr__(self): return f'Spam({self.n})'
...
>>> l = [Spam(n) for n in range(5, 0, -1)]
>>> l
[Spam(5), Spam(4), Spam(3), Spam(2), Spam(1)]
>>> sorted(l)
[Spam(1), Spam(2), Spam(3), Spam(4), Spam(5)]
That confirms it: I can sort a list of Spam because Spam implements __lt__, the special method that supports the < operator.22
So the T type parameter in Example 8-25 should be limited to types that implement __lt__. In Example 8-24 we needed a type parameter that implemented __hash__, so we were able to use typing.Hashable as the upper bound for the type parameter. But now there is no suitable type in typing or abc to use, so we need to create it.
Here is the new Comparable type, a Protocol:
comparable.py: definition of a Comparable Protocol type:
from typing import Protocol, Any
class Comparable(Protocol):
    def __lt__(self, other: Any) -> bool: ...
A protocol is a subclass of typing.Protocol. The body of the protocol has one or more method definitions, with ... in their bodies.
A type T is-consistent-with a protocol P if T implements all the methods defined in P, with matching type signatures.
Given Comparable, we can now define this working version of top:
top.py: definition of the top function using a TypeVar with bound=Comparable:
from typing import TypeVar, Iterable, List
from comparable import Comparable
CT = TypeVar('CT', bound=Comparable)
def top(series: Iterable[CT], length: int) -> List[CT]:
    ordered = sorted(series, reverse=True)
    return ordered[:length]
Let’s test-drive top. Example 8-28 shows part of a test suite for use with pytest. It tries calling top with a generator expression that yields Tuple[int, str] items, and then with a list of object. With the list of object, we expect to get a TypeError exception.
top_test.py: partial listing of pytest tests
from typing import TYPE_CHECKING, Iterator, Tuple
import pytest
from top import top
def test_top_tuples() -> None:
    fruit = 'mango pear apple kiwi banana'.split()
    series: Iterator[Tuple[int, str]] = ((len(s), s) for s in fruit)
    length = 3
    expected = [(6, 'banana'), (5, 'mango'), (5, 'apple')]
    result = top(series, length)
    if TYPE_CHECKING:  # reveal_type exists only for the type checker
        reveal_type(series)
        reveal_type(expected)
        reveal_type(result)
    assert result == expected
def test_top_objects_error() -> None:
    series = [object() for _ in range(4)]
    if TYPE_CHECKING:
        reveal_type(series)
    with pytest.raises(TypeError) as exc:
        top(series, 3)
    assert "'<' not supported" in str(exc.value)
The above tests pass, but they would pass anyway, even without type hints in top.py. More to the point, if I check that test file with Mypy, this is what I get:
…/comparable/ $ mypy top_test.py
top_test.py:27: note: Revealed type is 'typing.Iterator[Tuple[builtins.int, builtins.str]]'
top_test.py:28: note: Revealed type is 'builtins.list[Tuple[builtins.int, builtins.str]]'
top_test.py:29: note: Revealed type is 'builtins.list[Tuple[builtins.int, builtins.str]]'
top_test.py:35: note: Revealed type is 'builtins.list[builtins.object*]'
top_test.py:37: error: Value of type variable "CT" of "top" cannot be "object"
Found 1 error in 1 file (checked 1 source file)
The type check shows that the TypeVar is working as intended. In test_top_tuples, reveal_type confirms that the top call returns the type we expected: given an Iterator[Tuple[int, str]], we got List[Tuple[int, str]]. In test_top_objects_error, reveal_type shows that the series argument type is List[object], and Mypy flags the error: the element type of the series Iterable cannot be object.
A key advantage of a protocol type over ABCs is that a type needs no nominal connection with a specific protocol type to be consistent with it. I don’t need to derive or register str, tuple, float, set, etc. with Comparable to be able to use them where a Comparable parameter is expected. They only need to implement __lt__. And the type checker will still be able to do its job, because Comparable is explicitly defined as a Protocol, in contrast with the implicit protocols that are common with duck typing, which are invisible to the type checker.
The special Protocol class was introduced in PEP 544—Protocols: Structural subtyping (static duck typing).
Example 8-27 demonstrates why this feature is known as static duck typing: the solution to annotate the series parameter of top was to say “The nominal type of series doesn’t matter, as long as it implements __lt__.” Python’s duck typing always allowed us to say that implicitly, but that made the job of type checkers much harder. A type checker can’t read CPython’s source code in C, or perform console experiments to find out that sorted only requires that the elements support <. Now we are able to express this in code that the type checker can read. That’s why it makes sense to say that typing.Protocol gives us static duck typing.23
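As a self-contained sketch of static duck typing with a user-defined class, the code below combines the Comparable protocol, the top function, and a version of the Spam class from the earlier console experiment, now with type hints. Spam has no nominal link to Comparable. Implementing __lt__ is what makes it acceptable as CT, and at runtime sorted relies only on <, even with reverse=True.

```python
from typing import Any, Iterable, List, Protocol, TypeVar

class Comparable(Protocol):
    def __lt__(self, other: Any) -> bool: ...

CT = TypeVar('CT', bound=Comparable)

def top(series: Iterable[CT], length: int) -> List[CT]:
    ordered = sorted(series, reverse=True)
    return ordered[:length]

class Spam:  # does not subclass or register with Comparable
    def __init__(self, n: int) -> None:
        self.n = n

    def __lt__(self, other: Any) -> bool:
        return self.n < other.n

    def __repr__(self) -> str:
        return f'Spam({self.n})'

print(top([Spam(3), Spam(1), Spam(2)], 2))  # [Spam(3), Spam(2)]
```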
There’s more to see about typing.Protocol, but we’ll leave that to Part IV, where [Link to Come] contrasts structural typing, duck typing, and ABCs, which are another approach to formalizing “classic” protocols.
To annotate callback parameters or function objects returned by higher-order functions, the typing module provides the Callable type, which is parameterized like this:
Callable[[ParamType1, ParamType2], ReturnType]
The parameter list, [ParamType1, ParamType2], can have zero or more types.
Here is an example in context:
def repl(input_fn: Callable[[Any], str] = input) -> None:
The repl function is part of a simple interactive interpreter.24 During normal usage, the repl function uses Python’s input built-in to read expressions from the user. However, for automated testing or for integration with other input sources, repl accepts an optional input_fn parameter: a Callable with the same parameter and return types as input.
The built-in input() has this signature in typeshed:
def input(__prompt: Any = ...) -> str: ...
That function is-consistent-with this Callable type hint:
Callable[[Any], str]
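The value of the input_fn parameter shows up in tests. Below is a hypothetical, much smaller repl that collects lines until it reads 'quit' and then returns them (unlike the repl in the text, which returns None). There is also a hypothetical helper that builds a scripted stand-in for input. The stand-in is consistent with Callable[[Any], str], so it can replace the built-in.

```python
from typing import Any, Callable, Iterator, List

def repl(input_fn: Callable[[Any], str] = input) -> List[str]:
    lines: List[str] = []
    while True:
        line = input_fn('> ')  # same call shape as the built-in input()
        if line == 'quit':
            break
        lines.append(line)
    return lines

def make_fake_input(answers: List[str]) -> Callable[[Any], str]:
    script: Iterator[str] = iter(answers)

    def fake_input(prompt: Any) -> str:
        return next(script)  # ignores the prompt and returns the next scripted line

    return fake_input

print(repl(make_fake_input(['1 + 2', '3 * 4', 'quit'])))  # ['1 + 2', '3 * 4']
```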
As another example, in Chapter 10, the Order.__init__ method in Example 10-3 uses this signature:
class Order:  # the Context
    def __init__(
        self,
        customer: Customer,
        cart: Sequence[LineItem],
        promotion: Optional[Callable[['Order'], float]] = None,
    ) -> None:
self rarely needs a type hint.25
promotion may be None, or Callable[[Order], float]: a function that takes an Order and returns float.
__init__ always returns None, but I recommend adding the return type hint for it anyway.26
Note that the Order type appears as the string 'Order' in the Callable type hint. Otherwise, Python would raise NameError: name 'Order' is not defined, because the Order class is not defined until Python reads the whole body of the class, an issue we’ll discuss in [Link to Come]: Class Metaprogramming.
PEP 563—Postponed Evaluation of Annotations adds support for forward references in annotations, avoiding the need to write Order as a string in the previous example. However, that feature is only enabled when from __future__ import annotations is used at the top of the module, to avoid breaking code that does weird things in the annotations. See a summary of PEP 563 in What’s New In Python 3.7.
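A minimal sketch of the PEP 563 behavior follows. It uses a stripped-down, hypothetical Order class (not the one in Example 10-3) and a hypothetical ten_percent promotion. With the __future__ import as the first statement of the module, Order can appear unquoted in its own method signature, and the annotations are stored as strings instead of being evaluated.

```python
from __future__ import annotations  # must be the first statement in the module

from typing import Callable, Optional, Sequence

class Order:
    def __init__(
        self,
        cart: Sequence[float],
        promotion: Optional[Callable[[Order], float]] = None,  # no quotes needed
    ) -> None:
        self.cart = list(cart)
        self.promotion = promotion

    def total(self) -> float:
        return sum(self.cart)

    def due(self) -> float:
        discount = self.promotion(self) if self.promotion else 0.0
        return self.total() - discount

def ten_percent(order: Order) -> float:
    return order.total() / 10

order = Order([10.0, 20.0], promotion=ten_percent)
print(order.due())  # 27.0
print(Order.__init__.__annotations__['promotion'])  # the annotation, kept as a string
```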
There is no syntax to annotate optional or keyword arguments in Callable[]. The documentation says “such function types are rarely used as callback types”. If you need a type hint to match a function with a dynamic signature, replace the whole parameter list with ..., like this: Callable[..., ReturnType].
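Here is a short sketch of Callable[..., ReturnType], using a hypothetical call_and_log helper. The ... tells the type checker to accept any callable that returns str, whatever its parameters, so the same helper works with callables that take one argument or two.

```python
from typing import Any, Callable

def call_and_log(func: Callable[..., str], *args: Any, **kwargs: Any) -> str:
    # The type checker does not match these arguments against func's parameters.
    result = func(*args, **kwargs)
    print(f'{func.__name__} returned {result!r}')
    return result

call_and_log(str.upper, 'spam')       # upper returned 'SPAM'
call_and_log('-'.join, ['a', 'b'])    # join returned 'a-b'
call_and_log(format, 3.14159, '.2f')  # format returned '3.14'
```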
NoReturn
This is a special type used only to annotate the return type of functions that never return. Usually, they exist to raise exceptions. There are dozens of such functions in the standard library.
For example, sys.exit() raises SystemExit to terminate the Python process. Its signature in typeshed is:
def exit(__status: object = ...) -> NoReturn: ...
The __status parameter is positional-only, and it has a default value. Stub files don’t spell out the default values: they use ... instead. The type of __status is object, which means it may also be None, therefore it would be redundant to mark it Optional[object].
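A minimal sketch with a hypothetical fail helper shows why NoReturn is useful beyond documentation. Because fail is annotated -> NoReturn, a type checker such as Mypy treats the code after the call as unreachable. As a result, it does not report a missing return statement at the end of to_int.

```python
from typing import NoReturn

def fail(message: str) -> NoReturn:
    raise ValueError(message)

def to_int(text: str) -> int:
    if text.isdigit():
        return int(text)
    fail(f'not an integer: {text!r}')  # never returns, so no return is needed after it
```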
This ends our overview of the major groups of types used in type hints.
When the return type of a function depends on the type of one parameter, using a TypeVar can be enough. But sometimes the return type depends on the type of more than one parameter. The solution then is to use the @typing.overload decorator.
Consider the sum built-in function. This is help(sum) from the console:
sum(iterable, /, start=0)
    Return the sum of a 'start' value (default: 0) plus an iterable of numbers
    When the iterable is empty, return the start value.
    This function is intended specifically for use with numeric values and may
    reject non-numeric types.
In typeshed, sum is annotated like this, in stdlib/2and3/builtins.pyi:
@overload
def sum(__iterable: Iterable[_T]) -> Union[_T, int]: ...
@overload
def sum(__iterable: Iterable[_T], start: _S) -> Union[_T, _S]: ...
First, let’s look at the overall structure of the code with overloads. In a stub file (.pyi), that’s all there would be about sum: the implementation is elsewhere, and may even be written in C. You can also use @overload in a regular Python module, by writing the overloaded signatures right before the function’s actual signature and implementation. Example 8-30 shows how sum would appear annotated and implemented in a Python module.
mysum.py: definition of the sum function with overloaded signatures:
from functools import reduce
from operator import add
from typing import overload, Iterable, Union, TypeVar
T = TypeVar('T')
S = TypeVar('S')
@overload
def sum(it: Iterable[T]) -> Union[T, int]: ...
@overload
def sum(it: Iterable[T], /, start: S) -> Union[T, S]: ...
def sum(it, /, start=0):
    return reduce(add, it, start)
I’m lazy, so I’ll use functools.reduce and operator.add to implement sum.
We need this second, different TypeVar, as we’ll see in the second overload.
This signature is for the simple case: sum(my_iterable). The result type may be T, the type of the elements that my_iterable yields, or it may be int if the iterable is empty, because the default value of the start parameter is 0.
When start is given, it can be of any type S, so the result type is Union[T, S]. This is why we need S. If I reused T for the type of start, then it would have to be the same type as the elements of Iterable[T], and this is not what we want.
The signature of the actual function implementation has no type hints.
That’s seven lines to annotate a one-line function. Probably overkill, I know. At least it wasn’t a foo function.
If you want to learn about @overload by reading code, typeshed has hundreds of examples.
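To check the behavior described in the callouts of Example 8-30, here is that code again with a few calls added for illustration. Note that this module-level sum shadows the built-in. The comments say which overload applies to each call.

```python
from functools import reduce
from operator import add
from typing import Iterable, TypeVar, Union, overload

T = TypeVar('T')
S = TypeVar('S')

@overload
def sum(it: Iterable[T]) -> Union[T, int]: ...
@overload
def sum(it: Iterable[T], /, start: S) -> Union[T, S]: ...
def sum(it, /, start=0):
    return reduce(add, it, start)

print(sum([1, 2, 3]))         # 6: first overload
print(sum([]))                # 0: first overload, empty iterable gives the int start
print(sum([0.5, 0.25], 1.0))  # 1.75: second overload, start is a float
print(sum([[1], [2]], []))    # [1, 2]: second overload, start is a list
```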
As it turns out, the handy APIs we call Pythonic are often hard to annotate. In typeshed, the stub file for Python’s built-in functions has 186 overloads as I write this, more than any other in the standard library.27
Aiming for 100% annotated code may lead to type hints that add lots of noise but little value. Annotation obsession can also lead to bloated, unpleasant APIs. Sometimes it’s better to be pragmatic and leave a piece of code without type hints.
This wraps up our coverage of Python’s gradual type system for now. [Link to Come] covers type hints in class definitions, as well as other concepts such as variance, type erasure, and type casting.
The last sections in this epic chapter are about positional-only and variadic parameters, and the function attributes where type hints are stored at runtime.
Recall the tag function from Example 7-10. The last time we saw its signature was in section “Positional-only parameters”:
def tag(name, /, *content, class_=None, **attrs):
Here is tag, fully annotated and split over several lines (a common convention for long signatures), with line breaks the way the Black formatter would do it:
from typing import Optional
def tag(
    name: str,
    /,
    *content: str,
    class_: Optional[str] = None,
    **attrs: str,
) -> str:
Note the type hint *content: str for the arbitrary positional parameters: this means all those arguments must be of type str. The type of content in the function body will be Tuple[str, ...].
The type hint for the arbitrary keyword arguments is **attrs: str. In this example, the type of attrs will be Dict[str, str]. For a type hint like **settings: float, the type of settings would be Dict[str, float].
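To see those types in a function body, here is a minimal sketch of an implementation for the annotated signature. It was written for this illustration and only approximates the tag function of Example 7-10. For instance, it sorts the attributes so the output is predictable.

```python
from typing import Optional

def tag(
    name: str,
    /,
    *content: str,
    class_: Optional[str] = None,
    **attrs: str,
) -> str:
    # Inside the body, content is a Tuple[str, ...] and attrs is a Dict[str, str].
    if class_ is not None:
        attrs['class'] = class_
    attr_str = ''.join(f' {key}="{value}"' for key, value in sorted(attrs.items()))
    if content:
        return '\n'.join(f'<{name}{attr_str}>{c}</{name}>' for c in content)
    return f'<{name}{attr_str} />'

print(tag('br'))                                # <br />
print(tag('p', 'hello', 'world', class_='x'))  # two <p class="x"> elements
print(tag('img', src='sunset.jpg'))            # <img src="sunset.jpg" />
```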
The / notation for positional-only parameters is available only in Python 3.8 and later. In Python 3.7 or earlier, that’s a syntax error. The PEP 484 convention is to prefix each positional-only parameter name with two underscores. Here is the tag signature again, now in two lines, using the PEP 484 convention:
from typing import Optional
def tag(__name: str, *content: str, class_: Optional[str] = None,
        **attrs: str) -> str:
Mypy understands and enforces both ways of declaring positional-only parameters.
When PEP 3107 introduced the function annotation syntax and the __annotations__ attribute, the community was encouraged to experiment with them. Now the experimentation phase is over. Any use of annotations that is not compatible with PEP 484 has been officially deprecated since PEP 563—Postponed Evaluation of Annotations was accepted for Python 3.7. See section Non-typing usage of annotations in PEP 563.
At runtime, as a module is loaded, Python reads the type hints in functions, classes, and modules and stores them in attributes named __annotations__. For instance, Example 8-31 is an annotated version of the signature from Example 7-15.
def clip(text: str, max_len: int = 80) -> str:
No processing is done with the annotations at runtime: they are merely stored as a dict in the __annotations__ attribute of the function:
>>> from clip_annot import clip
>>> clip.__annotations__
{'text': <class 'str'>, 'max_len': <class 'int'>, 'return': <class 'str'>}
The item with key 'return' holds the return value annotation marked with -> in the function declaration in Example 8-31.
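A quick way to confirm that annotations are stored but not enforced: the hypothetical double function below is annotated for int, yet Python runs it with a str without complaint, and the hints just sit in __annotations__.

```python
def double(n: int) -> int:
    return n * 2

print(double('ab'))            # abab: no runtime type check happens
print(double.__annotations__)  # {'n': <class 'int'>, 'return': <class 'int'>}
```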
As far as I know, the only part of the Python standard library that uses function annotations for any purpose is the @functools.singledispatch decorator, covered in “Single Dispatch Generic Functions” in the next chapter.
Class-level __annotations__ are used extensively by dataclasses and typing.NamedTuple, as seen in Chapter 5. But these modules don’t deal with function or method annotations at all.
The inspect.signature() function knows how to extract the annotations, as Example 8-32 shows.
>>> from clip_annot import clip
>>> from inspect import signature
>>> sig = signature(clip)
>>> sig.return_annotation
<class 'str'>
>>> for param in sig.parameters.values():
...     note = repr(param.annotation).ljust(13)
...     print(note, ':', param.name, '=', param.default)
...
<class 'str'> : text = <class 'inspect._empty'>
<class 'int'> : max_len = 80
The signature function returns a Signature object, which has a return_annotation attribute and a parameters dictionary mapping parameter names to Parameter objects. Each Parameter object has its own annotation attribute. That’s how Example 8-32 works.
FastAPI, a modern Web framework, uses annotations to automate request processing. For example, a price argument annotated as price: float is automatically converted from a string in the request to the float expected by the function.
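The general idea can be sketched with inspect.signature. The call_with_converted helper and the total_price function below are hypothetical, and this is not how FastAPI is implemented. It only shows how a framework could read each parameter's annotation and use it to convert raw strings, such as query parameters, before calling the function.

```python
import inspect
from typing import Any, Callable, Dict

def call_with_converted(func: Callable[..., Any], raw: Dict[str, str]) -> Any:
    kwargs: Dict[str, Any] = {}
    for name, param in inspect.signature(func).parameters.items():
        value = raw[name]
        hint = param.annotation
        # Use plain classes such as int or float as converters;
        # pass the string through when there is no usable annotation.
        if hint is not inspect.Parameter.empty and isinstance(hint, type):
            kwargs[name] = hint(value)
        else:
            kwargs[name] = value
    return func(**kwargs)

def total_price(price: float, quantity: int) -> float:
    return price * quantity

print(call_with_converted(total_price, {'price': '2.5', 'quantity': '4'}))  # 10.0
```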
PEP 484 is the biggest change in the history of Python since the unification of types and classes in Python 2.2, which happened in 2001.
Although its use is optional in theory, in some contexts it is becoming mandatory.
We started with a brief introduction to the concept of gradual typing and then switched to a hands-on approach. It’s hard to see how gradual typing works without a tool that actually reads the type hints, so we developed an annotated function guided by Mypy error reports. That section ended with another practical matter: how to annotate code that must run under Python 2.7 and 3.x.
Back to the theory of gradual typing, we explored how it is a hybrid of Python’s traditional duck typing and the nominal typing more familiar to users of Java, C++, and other statically typed languages.
Most of the chapter was devoted to presenting the major groups of types used in annotations.
Many of the types we covered are related to familiar Python object types, such as collections, tuples, and callables, extended to support generic notation like Sequence[float]. Many of those types are temporary surrogates implemented in the typing module before the standard types were changed to support generics in Python 3.9.
Some of the types are special entities. Any, Optional, Union, and NoReturn have nothing to do with actual objects in memory, but exist only in the abstract domain of the type system.
We studied parameterized generics and type variables, which bring more flexibility to type hints without sacrificing type safety.
Parameterized generics become even more expressive with the use of Protocol. Because it appeared only in Python 3.8, Protocol is not widely used yet, but it is hugely important. Protocol enables static duck typing: the essential bridge between Python’s duck-typed core and the nominal typing that allows type checkers to catch bugs.
While covering some of these types, we experimented with Mypy to see type checking errors and inferred types with the help of Mypy’s magic reveal_type() function.
The next couple of sections covered overloaded function signatures and how to annotate positional-only and variadic parameters.
Finally, we saw how type hints can be found at runtime in the __annotations__ attribute of functions. That’s just one of a rich set of attributes that can be read with the help of the inspect module, which also provides the Signature.bind method to apply the flexible rules that Python uses to bind actual arguments to declared parameters.
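As a brief taste of Signature.bind, the sketch below binds arguments to the tag signature from Example 7-10, with the body omitted. bind applies Python's argument-binding rules without calling the function. For that reason, passing name as a keyword raises TypeError, because name is positional-only.

```python
from inspect import signature

def tag(name, /, *content, class_=None, **attrs):
    ...  # body omitted: bind never calls the function

sig = signature(tag)
bound = sig.bind('img', class_='framed', src='sunset.jpg')
for param_name, value in bound.arguments.items():
    print(param_name, '=', value)
# name = img
# class_ = framed
# attrs = {'src': 'sunset.jpg'}

try:
    sig.bind(name='img')  # name is positional-only
except TypeError as exc:
    print('TypeError:', exc)
```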
The unification of types and classes in 2001 was a major change that benefited every Python user. It made the language more powerful and easier to use. Type hints mostly benefit one category of users: professional software developers. That is certainly a very important category, but Python’s greatest strength is the diversity of its user base. Students, journalists, makers, artists, traders, activists, and researchers in every field are some of the user groups for whom the added complexity of the type system may not bring enough value to be justified.
Fortunately, type hints are an optional feature. Let us keep Python accessible to the widest user base and stop preaching that all Python code should have type hints—as I’ve seen in public sermons by typing evangelists.
Bernat Gabor wrote in his excellent post The state of type hints in Python:
Type hints should be used whenever unit tests are worth writing.
I am a big fan of testing, but I also do a lot of exploratory coding. When I am exploring, tests and type hints are not helpful. They are a drag.
Our BDFL emeritus led this push towards type hints in Python, so it’s only fair that this chapter starts and ends with his words:
I wouldn’t like a version of Python where I was morally obligated to add type hints all the time. I really do think that type hints have their place but there are also plenty of times that it’s not worth it, and it’s so wonderful that you can choose to use them.28
Guido van Rossum
The best introductions to Python’s type hints that I found were Bernat Gabor’s The state of type hints in Python, which I just quoted, and Geir Arne Hjelle’s Python Type Checking (Guide). Hypermodern Python Chapter 4: Typing by Claudio Jolowicz is a shorter introduction that also covers runtime type checking and validation.
For deeper coverage, the Mypy documentation is the best source. It is valuable regardless of the type checker you are using, because it has tutorial and reference pages about Python typing in general—not just about the Mypy tool itself. There you will also find super useful cheat sheets for type hints in Python 3 and Python 2.
The typing module documentation is a good quick reference, but it doesn’t go into much detail.
The ultimate references are the PEP documents related to typing. There are 17 of them as of May 2020.
PEPs are written by core developers for core developers, so they are usually not light reading and assume a lot of prior knowledge from the reader.
Table 8-3 lists the typing PEPs in chronological order, with links on the titles.
Awesome Python Typing is a good collection of links to tools and references.
PEP 3107—Function Annotations and PEP 563—Postponed Evaluation of Annotations cover everything about the __annotations__ attributes.
PEP 362—Function Signature Object is worth reading if you intend to use the inspect module that implements that feature.
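PEP | Title | Python | Year |
---|---|---|---|
484* | Type Hints | 3.5 | 2014 |
483* | The Theory of Type Hints | n/a | 2014 |
482 | Literature Overview for Type Hints | n/a | 2015 |
526* | Syntax for Variable Annotations | 3.6 | 2016 |
544* | Protocols: Structural subtyping (static duck typing) | 3.8 | 2017 |
557 | Data Classes | 3.7 | 2017 |
560 | Core support for typing module and generic types | 3.7 | 2017 |
561 | Distributing and Packaging Type Information | 3.7 | 2017 |
563 | Postponed Evaluation of Annotations | 3.7 | 2017 |
586* | Literal Types | 3.8 | 2018 |
585 | Type Hinting Generics In Standard Collections | 3.9 | 2019 |
589* | TypedDict: Type Hints for Dictionaries with a Fixed Set of Keys | 3.8 | 2019 |
591* | Adding a final qualifier to typing | 3.8 | 2019 |
593 | Flexible function and variable annotations | 3.9 | 2019 |
604 | Complementary syntax for Union[] | 3.9 | 2019 |
612 | Parameter Specification Variables | 3.9 | 2019 |
613 | Explicit Type Aliases | 3.10 | 2020 |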
1 From YouTube video of A Language Creators’ Conversation: Guido van Rossum, James Gosling, Larry Wall & Anders Hejlsberg, streamed live on April 2, 2019. Quote starts at 1:32:05, edited for brevity. Full transcript available at https://github.com/standupdev/language-creators.
2 For example, recursive types are not supported in Python as of May 2020; see typing module issue #182, Define a JSON type, and Mypy issue #731, Support recursive types.
3 A just-in-time compiler like the one in PyPy has much better data than type hints: it monitors the Python program as it runs, and determines the actual types used at runtime, generating optimized machine code on the fly.
4 I am using Mypy 0.770, the most recent release as I write this on April 28, 2020. The Mypy Introduction warns it “is officially beta software. There will be occasional changes that break backward compatibility.” Therefore, you may get different results than I did.
5 The pytest I’m using is version 5.4.1.
6 Python doesn’t provide syntax to control the set of possible values for a type, except in Enum types. For example, using type hints you can’t define Quantity as an integer between 1 and 1000, or AirportCode as a 3-letter combination. NumPy offers uint8, int16, and other machine-oriented numeric types, but in the Python standard library we only have types with very small sets of values (NoneType, bool) or extremely large sets (float, int, str, all possible tuples, etc.).
7 Duck typing is a weaker form of structural typing, which Python 3.8 also supports with the introduction of typing.Protocol. This is covered later in this chapter, in “Protocols”, with more details in [Link to Come].
8 Sorry about the silly example. Inheritance is often overused and hard to justify in examples that are realistic yet simple, so please accept this animal example as a quick illustration of subtyping.
9 MIT Professor, programming language designer, and Turing Award recipient. Wikipedia: Barbara Liskov.
10 To be more precise, ord only accepts str or bytes with len(s) == 1. But the type system currently can’t express this constraint.
11 In ABC—the language that most influenced Python in its roots—each list was constrained to accept values of a single type: the type of the first item you put into it.
12 An even deeper problem is how to typecheck integer ranges to prevent OverflowError at runtime when adding elements to arrays. For example, an array with typecode='B' can only hold int values from 0 to 255. Currently, Python’s static type system is not up to this challenge.
13 I will use := when it makes sense in examples, but I don’t cover it in the book. Please see PEP 572—Assignment Expressions for all the gory details.
14 As of May 2020, Pytype allows it. But its FAQ says it will be disallowed in the future. See question “Why didn’t pytype catch that I changed the type of an annotated variable?” in the Pytype FAQ.
15 I prefer to use the lxml package to generate and parse XML: it’s easy to get started, full-featured, and fast. Three friends of mine contributed to it: Martijn Faassen, Paul Everitt, and Sidnei da Silva. Unfortunately, lxml and Python’s own ElementTree don’t fit the limited RAM of my hypothetical microcontroller.
16 The Mypy documentation discusses this in its Common issues and solutions page, section Types of empty collections.
17 It’s hard to give a more precise return type hint for json.loads(). Brett Cannon, Guido van Rossum, and others have been discussing this since 2016 in typing module issue #182: Define a JSON type.
18 Actually, dict is a virtual subclass of abc.MutableMapping. The concept of a virtual subclass is explained in [Link to Come]. For now, know that issubclass(dict, abc.MutableMapping) is True, despite the fact that dict is implemented in C and does not inherit anything from abc.MutableMapping, but only from object.
19 See Mypy issue int is not a Number?
20 The implementation here is simpler than the one in the Python standard library statistics module.
21 I contributed this solution to typeshed, and that’s how mode is annotated in statistics.pyi as of May 26, 2020.
22 How wonderful it is to open an interactive console and rely on duck typing to explore language features like I just did. I badly miss this kind of exploration when I learn new languages that don’t support it.
23 I don’t know who invented the term static duck typing, but it became more popular with the success of the Go language, which has a feature very similar to typing.Protocol. In Go, they call them “interfaces”, but they are much closer to Python’s protocols than to Java’s interfaces.
24 REPL stands for read-eval-print-loop, the common code pattern in interactive interpreters.
25 We’ll see cases where self is annotated in [Link to Come], [Link to Come].
26 As a special case for __init__, if at least one parameter has a type hint, Mypy does not complain about the missing return type, by default. But if you forget this rule and __init__ is completely untyped, then it will not be type checked.
27 In the “Soapbox” I discuss the downside of typing, exemplified by the max function with 6 overloads.
28 From YouTube video of Type Hints by Guido van Rossum (March 2015). Quote starts at 13’40”. I did some light editing for clarity.