Python takes a “batteries included” approach to the standard library. Many other languages ship with a small number of common packages and require you to look elsewhere for important functionality. Although Python also has an impressive repository of community-built modules, it strives to provide, in its default installation, the most important modules for common uses of the language.
The full set of standard modules is too large to cover in this book. But some of these built-in packages are so closely intertwined with idiomatic Python that they may as well be part of the language specification. These essential built-in modules are especially important when writing the intricate, error-prone parts of programs.
Python has special syntax for decorators that can be applied to functions. Decorators have the ability to run additional code before and after any calls to the functions they wrap. This allows them to access and modify input arguments and return values. This functionality can be useful for enforcing semantics, debugging, registering functions, and more.
For example, say you want to print the arguments and return value of a function call. This is especially helpful when debugging a stack of function calls from a recursive function. Here, I define such a decorator:
def trace(func):
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        print('%s(%r, %r) -> %r' %
              (func.__name__, args, kwargs, result))
        return result
    return wrapper
I can apply this to a function using the @ symbol.
@trace
def fibonacci(n):
    """Return the n-th Fibonacci number"""
    if n in (0, 1):
        return n
    return (fibonacci(n - 2) + fibonacci(n - 1))
The @ symbol is equivalent to calling the decorator on the function it wraps and assigning the return value to the original name in the same scope.
fibonacci = trace(fibonacci)
Calling this decorated function will run the wrapper code before and after fibonacci runs, printing the arguments and return value at each level in the recursive stack.
fibonacci(3)
>>>
fibonacci((1,), {}) -> 1
fibonacci((0,), {}) -> 0
fibonacci((1,), {}) -> 1
fibonacci((2,), {}) -> 1
fibonacci((3,), {}) -> 2
This works well, but it has an unintended side effect. The value returned by the decorator (the function that’s called above) doesn’t think it’s named fibonacci.
print(fibonacci)
>>>
<function trace.<locals>.wrapper at 0x107f7ed08>
The cause of this isn’t hard to see. The trace function returns the wrapper it defines. The wrapper function is what’s assigned to the fibonacci name in the containing module because of the decorator. This behavior is problematic because it undermines tools that do introspection, such as debuggers (see Item 57: “Consider Interactive Debugging with pdb”) and object serializers (see Item 44: “Make pickle Reliable with copyreg”).
For example, the help built-in function is useless on the decorated fibonacci function.
help(fibonacci)
>>>
Help on function wrapper in module __main__:
wrapper(*args, **kwargs)
The solution is to use the wraps helper function from the functools built-in module. This is a decorator that helps you write decorators. Applying it to the wrapper function will copy all of the important metadata about the inner function to the outer function.
from functools import wraps

def trace(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        # ...
    return wrapper
@trace
def fibonacci(n):
    # ...
Now, running the help function produces the expected result, even though the function is decorated.
help(fibonacci)
>>>
Help on function fibonacci in module __main__:

fibonacci(n)
    Return the n-th Fibonacci number
Calling help is just one example of how decorators can subtly cause problems. Python functions have many other standard attributes (e.g., __name__, __module__) that must be preserved to maintain the interface of functions in the language. Using wraps ensures that you’ll always get the correct behavior.
Decorators are Python syntax for allowing one function to modify another function at runtime.
Using decorators can cause strange behaviors in tools that do introspection, such as debuggers.
Use the wraps decorator from the functools built-in module when you define your own decorators to avoid any issues.
The with statement in Python is used to indicate when code is running in a special context. For example, mutual exclusion locks (see Item 38: “Use Lock to Prevent Data Races in Threads”) can be used in with statements to indicate that the indented code only runs while the lock is held.
from threading import Lock

lock = Lock()
with lock:
    print('Lock is held')
The example above is equivalent to this try/finally construction because the Lock class properly enables the with statement.
lock.acquire()
try:
    print('Lock is held')
finally:
    lock.release()
The with statement version of this is better because it eliminates the need to write the repetitive code of the try/finally construction. It’s easy to make your objects and functions capable of use in with statements by using the contextlib built-in module. This module contains the contextmanager decorator, which lets a simple function be used in with statements. This is much easier than defining a new class with the special methods __enter__ and __exit__ (the standard way).
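For comparison, here is what that standard class-based approach looks like for a temporary logging-level change (a sketch; the class name is illustrative):

```python
import logging

class DebugLogging:
    """Context manager that temporarily changes the root logger's level."""
    def __init__(self, level):
        self.level = level

    def __enter__(self):
        logger = logging.getLogger()
        self.old_level = logger.getEffectiveLevel()
        logger.setLevel(self.level)
        return logger

    def __exit__(self, exc_type, exc_value, traceback):
        # Restore the previous level on the way out
        logging.getLogger().setLevel(self.old_level)
        return False  # don't suppress exceptions from the with block
```

The contextmanager decorator collapses this boilerplate into a single function with a yield.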
For example, say you want a region of your code to have more debug logging sometimes. Here, I define a function that does logging at two severity levels:
import logging

def my_function():
    logging.debug('Some debug data')
    logging.error('Error log here')
    logging.debug('More debug data')
The default log level for my program is WARNING, so only the error message will print to screen when I run the function.
my_function()
>>>
Error log here
I can elevate the log level of this function temporarily by defining a context manager. This helper function boosts the logging severity level before running the code in the with block and reduces the logging severity level afterward.
from contextlib import contextmanager

@contextmanager
def debug_logging(level):
    logger = logging.getLogger()
    old_level = logger.getEffectiveLevel()
    logger.setLevel(level)
    try:
        yield
    finally:
        logger.setLevel(old_level)
The yield expression is the point at which the with block’s contents will execute. Any exceptions that happen in the with block will be re-raised by the yield expression for you to catch in the helper function (see Item 40: “Consider Coroutines to Run Many Functions Concurrently” for an explanation of how that works).
Now, I can call the same logging function again, but in the debug_logging context. This time, all of the debug messages are printed to the screen during the with block. The same function running outside the with block won’t print debug messages.
with debug_logging(logging.DEBUG):
    print('Inside:')
    my_function()
print('After:')
my_function()
>>>
Inside:
Some debug data
Error log here
More debug data
After:
Error log here
The context manager passed to a with statement may also return an object. This object is assigned to a local variable in the as part of the compound statement. This gives the code running in the with block the ability to directly interact with its context.
For example, say you want to write a file and ensure that it’s always closed correctly. You can do this by passing open to the with statement. open returns a file handle for the as target of with and will close the handle when the with block exits.
with open('/tmp/my_output.txt', 'w') as handle:
    handle.write('This is some data!')
This approach is preferable to manually opening and closing the file handle every time. It gives you confidence that the file is eventually closed when execution leaves the with statement. It also encourages you to reduce the amount of code that executes while the file handle is open, which is good practice in general.
To enable your own functions to supply values for as targets, all you need to do is yield a value from your context manager. For example, here I define a context manager to fetch a Logger instance, set its level, and then yield it for the as target.
@contextmanager
def log_level(level, name):
    logger = logging.getLogger(name)
    old_level = logger.getEffectiveLevel()
    logger.setLevel(level)
    try:
        yield logger
    finally:
        logger.setLevel(old_level)
Calling logging methods like debug on the as target will produce output because the logging severity level is set low enough in the with block. Using the logging module directly won’t print anything because the default logging severity level for the default program logger is WARNING.
with log_level(logging.DEBUG, 'my-log') as logger:
    logger.debug('This is my message!')
    logging.debug('This will not print')
>>>
This is my message!
After the with statement exits, calling debug logging methods on the Logger named 'my-log' will not print anything because the default logging severity level has been restored. Error log messages will always print.
logger = logging.getLogger('my-log')
logger.debug('Debug will not print')
logger.error('Error will print')
>>>
Error will print
The with statement allows you to reuse logic from try/finally blocks and reduce visual noise.
The contextlib built-in module provides a contextmanager decorator that makes it easy to use your own functions in with statements.
The value yielded by context managers is supplied to the as part of the with statement. It’s useful for letting your code directly access the cause of the special context.
The pickle built-in module can serialize Python objects into a stream of bytes and deserialize bytes back into objects. Pickled byte streams shouldn’t be used to communicate between untrusted parties. The purpose of pickle is to let you pass Python objects between programs that you control over binary channels.
Note
The pickle module’s serialization format is unsafe by design. The serialized data contains what is essentially a program that describes how to reconstruct the original Python object. This means a malicious pickle payload could be used to compromise any part of the Python program that attempts to deserialize it.
In contrast, the json module is safe by design. Serialized JSON data contains a simple description of an object hierarchy. Deserializing JSON data does not expose a Python program to any additional risk. Formats like JSON should be used for communication between programs or people that don’t trust each other.
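For instance, a JSON round-trip only ever reconstructs basic types (dicts, lists, strings, numbers, booleans), never arbitrary code, which is why it is safe to accept from untrusted sources. A minimal sketch:

```python
import json

data = {'level': 1, 'lives': 3}
serialized = json.dumps(data)      # a plain string description of the data
restored = json.loads(serialized)  # reconstructs only basic types
assert restored == data
```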
For example, say you want to use a Python object to represent the state of a player’s progress in a game. The game state includes the level the player is on and the number of lives he or she has remaining.
class GameState(object):
    def __init__(self):
        self.level = 0
        self.lives = 4
The program modifies this object as the game runs.
state = GameState()
state.level += 1 # Player beat a level
state.lives -= 1 # Player had to try again
When the user quits playing, the program can save the state of the game to a file so it can be resumed at a later time. The pickle module makes it easy to do this. Here, I dump the GameState object directly to a file:
import pickle

state_path = '/tmp/game_state.bin'
with open(state_path, 'wb') as f:
    pickle.dump(state, f)
Later, I can load the file and get back the GameState object as if it had never been serialized.
with open(state_path, 'rb') as f:
    state_after = pickle.load(f)
print(state_after.__dict__)
>>>
{'lives': 3, 'level': 1}
The problem with this approach is what happens as the game’s features expand over time. Imagine you want the player to earn points towards a high score. To track the player’s points, you’d add a new field to the GameState class.
class GameState(object):
    def __init__(self):
        # ...
        self.points = 0
Serializing the new version of the GameState class using pickle will work exactly as before. Here, I simulate the round-trip through a file by serializing to a string with dumps and back to an object with loads:
state = GameState()
serialized = pickle.dumps(state)
state_after = pickle.loads(serialized)
print(state_after.__dict__)
>>>
{'lives': 4, 'level': 0, 'points': 0}
But what happens to older saved GameState objects that the user may want to resume? Here, I unpickle an old game file using a program with the new definition of the GameState class:
with open(state_path, 'rb') as f:
    state_after = pickle.load(f)
print(state_after.__dict__)
>>>
{'lives': 3, 'level': 1}
The points attribute is missing! This is especially confusing because the returned object is an instance of the new GameState class.
assert isinstance(state_after, GameState)
This behavior is a byproduct of the way the pickle module works. Its primary use case is making it easy to serialize objects. As soon as your use of pickle expands beyond trivial usage, the module’s functionality starts to break down in surprising ways.
Fixing these problems is straightforward using the copyreg built-in module. The copyreg module lets you register the functions responsible for serializing Python objects, allowing you to control the behavior of pickle and make it more reliable.
In the simplest case, you can use a constructor with default arguments (see Item 19: “Provide Optional Behavior with Keyword Arguments”) to ensure that GameState objects will always have all attributes after unpickling. Here, I redefine the constructor this way:
class GameState(object):
    def __init__(self, level=0, lives=4, points=0):
        self.level = level
        self.lives = lives
        self.points = points
To use this constructor for pickling, I define a helper function that takes a GameState object and turns it into a tuple of parameters for the copyreg module. The returned tuple contains the function to use for unpickling and the parameters to pass to the unpickling function.
def pickle_game_state(game_state):
    kwargs = game_state.__dict__
    return unpickle_game_state, (kwargs,)
Now, I need to define the unpickle_game_state helper. This function takes serialized data and parameters from pickle_game_state and returns the corresponding GameState object. It’s a tiny wrapper around the constructor.
def unpickle_game_state(kwargs):
    return GameState(**kwargs)
Now, I register these with the copyreg built-in module.

import copyreg

copyreg.pickle(GameState, pickle_game_state)
Serializing and deserializing works as before.
state = GameState()
state.points += 1000
serialized = pickle.dumps(state)
state_after = pickle.loads(serialized)
print(state_after.__dict__)
>>>
{'lives': 4, 'level': 0, 'points': 1000}
With this registration done, now I can change the definition of GameState to give the player a count of magic spells to use. This change is similar to when I added the points field to GameState.
class GameState(object):
    def __init__(self, level=0, lives=4, points=0, magic=5):
        # ...
But unlike before, deserializing an old GameState object will result in valid game data instead of missing attributes. This works because unpickle_game_state calls the GameState constructor directly. The constructor’s keyword arguments have default values when parameters are missing. This causes old game state files to receive the default value for the new magic field when they are deserialized.
state_after = pickle.loads(serialized)
print(state_after.__dict__)
>>>
{'level': 0, 'points': 1000, 'magic': 5, 'lives': 4}
Sometimes you’ll need to make backwards-incompatible changes to your Python objects by removing fields. This prevents the default argument approach to serialization from working.
For example, say you realize that a limited number of lives is a bad idea, and you want to remove the concept of lives from the game. Here, I redefine the GameState to no longer have a lives field:
class GameState(object):
    def __init__(self, level=0, points=0, magic=5):
        # ...
The problem is that this breaks deserializing old game data. All fields from the old data, even ones removed from the class, will be passed to the GameState constructor by the unpickle_game_state function.
pickle.loads(serialized)
>>>
TypeError: __init__() got an unexpected keyword argument 'lives'
The solution is to add a version parameter to the functions supplied to copyreg. New serialized data will have a version of 2 specified when pickling a new GameState object.
def pickle_game_state(game_state):
    kwargs = game_state.__dict__
    kwargs['version'] = 2
    return unpickle_game_state, (kwargs,)
Old versions of the data will not have a version argument present, allowing you to manipulate the arguments passed to the GameState constructor accordingly.
def unpickle_game_state(kwargs):
    version = kwargs.pop('version', 1)
    if version == 1:
        kwargs.pop('lives')
    return GameState(**kwargs)
Now, deserializing an old object works properly.
copyreg.pickle(GameState, pickle_game_state)
state_after = pickle.loads(serialized)
print(state_after.__dict__)
>>>
{'magic': 5, 'level': 0, 'points': 1000}
You can continue this approach to handle changes between future versions of the same class. Any logic you need to adapt an old version of the class to a new version of the class can go in the unpickle_game_state function.
One other issue you may encounter with pickle is breakage from renaming a class. Often over the life cycle of a program, you’ll refactor your code by renaming classes and moving them to other modules. Unfortunately, this will break the pickle module unless you’re careful.
Here, I rename the GameState class to BetterGameState, removing the old class from the program entirely:
class BetterGameState(object):
    def __init__(self, level=0, points=0, magic=5):
        # ...
Attempting to deserialize an old GameState object will now fail because the class can’t be found.
pickle.loads(serialized)
>>>
AttributeError: Can't get attribute 'GameState' on <module '__main__' from 'my_code.py'>
The cause of this exception is that the import path of the serialized object’s class is encoded in the pickled data.
print(serialized[:25])
>>>
b'\x80\x03c__main__\nGameState\nq\x00)'
The solution is to use copyreg again. You can specify a stable identifier for the function to use for unpickling an object. This allows you to transition pickled data to different classes with different names when it’s deserialized. It gives you a level of indirection.
copyreg.pickle(BetterGameState, pickle_game_state)
After using copyreg, you can see that the import path to pickle_game_state is encoded in the serialized data instead of BetterGameState.
state = BetterGameState()
serialized = pickle.dumps(state)
print(serialized[:35])
>>>
b'\x80\x03c__main__\nunpickle_game_state\nq\x00}'
The only gotcha is that you can’t change the path of the module in which the unpickle_game_state function is present. Once you serialize data with a function, it must remain available on that import path for deserializing in the future.
The pickle built-in module is only useful for serializing and deserializing objects between trusted programs.
The pickle module may break down when used for more than trivial use cases.
Use the copyreg built-in module with pickle to add missing attribute values, allow versioning of classes, and provide stable import paths.
Coordinated Universal Time (UTC) is the standard, time-zone-independent representation of time. UTC works great for computers that represent time as seconds since the UNIX epoch. But UTC isn’t ideal for humans. Humans reference time relative to where they’re currently located. People say “noon” or “8 am” instead of “UTC 15:00 minus 7 hours.” If your program handles time, you’ll probably find yourself converting time between UTC and local clocks to make it easier for humans to understand.
Python provides two ways of accomplishing time zone conversions. The old way, using the time built-in module, is disastrously error prone. The new way, using the datetime built-in module, works great with some help from the community-built package named pytz.
You should be acquainted with both time and datetime to thoroughly understand why datetime is the best choice and time should be avoided.
The localtime function from the time built-in module lets you convert a UNIX timestamp (seconds since the UNIX epoch in UTC) to a local time that matches the host computer’s time zone (Pacific Daylight Time, in my case).
from time import localtime, strftime
now = 1407694710
local_tuple = localtime(now)
time_format = '%Y-%m-%d %H:%M:%S'
time_str = strftime(time_format, local_tuple)
print(time_str)
>>>
2014-08-10 11:18:30
You’ll often need to go the other way as well, starting with user input in local time and converting it to UTC time. You can do this by using the strptime function to parse the time string, and then calling mktime to convert local time to a UNIX timestamp.
from time import mktime, strptime
time_tuple = strptime(time_str, time_format)
utc_now = mktime(time_tuple)
print(utc_now)
>>>
1407694710.0
How do you convert local time in one time zone to local time in another? For example, say you are taking a flight between San Francisco and New York, and want to know what time it will be in San Francisco once you’ve arrived in New York.
Directly manipulating the return values from the time, localtime, and strptime functions to do time zone conversions is a bad idea. Time zones change all the time due to local laws. It’s too complicated to manage yourself, especially if you want to handle every global city for flight departure and arrival.
Many operating systems have configuration files that keep up with the time zone changes automatically. Python lets you use these time zones through the time module. For example, here I parse the departure time from the San Francisco time zone of Pacific Daylight Time:
parse_format = '%Y-%m-%d %H:%M:%S %Z'
depart_sfo = '2014-05-01 15:45:16 PDT'
time_tuple = strptime(depart_sfo, parse_format)
time_str = strftime(time_format, time_tuple)
print(time_str)
>>>
2014-05-01 15:45:16
After seeing that PDT works with the strptime function, you might also assume that other time zones known to my computer will also work. Unfortunately, this isn’t the case. Instead, strptime raises an exception when it sees Eastern Daylight Time (the time zone for New York).
arrival_nyc = '2014-05-01 23:33:24 EDT'
time_tuple = strptime(arrival_nyc, time_format)
>>>
ValueError: unconverted data remains: EDT
The problem here is the platform-dependent nature of the time module. Its actual behavior is determined by how the underlying C functions work with the host operating system. This makes the functionality of the time module unreliable in Python. The time module fails to consistently work properly for multiple local times. Thus, you should avoid the time module for this purpose. If you must use time, only use it to convert between UTC and the host computer’s local time. For all other types of conversions, use the datetime module.
The second option for representing times in Python is the datetime class from the datetime built-in module. Like the time module, datetime can be used to convert from the current time in UTC to local time.
Here, I take the present time in UTC and convert it to my computer’s local time (Pacific Daylight Time):
from datetime import datetime, timezone
now = datetime(2014, 8, 10, 18, 18, 30)
now_utc = now.replace(tzinfo=timezone.utc)
now_local = now_utc.astimezone()
print(now_local)
>>>
2014-08-10 11:18:30-07:00
The datetime module can also easily convert a local time back to a UNIX timestamp in UTC.
time_str = '2014-08-10 11:18:30'
now = datetime.strptime(time_str, time_format)
time_tuple = now.timetuple()
utc_now = mktime(time_tuple)
print(utc_now)
>>>
1407694710.0
Unlike the time module, the datetime module has facilities for reliably converting from one local time to another local time. However, datetime only provides the machinery for time zone operations with its tzinfo class and related methods. What’s missing are the time zone definitions besides UTC.
Luckily, the Python community has addressed this gap with the pytz module that’s available for download from the Python Package Index (https://pypi.python.org/pypi/pytz/). pytz contains a full database of every time zone definition you might need.
To use pytz effectively, you should always convert local times to UTC first. Perform any datetime operations you need on the UTC values (such as offsetting). Then, convert to local times as a final step.
For example, here I convert an NYC flight arrival time to a UTC datetime. Although some of these calls seem redundant, all of them are necessary when using pytz.
import pytz

arrival_nyc = '2014-05-01 23:33:24'
nyc_dt_naive = datetime.strptime(arrival_nyc, time_format)
eastern = pytz.timezone('US/Eastern')
nyc_dt = eastern.localize(nyc_dt_naive)
utc_dt = pytz.utc.normalize(nyc_dt.astimezone(pytz.utc))
print(utc_dt)
>>>
2014-05-02 03:33:24+00:00
Once I have a UTC datetime, I can convert it to San Francisco local time.
pacific = pytz.timezone('US/Pacific')
sf_dt = pacific.normalize(utc_dt.astimezone(pacific))
print(sf_dt)
>>>
2014-05-01 20:33:24-07:00
Just as easily, I can convert it to the local time in Nepal.
nepal = pytz.timezone('Asia/Katmandu')
nepal_dt = nepal.normalize(utc_dt.astimezone(nepal))
print(nepal_dt)
>>>
2014-05-02 09:18:24+05:45
With datetime and pytz, these conversions are consistent across all environments regardless of what operating system the host computer is running.
Avoid using the time module for translating between different time zones.
Use the datetime built-in module along with the pytz module to reliably convert between times in different time zones.
Always represent time in UTC and do conversions to local time as the final step before presentation.
When you’re implementing Python programs that handle a non-trivial amount of data, you’ll eventually see slowdowns caused by the algorithmic complexity of your code. This usually isn’t the result of Python’s speed as a language (see Item 41: “Consider concurrent.futures for True Parallelism” if it is). The issue, more likely, is that you aren’t using the best algorithms and data structures for your problem.
Luckily, the Python standard library has many of the algorithms and data structures you’ll need to use built in. Besides speed, using these common algorithms and data structures can make your life easier. Some of the most valuable tools you may want to use are tricky to implement correctly. Avoiding reimplementation of common functionality will save you time and headaches.
The deque class from the collections module is a double-ended queue. It provides constant time operations for inserting or removing items from its beginning or end. This makes it ideal for first-in-first-out (FIFO) queues.
from collections import deque

fifo = deque()
fifo.append(1)      # Producer
x = fifo.popleft()  # Consumer
The list built-in type also contains an ordered sequence of items like a queue. You can insert or remove items from the end of a list in constant time. But inserting or removing items from the head of a list takes linear time, which is much slower than the constant time of a deque.
Standard dictionaries are unordered. That means a dict with the same keys and values can result in different orders of iteration. This behavior is a surprising byproduct of the way the dictionary’s fast hash table is implemented.
from random import randint

a = {}
a['foo'] = 1
a['bar'] = 2

# Randomly populate 'b' to cause hash conflicts
while True:
    z = randint(99, 1013)
    b = {}
    for i in range(z):
        b[i] = i
    b['foo'] = 1
    b['bar'] = 2
    for i in range(z):
        del b[i]
    if str(b) != str(a):
        break
print(a)
print(b)
print('Equal?', a == b)
>>>
{'foo': 1, 'bar': 2}
{'bar': 2, 'foo': 1}
Equal? True
The OrderedDict class from the collections module is a special type of dictionary that keeps track of the order in which its keys were inserted. Iterating the keys of an OrderedDict has predictable behavior. This can vastly simplify testing and debugging by making all code deterministic.
from collections import OrderedDict

a = OrderedDict()
a['foo'] = 1
a['bar'] = 2

b = OrderedDict()
b['foo'] = 'red'
b['bar'] = 'blue'

for value1, value2 in zip(a.values(), b.values()):
    print(value1, value2)
>>>
1 red
2 blue
Dictionaries are useful for bookkeeping and tracking statistics. One problem with dictionaries is that you can’t assume any keys are already present. That makes it clumsy to do simple things like increment a counter stored in a dictionary.
stats = {}
key = 'my_counter'
if key not in stats:
    stats[key] = 0
stats[key] += 1
The defaultdict class from the collections module simplifies this by automatically storing a default value when a key doesn’t exist. All you have to do is provide a function that will return the default value each time a key is missing. In this example, the int built-in function returns 0 (see Item 23: “Accept Functions for Simple Interfaces Instead of Classes” for another example). Now, incrementing a counter is simple.
from collections import defaultdict

stats = defaultdict(int)
stats['my_counter'] += 1
Heaps are useful data structures for maintaining a priority queue. The heapq module provides functions for creating heaps in standard list types, such as heappush, heappop, and nsmallest.
Items of any priority can be inserted into the heap in any order.
from heapq import heappush, heappop, nsmallest

a = []
heappush(a, 5)
heappush(a, 3)
heappush(a, 7)
heappush(a, 4)
Items are always removed by highest priority (lowest number) first.
print(heappop(a), heappop(a), heappop(a), heappop(a))
>>>
3 4 5 7
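Real priority queues usually push (priority, item) tuples, which heapq compares element by element, so the lowest priority number comes out first (the task names here are illustrative):

```python
from heapq import heappush, heappop

tasks = []
heappush(tasks, (2, 'write report'))
heappush(tasks, (1, 'fix outage'))
heappush(tasks, (3, 'plan lunch'))

# The tuple with the smallest first element is popped first
priority, task = heappop(tasks)
print(priority, task)  # 1 fix outage
```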
The resulting list is easy to use outside of heapq. Accessing the 0 index of the heap will always return the smallest item.
a = []
heappush(a, 5)
heappush(a, 3)
heappush(a, 7)
heappush(a, 4)
assert a[0] == nsmallest(1, a)[0] == 3
Calling the sort method on the list maintains the heap invariant.
print('Before:', a)
a.sort()
print('After: ', a)
>>>
Before: [3, 4, 7, 5]
After: [3, 4, 5, 7]
Each of these heapq operations takes logarithmic time in proportion to the length of the list. Doing the same work with a standard Python list would scale linearly.
Searching for an item in a list takes linear time proportional to its length when you call the index method.
x = list(range(10**6))
i = x.index(991234)
The bisect module’s functions, such as bisect_left, provide an efficient binary search through a sequence of sorted items. The index it returns is the insertion point of the value into the sequence.
from bisect import bisect_left

i = bisect_left(x, 991234)
The complexity of a binary search is logarithmic. That means using bisect to search a list of 1 million items takes roughly the same amount of time as using index to linearly search a list of 14 items. It’s way faster!
The itertools built-in module contains a large number of functions that are useful for organizing and interacting with iterators (see Item 16: “Consider Generators Instead of Returning Lists” and Item 17: “Be Defensive When Iterating Over Arguments” for background). Not all of these are available in Python 2, but they can easily be built using simple recipes documented in the module. See help(itertools) in an interactive Python session for more details.
The itertools functions fall into three main categories:
Linking iterators together
• chain: Combines multiple iterators into a single sequential iterator.
• cycle: Repeats an iterator’s items forever.
• tee: Splits a single iterator into multiple parallel iterators.
• zip_longest: A variant of the zip built-in function that works well with iterators of different lengths.
Filtering items from an iterator
• islice: Slices an iterator by numerical indexes without copying.
• takewhile: Returns items from an iterator while a predicate function returns True.
• dropwhile: Returns items from an iterator once the predicate function returns False for the first time.
• filterfalse: Returns all items from an iterator where a predicate function returns False. The opposite of the filter built-in function.
Combinations of items from iterators
• product: Returns the Cartesian product of items from an iterator, which is a nice alternative to deeply nested list comprehensions.
• permutations: Returns ordered permutations of length N with items from an iterator.
• combinations: Returns the unordered combinations of length N with unrepeated items from an iterator.
There are even more functions and recipes available in the itertools module that I don’t mention here. Whenever you find yourself dealing with some tricky iteration code, it’s worth looking at the itertools documentation again to see whether there’s anything there for you to use.
Use Python’s built-in modules for algorithms and data structures.
Don’t reimplement this functionality yourself. It’s hard to get right.
Python is an excellent language for writing code that interacts with numerical data. Python’s integer type can represent values of any practical size. Its double-precision floating point type complies with the IEEE 754 standard. The language also provides a standard complex number type for imaginary values. However, these aren’t enough for every situation.
For example, say you want to compute the amount to charge a customer for an international phone call. You know the time in minutes and seconds that the customer was on the phone (say, 3 minutes 42 seconds). You also have a set rate for the cost of calling Antarctica from the United States ($1.45/minute). What should the charge be?
With floating point math, the computed charge seems reasonable.
rate = 1.45
seconds = 3*60 + 42
cost = rate * seconds / 60
print(cost)
>>>
5.364999999999999
But rounding it to the nearest whole cent rounds down when you want it to round up to properly cover all costs incurred by the customer.
print(round(cost, 2))
>>>
5.36
Say you also want to support very short phone calls between places that are much cheaper to connect. Here, I compute the charge for a phone call that was 5 seconds long with a rate of $0.05/minute:
rate = 0.05
seconds = 5
cost = rate * seconds / 60
print(cost)
>>>
0.004166666666666667
The resulting float is so low that it rounds down to zero. This won’t do!
print(round(cost, 2))
>>>
0.0
The solution is to use the Decimal
class from the decimal
built-in module. The Decimal
class provides fixed point math with 28 significant digits of precision by default. The precision can go even higher if required. This works around the precision issues in IEEE 754 floating point numbers. The class also gives you more control over rounding behaviors.
For example, redoing the Antarctica calculation with Decimal
results in an exact charge instead of an approximation.
from decimal import Decimal, ROUND_UP

rate = Decimal('1.45')
seconds = Decimal('222') # 3*60 + 42
cost = rate * seconds / Decimal('60')
print(cost)
>>>
5.365
The Decimal
class has a built-in function for rounding to exactly the decimal place you need with the rounding behavior you want.
rounded = cost.quantize(Decimal('0.01'), rounding=ROUND_UP)
print(rounded)
>>>
5.37
Using the quantize
method this way also properly handles the low-cost case of short, cheap phone calls. Here, you can see the Decimal
cost is still less than 1 cent for the call:
rate = Decimal('0.05')
seconds = Decimal('5')
cost = rate * seconds / Decimal('60')
print(cost)
>>>
0.004166666666666666666666666667
But the quantize behavior ensures that this is rounded up to one whole cent.
rounded = cost.quantize(Decimal('0.01'), rounding=ROUND_UP)
print(rounded)
>>>
0.01
While Decimal
works great for fixed point numbers, it still has limitations in its precision (e.g., 1/3
will be an approximation). For representing rational numbers with no limit to precision, consider using the Fraction
class from the fractions
built-in module.
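A brief sketch of that exactness, using only the standard library:

```python
from fractions import Fraction

# One third is exact as a Fraction, so tripling it
# recovers exactly 1 with no rounding error
third = Fraction(1, 3)
print(third * 3)  # 1

# Fractions can also be built from decimal strings
print(Fraction('0.1') + Fraction('0.2'))  # 3/10
```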
Python has built-in types and classes in modules that can represent practically every type of numerical value.
The Decimal
class is ideal for situations that require high precision and exact rounding behavior, such as computations of monetary values.
Python has a central repository of modules (https://pypi.python.org) for you to install and use in your programs. These modules are built and maintained by people like you: the Python community. When you find yourself facing an unfamiliar challenge, the Python Package Index (PyPI) is a great place to look for code that will get you closer to your goal.
To use the Package Index, you’ll need to use a command-line tool named pip
. pip
is installed by default in Python 3.4 and above (it’s also accessible with python -m pip
). For earlier versions, you can find instructions for installing pip
on the Python Packaging website (https://packaging.python.org).
Once installed, using pip
to install a new module is simple. For example, here I install the pytz
module that I used in another item in this chapter (see Item 45: “Use datetime
Instead of time
for Local Clocks”):
$ pip3 install pytz
Downloading/unpacking pytz
Downloading pytz-2014.4.tar.bz2 (159kB): 159kB downloaded
Running setup.py (...) egg_info for package pytz
Installing collected packages: pytz
Running setup.py install for pytz
Successfully installed pytz
Cleaning up...
In the example above, I used the pip3
command-line tool to install the Python 3 version of the package. The pip
command-line tool (without the 3
) is also available for installing packages for Python 2. The majority of popular packages are now available for either version of Python (see Item 1: “Know Which Version of Python You’re Using”). pip
can also be used with pyvenv
to track sets of packages to install for your projects (see Item 53: “Use Virtual Environments for Isolated and Reproducible Dependencies”).
Each module in the PyPI has its own software license. Most of the packages, especially the popular ones, have free or open source licenses (see http://opensource.org for details). In most cases, these licenses allow you to include a copy of the module with your program (when in doubt, talk to a lawyer).
The Python Package Index (PyPI) contains a wealth of common packages that are built and maintained by the Python community.
pip
is the command-line tool to use for installing packages from PyPI.
pip
is installed by default in Python 3.4 and above; you must install it yourself for older versions.
The majority of PyPI modules are free and open source software.