Python's concept of an extensible library gives us rich access to numerous computing resources. The language provides avenues to make even more resources available. This makes Python programs particularly strong at integrating components to create sophisticated composite processing. In this chapter, we'll address the fundamentals of creating complex applications: managing configuration files, logging, and a design pattern for scripts that permits automated testing.
These new recipes are based on recipes shown earlier. Specifically, in the Using argparse to get command-line input, Using cmd for creating command-line applications, and Using the OS environment settings recipes in Chapter 6, User Inputs and Outputs, some specific techniques for creating top-level (main) application scripts were shown. In Chapter 10, Input/Output, Physical Format, and Logical Layout, we looked at filesystem input and output. In Chapter 12, Web Services, we looked at creating servers, which are the main applications that receive requests from clients.
All of these examples show some aspects of application programming in Python. There are some additional techniques that are helpful, such as processing configuration from files. In the Using argparse to get command-line input recipe in Chapter 6, User Inputs and Outputs, we showed techniques for parsing command-line arguments. In the Using the OS environment settings recipe, we touched on other kinds of configuration details. In this chapter, we'll look at a number of ways to handle configuration files. There are many file formats that can be used to store long-term configuration information:
- The INI file format, as processed by the configparser module.
- Python code, as processed by the built-in compile() and exec() functions. We'll look at this in the Using Python for configuration files recipe.
- Python modules, as processed by the import statement. We'll look at this in the Using class-as-namespace for configuration recipe.

This chapter will extend some of the concepts from Chapter 7, Basics of Classes and Objects, and Chapter 8, More Advanced Class Design, and apply the idea of the command design pattern to Python programs.
In this chapter, we'll look at the following recipes:
- Using class-as-namespace for configuration values

We'll start with a recipe for handling multiple configuration files that must be combined. This gives users some helpful flexibility. From there, we can dive into the specifics of various common configuration file formats.
Many applications will have a hierarchy of configuration options. The foundation of the hierarchy is often the default values built into a particular release. These might be supplemented by server-wide (or cluster-wide) values from centralized configuration files. There might be user-specific files, or perhaps even configuration files provided when starting a program.
In many cases, configuration parameters are written in text files, so they are persistent and easy to change. The common tradition in Linux is to put system-wide configuration in the /etc directory. A user's personal changes would be in their home directory, often named ~username or $HOME.
In this recipe, we'll see how an application can support a rich hierarchy of locations for configuration files.
The example we'll use is a web service that provides hands of cards to users. The service is shown in several recipes throughout Chapter 12, Web Services. We'll gloss over some of the details of the service so we can focus on fetching configuration parameters from a variety of filesystem locations.
We'll follow the design pattern of the Bash shell, which looks for configuration files in the following places:

- The /etc/profile file.
- The first of these files found in the user's home directory: ~/.bash_profile, ~/.bash_login, or ~/.profile.
In a POSIX-compliant operating system, the shell expands the ~
to be the home directory for the logged-in user. This is defined as the value of the HOME
environment variable. In general, the Python pathlib
module can handle this automatically via the Path.home()
method. This technique applies to Windows and Linux derivatives, as well as macOS.
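As a quick illustration (a sketch using only the standard library), Path.home() and Path.expanduser() agree on where the home directory is, which is what makes the ~-based file names portable:

```python
from pathlib import Path

# Path.home() returns the current user's home directory on
# Windows, Linux, and macOS alike.
home = Path.home()

# expanduser() rewrites a leading "~" the same way a shell would.
profile = Path("~/.profile").expanduser()

# Both spellings agree on the home-directory portion of the path.
assert profile.parent == home
```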
The design pattern from the Bash shell can use a number of separate files. When we include defaults that are part of the release, application-wide settings as part of an installation, and personal settings, we can consider three levels of configuration. This can be handled elegantly with a mapping and the ChainMap
class from the collections
module.
In later recipes, we'll look at ways to parse and process specific formats of configuration files. For the purposes of this recipe, we won't pick a specific format. Instead, we'll assume that a function, load_config_file()
, has been defined that will load a specific configuration mapping from the contents of the file. The function looks like this:
def load_config_file(config_path: Path) -> Dict[str, Any]:
    """Loads a configuration mapping object with the contents
    of a given file.

    :param config_path: Path to be read.
    :returns: mapping with configuration parameter values
    """
    # Details omitted.
We'll look at a number of different ways to implement this function.
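As one possible placeholder, here's a minimal sketch of load_config_file() that assumes JSON-format files; later recipes in this chapter supply YAML- and Python-based implementations:

```python
import json
from pathlib import Path
from typing import Any, Dict

def load_config_file(config_path: Path) -> Dict[str, Any]:
    """Loads a configuration mapping object with the contents
    of a given file, assumed here to be in JSON format.

    :param config_path: Path to be read.
    :returns: mapping with configuration parameter values
    """
    return json.loads(config_path.read_text())
```

Any implementation that accepts a Path and returns a mapping will satisfy the contract relied on by the rest of this recipe.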
There's a side topic that sometimes arises when discussing this kind of design: why have so many choices? Why not specify exactly two places?
The answer depends on the context for the application. When creating entirely new software, it may be possible to limit the choices to exactly two locations. However, when replacing legacy applications, it's common to have a new location that's better in some ways than the legacy location. This often means the legacy location still needs to be supported. After several such evolutionary changes, it's common to see a number of alternative locations for files.
Also, because of variations among Linux distributions, it's common to see variations that are typical for one distribution, but atypical for another. And, of course, when dealing with Windows, there will be variant file paths that are unique to that platform.
We'll make use of the pathlib
module to provide a handy way to work with files in various locations. We'll also use the collections
module to provide the very useful ChainMap
class:
Import the Path class and the collections module. There are several type hints that are also required:
from pathlib import Path
import collections
from typing import TextIO, Dict, Any, ChainMap
def get_config() -> ChainMap[str, Any]:
    system_path = Path("/etc") / "profile"
    local_paths = [
        Path.home() / ".bash_profile",
        Path.home() / ".bash_login",
        Path.home() / ".profile",
    ]
    configuration_items = [
        dict(
            some_setting="Default Value",
            another_setting="Another Default",
            some_option="Built-In Choice",
        )
    ]
These defaults are the first item in this list; the list becomes the final ChainMap configuration mapping. We'll assemble the list of maps by appending items, and then reverse the order after the files are loaded so that the last loaded file becomes the first in the map. If the system-wide configuration file exists, append it:

    if system_path.exists():
        configuration_items.append(
            load_config_file(system_path))

For the user's personal files, use a break statement to stop after the first file is found:
    for config_path in local_paths:
        if config_path.exists():
            configuration_items.append(
                load_config_file(config_path))
            break
Reverse the list and create the final ChainMap. The list needs to be reversed so that the local file is searched first, then the system settings, and finally the application default settings:

    configuration = collections.ChainMap(
        *reversed(configuration_items))
    return configuration
Once we've built the configuration
object, we can use the final configuration like a simple mapping. This object supports all of the expected dictionary operations.
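For example, here's a small sketch (the setting names are invented) showing ordinary mapping operations on a ChainMap:

```python
from collections import ChainMap

configuration = ChainMap(
    {"some_setting": 1},                    # user-specific values
    {"some_setting": 0, "verbose": False},  # built-in defaults
)

assert configuration["some_setting"] == 1            # first map wins
assert configuration.get("missing", "n/a") == "n/a"  # get() works
assert "verbose" in configuration                    # found in a later map
assert dict(configuration) == {"some_setting": 1, "verbose": False}
```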
One of the most elegant features of any object-oriented language is being able to create collections of objects. In this case, one of these collections of objects includes filesystem Path
objects.
As noted in the Using pathlib to work with file names recipe in Chapter 10, Input/Output, Physical Format, and Logical Layout, the Path
object has a resolve()
method that can return a concrete Path
built from a pure Path
. In this recipe, we used the exists()
method to determine if a concrete path could be built. The open()
method, when used to read a file, will resolve the pure Path
and open the associated file.
In the Creating dictionaries – inserting and updating recipe in Chapter 4, Built-In Data Structures Part 1: Lists and Sets, we looked at the basics of using a dictionary. Here, we've combined several dictionaries into a chain. When a key
is not located in the first dictionary of the chain, then later dictionaries in the chain are checked. This is a handy way to provide default values for each key
in the mapping.
Here's an example of creating a ChainMap
manually:
>>> import collections
>>> config = collections.ChainMap(
... {'another_setting': 2},
... {'some_setting': 1},
... {'some_setting': 'Default Value',
... 'another_setting': 'Another Default',
... 'some_option': 'Built-In Choice'})
The config
object is built from three separate mappings. The first might be details from a local file such as ~/.bash_login
. The second might be system-wide settings from the /etc/profile
file. The third contains application-wide defaults.
Here's what we see when we query this object's values:
>>> config['another_setting']
2
>>> config['some_setting']
1
>>> config['some_option']
'Built-In Choice'
The value for any given key
is taken from the first instance of that key
in the chain of maps. This is a very simple way to have local values that override system-wide values that override the built-in defaults.
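A related detail: assignments to a ChainMap go only to the first mapping, so the defaults are never disturbed. A small sketch, with invented settings:

```python
import collections

defaults = {"theme": "light", "verbose": False}
overrides: dict = {}
config = collections.ChainMap(overrides, defaults)

config["theme"] = "dark"   # written to the first map only

assert config["theme"] == "dark"
assert defaults["theme"] == "light"     # defaults untouched
assert overrides == {"theme": "dark"}   # the change landed here
```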
In the Mocking external resources recipe in Chapter 11, Testing, we looked at ways to mock external resources so that we could write a unit test that wouldn't accidentally delete files. A test for the code in this recipe needs to mock the filesystem resources by mocking the Path
class.
To work with pytest
test cases, it helps to consolidate the Path
operations into a fixture that can be used to test the get_config()
function:
from pathlib import Path
from pytest import * # type: ignore
from unittest.mock import Mock, patch, mock_open, MagicMock, call
import Chapter_13.ch13_r01
@fixture  # type: ignore
def mock_path(monkeypatch, tmpdir):
    mocked_class = Mock(
        wraps=Path,
        return_value=Path(tmpdir / "etc"),
        home=Mock(return_value=Path(tmpdir / "home")),
    )
    monkeypatch.setattr(
        Chapter_13.ch13_r01, "Path", mocked_class)
    (tmpdir / "etc").mkdir()
    (tmpdir / "etc" / "profile").write_text(
        "exists", encoding="utf-8")
    (tmpdir / "home").mkdir()
    (tmpdir / "home" / ".profile").write_text(
        "exists", encoding="utf-8")
    return mocked_class
This mock_path
fixture creates a module-like Mock
object that can be used instead of the Path
class. When the code under test uses Path()
it will always get the etc/profile
file created in the tmpdir
location. The home
attribute of this Mock
object makes sure that Path.home()
will provide a name that's part of the temporary directory created by tmpdir
. By pointing the Path
references to the temporary directory that's unique to the test, we can then load up this directory with any combination of files.
This fixture creates two directories, and a file in each directory. One file is tmpdir/etc/profile
. The other is tmpdir/home/.profile
. This allows us to check the algorithm for finding the system-wide profile as well as a user's local profile.
In addition to a fixture that sets up the files, we'll need one more fixture to mock the details of the load_config_file()
function, which loads one of the configuration files. This allows us to define multiple implementations, confident that the overall get_config()
function will work with any implementation that fills the contract of load_config_file()
.
The fixture looks like this:
@fixture  # type: ignore
def mock_load_config(monkeypatch):
    mocked_load_config_file = Mock(return_value={})
    monkeypatch.setattr(
        Chapter_13.ch13_r01,
        "load_config_file",
        mocked_load_config_file
    )
    return mocked_load_config_file
Here are some of the tests that will confirm that the path search works as advertised. Each test starts by applying two patches to create a modified context for testing the get_config()
function:
def test_get_config(mock_load_config, mock_path):
    config = Chapter_13.ch13_r01.get_config()
    assert mock_path.mock_calls == [
        call("/etc"),
        call.home(),
        call.home(),
        call.home(),
    ]
    assert mock_load_config.mock_calls == [
        call(mock_path.return_value / "profile"),
        call(mock_path.home.return_value / ".profile"),
    ]
The two fixtures mock the Path
class and also mock the load_config_file()
function that the get_config()
function relies on. The assertion shows that several path requests were made, and two individual files were eventually loaded. This is the purpose behind this particular get_config()
function; it loads two of the files it finds. To be complete, of course, the test suite needs to have two more fixtures and two more tests to examine the other two locations for user-specific configuration files.
The recipes that follow in this chapter show a number of ways to implement the load_config_file() function. The pathlib module can help with this processing. This module provides the Path class definition, which provides a great deal of sophisticated information about the OS's files. For more information, see the Using pathlib to work with filenames recipe in Chapter 10, Input/Output, Physical Format, and Logical Layout.

Python offers a variety of ways to package application inputs and configuration files. We'll look at writing files in YAML notation because this format is elegant and simple.
It can be helpful to represent configuration details in YAML notation.
Python doesn't have a YAML parser built in. We'll need to add the pyyaml
project to our library using the pip
package management system. Here's what the installation looks like:
(cookbook) slott@MacBookPro-SLott Modern-Python-Cookbook-Second-Edition % python -m pip install pyyaml
Collecting pyyaml
Downloading https://files.pythonhosted.org/packages/64/c2/b80047c7ac2478f9501676c988a5411ed5572f35d1beff9cae07d321512c/PyYAML-5.3.1.tar.gz (269kB)
|████████████████████████████████| 276kB 784kB/s
Building wheels for collected packages: pyyaml
Building wheel for pyyaml (setup.py) ... done
Created wheel for pyyaml: filename=PyYAML-5.3.1-cp38-cp38-macosx_10_9_x86_64.whl size=44624 sha256=7450b3cc947c2afd5d8191ebe35cb1c8cdd5e212e0478121cd49ce52c835ddaa
Stored in directory: /Users/slott/Library/Caches/pip/wheels/a7/c1/ea/cf5bd31012e735dc1dfea3131a2d5eae7978b251083d6247bd
Successfully built pyyaml
Installing collected packages: pyyaml
Successfully installed pyyaml-5.3.1
The elegance of the YAML syntax is that simple indentation is used to show the structure of the document. Here's an example of some settings that we might encode in YAML:
query:
  mz:
    - ANZ532
    - AMZ117
    - AMZ080
url:
  scheme: http
  netloc: forecast.weather.gov
  path: /shmrn.php
description: >
  Weather forecast for Offshore including the Bahamas
This document can be seen as a specification for a number of related URLs that are all similar to http://forecast.weather.gov/shmrn.php?mz=ANZ532
. The document contains information about building the URL from a scheme, net location, base path, and several query strings. The yaml.load()
function can load this YAML document; it will create the following Python structure:
{'description': 'Weather forecast for Offshore including the Bahamas\n',
 'query': {'mz': ['ANZ532', 'AMZ117', 'AMZ080']},
 'url': {'netloc': 'forecast.weather.gov',
         'path': '/shmrn.php',
         'scheme': 'http'}}
This dict-of-dict
structure can be used by an application to tailor its operations. In this case, it specifies a sequence of URLs to be queried to assemble a larger weather briefing.
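To make that concrete, here's a sketch using the standard library's urllib.parse to assemble one URL per mz zone from the structure above:

```python
from urllib.parse import urlencode, urlunsplit

config = {
    "query": {"mz": ["ANZ532", "AMZ117", "AMZ080"]},
    "url": {
        "scheme": "http",
        "netloc": "forecast.weather.gov",
        "path": "/shmrn.php",
    },
}

# Build one complete URL for each zone in the query section.
urls = [
    urlunsplit((
        config["url"]["scheme"],
        config["url"]["netloc"],
        config["url"]["path"],
        urlencode({"mz": zone}),
        "",   # no fragment
    ))
    for zone in config["query"]["mz"]
]

assert urls[0] == "http://forecast.weather.gov/shmrn.php?mz=ANZ532"
```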
We'll often use the Finding configuration files recipe, shown earlier in this chapter, to check a variety of locations for a given configuration file. This flexibility is often essential for creating an application that's easy to use on a variety of platforms.
In this recipe, we'll build the missing part of the previous example, the load_config_file()
function. Here's the template that needs to be filled in:
def load_config_file(config_path: Path) -> Dict[str, Any]:
    """Loads a configuration mapping object with contents
    of a given file.

    :param config_path: Path to be read.
    :returns: mapping with configuration parameter values
    """
    # Details omitted.
In this recipe, we'll fill in the space held by the Details omitted line to load configuration files in YAML format.
This recipe will make use of the yaml module to parse a YAML-format file. This will create a dictionary from the YAML-format source. This can be part of building a ChainMap of configurations:

Import the yaml module along with the Path definition and the type hints required by the load_config_file() function definition:
from pathlib import Path
from typing import Dict, Any
import yaml
Use the yaml.load() function to load the YAML-syntax document:
def load_config_file(config_path: Path) -> Dict[str, Any]:
    """Loads a configuration mapping object with contents
    of a given file.

    :param config_path: Path to be read.
    :returns: mapping with configuration parameter values
    """
    with config_path.open() as config_file:
        document = yaml.load(
            config_file, Loader=yaml.SafeLoader)
    return document
This function can be fit into the design from the Finding configuration files recipe to load a configuration file using YAML notation.
The YAML syntax rules are defined at http://yaml.org. The idea of YAML is to write JSON-like data structures in a more flexible, human-friendly syntax. JSON is a special case of the more general YAML syntax.
The trade-off here is that some spaces and line breaks in JSON don't matter—there is visible punctuation to show the structure of the document. In some of the YAML variants, line breaks and indentation determine the structure of the document; the use of white-space means that line breaks will matter with YAML documents.
The essential data structures available in JSON syntax are as follows:

- Mapping: {key: value, key: value, ...}
- Sequence: [value, value, ...]
- Scalar: a string like "value", a number like 3.1415926, and the literal values true, false, and null

JSON syntax is one style of YAML; it's called a flow style. In this style, the document structure is marked by explicit indicators. The syntax requires {…} and […] to show the structure.
The alternative that YAML offers is block
style. The document structure is defined by line breaks and indentation. Furthermore, string scalar values can use plain, quoted, and folded styles of syntax. Here is how the alternative YAML syntax works:
- A block sequence prefaces each item with a - (dash). This looks like a bullet list and is easy to read. When loaded, it will create a dictionary with a list of strings in Python: {zoneid: ['ANZ532', 'AMZ117', 'AMZ080']}. Here's an example:

zoneid:
  - ANZ532
  - AMZ117
  - AMZ080
- A block mapping uses key: value syntax to associate a key with a simple scalar. We can use key: on a line by itself; the value is indented on the following lines. This creates a nested dictionary that looks like this in Python: {'url': {'scheme': 'http', 'netloc': 'marine.weather.gov'}}. Here's an example:

url:
  scheme: http
  netloc: marine.weather.gov
Some more advanced features of YAML will make use of this explicit separation between key and value:
- Long passages of text can use the | prefix; the lines after this are preserved with all of the spacing and newlines intact. YAML also offers the > prefix, which folds the words into a long string of text; any newlines are treated as single white-space characters. This is common for running text.
- In some cases, we might want a value like 22102 to be treated as a string, even though the YAML rules will interpret it as a number. Quotes, of course, can be helpful. To be even more explicit, a local tag of !!str in front of the value will force a specific data type. !!str 22102, for example, assures that the digits will be treated as a string object.

There are a number of additional features in YAML that are not present in JSON:
- Comments begin with # and continue to the end of the line. They can go almost anywhere. JSON doesn't tolerate comments.
- A new document starts with a --- line. This allows a YAML file to contain a stream of separate documents.
- A ... line marks the end of a document in a stream of documents.
- JSON scalar values are limited to string, number, true, false, and null. YAML allows mapping keys to be considerably more complex. We have to honor Python's restriction that keys must be immutable.

Here are some examples of these features. In this first example, we'll look at a stream that contains two documents.
Here is a YAML file with two separate documents, something that JSON does not handle well:
>>> import yaml
>>> yaml_text = '''
... ---
... id: 1
... text: "Some Words."
... ---
... id: 2
... text: "Different Words."
... '''
>>> document_iterator = yaml.load_all(yaml_text, Loader=yaml.SafeLoader)
>>> document_1 = next(document_iterator)
>>> document_1['id']
1
>>> document_2 = next(document_iterator)
>>> document_2['text']
'Different Words.'
The yaml_text
string is a stream with two YAML documents, each of which starts with ---
. The load_all()
function returns an iterator that loads the documents one at a time. An application must iterate over the results to process each of the documents in the stream.
YAML provides a way to create complex objects for mapping keys. What's important is that Python requires a hashable, immutable object for a mapping key. This means that a complex key must be transformed into an immutable Python
object, often a tuple. In order to create a Python-specific object, we need to use a more complex local tag. Here's an example:
>>> mapping_text = '''
... ? !!python/tuple ["a", "b"]
... : "value"
... '''
>>> yaml.load(mapping_text, Loader=yaml.UnsafeLoader)
{('a', 'b'): 'value'}
This example uses ?
and :
to mark the key
and value
of a mapping. We've done this because the key
is a complex object. The key
value uses a local tag, !!python/tuple
, to create a tuple instead of the default, which would have been a list
. The text of the key
uses a flow-type YAML value, ["a", "b"]
.
Because this steps outside the default type mappings, we also have to use the special UnsafeLoader
. This is a way of acknowledging that a wide variety of Python objects can be created this way.
JSON has no provision for a set
collection. YAML allows us to use the !!set
tag to create a set
instead of a simple sequence. The items in the set
must be identified by a ?
prefix because they are considered keys of a mapping for which there are no values.
Note that the !!set
tag is at the same level of indentation as the values within the set
collection. It's indented inside the dictionary
key of data_values
:
>>> import yaml
>>> set_text = '''
... document:
...     id: 3
...     data_values:
...       !!set
...       ? some
...       ? more
...       ? words
... '''
>>> some_document = yaml.load(set_text, Loader=yaml.SafeLoader)
>>> some_document['document']['id']
3
>>> some_document['document']['data_values'] == {
... 'some', 'more', 'words'}
True
The !!set
local tag modifies the following sequence to become a set
object instead of the default list
object. The resulting set is equal to the expected Python set object, {'some', 'more', 'words'}
.
Items in a set
must be immutable objects. While the YAML syntax allows creating a set of mutable list objects, it's impossible to build the document in Python. A run-time error will reveal the problem when we try to collect mutable objects into a set
.
Python objects of almost any class can be described using YAML local tags. Any class with a simple __init__()
method can be built from a YAML serialization.
Here's a small class definition:
class Card:
    def __init__(self, rank: int, suit: str) -> None:
        self.rank = rank
        self.suit = suit

    def __repr__(self) -> str:
        return f"{self.rank} {self.suit}"
We've defined a class with two positional attributes. Here's the YAML serialization of an instance of this class:
!!python/object/apply:Chapter_13.ch13_r02.Card
kwds:
  rank: 7
  suit:
We've used the kwds
key to provide two keyword-based argument values to the Card
constructor function. The Unicode character works well because YAML files are text written using UTF-8 encoding.
Python offers a variety of ways to package application inputs and configuration files. We'll look at writing files in Python notation because it's elegant and simple.
A number of packages use assignment statements in a separate module to provide configuration parameters. The Flask project, in particular, supports this. We looked at Flask in the Using the Flask framework for RESTful APIs recipe and a number of related recipes in Chapter 12, Web Services.
In this recipe, we'll look at how we can represent configuration details in Python notation.
Python assignment statements are particularly elegant. The syntax can be simple, easy to read, and extremely flexible. If we use assignment statements, we can import an application's configuration details from a separate module. This could have a name like settings.py
to show that it's focused on configuration parameters.
Because Python treats each imported module as a global Singleton
object, we can have several parts of an application all use the import settings
statement to get a consistent view of the current, global application configuration parameters.
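The singleton behavior is easy to observe; Python caches each imported module in sys.modules, so repeated imports yield the same object. Here's a small demonstration (using the math module as a stand-in for a settings module):

```python
import sys
import math
import math as math_again  # a second import of the same module

# Both names are bound to one cached module object.
assert math is math_again
assert sys.modules["math"] is math

# A change made through one reference is visible through the other;
# this is why every part of an application that imports a settings
# module sees a consistent view of its contents.
math_again.shared_flag = True  # type: ignore[attr-defined]
assert math.shared_flag is True  # type: ignore[attr-defined]
```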
For some applications, we might want to choose one of several alternative settings files. In this case, we want to load a file using a technique that's more flexible than the fixed import
statement.
We'd like to be able to provide definitions in a text file that look like this:
"""Weather forecast for Offshore including the Bahamas
"""
query = {'mz': ['ANZ532', 'AMZ117', 'AMZ080']}
url = {
    'scheme': 'http',
    'netloc': 'forecast.weather.gov',
    'path': '/shmrn.php'
}
This is Python syntax. The parameters include two variables, query
and url
. The value of the query
variable is a dictionary with a single key, mz
, and a sequence of values.
This can be seen as a specification for a number of related URLs that are all similar to http://forecast.weather.gov/shmrn.php?mz=ANZ532.
We'll often use the Finding configuration files recipe to check a variety of locations for a given configuration file. This flexibility is often essential for creating an application that's easily used on a variety of platforms.
In this recipe, we'll build the missing part of the first recipe, the load_config_file()
function. Here's the template that needs to be filled in:
def load_config_file(config_path: Path) -> Dict[str, Any]:
    """Loads a configuration mapping object with contents
    of a given file.

    :param config_path: Path to be read.
    :returns: mapping with configuration parameter values
    """
    # Details omitted.
In this recipe, we'll fill in the space held by the Details omitted line to load configuration files in Python format.
We can make use of the pathlib
module to locate the files. We'll leverage the built-in compile()
and exec()
functions to process the code in the configuration file:
Import the Path definition and the type hints required by the load_config_file() function definition:
from pathlib import Path
from typing import Dict, Any
Use the built-in compile() function to compile the Python module into an executable form. This function requires the source text as well as the filename from which the text was read. The filename is essential for creating trace-back messages that are useful and correct:
def load_config_file(config_path: Path) -> Dict[str, Any]:
    code = compile(
        config_path.read_text(),
        config_path.name,
        "exec")
In rare cases where the code doesn't come from a file, the general practice is to provide a name such as <string>
for the filename.
Execute the code object created by the compile() function. This requires two contexts. The global context provides any previously imported modules, plus the __builtins__ module. The local context is the locals dictionary; this is where new variables will be created:
    locals: Dict[str, Any] = {}
    exec(code, {"__builtins__": __builtins__}, locals)
    return locals
The details of the Python language–the syntax and semantics–are embodied in the built-in compile()
and exec()
functions. When we launch a Python application or script, the process is essentially this:
- Read the source text.
- Use the compile() function to create a code object.
- Use the exec() function to execute the code object.

The __pycache__ directory holds code objects, and saves the work of recompiling text files that haven't changed.
The exec()
function reflects the way Python handles global
and local
variables. There are two namespaces (mappings
) provided to this function. These are visible to a script that's running via the globals()
and locals()
functions.
When code is executed at the very top level of a script file—often inside the if __name__ == "__main__"
condition—it executes in the global context; the globals
and locals
variable collections are the same. When code is executed inside a function
, method
, or class
definition, the local variables for that context are separate from the global variables.
Here, we've created a separate locals
object. This makes sure the imported statements don't make unexpected changes to any other global variables.
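Putting the pieces together, here's a runnable sketch of the whole function applied to a temporary settings file (the retries and timeout parameters are invented for illustration):

```python
import tempfile
from pathlib import Path
from typing import Any, Dict

def load_config_file(config_path: Path) -> Dict[str, Any]:
    code = compile(
        config_path.read_text(), config_path.name, "exec")
    # A fresh locals dict captures the assignments; the globals
    # expose only the builtins.
    locals: Dict[str, Any] = {}
    exec(code, {"__builtins__": __builtins__}, locals)
    return locals

with tempfile.TemporaryDirectory() as tmp:
    settings_path = Path(tmp) / "settings.py"
    settings_path.write_text("retries = 3\ntimeout = 2.5\n")
    config = load_config_file(settings_path)

assert config == {"retries": 3, "timeout": 2.5}
```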
We provided two distinct dictionaries:

- A dictionary of global objects. The __builtins__ module is often provided in this dictionary. In some cases, other modules like pathlib should be added.
- A dictionary for local variables. The assignment statements create entries here, so the result acts like a settings module.

The locals dictionary will be updated by the exec() function. We don't expect the globals to be updated and will ignore any changes that happen to this collection.
This recipe suggests a configuration file is entirely a sequence of name = value
assignments. The assignment statement is in Python syntax, as are the variable names and the literal syntax. This permits Python's large collection of built-in types.
Additionally, the full spectrum of Python statements is available. This leads to some engineering trade-offs.
Because any statement can be used in the configuration file, it can lead to complexity. If the processing in the configuration file becomes too complex, the file ceases to be configuration and becomes a first-class part of the application. Very complex features should be implemented by modifying the application programming, not hacking around with the configuration settings. Since Python applications include the full source, it is generally easier to fix the source than to create hyper-complex configuration files. The goal is for a configuration file to provide values to tailor operations, not provide plug-in functionality.
We might want to include the OS environment variables as part of the global variables used for configuration. This ensures that the configuration values match the current environment settings. This can be done with the os.environ
mapping.
It can also be sensible to do some processing simply to make a number of related settings easier to organize. For example, it can be helpful to write a configuration file with a number of related paths like this:
"""Config with related paths"""
if environ.get("APP_ENV", "production"):
base = Path('/var/app/')
else:
base = Path.cwd("var")
log = base/'log'
out = base/'out'
The values of log
and out
are used by the application. The value of base
is only used to ensure that the other two paths share a common parent directory.
This leads to the following variation on the load_config_file()
function shown earlier. This version includes some additional modules and global classes:
from pathlib import Path
import platform
import os
from typing import Any, Dict, cast

def load_config_file_xtra(config_path: Path) -> Dict[str, Any]:

    def not_allowed(*arg, **kw) -> None:
        raise RuntimeError("Operation not allowed")

    code = compile(
        config_path.read_text(),
        config_path.name,
        "exec")
    safe_builtins = cast(Dict[str, Any], __builtins__).copy()
    for name in ("eval", "exec", "compile", "__import__"):
        safe_builtins[name] = not_allowed
    globals = {
        "__builtins__": safe_builtins,
        "Path": Path,
        "platform": platform,
        "environ": os.environ.copy()
    }
    locals: Dict[str, Any] = {}
    exec(code, globals, locals)
    return locals
Including Path
, platform
, and a copy of os.environ
in the globals means that a configuration file can be written without the overhead of import
statements. This can make the settings simpler to prepare and maintain.
We've also removed four built-in functions: eval()
, exec()
, compile()
, and __import__()
. This will reduce the number of things a Python-language configuration file is capable of doing. This involves some fooling around inside the __builtins__
collection. This module behaves like a dictionary, but the type is not simply Dict[str, Any]
. We've used the cast()
function to tell mypy
that the __builtins__.copy()
method will work even though it's not obviously part of the module's type.
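As a self-contained sketch of this restriction technique (using the importable builtins module, whose namespace is a plain dictionary, which sidesteps the need for cast()), a configuration file that tries to call eval() fails cleanly. The config_source string and the settings.py filename here are stand-ins for a real configuration file:

```python
# Sketch: restrict what a Python-syntax configuration file can do by
# replacing dangerous names in a copied builtins mapping before exec().
import builtins
from typing import Any, Dict

def not_allowed(*args: Any, **kw: Any) -> None:
    raise RuntimeError("Operation not allowed")

# vars(builtins) avoids the module-vs-dict ambiguity of __builtins__.
safe_builtins: Dict[str, Any] = vars(builtins).copy()
for name in ("eval", "exec", "compile", "__import__"):
    safe_builtins[name] = not_allowed

config_source = 'answer = eval("6 * 7")'  # a hostile "configuration file"
code = compile(config_source, "settings.py", "exec")
try:
    exec(code, {"__builtins__": safe_builtins}, {})
except RuntimeError as ex:
    print(ex)  # Operation not allowed
```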
Python offers a variety of ways to package application inputs and configuration files. We'll continue to look at writing files in Python notation because it's elegant and the familiar syntax can lead to easy-to-read configuration files.
A number of projects allow us to use a class definition to provide configuration parameters. The use of a class hierarchy means that inheritance techniques can be used to simplify the organization of parameters. The Flask
package, in particular, can do this. We looked at Flask in the Using the Flask framework for RESTful APIs recipe, and a number of related recipes.
In this recipe, we'll look at how we can represent configuration details in Python class notation.
Python notation for defining the attributes of a class can be simple, easy to read, and reasonably flexible. We can, with a little work, define a sophisticated configuration language that allows someone to change configuration parameters for a Python application quickly and reliably.
We can base this language on class definitions. This allows us to package a number of configuration alternatives in a single module. An application can load the module and pick the relevant class definition from the module.
We'd like to be able to provide definitions that look like this:
class Configuration:
"""
Generic Configuration
"""
url = {
"scheme": "http",
"netloc": "forecast.weather.gov",
"path": "/shmrn.php"}
query = {"mz": ["ANZ532"]}
We can create this class definition in a settings.py
file to create a settings
module. To use the configuration
, the main application could do this:
from settings import Configuration
The application will gather the settings using the fixed module name of settings
with a fixed class name of Configuration. We have two ways to add flexibility to using a module as a configuration file:
One of these is to set the PYTHONPATH environment variable to list a number of locations for configuration modules.
These techniques can be helpful because the configuration file locations follow Python's rules for finding modules. Rather than implementing our own search for the configuration, we can leverage Python's search of sys.path.
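Here's a hedged sketch of that mechanism: it writes a throwaway settings.py into a temporary directory, puts that directory on sys.path (setting PYTHONPATH externally has the same effect), and imports it normally. The directory handling and the settings content are illustrative only:

```python
# Sketch: leverage Python's module search to locate a configuration file.
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as config_dir:
    (Path(config_dir) / "settings.py").write_text(
        "class Configuration:\n    query = {'mz': ['ANZ532']}\n"
    )
    # PYTHONPATH would prepend this directory externally; we do it in code.
    sys.path.insert(0, config_dir)
    try:
        settings = importlib.import_module("settings")
        print(settings.Configuration.query)  # {'mz': ['ANZ532']}
    finally:
        sys.path.remove(config_dir)
```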
In this recipe, we'll build the missing part of the previous example, the load_config_file()
function. Here's the template that needs to be filled in:
def load_config_file(
config_path: Path, classname: str = "Configuration"
) -> Dict[str, Any]:
"""Loads a configuration mapping object with contents
of a given file.
:param config_path: Path to be read.
:returns: mapping with configuration parameter values
"""
# Details omitted.
We've used a similar template in a number of recipes in this chapter. For this recipe, we've added a parameter to this definition. The classname
parameter is not present in previous recipes, but it is used here to select one of the many classes from a module at the location in the filesystem named by the config_path
parameter.
We can make use of the pathlib
module to locate the files. We'll leverage the built-in compile()
and exec()
functions to process the code in the configuration file. The result is not a dictionary, and isn't compatible with previous ChainMap
-based configurations:
Import the Path definition and the type hints required by the load_config_file() function definition:
from pathlib import Path
import platform
from typing import Dict, Any, Type
ConfigClass = Type[object]
Use the compile() function to compile the Python module into an executable form. This function requires the source text as well as a filename from which the text was read. The filename is essential for creating trace-back messages that are useful and correct:
def load_config_file(
config_path: Path, classname: str = "Configuration"
) -> ConfigClass:
code = compile(
config_path.read_text(),
config_path.name,
"exec")
Execute the code object created by the compile() function. We need to provide two contexts. The global context can provide the __builtins__ module, plus the Path class and the platform module. The local context is where new variables will be created:
globals = {
"__builtins__": __builtins__,
"Path": Path,
"platform": platform}
locals: Dict[str, ConfigClass] = {}
exec(code, globals, locals)
return locals[classname]
This locates the named class in the locals
mapping. This mapping will have all the local variables set when the module was executed; these local variables will include all class and function definitions in addition to assigned variables. The value of locals[classname]
will be the named class in the definitions created by the module that was executed.
The details of the Python language—syntax and semantics—are embodied in the compile()
and exec()
functions. The exec()
function reflects the way Python handles global and local variables. There are two namespaces provided to this function. The global namespace
instance includes __builtins__
plus a class and module that might be used in the file.
The local variable namespace will have the new class created in it. The local namespace has a __dict__
attribute that makes it accessible via dictionary methods. Because of this, we can then extract the class by name using locals[classname]
. The function returns the class
object for use throughout the application.
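The whole round trip can be condensed into a runnable sketch; the class names and settings.py filename mirror the examples in this recipe:

```python
# Sketch: compile and exec a configuration module, then pick a class
# out of the locals mapping by name.
from pathlib import Path
import platform
from typing import Any, Dict, Type

ConfigClass = Type[object]

source = """
class Configuration:
    url = {"scheme": "http", "netloc": "forecast.weather.gov"}

class Chesapeake(Configuration):
    query = {"mz": ["ANZ532"]}
"""

code = compile(source, "settings.py", "exec")
globals_ns: Dict[str, Any] = {
    "__builtins__": __builtins__, "Path": Path, "platform": platform}
locals_ns: Dict[str, ConfigClass] = {}
exec(code, globals_ns, locals_ns)

Chesapeake = locals_ns["Chesapeake"]
print(Chesapeake.query)           # {'mz': ['ANZ532']}
print(Chesapeake.url["netloc"])   # forecast.weather.gov
```

Note that the Chesapeake class found in locals_ns inherits url from Configuration, exactly as a normal import would provide.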
We can put any kind of object into the attributes of a class. Our example showed mapping objects. There's no limitation on what can be done when creating attributes at the class level.
We can have complex calculations within the class
statement. We can use this to create attributes that are derived from other attributes. We can execute any kind of statement, including if
statements and for
statements, to create attribute values.
We will not, however, ever create an instance of the class. Ordinary methods of the class will not be used. If a function-like definition is helpful, it would have to be decorated with @classmethod
to be useful.
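For instance, a derived value can be computed with a @classmethod, which binds to the class (or subclass) itself without any instance. The attribute names below are illustrative, not part of the recipe:

```python
# Sketch: a configuration class is never instantiated, so derived
# values that need logic are computed with @classmethod.
from pathlib import Path

class Configuration:
    base = Path("/var/app")
    log_name = "app.log"

    @classmethod
    def log_path(cls) -> Path:
        # Derived from class-level attributes, honoring subclass overrides.
        return cls.base / cls.log_name

class Local(Configuration):
    base = Path("/tmp/app")

print(Configuration.log_path().as_posix())  # /var/app/app.log
print(Local.log_path().as_posix())          # /tmp/app/app.log
```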
Using a class definition means that we can leverage inheritance to organize the configuration values. We can easily create multiple subclasses of Configuration
, one of which will be selected for use in the application. The configuration might look like this:
class Configuration:
"""
Generic Configuration
"""
url = {
"scheme": "http",
"netloc": "forecast.weather.gov",
"path": "/shmrn.php"}
class Bahamas(Configuration):
"""
Weather forecast for Offshore including the Bahamas
"""
query = {"mz": ["AMZ117", "AMZ080"]}
class Chesapeake(Configuration):
"""
Weather for Chesapeake Bay
"""
query = {"mz": ["ANZ532"]}
This means that our application must choose an appropriate class from the available classes in the settings
module. We might use an OS environment variable or a command-line option to specify the class name to use. The idea is that our program can be executed like this:
python3 some_app.py -c settings.Chesapeake
This would locate the Chesapeake
class in the settings module. Processing would then be based on the details in that particular configuration class. This idea leads to an extension to the load_config_module()
function.
In order to pick one of the available classes, we'll provide an additional parameter with the class name:
import importlib
def load_config_module(name: str) -> ConfigClass:
module_name, _, class_name = name.rpartition(".")
settings_module = importlib.import_module(module_name)
result: ConfigClass = vars(settings_module)[class_name]
return result
Rather than manually compiling and executing the module, we've used the higher-level importlib
module. This module implements the import
statement semantics. The requested module is imported; compiled and executed; and the resulting module object is assigned to the variable named settings_module
.
We can then look inside the module's variables and pick out the class that was requested. The vars()
built-in function will extract the internal dictionary from a module, a class, or even the local variables.
Now we can use this function as follows:
>>> configuration = Chapter_13.ch13_r04.load_config_module(
... 'Chapter_13.settings.Chesapeake')
>>> configuration.__doc__.strip()
'Weather for Chesapeake Bay'
>>> configuration.query
{'mz': ['ANZ532']}
>>> configuration.url['netloc']
'forecast.weather.gov'
We've located the Chesapeake
configuration class in the settings
module and extracted the various settings the application needs from this class.
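The vars() lookup at the heart of load_config_module() works the same way on any module or class:

```python
import math

# vars() exposes the internal __dict__ of a module...
print(vars(math)["pi"])  # 3.141592653589793

class Settings:
    retries = 3

# ...and of a class; this is how a named class is picked out of an
# imported settings module.
print(vars(Settings)["retries"])  # 3
```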
One consequence of using a class like this is the default display isn't very informative. When we try to print the configuration, it looks like this:
>>> print(configuration)
<class 'settings.Chesapeake'>
This isn't very helpful. It provides one nugget of information, but that's not nearly enough for debugging.
We can use the vars()
function to see more details. However, this shows local variables, not inherited variables:
>>> pprint(vars(configuration))
mappingproxy({'__doc__': '\n    Weather for Chesapeake Bay\n    ',
              '__module__': 'Chapter_13.settings',
              'query': {'mz': ['ANZ532']}})
This is a little better, but it remains incomplete.
In order to see all of the settings, we need something a little more sophisticated. Interestingly, we can't simply define __repr__()
for a class. A method defined in a class is used by the instances of this class, not the class itself.
Each class object we create is an instance of the built-in type
class. We can, using a meta-class, tweak the way the type
class behaves, and implement a slightly nicer __repr__()
method, which looks through all parent classes for attributes.
We'll extend the built-in type with a __repr__
that does a somewhat better job at displaying the working configuration:
class ConfigMetaclass(type):
"""Displays a subclass with superclass values injected"""
def __repr__(self) -> str:
name = (
super().__name__
+ "("
+ ", ".join(b.__name__ for b in super().__bases__)
+ ")"
)
base_values = {
n: v
for base in reversed(super().__mro__)
for n, v in vars(base).items()
if not n.startswith("_")
}
values_text = [f"class {name}:"] + [
f" {name} = {value!r}"
for name, value in base_values.items()
]
        return "\n".join(values_text)
The class name is available from the superclass, type
, as the __name__
attribute. The names of the base classes are included as well, to show the inheritance hierarchy for this configuration class.
The base_values
are built from the attributes of all of the base classes. Each class is examined in reverse Method Resolution Order (MRO). Loading all of the attribute values in reverse MRO means that all of the defaults are loaded first. These values are then overridden with subclass values.
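The reverse-MRO merge can be seen in isolation with a small sketch:

```python
class Base:
    a = "base-a"
    b = "base-b"

class Sub(Base):
    b = "sub-b"

# Walking the MRO in reverse loads defaults first; later (more
# derived) classes then override them.
merged = {
    n: v
    for klass in reversed(Sub.__mro__)
    for n, v in vars(klass).items()
    if not n.startswith("_")
}
print(merged)  # {'a': 'base-a', 'b': 'sub-b'}
```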
Names with the _
prefix are quietly ignored. This emphasizes the conventional practice of treating these as implementation details that aren't part of a public interface. This kind of name shouldn't really be used for a configuration file.
The resulting values are used to create a text representation that resembles a class definition. This does not recreate the original class source code; it's the net effect of the original class definition and all the superclass definitions.
Here's a Configuration
class hierarchy that uses this metaclass. The base class, Configuration
, incorporates the metaclass, and provides default definitions. The subclass extends those definitions with values that are unique to a particular environment or context:
class Configuration(metaclass=ConfigMetaclass):
unchanged = "default"
override = "default"
feature_x_override = "default"
feature_x = "disabled"
class Customized(Configuration):
override = "customized"
feature_x_override = "x-customized"
This is the kind of output our meta-class provides:
>>> print(Customized)
class Customized(Configuration):
unchanged = 'default'
override = 'customized'
feature_x_override = 'x-customized'
feature_x = 'disabled'
The output here can make it a little easier to see how the subclass attributes override the superclass defaults. This can help to clarify the resulting configuration used by an application.
We can leverage all of the power of Python's multiple inheritance to build Configuration
class definitions. This can provide the ability to combine details on separate features into a single configuration object.
Many large applications are amalgamations of multiple smaller applications. In enterprise terminology, these are often called application systems, comprising individual command-line application programs.
Some large, complex applications include a number of commands. For example, the Git
application has numerous individual commands, such as git pull
, git commit
, and git push
. These can also be seen as separate applications that are part of the overall Git
system of applications.
An application might start as a collection of separate Python script files. At some point during its evolution, it can become necessary to refactor the scripts to combine features and create new, composite scripts from older disjoint scripts. The other path is also possible: a large application might be decomposed and refactored into a new organization of smaller components.
In this recipe, we'll look at ways to design a script so that future combinations and refactoring are made as simple as possible.
We need to distinguish between several aspects of a Python script.
We've seen several aspects of gathering input:
There are several aspects to producing output:
And finally, there's the real work of the application. This is made up of the essential features disentangled from the various input parsing and output formatting considerations. The real work is an algorithm working exclusively with Python data structures.
This separation of concerns suggests that an application, no matter how simple, should be designed as several separate functions. These should then be combined into the complete script. This lets us separate the input and output from the core processing. The processing is the part we'll often want to reuse. The input and output formats should be easy to change.
As a concrete example, we'll look at an application that creates sequences of dice rolls. Each sequence will follow the rules of the game of Craps. Here are the rules:
A first roll of 2, 3, or 12 is an immediate loss. The sequence has a single value, for example, [(1, 1)].
A first roll of 7 or 11 is an immediate win. This sequence also has a single value, for example, [(3, 4)].
Any other number establishes a point; the game continues until either 7 or the point value is rolled. A final roll of 7 is a loss, for example, [(3, 1), (3, 2), (1, 1), (5, 6), (4, 3)]. A final roll matching the point value is a win, for example, [(3, 1), (3, 2), (1, 1), (5, 6), (1, 3)].
The output is a sequence of items. Each item has a different structure. Some will be short lists. Some will be long lists. This is an ideal place for using YAML format files.
This output can be controlled by two inputs—how many sample sequences to create, and whether or not to seed the random number generator. For testing purposes, it can help to have a fixed seed.
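The effect of seeding is easy to demonstrate: with a fixed seed, the generator produces the same rolls on every run, which makes the application's output testable.

```python
import random

random.seed(42)
first = [random.randint(1, 6) for _ in range(4)]

random.seed(42)
second = [random.randint(1, 6) for _ in range(4)]

print(first == second)  # True
```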
This recipe will involve a fair number of design decisions. We'll start by considering the different kinds of output. Then we'll refactor the application around the kinds of output and the different purposes for the output:
The sequence of rolls needs to be written to a file. This suggests that the write_rolls()
function is given an iterator as a parameter. Here's a function that iterates and dumps values to a file in YAML notation:
def write_rolls(
output_path: Path,
game_iterator: Iterable[Game_Summary]
) -> Counter[int]:
face_count: Counter[int] = collections.Counter()
with output_path.open("w") as output_file:
for game_outcome in game_iterator:
output_file.write(
yaml.dump(
game_outcome,
default_flow_style=True,
explicit_start=True
)
)
for roll in game_outcome:
face_count[sum(roll)] += 1
return face_count
def summarize(
configuration: argparse.Namespace,
counts: Counter[int]
) -> None:
print(configuration, file=sys.stderr)
print(counts, file=sys.stderr)
The core processing will either return or yield its result. Use return to create a single result. Use yield to generate each item of an iterator that will produce multiple results.
In this example, we can easily make the core feature a function that iterates over the interesting values. This generator function relies on a craps_game() function to generate the requested number of samples. Each sample is a full game, showing all of the dice rolls. The roll_iter()
function provides the face_count
counter to this lower-level function to accumulate some totals to confirm that everything worked properly.
def roll_iter(
total_games: int,
seed: Optional[int] = None
) -> Iterator[Game_Summary]:
random.seed(seed)
for i in range(total_games):
sequence = craps_game()
yield sequence
The craps_game() function implements the Craps game rules to emit a single sequence of one or more rolls. This comprises all the rolls in a single game. We'll look at this craps_game() function later.
Gather inputs from the command-line arguments and the os.environ collection of environment variables:
def get_options(
argv: List[str] = sys.argv[1:]
) -> argparse.Namespace:
Define the --samples and --output options. We can leverage additional features of argparse to better validate the argument values:
parser = argparse.ArgumentParser()
parser.add_argument("-s", "--samples", type=int)
parser.add_argument("-o", "--output")
options = parser.parse_args(argv)
if options.output is None:
parser.error("No output file specified")
The output_path is created from the value of the --output option. Similarly, the value of the RANDOMSEED environment variable is validated and placed into the options namespace. This use of the options object keeps all of the various arguments in one place:
options.output_path = Path(options.output)
if "RANDOMSEED" in os.environ:
seed_text = os.environ["RANDOMSEED"]
try:
options.seed = int(seed_text)
except ValueError:
parser.error(
f"RANDOMSEED={seed_text!r} invalid seed")
else:
options.seed = None
return options
Define the main() function, which incorporates the three previous elements, to create the final, overall script:
def main() -> None:
options = get_options(sys.argv[1:])
face_count = write_rolls(
options.output_path,
roll_iter(
options.samples, options.seed
)
)
summarize(options, face_count)
This brings the various aspects of the application together. It parses the command-line and environment options.
The roll_iter()
function is the core processing. It takes the various options, and it emits a sequence of rolls.
The primary output from the roll_iter()
method is collected by write_rolls()
and written to the given output path. Additional control output is written by a separate function, summarize()
, so that we can change the summary without an impact on the primary output.
The central premise here is the separation of concerns. There are three distinct aspects to the processing:
Inputs are gathered by the get_options() function. This function can grab inputs from a variety of sources, including configuration files.
The primary output was handled by the write_rolls() function. The other control output was handled by accumulating totals in a Counter object and then dumping this output at the end.
The core processing is isolated in the roll_iter() function. This function can be reused in a variety of contexts.
function from the surrounding application details.
The output from this application looks like the following example:
slott$ python Chapter_13/ch13_r05.py --samples 10 --output=x.yaml
Namespace(output='x.yaml', output_path=PosixPath('x.yaml'), samples=10, seed=None)
Counter({5: 7, 6: 7, 7: 7, 8: 5, 4: 4, 9: 4, 11: 3, 10: 1, 12: 1})
The command line requested ten samples and specifies an output file of x.yaml
. The control output is a simple dump of the options. It shows the values for the parameters plus the additional values set in the options
object.
The control output includes the counts from ten samples. This provides some confidence that values such as 6
, 7
, and 8
occur more often. It shows that values such as 3
and 12
occur less frequently.
The output file, x.yaml
, might look like this:
slott$ more x.yaml
--- [[5, 4], [3, 4]]
--- [[3, 5], [1, 3], [1, 4], [5, 3]]
--- [[3, 2], [2, 4], [6, 5], [1, 6]]
--- [[2, 4], [3, 6], [5, 2]]
--- [[1, 6]]
--- [[1, 3], [4, 1], [1, 4], [5, 6], [6, 5], [1, 5], [2, 6], [3, 4]]
--- [[3, 3], [3, 4]]
--- [[3, 5], [4, 1], [4, 2], [3, 1], [1, 4], [2, 3], [2, 6]]
--- [[2, 2], [1, 5], [5, 5], [1, 5], [6, 6], [4, 3]]
--- [[4, 5], [6, 3]]
Consider the larger context for this kind of simulation. There might be one or more analytical applications to make use of the simulation output. These applications could perform some statistical analyses on the sequences of rolls.
After using these two applications to create rolls and summarize them, the users may determine that it would be advantageous to combine the roll creation and the statistical overview into a single application. Because the various aspects of each application have been separated, we can rearrange the features and create a new application.
We can now build a new application that will start with the following two imports to bring in the useful functions from the existing applications:
from generator import roll_iter, craps_game
from stats_overview import summarize
Ideally, a new application can be built without any changes to the other two applications. This leaves the original suite of applications untouched by the introduction of new features.
More importantly, the new application did not involve any copying or pasting of code. The new application imports working software. Any changes made to fix one application will also fix latent bugs in other applications.
Reuse via copy and paste creates technical debt. Avoid copying and pasting the code.
When we try to copy code from one application and paste it into a new application, we create a confusing situation. Any changes made to one copy won't magically fix latent bugs in the other copy. When changes are made to one copy, and the other copy is not kept up to date, this is an example of code rot.
In the previous section, we skipped over the details of the craps_game() function. This function creates a sequence of dice rolls that comprise a single game of Craps. It can vary from a single roll to a sequence of indefinite length. About 98% of the games will consist of thirteen or fewer throws of the dice.
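That figure can be spot-checked empirically. The following is a compact re-statement of the game rules for simulation purposes only, not the recipe's implementation:

```python
import random
from typing import List, Tuple

def craps_game(rng: random.Random) -> List[Tuple[int, int]]:
    """One game: a come out roll, then (if a point is set) rolls
    until a seven or the point reappears."""
    def roll() -> Tuple[int, int]:
        return rng.randint(1, 6), rng.randint(1, 6)

    come_out = roll()
    sequence = [come_out]
    point = sum(come_out)
    if point in (2, 3, 12, 7, 11):
        return sequence  # immediate win or loss
    next_roll = roll()
    while sum(next_roll) not in (7, point):
        sequence.append(next_roll)
        next_roll = roll()
    sequence.append(next_roll)
    return sequence

rng = random.Random(2)
lengths = [len(craps_game(rng)) for _ in range(10_000)]
short = sum(1 for n in lengths if n <= 13) / len(lengths)
print(f"{short:.1%} of simulated games took 13 or fewer throws")
```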
The rules depend on the total of two dice. The data captured include the two separate faces of the dice. In order to support these details, it's helpful to have a NamedTuple
instance that has these two, related properties:
class Roll(NamedTuple):
faces: List[int]
total: int
def roll(n: int = 2) -> Roll:
faces = list(random.randint(1, 6) for _ in range(n))
total = sum(faces)
return Roll(faces, total)
This roll()
function creates a Roll
instance with a sequence that shows the faces
of the dice, as well as the total
of the dice. The craps_game()
function will generate enough Roll
objects to be one complete game:
Game_Summary = List[List[int]]
def craps_game() -> Game_Summary:
"""Summarize the game as a list of dice pairs."""
come_out = roll()
if come_out.total in [2, 3, 12]:
return [come_out.faces]
elif come_out.total in [7, 11]:
return [come_out.faces]
elif come_out.total in [4, 5, 6, 8, 9, 10]:
sequence = [come_out.faces]
next = roll()
while next.total not in [7, come_out.total]:
sequence.append(next.faces)
next = roll()
sequence.append(next.faces)
return sequence
else:
raise Exception(f"Horrifying Logic Bug in {come_out}")
The craps_game()
function implements the rules for Craps. If the first roll is 2
, 3
, or 12
, the sequence only has a single value, and the game is a loss. If the first roll is 7
or 11
, the sequence also has only a single value, and the game is a win. The remaining values establish a point. The sequence of rolls starts with the point value. The sequence continues until it's ended by seven or the point value.
The horrifying logic bug exception represents a way to detect a design problem. The if
statement conditions are quite complex. As we noted in the Designing complex if...elif chains recipe in Chapter 2, Statements and Syntax, we need to be absolutely sure the if
and elif
conditions are complete. If we've designed them incorrectly, the else
statement should alert us to the failure to correctly design the conditions.
The close relationship between the roll_iter()
, roll()
, and craps_game()
methods suggests that it might be better to encapsulate these functions into a single class definition. Here's a class that has all of these features bundled together:
class CrapsSimulator:
    def __init__(self, /, seed: Optional[int] = None) -> None:
self.rng = random.Random(seed)
self.faces: List[int]
self.total: int
def roll(self, n: int = 2) -> int:
self.faces = list(
self.rng.randint(1, 6) for _ in range(n))
self.total = sum(self.faces)
return self.total
def craps_game(self) -> List[List[int]]:
self.roll()
if self.total in [2, 3, 12]:
return [self.faces]
elif self.total in [7, 11]:
return [self.faces]
elif self.total in [4, 5, 6, 8, 9, 10]:
point, sequence = self.total, [self.faces]
self.roll()
while self.total not in [7, point]:
sequence.append(self.faces)
self.roll()
sequence.append(self.faces)
return sequence
else:
raise Exception("Horrifying Logic Bug")
def roll_iter(
self, total_games: int) -> Iterator[List[List[int]]]:
for i in range(total_games):
sequence = self.craps_game()
yield sequence
This class includes an initialization of the simulator to include its own random number generator. It will either use the given seed value, or the internal algorithm will pick the seed value.
The roll()
method will set the self.total
and self.faces
instance variables. There's no clear benefit to having the roll()
method return a value and also cache the current value of the dice in the self.total
attribute. Eliminating self.total
is left as an exercise for the reader.
The craps_game()
method generates one sequence of rolls for one game of Craps. It uses the roll()
method and the two instance variables, self.total
and self.faces
, to track the state of the dice.
The roll_iter()
method generates the sequence of games. Note that the signature of this method is not exactly like the preceding roll_iter()
function. This class separates random number seeding from the game creation algorithm.
Rewriting the main()
function to use the CrapsSimulator
class is left as an exercise for the reader. Since the method names are similar to the original function names, the refactoring should not be terribly complex.
We can use argparse to get inputs from a user.
In the Designing scripts for composition recipe earlier in this chapter, we examined three aspects of an application:
There are several different kinds of output that applications produce:
It's less than optimal to lump all of these various aspects into print()
requests that write to standard output. Indeed, it can lead to confusion because too many different outputs are interleaved in a single stream.
The OS provides each running process with two output files, standard output and standard error. These are visible in Python through the sys
module with the names sys.stdout
and sys.stderr
. By default, the print()
method writes to the sys.stdout
file. We can change this and write the control, audit, and error messages to sys.stderr
. This is an important step in the right direction.
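A minimal illustration of the split:

```python
import sys

# Primary, machine-readable output goes to standard output; control
# and error messages go to standard error. A shell user can then
# redirect the two streams independently, for example:
#     python script.py > results.yaml 2> run.log
print("--- [[5, 4], [3, 4]]", file=sys.stdout)   # primary output
print("1 game written", file=sys.stderr)         # control output
```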
Python also offers the logging
package, which can be used to direct the ancillary output to a separate file (and/or other output channels, such as a database). It can also be used to format and filter that additional output.
In this chapter we'll look at good ways to use the logging
module.
In the Designing scripts for composition recipe, earlier in this chapter, we looked at an application that produced a YAML file with the raw output of a simulation in it. In this recipe, we'll look at an application that consumes that raw data and produces some statistical summaries. We'll call this application overview_stats.py
.
Following the design pattern of separating the input, output, and processing, we'll have an application, main()
, that looks something like this:
def main(argv: List[str] = sys.argv[1:]) -> None:
options = get_options(argv)
if options.output is not None:
report_path = Path(options.output)
with report_path.open("w") as result_file:
process_all_files(result_file, options.file)
else:
process_all_files(sys.stdout, options.file)
This function will get the options from various sources. If an output file is named, it will create the output file using a with
statement context manager. This function will then process all of the command-line argument files as input from which statistics are gathered.
If no output file name is provided, this function will write to the sys.stdout
file. This will display output that can be redirected using the OS shell's >
operator to create a file.
The main()
function relies on a process_all_files()
function. The process_all_files()
function will iterate through each of the argument files and gather statistics from that file. Here's what that function looks like:
def process_all_files(
result_file: TextIO,
file_paths: Iterable[Path]
) -> None:
for source_path in file_paths:
with source_path.open() as source_file:
game_iter = yaml.load_all(
source_file,
Loader=yaml.SafeLoader)
statistics = gather_stats(game_iter)
result_file.write(
yaml.dump(
dict(statistics),
explicit_start=True))
The process_all_files()
function applies gather_stats()
to each file in the file_paths
iterable. The resulting collection is written to the given result_file
file.
The function shown here conflates processing and output in a design that is not ideal. We'll address this design flaw in the Combining two applications into one recipe.
The essential processing is in the gather_stats() function. Given an iterable source of games, this will summarize them. The resulting summary object can then be written as part of the overall display or, in this case, appended to a sequence of YAML-format summaries:
def gather_stats(
game_iter: Iterable[List[List[int]]]
) -> Counter[Outcome]:
counts: Counter[Outcome] = collections.Counter()
for game in game_iter:
if len(game) == 1 and sum(game[0]) in (2, 3, 12):
outcome = "loss"
elif len(game) == 1 and sum(game[0]) in (7, 11):
outcome = "win"
elif len(game) > 1 and sum(game[-1]) == 7:
outcome = "loss"
elif len(game) > 1 and sum(game[0]) == sum(game[-1]):
outcome = "win"
else:
detail_log.error("problem with %r", game)
            raise Exception(
                f"Wait, What? "
                f"Inconsistent len {len(game)} and "
                f"final {sum(game[-1])} roll"
            )
event = (outcome, len(game))
counts[event] += 1
return counts
This function determines which of the four game termination rules were applied to the sequence of dice rolls. The enclosing process_all_files() function opens each source file and uses the yaml.load_all() function to iterate through all of the YAML documents. Each document is a single game, represented as a sequence of dice pairs.
This function uses the first (and sometimes last) rolls to determine the overall outcome of the game. There are four rules, which should enumerate all possible logical combinations of events. In the event that there is an error in our reasoning, an exception will get raised to alert us to a special case that didn't fit the design in some way.
The game is reduced to a single event with an outcome and a length. These are accumulated into a Counter
object. The outcome and length of a game are the two values we're computing. These are a stand-in for more complex or sophisticated statistical analyses that are possible.
We've carefully segregated almost all file-related considerations from this function. The gather_stats()
function will work with any iterable source of game data.
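Because of this, unit tests need no filesystem setup at all. The sketch below repeats the function (with the final two branches slightly simplified, assuming well-formed game data) and feeds it an in-memory list of games:

```python
import collections
from typing import Counter, Iterable, List, Tuple

Outcome = Tuple[str, int]

def gather_stats(
    game_iter: Iterable[List[List[int]]]
) -> Counter[Outcome]:
    counts: Counter[Outcome] = collections.Counter()
    for game in game_iter:
        if len(game) == 1 and sum(game[0]) in (2, 3, 12):
            outcome = "loss"
        elif len(game) == 1 and sum(game[0]) in (7, 11):
            outcome = "win"
        elif len(game) > 1 and sum(game[-1]) == 7:
            outcome = "loss"
        else:
            outcome = "win"  # assumes the point was made
        counts[(outcome, len(game))] += 1
    return counts

games = [
    [[1, 1]],              # come out 2: loss
    [[3, 4]],              # come out 7: win
    [[2, 2], [3, 4]],      # point 4, then 7: loss
    [[4, 5], [6, 3]],      # point 9, then 9: win
]
print(gather_stats(games))
```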
Here's the output from this application. It's not very pretty; it's a YAML document that can be used for further processing:
slott$ python Chapter_13/ch13_r06.py x.yaml
---
? !!python/tuple [loss, 2]
: 2
? !!python/tuple [loss, 3]
: 1
? !!python/tuple [loss, 4]
: 1
? !!python/tuple [loss, 6]
: 1
? !!python/tuple [loss, 8]
: 1
? !!python/tuple [win, 1]
: 1
? !!python/tuple [win, 2]
: 1
? !!python/tuple [win, 4]
: 1
? !!python/tuple [win, 7]
: 1
We'll need to insert logging features into all of these functions to show which file is being read, and any errors or problems with processing the file.
Furthermore, we're going to create two logs. One will have details, and the other will have a minimal summary of files that are created. The first log can go to sys.stderr
, which will be displayed at the console when the program runs. The other log will be appended to a long-term log
file to cover all uses of the application.
One approach to having separate needs is to create two loggers, each with a different intent. The two loggers can have dramatically different configurations. Another approach is to create a single logger and use a Filter
object to distinguish content intended for each logger. We'll focus on creating separate loggers because it's easier to develop and easier to unit test.
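For contrast, here's a rough sketch of the single-logger-plus-`Filter` alternative. The `AuditOnly` class and the `audit` record attribute are illustrative names, not part of the recipe:

```python
import logging
import sys

class AuditOnly(logging.Filter):
    """Pass only records flagged as audit events."""
    def filter(self, record: logging.LogRecord) -> bool:
        return getattr(record, "audit", False)

logger = logging.getLogger("overview_stats")
logger.setLevel(logging.INFO)

# All records go to the console handler...
console = logging.StreamHandler(sys.stderr)
logger.addHandler(console)

# ...but only records carrying extra={"audit": True} pass the filter
# on the second handler.
audit_handler = logging.StreamHandler(sys.stderr)
audit_handler.addFilter(AuditOnly())
logger.addHandler(audit_handler)

logger.info("read %r", "x.yaml")                            # console only
logger.info("wrote %r", "sum.yaml", extra={"audit": True})  # both handlers
```

The filter approach keeps a single logger name but pushes the routing decision into every logging call, which is part of why separate loggers are easier to test.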
Each logger has a variety of methods reflecting the severity of the message. The severity levels defined in the logging
package include the following:
- DEBUG: detailed diagnostic output, normally suppressed
- INFO: confirmation that things are working as expected
- WARNING: an indication of a potential problem
- ERROR: a failure of some specific operation
- CRITICAL: a serious error that may prevent the program from continuing
The method names are similar to the severity levels. We use logging.info()
to write an INFO message.
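The configured level acts as a numeric threshold: records below it are discarded. A quick illustration (the logger name here is arbitrary):

```python
import logging
import sys

# The root logger's level acts as a threshold for all child loggers.
logging.basicConfig(stream=sys.stderr, level=logging.WARNING)
logger = logging.getLogger("demo")

logger.debug("suppressed: DEBUG (10) is below WARNING (30)")
logger.info("suppressed: INFO (20) is below WARNING (30)")
logger.warning("shown: WARNING is at the configured level")
logger.error("shown: ERROR (40) is above WARNING (30)")
```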
We'll be building a more complete application, leveraging components from previous examples. This will add use of the logging
module:
We'll start by weaving the logging
features into the existing functions. This means that we'll need the logging
module, plus the other packages required by this app:
import argparse
import collections
import logging
from pathlib import Path
import sys
from typing import List, Iterable, Tuple, Counter, TextIO
import yaml
We'll create the two logger
objects as module globals. Loggers have hierarchical names. We'll name the loggers using the application name and a suffix with the content. The overview_stats.detail
logger will have processing details. The overview_stats.write
logger will identify the files read and the files written; this parallels the idea of an audit log because the file writes track state changes in the collection of output files:
detail_log = logging.getLogger("overview_stats.detail")
write_log = logging.getLogger("overview_stats.write")
We don't need to configure these loggers at this time. If we do nothing more, the two logger
objects will silently accept individual log entries, but won't do anything further with the data.
We'll rewrite the main()
function to summarize the two aspects of the processing. This will use the write_log
logger
object to show when a new file is created. We've added the write_log.info()
line to put an information message into the log for files that have been written:
def main(argv: List[str] = sys.argv[1:]) -> None:
options = get_options(argv)
if options.output is not None:
report_path = Path(options.output)
with report_path.open("w") as result_file:
process_all_files(result_file, options.file)
write_log.info("wrote %r", report_path)
else:
process_all_files(sys.stdout, options.file)
We'll rewrite the process_all_files()
function to provide a note when a file is read. We've added the detail_log.info()
line to put an information message into the detail log for every file that's read:
def process_all_files(
result_file: TextIO,
file_paths: Iterable[Path]
) -> None:
for source_path in file_paths:
detail_log.info("read %r", source_path)
with source_path.open() as source_file:
game_iter = yaml.load_all(
source_file,
Loader=yaml.SafeLoader)
statistics = gather_stats(game_iter)
result_file.write(
yaml.dump(
dict(statistics),
explicit_start=True))
The gather_stats()
function can have a log line added to it to track normal operations. Additionally, we've added a log entry for the logic error. The detail_log
logger is used to collect debugging information. If we set the overall logging level to include debug messages, we'll see this additional output:
def gather_stats(
game_iter: Iterable[List[List[int]]]
) -> Counter[Outcome]:
counts: Counter[Outcome] = collections.Counter()
for game in game_iter:
if len(game) == 1 and sum(game[0]) in (2, 3, 12):
outcome = "loss"
elif len(game) == 1 and sum(game[0]) in (7, 11):
outcome = "win"
elif len(game) > 1 and sum(game[-1]) == 7:
outcome = "loss"
elif (len(game) > 1
and sum(game[0]) == sum(game[-1])):
outcome = "win"
else:
detail_log.error("problem with %r", game)
raise Exception("Wait, What?")
event = (outcome, len(game))
detail_log.debug(
"game %r -> event %r", game, event)
counts[event] += 1
return counts
The get_options()
function will also have a debugging line written. This can help diagnose problems by displaying the options in the log:
def get_options(
argv: List[str] = sys.argv[1:]
) -> argparse.Namespace:
parser = argparse.ArgumentParser()
parser.add_argument("file", nargs="*", type=Path)
parser.add_argument("-o", "--output")
options = parser.parse_args(argv)
detail_log.debug("options: %r", options)
return options
Finally, we'll provide a minimal logging configuration around the evaluation of the main() function:
if __name__ == "__main__":
logging.basicConfig(stream=sys.stderr, level=logging.INFO)
main()
logging.shutdown()
This logging configuration builds the default handler object. This object simply prints all of the log messages on the given stream. This handler is assigned to the root logger; it will apply to all children of this logger. Therefore, both of the loggers created in the preceding code will go to the same stream.
Here's an example of running this script:
(cookbook) % python Chapter_13/ch13_r06.py -o data/sum.yaml data/x.yaml
INFO:overview_stats.detail:read PosixPath('data/x.yaml')
INFO:overview_stats.write:wrote PosixPath('data/sum.yaml')
There are two lines in the log. Both have a severity of INFO. The first line is from the overview_stats.detail
logger. The second line is from the overview_stats.write
logger. The default configuration sends all loggers to sys.stderr
, so the logging output is kept separate from the main output of the application.
There are three parts to introducing logging into an application:
- Creating Logger
objects
- Placing logging requests near the important state changes
- Configuring the loggers to route requests to their destinations
Creating loggers can be done in a variety of ways. A common approach is to create one logger with the same name as the module:
logger = logging.getLogger(__name__)
For the top-level, main script, this will have the name "__main__"
. For imported modules, the name will match the module name.
In more complex applications, there will be a variety of loggers serving a variety of purposes. In these cases, simply naming a logger after a module may not provide the required level of flexibility.
It's also possible to use the logging
module itself as the root logger. This means a module can use the logging.info()
function, for example. This isn't recommended because the root logger is anonymous, and we sacrifice the possibility of using the logger name as an important source of information.
There are two concepts that can be used to assign names to the loggers. It's often best to choose one of them and stick with it throughout a large application:
- Follow the package and module structure, using names of the form package.module.class
. Other classes in the same module would share a common parent logger name. It's then possible to set the logging level for the whole package, one of the specific modules, or just one of the classes.
- Name loggers after their audience or use case, with names such as event
, audit
, and perhaps debug
. This way, all of the audit
loggers will have names that start with "audit.
". This can make it easy to route all loggers under a given parent to a specific handler.
In the recipe, we used the first style of naming. The logger names parallel the software architecture.
Placing logging requests near all the important state changes means we can decide which of the interesting state changes in an application belong in a log:
- Changes to persistent resources can be logged at the INFO
level. This means any change to the OS state, for example removing a file or creating a directory, is a candidate for logging. Similarly, database updates and requests that should change the state of a web service should be logged.
- Failures to make a persistent state change can be logged at the ERROR
level. Any OS-level exceptions can be logged when they are caught and handled.
- When debugging complex logic, we might add DEBUG
messages after particularly important assignment statements.
- A method that changes an object's state can include a DEBUG
message so that object state changes can be tracked through the log.
- When a problem is recoverable, a DEBUG
message might be more appropriate than a log entry at the CRITICAL
level.
The third aspect of logging is configuring the loggers so that they route the requests to the appropriate destination. By default, with no configuration at all, the loggers will all quietly create log events but won't display them.
With minimal configuration, we can see all of the log events on the console. This can be done with the basicConfig()
method and covers a large number of simple use cases without any real fuss. Instead of a stream, we can use a filename to provide a named file. Perhaps the most important feature is providing a simple way to enable debugging by setting the logging level on the root logger from the basicConfig()
method.
The example configuration in the recipe used two common handlers—the StreamHandler
and FileHandler
classes. There are over a dozen more handlers, each with unique features for gathering and publishing log messages.
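For example, the logging.handlers.RotatingFileHandler class caps the file size and keeps a fixed number of backups; the sizes and paths here are illustrative:

```python
import logging
import logging.handlers
import pathlib
import tempfile

log_dir = pathlib.Path(tempfile.mkdtemp())
handler = logging.handlers.RotatingFileHandler(
    log_dir / "app.log",
    maxBytes=1_000_000,  # start a new file after roughly 1 MB
    backupCount=3,       # keep app.log.1 through app.log.3
)
handler.setFormatter(
    logging.Formatter("{asctime}//{levelname}//{name}//{message}", style="{")
)

logger = logging.getLogger("overview_stats.write")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.info("wrote %r", "sum.yaml")
handler.close()
print((log_dir / "app.log").read_text())
```

This addresses the unbounded-growth problem of the plain FileHandler noted below: old log data ages out instead of accumulating forever.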
In order to route the different loggers to different destinations, we'll need a more sophisticated configuration. This goes beyond what we can build with the basicConfig()
function. We'll need to use the logging.config
module, and the dictConfig()
function. This can provide a complete set of configuration options. The easiest way to use this function is to write the configuration in YAML and then convert this to an internal dict
object using the yaml.load()
function:
from textwrap import dedent
config_yaml = dedent("""
version: 1
formatters:
default:
style: "{"
format: "{levelname}:{name}:{message}"
# Example: INFO:overview_stats.detail:read x.yaml
timestamp:
style: "{"
format: "{asctime}//{levelname}//{name}//{message}"
handlers:
console:
class: logging.StreamHandler
stream: ext://sys.stderr
formatter: default
file:
class: logging.FileHandler
filename: data/write.log
formatter: timestamp
loggers:
overview_stats.detail:
handlers:
- console
overview_stats.write:
handlers:
- file
- console
root:
level: INFO
""")
The YAML document is enclosed in a triple-quoted string. This allows us to write as much text as necessary. We've defined five things in the big block of text using YAML notation:
- The version
key must be 1
.
- The formatters
key defines the log format. If this is not specified, the default format shows only the message body, without any level or logger information:
  - The default
formatter defined here mirrors the format created by the basicConfig()
function.
  - The timestamp
formatter defined here is a more complex format that includes the datetime stamp for the record. To make the file easier to parse, a column separator of //
was used.
- The handlers
key defines the two handlers for the two loggers. The console
handler writes to the sys.stderr
stream. We specified the formatter this handler will use. This definition parallels the configuration created by the basicConfig()
function. Unsurprisingly, the FileHandler
class writes to a file. The default mode for opening the file is a
, which will append to the file with no upper limit on the file size. There are other handlers that can rotate through multiple files, each of a limited size. We've provided an explicit filename, and the formatter that will put more detail into the file than is shown on the console.
- The loggers
key provides a configuration for the two loggers that the application will create. Any logger name that begins with overview_stats.detail
will be handled only by the console handler. Any logger name that begins with overview_stats.write
will go to both the file handler and the console handler.
- The root
key defines the top-level logger. It has a name of ''
(the empty string) in case we need to refer to it in code. Setting the level on the root logger will set the level for all of the children of this logger.
Use the configuration to wrap the main()
function like this:
logging.config.dictConfig(
yaml.load(config_yaml, Loader=yaml.SafeLoader))
main()
logging.shutdown()
This will start the logging in a known state. It will do the processing of the application. It will finalize all of the logging buffers and properly close any files.