© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2022
M. Zadka, DevOps in Python, https://doi.org/10.1007/978-1-4842-7996-0_2

2. Packaging

Moshe Zadka, Belmont, CA, USA

One of the main strengths of Python is the ecosystem, the third-party packages on PyPI. There are packages to do anything from running computations in parallel on GPUs for machine learning to reducing the boilerplate needed for writing classes. This means that a lot of the practical work with Python is handling the third-party dependencies.

The current packaging tooling is pretty good, but things have not always been this way. It is important to understand which best practices are antiquated rituals based on assumptions that no longer hold, which have some merit, and which are still good ideas.

When dealing with packaging, there are two ways to interact. One is to be a consumer wanting to use the functionality of a package. Another is to be the producer, publishing a package. These describe, usually, different development tasks, not different people.

It is important to have a solid understanding of the consumer side of packages before moving on to producing them. If the goal of a package publisher is to be useful to the package user, it is crucial to imagine the last mile before starting to write a single line of code.

2.1 Virtual Environments

Virtual environments are often misunderstood because the concept of environments is not clear. A Python environment refers to the root of the Python installation. The reason an environment is important is because of the lib/site-packages subdirectory of that root. The lib/site-packages subdirectory is where third-party packages are installed.

The most popular tool to add packages to an environment is pip, which is covered in the next section. Before using pip, it is important to understand how virtual environments work.

A real environment is based on a Python installation, which means that getting a new real environment requires installing a new Python, and often building one. This is sometimes an expensive proposition.

The advantage of a virtual environment is that it is cheap to set up and tear down. Some modern Python tooling takes advantage of that, setting up and tearing down virtual environments as a normal part of their operation. Setting up and tearing down virtual environments, being cheap and fast, is also a common part of Python developer workflow.

A virtual environment copies the minimum necessary out of the real environment to mislead Python into thinking it has a new root. The precise file structure is less important than remembering that the command to create a virtual environment is simple and fast.

Here, simple means that all the command does is copy some files and perhaps make a few symbolic links. Because of that, there are a few failure modes—mostly when file creation fails because of permission issues or a full disk.

There are two ways to use virtual environments: activated and inactivated. To use an inactivated virtual environment, which is most common in scripts and automated procedures, you explicitly call Python from the virtual environment.

This means that for a virtual environment in /home/name/venvs/my-special-env, calling /home/name/venvs/my-special-env/bin/python gives a Python process that uses this environment. For example, /home/name/venvs/my-special-env/bin/python -m pip runs pip but installs in the virtual environment.

Note that entrypoint-based scripts are installed alongside Python, so running /home/name/venvs/my-special-env/bin/pip also installs packages in the virtual environment.

The other way to use a virtual environment is to activate it. Activating a virtual environment in a bash-like shell means sourcing its activate script.
$ source /home/name/venvs/my-special-env/bin/activate

The sourcing sets a few environment variables, only one of which is important. The important variable is PATH, which gets prefixed by /home/name/venvs/my-special-env/bin. This means that commands like python or pip are found there first. Two cosmetic variables are set. $VIRTUAL_ENV points to the root of the environment. This is useful in management scripts that want to be aware of virtual environments. PS1 is prefixed with (my-special-env), which is useful for visualizing the virtual environment while working interactively in the console.
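For example, the effect can be observed directly in a bash-like shell. The prompt shown here is illustrative; the exact format depends on the shell configuration, and the activate script also defines a deactivate shell function to undo its changes.
$ source /home/name/venvs/my-special-env/bin/activate
(my-special-env) $ echo $VIRTUAL_ENV
/home/name/venvs/my-special-env
(my-special-env) $ which python
/home/name/venvs/my-special-env/bin/python
(my-special-env) $ deactivate
$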

It is generally a good practice to only install third-party packages inside a virtual environment. Combined with the fact that virtual environments are cheap, if one gets into a bad state, it is best to remove the whole directory and start from scratch.

For example, imagine a bad package install that causes the Python start-up to fail. Even running pip uninstall is impossible since pip fails on start-up. However, the cheapness means you can remove the whole virtual environment and re-create it with a good set of packages.

A modern practice is to move increasingly toward treating virtual environments as semi-immutable. After creating them, there is a single stage for installing all required packages. Instead of modifying the virtual environment if an upgrade is required, destroy the environment, re-create, and reinstall.

The modern way to create virtual environments is to use the venv standard library module. This only works on Python 3. Since Python 2 has been officially unsupported since the beginning of 2020, it is best avoided in any case.

venv is used as a command with python -m venv <directory>, as there is no dedicated entrypoint. It creates the directory for the environment.

It is best if this directory does not exist before that. A best practice is to remove it before creating the environment. There are also two options for creating the environment: which interpreter to use and what initial packages to install.
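A minimal sketch of this workflow, assuming a Unix shell with a python3.9 interpreter on the path and the paths used earlier:
$ rm -rf /home/name/venvs/my-special-env
$ python3.9 -m venv /home/name/venvs/my-special-env
$ /home/name/venvs/my-special-env/bin/python -m pip install --upgrade pip
Choosing which python command creates the environment chooses the interpreter; upgrading pip immediately afterward is a common way to control the initial packages.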

2.2 pip

The packaging tool for Python is pip. There have been other tools that have mostly been abandoned by the community and should not be used.

Installations of Python used to not come with pip out of the box. This has changed in recent versions, but some installations that are still in use do not include it. When running on such an installation, python -m ensurepip installs it.

Some Python installations, especially system ones, disable ensurepip. When lacking ensurepip, there is a way of manually getting it: get-pip.py. This is a single downloadable file that, when executed, unpacks pip.

Luckily, pip is the only package that needs these weird gyrations to install. All other packages can, and should, be installed using pip.

For example, if sample-environment is a virtual environment, installing the glom package can be done with the following code.
$ ./sample-environment/bin/python -m pip install glom
...
$ ./sample-environment/bin/python -m glom
{}

The last command tests that glom has been properly installed. Glom is a package to handle deeply-nested data; called with no arguments, it outputs an empty Python dictionary. This makes it handy for quickly testing whether a new virtual environment can install new packages properly.

Internally, pip is also treated as a third-party package. Upgrading pip itself is done with pip install --upgrade pip.

Depending on how Python was installed, its real environment might or might not be modifiable by the user. Many instructions in various README files and blogs might encourage using sudo pip install. This is almost always the wrong thing to do; it installs the packages in the global environment.

The pip install command downloads and installs all dependencies. However, it can fail to downgrade incompatible packages that are already installed. It is always possible to install explicit versions: pip install package-name==<version> installs this precise version. This is also a good way to get non-general-availability packages, such as release candidates or betas, for local testing.

If wheel is installed, pip builds, and usually caches, wheels for packages. This is especially useful when dealing with a high virtual environment churn since installing a cached wheel is a fast operation. This is also highly useful when dealing with native or binary packages that need to be compiled with a C compiler. A wheel cache eliminates the need to build them again.

pip does allow uninstalling with pip uninstall <package>. This command, by default, requires manual confirmation. Except for exotic circumstances, this command is not used. If an unintended package has snuck in, the usual response is to destroy the environment and rebuild it. For similar reasons, pip install --upgrade <package> is not often needed; the common response is to destroy and re-create the environment. There is one situation where it is a good idea: upgrading pip itself.

pip install supports a requirements file: pip install --requirement or pip install -r. A requirements file simply has one package per line. This is no different from specifying packages on the command line. However, requirements files often specify strict dependencies. A requirements file can be generated from an environment with pip freeze.
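For example, a strict requirements file can be generated from the sample environment created earlier and then used to re-create an equivalent environment elsewhere. This is only a sketch, and new-environment is a hypothetical second virtual environment.
$ ./sample-environment/bin/python -m pip freeze > requirements.txt
$ ./new-environment/bin/python -m pip install -r requirements.txt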

Installing anything that is not strict and closed under dependencies, such as individual packages or loose requirements, requires pip to decide which versions of the dependencies to install. The general problem of dependency resolution does not have an efficient and complete solution. Different strategies are possible to approach such a solution.

The way pip resolves dependencies is by using backtracking. This means that it optimistically tries to download the latest possible requirements recursively. If a dependency conflict is found, it backtracks and tries a different option.

As an example, consider three packages.
  • top

  • middle

  • base

There are two base versions: 1.0 and 2.0. The package dependencies are declared in setup.cfg files.

The following is for the top.
[metadata]
name = top
version = 1.0
[options]
install_requires =
    base
    middle
The following is for the middle.
[metadata]
name = middle
version = 1.0
[options]
install_requires =
    base<2.0

The base package has two versions: 1.0 and 2.0. It does not have any dependencies.

Because top depends directly on base, pre-backtracking versions of pip grab the latest version of base and end up installing an inconsistent set of packages.
$ pip install top
Looking in links: .
Collecting top
Collecting middle (from top)
Collecting base (from top)
middle 1.0 has requirement base<2.0, but you'll have base 2.0 which is incompatible.
Installing collected packages: base, middle, top
Successfully installed base-2.0 middle-1.0 top-1.0
The backtracking algorithm discards the base 2.0 version.
$ pip install top
Looking in links: .
Processing ./top-1.0-py3-none-any.whl
Processing ./base-2.0-py3-none-any.whl
Processing ./middle-1.0-py3-none-any.whl
Processing ./base-1.0-py3-none-any.whl
Installing collected packages: base, middle, top
Successfully installed base-1.0 middle-1.0 top-1.0

This solution has the advantage of being complete, but in certain cases, it can take an infeasible amount of time. Such extreme cases are rare, but merely taking a long time is not.

One way to increase the speed is to include >= dependencies in the loose requirements. This is usually a good idea since packages are better at guaranteeing backward compatibility than forward compatibility. As a side benefit, this can dramatically reduce the solution space that pip needs to backtrack in.

In most scenarios, it is better to use strict requirements for day-to-day development and regenerate the strict requirements from the loose requirements (which can take a while) on a cadence that balances keeping up to date with churn.

2.3 Setup and Wheels

The term third party (as in third-party packages) refers to someone other than the Python core developers (first-party) or the local developers (second-party). I have covered how to install first-party packages in the installation section. You used pip and virtualenv to install third-party packages. It is time to finally turn your attention to the missing link: local development and installing local packages or second-party packages.

Note that the word package is overloaded here. In Python, a package is an importable directory, a way to keep multiple modules together. The pedantic name for an installable thing is a distribution. A distribution can correspond to no packages (it can be a top-level single-module distribution) or to multiple packages.

It is good to keep a 1-1-1 relationship when packaging things: a single distribution corresponding to one package and named the same. Even if there is only one file, put it as an __init__.py file under a directory.

Packaging is an area that has seen a lot of changes. Copying and pasting from existing packages is not a good idea; good packages are, for the most part, mature packages that follow older conventions. Following the latest best practices means making changes to an existing, working process.

Starting with setuptools version 61.0.0, it is possible to create a package with only two files besides the code files.
  • pyproject.toml

  • README.rst

The README is not strictly necessary. However, most source code management systems display it rendered, so it is best to break it out into its own file.

Even an empty pyproject.toml generates a package. However, almost all packages need at least a few more details.

The build-system is the one mandatory section in a non-empty pyproject.toml file. It is usually the first.
[build-system]
requires = [
    "setuptools"
]
build-backend = "setuptools.build_meta"

Many systems can be used to build valid distributions. The setuptools system, which used to be the only possibility, is now one of several. However, it is still the most popular one.

Most of the rest of the data can be found in the project section.
[project]
name = "awesome_package"
version = "0.0.3"
description = "A pretty awesome package"
readme = "README.rst"
authors = [{name = "My Name",
            email = "[email protected]"}]
dependencies = ["httpx"]

For most popular code organizations, this is enough for the setuptools system to find the code and create a correct package.
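As a sketch, one such organization, assuming the hypothetical awesome_package metadata above (the useful.py module name is illustrative), might look like this.
awesome_package/
    __init__.py
    useful.py
pyproject.toml
README.rst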

There are ways to have setuptools treat the version as dynamic and take it from a file or an attribute. An alternative is to take advantage of the fact that pyproject.toml is a structured format and manipulate it directly.

For example, the following code uses a CalVer (calendar versioning) scheme of YEAR.MONTH.release in a month. It uses the built-in zoneinfo module, which requires Python 3.9 or above, and the tomlkit library, which supports roundtrip-preserving TOML parsing and serialization.
import tomlkit
import datetime
import pathlib
import zoneinfo
now = datetime.datetime.now(tz=zoneinfo.ZoneInfo("UTC"))
prefix = f"{now.year}.{now.month}."
pyproject = pathlib.Path("pyproject.toml")
data = tomlkit.loads(pyproject.read_text())
current = data["project"].get("version", "")
if current.startswith(prefix):
    # Already released this month; bump the serial number.
    serial = int(current.split(".")[-1]) + 1
else:
    # First release of the month.
    serial = 0
version = prefix + str(serial)
data["project"]["version"] = version
pyproject.write_text(tomlkit.dumps(data))

Some utilities keep the version synchronized between several files; for example, pyproject.toml and example_package/__init__.py. The best way to use these utilities is by not needing to do it.

If example_package/__init__.py wants to expose the version number, the best way is to calculate it using importlib.metadata.
# example_package/__init__.py
from importlib import metadata
__version__ = metadata.distribution("example_package").version
del metadata # Keep top-level namespace clean

This avoids needing to keep more than one place in sync.

The field dependencies in pyproject.toml is present in almost every package. This is how to mark other distributions that the code needs. It is a good practice to put loose dependencies in pyproject.toml. This is in contrast to exact dependencies, which specify a specific version. A loose dependency looks like Twisted>=17.5, specifying a minimum version but no maximum. Exact dependencies, like Twisted==18.1, are usually a bad idea in pyproject.toml. They should only be used in rare cases, for example, when using significant chunks of a package's private API.

The pyproject.toml file also allows defining entrypoints. Some frameworks, like Pyramid, allow using entrypoints to add plugin-like features.

It also allows you to define scripts. These used to be console_scripts entrypoints but now have their own section.
[project.scripts]
example-command = "example_package.commands:main"

The syntax is package.module:function (the module can be nested arbitrarily deep inside subpackages). This function is called with no arguments when the script is run.

Usually, this includes command-line parsing, but the following is a short example.
# example_package/commands.py
def main():
    print("an example")
In this example, running example-command causes the string to print.
$ example-command
an example

You can build a distribution with pyproject.toml, a README.rst, and some Python code. There are several formats a distribution can take, but the one covered here is the wheel.

After installing build using pip install build, run
python -m build --wheel

This creates a wheel under dist. If the wheel needs to be in a different directory, add --outdir <output directory> to the command.

You can do several things with the wheel, but it is important to note that one thing you can do is pip install <wheel file>. Doing this as part of continuous integration makes sure the wheel, as built by the current directory, is functional.
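A minimal continuous integration sketch, assuming the awesome_package metadata shown earlier and a fresh virtual environment (the wheel filename depends on the name, version, and tags):
$ python -m build --wheel
$ pip install dist/awesome_package-0.0.3-py3-none-any.whl
$ python -c 'import awesome_package'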

It is possible to use python -m build to create a source distribution. This is usually a good idea to accommodate use cases that prefer to install from source. These use cases are esoteric, but generating the source distribution is easy enough to be worth it.
$ python -m build --sdist

It is possible to combine the --sdist and --wheel arguments into one run of python -m build. This is also what python -m build does by default: create both a source distribution and a wheel.

By default, python -m build installs any packages it needs to build the package in a fresh virtual environment. When running python -m build in a tight edit-debug loop, perhaps to debug a setup.cfg, this can get tedious. In those cases, create and activate a virtual environment, and then run
$ python -m build --no-isolation

This installs its dependencies in the current environment. While this is not a good fit for production use, this is a faster way to debug packaging issues.

2.4 Binary Wheels

Python is well known and often used as an integration language. One of the ways this integration happens is by linking to native code.

This is usually done using the C Application Binary Interface (C ABI). The C ABI is used not only for integrating with C libraries but with other languages, such as C++, Rust, or Swift, which can generate C ABI-compatible interfaces.

There needs to be some glue code bridging the C ABI to Python to integrate Python with such code. It is possible to write this code by hand.

This is a tedious and error-prone process, so code generators are often used. Cython is a popular generator that uses a Python-compatible language. Although Cython is often used to interface to C ABI libraries, it can be used to generate extensions without such integration. This makes examples slightly simpler, so the following Cython code is used as a running example.
#cython: language_level=3
def add(x, y):
    return x + y

This code is in the binary_module.pyx file. It is short and does just enough to be clear if it works correctly.

To build code with native integration, the files that describe the build are slightly more complicated.

The pyproject.toml file is no longer empty. It now has two lines.
[build-system]
requires = ["setuptools", "cython"]

This makes sure the cython package is installed before trying to build a wheel.

The setup.py is no longer minimal. It contains enough code to integrate with Cython.
import setuptools
from Cython import Build
setuptools.setup(
    ext_modules=Build.cythonize("binary_module.pyx"),
)
The Cython.Build.cythonize function does two things.
  • Creates (or re-creates) binary_module.c from binary_module.pyx.

  • Returns an Extension object.

Since *.pyx files are not included by default, they need to be included explicitly in the MANIFEST.in file.
include *.pyx
Since, in this example, there are no regular Python files, the setup.cfg does not need to specify any.
[metadata]
name = binary_example
version = 1.0

With these files, running python -m build --wheel builds a binary wheel in dist named something like binary_example-1.0-cp39-cp39-linux_x86_64.whl. Details of the name depend on the platform, the architecture, and the Python version.

After installing this wheel, it can be used as follows.
$ pip install dist/binary_example*.whl
$ python -c 'import binary_module;print(binary_module.add(1, 2))'
3

This is a simple example that demonstrates the mechanics of binary packaging. It is designed to show how all the pieces fit together in a small example.

Realistic binary packages are usually more complicated, implementing subtle algorithms that can take advantage of the optimizations Cython gives or wrapping a native-code library.

2.5 manylinux Wheels

A binary wheel is not a pure Python wheel because at least one of its files contains native code. On a Linux system, this native code is a shared library, a file with the .so suffix for a shared object.

This shared library links against other libraries. For a library designed to wrap a specific native library, as with pygtk wrapping gtk, it links with the wrapped library.

In almost all cases, whether it is designed to wrap a specific library or not, it links against the standard C library. This is the library that has C functions like printf. Few things can be done in native code without linking against it.

On most modern Linux systems, this linking is usually dynamic. This means that the binary wheel does not contain the library it is linked with; it expects to load it at runtime.

If a wheel is built on a different system than the one it is installed on, a library that is binary compatible with the one it is linked with has to be installed on the system. If a binary compatible library is not installed, this leads to a failure at import time.

2.5.1 Self-Contained Wheels

The auditwheel tool takes binary wheels and patches them to make them more portable. One of its functions is to grab copies of the dynamic libraries the wheel links against and put them in the wheel. This allows the wheel to be installed without requiring a separately installed library.

For auditwheel to work correctly, the patchelf utility needs to be installed. Older versions might produce wheels that break in strange ways. The safest way to have the right version of patchelf is to download the latest source distribution and build it.

To make a self-contained wheel, first build a regular binary wheel. This might require careful reading of the instructions for building the package from source. This results in a regular binary wheel in dist/. This was the case in the earlier example with the binary_example module.

After this is done, run
$ auditwheel repair --plat linux_x86_64 dist/*.whl

By default, auditwheel creates the self-contained wheel in a wheelhouse subdirectory. The wheel created is self-contained but expects to be installed on a compatible version of Linux.

2.5.2 Portable Wheels

The --plat flag in auditwheel is the platform tag. If it is linux_<cpu architecture>, the wheel makes no guarantees about which GNU C Library it is compatible with.

Wheels like that should only be installed on a compatible Linux system. To avoid mistakes, most Python package index systems, including PyPI, do not let these wheels be uploaded.

Uploadable Python wheels must be tagged with a proper platform tag, which shows which versions of the GNU C Library they are compatible with. Historically, those tags relied on the CentOS release year: manylinux1 corresponded to CentOS 5, manylinux2010 corresponded to CentOS 6, and manylinux2014 corresponded to CentOS 7.

At the time of writing, manylinux_2_24 and manylinux_2_27 are the only post–CentOS 7 versions. These correspond to Debian 9 and Ubuntu 18.04, respectively.

After deciding on the oldest supported platform tag, build on the newest system that supports it. For example, if no deployment target uses a GNU C Library older than 2.24, build the wheel on Debian 9. Especially for binary wheels with complicated build dependencies, a newer system makes it easier to follow the documentation and reduces the chances of running into unexpected issues.
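For example, to produce a wheel tagged for GNU C Library 2.24 and later, the corresponding platform tag is passed to auditwheel. This sketch assumes the regular binary wheel has already been built in dist/.
$ auditwheel repair --plat manylinux_2_24_x86_64 dist/*.whl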

2.5.3 manylinux Containers

Making sure that the patchelf tool is correctly installed and Python is built with the correct version of the C library is a subtle and error-prone process. One way to avoid this is to use the official manylinux container images.

These container images are available at quay.io/pypa/manylinux<version>_<architecture>. There are versions available for manylinux_2_24, manylinux2014, manylinux2010, and manylinux1. These images contain all officially supported versions of Python and the rest of the necessary tooling.

Note that specialized build dependencies need to be installed on those systems; for example, when using manylinux_2_24 (a Debian-based container).
$ docker run --rm -it quay.io/pypa/manylinux_2_24_x86_64
# apt-get update
# apt-get install -y <dependencies>
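The official images place the supported interpreters under /opt/python and ship with auditwheel preinstalled. A sketch of a build inside the container, assuming the project source is mounted at /src, might look like this.
# cd /src
# /opt/python/cp39-cp39/bin/python -m pip install build
# /opt/python/cp39-cp39/bin/python -m build --wheel
# auditwheel repair --plat manylinux_2_24_x86_64 dist/*.whl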

2.5.4 Installing manylinux Wheels

By default, pip uses manylinux wheels with compatible platform tags. Wheels can be uploaded to a package index or added to a directory passed to pip with --find-links.

In some situations, it is better if pip fails quickly when no prebuilt wheel is available. The --only-binary :all: option can be given to disable installing from source distributions.

2.6 tox

tox is a tool to automatically manage virtual environments, usually for tests and builds. It is used to make sure that those run in well-defined environments and is smart about caching them to reduce churn. True to its roots as a test-running tool, tox is configured in terms of test environments.

tox itself is a PyPI package usually installed in a virtual environment. Because tox creates ad hoc temporary virtual environments for testing, the virtual environment tox is installed in can be common to many projects. A common pattern is to create a virtual environment dedicated to tox.
$ python -m venv ~/.venvs/tox
$ ~/.venvs/tox/bin/python -m pip install tox
$ alias tox=~/.venvs/tox/bin/tox

It uses a unique ini-based configuration format. This can make writing configurations difficult since remembering the subtleties of the file format can be hard. However, while that power is hard to tap, it is enough to configure tests and builds that run clearly and concisely.

One thing that tox lacks is a notion of dependencies between build steps. This means that those are usually managed from the outside by running specific test runs after others and sharing artifacts somewhat ad hoc.

A tox environment more or less corresponds to a section in the configuration file. By default, tox uses the tox.ini file.
[testenv:some-name]
.
.
.

Note that if the name of the environment contains pyNM (for example, py36), then tox defaults to using CPython, the standard Python implementation, version N.M (3.6, in this case) as the Python interpreter for that test environment.

tox also supports name-based environment guessing for more esoteric implementations of Python. For example, PyPy, an implementation of Python in Python, is supported with the name pypyNM.

If the name does not include one of the supported short names, or if there is a need to override the default, a basepython field in the section can be used to indicate a specific Python version. By default, tox looks for Python available in the path. However, if the plug-in tox-pyenv is installed in the virtual environment that tox itself is installed in, tox will query pyenv if it cannot find the right Python on the path.

Let’s analyze a few tox configuration files in order of increasing complexity.

2.6.1 One Environment

In this example, there is only one test environment. This test environment uses Python 3.9.
[tox]
envlist = py39
The tox section is a global configuration. In this example, the only global configuration is the list of environments.
[testenv]
This section configures the test environment. Since there is only one test environment, there is no need for a separate configuration.
deps =
    flake8

The deps subsection details which packages should be installed in the virtual test environment. Here the configuration specifies flake8 with a loose dependency. Another option is to specify a strict dependency; for example, flake8==1.0.0.

This helps with reproducible test runs. It could also specify -r <requirements file> and manage them separately. This is useful when there is another tool that takes the requirements file.
commands =
    flake8 useful

In this case, the only command is to run flake8 in the useful directory. By default, a tox test run succeeds if all commands return a successful status code. As something designed to run from command lines, flake8 respects this convention and only exits with a successful status code if there are no problems detected with the code.
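Assembled from the fragments above, the complete single-environment tox.ini reads as follows.
[tox]
envlist = py39
[testenv]
deps =
    flake8
commands =
    flake8 useful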

2.6.2 Multiple Environments

In the following example, the tox configuration runs unit tests against both Python 3.9 and Python 3.8. This is common for libraries that need to support more than one version.
[tox]
envlist = py39,py38
The two environments can share configuration. Even though there is no configuration difference, they are not redundant. They run the tests against different versions of the interpreter.
[testenv]
deps =
    pytest
    hypothesis
    pyhamcrest
commands =
    pytest useful

In this environment, tox is configured to install the pytest runner and two testing helper libraries. The tox.ini file documents the assumptions on the tools needed to run the tests.

The command to be run is short. The pytest tool also respects the testing tools convention and only exits successfully if there are no test failures.

2.6.3 Multiple Differently Configured Environments

As a more realistic example, let’s turn to the tox.ini of ncolony.
[tox]
envlist = {py38,py39}-{unit,func},py39-wheel,docs
toxworkdir = {toxinidir}/build/.tox

It is possible to define environments in a matrix way like this. The environments to be created are py38-unit, py38-func, py39-unit, and py39-func.

This becomes more useful the more environments there are. This is also a way to create too many environments. For example, {py37,py38,py39}-{unit,func}-{olddeps,newdeps}-{mindeps,maxdeps} creates 3*2*2*2=24 environments, which takes a toll when running the tests.

The numbers for a matrix test like this climb up fast; using an automated test environment means things would either take longer or need higher parallelism.

This is a normal trade-off between the comprehensiveness of testing and resource use. There is no magic solution other than carefully considering how many variations to officially support.

Instead of having a separate testenv-<name> configuration section per environment, it is possible to use one section and special-case the environments using matching. This is a more efficient way to create many similar versions of test environments.
[testenv]
deps =
    {py38,py39}-unit: coverage
    {py38,py39}-{func,unit}: twisted
    {py38,py39}-{func,unit}: ncolony
The coverage tool is only used for unit tests. The Twisted and ncolony libraries are needed for unit and functional tests.
commands =
    {py38,py39}-unit: python -Wall
                                  -Wignore::DeprecationWarning
                                  -m coverage
                                  run -m twisted.trial
                                  --temp-directory build/_trial_temp
                                  {posargs:ncolony}
    {py38,py39}-unit: coverage report --include ncolony*
                           --omit */tests/*,*/interfaces*,*/_version*
                                      --show-missing --fail-under=100
    {py38,py39}-func: python -Werror -W ignore::DeprecationWarning
                                  -W ignore::ImportWarning
                                  -m ncolony tests.functional_test

Configuring one big test environment means all the commands are in one bag and selected based on patterns. This is also a more realistic test run command, including warnings configuration, coverage, and arguments to the test runner.

While the exact complications vary, there are almost always enough things that lead the commands to grow to a decent size.

The following environment is different enough that it makes sense to break it out into its own section.
[testenv:py39-wheel]
skip_install = True
deps =
      build
commands =
      python -c 'import os, sys; os.makedirs(sys.argv[1])' {envtmpdir}/dist
      python -m build --outdir {envtmpdir}/dist --no-isolation

The py39-wheel section ensures that the wheel can be built. A more sophisticated configuration might install the wheel and run the unit tests.

Finally, the docs section builds the documentation. This helps avoid syntax errors resulting in the documentation failing to build.
[testenv:docs]
changedir = docs
deps =
    sphinx
commands =
    sphinx-build -W -b html -d {envtmpdir}/doctrees . {envtmpdir}/html
basepython = python3.9

For it to run, it needs to have a docs subdirectory with a conf.py and, depending on the contents of the configuration file, more files. Note that basepython must be explicitly declared in this case since it is not part of the environment's name.

The documentation build is one of the reasons why tox shines. It only installs sphinx in the virtual environment for building documentation. This means that an undeclared dependency on sphinx would make the unit tests fail since sphinx is not installed there.

2.7 Pip Tools

The pip-tools PyPI package contains a dedicated command to freeze dependencies. The pip-compile command takes a loose requirements file as an input and produces one with strict requirements.

The usual names for the files are requirements.in for the input and requirements.txt for the output. Sometimes, when there are a few related variations of dependencies, the files are called requirements-<purpose>.in and requirements-<purpose>.txt, respectively.

There are two common purposes.
  • dev: packages needed for development, but not while running

  • test: packages needed for testing but not running regularly

Running the command can be as simple as the following.
$ pip-compile < requirements.in > requirements.txt
...

More commonly, the loose requirements are already in a setup.cfg for the local code that uses the libraries. In those cases, pip-compile can take this as input directly.

For example, a setup.cfg file for a web application might have a dependency on gunicorn and a test dependency on pytest.
[options]
install_requires=
    gunicorn
[options.extras_require]
test =
    pytest
The pip-compile command also needs a trivial setup.py.
import setuptools
setuptools.setup()

It is usually best to also have a pyproject.toml, though it can be empty. Even though pip-compile does not depend on it, it does help with other parts of the workflow.

In such a case, pip-compile uses the package metadata automatically.
$ pip-compile > requirements.txt
<output snipped>
$ sed -e 's/ *#.*//' -e '/^$/d' requirements.txt
gunicorn==20.1.0

The output, requirements.txt, contains quite a few comment lines. The only non-comment line is the pinned gunicorn dependency. Note that the version is different when running pip-compile again in the future when a new package has been released.

It is also possible to generate the requirements-test.txt file by running pip-compile with the --extra argument.
$ pip-compile --extra test > requirements-test.txt
<output snipped>
$ sed -e 's/ *#.*//' -e '/^$/d' requirements-test.txt
attrs==21.4.0
gunicorn==20.1.0
iniconfig==1.1.1
packaging==21.3
pluggy==1.0.0
py==1.11.0
pyparsing==3.0.6
pytest==6.2.5
toml==0.10.2

This time, the pytest dependency generated more dependencies. All dependencies are pinned.

Note that while pip-tools does try to replicate the algorithm pip uses, there are some edge cases in which the resolution differs, or one might fail to find a resolution. This tends to happen in edge cases, and one way to improve things is to add specificity to some of the dependencies, often as >= details.

Note that even this relatively simple (hypothetical) Python program with two direct dependencies had nine total dependencies. This is typical; frozen, complete dependencies often number in the tens for simple programs and hundreds for many code bases.

Although requirements.txt and requirements-*.txt are generated files, checking them into the source code for applications is usually recommended. For libraries, it sometimes makes sense to check in requirements-test.txt to run at least one set of tests with known-good dependencies.

In either case, it is good to refresh the dependencies occasionally. Most libraries only fix bugs and security issues at their tip. Being too far behind the latest version means that if a bug fix or security patch becomes important, a library upgrade is potentially needed. These upgrades tend to be harder. Even with perfectly executed semantic versioning, the library might have bumped its major version number. More realistically, even without a major version bump, some assumptions might be wrong. The bigger the jump, the more these assumptions interact and become complicated to resolve.

There are services that automatically open pull requests updating requirements.txt. If none of those are feasible to use, for any reason, writing a script that re-runs pip-compile and produces a pull request is possible. In that case, use the -U flag to pip-compile to let it know that all dependencies need to be upgraded to the latest mutually-consistent versions.
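A minimal sketch of such a script, assuming pip-compile is on the path and the loose requirements live in setup.cfg as above; the branch name is a placeholder, and pushing the branch and opening the pull request are left to whatever the hosting system provides.
$ git checkout -b update-dependencies
$ pip-compile -U > requirements.txt
$ pip-compile -U --extra test > requirements-test.txt
$ git commit -am "Refresh pinned dependencies"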

The most important thing is to do it regularly and merge the resulting pull request. These pull requests need to be treated as carefully as, if not more carefully than, those that change the Python code itself. Whatever workflow is used for the latter is appropriate for these dependency-bump pull requests.

2.8 Poetry

Poetry is a package and dependency management system. It gives one tool which handles the entire Python development process: managing dependencies, creating virtual environments, building and publishing packages, and installing Python applications.

2.8.1 Installing

There are several ways to install Poetry. One is by using a get-poetry.py script, which uses Python to install Poetry locally.

This can be done by piping it straight into Python.
$ curl -sSL
  https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py
  | python -

It is also possible to download the script with curl -o get-poetry.py ... and then run it.

In some circumstances, it might make sense to install Poetry into a dedicated virtual environment, using pip install poetry in the virtual environment. One advantage of the pip-based installation method is that it works with a local Python package index. This is sometimes useful for compliance or security reasons.
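A sketch of that approach, mirroring the dedicated virtual environment pattern used for tox earlier:
$ python3 -m venv ~/.venvs/poetry
$ ~/.venvs/poetry/bin/pip install poetry
$ alias poetry=~/.venvs/poetry/bin/poetry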

Regardless of how it was installed, poetry self update updates Poetry to the latest version. It is also possible to update to a specific version as a parameter to the update command.

For shell completions, poetry completions <shell name> outputs shellcode for completions compatible with the given shell. This can be loaded globally or per user as appropriate for the relevant shell. Among the shells supported are bash, zsh, and fish.

2.8.2 Creating

Usually, the best way to start with Poetry is on a fresh project. It can create a skeleton for a Poetry-based project.
$ poetry new simple_app

This creates a directory called simple_app with a minimal Poetry skeleton.

In most cases, this new directory, simple_app, is version-controlled. This is not done by poetry new, so it is a good idea to do it immediately after; for example, using git.
$ git init .
$ git add .
$ git commit -a -m 'Output of "poetry new"'
The most important file is pyproject.toml at the root of the directory. It contains the build-system section.
[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"

This makes it compatible with python -m build for building wheels and source distributions.

The pyproject.toml file also contains some Poetry-specific sections, all marked by having tool.poetry as a prefix. The main section, tool.poetry, contains package metadata.
[tool.poetry]
name = <name>
version = <version>
description = <description>
authors = ["<author name and e-mail>", ...]
The version field can be edited manually, but it is better to use poetry version <bump rule> or poetry version <version> to modify the versions.
$ poetry version patch
...
$ git diff
....
--- pyproject.toml
+++ pyproject.toml
@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "simple_app"
-version = "0.1.0"
+version = "0.1.1"
 description = ""
$ poetry version 1.2.3
...
$ git diff
....
--- pyproject.toml
+++ pyproject.toml
@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "simple_app"
-version = "0.1.0"
+version = "1.2.3"
 description = ""
The following are other files that are created.
  • README.rst is a bare-bones README.

  • <name>/__init__.py is a top-level file that makes the <name> directory a package. It also defines an __version__ variable.

  • Testing files
    • tests/test_<name>.py is a minimal test checking that the version number is valid. Its main utility is that poetry run pytest does not fail.

    • tests/__init__.py is a file that does not contain tests. Making the tests directory into a package is required for the way Poetry runs pytest.

2.8.3 Dependencies

Assume that the goal of simple_app is to be a Pyramid-based web application that runs with gunicorn. The first step is to add those dependencies to Poetry. The poetry add subcommand adds dependencies.
$ poetry add pyramid gunicorn
This modifies the tool.poetry.dependencies section in pyproject.toml as follows.
[tool.poetry.dependencies]
python = "^3.8"
pyramid = "^2.0"
gunicorn = "^20.1.0"

By default, Poetry assumes that the dependencies are semantically versioned. This means that a security fix released only in a newer major version is not picked up unless it is backported to the currently allowed versions. Most Python packages do not backport fixes, so this is something to be careful with.

This command also creates the poetry.lock file, which has recursively-complete pinned dependencies. These are the dependencies used by Poetry. The poetry lock command updates the pinned dependencies.

It is possible to export those locked dependencies to a requirements.txt file.
$ poetry export > requirements.txt

Depending on what the package is used for, either poetry.lock, requirements.txt, or both should be checked into source control.

2.8.4 Developing

Even a minimally functional Pyramid app requires a little more code.
# simple_app/web.py
from pyramid import config, response
def root(request):
    return response.Response("Useful string")
with config.Configurator() as cfg:
    cfg.add_route("root", "/")
    cfg.add_view(root, route_name='root')
    application = cfg.make_wsgi_app()
Since the pyramid and gunicorn dependencies are already in Poetry, it can directly run the code. There is no need to explicitly create a virtual environment.
$ poetry run gunicorn simple_app.web
[2021-09-25 14:26:29 -0700] [2190296] [INFO] Starting gunicorn 20.1.0
...

Similarly, to run tests, use poetry run pytest.

2.8.5 Building

The poetry build command generates a source distribution and a wheel under dist. Alternatively, python -m build does the same.
$ pip install build
...
$ python -m build
...
Successfully built simple_app-0.1.0.tar.gz and simple_app-0.1.0-py3-none-any.whl

The latter is useful when building wheels inside a minimal OS, say a container. In that case, installing Poetry in addition to Python might be too awkward or complicated.

Whether by poetry build or python -m build, the wheel package can be installed without requiring Poetry. This is true whether the wheel is uploaded to a package index and installed from there or installed using pip install <path to wheel>.

Note that the wheel has the loose dependencies defined in pyproject.toml. To install pinned dependencies, use Poetry to install or prefix the wheel installation with pip install -r requirements.txt. This requires exporting the poetry.lock dependencies to requirements.txt, as shown earlier.
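A sketch of that installation flow, assuming the wheel built earlier is in dist/ (the exact filename depends on the version):
$ poetry export > requirements.txt
$ pip install -r requirements.txt
$ pip install dist/simple_app-0.1.0-py3-none-any.whl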

2.9 Pipenv

Pipenv is a tool to create virtual environments that match a specification and ways to evolve the specification. It relies on two files: Pipfile and Pipfile.lock.

A popular way to install Pipenv is in a custom virtual environment. Usually, it is best to run it from this virtual environment inactivated. This can be done using a command-line-level alias.
$ python3 -m venv ~/.venvs/pipenv
$ ~/.venvs/pipenv/bin/pip install pipenv
$ alias pipenv=~/.venvs/pipenv/bin/pipenv
If you intend to run Pipenv from an activated virtual environment, the PIPENV_IGNORE_VIRTUALENVS environment variable should be set to 1.
$ export PIPENV_IGNORE_VIRTUALENVS=1
$ . ~/.venvs/pipenv/bin/activate

Pipenv assumes that it controls a project's directory. To start using it, create a new directory. Inside this directory, it is possible to install packages using pipenv install.

To run code that uses the packages, use pipenv shell. This is similar to activating a virtual environment, but it opens a new shell. Instead of deactivating the environment, exit the shell.
$ mkdir useful
$ cd useful
$ pipenv install termcolor
$ mkdir useful
$ touch useful/__init__.py
$ cat > useful/__main__.py
import termcolor
print(termcolor.colored("Hello", "red"))
$ pipenv shell
(pipenv)$ python -m useful
(pipenv)$ exit
$
This leaves in its wake a Pipfile that looks like the following.
[[source]]
url = "https://pypi.org/simple"
verify_ssl = true
name = "pypi"
[packages]
termcolor = "*"
[dev-packages]
[requires]
python_version = "3.8"

The Pipenv tool is not used for building packages. It is strictly a tool to improve local development or create virtual environments for deployment. For useful to be a shippable distribution, it would, in this case, need a setup.cfg and a pyproject.toml.

2.10 DevPI

DevPI is a PyPI-compatible server that can be run locally. Though it does not scale to PyPI-like levels, it can be a powerful tool in several situations.

DevPI is made up of three parts. The most important one is devpi-server. For many use cases, this is the only part that needs to run. The server serves, first and foremost, as a caching proxy to PyPI. It takes advantage of the fact that packages on PyPI are immutable: once you have a package, it can never change.

There is also a web server that allows you to search in the local package directory. Since many use cases do not even involve searching on the PyPI website, this is optional. Finally, there is a client command-line tool that allows configuring various parameters on the running instance. The client is most useful in more esoteric use cases.

Installing and running DevPI is straightforward. In a virtual environment, simply run
(devpi)$ pip install devpi-server
(devpi)$ devpi-init
(devpi)$ devpi-server
The pip tool, by default, goes to pypi.org. For some basic testing of DevPI, you can create a new virtual environment, playground, and run
(playground)$ pip install
              -i http://localhost:3141/root/pypi/+simple/
              httpie glom
(playground)$ http --body https://httpbin.org/get | glom '{"url":"url"}'
{
  "url": "https://httpbin.org/get"
}
Having to specify the -i ... argument to pip every time would be annoying. After checking that everything worked correctly, you can put the configuration in an environment variable.
$ export PIP_INDEX_URL=http://localhost:3141/root/pypi/+simple/
To make things more permanent, configure a pip.conf file.
[global]
index-url = http://localhost:3141/root/pypi/+simple/
[search]
index = http://localhost:3141/root/pypi/
It is possible to put pip.conf in the root of a virtual environment to test it. To make it permanent, for example, when working on a computer that often loses connectivity, use the user-specific pip.conf.
  • Unix: ~/.pip/pip.conf

  • macOS: $HOME/Library/Application Support/pip/pip.conf.

  • Windows: %APPDATA%\pip\pip.ini

To apply it to all users, edit /etc/pip.conf on Unix. This can be useful, for example, when building container images against a DevPI.

DevPI is useful for disconnected operations. To install packages without a network, DevPI can be used to cache them. As mentioned earlier, virtual environments are disposable and often treated as mostly immutable. This means that keeping a virtual environment with the right packages around is not a good substitute for network access: the chances are high that some situation will either require or suggest re-creating it from scratch.

However, a caching server is a different matter. If all package retrieval is done through a caching proxy, then destroying a virtual environment and rebuilding it is fine since the source of truth is the package cache. This is as useful for taking a laptop into the woods for disconnected development as maintaining proper firewall boundaries and having a consistent record of all installed software.

To warm up the DevPI cache (i.e., make sure it contains all needed packages), you need to use pip to install them. One way to do it is, after configuring DevPI and pip, to run tox against a source repository of software under development. Since tox goes through all test environments, it downloads all needed packages.

It is a good practice to also preinstall any requirements.txt files that are relevant in a disposable virtual environment.
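A minimal cache-warming sketch, assuming PIP_INDEX_URL already points at the DevPI instance and a requirements.txt is at hand, uses a disposable virtual environment:
$ python -m venv /tmp/warm-cache
$ /tmp/warm-cache/bin/pip install -r requirements.txt
$ rm -rf /tmp/warm-cache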

However, the utility of DevPI is not limited to disconnected operations. Configuring one inside your build cluster, and pointing the build cluster at it, completely avoids the risk of a left-pad incident, where a package you rely on gets removed by the author from PyPI. It might also make builds faster and cut out a lot of outgoing traffic.

Another use for DevPI is to test uploads before uploading them to PyPI. Assuming devpi-server is already running on the default port, it is possible to upload a package using twine. Usually, this is a package that is being tested, but as a test, it reuploads the popular boltons package.
(devpi)$ pip install devpi-client twine
(devpi)$ devpi use http://localhost:3141
(devpi)$ devpi user -c testuser password=123
(devpi)$ devpi login testuser --password=123
(devpi)$ devpi index -c dev bases=root/pypi
(devpi)$ devpi use testuser/dev
(devpi)$ pip download boltons==21.0.0
(devpi)$ twine upload --repository-url http://localhost:3141/testuser/dev
               -u testuser -p 123 boltons-21.0.0-py2.py3-none-any.whl
(devpi)$ pip install -i http://localhost:3141/testuser/dev my-package

Note that this allows you to upload to an index that you only use explicitly, so you are not shadowing my-package for all environments that are not using this explicitly.

In an even more advanced use case, you can do the following.
(devpi)$ devpi index root/pypi mirror_url=https://ourdevpi.local

This makes your DevPI server a mirror of a local, upstream DevPI server. This allows you to upload private packages to the central DevPI server to share with your team. In those cases, the upstream DevPI server often needs to be run behind a proxy, and you need to have some tools to properly manage user access.

Running a centralized DevPI behind a simple proxy that asks for a username and password provides an effective private repository.

For that, create the server without a root index.
$ devpi-init --no-root-pypi
$ devpi login root
...
$ devpi index --create pypi

This means the root index no longer mirrors PyPI. You can now upload packages directly to it. This type of server is often used with the --extra-index-url argument to pip, allowing pip to retrieve from both the private and the external sources. However, sometimes it is useful to have a DevPI instance that only serves specific packages. This allows enforcing rules about auditing before using any packages. Whenever a new package is needed, it is downloaded, audited, and then added to the private repository.

2.11 pex and shiv

While it is non-trivial to compile a Python program into one self-contained executable, you can do something almost as good. You can compile a Python program into a single file that only needs an installed interpreter to run. This takes advantage of the particular way Python handles start-ups.

When running python /path/to/filename, Python does two things.
  • Adds the /path/to directory to the module path.

  • Executes the code in /path/to/filename.

When running python /path/to/directory/, Python acts as though you typed python /path/to/directory/__main__.py.

In other words, Python does the following two things.
  • Adds the /path/to/directory/ directory to the module path.

  • Executes the code in /path/to/directory/__main__.py.

When running python /path/to/filename.zip, Python treats the file as a directory. In other words, Python does the following two things.
  • Adds the /path/to/filename.zip directory to the module path.

  • Executes the code in the __main__.py it extracts from /path/to/filename.zip.

A zip file is an end-oriented format. The metadata, and pointers to the data, are all at the end. Adding a prefix to a zip file does not change its contents.

So if you take a zip file and prefix it with #!/usr/bin/python<newline>, and mark it executable, then when running it, Python is running a zip file. If you put the right bootstrapping code in __main__.py and the right modules in the zip file, you can get all the third-party dependencies in one big file.

pex and shiv are tools for producing such files, but they both rely on the same underlying behavior of Python and zip files.
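The standard library's zipapp module automates exactly this prefixing for a directory that already contains a __main__.py and any needed modules; pex and shiv do considerably more work on top of this to bundle third-party dependencies. A minimal sketch, assuming a hypothetical my_app directory with a __main__.py:
$ python -m zipapp my_app -p "/usr/bin/env python" -o my-app.pyz
$ ./my-app.pyz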

2.11.1 pex

pex can be used either as a command-line tool or as a library. When using it as a command-line tool, it is good to prevent it from trying to do dependency resolution against PyPI. All dependency resolution algorithms are flawed in some way. However, due to pip's popularity, packages explicitly work around flaws in its algorithm. pex is less popular, and there is no guarantee that packages try to work with it explicitly.

The safest thing to do is to use pip wheel to build all wheels in a directory, and then tell pex to use only this directory.

For example,
$ pip install pex
$ pip wheel --wheel-dir my-wheels -r requirements.txt
$ pex -o my-file.pex --find-links my-wheels --no-index
      -m some_package

pex has a few ways to find the entrypoint. The two most popular ones are -m some_package, which behaves like python -m some_package, and -c console-script, which finds which script would have been installed as console-script and invokes the relevant entrypoint.

It is also possible to use pex as a library. This allows writing Python code, rather than using shell automation, to build a pex file.
from pex import pex_builder
import os, sys, subprocess
builder = pex_builder.PEXBuilder()
builder.set_entry_point('some_package')
builder.set_shebang(sys.executable)
subprocess.check_call([sys.executable, '-m', 'pip', 'wheel',
                       '--wheel-dir', 'my-wheels',
                       '--requirements', 'requirements.txt'])
for dist in os.listdir('my-wheels'):
    dist = os.path.join('my-wheels', dist)
    builder.add_dist_location(dist)
builder.build('my-file.pex')

This code is largely equivalent to the earlier shell lines. The .set_entry_point() method is the equivalent of the -m argument.

This example sets the shebang line explicitly to sys.executable. By default, pex uses a sophisticated algorithm to get a good shebang line. This example overrode it, choosing to be explicit about using the interpreter.

The shebang line is sometimes specific to the expected deployment environment, so it is good to put some thought into it. One option is /usr/bin/env python, which finds what the current shell calls python.

Note that even though this is Python code, it creates wheels by starting pip as a subprocess. The pip tool is not usable as a library; calling it as a process is the only supported interface.

While this is more code than a few shell commands, having it in Python means that as the build process becomes more sophisticated, it is unnecessary to write complicated code in shell.

2.11.2 shiv

shiv is a modern take on the same ideas behind pex. However, since it uses pip directly, it needs to do a lot less itself.
$ pip install shiv
$ shiv -o my-file.shiv -e some_package -r requirements.txt

Because shiv off-loads the actual dependency resolution to pip, it is safe to call it directly. shiv is a younger alternative to pex. A lot of cruft has been removed, but it is still somewhat lacking in maturity.

For example, the documentation for command-line arguments is a bit thin. There is also no way to use it as a library currently.

Note that shiv only supports Python 3.6 and above.

2.12 Summary

Much of the power of Python comes from its powerful third-party ecosystems. Whether for data science or networking code, there are many good options. Understanding how to install, use, and update third-party packages is crucial to using Python well.

With private package repositories, using Python packages for internal libraries, and distributing them in a way compatible with open source libraries, is often a good idea. It allows using the same machinery for internal distribution, versioning, and dependency management.
