© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2022
M. Zadka, DevOps in Python, https://doi.org/10.1007/978-1-4842-7996-0_12

12. Containers

Moshe Zadka, Belmont, CA, USA

Many modern applications are deployed as containers. Containers are a way of running applications in isolation. This isolation allows for building self-contained images with all dependencies an application needs to run.

There are several ways to run containers. A popular one used by Kubernetes and Docker is containerd.

For a container runner to run an application as a container, the application needs to be inside of an Open Container Initiative (OCI) image.

There are several ways to build images. The most popular ways—buildctl, docker build, and nerdctl build—wrap buildkit.

Internally, buildkit uses a format called Low-Level Builder (LLB), which is not designed for people to write container build specifications by hand. Instead, LLB front ends compile higher-level build specifications into LLB.

The most common front end for LLB is dockerfile.v0. This front end is sometimes referred to as a Dockerfile because, by default, the front end looks for a file named Dockerfile as the source for the build specification.

The most common way to enable running Python code as a container is to write a Dockerfile containing a build specification. This Dockerfile is then given to buildkit to produce an OCI Image.

Writing a Dockerfile to build a Python application into a container image is different from writing a good Dockerfile. When creating container images, there are a lot of concerns to address.

Container images should be small, have fast build times, build reproducibly, and be easy to update with third-party security patches. These goals are, at least partially, in conflict with one another.

Making good choices and appropriate trade-offs among these concerns is why a good Dockerfile is subtle to achieve. It requires understanding the consequences of different ways of using the commands in a Dockerfile and of how Python applications can be installed.

12.1 Choosing a Base Image

One of the first lines in a dockerfile.v0 build specification starts with FROM. This indicates the base of the image. Images can be made from scratch, but starting from a popular Linux distribution is more common. Almost all distributions have an official container image or images.

12.1.1 GNU C Library Support

Because Python portable binary wheels (manylinux wheels) only support GNU C Library (glibc), it is usually good to stick to a glibc-based distribution. The most popular non-glibc-based distribution is Alpine Linux. It is possible to build Alpine-based container images for Python applications, but it is harder.
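As a quick sanity check, the standard library can report which C library the running interpreter appears to be linked against; a minimal sketch:

```python
# Report the C library the interpreter was built against.
# platform.libc_ver() detects glibc by inspecting the executable;
# on musl-based systems (such as Alpine) it typically reports ("", "").
import platform

libc, version = platform.libc_ver()
print(libc or "glibc not detected", version)
```

On a Debian- or Ubuntu-based image, this prints something like glibc 2.31; an empty result is a hint that manylinux wheels may not work there.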

12.1.2 Long-Term Support

The packages installed in the container image, either in the base image or via the distribution's package manager, inevitably have bugs or security issues. The container image needs to be rebuilt when a patch for a bug or security issue must be integrated.

Such rebuilds can often be time-sensitive if the problem is urgent. Minimizing the changes needed to the container to integrate such fixes allows faster deployment. Less testing is required, and the chances that these changes require another part of the application to change are lower.

This makes rolling distributions, which integrate new upstream versions continuously, a bad fit as bases for Python-based container images. Distributions with a conservative policy about adding changes to an existing release, and with long-term security and bug fix support for each release, are better. Distributions like Arch and Fedora should be avoided as a base for Python-based container images.

12.1.3 Avoiding Unexpected Changes

Official distribution images hosted on public registries are regularly updated with fixes. This is true whether they are on general public registries such as registry.hub.docker.com/library/debian or ones with a more focused purpose, like registry.suse.com/suse/sle15.

Even the images exposed under specific tags, such as registry.suse.com/suse/sle15:15.3, can change. In the case of SUSE, the same tag can point to 15.3.17.8.25 one day and 15.3.17.8.24 another.

It is possible to address images by the specific digest, as shown in the following example.
FROM registry.hub.docker.com/library/debian@sha256:8a71adf557086b1f0379142a24cbea502d9bc864b890f48819992b009118c481

There is no guarantee that the digest remains available at the upstream registry. The old container image might be garbage-collected when a new image is uploaded to the same tag.

A good practice is to keep the base image in a locally controlled registry. It should be updated regularly, but tags and versions are now under direct local control; for example, images can be tagged by the pulling date and guaranteed to last 60 days. The images, tags, frequency of pulling, and expiration policies should be clearly documented. It is often useful to centralize this process and support a small number of organization-wide blessed base images.

How centralized the process should be and how many base images to support depends on local needs, compliance requirements, and the size of the organization. This can be a contentious issue. One way to partially mitigate that is to clearly document the process and the trade-offs made and how decisions were made.
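The tag-by-pull-date convention described above can be made mechanical. A small sketch, with hypothetical registry and image names:

```python
# Compute a dated tag for a locally mirrored base image.
from datetime import date

def dated_tag(registry: str, image_and_tag: str, pulled_on: date) -> str:
    """Append the pull date to the upstream tag: <tag>-YYYY-MM-DD."""
    return f"{registry}/{image_and_tag}-{pulled_on:%Y-%m-%d}"

print(dated_tag("internal-registry.example.com", "debian:bullseye", date(2022, 3, 10)))
# → internal-registry.example.com/debian:bullseye-2022-03-10
```

The upstream image would then be pulled, retagged with this name, and pushed to the local registry.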

12.2 Installing the Python Interpreter

There are a few potential sources for the Python interpreter. The Linux distribution's native Python packages have a few downsides.
  • The distribution's Python is optimized for the distribution's own needs; it is not optimized to be a development platform.

  • New versions of Python, even patch versions, might not be available until a new version of the distribution is available.

12.2.1 conda

One option is installing conda and then installing Python in a conda environment. This is a good option if you are willing to commit to conda, creating conda environments, and installing things inside them.

This option is especially attractive when needing other features of conda, such as the availability of prebuilt binaries that are not Python-specific. This can be useful, especially in containers for data science and machine learning applications.

Unless the application developers are already using conda as their day-to-day development environment, adding another tool that developers must be familiar with can have significant downsides. This is not the best choice to get a Python interpreter in those cases.

12.2.2 Third-Party Repositories

Some third-party repositories, like deadsnakes for Ubuntu, build versions of Python designed to be used for development. Adding these repositories as an extra upstream, and installing Python through them, is one way to get Python into a container image.

Some due diligence should be done to vet the build process and options to make sure the Python builds are appropriate for the application. It is also important to understand how soon a new version of Python is available through the repository after it is released.
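A sketch of what this might look like with the deadsnakes PPA on Ubuntu (the package names follow the repository's conventions; verify them against its documentation):

```dockerfile
FROM ubuntu:20.04
# Avoid interactive prompts from package configuration during the build.
ARG DEBIAN_FRONTEND=noninteractive
# Add the deadsnakes PPA and install a specific Python version from it.
RUN apt-get update && \
    apt-get install --yes software-properties-common && \
    add-apt-repository ppa:deadsnakes/ppa && \
    apt-get update && \
    apt-get install --yes python3.9 python3.9-venv && \
    rm -rf /var/lib/apt/lists/*
```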

12.2.3 Building Python in the Container

The pyenv tool downloads and installs a version of Python. It can be used on any Unix-like operating system, including containers.

When pyenv builds and deploys Python, it also deploys shims. These are designed to switch between versions of Python.

Since a container image usually only has one version of Python, these shims are superfluous. It is possible to build Python directly with python-build, a pyenv subproject.

Finally, it is possible to download the source code from Python.org, unpack it, and build it.

Regardless of which option is used, building Python is time-consuming and requires quite a few build dependencies. Because of this, it is usually done as an internal base image build, which is then used by an application-specific image build. Using a multistage build, where the Python interpreter's directory is copied to the second stage, can avoid shipping the build dependencies.
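A sketch of such a two-stage arrangement, using python-build to compile the interpreter (the dependency list, version, and paths are illustrative and assume a Debian base):

```dockerfile
# Stage 1: compile Python with python-build (a pyenv subproject).
FROM debian:bullseye as python-builder
RUN apt-get update && \
    apt-get install --yes git build-essential libssl-dev zlib1g-dev \
        libsqlite3-dev liblzma-dev libbz2-dev libreadline-dev libffi-dev
RUN git clone https://github.com/pyenv/pyenv /tmp/pyenv
RUN /tmp/pyenv/plugins/python-build/bin/python-build 3.9.10 /opt/python

# Stage 2: copy only the installed interpreter; build tools stay behind.
FROM debian:bullseye-slim
COPY --from=python-builder /opt/python /opt/python
```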

The directory also contains some things which are not needed in a runtime image. For example, the tests and static libraries are not useful.

It is important to verify that certain built-in modules are correctly built; otherwise, some packages fail in strange ways. The usual culprits are ssl, sqlite3, and lzma.
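A small smoke test along these lines can be run with the freshly built interpreter:

```python
# Verify that extension modules commonly missed at interpreter build time
# (usually due to missing development headers) import cleanly.
import importlib

for name in ["ssl", "sqlite3", "lzma"]:
    importlib.import_module(name)
print("all modules imported successfully")
```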

12.2.4 Python Base Image

If Debian is a reasonable choice as a base, the python images on Docker Hub are a useful alternative. Note that images intended for runtime should start from a -slim variant.

12.3 Installing Python Applications

A container image for a Python application has to have the application installed. It is almost always the case that a dedicated virtual environment, or a conda environment, is a good idea.

Although a container image only runs one application, having the application in a virtual environment costs little and simplifies a few potential next steps. The most important is that virtual environments can be copied from one stage to the next.

This allows the application to be installed in a stage that, for example, includes installing any relevant build tools for dependencies or other steps. The runtime stage does not need these tools, only the installed version in the virtual environment.

After creating the virtual environment and installing any non-Python dependencies, the next step is to install the Python dependencies. Dependencies should be installed from a requirements.txt file.

When using Poetry or Pipenv, it makes sense to export the requirements.txt file from those systems. Otherwise, pip-compile can be used for that.

In either case, requirements.txt should be checked in and not generated during the build process. This makes the container image build reproducible; rebuilding the image at a later date installs the same dependencies.

After installing the dependencies, the application itself needs to be installed. The best way to do it is in two steps.
  • python -m build to generate a wheel

  • Install the wheel using pip install --no-dependencies

Separating the build step from the install step means that the build tooling does not need to be installed in the runtime virtual environment. Installing without dependencies means that no superfluous dependencies are installed.

After installing, use pip check in the environment to check that there are no missing dependencies and that all versions are compatible.

Since the results of pip check depend only on the wheel and the requirements.txt file, the outcome depends only on the source code.

A container image build step can be added to the continuous integration system workflow that is triggered on suggested code patches (pull requests or merge requests). If such a step is added, it can be made into a “gating” step: code changes cannot be merged unless it succeeds. Installing versions of the dependencies that are described in the source code, and having the container build verified in continuous integration, allows for confidence when building the image on the main branch. Since container images built from the main branch are often the ones used in production, this is an important goal.

Putting it all together, a Dockerfile for an application that uses Pyramid might look like the following.
FROM python:bullseye as venv-builder
RUN pip install build
RUN mkdir /src/
WORKDIR /src
RUN python -m venv /opt/pyr-venv/
COPY requirements.txt /src/
RUN /opt/pyr-venv/bin/pip install -r /src/requirements.txt
COPY setup.cfg pyproject.toml /src/
# copy source code
RUN python -m build
RUN /opt/pyr-venv/bin/pip install --no-dependencies dist/*.whl
RUN /opt/pyr-venv/bin/pip check
FROM python:slim-bullseye as runtime
COPY --from=venv-builder /opt/pyr-venv /opt/pyr-venv
The setup.cfg file only declares loose dependencies.
[metadata]
name = pyrapp
version = 0.0.1
[options]
install_requires =
    pyramid
    gunicorn
The requirements.txt file has complete, pinned dependencies.
$ wc -l requirements.txt
33 requirements.txt
$ egrep 'pyramid|gunicorn' requirements.txt |grep -v '#'
gunicorn==20.1.0
pyramid==2.0
If a requirement is added to the setup.cfg, the build fails.
$ tail -4 setup.cfg
install_requires =
    pyramid
    gunicorn
    attrs
$ docker build .
...
Step 11/13 : RUN /opt/pyr-venv/bin/pip check
---> Running in 6a186bd1f533
pyrapp 0.0.1 requires attrs, which is not installed.
The command '/bin/sh -c /opt/pyr-venv/bin/pip check' ...

12.4 Optimizing Container Build Cache

The buildkit container image build has sophisticated caching capabilities. Understanding how to use those can improve build speed by a significant amount.

The first thing to consider is the base image. Neither the distribution packages nor the Python interpreter tends to change often. Even the most diligent of compliance policies does not require integrating security patches less than a day after they are released.

In other words, an image with the latest distribution packages and the desired Python version can be created daily, with the base tagged by the date.

For example, it can follow a naming convention like internal-registry.example.com/base-python-image:3.9-2022-03-10. An ONBUILD instruction can be used to enforce a compliance policy, such as refusing to build if the date is more than 60 days from the build date.
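The ONBUILD check might be sketched as follows; the date, the 60-day window, and the image names are illustrative, and `date -d` assumes GNU coreutils:

```dockerfile
FROM debian:bullseye-slim
# Recorded when this internal base image was created.
ENV BASE_BUILT_ON=2022-03-10
# Runs at the start of any downstream build: fail if this base image
# is more than 60 days old at that point.
ONBUILD RUN age=$(( ( $(date +%s) - $(date -d "$BASE_BUILT_ON" +%s) ) / 86400 )) && \
    test "$age" -le 60
```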

The advantage of enacting such a policy is that the first line in a Dockerfile can start with a specific FROM image.
FROM internal-registry.example.com/base-python-image:3.9-2022-03-10

Since this image never changes, the cache is not invalidated by a new image uploaded to this tag. When a tag's image does change, cache behavior can depend on local configuration: it either invalidates the entire cache, or it might not notice the change at all.

In the latter case, the advice is sometimes to build without caching. This makes container image builds slower and more frustrating.

The first line is all-important because if it invalidates the cache, all other steps must be re-executed. It is worth carefully tuning it to improve container image build times.

After this line, the next step is to be careful about COPY lines and do as much work as possible before another COPY line.

Look at the following example.
COPY requirements.txt setup.cfg ... /app/sources/
RUN python -m venv /app/runtime/
RUN /app/runtime/bin/pip install -r /app/sources/requirements.txt
RUN pip install build
RUN cd /app/sources/ && python -m build
RUN /app/runtime/bin/pip install --no-dependencies /app/sources/dist/*.whl

Instead, it is better to move the pip install build line to the base build, removing the need to carefully manage the dependencies of build while still generating reproducible builds.

The remaining lines are better off broken and reordered.
RUN python -m venv /app/runtime/
Creating the virtual environment is a reasonably fast operation, but there is no need to delay it. It does not depend on any source files, so it stays cached as long as the FROM line is the same.
COPY requirements.txt /app/sources/
RUN /app/runtime/bin/pip install -r /app/sources/requirements.txt
Copying only the requirements.txt allows the cache to avoid rerunning the pip install if this file has not changed. Re-pinning the dependency list is not a frequent process; even the most diligent programmer rarely updates more than once per day.
COPY setup.cfg ... /app/sources/
RUN cd /app/sources/ && python -m build
RUN /app/runtime/bin/pip install --no-dependencies /app/sources/dist/*.whl

This part depends on every single source file, the most volatile part of the code. It is also pretty fast in general. The most time-consuming step is python -m build since it might require installing dependencies in a virtual environment.

If this might be a problem, there is a solution. The following runs a mock build, which might fail, with only pyproject.toml and setup.cfg copied.
COPY setup.cfg pyproject.toml /app/sources/
RUN cd /app/sources/ && (python -m build --no-isolation || true)
RUN rm -rf /app/sources/dist
COPY setup.cfg ... /app/sources/
RUN cd /app/sources/ && python -m build --no-isolation
RUN /app/runtime/bin/pip install --no-dependencies /app/sources/dist/*.whl

Adding --no-isolation installs the build dependencies into the environment where build itself is installed. The first run primes the cache by installing the build dependencies.

The pyproject.toml and setup.cfg files tend to change less than the Python source code itself. Because priming is invalidated only if one of those files changes, it allows caching the installation of the build dependencies.

The real python -m build run does not need to reinstall these dependencies. This means that changing only Python code and running another container image build does not require reinstalling the build dependencies.

Carefully considering which files change and how often and copying files at the right point can make a big difference in build times.

When using continuous integration to build images, it is a good idea to export and import the cache from a persistent store. Often CI (Continuous Integration) workers are short-lived. The correct persistent store to use depends on factors like availability and the CI system itself.

After optimizing the Dockerfile to take advantage of the cache, it is worthwhile to spend time considering how to use this cache correctly in CI. The buildkit documentation covers which caches are available and how to use them, including registry, inline, local, and a specialized one for GitHub actions: gha.
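As one concrete example, a GitHub Actions workflow step using the gha cache back end might look like the following sketch, which assumes the docker/build-push-action action (check its documentation for current parameter names):

```yaml
- name: Build container image
  uses: docker/build-push-action@v2
  with:
    push: true
    tags: internal-registry.example.com/app:latest
    cache-from: type=gha
    cache-to: type=gha,mode=max
```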

12.5 Rebuilding Containers

Building containers reproducibly and in a cache-friendly way means that the dependencies are not upgraded when running a new build. This is good for predictability, but dependencies should be eventually upgraded.

Since dependencies eventually need to be upgraded, it is easier to continuously upgrade them. Otherwise, the latest version might be far ahead of the current version when needing to upgrade.

Most open source projects try to deprecate things slowly and mindfully. Upgrading to a newer version might require changes, but rarely big changes. Upgrading to a version that is a year ahead might be much harder.

Some upgrades need to be done quickly, such as integrating a patch for a security fix. Upgrading continuously builds up the experience to make those upgrades much easier.

This means that regularly rebuilding on top of the newest dependencies is the best practice. A good frequency is between one to two weeks.

Do the following when upgrading.
  1. Regenerate the requirements.txt, pinning to the latest versions.

  2. Update the FROM header to use the latest date (or one day before).

Because both the Dockerfile and the requirements.txt file are under source control, this can be handled no differently than any other merge request. It can be generated automatically or semi-automatically.

Regardless of how it is generated, it needs to go through the usual development cycle. This usually means a green continuous integration result on the branch. It might also mean a manual or automated run against a staging environment. If either fails, and the branch cannot be merged, it should be the responsibility of someone in the team to triage and fix it.

If the immediate fix is too big, the cause is usually one or two new dependencies. Those dependencies can be pinned in the source (for example, setup.cfg when using setuptools, or the Pipfile when using Pipenv).

This should be treated as a temporary solution. A good practice is to annotate those pins linked to a ticket or issue in the ticket management system and prioritize those.

There are some tools to automate the creation of these change requests. Using them at least removes the need to remember to generate the requests. The other tasks, reviewing the CI, checking what other tests need to happen before the merge, and triaging problems, are usually harder to automate.

12.6 Container Security

There are three important things to keep in mind to improve the security of a container image.
  • Avoid unneeded files

  • Keep dependencies up to date

  • Follow the principle of least privilege

A good technique to reduce unneeded files is to use a multistage build and copy over the virtual environment, as shown earlier. Keeping dependencies up to date is mostly an exercise in rebuilding regularly, as also shown earlier.

The last part to cover is the principle of least privilege. To do that at all, the processes in the container should not be run as root. This can be done at container image build time by ensuring there is a less-privileged user, say runtime, and including a line to set the default user executing the processes.
USER runtime

Even without any other work, this already eliminates, or makes significantly harder, a few potential attacks. For example, most container-escape attacks depend on root privileges. Forcing an attacker to find a root escalation hole makes attacks harder, takes more time, and potentially triggers more auditing events, allowing a security team to respond.

Security can further be improved by giving the runtime user few file system access permissions. If it is possible to make no part of the file system writable by this user, this is best. If it is not possible to eliminate file system writes completely, the virtual environment running the process should not be writable by this user.
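Putting these ideas together in the runtime stage might look like the following sketch (the user name and paths are illustrative; the useradd flags assume a Debian-based image):

```dockerfile
FROM python:slim-bullseye as runtime
# Copy the venv while still root, so it is owned by root and is not
# writable by the runtime user.
COPY --from=venv-builder /opt/pyr-venv /opt/pyr-venv
# Create an unprivileged user with no login shell and drop privileges.
RUN useradd --create-home --shell /usr/sbin/nologin runtime
USER runtime
```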

One way to test that is to attempt a pip install in a test that runs against the resulting container image. If the pip install succeeds, the test should fail.
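A sketch of such a test, meant to run inside the container as the unprivileged user (the venv path is illustrative):

```python
# Assert that the application's virtual environment is effectively
# read-only: a pip install into it should fail for the runtime user.
import subprocess

def venv_is_read_only(pip="/opt/pyr-venv/bin/pip"):
    try:
        result = subprocess.run([pip, "install", "attrs"], capture_output=True)
    except FileNotFoundError:
        # No pip binary at all: the environment cannot be modified this way.
        return True
    # A non-zero exit code means the write into the venv failed, as desired.
    return result.returncode != 0
```

A test harness would assert that venv_is_read_only() returns True when run as the runtime user.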

12.7 Summary

Python is a popular language for different kinds of back-end applications, such as machine learning and web applications. These applications are often built into container images and then deployed into various container runtimes.

Building a container image that hosts a Python application should balance the local considerations. Typical considerations include development speed, being able to ship hotfixes, and being able to update when new security fixes become available.

Rather than being reactive, gather the requirements in advance and build a container image build process that satisfies them. When different criteria need to be traded off against each other, it is easier to do such trade-offs in advance rather than as a response to an emergency.

When done well, container images can be a part of the developer experience throughout the software development life cycle. This eliminates a source of friction: differences between the developer environments and the production environment. Building good container images is not a complete solution for this, but it is a necessary part.
